This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
@pothos if this hypothesis works, could you also update the other platforms?
Restarting the etcd or kubelet container from the Docker daemon introduces a race: the "docker logs" command terminates, so the systemd unit restarts the container in parallel with the Docker daemon. This race can terminate the container again or prevent it from starting. It also lets the ExecStartPre and ExecStopPost commands run while the container is running, which can corrupt the state. Even if the systemd unit did not restart in parallel, restarting the container from the Docker daemon is undesirable because the systemd unit should first run the ExecStartPre/Post commands, e.g., for the "etcd-rejoin" script. Only restart the containers from the systemd unit, not from the Docker daemon.
Yes, it looks like the kubelet was also affected; I updated it there, too.
This should be mentioned in the upgrade guide.
The etcd folder was modified while the etcd service was running, which could lead to an earlier restart of the service because the service conditions for existing files are finally met, and it may conflict with any commands in the ExecStopPost directive. Stop the etcd service unit before touching the files and start it only after the modification is done.
I still got a failure and tried one more improvement in another commit.
The shared etcd CLC systemd unit has additional startup conditions that were missing from the aws and packet etcd systemd units. These conditions seem to make the bootstrap more robust. Port the startup conditions over to aws and packet to align the platforms (which also makes debugging clearer).
invidian reviewed on Jun 22, 2021
Looks good to me.
surajssd approved these changes on Jun 23, 2021
👍
Only restart containers from systemd
Restarting the etcd or kubelet container from the Docker daemon
introduces a race: the "docker logs" command terminates, so the systemd
unit restarts the container in parallel with the Docker daemon. This
race can terminate the container again or prevent it from starting. It
also lets the ExecStartPre and ExecStopPost commands run while the
container is running, which can corrupt the state. Even if the systemd
unit did not restart in parallel, restarting the container from the
Docker daemon is undesirable because the systemd unit should first run
the ExecStartPre/Post commands, e.g., for the "etcd-rejoin" script.
Only restart the containers from the systemd unit, not from the Docker
daemon.
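
As a rough illustration, here is one way to check whether a container is still set up to be restarted by the Docker daemon. This sketch assumes the Docker-side restarts come from a container restart policy, which this description does not state explicitly. It reads the `HostConfig.RestartPolicy.Name` field from the JSON printed by `docker inspect <container>`; the helper names are hypothetical and not part of this change.

```python
import json

# Restart policy names under which the Docker daemon restarts a container
# on its own. "no" (or an empty name) leaves restarts to whoever started
# the container, here the systemd unit.
DOCKER_MANAGED_POLICIES = {"always", "unless-stopped", "on-failure"}


def restart_policy(inspect_output: str) -> str:
    """Return the restart policy name from `docker inspect` JSON output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return containers[0]["HostConfig"]["RestartPolicy"]["Name"]


def restarted_by_docker(inspect_output: str) -> bool:
    """True if the Docker daemon would restart this container by itself."""
    return restart_policy(inspect_output) in DOCKER_MANAGED_POLICIES
```

With such a check, a container reporting `always` would be flagged as racing with the systemd unit, while one reporting `no` would not.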
etcd: stop the service before modifying the etcd folder
The etcd folder was modified while the etcd service was running, which
could lead to an earlier restart of the service because the service
conditions for existing files are finally met, and it may conflict
with any commands in the ExecStopPost directive.
Stop the etcd service unit before touching the files and start the
unit only after the modification is done.
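
The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this change: the unit name `etcd.service` and the helper names are assumptions, and the `run` parameter exists only so the sequence can be exercised without calling `systemctl`.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], None]


def systemctl(args: List[str]) -> None:
    """Run systemctl with the given arguments and fail on a non-zero exit."""
    subprocess.run(["systemctl", *args], check=True)


def modify_while_stopped(
    unit: str, modify: Callable[[], None], run: Runner = systemctl
) -> None:
    """Stop the unit, apply the modification, then start the unit again.

    If `modify` raises, the unit is deliberately left stopped so that it
    does not start on top of a half-modified folder.
    """
    run(["stop", unit])
    modify()
    run(["start", unit])
```

A caller would pass the folder modification as `modify`, for example `modify_while_stopped("etcd.service", update_etcd_folder)`, where `update_etcd_folder` is a hypothetical function.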
aws|packet: align etcd unit file with the shared controller module
The shared etcd CLC systemd unit has additional startup conditions
that were missing from the aws and packet etcd systemd units. These
conditions seem to make the bootstrap more robust.
Port the startup conditions over to aws and packet to align the
platforms (which also makes debugging clearer).
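
One way to check that two unit files carry the same startup conditions is to compare their `Condition...=` directives. The sketch below assumes the conditions are ordinary systemd directives such as `ConditionPathExists=`; the function names are hypothetical, and the actual conditions in the shared unit are not listed in this description.

```python
from typing import Set


def startup_conditions(unit_text: str) -> Set[str]:
    """Collect the Condition*= directives found in a systemd unit file."""
    conditions = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Comment lines start with "#" or ";" and therefore never match.
        if line.startswith("Condition") and "=" in line:
            key, _, value = line.partition("=")
            conditions.add(f"{key.strip()}={value.strip()}")
    return conditions


def missing_conditions(reference: str, candidate: str) -> Set[str]:
    """Return the conditions present in `reference` but absent from `candidate`."""
    return startup_conditions(reference) - startup_conditions(candidate)
```

Running `missing_conditions` with the shared unit as `reference` and a platform unit as `candidate` would list what still needs to be ported; an empty set means the two are aligned.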
How to use
Testing done
Looking at CI