-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Openshift router pod (docker container) doesn't reap zombie processes #1242
Comments
@pweil- fyi. |
Sigh |
Hrm, well that's troublesome since we're checking for a pid file and sending finish signals in the restart if we found it:
|
Some thoughts:
cc @pmorie |
In general, the 3rd option is most sane. We have systemd container for this, also supervisor and a few others. None of them are ideal. The important thing is that haproxy must know it is being restarted. There may be a golang library already for acting as pid 1 - you should investigate that. |
HAProxy knows it is being restarted, the old pid is stopped and is defunct. I think the issue is the parent process (the router plugin) doesn't care and isn't gathering the exit code so it lingers until the parent process stops (which running under init would do if it inherited the child). But, I can do this without anything crazy or OS specific code if I have the restart script pass back the old pid and do something like this in a go routine:
if that seems reasonable. Preliminary testing seems to work fine. Full test code: master...pweil-:router-reaper I haven't run across any "do it for you" solutions. I did find a similar issue that led me to the SIGCHLD solution that was OS specific. I will look again since I was mainly searching for reaping with our process acting as pid 1. |
There's a golang process manager written by brad fitz. It's worth a look.
|
checking out dinit, and baseimage-docker. Will check that one out too. |
BaseImage docker is a terrible pattern
|
|
My preferred solution here would be a simple addition to our Go binaries which detect when they are PID 1 and enable a simple minimal behavior for reaping (Wait on child processes in a goroutine) and proper signal handling. Docker may already have this from the old dockerinit code - I'd be surprised if someone hasn't handled it. @mrunalp you aware of anything like that? |
@smarterclayton delved into that last week in lib container. Should be possible to put something together. I will take a closer look at the router tomorrow and see what we can do. |
@smarterclayton @mrunalp - that doesn't seem too difficult. I will wait to see if Mrunal knows of something already done. I looked at runsit, it seems to do the trick but makes setting up the image a little less obvious. Still solvable but if we'd prefer a simpler solution (I would) then I won't spend more time diving in. @mrunalp - let me know if there is anything I can do to help |
@pweil- I am going to prototype what @smarterclayton suggested; basically make the process act as a simple init that reaps zombies. If it works reliably, then we can use that else look for a real init. |
@mrunalp - this may not be the ideal implementation since it doesn't catch /any/ child process but it seemed to work fine for me as a general solution. master...pweil-:router-reaper . It was my prototype for option 2 in the comment above |
Fixt |
…service-catalog/' changes from ae6b643caf..50e234de83 50e234de83 origin build: add origin tooling 092d7f8 Fix typos and resource names in walkthrough e2e logs (openshift#1237) d25bd11 Archive the old agenda doc, link to new one (openshift#1243) 6192d14 fix lint errors (openshift#1242) d103dad Fix lint errors and regenerate openapi (openshift#1238) e9328d3 Broker Relist (openshift#1183) b0f3222 Correct the reasons and messages set on the ready condition during async polling (openshift#1235) d2bb82f Re-enable the href checker (openshift#1232) 2c29654 Use feature gates in controller-manager (openshift#1231) 699eab9 switch build to go1.9 (openshift#1155) 7529ed8 broker resource secret authorization checking (openshift#1186) 50d9bdf v0.0.20 chart updates (openshift#1228) REVERT: ae6b643caf Use oc adm instead of oadm which might not exist in various installations. REVERT: 66a4eb2a2c Update instructions... will remove once documented elsewhere REVERT: 1b704d1530 replace build context setup with init containers REVERT: ee4df18c7f hack/lib: dedup os::util::host_platform and os::build::host_platform REVERT: 1cd6dfa998 origin: Switch out owners to Red Hatters REVERT: 664f4d318f Add instructions for syncing repos REVERT: 2f2cdd546b origin-build: delete files with colon in them REVERT: cdf8b12848 origin-build: don't build user-broker REVERT: ebfede9056 origin build: add _output to .gitignore REVERT: 55412c7e3d origin build: make build-go and build-cross work REVERT: 68c74ff4ae origin build: modify hard coded path REVERT: 3d41a217f6 origin build: add origin tooling git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 50e234de836b5e7c9e3d7d763847b99a0f0ea500
…service-catalog/' changes from ae6b643caf..06b897d198 06b897d198 origin build: add origin tooling 092d7f8 Fix typos and resource names in walkthrough e2e logs (openshift#1237) d25bd11 Archive the old agenda doc, link to new one (openshift#1243) 6192d14 fix lint errors (openshift#1242) d103dad Fix lint errors and regenerate openapi (openshift#1238) e9328d3 Broker Relist (openshift#1183) b0f3222 Correct the reasons and messages set on the ready condition during async polling (openshift#1235) d2bb82f Re-enable the href checker (openshift#1232) 2c29654 Use feature gates in controller-manager (openshift#1231) 699eab9 switch build to go1.9 (openshift#1155) 7529ed8 broker resource secret authorization checking (openshift#1186) 50d9bdf v0.0.20 chart updates (openshift#1228) REVERT: ae6b643caf Use oc adm instead of oadm which might not exist in various installations. REVERT: 66a4eb2a2c Update instructions... will remove once documented elsewhere REVERT: 1b704d1530 replace build context setup with init containers REVERT: ee4df18c7f hack/lib: dedup os::util::host_platform and os::build::host_platform REVERT: 1cd6dfa998 origin: Switch out owners to Red Hatters REVERT: 664f4d318f Add instructions for syncing repos REVERT: 2f2cdd546b origin-build: delete files with colon in them REVERT: cdf8b12848 origin-build: don't build user-broker REVERT: ebfede9056 origin build: add _output to .gitignore REVERT: 55412c7e3d origin build: make build-go and build-cross work REVERT: 68c74ff4ae origin build: modify hard coded path REVERT: 3d41a217f6 origin build: add origin tooling git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 06b897d1988a5a3c035c5a971c15b97cbc732918
[3.8] Ensure Continue token is proxied for openshift RBAC list calls
The openshift router pod running the openshift/origin-haproxy-router:v* image does not reap the haproxy zombie processes. Its probably not a big deal for now - but we will hit process limits when this list of unreaped children grows - dependent on how often the /var/lib/haproxy/reload-haproxy reload script is called.
The text was updated successfully, but these errors were encountered: