Openshift router pod (docker container) doesn't reap zombie processes #1242

ramr · 2015-03-06T01:58:22Z

The openshift router pod running the openshift/origin-haproxy-router:v* image does not reap the haproxy zombie processes. Its probably not a big deal for now - but we will hit process limits when this list of unreaped children grows - dependent on how often the /var/lib/haproxy/reload-haproxy reload script is called.

[vagrant@openshiftdev ~]$ sudo nsenter -m -u -n -i -p -t $(sudo docker inspect --format "{{ .State.Pid }}" "$(sudo docker ps | grep origin-haproxy-router | awk '{print $NF}')")    
[root@router-1-paurv /]# ps -ef | cat    
UID        PID  PPID  C STIME TTY          TIME CMD    
root         1     0  0 01:09 ?        00:00:01 /usr/bin/openshift-router --template=/var/lib/haproxy/conf/haproxy-config.template --reload=/var/lib/haproxy/reload-haproxy    
haproxy     15     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     19     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     23     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     27     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     31     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     35     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     39     1  0 01:09 ?        00:00:00 [haproxy] <defunct>    
haproxy     43     1  0 01:10 ?        00:00:00 [haproxy] <defunct>    
haproxy     80     1  0 01:49 ?        00:00:00 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -sf 43    
root        81     0  0 01:56 ?        00:00:00 -bash    
root        97    81  0 01:57 ?        00:00:00 ps -ef    
root        98    81  0 01:57 ?        00:00:00 cat

The text was updated successfully, but these errors were encountered:

ramr · 2015-03-06T01:58:36Z

@pweil- fyi.

smarterclayton · 2015-03-06T05:10:59Z

Sigh

pweil- · 2015-03-06T13:46:31Z

Hrm, well that's troublesome since we're checking for a pid file and sending finish signals in the restart if we found it:

if [ -n "$old_pid" ]; then
  /usr/sbin/haproxy -f $config_file -p $pid_file -sf $old_pid
else
  /usr/sbin/haproxy -f $config_file -p $pid_file
fi

pweil- · 2015-03-06T15:07:26Z

Some thoughts:

the plugin is the entry point for the image, it executes the reload commands
the plugin could be modified to reap child processes - this introduces os specific code with syscall
maybe the docker container can be set up differently so that haproxy is running under an init process that can reap?

cc @pmorie

smarterclayton · 2015-03-06T17:05:01Z

In general, the 3rd option is most sane. We have systemd container for this, also supervisor and a few others. None of them are ideal. The important thing is that haproxy must know it is being restarted.

There may be a golang library already for acting as pid 1 - you should investigate that.

pweil- · 2015-03-06T17:29:29Z

@smarterclayton

HAProxy knows it is being restarted, the old pid is stopped and is defunct. I think the issue is the parent process (the router plugin) doesn't care and isn't gathering the exit code so it lingers until the parent process stops (which running under init would do if it inherited the child).

But, I can do this without anything crazy or OS specific code if I have the restart script pass back the old pid and do something like this in a go routine:

proc, e := os.FindProcess(int(pid))
proc.Wait()

if that seems reasonable. Preliminary testing seems to work fine. Full test code: master...pweil-:router-reaper

I haven't run across any "do it for you" solutions. I did find a similar issue that led me to the SIGCHLD solution that was OS specific. I will look again since I was mainly searching for reaping with our process acting as pid 1.

smarterclayton · 2015-03-06T19:21:50Z

There's a golang process manager written by brad fitz. It's worth a look.

On Mar 6, 2015, at 12:29 PM, Paul [email protected] wrote:

@smarterclayton

HAProxy knows it is being restarted, the old pid is stopped and is defunct. I think the issue is the parent process (the router plugin) doesn't care and isn't gathering the exit code so it lingers until the parent process stops (which running under init would do if it inherited the child).

But, I can do this without anything crazy or OS specific code if I have the restart script pass back the old pid and do something like this in a go routine:

proc, e := os.FindProcess(int(pid))
proc.Wait()
if that seems reasonable. Preliminary testing seems to work fine.

I haven't run across any "do it for you" solutions. I did find a similar issue that led me to the SIGCHLD solution that was OS specific. I will look again since I was mainly searching for reaping with our process acting as pid 1.

—
Reply to this email directly or view it on GitHub.

pweil- · 2015-03-06T19:35:24Z

checking out dinit, and baseimage-docker. Will check that one out too.

smarterclayton · 2015-03-06T19:37:52Z

BaseImage docker is a terrible pattern

On Mar 6, 2015, at 2:35 PM, Paul [email protected] wrote:

checking out dinit, and baseimage-docker. Will check that one out too.

—
Reply to this email directly or view it on GitHub.

pweil- · 2015-03-06T19:47:04Z

rm -Rf pauls_brain/baseimage-docker

smarterclayton · 2015-03-09T05:13:49Z

My preferred solution here would be a simple addition to our Go binaries which detect when they are PID 1 and enable a simple minimal behavior for reaping (Wait on child processes in a goroutine) and proper signal handling. Docker may already have this from the old dockerinit code - I'd be surprised if someone hasn't handled it. @mrunalp you aware of anything like that?

mrunalp · 2015-03-09T06:06:13Z

@smarterclayton delved into that last week in lib container. Should be possible to put something together. I will take a closer look at the router tomorrow and see what we can do.

pweil- · 2015-03-09T12:30:23Z

@smarterclayton @mrunalp - that doesn't seem too difficult. I will wait to see if Mrunal knows of something already done. I looked at runsit, it seems to do the trick but makes setting up the image a little less obvious. Still solvable but if we'd prefer a simpler solution (I would) then I won't spend more time diving in.

@mrunalp - let me know if there is anything I can do to help

mrunalp · 2015-03-09T17:12:58Z

@pweil- I am going to prototype what @smarterclayton suggested; basically make the process act as a simple init that reaps zombies. If it works reliably, then we can use that else look for a real init.

pweil- · 2015-03-09T17:25:43Z

@mrunalp - this may not be the ideal implementation since it doesn't catch /any/ child process but it seemed to work fine for me as a general solution. master...pweil-:router-reaper . It was my prototype for option 2 in the comment above

smarterclayton · 2015-03-14T16:12:12Z

Fixt

…service-catalog/' changes from ae6b643caf..50e234de83 50e234de83 origin build: add origin tooling 092d7f8 Fix typos and resource names in walkthrough e2e logs (openshift#1237) d25bd11 Archive the old agenda doc, link to new one (openshift#1243) 6192d14 fix lint errors (openshift#1242) d103dad Fix lint errors and regenerate openapi (openshift#1238) e9328d3 Broker Relist (openshift#1183) b0f3222 Correct the reasons and messages set on the ready condition during async polling (openshift#1235) d2bb82f Re-enable the href checker (openshift#1232) 2c29654 Use feature gates in controller-manager (openshift#1231) 699eab9 switch build to go1.9 (openshift#1155) 7529ed8 broker resource secret authorization checking (openshift#1186) 50d9bdf v0.0.20 chart updates (openshift#1228) REVERT: ae6b643caf Use oc adm instead of oadm which might not exist in various installations. REVERT: 66a4eb2a2c Update instructions... will remove once documented elsewhere REVERT: 1b704d1530 replace build context setup with init containers REVERT: ee4df18c7f hack/lib: dedup os::util::host_platform and os::build::host_platform REVERT: 1cd6dfa998 origin: Switch out owners to Red Hatters REVERT: 664f4d318f Add instructions for syncing repos REVERT: 2f2cdd546b origin-build: delete files with colon in them REVERT: cdf8b12848 origin-build: don't build user-broker REVERT: ebfede9056 origin build: add _output to .gitignore REVERT: 55412c7e3d origin build: make build-go and build-cross work REVERT: 68c74ff4ae origin build: modify hard coded path REVERT: 3d41a217f6 origin build: add origin tooling git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 50e234de836b5e7c9e3d7d763847b99a0f0ea500

…service-catalog/' changes from ae6b643caf..06b897d198 06b897d198 origin build: add origin tooling 092d7f8 Fix typos and resource names in walkthrough e2e logs (openshift#1237) d25bd11 Archive the old agenda doc, link to new one (openshift#1243) 6192d14 fix lint errors (openshift#1242) d103dad Fix lint errors and regenerate openapi (openshift#1238) e9328d3 Broker Relist (openshift#1183) b0f3222 Correct the reasons and messages set on the ready condition during async polling (openshift#1235) d2bb82f Re-enable the href checker (openshift#1232) 2c29654 Use feature gates in controller-manager (openshift#1231) 699eab9 switch build to go1.9 (openshift#1155) 7529ed8 broker resource secret authorization checking (openshift#1186) 50d9bdf v0.0.20 chart updates (openshift#1228) REVERT: ae6b643caf Use oc adm instead of oadm which might not exist in various installations. REVERT: 66a4eb2a2c Update instructions... will remove once documented elsewhere REVERT: 1b704d1530 replace build context setup with init containers REVERT: ee4df18c7f hack/lib: dedup os::util::host_platform and os::build::host_platform REVERT: 1cd6dfa998 origin: Switch out owners to Red Hatters REVERT: 664f4d318f Add instructions for syncing repos REVERT: 2f2cdd546b origin-build: delete files with colon in them REVERT: cdf8b12848 origin-build: don't build user-broker REVERT: ebfede9056 origin build: add _output to .gitignore REVERT: 55412c7e3d origin build: make build-go and build-cross work REVERT: 68c74ff4ae origin build: modify hard coded path REVERT: 3d41a217f6 origin build: add origin tooling git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 06b897d1988a5a3c035c5a971c15b97cbc732918

[3.8] Ensure Continue token is proxied for openshift RBAC list calls

mrunalp mentioned this issue Mar 10, 2015

Adds reaping capability to openshift. #1264

Merged

smarterclayton closed this as completed Mar 14, 2015

jpeeler pushed a commit to jpeeler/origin that referenced this issue Feb 1, 2018

fix lint errors (openshift#1242)

6192d14

sjenning pushed a commit to sjenning/origin that referenced this issue Jul 30, 2018

Merge pull request openshift#1242 from liggitt/proxy-list-continue-3.8

fc84a4a

[3.8] Ensure Continue token is proxied for openshift RBAC list calls

wking mentioned this issue Mar 1, 2019

install: Fix reference to pod image openshift/machine-config-operator#508

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Openshift router pod (docker container) doesn't reap zombie processes #1242

Openshift router pod (docker container) doesn't reap zombie processes #1242

ramr commented Mar 6, 2015

ramr commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 9, 2015

mrunalp commented Mar 9, 2015

pweil- commented Mar 9, 2015

mrunalp commented Mar 9, 2015

pweil- commented Mar 9, 2015

smarterclayton commented Mar 14, 2015

Openshift router pod (docker container) doesn't reap zombie processes #1242

Openshift router pod (docker container) doesn't reap zombie processes #1242

Comments

ramr commented Mar 6, 2015

ramr commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 6, 2015

pweil- commented Mar 6, 2015

smarterclayton commented Mar 9, 2015

mrunalp commented Mar 9, 2015

pweil- commented Mar 9, 2015

mrunalp commented Mar 9, 2015

pweil- commented Mar 9, 2015

smarterclayton commented Mar 14, 2015