Add healthcheck #47

Open · wants to merge 1 commit into base: develop

Conversation

@phlax (Contributor) commented May 27, 2020

I think exportfs provides a fairly reliable way to check whether the container is healthy, so it seems to me to make sense to add the healthcheck to the image itself - i.e. the healthcheck doesn't require any runtime vars
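For context, an image-level check along these lines would look something like this in the Dockerfile (a sketch only - the PR's actual interval/retry values aren't shown in this thread, so the numbers here are illustrative):

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD exportfs || exit 1

A non-zero exit from the CMD marks the container unhealthy, so no runtime configuration is needed beyond what the image already has.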

@ehough (Owner) commented Jul 21, 2020

LGTM. Thank you for your contribution! I'll merge momentarily.

@phlax (Contributor, Author) commented Jul 21, 2020

hi @ehough - the only thing I wasn't 100% sure about is the optimal settings for retries etc. - these seemed like reasonable defaults, but I'm not sure

@phlax (Contributor, Author) commented Jul 21, 2020

I guess the issue with the grace period is significant here

@ehough (Owner) commented Jul 23, 2020

OK, I think I spoke too soon. The problem is that exportfs's exit code seems to always be 0. Run this:

docker run --rm alpine sh -c "apk add nfs-utils && exportfs; echo $?"

So exportfs's exit code would indicate to Docker that the container is healthy, even though there could be other problems with the server.

This image does a ton of error checking during boot-up, so I think if entrypoint.sh is alive after, say, 30 seconds, then we can somewhat safely assume that the container is healthy. Perhaps in the future we could add more sophisticated checks (e.g. if ID mapping is requested, ensure that idmapd is running).

Would you be interested in updating this PR to instead check for entrypoint.sh? Something like pidof /usr/local/bin/entrypoint.sh might be enough. I'm happy to make the change but I don't want to clobber your contribution!
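A sketch of that pidof-based idea as a Dockerfile instruction. Note that plain pidof generally won't match a shell script, so this assumes a pidof with -x support and matches the script by name; the timing values are illustrative:

HEALTHCHECK --start-period=30s --interval=30s --timeout=5s --retries=2 \
  CMD pidof -x entrypoint.sh || exit 1

The --start-period matches the "alive after 30 seconds" idea: probe failures during that window don't count against the retry limit.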

i guess the issue with the grace period is significant here

I haven't forgotten about #44. I was testing it this weekend and I kept crashing my laptop (it's running El Capitan, which has major issues with NFSv4). Stand by for updates there.

@phlax (Contributor, Author) commented Jul 23, 2020

The problem is that exportfs's exit code seems like it's always 0

weird, I tested this when I made the PR and have been using it since. Not sure what I must have tested 8/

Would you be interested in updating this PR to instead check for entrypoint.sh?

sure

@phlax (Contributor, Author) commented Jul 23, 2020

after a little playing with this, I have come to the conclusion that this healthcheck should probably not be added to the base image, for a few reasons

  • checking the pid of the entrypoint doesn't tell you anything about the health of the NFS export - and I'm thinking that even if the entrypoint is hanging, it would still return successfully with pidof. Also - unless I misunderstand - this would report healthy from the point the script was started, which is not what you want when waiting for services
  • probably the more reliable test is to nc-test that any expected ports are alive (sketched below) - but these will depend on which NFS version, config, etc.
  • it's generally only a good idea to add healthchecks to images (rather than in orchestration) if the check is reliable in any situation and won't degrade performance - I'm not sure that is the case here
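A sketch of that port-based check, assuming NFSv4 on its standard TCP port 2049 and an nc that supports -z (NFSv3 would also need the rpcbind/mountd ports, which vary with config - hence the objection that this belongs in orchestration, not the image):

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD nc -z 127.0.0.1 2049 || exit 1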

I had a quick look at the checks in the entrypoint and my impression is that these are more reliable, but I'm not sure how the script could communicate the health of the exported dirs back to the environment

also worth mentioning - https://www.ibm.com/support/pages/understanding-nfs-health-check-datapower - although it seems only tangentially helpful

@phlax (Contributor, Author) commented Jul 23, 2020

also - I might be wrong - but I think this does work after all (it seems to in my deployment)

try this:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exportfs); echo \"Received: $?\""

compared to:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exit 0); echo \"Received: $?\""

this was wrong - further testing shows the host shell was expanding the $()
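For reference, the expansion stays inside the container if the sh -c argument is single-quoted, and checking $? immediately after exportfs (rather than after an echo, whose own exit status would be reported) measures the right command:

docker run --rm alpine sh -c 'apk add nfs-utils >/dev/null && exportfs; echo $?'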

@phlax (Contributor, Author) commented Jul 23, 2020

although I'm not sure 8/

$ docker run -ti --rm alpine sh
/ # apk add nfs-utils && exportfs
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
(1/15) Installing krb5-conf (1.0-r1)
(2/15) Installing libcom_err (1.45.5-r0)
(3/15) Installing keyutils-libs (1.6.1-r0)
(4/15) Installing libverto (0.3.1-r1)
(5/15) Installing krb5-libs (1.17.1-r0)
(6/15) Installing libtirpc (1.1.4-r0)
(7/15) Installing rpcbind (1.2.5-r0)
Executing rpcbind-1.2.5-r0.pre-install
(8/15) Installing libblkid (2.34-r1)
(9/15) Installing libcap (2.27-r0)
(10/15) Installing device-mapper-libs (2.02.186-r0)
(11/15) Installing libevent (2.1.11-r0)
(12/15) Installing libmount (2.34-r1)
(13/15) Installing libnfsidmap (2.4.1-r2)
(14/15) Installing sqlite-libs (3.30.1-r2)
(15/15) Installing nfs-utils (2.4.1-r2)
Executing busybox-1.31.1-r9.trigger
OK: 12 MiB in 29 packages
/ # echo $?
0

@phlax (Contributor, Author) commented Jul 23, 2020

this seems to work, but I'm not sure about the performance or how reliable it is

/ # nfsstat -s
Error: No Server Stats (/proc/net/rpc/nfsd: No such file or directory). 
/ # echo "$?"
2

edit: I don't seem to be able to get this to work, even with successfully exported dirs - I think because of Docker and /proc - either way it doesn't seem to be an option

@phlax (Contributor, Author) commented Jul 28, 2020

@ehough shall I close this?

I think it's probably best to leave this to individual setups etc.
