agentctl.sh: Sanity check PID_FILE and remove at stop #278

Open: wants to merge 1 commit into base: master

@@ -133,13 +133,15 @@ stop()
if [ $(pid_status) -ne 0 ]
then
status
rm -f $PID_FILE
exit 0
fi

# Send a kill (with an optional -9 signal) to the process
kill $OPT_FORCEKILL `cat $PID_FILE`

echo "Stopping $APP_NAME - PID [`cat $PID_FILE`]"
rm -f $PID_FILE
}

###############################################################################
@@ -202,14 +204,19 @@ get_pid()
PID=`cat $PID_FILE`
PS_STAT=`ps -p $PID -o'user,pid=' | tail -1 | awk '{print $2}'`

case "$PS_STAT" in
$PID ) echo $PID
;;
* ) echo 0
;;
esac
if [[ "$PS_STAT" -eq "$PID" ]]; then
CMD=$(ps -p $PID -o 'comm=')
Member

Thanks for the pull request. Can you explain what those 2 lines do (208 and 209)? I am not a shell expert :)

Author

Sure. I'm using ps to find the name of the program running as PID $PID, which we read from the PID_FILE.

$ cat /var/glu/data/logs/org.linkedin.glu.agent-server.pid
29302
$ ps -p 29302 -o 'comm='
java

I then do a string comparison to check whether the command I found matches the command I expect to be running, taken from $JAVA_CMD. In my case $JAVA_CMD contains the full path to java, for example JAVA_CMD=/opt/local/java/bin/java, so I use basename to remove all leading directory components.

$ basename /opt/local/java/bin/java
java

If you are curious, basename also works with relative paths or when there are no leading directories.

$ basename java
java
$ basename ../bin/java
java

I suspect $(basename "$JAVA_CMD") will always return "java", but maybe someone out there is using something like Android's Dalvik, so I figure it is best to derive the command name from JAVA_CMD.

This is all just a basic sanity check. We could still find a process with the PID we are looking for that happens to be a java process but is not the glu-agent. However, that is much less likely than a false match without the check.
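
The check described above could be sketched as a standalone POSIX shell helper. This is only an illustration: the function names same_command and pid_runs_command are hypothetical and do not appear in agentctl.sh, and the sketch strips directories from the ps output as well, on the assumption that some platforms print a full path for comm.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of agentctl.sh.

# Succeeds when the command name reported for a process ($1) matches
# the last path component of the expected command ($2).
same_command() {
    # ${1##*/} drops any leading directories from the reported name
    [ "${1##*/}" = "$(basename "$2")" ]
}

# Succeeds when a process with PID $1 exists and runs command $2.
pid_runs_command() {
    # 'comm=' prints the command name only, with no header line
    actual=$(ps -p "$1" -o comm= 2>/dev/null) || return 1
    [ -n "$actual" ] && same_command "$actual" "$2"
}
```

A caller would use it along the lines of `pid_runs_command "$PID" "$JAVA_CMD" && echo "$PID" || echo 0`, mirroring what get_pid echoes.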

Member

Thank you for the explanation.

  • Is basename a standard command on Mac/Linux/Unix?
  • Not sure if this would help, but one of the command line arguments is -Dorg.linkedin.app.name=org.linkedin.glu.agent-server, so if there is an easy way to check for that rather than simply checking for java, you would be more certain that it is the agent...

Author

basename is part of GNU coreutils on Linux and part of the BSD general commands on Mac. The Mac man page says basename is part of the POSIX.2 standard, so I'm pretty sure other Unix variants will have it as well.

We could check the command line arguments as you suggest, but the test is more complicated in shell and I don't really think it is necessary. Since I'm removing the PID_FILE on clean shutdowns, we are less likely to have a stale PID_FILE. My systems have very few java processes, so the odds of having a stale PID_FILE and having the PID wrap around and get reused by a different java process are very small. Perhaps others run lots of java processes, but I seem to be the first person who has hit (or at least reported/fixed) this issue, which suggests it was already uncommon even before my change. Maybe LinkedIn doesn't reboot or doesn't start glu-agent at startup.
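
For reference, the stricter argument check discussed here might look roughly like the sketch below. It is not part of this pull request; the function names has_app_marker and pid_is_app are hypothetical, and it assumes the agent is always started with the -Dorg.linkedin.app.name=... argument quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of agentctl.sh.

# Succeeds when the argument string $1 contains the system property
# -Dorg.linkedin.app.name=$2 as a complete, space-delimited word.
has_app_marker() {
    # Pad with spaces so the property must match as a whole word
    case " $1 " in
        *" -Dorg.linkedin.app.name=$2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when a process with PID $1 exists and carries the marker $2.
pid_is_app() {
    # 'args=' prints the full command line, with no header line
    args=$(ps -p "$1" -o args= 2>/dev/null) || return 1
    has_app_marker "$args" "$2"
}
```

One caveat with this approach is that some ps implementations may truncate long command lines, in which case the property could be cut off and the check would fail even for the real agent.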

if [[ "$CMD" == "$(basename "$JAVA_CMD")" ]]; then
echo $PID
else
echo 0
fi
else
echo 0
fi

fi

return 0
}

@@ -350,4 +357,4 @@ case $1 in
*) usage
exit 1
;;
esac
esac