Troubleshoot connection issue between parsl and kubernetes cluster #36
This issue is a sub-task towards the ultimate goal of #1.
I was able to resolve the earlier error. The job lasted a while, as it should, because there are sleep periods inserted into the function, and it was running in a tmux session. Throughout the process, the script successfully output the print messages we expect from the function. The print statements in the script itself, outside of the function, were also present. When I checked back on the process this morning, I saw something new at the end of the log, and I wonder why it is there. Overall, the workflow has certainly made a big step forward. I think it is safe to say that the job is running successfully now, and there seems to be something funky going on with how the pods are shutting down after the script completes.
A note on the last comment: since the other tickets regarding setting up a new user and an env in the container have been resolved, there may not be a "connection issue" between parsl and k8s anymore, but rather just some adjustments to be made to write the log and the output viz tilesets. The workflow is not running smoothly yet, but I will be able to pinpoint the smaller issue better now that we have the new user and env set up in the container.
Progress
The parsl and kubernetes viz workflow has been progressing nicely in the following ways:
- `app/` is the `WORKDIR` specified in the Dockerfile, and `/mnt/data` is the mount path for the PV
- The Dockerfile was reorganized: the line copying `parsl_config.py` was moved lower, because that script is often updated with a new version number for the published image, and the `pip install` line was moved to right after copying over the `requirements.txt`
- A `runinfo` directory is created each run (in the dir of the python script on Datateam, not in the container or PV), which is a sign parsl is working behind the scenes to some degree
- The workflow is run with `python parsl_workflow.py > k8s_parsl.log 2>&1`
Problems
- When checking the pods with `kubectl get pods`, their status is `CrashLoopBackOff`
- `kubectl logs {podname}` returns the print statement that we included in the workflow ("Worker started...") plus a vague syntax error

A good sign
Print statements inserted into the script at all stages are being printed to the log output we specify when we run the python script (in the example command given above, that is `k8s_parsl.log`), including the final statement "script complete". When running the parsl and kubernetes workflow with a parsl app that does not ingest data files nor output files, and instead executes a simple iterative mathematical operation with print statements and sleep periods inserted, the output seems to imply the script worked as expected. However, the pods are still not shutting down afterwards, and the `CrashLoopBackOff` status persists.

Useful Commands
`kubectl run -n pdgrun -i --tty --rm busybox --image=busybox -- sh` initiates a pod and drops you into a shell inside it so you can poke around. The pod is named `busybox` by the command. An example of one way to "poke around" is to check whether the IP address and port you specify in the parsl config are reachable. Example: `telnet 128.111.85.174 54001`
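If `telnet` is not available in the image, the same reachability check can be done from any Python environment with the standard library. This is a generic sketch; the IP and port below are the ones from the parsl config in this issue, and the function name is our own:

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example, mirroring the telnet check above:
# port_is_open("128.111.85.174", 54001)
```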
Suggested next steps
- Change `worker_init` in the parsl config and comment out the line we have been using thus far, `worker_init = 'echo "Worker started..."'`, to get more info in the pod logs
- In `parsl_config.py`, play around with using `address = address_by_route()` vs `address = '128.111.85.174'`
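To make the two suggestions concrete, here is a rough sketch of where those settings live in a `parsl_config.py` that uses the `HighThroughputExecutor` with the `KubernetesProvider`. The class and parameter names follow parsl's documented API, but the label, namespace, and image value are placeholders, and the repo's actual config may differ:

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import KubernetesProvider
from parsl.addresses import address_by_route

config = Config(
    executors=[
        HighThroughputExecutor(
            label="kube-htex",  # placeholder label
            # Suggested experiment: swap between an interface lookup and
            # the fixed IP we have been using so far.
            # address=address_by_route(),
            address="128.111.85.174",
            provider=KubernetesProvider(
                image="IMAGE:TAG",   # placeholder; the real published image is in the repo
                namespace="pdgrun",  # matches the busybox example above
                # worker_init runs in each worker pod before the worker starts;
                # replacing this echo with a more verbose command is the
                # suggested way to get more info in the pod logs.
                worker_init='echo "Worker started..."',
            ),
        )
    ]
)
```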
Thank you to Matthew Brook and Matt Jones for all your help troubleshooting thus far!