Dashboard reports incorrect completion state #5

Open
gidden opened this issue Dec 12, 2013 · 12 comments

@gidden
Contributor

gidden commented Dec 12, 2013

See http://198.101.154.53:8080/dashboard for Issue #666, whose batlab result is:
http://submit-1.batlab.org/nmi/results/details?runID=215388

@gidden
Contributor Author

gidden commented Jan 20, 2014

I'll ping this issue again with a current example. See http://cycamore-ci.fuelcycle.org/dashboard issue 175. It is reported as successful, but if you go to the batlab run, you'll notice it failed.

@scopatz
Member

scopatz commented Jan 20, 2014

This is because _NMI_STEP_FAILED is either the wrong thing to test or has the incorrect value (see its usage at https://github.com/polyphemus-ci/polyphemus/blob/master/polyphemus/batlabrun.py#L37). @zwelchWI, how do we get the return status of the remote nodes? We want them all to be true for the post_all to report success.
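The "all must be true" requirement can be sketched in bash as below. This is only an illustration: the helper name all_succeeded and the exit codes are made up, and it assumes the per-node statuses are already available as integers, which is exactly the open question in this comment.

```shell
#!/bin/bash
# Hypothetical helper (not part of polyphemus or batlab):
# returns 0 only if every status passed in is 0.
all_succeeded() {
    local status
    for status in "$@"; do
        if [ "$status" -ne 0 ]; then
            return 1
        fi
    done
    return 0
}

# Made-up exit codes standing in for three remote nodes.
if all_succeeded 0 0 0; then clean_run="success"; else clean_run="failure"; fi
if all_succeeded 0 2 0; then mixed_run="success"; else mixed_run="failure"; fi

echo "all zero:     $clean_run"
echo "one non-zero: $mixed_run"
```

With these inputs the first line reports success and the second reports failure, so a single failing node is enough to flip the overall result.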

@zwelchWI
Member

It looks like the problem is that _NMI_STEP_FAILED is being evaluated
on the polyphemus server. I went and looked at the post_all script for
issue 175 on the submit server, and it looks like this:

#!/bin/bash

polyphemus post_all callbacks

if [ -z ]
then
    curl --form status='{"status":"success","number":175,"description":"build and test completed successfully"}' http://cycamore-ci.fuelcycle.org:80/batlabstatus
else
    curl --form status='{"status":"failure","number":175,"description":"build and test failed"}' http://cycamore-ci.fuelcycle.org:80/batlabstatus
fi

which will always be true!
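A minimal reproduction of why that test is always true, as a sketch rather than the actual polyphemus code: it assumes the variable is unset where the script is generated and non-empty on the execution side when a step fails (the value 1 is made up). The real script is generated by polyphemus, so the heredocs here only stand in for that generation step. Once the variable has expanded to nothing, `[ -z ]` is a one-argument test on the string "-z", which is non-empty and therefore true.

```shell
#!/bin/bash
# Generating host: the variable is unset here, as assumed for the polyphemus server.
unset _NMI_STEP_FAILED

# Unquoted heredoc delimiter: $_NMI_STEP_FAILED expands now, at generation time,
# so the generated text contains a bare `[ -z ]`.
early=$(cat <<EOF
if [ -z $_NMI_STEP_FAILED ]; then echo success; else echo failure; fi
EOF
)

# Quoted delimiter: the text is kept literally and expands only when the script runs.
late=$(cat <<'EOF'
if [ -z "$_NMI_STEP_FAILED" ]; then echo success; else echo failure; fi
EOF
)

# Execution host: pretend a step failed (the value 1 is an assumption).
export _NMI_STEP_FAILED=1
early_result=$(bash -c "$early")   # the baked-in `[ -z ]` ignores the failure
late_result=$(bash -c "$late")     # the quoted variable is read at run time

echo "early expansion: $early_result"
echo "late expansion:  $late_result"
```

The early-expansion script reports success even though the variable signals a failure at run time, while the late-expansion script reports failure.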

On Mon Jan 20 10:59:42 2014, Anthony Scopatz wrote:

This is because _NMI_STEP_FAILED is either the wrong thing to test
or has the incorrect value. You can see its usage here:
https://github.com/polyphemus-ci/polyphemus/blob/master/polyphemus/batlabrun.py#L37
@zwelchWI, how do we get the return status of the remote nodes?
We want them all to be true for the post_all to report success.



Zach Welch
Graduate Student
Dept of Computer Sciences
University of Wisconsin - Madison

@gidden
Contributor Author

gidden commented Feb 6, 2014

So I'm looking into this again, and it might actually be on batlab's end... take a look at the Result field of this run: http://submit-1.batlab.org/nmi/results/details?runID=226876

@gidden
Contributor Author

gidden commented Feb 6, 2014

The overall batlab result also claimed success in the original run I linked to at the start of the issue.

@gidden
Contributor Author

gidden commented Feb 6, 2014

OK, I've checked the post_all script for PR #25, and it's correct. This is batlab's problem.

@gidden
Contributor Author

gidden commented Feb 6, 2014

I assume @zwelchWI can handle it on batlab's end, if that's ok?

@gidden
Contributor Author

gidden commented Feb 6, 2014

So, one last comment (sorry). Are we sure the post_all script is doing what we think it's doing? If the step it's checking failure against is the post_all step, then it will definitely succeed.

@zwelchWI
Member

zwelchWI commented Feb 6, 2014

I'm reworking how we detect failure in post_all, because $_NMI_STEP_FAILED is clearly not working like we want it to.

@scopatz
Member

scopatz commented Feb 6, 2014

Thanks, @zwelchWI.

@zwelchWI
Member

zwelchWI commented Feb 7, 2014

GOOD NEWS: I have fixed our batlab stuff so it correctly sends FAIL messages to polyphemus.

BAD NEWS: Polyphemus crashes when it gets a FAIL.

Continuing to work on this.


@scopatz
Member

scopatz commented Feb 7, 2014

I'd like to think that this happens because Polyphemus gets so distraught over a failed batlab run that he just can't take it anymore.
