Support for sending hadd jobs to condor #15
Hi Abby, This looks nice! Just a few comments for now:
it would be good to use the path relevant for whichever copy of the code is checked out (there is surely an env variable for this, though I don't recall it).
and it wasn't obvious to me that a condor job had even been submitted (in fact, you mention this in your TODOs in the Evernote). I'm also not sure where the output file goes? It didn't appear as
This would be important to check, since we could miss some of our input files (and duplicate the rest) otherwise! It's possible that I did something unconventional though, so we should double check against what you've seen...
I think we should have a .err, .out, and log file for each condor job that runs (and a .root file too, of course!). Let's look into these for now, and we can do a bit more testing after that. We should also think if the 200 input files per job (in Thanks! |
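To make the two points above concrete (a path tied to the checked-out copy of the code, and one .err/.out/.log/.root set per condor job), here is a minimal sketch. It assumes `CMSSW_BASE` is the relevant environment variable, which `cmsenv` sets; the helper names and the `<sample>_<index>` naming scheme are hypothetical and not taken from mhadd itself.

```python
import os

def code_base(default='.'):
    # CMSSW_BASE is set by cmsenv; fall back to `default` outside a CMSSW environment.
    return os.environ.get('CMSSW_BASE', default)

def job_files(work_dir, sample, job_index):
    # Hypothetical naming scheme: one stem per (sample, job), four files per stem.
    stem = os.path.join(work_dir, '%s_%i' % (sample, job_index))
    return dict((ext, '%s.%s' % (stem, ext)) for ext in ('out', 'err', 'log', 'root'))

def submit_description(executable, files):
    # Only standard HTCondor submit commands; site-specific lines are left out of this sketch.
    lines = [
        'universe = vanilla',
        'executable = %s' % executable,
        'output = %s' % files['out'],
        'error = %s' % files['err'],
        'log = %s' % files['log'],
        'queue',
    ]
    return '\n'.join(lines) + '\n'
```

With something like this, a missing `.root` file can be traced back to the `.err` and `.log` files that share its stem.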
Hi Joey. Here is my next set of changes.
5 & 6. I may have said this previously, but because I was only popping 200 files at a time, the return value came only from the last(?) set of 200. My conclusion (for now) was to simplify things and drop the upper limit on the number of files to hadd together, so that at most one condor job is submitted per sample, giving one set of outputs.
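If a per-job limit is ever reintroduced, one way to avoid losing the earlier return values is to collect one result per chunk instead of overwriting a single variable. This is only a sketch: `hadd_one` is a hypothetical callable standing in for whatever submits or runs a single hadd job, and 200 is just the number discussed above.

```python
def chunks(items, size):
    # Yield consecutive slices of at most `size` items; every item appears exactly once.
    for start in range(0, len(items), size):
        yield items[start:start + size]

def hadd_in_chunks(files, hadd_one, size=200):
    # Keep the result of every chunk, not only the last one.
    results = []
    for index, chunk in enumerate(chunks(files, size)):
        results.append(hadd_one(index, chunk))
    return results
```

Because the slices are non-overlapping and cover the whole list, no input file is skipped or hadded twice.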
I have now double-checked that the new print statements work when there is a mix of jobs that were hadded locally and jobs that were hadded via condor.
Hi Abby, for me on the LPC, it doesn't seem like the input files are available for hadding when running things out-of-the-box. I wonder if it's a matter of bad input files on my side (very old ones!), or a difference between the LPC and the Wisconsin system. The condor jobs run without any obvious failures (!), but no output file is written. For testing purposes, it might be useful to have the same set of input files copied over to the LPC to remove one possible failure point. From the printouts I've done, it looks like the input files simply aren't on the condor node, but I could be wrong!
Tools/scripts/mhadd
else:
    #assume that the last case to consider is the one in which files are copied over (which do not produce any log but the .root file does exist)
For some reason, I get the following error if I have an existing file that I made with `mhadd` locally and then try to run via `mhadd -s`:
Traceback (most recent call last):
File "/uscms_data/d3/joeyr/abby_pr/mfv_946p1/bin/slc7_amd64_gcc630/mhadd", line 179, in <module>
hadd_(is_crab, d, new_dir=x)
File "/uscms_data/d3/joeyr/abby_pr/mfv_946p1/bin/slc7_amd64_gcc630/mhadd", line 153, in hadd_
print colors.boldred('skipping existing file %s (for which num sources %i = njobs %i)' % (new_name, log_njobs, njobs))
UnboundLocalError: local variable 'log_njobs' referenced before assignment
It looks like `log_njobs` isn't defined when it gets to this line (at least in my case), so mhadd crashes here.
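One common way to avoid this kind of `UnboundLocalError` is to bind the variable before any branching, and to handle the "no log" case explicitly. The sketch below is not the actual `hadd_` code: `skip_message` is a hypothetical helper, and counting lines that contain `Source file` is an assumption about what the hadd log looks like.

```python
def skip_message(new_name, njobs, log_text=None):
    # Bind log_njobs on every path so the final formatting can never hit an unbound name.
    log_njobs = None
    if log_text is not None:
        # Assumption: each hadded input shows up as one "Source file" line in the log.
        log_njobs = sum(1 for line in log_text.splitlines() if 'Source file' in line)
    if log_njobs is None:
        # E.g. a file that was hadded locally or copied over, so no condor log exists.
        return 'skipping existing file %s (no log found to compare with njobs %i)' % (new_name, njobs)
    return 'skipping existing file %s (for which num sources %i = njobs %i)' % (new_name, log_njobs, njobs)
```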
Ang is the main author of the most recent commit. We have restructured the submission to condor in a few ways:
#requirements necessary for wisconsin machines -- comment out if not needed
# the CentOS == 7 requires proxy with 2048 bit and will fail with usual 1024
# voms-proxy-init -rfc -valid 144:00 -voms cms -bits 2048
Which ones are to be commented out, is it these?
TARGET.HAS_OSG_WN_CLIENT =?= TRUE
TARGET.OpSysMajorVer == 7
Rather than commenting in/out, you could do something like https://github.com/DisplacedVertices/cmssw-usercode/blob/master/Tools/python/ROOTTools.py#L1576-L1595 with the HOSTNAME
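A rough sketch of what a hostname switch could look like, so nobody has to comment lines in or out by hand. Two things here are assumptions rather than anything confirmed in this PR: that Wisconsin login nodes can be recognized by a `.hep.wisc.edu` suffix, and that the two clauses are meant to be joined with `&&`.

```python
import os
import socket

# Assumption: both clauses apply together on the Wisconsin machines.
WISCONSIN_REQUIREMENTS = '(TARGET.HAS_OSG_WN_CLIENT =?= TRUE) && (TARGET.OpSysMajorVer == 7)'

def current_hostname():
    # Prefer the HOSTNAME environment variable, falling back to the socket module.
    return os.environ.get('HOSTNAME') or socket.gethostname()

def requirements_line(hostname=None):
    # Return the extra submit-file line for Wisconsin, or an empty string elsewhere.
    if hostname is None:
        hostname = current_hostname()
    if hostname.endswith('.hep.wisc.edu'):  # hypothetical site test
        return 'requirements = ' + WISCONSIN_REQUIREMENTS
    return ''
```

On the LPC this returns an empty string, so the submit file is left unchanged.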
Hi @abbywarden and @Ang-Li-95, Thanks for the update to this! I skimmed the code and didn't see any obvious issues. But when trying to test it out, I noticed one thing:
Beyond that: have you been able to validate that the
An additional flag (--submit) has been added to the mhadd command to send the hadd jobs to condor.
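For readers unfamiliar with how such a flag is typically wired up, here is a minimal argparse sketch. It is not the parser mhadd actually uses: the positional `dirs` argument and the help strings are hypothetical, and only the `-s`/`--submit` spellings come from this conversation.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog='mhadd')
    # Hypothetical positional argument: the directories whose outputs should be hadded.
    parser.add_argument('dirs', nargs='+', help='directories to hadd')
    # Boolean switch: absent means hadd locally, present means send the jobs to condor.
    parser.add_argument('-s', '--submit', action='store_true',
                        help='send the hadd jobs to condor instead of running locally')
    return parser
```

With this sketch, `build_parser().parse_args(['-s', 'some_dir']).submit` is `True`, and omitting `-s` leaves it `False`.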