
fix parse_jobs_list_output parsing issues with SGE #52

Draft: wants to merge 2 commits into develop

QuantumChemist (Contributor) commented Dec 13, 2024

Addresses issue #50:

  • Adjust the parsing, since the XML generated by qstat -xml -ext -j <JOB_ID> does not contain a <job_list> tag
  • Fix unmatched job states
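Here is a minimal sketch of the first point, using Python's standard xml.etree.ElementTree. The helper name find_job_nodes is hypothetical and is not qtoolkit's API. It assumes that plain qstat -xml listings wrap each job in a <job_list> node, while qstat -xml -ext -j <JOB_ID> output (like the file from #50) puts each job in a direct <element> child of <djob_info>.

```python
import xml.etree.ElementTree as ET


def find_job_nodes(xml_text: str) -> list:
    """Return one XML node per job for either qstat -xml layout (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "detailed_job_info":
        # qstat -xml -ext -j <JOB_ID>: no <job_list>; each job is a direct
        # <element> child of <djob_info>
        return root.findall("./djob_info/element")
    # qstat -xml [-u <user>]: jobs are <job_list> nodes, found at any depth
    # (under <queue_info> for running jobs, nested <job_info> for pending ones)
    return list(root.iter("job_list"))
```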

QuantumChemist (Contributor, Author) commented Dec 13, 2024

My remaining issue is an unmapped job state, <JAT_status>128</JAT_status> (or is this not a job state at all?).

Parsing this XML file from #50, generated via qstat -xml -ext -j <JOB_ID>:

<?xml version='1.0'?>
<detailed_job_info  xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/detailed_job_info.xsd">
  <djob_info>
    <element>
      <JB_job_number>3</JB_job_number>
      <JB_ar>0</JB_ar>
      <JB_exec_file>job_scripts/3</JB_exec_file>
      <JB_submission_time>1732264574</JB_submission_time>
      <JB_owner>jobflow</JB_owner>
      <JB_uid>999</JB_uid>
      <JB_group>jobflow</JB_group>
      <JB_gid>999</JB_gid>
      <JB_account>sge</JB_account>
      <JB_merge_stderr>false</JB_merge_stderr>
      <JB_mail_list>
        <element>
          <MR_user>jobflow</MR_user>
          <MR_host>7d4618a8e833</MR_host>
        </element>
      </JB_mail_list>
      <JB_notify>false</JB_notify>
      <JB_job_name>add</JB_job_name>
      <JB_jobshare>0</JB_jobshare>
      <JB_env_list>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_HOME</VA_variable>
          <VA_value>/home/jobflow</VA_value>
        </job_sublist>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_LOGNAME</VA_variable>
          <VA_value>jobflow</VA_value>
        </job_sublist>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_PATH</VA_variable>
          <VA_value>/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin</VA_value>
        </job_sublist>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_SHELL</VA_variable>
          <VA_value>/bin/bash</VA_value>
        </job_sublist>
        <job_sublist>
          <VA_variable>TERM</VA_variable>
        </job_sublist>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_HOST</VA_variable>
          <VA_value>7d4618a8e833</VA_value>
        </job_sublist>
        <job_sublist>
          <VA_variable>__SGE_PREFIX__O_WORKDIR</VA_variable>
          <VA_value>/home/jobflow</VA_value>
        </job_sublist>
      </JB_env_list>
      <JB_script_file>submit.sh</JB_script_file>
      <JB_ja_tasks>
        <ulong_sublist>
          <JAT_status>128</JAT_status>
          <JAT_task_number>1</JAT_task_number>
          <JAT_start_time>1732264584</JAT_start_time>
          <JAT_ntix>0.500000</JAT_ntix>
        </ulong_sublist>
      </JB_ja_tasks>
      <JB_deadline>0</JB_deadline>
      <JB_execution_time>0</JB_execution_time>
      <JB_checkpoint_attr>0</JB_checkpoint_attr>
      <JB_checkpoint_interval>0</JB_checkpoint_interval>
      <JB_reserve>false</JB_reserve>
      <JB_mail_options>0</JB_mail_options>
      <JB_priority>1024</JB_priority>
      <JB_restart>0</JB_restart>
      <JB_verify>0</JB_verify>
      <JB_script_size>0</JB_script_size>
      <JB_verify_suitable_queues>0</JB_verify_suitable_queues>
      <JB_soft_wallclock_gmt>0</JB_soft_wallclock_gmt>
      <JB_hard_wallclock_gmt>0</JB_hard_wallclock_gmt>
      <JB_override_tickets>0</JB_override_tickets>
      <JB_version>0</JB_version>
      <JB_ja_structure>
        <task_id_range>
          <RN_min>1</RN_min>
          <RN_max>1</RN_max>
          <RN_step>1</RN_step>
        </task_id_range>
      </JB_ja_structure>
      <JB_type>0</JB_type>
      <JB_binding>
        <element>
          <BN_strategy>no_job_binding</BN_strategy>
          <BN_type>0</BN_type>
          <BN_parameter_n>0</BN_parameter_n>
          <BN_parameter_socket_offset>0</BN_parameter_socket_offset>
          <BN_parameter_core_offset>0</BN_parameter_core_offset>
          <BN_parameter_striding_step_size>0</BN_parameter_striding_step_size>
          <BN_parameter_explicit>no_explicit_binding</BN_parameter_explicit>
        </element>
      </JB_binding>
      <JB_ja_task_concurrency>0</JB_ja_task_concurrency>
      <JB_nurg>0.500000</JB_nurg>
    </element>
  </djob_info>
  <messages>
    <element>
      <SME_global_message_list>
        <element>
          <MES_message_number>90</MES_message_number>
          <MES_message>There are no messages available</MES_message>
        </element>
      </SME_global_message_list>
    </element>
  </messages>
</detailed_job_info>

gives me:

/home/certural/Calcs/testcalcs/sge/qtoolkit_sge.py 
Unknown job state: 128 for job ID: 3
Unknown job state: Unknown for job ID: Unknown
Unknown job state: Unknown for job ID: Unknown
Unknown job state: Unknown for job ID: Unknown
Unknown job state: Unknown for job ID: Unknown
[QJob(name='add', job_id='3', exit_status=None, state=None, sub_state=None, info=QJobInfo(memory=None, memory_per_cpu=0, nodes=0, cpus=0, threads_per_process=None, time_limit=0), account=None, runtime=None, queue_name=None), QJob(name='Unknown', job_id='Unknown', exit_status=None, state=None, sub_state=None, info=QJobInfo(memory=None, memory_per_cpu=0, nodes=0, cpus=0, threads_per_process=None, time_limit=0), account=None, runtime=None, queue_name=None), QJob(name='Unknown', job_id='Unknown', exit_status=None, state=None, sub_state=None, info=QJobInfo(memory=None, memory_per_cpu=0, nodes=0, cpus=0, threads_per_process=None, time_limit=0), account=None, runtime=None, queue_name=None), QJob(name='Unknown', job_id='Unknown', exit_status=None, state=None, sub_state=None, info=QJobInfo(memory=None, memory_per_cpu=0, nodes=0, cpus=0, threads_per_process=None, time_limit=0), account=None, runtime=None, queue_name=None), QJob(name='Unknown', job_id='Unknown', exit_status=None, state=None, sub_state=None, info=QJobInfo(memory=None, memory_per_cpu=0, nodes=0, cpus=0, threads_per_process=None, time_limit=0), account=None, runtime=None, queue_name=None)]

Process finished with exit code 0

At least the returned list is not empty anymore.
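One plausible explanation for the four extra "Unknown" entries (an inference from the output above, not checked against the PR's code): the sample file contains five <element> nodes in total, namely the job itself plus the ones under JB_mail_list, JB_binding, messages and SME_global_message_list. A recursive search such as root.iter("element") would therefore yield five "jobs", whereas a path anchored at <djob_info> yields only the real one. The snippet below demonstrates the difference on a trimmed copy of the file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the qstat -xml -ext -j output from #50, keeping only the <element> nesting
SAMPLE = """<?xml version='1.0'?>
<detailed_job_info>
  <djob_info>
    <element>
      <JB_job_number>3</JB_job_number>
      <JB_job_name>add</JB_job_name>
      <JB_mail_list><element><MR_user>jobflow</MR_user></element></JB_mail_list>
      <JB_binding><element><BN_strategy>no_job_binding</BN_strategy></element></JB_binding>
    </element>
  </djob_info>
  <messages>
    <element>
      <SME_global_message_list>
        <element><MES_message_number>90</MES_message_number></element>
      </SME_global_message_list>
    </element>
  </messages>
</detailed_job_info>"""

root = ET.fromstring(SAMPLE)

# Recursive search: also matches nested <element> nodes that are not jobs
recursive = [el.findtext("JB_job_number", default="Unknown") for el in root.iter("element")]
# Anchored search: only direct children of <djob_info>, i.e. one node per job
anchored = [el.findtext("JB_job_number") for el in root.findall("./djob_info/element")]

print(recursive)  # ['3', 'Unknown', 'Unknown', 'Unknown', 'Unknown']
print(anchored)   # ['3']
```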

gpetretto (Contributor) commented

Hi @QuantumChemist,

Thanks a lot for looking into this. The main issue I see is that knowing the current state of the job is critical, at least for jobflow-remote (and I suppose for any other use of qtoolkit as well). Since it is not at all clear how to map this JAT_status to the standard states, and to avoid spending too much time on it, maybe we should set this aside for now and, for SGE, only allow listing jobs by user. What do you think?
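A minimal sketch of what the user-only restriction could look like. The helper name build_sge_list_command is hypothetical and not part of qtoolkit; the sketch only assumes the standard qstat options -xml and -u. Requests by job ID are rejected instead of being routed to qstat -j, whose JAT_status values cannot yet be mapped to the standard states.

```python
from typing import List, Optional


def build_sge_list_command(user: Optional[str] = None,
                           job_ids: Optional[List[str]] = None) -> List[str]:
    """Build the qstat argv for listing SGE jobs (hypothetical helper, user-based only)."""
    if job_ids:
        # qstat -j output only exposes a numeric JAT_status, so job-ID queries are refused
        raise NotImplementedError("SGE job listing by job ID is not supported; query by user instead")
    if not user:
        raise ValueError("a user name is required to list SGE jobs")
    # -xml gives machine-readable output, -u restricts the listing to the given user
    return ["qstat", "-xml", "-u", user]
```

Failing loudly on job IDs keeps callers from silently receiving jobs with no usable state.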

QuantumChemist (Contributor, Author) commented Dec 18, 2024

Hi @gpetretto,
simply parsing the job info via the -u option does sound like the best approach for now.

I could update this PR to remove the -j option in the new year (I'm already on vacation), or one of you could go ahead if you don't want to wait. What would you prefer? 😃
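For the -u route, here is a sketch of reading job ID, name and state from qstat -xml -u <user> output. It assumes the usual layout in which running jobs sit under <queue_info> and pending ones under a nested <job_info>, each as a <job_list> node with a short <state> code such as r or qw. The sample XML is constructed for illustration (not captured from #50), and the STATE_MAP entries and the function name parse_user_listing are hypothetical, not qtoolkit's actual mapping.

```python
import xml.etree.ElementTree as ET

# Illustrative qstat -xml -u output, constructed for this sketch
SAMPLE = """<?xml version='1.0'?>
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>3</JB_job_number>
      <JB_name>add</JB_name>
      <JB_owner>jobflow</JB_owner>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>4</JB_job_number>
      <JB_name>mult</JB_name>
      <JB_owner>jobflow</JB_owner>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>"""

# Hypothetical coarse mapping from SGE state codes to generic states (to be checked per site)
STATE_MAP = {"r": "RUNNING", "qw": "QUEUED", "hqw": "QUEUED_HELD", "s": "SUSPENDED"}


def parse_user_listing(xml_text: str) -> list:
    """Return (job_id, name, state) tuples; unmapped state codes are reported as UNKNOWN."""
    root = ET.fromstring(xml_text)
    jobs = []
    for node in root.iter("job_list"):
        # <state> is a child element here, distinct from the state="..." attribute
        code = node.findtext("state", default="").strip()
        jobs.append((node.findtext("JB_job_number"),
                     node.findtext("JB_name"),
                     STATE_MAP.get(code, "UNKNOWN")))
    return jobs


print(parse_user_listing(SAMPLE))
# [('3', 'add', 'RUNNING'), ('4', 'mult', 'QUEUED')]
```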
