
Why isn't set_dataset_error firing for salmon_rnaseq_10x_sn? #617

Open
jswelling opened this issue Jun 25, 2022 · 2 comments
Comments

@jswelling
Collaborator

Dependencies Blocking Task From Getting Scheduled

Dependency | Reason
-- | --
Task Instance State | Task is in the 'skipped' state which is not a valid state for execution. The task must be cleared in order to be run.
Not Previously Skipped | Skipping because of previous XCom result from parent task maybe_keep_cwl2
Dagrun Running | Task instance's dagrun was not in the 'running' state but in the state 'success'.
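
The "Not Previously Skipped" row is the key one: in later Airflow 1.10.x releases this dependency skips a task whenever a direct upstream branch operator's pushed XCom does not list the task as followed, and this check runs before the trigger_rule is ever consulted. A minimal, purely illustrative sketch of that check (the `prepare_cwl3` task id below is an assumption for illustration, not taken from the dump):

```python
# Hypothetical model of Airflow's "Not Previously Skipped" dependency:
# if any direct upstream branch operator pushed a branch decision that
# does not include this task, the task is skipped outright.

def not_previously_skipped(task_id, branch_parent_xcoms):
    """branch_parent_xcoms: dict of parent task_id -> list of followed
    task_ids, or None if that parent never pushed a branch decision."""
    for parent, followed in branch_parent_xcoms.items():
        if followed is not None and task_id not in followed:
            return False  # a parent branch explicitly routed around this task
    return True

# The dump above shows maybe_keep_cwl2 chose its "keep going" branch,
# so set_dataset_error fails this check and lands in the 'skipped' state.
skipped_case = not_previously_skipped(
    "set_dataset_error",
    {"maybe_keep_cwl2": ["prepare_cwl3"]},  # hypothetical followed task id
)
print(skipped_case)  # False: the task will be skipped
```

Because all dependencies must pass before a task is scheduled, `trigger_rule = all_done` (visible in the Task Attributes below) never gets a chance to rescue the task once this dependency fails.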

Task Instance Attributes

Attribute | Value
-- | --
dag_id | salmon_rnaseq_10x_sn
duration | None
end_date | 2022-06-25 19:14:49.488258+00:00
execution_date | 2022-06-24T20:08:20.535782+00:00
executor_config | {}
generate_command | <function TaskInstance.generate_command at 0x7f5d1a522d90>
hostname |  
is_premature | False
job_id | None
key | ('salmon_rnaseq_10x_sn', 'set_dataset_error', <Pendulum [2022-06-24T20:08:20.535782+00:00]>, 1)
log | <Logger airflow.task (INFO)>
log_filepath | /hive/users/hive/hubmap/hivevm193-prod/ingest-pipeline/src/ingest-pipeline/airflow/logs/salmon_rnaseq_10x_sn/set_dataset_error/2022-06-24T20:08:20.535782+00:00.log
log_url | http://hivevm193.psc.edu/admin/airflow/log?execution_date=2022-06-24T20%3A08%3A20.535782%2B00%3A00&task_id=set_dataset_error&dag_id=salmon_rnaseq_10x_sn
logger | <Logger airflow.task (INFO)>
mark_success_url | http://hivevm193.psc.edu/success?task_id=set_dataset_error&dag_id=salmon_rnaseq_10x_sn&execution_date=2022-06-24T20%3A08%3A20.535782%2B00%3A00&upstream=false&downstream=false
max_tries | 1
metadata | MetaData(bind=None)
next_try_number | 1
operator | PythonOperator
pid | None
pool | default_pool
pool_slots | 1
prev_attempted_tries | 0
previous_execution_date_success | 2022-06-24 19:59:43.918153+00:00
previous_start_date_success | 2022-06-25 18:17:02.466898+00:00
previous_ti | <TaskInstance: salmon_rnaseq_10x_sn.set_dataset_error 2022-06-24 19:59:43.918153+00:00 [skipped]>
previous_ti_success | <TaskInstance: salmon_rnaseq_10x_sn.set_dataset_error 2022-06-24 19:59:43.918153+00:00 [skipped]>
priority_weight | 3
queue | general_prod
queued_dttm | None
raw | False
run_as_user | None
start_date | 2022-06-25 19:14:49.488223+00:00
state | skipped
task | <Task(PythonOperator): set_dataset_error>
task_id | set_dataset_error
test_mode | False
try_number | 1
unixname | hive

Task Attributes

Attribute | Value
-- | --
dag | <DAG: salmon_rnaseq_10x_sn>
dag_id | salmon_rnaseq_10x_sn
depends_on_past | False
deps | {<TIDep(Trigger Rule)>, <TIDep(Previous Dagrun State)>, <TIDep(Not Previously Skipped)>, <TIDep(Not In Retry Period)>}
do_xcom_push | True
downstream_list | [<Task(JoinOperator): join>]
downstream_task_ids | {'join'}
email | ['[email protected]']
email_on_failure | False
email_on_retry | False
end_date | None
execution_timeout | None
executor_config | {}
extra_links | []
global_operator_extra_link_dict | {}
inlets | []
lineage_data | None
log | <Logger airflow.task.operators (INFO)>
logger | <Logger airflow.task.operators (INFO)>
max_retry_delay | None
on_failure_callback | <function create_dataset_state_error_callback.<locals>.set_dataset_state_error at 0x7f5c8f796400>
on_retry_callback | None
on_success_callback | None
op_args | []
op_kwargs | {'dataset_uuid_callable': <function get_dataset_uuid at 0x7f5c9287a9d8>, 'ds_state': 'Error', 'message': 'An error occurred in salmon-rnaseq'}
operator_extra_link_dict | {}
operator_extra_links | ()
outlets | []
owner | hubmap
params | {}
pool | default_pool
pool_slots | 1
priority_weight | 1
priority_weight_total | 3
provide_context | True
queue | general_prod
resources | None
retries | 1
retry_delay | 0:01:00
retry_exponential_backoff | False
run_as_user | None
schedule_interval | None
shallow_copy_attrs | ('python_callable', 'op_kwargs')
sla | None
start_date | 2019-01-01T00:00:00+00:00
subdag | None
task_concurrency | None
task_id | set_dataset_error
task_type | PythonOperator
template_ext | []
template_fields | ('templates_dict', 'op_args', 'op_kwargs')
templates_dict | None
trigger_rule | all_done
ui_color | #ffefeb
ui_fgcolor | #000
upstream_list | [<Task(BranchPythonOperator): maybe_keep_cwl4>, <Task(BranchPythonOperator): maybe_keep_cwl3>, <Task(BranchPythonOperator): maybe_keep_cwl2>, <Task(BranchPythonOperator): maybe_keep_cwl1>]
upstream_task_ids | {'maybe_keep_cwl4', 'maybe_keep_cwl3', 'maybe_keep_cwl2', 'maybe_keep_cwl1'}
wait_for_downstream | False
weight_rule | downstream



@jswelling
Collaborator Author

It looks like set_dataset_error fires correctly when the workflow fails at the maybe_keep_cwl1 step, but fails to trigger when the workflow fails at the maybe_keep_cwl3 step. Contrast the column just to the left of the 'June' label with the columns to the right of it in this screenshot.
[image: DAG run history referenced above]
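
This pattern is consistent with the dependency table in the issue description. A purely illustrative model (the `prepare_cwlN` task names are assumptions) of which branch decisions exist when the failure occurs at step 1 versus step 3:

```python
# Hypothetical model: each maybe_keep_cwlN branch operator either routes to
# the next pipeline step or to set_dataset_error; operators downstream of
# the failure point never run and so push no branch decision (None).

def branch_decisions(failing_step):
    decisions = {}
    for n in (1, 2, 3, 4):
        name = f"maybe_keep_cwl{n}"
        if n < failing_step:
            decisions[name] = [f"prepare_cwl{n + 1}"]  # chose "keep going"
        elif n == failing_step:
            decisions[name] = ["set_dataset_error"]    # chose error handler
        else:
            decisions[name] = None                     # skipped, no decision
    return decisions

def will_run(task_id, decisions):
    """Model of the 'Not Previously Skipped' check across all branch parents."""
    return all(task_id in followed
               for followed in decisions.values() if followed is not None)

# Failure at cwl1: the only branch decision points at set_dataset_error.
print(will_run("set_dataset_error", branch_decisions(1)))  # True
# Failure at cwl3: cwl1 and cwl2 already chose "keep going", so the
# earlier branch decisions veto set_dataset_error despite all_done.
print(will_run("set_dataset_error", branch_decisions(3)))  # False
```

Under this model the observed asymmetry falls out directly: only a failure at the first branch leaves no earlier "keep going" decision to veto the error handler.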

@jswelling
Collaborator Author

This may be because the version of Airflow in use was 1.10.12, while our default is 1.10.15. Verify that the problem still exists now that 1.10.15 has been deployed on PROD.
