Merge pull request #107 from HERA-Team/fix_all_tneighbors
Fix time pre-requisite handling
plaplant authored Jul 24, 2020
2 parents eadaf45 + f349928 commit fef5d34
Showing 16 changed files with 351 additions and 294 deletions.
docs/config_files.md (46 changes: 26 additions, 20 deletions)
```diff
@@ -71,38 +71,44 @@ chunking keywords listed below are used to determine which files are primary
 obsids for a given file, and hence which steps must be completed before
 launching a particular task script.
 
-### n_time_neighbors
+### Time Chunking
 
 When running a workflow, it is sometimes desirable to operate on several files
 contiguous in time as a single chunk. There are several options that control how
 a full list of files is partitioned into a series of time-contiguous chunks that
-are all operated on together as a single job in the workflow, referred to as
-"time neighbors". When evaluating the workflow to determine which obsids to
-operate on, the code defines a notion of "primary obsids". For each primary
-obsid, a task script is run. Each obsid could be a primary obsid. However, it
-is also possible to partition the list such that, e.g., every tenth file is a
-primary obsid, and the others do not have corresponding task scripts generated
-for them. The specific keywords that may be specified are:
-
-* `n_time_neighbors`: the number of files that are considered "time neighbors"
-for a given primiary obsid. Must be a non-negative integer. Default is 0
-(i.e., no time neighbors will be used unless specified).
+are all operated on together as a single job in the workflow, referred to as a
+"time chunk". When evaluating the workflow to determine which obsids to operate
+on, the code defines a notion of "primary obsids". For each primary obsid, a
+task script is run. Each obsid could be a primary obsid. However, it is also
+possible to partition the list such that, e.g., every tenth file is a primary
+obsid, and the others do not have corresponding task scripts generated for
+them. The specific keywords that may be specified are:
+
+* `chunk_size`: the total size of a given time chunk, in terms of the number of
+files. In addition to an integer, can also be the string `"all"` to indicate
+the chunk includes all time values.
 * `time_centered`: whether to treat a chunk of files such that the primary obsid
-is in the center, with `n_time_neighbors` on either side for a total length of
-2 * `n_time_neighbors` + 1 (True), or as the start of a chunk of files with
-total length of `n_time_neighbors` + 1 (False). Default is True.
+is in the center the chunk (True), or the start of the chunk. If
+`time_centered` is `True` and `chunk_size` is even, an extra entry is included
+on the left to make the chunk symmetric about the chunk center. Default is
+`True`.
 * `stride_length`: the number of obsids to stride by when generating the list of
-primary obsids. For example, if `stride_length = 11`, and `n_time_neighbors =
-10`, and `time_centered` is `False`, the list will be partitioned into chunks
-11 files long with no overlap. Default is 1 (i.e., every obsid will be treated
-as a primary obsid with the exception of those files within `n_time_neighbors`
-of the edge).
+primary obsids. For example, if `stride_length = 10`, `chunk_size=10`, and
+`time_centered` is `False`, the list will be partitioned into chunks 10 files
+long with no overlap. Default is 1 (i.e., every obsid will be treated as a
+primary obsid with the exception of those files within `chunk_size` of
+the edge).
 * `collect_stragglers`: determine how to handle lists that are not evenly
 divided by `stride_length`. If True, any files that would not evenly be added
 to a full group are instead added to the second-to-last group to make an
 "extra large" group, ensuring that all files are accounted for when
 processing. If False, these obsids will not be included in the list. Default
 is False.
+* `prereq_chunk_size`: this option is specified if the user wants to wait for
+specific entries in the previous step to finish before starting the current
+one, without necessarily using them. Usually this will not be set, or it will
+be `"all"` to indicate all entries for the previous step must be completed
+before proceeding.
 
 
 ### mem
```
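The docs hunk above describes how `chunk_size`, `time_centered`, `stride_length`, and `collect_stragglers` partition a time-sorted list of obsids into chunks. The following is a hypothetical sketch of that description, not hera_opm's implementation: the function name `time_chunks`, the choice to skip primaries whose chunk would run past either end of the list, and merging stragglers into the last full chunk are assumptions drawn from the prose. `prereq_chunk_size` is not modeled.

```python
from typing import List, Sequence, Tuple, Union


# Hypothetical helper illustrating the documented keywords; not hera_opm's code.
def time_chunks(
    obsids: Sequence[str],
    chunk_size: Union[int, str],
    stride_length: Union[int, str] = 1,
    time_centered: bool = True,
    collect_stragglers: bool = False,
) -> List[Tuple[str, List[str]]]:
    """Return (primary_obsid, chunk) pairs for a time-sorted list of obsids."""
    n = len(obsids)
    if n == 0:
        return []
    # "all" means the chunk (or stride) spans the whole list.
    if chunk_size == "all":
        chunk_size = n
    if stride_length == "all":
        stride_length = n
    if not isinstance(chunk_size, int) or chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer or 'all'")
    if not isinstance(stride_length, int) or stride_length < 1:
        raise ValueError("stride_length must be a positive integer or 'all'")

    if time_centered:
        # Primary obsid sits in the middle; an even chunk_size puts the
        # extra entry on the left, as the docs describe.
        left = chunk_size // 2
    else:
        # Primary obsid is the first entry of its chunk.
        left = 0
    right = chunk_size - left - 1

    chunks: List[Tuple[str, List[str]]] = []
    last_end = 0
    # Assumption: primaries whose chunk would run past either end are skipped.
    for i in range(left, n - right, stride_length):
        chunks.append((obsids[i], list(obsids[i - left : i + right + 1])))
        last_end = i + right + 1

    if collect_stragglers and chunks and last_end < n:
        # Assumption: leftover files that cannot fill a whole chunk are
        # folded into the last full chunk, making one "extra large" group.
        chunks[-1][1].extend(obsids[last_end:])
    return chunks


if __name__ == "__main__":
    obsids = [f"zen.{i}" for i in range(5)]
    for primary, chunk in time_chunks(obsids, chunk_size=3):
        print(primary, chunk)
```

With five obsids and `chunk_size=3`, the defaults (`time_centered=True`, `stride_length=1`) yield three overlapping chunks centered on `zen.1`, `zen.2`, and `zen.3`, so the two end files are never primaries. Applying the docs' stride example (`stride_length = 10`, `chunk_size=10`, `time_centered` `False`) to 25 obsids gives two non-overlapping 10-file chunks; with `collect_stragglers=True` the remaining five files join the second chunk.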
hera_opm/VERSION (2 changes: 1 addition, 1 deletion)
```diff
@@ -1 +1 @@
-1.1.0
+1.2.0
```
```diff
@@ -29,12 +29,12 @@ args = ["{basename}", "${Options:ex_ants}"]
 
 [OMNICAL_METRICS]
 prereqs = "OMNICAL"
-n_time_neighbors = 1
+chunk_size = 3
 args = ["{basename}", "{prev_basename}", "{next_basename}"]
 
 [OMNI_APPLY]
 prereqs = "OMNICAL_METRICS"
-n_time_neighbors = 1
+chunk_size = 3
 stride_length = 2
 args = "{basename}"
 
```
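The sample configs replace `n_time_neighbors = 1` with `chunk_size = 3`. That follows from the lengths stated in the docs this commit removes: a centered chunk spans `2 * n_time_neighbors + 1` files and an uncentered one spans `n_time_neighbors + 1`. The helper below is a hypothetical migration sketch of that arithmetic; its name is an assumption and it is not part of hera_opm.

```python
from typing import Union


# Hypothetical migration helper (not part of hera_opm): converts a legacy
# n_time_neighbors value to the equivalent chunk_size, using the chunk
# lengths stated in the docs that this commit replaces.
def chunk_size_from_n_time_neighbors(
    n_time_neighbors: Union[int, str], time_centered: bool = True
) -> Union[int, str]:
    if n_time_neighbors == "all":
        # The sample configs map 'all' straight to 'all'.
        return "all"
    if not isinstance(n_time_neighbors, int) or n_time_neighbors < 0:
        raise ValueError("n_time_neighbors must be a non-negative integer or 'all'")
    if time_centered:
        # n neighbors on each side of the primary obsid: 2n + 1 files.
        return 2 * n_time_neighbors + 1
    # Primary obsid at the start, followed by n neighbors: n + 1 files.
    return n_time_neighbors + 1
```

For the sections above, `chunk_size_from_n_time_neighbors(1)` returns `3`. The old stride example (`n_time_neighbors = 10`, `time_centered` `False`) maps to `11`, matching the 11-file chunks the old docs described.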
```diff
@@ -15,14 +15,15 @@ actions = ["XRFI"]
 
 [XRFI]
 args = "{basename}"
-n_time_neighbors = 'all'
+chunk_size = 'all'
 stride_length = 'all'
 
 [XRFI_CENTERED]
 args = "{basename}"
-n_time_neighbors = 'all'
+chunk_size = 'all'
 time_centered = true
 
 [XRFI_NOT_CENTERED]
 args = "{basename}"
-n_time_neighbors = 'all'
+chunk_size = 'all'
 time_centered = false
```
hera_opm/data/sample_config/nrao_rtp_stride_length.toml (4 changes: 2 additions, 2 deletions)
```diff
@@ -30,12 +30,12 @@ args = ["{basename}", "${Options:ex_ants}"]
 
 [OMNICAL_METRICS]
 prereqs = "OMNICAL"
-n_time_neighbors = 1
+chunk_size = 3
 args = ["{basename}", "{prev_basename}", "{next_basename}"]
 
 [OMNI_APPLY]
 prereqs = "OMNICAL_METRICS"
-n_time_neighbors = 1
+chunk_size = 3
 stride_length = 2
 # args = ["{basename}", "{obsid_list}"]
 args = "{basename}"
```