diff --git a/doc/iterative_cleanup.md b/doc/iterative_cleanup.md index cb52a348e..34f4a938b 100644 --- a/doc/iterative_cleanup.md +++ b/doc/iterative_cleanup.md @@ -11,12 +11,19 @@ all? ## Examples [kiba-extend-project](https://github.com/lyrasis/kiba-extend-project) -has been updated to reflect usage of the `IterativeCleanup` mixin. +has been updated to reflect usage of the `IterativeCleanup` mixin. If +you have an existing project based off `kiba-extend-project`, [this +diff](https://github.com/lyrasis/kiba-extend-project/compare/pre-iterative-cleanup...demo-iterative-cleanup) +might help identify what you need to add to your project to use +`IterativeCleanup`. -Refer to todo:link Kiba::Tms::AltNumsForObjTypeCleanup as an example - config module extending this mixin module in a simple way. See - todo:link Kiba::Tms::PlacesCleanupInitial for a more complex usage - with default overrides and custom pre/post transforms. +Refer to + [Kiba::Tms::AltNumsForObjTypeCleanup](https://github.com/lyrasis/kiba-tms/blob/main/lib/kiba/tms/alt_nums_for_obj_type_cleanup.rb) + as an example config module extending `IterativeCleanup` in a simple + way. See + [Kiba::Tms::PlacesCleanupInitial](https://github.com/lyrasis/kiba-tms/blob/main/lib/kiba/tms/places_cleanup_initial.rb) + for a more complex usage with default overrides and custom pre/post + transforms. 
## Project setup assumptions @@ -114,16 +121,22 @@ is only used *inside the iterative cleanup process* to add and decode In the iterative cleanup process, the function of `:clean_fingerprint` is: -- Represent the original values of the editable fields of the - cleanup worksheet, so that we can identify rows where the client - made changes -- Allow multiple rows corrected to the same value to be collapsed - to one row for future iterations of review/cleanup +- Represent the original values of the editable fields of the cleanup + worksheet, so that we can identify rows where the client made + changes +- Allow multiple rows corrected to the same value to be collapsed to + one row for future iterations of review/cleanup -It follows that the `IterativeCleanup`-related `fingerprint_fields` used to create `:clean_fingerprint` should include all fields included in the worksheet that: +It follows that the `IterativeCleanup`-related `fingerprint_fields` +used to create `:clean_fingerprint` should include all fields included +in the worksheet that: - you expect to be edited -- combine to uniquely identify a row (for example, if you have an `:orig_name` column with the original data, and a separate, initially blank `:corrected_name` column, you'd need to include both fields in `fingerprint_fields`, since the initially blank value of `corrected_name` does not uniquely identify the rows.) +- combine to uniquely identify a row (for example, if you have an + `:orig_name` column with the original data, and a separate, + initially blank `:corrected_name` column, you'd need to include both + fields in `fingerprint_fields`, since the initially blank value of + `corrected_name` does not uniquely identify the rows.) The file associated with the iterative cleanup process' `base_job` is expected to include a `:fingerprint` field by default. 
The name of @@ -147,39 +160,88 @@ a `:fingerprint` field in ### places config notes -Defines settings used in `KeProject::Places::PrepForCleanup` job (and, presumably, in a real project, other jobs. +Defines settings used in the `KeProject::Places::PrepForCleanup` job (and, +presumably, in a real project, other jobs). -Note that the value of `KeProject::Places.fingerprint_fields` is different from the value of `KeProject::PlacesCleanup.fingerprint_fields`. This works for the reasons outlined in the [`:fingerprint` vs. `:clean_fingerprint` section](#fingerprints). +Note that the value of `KeProject::Places.fingerprint_fields` is +different from the value of +`KeProject::PlacesCleanup.fingerprint_fields`. This works for the +reasons outlined in the [`:fingerprint` vs. `:clean_fingerprint` +section](#fingerprints). ### places cleanup config notes -Note that the value of `KeProject::Places.fingerprint_fields` is different from the value of `KeProject::PlacesCleanup.fingerprint_fields`. This works for the reasons outlined in the [`:fingerprint` vs. `:clean_fingerprint` section](#fingerprints). +Note that the value of `KeProject::Places.fingerprint_fields` is +different from the value of +`KeProject::PlacesCleanup.fingerprint_fields`. This works for the +reasons outlined in the [`:fingerprint` vs. `:clean_fingerprint` +section](#fingerprints). #### Required before extending `IterativeCleanup`: `base_job` -See [the description of base_job below](#basejob). +This job is created outside the iterative cleanup process, and serves +as the base and starting point for a cleanup process. -#### Required before extending `IterativeCleanup`: `job_tags` +The full registry entry key (e.g. `places__prep_for_cleanup`) must be +set as the `base_job` setting in a cleanup config module prior to +extending that module with {Kiba::Extend::Mixins::IterativeCleanup}. 
+See +`[lib/ke_project/places_cleanup.rb](https://github.com/lyrasis/kiba-extend-project/blob/main/lib/ke_project/places_cleanup.rb)`. -Allows retrieval and running of jobs via `thor jobs:tagged`, `thor jobs:tagged_or`, and `thor jobs:tagged_and` commands. +**IMPORTANT: This job's output must include a field which +combines/identifies the original values that may be affected by the +cleanup process.** The default expectation is that this field is named +`:fingerprint`, but this can be overridden by defining a custom +`orig_values_identifier` method in the extending module after +extension. This field is used as a matchpoint for merging cleaned up +data back into the migration, and identifying whether a given value in +subsequent worksheet iterations has been previously included in a +worksheet. -The cleanup config module must define either a `job_tags` setting or method before calling `extend Kiba::Extend::Mixins::IterativeCleanup`. +#### Required before extending `IterativeCleanup`: `fingerprint_fields` -If you do not wish to set tags, have the `job_tags` setting/method return an empty Array. +The fields that will be hashed into the `:clean_fingerprint` value. +See the [`:fingerprint` vs. `:clean_fingerprint` +section](#fingerprints) for more detail. -#### Required before extending `IterativeCleanup`: `worksheet_add_fields` +Usually you will want to include any `worksheet_add_fields`, plus any +other fields that, in combination with the `worksheet_add_fields`, +yield the full corrected value for the row. -Columns/fields that will be added in the cleanup process, usually to allow client to fill in values. +#### Optional default method overrides -In the first iteration of cleanup, these columns are blank. If these fields are also included in `fingerprint_fields`, their corrected values in returned worksheets will be merged into the migration and retained in subsequent worksheets. 
+The overridable methods are well-documented at +{Kiba::Extend::Mixins::IterativeCleanup}. Look for the list under +"Methods that can be optionally overridden in extending module". -#### Required before extending `IterativeCleanup`: `fingerprint_fields` +##### `worksheet_add_fields` -The fields that will be hashed into the `:clean_fingerprint` value. See the [`:fingerprint` vs. `:clean_fingerprint` section](#fingerprints) for more detail. +I want clients to be able to remove things like "near" and "(?)" from +these place terms, recording proximity and uncertainty information in +separate fields. So I add those fields for use. -Usually you will want to include any `worksheet_add_fields`, plus any other fields that, in combination with the `worksheet_add_fields`, yield the full corrected value for the row. +##### `job_tags` -#### Required before extending `IterativeCleanup`: `fingerprint_fields` +Allows retrieval and running of jobs via `thor jobs:tagged`, `thor +jobs:tagged_or`, and `thor jobs:tagged_and` commands. + +##### `cleanup_base_name` + +This is an important one to understand. Our cleanup config module name +is `PlacesCleanup`, so by default, `cleanup_base_name` will be set to +`"places_cleanup"`. + +This is used as the namespace for registering the jobs associated with +the cleanup process, for example `:places_cleanup__worksheet`. + +You can override this if you want. + +##### Custom transforms! + +See {Kiba::Extend::Mixins::IterativeCleanup::Jobs} for documentation, +and the `kiba-tms` +[`PlacesCleanupInitial`](https://github.com/lyrasis/kiba-tms/blob/main/lib/kiba/tms/places_cleanup_initial.rb) +config module for usage examples. ## The process @@ -192,25 +254,223 @@ The steps and settings are explained textually below the flowchart. ![Flowchart](https://github.com/lyrasis/kiba-extend/blob/main/doc/iterative_cleanup_flowchart.png?raw=true) -The following explanation uses the demonstration places cleanup in -kiba-extend-project as its main example. 
+Right now, the best place to step through and check out the processing +in a detailed way is to look at the following in the [`kiba-tms`](https://github.com/lyrasis/kiba-tms) +repository: -### base_job {#basejob} +- [`PlacesCleanupInitial`](https://github.com/lyrasis/kiba-tms/blob/main/lib/kiba/tms/places_cleanup_initial.rb) + and [its detailed + tests](https://github.com/lyrasis/kiba-tms/blob/main/spec/kiba/tms/places_cleanup_initial_spec.rb): + - generation of initial worksheet (i.e. "when no cleanup done") + - merge of corrected data and generation of a second worksheet after + first round of cleanup is returned (i.e. "when initial cleanup + returned") + - after a new database export is received after an initial round of + cleanup has been done (i.e. "when fresh data after initial + cleanup"): all previous cleanup retained; cleanup rows linked to + now-deleted database data no longer appear; any new values in + cleanup worksheet generated at this point get flagged "to_review" + - verification of everything after worksheet based on fresh data is + returned, including "final" job (i.e. "when second round of + cleanup") -This job is created outside the iterative cleanup process, and serves -as the base and starting point for a cleanup process. +### BaseJobCleaned `:cleanup_base_name__base_job_cleaned` {#basejobcleaned} +#### If no cleanup worksheets returned -The full registry entry key (e.g. `places__prep_for_cleanup`) must be -set as the `base_job` setting in a cleanup config module prior to -extending that module with {Kiba::Extend::Mixins::IterativeCleanup}. -See -`[lib/ke_project/places_cleanup.rb](https://github.com/lyrasis/kiba-extend-project/blob/main/lib/ke_project/places_cleanup.rb)`. +Adds any `worksheet_add_fields` you have specified. + +Adds `:clean_fingerprint` field. + +#### If any cleanup worksheets returned + +Adds any `worksheet_add_fields` you have specified. + +Merges corrections. 
See [Corrections](#corrections) for details on how +corrections are prepared for merge back into original data. + +Adds `:clean_fingerprint` field. + +### CleanedUniq `:cleanup_base_name__cleaned_uniq` {#cleaneduniq} + +Starts with [BaseJobCleaned](#basejobcleaned) output. + +Deletes `:fingerprint` (or your custom `orig_values_identifier`) and +any custom `collate_fields` you specified. + +Deduplicates on `:clean_fingerprint` field values. Now if four rows +for "North Carolina", "NC", "N.C.", and "N. Carolina" all carry the +cleaned value "North Carolina", we only have one row for "North +Carolina". + +Re-merges in the collate fields (including +`orig_values_identifier`/`:fingerprint` field) as multi-valued fields +(separated by `collation_delim`). This also pluralizes collate field +names that don't start with "s". So our one row for "North Carolina" +will now have a `:fingerprints` field containing 4 fingerprint +values from the 4 original rows. + +Once all cleanup is done, this might be the appropriate source job for +further jobs generating unique authority terms. + +### Worksheet `:cleanup_base_name__worksheet` {#worksheet} + +Starts with [CleanedUniq](#cleaneduniq) output. + +See bottom of +[KeProject::PlacesCleanup](https://github.com/lyrasis/kiba-extend-project/blob/main/lib/ke_project/places_cleanup.rb) +for example of recording `provided_worksheets`. + +#### If you have recorded no `provided_worksheets` + +Rows are just passed through as-is. + +#### If you have recorded one or more `provided_worksheets` + +Gets a list of known `orig_values_identifier`/`:fingerprint` values in +provided worksheets. It does this by creating and `call`ing a new +{Kiba::Extend::Mixins::IterativeCleanup::KnownWorksheetValues} +instance. 
This: + +- Reads each provided worksheet file +- Gets the `:fingerprints` (or equivalent field) from each row and + splits the multiple values in a single field +- Compiles and deduplicates all the values + +A blank `:to_review` field is added to the worksheet being prepared. + +Now for each row we are going to output to *this* worksheet, we: + +- Split the values of the `:fingerprints` or equivalent field. +- If **all** the fingerprint values for this row are in the list of + known values, `:to_review` is left blank. +- Otherwise, `:to_review` is set to "y" + +### Worksheet is given to client for completion + +At this point, you should record this file in the +`provided_worksheets` setting. + +See bottom of +[KeProject::PlacesCleanup](https://github.com/lyrasis/kiba-extend-project/blob/main/lib/ke_project/places_cleanup.rb) +for example of recording `provided_worksheets`. + +### Client returns completed (or partially completed) worksheet + +At this point, you should record the returned file in the +`returned_files` setting. + +See bottom of +[KeProject::PlacesCleanup](https://github.com/lyrasis/kiba-extend-project/blob/main/lib/ke_project/places_cleanup.rb) +for example of recording `returned_files`. + +### ReturnedCompiled `:cleanup_base_name__returned_compiled` {#returnedcompiled} + +Reads in all rows from `returned_files` as data source. Note that +these must be listed oldest to newest. They are read in as sources in +that order, which is important when we get to merging corrections! + +Deletes :to_review field if present. + +Runs {Kiba::Extend::Transforms::Fingerprint::FlagChanged}, using +`:clean_fingerprint`. Any custom +`clean_fingerprint_flag_ignore_fields` are ignored. This: + +- Adds the decoded (original) fingerprint field values to new fields + prefixed with "fp_" +- Deletes `:clean_fingerprint` after it has been decoded +- Adds a `:corrected` field. +- Compares each original/fp_ field with its corresponding field in the + returned file. 
For rows where any of the `fingerprint_fields` + values were changed in the returned worksheet, the names of the fields with + changed values are gathered in the `:corrected` field. For rows with + no changes, the `:corrected` field is blank. + +Deletes the fields prefixed with `:fp_` derived during the +`FlagChanged` process. + +Runs {Kiba::Extend::Transforms::Clean::EnsureConsistentFields} to +ensure all rows have the same fields. + +### Corrections `:cleanup_base_name__corrections` {#corrections} + +Reads in the output of [ReturnedCompiled](#returnedcompiled). + +Deletes rows where the `:corrected` field is blank. + +This leaves just rows where changes were made in a returned worksheet, +from oldest to newest. Order is important! + +Because this is an iterative cleanup process, we need to account for +the fact that cleanup done in worksheet #2 may have been done on a +single row that resulted from the cleanup of 4 rows in worksheet #1. +Recall the "North Carolina" example in [CleanedUniq](#cleaneduniq). + +For this reason, and because we merge all the corrections, from +oldest-to-newest, back into [BaseJobCleaned](#basejobcleaned) on the +original `:fingerprint`, we run +{Kiba::Extend::Transforms::Explode::RowsFromMultivalField} on that +`:fingerprint` (or equivalent) field. + +So, if, in round 1, the `:state` field values "NC", "N.C.", and "N. +Carolina" were all changed to "North Carolina", we have 3 rows in +Corrections output with instructions to merge "North Carolina" into +the `:state` field in rows with matching `:fingerprint` values. (The +4th "North Carolina" row had no change in round 1.) + +Now, in round 2, the client noticed that the row with `:state` = +"Ohio" also has `:country` = "USA", and added "USA" as country in the +row for "North Carolina". + +The Corrections output is now also going to have 4 rows with +instructions to merge "USA" into the `:country` field in rows with +matching `:fingerprint` values. 
+ +So: + +``` +| country | state | corrected | fingerprint | +|---------+----------------+-----------+-------------| +| | North Carolina | state | 2 | +| | North Carolina | state | 3 | +| | North Carolina | state | 4 | +| USA | North Carolina | country | 1 | +| USA | North Carolina | country | 2 | +| USA | North Carolina | country | 3 | +| USA | North Carolina | country | 4 | +``` + +For lookup/merge back into [BaseJobCleaned](#basejobcleaned), those +rows are gathered into a hash, with `:fingerprint` as the key: + +``` +{ 2=>[ + {country: nil, state: "North Carolina", corrected: "state", fingerprint: 2}, + {country: "USA", state: "North Carolina", corrected: "country", fingerprint: 2} + ] +} +``` + +When the [BaseJobCleaned](#basejobcleaned) merge process hits the row +with `:fingerprint` = 2, it carries out the corrections per row, in +order. + +- `row[:state] = "North Carolina"` +- `row[:country] = "USA"` + +Why does it do this so inefficiently? Why not just take the last +cleanup row for each fingerprint and replace the field values? + +I can't tell you the details of why, but at some point I tried something +like that and ended up with a mess. That was before I had worked out +some of the stuff with having two separate fingerprints, and there +were other complications with that one. But I worked out this process +and it works, generally, across the board, so I'm leaving it for now. + +### Final `:cleanup_base_name__final` + +Unless you define custom transforms for this one, it just returns the +[BaseJobCleaned](#basejobcleaned) output, keyed for lookup on `:fingerprint` (or +the field returned by an overridden `final_lookup_on_field` method). -**IMPORTANT: This job's output must include a field which combines/identifies the -original values that may be affected by the cleanup process.** The -default expectation is that this field is named `:fingerprint`, but this -can be overridden by defining a custom `orig_values_identifier` method -in the extending module after extension. 
This field is used as a -matchpoint for merging cleaned up data back into the migration, and -identifying whether a given value in subsequent worksheet iterations -has been previously included in a worksheet. +Use this as a lookup to get your cleaned data back into other places +in the migration. diff --git a/doc/iterative_cleanup_flowchart.mmd b/doc/iterative_cleanup_flowchart.mmd index e5e20428d..1f57ebf17 100644 --- a/doc/iterative_cleanup_flowchart.mmd +++ b/doc/iterative_cleanup_flowchart.mmd @@ -8,8 +8,8 @@ graph TD; CleanedUniq["`**CleanedUniq** Deduplicate on :clean_fingerprint - Delete mod.cleaned_uniq_collate_fields - Collate mod.cleaned_uniq_collate_fields`"] + Delete collate_fields + Collate collate_fields`"] Worksheet["`**Worksheet** If worksheet already provided: @@ -36,6 +36,12 @@ graph TD; Explode collated mod.orig_values_identifier Deduplicate on full row match`"] + Final["`**Final** + Lets you: + - Set custom lookup key for merge back into migration + - Apply custom transforms on cleaned data that won't interfere with cleanup iterations`" + ] + base_job-->BaseJobCleaned; Corrections-. 
@@ -66,3 +72,5 @@ graph TD; Provided-->KnownWorksheetValues; KnownWorksheetValues-->Worksheet; + + BaseJobCleaned-->Final; diff --git a/doc/iterative_cleanup_flowchart.pdf b/doc/iterative_cleanup_flowchart.pdf index 719eccc25..45f0cb427 100644 Binary files a/doc/iterative_cleanup_flowchart.pdf and b/doc/iterative_cleanup_flowchart.pdf differ diff --git a/doc/iterative_cleanup_flowchart.png b/doc/iterative_cleanup_flowchart.png index 3a7e80fd6..3f98016f3 100644 Binary files a/doc/iterative_cleanup_flowchart.png and b/doc/iterative_cleanup_flowchart.png differ diff --git a/lib/kiba/extend/mixins/iterative_cleanup.rb b/lib/kiba/extend/mixins/iterative_cleanup.rb index 2729b1963..8bbfc1e2d 100644 --- a/lib/kiba/extend/mixins/iterative_cleanup.rb +++ b/lib/kiba/extend/mixins/iterative_cleanup.rb @@ -55,7 +55,7 @@ module Mixins # # `extend Kiba::Extend::Mixins::IterativeCleanup` # - # ### Optional settings/methods in extending module + # ### Methods that can be optionally overridden in extending module # # Default values for the following methods are defined in this mixin # module. If you want to override the values, define these methods @@ -69,6 +69,7 @@ module Mixins # - {collate_fields} # - {collation_delim} # - {clean_fingerprint_flag_ignore_fields} + # - {final_lookup_on_field} # # ## What extending this module does # @@ -143,8 +144,11 @@ def orig_values_identifier :fingerprint end - # Tags assigned to all jobs generated by IterativeCleanup for this - # module. DEFAULT VALUE: `[]` (empty array) + # Tags assigned to all jobs generated by IterativeCleanup for + # this module. Tags allow retrieval and running of jobs via + # `thor jobs:tagged`, `thor jobs:tagged_or`, and `thor + # jobs:tagged_and` commands. 
DEFAULT VALUE: `[]` (empty + # array) # # @note Optional: override in extending module after extending # @@ -177,9 +181,37 @@ def worksheet_field_order # Fields from base_job_cleaned that will be deleted in # cleaned_uniq, and then merged back into the deduplicated - # data from base_job_cleaned. I.e., fields whose values will - # be collated into multivalued fields on the deduplicated - # values. DEFAULT VALUE: `[]` + # data of that job from base_job_cleaned. I.e., fields whose + # values will be collated into multivalued fields on the + # deduplicated values. DEFAULT VALUE: `[]` + # + # Note that `:fingerprint` (or your overridden orig_values_identifier) + # is added to these values by the {all_collate_fields} method. That + # field should always be collated, or you will not be able to match + # final cleaned values back to original migration data. + # + # An example of when you might want to add additional collate + # fields: For authority term cleanup, especially if we are + # breaking up subject headings into individual subdivisions, + # I like to provide the full subject heading from which the + # term was derived, for context. For example, `:subdivision` + # = "History", `:fullheading` = "Ghana -- History". If you + # also have a row with `:subdivision` = "Histories", + # `:fullheading` = "Ghana -- Histories", and the client + # corrects "Histories" to "History" in that row, if you + # include `:fullheading` in `collate_fields`, a subsequently + # generated worksheet row with `:subdivision` = "History" + # will have `:fullheading` = "Ghana -- History\\\\Ghana -- + # Histories". + # + # It can also be useful for clients with large cleanup + # projects to provide the number of occurrences for each + # value in the project. Retain this information through + # multiple cleanup iterations by collating the occurrences + # field and adding an inline transform to split and sum the + # values in a custom `cleaned_uniq_post_xforms` method. 
See + # [Tms::PlacesCleanupInitial](https://github.com/lyrasis/kiba-tms/blob/main/lib/kiba/tms/places_cleanup_initial.rb) + # for an example # # @note Optional: override in extending module after extending # @@ -233,6 +265,18 @@ def clean_fingerprint_flag_ignore_fields nil end + # Will be used to set the `lookup_on` field in job registry + # hash for `cleanup_base_name__final`, for merging + # cleaned-up data back into the rest of your migration. + # DEFAULT VALUE: value of orig_values_identifier + # + # @note Optional: override in extending module after extending + # + # @return [Symbol] + def final_lookup_on_field + orig_values_identifier + end + # DO NOT OVERRIDE REMAINING METHODS # @return [Array] supplied registry entry job keys @@ -309,6 +353,10 @@ def corrections_job_key "#{cleanup_base_name}__corrections".to_sym end + def final_job_key + "#{cleanup_base_name}__final".to_sym + end + # Appends "s" to module's `orig_values_identifier`. Used to # manage joining, collating, and splitting/exploding on this # value, while clarifying that any collated field in output @@ -417,6 +465,8 @@ def build_namespace register mod.send(:job_name, mod.send(:corrections_job_key)), mod.send(:corrections_job_hash, mod) end + register mod.send(:job_name, mod.send(:final_job_key)), + mod.send(:final_job_hash, mod) end end private :build_namespace @@ -497,6 +547,21 @@ def corrections_job_hash(mod) } end private :corrections_job_hash + + def final_job_hash(mod) + { + path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod), + "working", "#{mod.cleanup_base_name}_final.csv"), + creator: { + callee: + Kiba::Extend::Mixins::IterativeCleanup::Jobs::Final, + args: {mod: mod} + }, + tags: mod.job_tags, + lookup_on: mod.final_lookup_on_field + } + end + private :final_job_hash end end end diff --git a/lib/kiba/extend/mixins/iterative_cleanup/jobs/cleaned_uniq.rb b/lib/kiba/extend/mixins/iterative_cleanup/jobs/cleaned_uniq.rb index cd94d5e9f..a234f88c2 100644 --- 
a/lib/kiba/extend/mixins/iterative_cleanup/jobs/cleaned_uniq.rb +++ b/lib/kiba/extend/mixins/iterative_cleanup/jobs/cleaned_uniq.rb @@ -59,11 +59,11 @@ def cleaned_xforms(mod) Kiba.job_segment do job = bind.receiver + transform Delete::Fields, + fields: mod.all_collate_fields transform Deduplicate::Table, field: :clean_fingerprint, delete_field: false - transform Delete::Fields, - fields: mod.all_collate_fields transform Merge::MultiRowLookup, lookup: send(mod.base_job_cleaned_job_key), keycolumn: :clean_fingerprint, diff --git a/lib/kiba/extend/mixins/iterative_cleanup/jobs/final.rb b/lib/kiba/extend/mixins/iterative_cleanup/jobs/final.rb new file mode 100644 index 000000000..1c50853ea --- /dev/null +++ b/lib/kiba/extend/mixins/iterative_cleanup/jobs/final.rb @@ -0,0 +1,43 @@ +# frozen_string_literal: true + +module Kiba + module Extend + module Mixins + module IterativeCleanup + module Jobs + module Final + module_function + + def job(mod:) + Kiba::Extend::Jobs::Job.new( + files: { + source: mod.base_job_cleaned_job_key, + destination: mod.final_job_key + }, + transformer: get_xforms(mod) + ) + end + + def get_xforms(mod) + base = [] + if mod.respond_to?(:final_pre_xforms) + base << mod.final_pre_xforms + end + base << xforms(mod) + if mod.respond_to?(:final_post_xforms) + base << mod.final_post_xforms + end + base + end + + def xforms(mod) + Kiba.job_segment do + # passthrough - pre and post mean nothing here + end + end + end + end + end + end + end +end
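
The optional-hook pattern in `Final.get_xforms` above can be sketched in plain Ruby, without loading kiba-extend. This is only an illustration of the `respond_to?` checks and the resulting ordering. `compose_xforms`, `PlainCleanup`, and `CustomCleanup` are hypothetical names, and the symbols stand in for what would be `Kiba.job_segment` blocks in a real cleanup config module.

```ruby
# Plain-Ruby stand-in for the hook composition in Final.get_xforms.
# All names here are hypothetical; symbols replace real job segments.
def compose_xforms(mod, core)
  steps = []
  # Pre hook is included only if the config module defines it
  steps << mod.final_pre_xforms if mod.respond_to?(:final_pre_xforms)
  steps << core
  # Post hook is included only if the config module defines it
  steps << mod.final_post_xforms if mod.respond_to?(:final_post_xforms)
  steps
end

# A config module that defines no Final hooks
module PlainCleanup
end

# A config module that defines only a post hook
module CustomCleanup
  module_function

  def final_post_xforms
    :post
  end
end

plain = compose_xforms(PlainCleanup, :core)
custom = compose_xforms(CustomCleanup, :core)
```

With no hooks defined, only the core (passthrough) segment runs. Defining `final_post_xforms` on the module appends that step after the core, which is presumably where custom transforms on the cleaned data would go.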