Reorganize and better document IterativeCleanup module #185

Merged
merged 2 commits into from
Sep 11, 2023
238 changes: 123 additions & 115 deletions lib/kiba/extend/mixins/iterative_cleanup.rb
@@ -1,118 +1,124 @@
# frozen_string_literal: true

# Mixin module for setting up iterative cleanup based on a source table.
#
# @since 4.0.0
#
# "Iterative cleanup" means the client may provide the worksheet more
# than once, or that you may need to produce a fresh worksheet for
# the client after a new database export is provided.
#
# Your project must follow some setup/configuration conventions in order to use
# this mixin:
#
# - Each cleanup process must be configured in its own config module.
# - A config module is a Ruby module that responds to `:config`.
#
# Refer to todo:link Kiba::Tms::AltNumsForObjTypeCleanup as an example config
# module extending this mixin module in a simple way. See
# todo:link Kiba::Tms::PlacesCleanupInitial for a more complex usage with
# default overrides and custom pre/post transforms.
#
# ## Implementation details
#
# ### Define before extending this module
#
# These can be defined as Dry::Configurable settings or as public methods. The
# section below lists the method/setting name the extending module should
# respond to, each preceded by its YARD signature.
#
# ```
# # @return [Symbol] registry entry job key for the job whose output
# # will be used as the base for generating the cleanup worksheet.
# # Iterations of cleanup will be layered over this output in the
# # auto-generated. **NOTE: This job's output should include a field
# # which combines/identifies the original values that may be
# # affected by the cleanup process. The default expectation is that
# # this field is named :fingerprint, but this can be overridden by
# # defining a custom `orig_values_identifier` method in the
# # extending module after extension. This field is used as a
# # matchpoint for merging cleaned up data back into the migration,
# # and identifying whether a given value in subsequent worksheet
# # iterations has been previously included in a worksheet**
# # base_job
# #
# # @return [Array<Symbol>] tags assigned to all jobs generated by extending
# # IterativeCleanup
# # job_tags
# #
# # @return [Array<Symbol>] nil/empty fields to be added to worksheet
# # worksheet_add_fields
# #
# # @return [Array<Symbol>] order of fields (in worksheet output). Will be used
# # to set destination special options/initial headers on the worksheet job.
# # worksheet_field_order
# #
# # @return [Array<Symbol>] fields included in the fingerprint value
# # fingerprint_fields
# #
# # @return [Symbol, Array<Symbol>, nil] field or fields included in
# # the fingerprint value that should be ignored when flagging
# # changes
# # fingerprint_flag_ignore_fields
# ```
#
# ### Then, extend this module
#
# `extend Kiba::Extend::Mixins::IterativeCleanup`
#
# ### Optional settings/methods in extending module
#
# Default values for the following methods defined in this mixin
# module. If you want to override the values, define these methods
# in your config module after extending this module.
#
# - {collation_delim}
# - {orig_values_identifier}
# - {cleaned_values_identifier}
# - {cleaned_uniq_collate_fields}
#
# ## What extending this module does
#
# ### Defines settings in the extending config module
#
# These are empty settings with constructors that will use the values in a
# client-specific project config file to build the data expected for cleanup
# processing
#
# - **:provided_worksheets** - Array of filenames of cleanup
# worksheets provided to the client. Files should be listed
# oldest-to-newest. Assumes files are in the `to_client`
# subdirectory of the migration base directory. **Define actual
# values in client config file.**
## - **:returned_files** - Array of filenames of completed worksheets
# returned by client. Files should be listed oldest-to-newest.
# Assumes files are in the `supplied` subdirectory of the migration
# base directory. **Define actual values in client config file.**
#
# ### Defines methods in the extending config module
#
# See method documentation inline below.
#
# ### Prepares registry entries for iterative cleanup jobs
#
# When the application loads, {Kiba::Tms::RegistryData.register} calls
# {Kiba::Tms::Utils::IterativeCleanupJobRegistrar}. This util class calls
# the {register_cleanup_jobs} method of each config module extending this
# module, adding the cleanup jobs to the registry dynamically.
#
# The jobs themselves (i.e. the sources, lookups, transforms) are
# defined in {Kiba::Tms::Jobs::IterativeCleanup}. See that module's
# documentation for how to set up custom pre/post transforms to customize
# specific cleanup routines.
module Kiba
module Extend
module Mixins
# Mixin module for setting up iterative cleanup based on a source table.
#
# @since 4.0.0
#
# "Iterative cleanup" means the client may provide the worksheet more
# than once, or that you may need to produce a fresh worksheet for
# the client after a new database export is provided.
#
# Your project must follow some setup/configuration conventions
# in order to use this mixin:
#
# - Each cleanup process must be configured in its own config module.
# - A config module is a Ruby module that responds to `:config`.
#
# Refer to todo:link Kiba::Tms::AltNumsForObjTypeCleanup as an
# example config module extending this mixin module in a
# simple way. See todo:link Kiba::Tms::PlacesCleanupInitial
# for a more complex usage with default overrides and custom
# pre/post transforms.
#
# ## Implementation details
#
# ### Define before extending this module
#
# These can be defined as Dry::Configurable settings or as
# public methods. The section below lists the method/setting
# name the extending module should respond to, each preceded
# by its YARD signature.
#
# ```
# # @return [Symbol] registry entry job key for the job whose output
# # will be used as the base for generating the cleanup worksheet.
# # Iterations of cleanup will be layered over this output in the
# # auto-generated. **NOTE: This job's output should include a field
# # which combines/identifies the original values that may be
# # affected by the cleanup process. The default expectation is that
# # this field is named :fingerprint, but this can be overridden by
# # defining a custom `orig_values_identifier` method in the
# # extending module after extension. This field is used as a
# # matchpoint for merging cleaned up data back into the migration,
# # and identifying whether a given value in subsequent worksheet
# # iterations has been previously included in a worksheet**
# # base_job
# #
# # @return [Array<Symbol>] tags assigned to all jobs generated
# # by extending IterativeCleanup
# # job_tags
# #
# # @return [Array<Symbol>] nil/empty fields to be added to worksheet
# # worksheet_add_fields
# #
# # @return [Array<Symbol>] order of fields (in worksheet
# # output). Will be used to set destination special
# # options/initial headers on the worksheet job.
# # worksheet_field_order
# #
# # @return [Array<Symbol>] fields included in the fingerprint value
# # fingerprint_fields
# #
# # @return [Symbol, Array<Symbol>, nil] field or fields included in
# # the fingerprint value that should be ignored when flagging
# # changes
# # fingerprint_flag_ignore_fields
# ```
#
# ### Then, extend this module
#
# `extend Kiba::Extend::Mixins::IterativeCleanup`
#
# ### Optional settings/methods in extending module
#
# Default values for the following methods defined in this mixin
# module. If you want to override the values, define these methods
# in your config module after extending this module.
#
# - {collation_delim}
# - {orig_values_identifier}
# - {cleaned_values_identifier}
# - {cleaned_uniq_collate_fields}
#
# ## What extending this module does
#
# ### Defines settings in the extending config module
#
# These are empty settings with constructors that will use the
# values in a client-specific project config file to build the
# data expected for cleanup processing
#
# - **:provided_worksheets** - Array of filenames of cleanup
# worksheets provided to the client. Files should be listed
# oldest-to-newest. Assumes files are in the `to_client`
# subdirectory of the migration base directory. **Define actual
# values in client config file.**
## - **:returned_files** - Array of filenames of completed worksheets
# returned by client. Files should be listed oldest-to-newest.
# Assumes files are in the `supplied` subdirectory of the migration
# base directory. **Define actual values in client config file.**
#
# ### Defines methods in the extending config module
#
# See method documentation inline below.
#
# ### Prepares registry entries for iterative cleanup jobs
#
# When the project application loads, the method that registers
# the project's registry entries calls
# {Kiba::Extend::Utils::IterativeCleanupJobRegistrar}. This
# util class calls the {register_cleanup_jobs} method of each
# config module extending this module, adding the cleanup jobs
# to the registry dynamically.
#
# The jobs themselves (i.e. the sources, lookups, transforms)
# are defined in
# {Kiba::Extend::Mixins::IterativeCleanup::Jobs}. See that
# module's documentation for how to set up custom pre/post
# transforms to customize specific cleanup routines.
module IterativeCleanup
def self.extended(mod)
check_required_settings(mod)
@@ -374,7 +380,8 @@ def base_job_cleaned_job_hash(mod)
path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod),
"working", "#{mod.cleanup_base_name}_base_job_cleaned.csv"),
creator: {
callee: Kiba::Extend::Mixins::IterativeCleanup::BaseJobCleaned,
callee:
Kiba::Extend::Mixins::IterativeCleanup::Jobs::BaseJobCleaned,
args: {mod: mod}
},
tags: mod.job_tags,
@@ -388,7 +395,7 @@ def cleaned_uniq_job_hash(mod)
path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod),
"working", "#{mod.cleanup_base_name}_cleaned_uniq.csv"),
creator: {
callee: Kiba::Extend::Mixins::IterativeCleanup::CleanedUniq,
callee: Kiba::Extend::Mixins::IterativeCleanup::Jobs::CleanedUniq,
args: {mod: mod}
},
tags: mod.job_tags
@@ -401,7 +408,7 @@ def worksheet_job_hash(mod)
path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod),
"to_client", "#{mod.cleanup_base_name}_worksheet.csv"),
creator: {
callee: Kiba::Extend::Mixins::IterativeCleanup::Worksheet,
callee: Kiba::Extend::Mixins::IterativeCleanup::Jobs::Worksheet,
args: {mod: mod}
},
tags: mod.job_tags,
@@ -415,7 +422,8 @@ def returned_compiled_job_hash(mod)
path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod),
"working", "#{mod.cleanup_base_name}_returned_compiled.csv"),
creator: {
callee: Kiba::Extend::Mixins::IterativeCleanup::ReturnedCompiled,
callee:
Kiba::Extend::Mixins::IterativeCleanup::Jobs::ReturnedCompiled,
args: {mod: mod}
},
tags: mod.job_tags
@@ -428,7 +436,7 @@ def corrections_job_hash(mod)
path: File.join(Kiba::Extend::Mixins::IterativeCleanup.datadir(mod),
"working", "#{mod.cleanup_base_name}_corrections.csv"),
creator: {
callee: Kiba::Extend::Mixins::IterativeCleanup::Corrections,
callee: Kiba::Extend::Mixins::IterativeCleanup::Jobs::Corrections,
args: {mod: mod}
},
tags: mod.job_tags,
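
Note on the "define before extending" convention documented above: the mixin's `self.extended(mod)` hook calls `check_required_settings(mod)`, so the required methods must already exist when `extend` runs. The following is a minimal sketch of that pattern, not the kiba-extend implementation. `DemoIterativeCleanup`, `DemoPlacesCleanup`, the `REQUIRED_SETTINGS` constant, the error message, the field names, and both delimiter values are hypothetical; only the hook name, the checker name, and the six required method names come from the diff.

```ruby
# Hypothetical stand-in for the mixin. Only the `self.extended` hook calling
# `check_required_settings` is taken from the diff; the rest is assumed.
module DemoIterativeCleanup
  REQUIRED_SETTINGS = %i[
    base_job job_tags worksheet_add_fields worksheet_field_order
    fingerprint_fields fingerprint_flag_ignore_fields
  ].freeze

  # Ruby calls this hook at the moment a module runs `extend` on this mixin.
  def self.extended(mod)
    check_required_settings(mod)
  end

  # Assumed behavior: fail fast if a required method is not yet defined.
  def self.check_required_settings(mod)
    missing = REQUIRED_SETTINGS.reject { |name| mod.respond_to?(name) }
    return if missing.empty?

    raise ArgumentError, "#{mod} is missing: #{missing.join(", ")}"
  end

  # Illustrative default; the real default value is not shown in the diff.
  def collation_delim
    ";;"
  end
end

# Hypothetical config module. Required methods are defined first...
module DemoPlacesCleanup
  def self.base_job
    :demo_places__prepared
  end

  def self.job_tags
    %i[places cleanup]
  end

  def self.worksheet_add_fields
    %i[corrected_place note]
  end

  def self.worksheet_field_order
    %i[place corrected_place note]
  end

  def self.fingerprint_fields
    %i[place corrected_place note]
  end

  def self.fingerprint_flag_ignore_fields
    nil
  end

  # ...then the mixin is extended, which triggers the check...
  extend DemoIterativeCleanup

  # ...and optional defaults are overridden after extension.
  def self.collation_delim
    "|"
  end
end
```

In this sketch, moving the `extend` line above the method definitions would raise at load time, which is the reason for the ordering the documentation asks for. A real config module would also need to respond to `:config`, which is omitted here.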
70 changes: 0 additions & 70 deletions lib/kiba/extend/mixins/iterative_cleanup/base_job_cleaned.rb

This file was deleted.
