Merge pull request #67 from lyrasis/methodless-job
Flexible job creator definition in registry entry hashes
kspurgin authored Apr 5, 2022
2 parents f33456c + 6a87214 commit 38aec68
Showing 20 changed files with 588 additions and 55 deletions.
18 changes: 14 additions & 4 deletions CHANGELOG.adoc
@@ -26,18 +26,28 @@ These changes are merged into the `main` branch but have not yet been tagged as
==== Breaking

==== Added
* `Fingerprint::Add` and `Fingerprint::Decode` transforms. In https://github.com/lyrasis/kiba-extend/pull/65[PR#65]
* `override_app_delim_check` param to `Fingerprint::Add`, for backward compatibility with a project in which I want to be able to use this transform. Defaults to `false`. In https://github.com/lyrasis/kiba-extend/pull/66[PR#66]

==== Changed
* Moves `Merge::CompareFieldsFlag` to `Compare::FieldValues`. Aliases the old transform to the new one for backward compatibility, but raises deprecation warning. In https://github.com/lyrasis/kiba-extend/pull/62[PR#62]
* `Fingerprint::Decode` forces field values to UTF-8, preventing CSV write errors. In https://github.com/lyrasis/kiba-extend/pull/66[PR#66]

==== Deleted

==== To be deprecated/Will break in a future version

== Releases
=== 2.7.2 - 2022-04-05
https://github.com/lyrasis/kiba-extend/compare/v2.7.1\...v2.7.2[Compare code changes]

==== Added
* When setting up a file registry hash, `creator` may be a `Hash` if you need to pass keyword arguments to your job. See https://lyrasis.github.io/kiba-extend/file.file_registry_entry.html[File registry entry reference] for more info and examples. In https://github.com/lyrasis/kiba-extend/pull/67[PR#67]
* When setting up a file registry hash, `creator` may be a `Module` if the relevant job is a private instance method named with the configured `default_job_method_name` (the default is `:job`). See https://lyrasis.github.io/kiba-extend/file.file_registry_entry.html[File registry entry reference] for more info and examples. In https://github.com/lyrasis/kiba-extend/pull/67[PR#67]
* `default_job_method_name` config setting. In https://github.com/lyrasis/kiba-extend/pull/67[PR#67]
* `Fingerprint::Add` and `Fingerprint::Decode` transforms. In https://github.com/lyrasis/kiba-extend/pull/65[PR#65]
* `override_app_delim_check` param to `Fingerprint::Add`, for backward compatibility with a project in which I want to be able to use this transform. Defaults to `false`. In https://github.com/lyrasis/kiba-extend/pull/66[PR#66]

==== Changed
* Moves `Merge::CompareFieldsFlag` to `Compare::FieldValues`. Aliases the old transform to the new one for backward compatibility, but raises deprecation warning. In https://github.com/lyrasis/kiba-extend/pull/62[PR#62]
* `Fingerprint::Decode` forces field values to UTF-8, preventing CSV write errors. In https://github.com/lyrasis/kiba-extend/pull/66[PR#66]

=== 2.7.1 - 2022-03-10
https://github.com/lyrasis/kiba-extend/compare/v2.6.1\...v2.7.1[Compare code changes]

4 changes: 2 additions & 2 deletions Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
kiba-extend (2.7.1)
kiba-extend (2.7.2)
activesupport (~> 6)
amazing_print (~> 1.4)
csv (~> 3)
@@ -26,7 +26,7 @@ GEM
ast (2.4.2)
byebug (11.1.3)
coderay (1.1.3)
concurrent-ruby (1.1.9)
concurrent-ruby (1.1.10)
csv (3.2.2)
diff-lcs (1.5.0)
docile (1.4.0)
171 changes: 159 additions & 12 deletions doc/file_registry_entry.md
@@ -14,32 +14,38 @@ A file registry entry is initialized with a Hash of data about the file. This Ha

The allowable Hash keys, expected Hash value formats, and expectations about them are described below.

**`:path` [String] full or expandable relative path to the expected location of the file**
### `:path`
[String] full or expandable relative path to the expected location of the file

* default: `nil`
* required if either `:src_class` or `:dest_class` requires a path (in `PATH_REQ`)

`:src_class` [Class] the Ruby class used to read in data
### `:src_class`
[Class] the Ruby class used to read in data

* default: value of `Kiba::Extend.source` (`Kiba::Common::Sources::CSV` unless overridden by your ETL app)
* required, but default supplied if not given

`:src_opt` [Hash] file options used when reading in source
### `:src_opt`
[Hash] file options used when reading in source

* default: value of `Kiba::Extend.csvopts`
* required, but default supplied if not given

`:dest_class` [Class] the Ruby class used to write out the data
### `:dest_class`
[Class] the Ruby class used to write out the data

* default: value of `Kiba::Extend.destination` (`Kiba::Extend::Destinations::CSV` unless overridden by your ETL app)
* required, but default supplied if not given

`:dest_opt` [Hash] file options used when writing data
### `:dest_opt`
[Hash] file options used when writing data

* default: value of `Kiba::Extend.csvopts`
* required, but default supplied if not given

`:dest_special_opts` [Hash] additional options for writing out the data
### `:dest_special_opts`
[Hash] additional options for writing out the data

* Not all destination classes support extra options. If you provide unsupported extra options, they will not be sent through to the destination class, and you will receive a warning in STDOUT. The current most common use is to define `initial_headers` (i.e. which columns should be first in file) to `Kiba::Extend::Destinations::CSV`.
* optional
@@ -52,12 +58,150 @@ reghash = {
}
```

**`:creator` [Method] Ruby method that generates this file**
### `:creator`
[Method, Module, Hash] Ruby method that generates this file

* Used to run ETL jobs to create necessary files, if said files do not exist
* required unless file is supplied
* Not required at all if file is supplied
* If the method that runs the job is a module instance method named `job`, creator value can just be the `Module` containing the `:job` method
* Otherwise, the creator value must be a `Method` (Pattern: `Class::Or::Module::ConstantName.method(:name_of_method)`)
* Sometimes you may need to call a job with arguments. This is particularly useful if the same job logic can be reused many times with slightly different parameters. In this case, creator may be a `Hash` with `callee` and `args` keys

**`:supplied` [true, false] whether the file/data is supplied from outside the ETL**
NOTE: The default job method name set in `Kiba::Extend` is `:job`. You can override this in your project's base file as follows:

Kiba::Extend.config.default_job_method_name = :whatever
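
The three accepted creator forms can all be reduced to a single zero-argument callable. The following is a minimal sketch in plain Ruby, not kiba-extend's actual implementation: `resolve_creator`, the `Demo` modules, and the `DEFAULT_JOB_METHOD_NAME` constant are hypothetical names used only for illustration, with the constant standing in for the `default_job_method_name` setting.

```ruby
# Hypothetical stand-in for the `default_job_method_name` setting.
DEFAULT_JOB_METHOD_NAME = :job

# Hypothetical job modules. Each method returns a String instead of a real job.
module Demo
  module Table
    module_function

    def job
      'Table.job ran'
    end

    def prep
      'Table.prep ran'
    end
  end

  module Extract
    module_function

    def job(type:)
      "Extract.job ran for #{type}"
    end
  end
end

# Assumed helper: turns any supported creator form into a callable
# that takes no arguments.
def resolve_creator(creator, default_name = DEFAULT_JOB_METHOD_NAME)
  case creator
  when Method
    creator
  when Module
    # Look up the default job method on the module itself.
    creator.method(default_name)
  when Hash
    callee = resolve_creator(creator.fetch(:callee), default_name)
    args = creator.fetch(:args, {})
    # Wrap the call so the keyword arguments are applied later.
    -> { callee.call(**args) }
  else
    raise ArgumentError, "Unsupported creator: #{creator.inspect}"
  end
end

puts resolve_creator(Demo::Table).call
puts resolve_creator(Demo::Table.method(:prep)).call
puts resolve_creator({ callee: Demo::Extract, args: { type: 'Box' } }).call
```

Under these assumptions, the `Hash` form is just the `Module` or `Method` form plus deferred keyword arguments, which is why `callee` accepts either.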

#### `Module` creator example

This is valid because the default `:job` method is present in the module:

```ruby
# in job definitions
module Project
  module Table
    module_function

    def job
      Kiba::Extend::Jobs::Job.new(
        ...
      )
    end
  end
end

# in file registry
reghash = {
  path: '/project/working/objects_prep.csv',
  creator: Project::Table
}
```

#### `Method` creator example

Use this form when the default `:job` method is not present (or is not the method you need to call for this job).

```ruby
# in job definitions
module Project
  module Table
    module_function

    def prep
      Kiba::Extend::Jobs::Job.new(
        ...
      )
    end
  end
end

# in file registry
reghash = {
  path: '/project/working/objects_prep.csv',
  creator: Project::Table.method(:prep)
}
```

#### `Hash` creator example

The default `:job` method accepts keyword arguments, so creator is a `Hash` with a `Method` or `Module` (as described above) passed in as `callee`, and an arguments `Hash` passed in as `args`.

```ruby
# in your project's registry_data.rb
module Project
  module RegistryData
    module_function

    def register
      register_lookups
      register_files
      Project.registry.transform
      Project.registry.freeze
    end

    def normalized_lookup_type(type)
      type.downcase
        .gsub(' ', '_')
        .gsub('/', '_')
    end

    def register_lookups
      types = [
        'Accession Review Decision', 'Accession Type', 'Account Codes', 'ArchSite', 'Box', 'Budget Code',
        'Building', 'CityState', 'Cleaning', 'Condition Picks', 'Contact Type', 'Count Unit', 'Creator Type',
        'Cultural Affiliation', 'Department Code', 'Digitize Parameters', 'Digitizing Hardware',
        'Digitizing Software', 'Disposal Type', 'Exhibit Type', 'Format/Type', 'Genre', 'Image Resolution',
        'In Exhibit', 'Insured By', 'Loan Purpose', 'Material', 'Mount', 'NAGPRA Type', 'Owner Type',
        'Region', 'Room', 'Server Path', 'Technique', 'Treatment', 'Value'
      ]

      Project.registry.namespace('lkup') do
        types.each do |type|
          register Project::RegistryData.normalized_lookup_type(type).to_sym, {
            path: File.join(Project.datadir, 'working', "#{Project::RegistryData.normalized_lookup_type(type)}.csv"),
            creator: {callee: Project::Main::Lookups::Extract, args: {type: type}},
            tags: %i[lkup],
            lookup_on: :lookupvalueid
          }
        end
      end
    end

    def register_files
      ...
    end
  end
end

# in job definitions
module Project
  module Main
    module Lookups
      module Extract
        module_function

        def job(type:)
          Kiba::Extend::Jobs::Job.new(
            files: {
              source: :lkup__prep,
              destination: "lkup__#{Project::RegistryData.normalized_lookup_type(type)}".to_sym
            },
            transformer: xforms(type)
          )
        end

        def xforms(type)
          Kiba.job_segment do
            transform FilterRows::FieldEqualTo, action: :keep, field: :lookup_type, value: type
          end
        end
      end
    end
  end
end
```
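
In the example above, each registry key and job destination is derived from the lookup type name. The sketch below isolates that derivation so it can be run on its own. `normalized_lookup_type` is copied from the example; `lookup_registry_key` is a hypothetical helper added only to show how the `lkup` namespace and the normalized name combine with a double underscore.

```ruby
# Copied from the registry example: lowercases the type name and
# replaces spaces and slashes with underscores.
def normalized_lookup_type(type)
  type.downcase
      .gsub(' ', '_')
      .gsub('/', '_')
end

# Hypothetical helper: builds the namespaced key used as the job destination.
def lookup_registry_key(type, namespace: 'lkup')
  "#{namespace}__#{normalized_lookup_type(type)}".to_sym
end

p lookup_registry_key('Format/Type')               # => :lkup__format_type
p lookup_registry_key('Accession Review Decision') # => :lkup__accession_review_decision
```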

### `:supplied`
[true, false] whether the file/data is supplied from outside the ETL

- default: false
- Manually set to true for:
@@ -83,16 +227,19 @@ Note the following pattern!:

Class or Module constant name + `.method` + method name **as symbol**

**`:lookup_on` [Symbol] column to use as keys in lookup table created from file data**
### `:lookup_on`
[Symbol] column to use as keys in lookup table created from file data

* required if file is used as a lookup source
* You can register the same file multiple times under different file keys with different `:lookup_on` values if you need to use the data for different lookup purposes
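
As a rough illustration of registering one file under several keys, the sketch below builds two entry hashes that share a path but differ in `:lookup_on`. The file keys and the column names (`:objectid`, `:objectnumber`) are hypothetical, and plain hashes are used instead of a real registry so the snippet runs without kiba-extend.

```ruby
objects_path = '/project/working/objects_prep.csv'

# Two registry entries for the same file, each keyed for a different lookup.
# `supplied: true` is used here only so that no creator is needed.
entries = {
  objects_by_id: {
    path: objects_path,
    supplied: true,
    lookup_on: :objectid
  },
  objects_by_number: {
    path: objects_path,
    supplied: true,
    lookup_on: :objectnumber
  }
}

entries.each do |key, entry|
  puts "#{key}: #{entry[:path]} keyed on #{entry[:lookup_on]}"
end
```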

`:desc` [String] description of what the file is/what it is used for. Used when post-processing reports results to STDOUT
### `:desc`
[String] description of what the file is/what it is used for. Used when post-processing reports results to STDOUT

* optional

`:tags` [Array<Symbol>] list of arbitrary tags useful for categorizing data/jobs in your ETL
### `:tags`
[Array<Symbol>] list of arbitrary tags useful for categorizing data/jobs in your ETL

* optional
* If set, you can filter to run only jobs tagged with a given tag
1 change: 1 addition & 0 deletions lib/kiba/extend.rb
@@ -71,6 +71,7 @@ module Extend
setting :warning_label, default: 'KIBA WARNING', reader: true

setting :registry, default: registry, reader: true
setting :default_job_method_name, default: :job, reader: true

setting :job, reader: true do
# Whether to output results to STDOUT for debugging
9 changes: 9 additions & 0 deletions lib/kiba/extend/error.rb
@@ -0,0 +1,9 @@
# frozen_string_literal: true

module Kiba
module Extend
# Base for kiba-extend specific errors
class Error < StandardError
end
end
end
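
A shared base class lets calling code rescue every library-specific error with one clause. The sketch below illustrates that design choice under stated assumptions: `Kiba::Extend::Error` is redefined here only so the snippet is self-contained, and `DemoCreatorError` and `describe_failure` are hypothetical names that are not part of kiba-extend.

```ruby
module Kiba
  module Extend
    # Same shape as the base class added in this commit.
    class Error < StandardError
    end

    # Hypothetical subclass, for illustration only.
    class DemoCreatorError < Error
      def initialize(key)
        super("No usable creator registered for #{key}")
      end
    end
  end
end

# Rescuing the base class also catches any subclass.
def describe_failure(key)
  raise Kiba::Extend::DemoCreatorError.new(key)
rescue Kiba::Extend::Error => e
  "#{e.class.name}: #{e.message}"
end

puts describe_failure(:objects__prep)
```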
8 changes: 1 addition & 7 deletions lib/kiba/extend/jobs/reporter.rb
@@ -77,12 +77,6 @@ def start_label
'->Starting dependency job'
end

def creator_method_to_s
job_data.creator.to_s
.delete_prefix('#<Method: ')
.sub(/\(\) .*$/, '')
end

def desc_and_tags
parts = [job_data.desc, tags].compact
return if parts.empty?
@@ -111,7 +105,7 @@ def start_label
end

def start_and_def
"#{start_label}: #{job_data.key} -- defined in: #{creator_method_to_s}"
"#{start_label}: #{job_data.key} -- defined in: #{job_data.creator.to_s}"
end

def tags
6 changes: 4 additions & 2 deletions lib/kiba/extend/jobs/runner.rb
@@ -83,8 +83,10 @@ def file_config(config)
end

def handle_requirements
[@files[:source], @files[:lookup]].compact.flatten.map(&:required).compact.each { |method| method.call }

[@files[:source], @files[:lookup]].compact.flatten.map(&:required).compact.each do |creator|
creator.call
end

check_requirements
end
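
The chain in `handle_requirements` can be tried in isolation. The sketch below is a stand-alone approximation, not the runner itself: `DemoFile` and `call_required_creators` are hypothetical, and `required` is modeled as returning either a callable or `nil`, which is what the `compact` followed by `call` in the diff implies.

```ruby
# Hypothetical stand-in for a registered file: `required` is either a
# callable that creates the file, or nil when nothing needs to run.
DemoFile = Struct.new(:key, :required)

def call_required_creators(files)
  # :source and :lookup may each be nil, a single file, or an Array of files.
  [files[:source], files[:lookup]].compact.flatten.map(&:required).compact.each do |creator|
    creator.call
  end
end

ran = []
files = {
  source: DemoFile.new(:objects, -> { ran << :objects }),
  lookup: [
    DemoFile.new(:names, nil),
    DemoFile.new(:places, -> { ran << :places })
  ]
}

call_required_creators(files)
p ran # => [:objects, :places]
```

Renaming the block parameter from `method` to `creator` matches the new registry behavior, where the callable is no longer always a `Method`.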
