Merge pull request #57 from lyrasis/delete-transform-updates
Delete transform updates; add `Delete::EmptyFields` transform
kspurgin authored Feb 24, 2022
2 parents 3f0a512 + 910d6e3 commit 029cad4
Showing 26 changed files with 1,675 additions and 350 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
/.bundle/
/.yardoc
/_yardoc/
/docs
/coverage/
/pkg/
/spec/reports/
2 changes: 1 addition & 1 deletion .ruby-version
@@ -1 +1 @@
3.0.2
2.7.4
25 changes: 24 additions & 1 deletion CHANGELOG.adoc
@@ -23,6 +23,11 @@ toc::[]
== Unreleased
These changes are merged into the `main` branch but have not yet been tagged as a new version/release.

==== Breaking
* Changes to keyword argument names for `Delete::FieldValueIfEqualsOtherField` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
** `sep` becomes `delim`
** `case_sensitive` becomes `casesensitive`

==== Added
* `multival` parameter added to `Cspace::NormalizeForID` transform (in https://github.com/lyrasis/kiba-extend/pull/49[PR#49])
* new https://lyrasis.github.io/kiba-extend/Kiba/Extend/Transforms/Count/FieldValues.html[`Count::FieldValues`] transform (in https://github.com/lyrasis/kiba-extend/pull/50[PR#50])
@@ -31,14 +36,32 @@ These changes are merged into the `main` branch but have not yet been tagged as
** warns of any supplied files that do not exist (in https://github.com/lyrasis/kiba-extend/pull/54[PR#54])
** creates any reference directories that do not exist (in https://github.com/lyrasis/kiba-extend/pull/54[PR#54])
* test Clean::RegexpFindReplaceFieldVals to replace `\n` (in https://github.com/lyrasis/kiba-extend/pull/55[PR#55])
* `Helpers.empty?` method, which returns true/false for a given string value (without treating delimiter values as special) (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `fields` keyword argument to `Delete::FieldsExcept`, which should be used going forward instead of `keepfields` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `nullvalue` setting to `Kiba::Extend.config`. Default value is '%NULLVALUE%' (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `usenull` keyword argument to `Delete::EmptyFieldValues` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `delim` keyword argument to `Delete::EmptyFieldValues`, which should be used going forward instead of `sep` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* documentation for `Delete` transforms (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `Delete::BlankFields` transform (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])

==== Changed
- move/alias `Merge::CountOfMatchingRows` to `Count::MatchingRowsInLookup`(in https://github.com/lyrasis/kiba-extend/pull/50[PR#50])
* move/alias `Merge::CountOfMatchingRows` to `Count::MatchingRowsInLookup`(in https://github.com/lyrasis/kiba-extend/pull/50[PR#50])
* `Delete::FieldsExcept` can accept a single symbol as value for `fields` keyword argument (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* `Delete::EmptyFieldValues` will default to `Kiba::Extend.delim` as delimiter if none given explicitly (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
* keyword argument names for `Delete::FieldValueIfEqualsOtherField` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
** `sep` becomes `delim`
** `case_sensitive` becomes `casesensitive`

==== Deleted
- Removed JARD as development dependency (in https://github.com/lyrasis/kiba-extend/pull/52[PR#52])
- Removed `-t` alias from `jobs:tagged_and` and `jobs:tagged_or` tasks, as they conflicted with the `-t/--tell` option (in https://github.com/lyrasis/kiba-extend/pull/56[PR#56])

==== To be deprecated/Will break in a future version
These will now give warnings if used.

- `Delete::FieldsExcept` `keepfields` keyword parameter. Change to `fields` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
- `Delete::EmptyFieldValues` `sep` keyword parameter. Change to `delim` (in https://github.com/lyrasis/kiba-extend/pull/57[PR#57])
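
The deprecation entries above imply a transition period in which both the old and the new keyword names are accepted. The updated `Delete::FieldsExcept` code is not part of the loaded diff, so the following is only a minimal sketch of how such a shim could behave. The class name `FieldsExceptSketch` and the warning text are hypothetical, and this is not the implementation from PR#57.

```ruby
# Hypothetical stand-in for a transform that keeps only the named fields.
# It accepts the new `fields:` keyword and, with a warning, the old
# `keepfields:` keyword. Not the actual kiba-extend implementation.
class FieldsExceptSketch
  def initialize(fields: nil, keepfields: nil)
    if keepfields
      warn 'WARNING: `keepfields` is deprecated; use `fields` instead.'
    end
    chosen = fields || keepfields
    raise ArgumentError, 'provide `fields`' if chosen.nil?

    # [x].flatten lets callers pass a single Symbol or an Array of Symbols.
    @fields = [chosen].flatten
  end

  # Deletes every key that is not in the keep list and returns the row.
  def process(row)
    (row.keys - @fields).each { |key| row.delete(key) }
    row
  end
end

row = { a: '1', b: '2', c: '3' }
p FieldsExceptSketch.new(fields: :a).process(row.dup)
p FieldsExceptSketch.new(fields: %i[a c]).process(row.dup)
```

Wrapping the argument in `[chosen].flatten` mirrors the pattern the other `Delete` transforms in this diff use for `fields`, which is presumably how a single symbol becomes acceptable.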

== Releases
=== version - date

1 change: 0 additions & 1 deletion Gemfile
@@ -16,7 +16,6 @@ group :development, :test do
gem 'rspec', '~> 3.10'
gem 'rubocop', '~> 1.18.4'
gem 'rubocop-rspec', '~> 2.4.0'
# gem 'ruby_jard'
end

group :test do
18 changes: 9 additions & 9 deletions Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
kiba-extend (2.5.3)
kiba-extend (2.6.0)
activesupport (~> 6)
csv (~> 3)
dry-configurable (~> 0)
@@ -15,7 +15,7 @@ PATH
GEM
remote: https://rubygems.org/
specs:
activesupport (6.1.4.1)
activesupport (6.1.4.6)
concurrent-ruby (~> 1.0, >= 1.0.2)
i18n (>= 1.6, < 2)
minitest (>= 5.1)
@@ -25,26 +25,26 @@ GEM
byebug (11.1.3)
coderay (1.1.3)
concurrent-ruby (1.1.9)
csv (3.2.0)
csv (3.2.2)
diff-lcs (1.4.4)
docile (1.4.0)
dry-configurable (0.13.0)
dry-configurable (0.14.0)
concurrent-ruby (~> 1.0)
dry-core (~> 0.6)
dry-container (0.9.0)
concurrent-ruby (~> 1.0)
dry-configurable (~> 0.13, >= 0.13.0)
dry-core (0.7.1)
concurrent-ruby (~> 1.0)
i18n (1.8.10)
i18n (1.10.0)
concurrent-ruby (~> 1.0)
kiba (4.0.0)
kiba-common (1.5.0)
kiba (>= 3.0.0, < 5)
measured (2.7.1)
measured (2.8.2)
activesupport (>= 5.2)
method_source (1.0.0)
minitest (5.14.4)
minitest (5.15.0)
parallel (1.20.1)
parser (3.0.2.0)
ast (~> 2.4.1)
@@ -90,12 +90,12 @@ GEM
simplecov_json_formatter (~> 0.1)
simplecov-html (0.12.3)
simplecov_json_formatter (0.1.3)
thor (1.1.0)
thor (1.2.1)
tzinfo (2.0.4)
concurrent-ruby (~> 1.0)
unicode-display_width (2.0.0)
xxhash (0.4.0)
zeitwerk (2.4.2)
zeitwerk (2.5.4)

PLATFORMS
ruby
4 changes: 3 additions & 1 deletion lib/kiba/extend.rb
@@ -9,7 +9,6 @@
require 'kiba-common/destinations/csv'
require 'kiba-common/destinations/lambda'
require 'pry'
#require 'ruby_jard'
require 'xxhash'

require 'kiba/extend/registry/file_registry'
@@ -58,6 +57,9 @@ module Extend
# Example: 'a^^y;b^^z' -> [['a', 'y'], ['b', 'z']]
setting :sgdelim, default: '^^', reader: true

# Default string to be treated as though it were a null/empty value.
setting :nullvalue, default: '%NULLVALUE%', reader: true

# Default source class for jobs
setting :source, default: Kiba::Common::Sources::CSV, reader: true
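
The new `nullvalue` setting pairs with the `Helpers.empty?` method listed in the changelog, which `Delete::EmptyFieldValues` in this commit calls as `Helpers.empty?(str, usenull)`. The body of that helper is not in the loaded diff, so the sketch below is a guess at its behavior rather than the real code. `HelpersSketch` is a hypothetical module, `NULLVALUE` stands in for `Kiba::Extend.nullvalue`, and treating only `nil` and zero-length strings as empty is an assumption.

```ruby
# Hypothetical sketch; the real Helpers.empty? body is not shown in this diff.
module HelpersSketch
  # Stand-in for Kiba::Extend.nullvalue (default taken from the setting above).
  NULLVALUE = '%NULLVALUE%'

  module_function

  # Returns true if str is nil or zero-length, or, when usenull is true,
  # if str equals the configured null-value string.
  def empty?(str, usenull = false)
    return true if str.nil? || str.empty?

    usenull && str == NULLVALUE
  end
end

p HelpersSketch.empty?('')
p HelpersSketch.empty?('%NULLVALUE%')
p HelpersSketch.empty?('%NULLVALUE%', true)
p HelpersSketch.empty?(';')
```

The last call reflects the changelog's note that delimiter values are not treated as special: a string holding only a delimiter is not empty.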

8 changes: 4 additions & 4 deletions lib/kiba/extend/transforms/deduplicate.rb
@@ -148,8 +148,8 @@ def process(row)
# Used in pipeline as:
#
# ```
# @deduper = {}
# transform Deduplicate::FieldValues, fields: %i[foo bar], sep: ';'
# @deduper = {}
# transform Deduplicate::FieldValues, fields: %i[foo bar], sep: ';'
# ```
#
# Results in:
@@ -206,8 +206,8 @@ def process(row)
# Used in pipeline as:
#
# ```
# @deduper = {}
# transform Deduplicate::Flag, on_field: :combined, in_field: :duplicate, using: @deduper
# @deduper = {}
# transform Deduplicate::Flag, on_field: :combined, in_field: :duplicate, using: @deduper
# ```
#
# Results in:
85 changes: 1 addition & 84 deletions lib/kiba/extend/transforms/delete.rb
@@ -5,70 +5,7 @@ module Extend
module Transforms
# Tranformations to delete fields and field values
module Delete
::Delete = Kiba::Extend::Transforms::Delete
class EmptyFieldValues
def initialize(fields:, sep:)
@fields = [fields].flatten
@sep = sep
end

# @private
def process(row)
@fields.each do |field|
val = row.fetch(field)
row[field] = val.split(@sep).compact.reject(&:empty?).join(@sep) unless val.nil?
end
row
end
end

class Fields
def initialize(fields:)
@fields = [fields].flatten
end

# @private
def process(row)
@fields.each { |name| row.delete(name) }
row
end
end

class FieldsExcept
def initialize(keepfields:)
@fields = keepfields
end

# @private
def process(row)
deletefields = row.keys - @fields
deletefields.each { |f| row.delete(f) }
row
end
end

class FieldValueContainingString
def initialize(fields:, match:, casesensitive: true)
@fields = [fields].flatten
@match = casesensitive ? match : match.downcase
@casesensitive = casesensitive
end

# @private
def process(row)
@fields.each do |field|
exval = row.fetch(field)
if exval.nil?
# do nothing
else
exval = @casesensitive ? row.fetch(field) : row.fetch(field).downcase
row[field] = nil if exval[@match]
end
end
row
end
end

::Delete = Kiba::Extend::Transforms::Delete
class FieldValueIfEqualsOtherField
def initialize(delete:, if_equal_to:, multival: false, sep: nil, grouped_fields: [], case_sensitive: true)
@delete = delete
@@ -109,26 +46,6 @@ def process(row)
row
end
end

class FieldValueMatchingRegexp
def initialize(fields:, match:, casesensitive: true)
@fields = [fields].flatten
@match = casesensitive ? Regexp.new(match) : Regexp.new(match, Regexp::IGNORECASE)
end

# @private
def process(row)
@fields.each do |field|
exval = row.fetch(field)
if exval.nil?
# do nothing
elsif exval.match?(@match)
row[field] = nil
end
end
row
end
end
end
end
end
111 changes: 111 additions & 0 deletions lib/kiba/extend/transforms/delete/empty_field_values.rb
@@ -0,0 +1,111 @@
# frozen_string_literal: true

module Kiba
module Extend
module Transforms
module Delete

# @note Only useful for multi-valued fields
#
# Deletes any empty values from the field. Supports `usenull` = true to treat the value of
# `Kiba::Extend.nullvalue` as empty
#
# # Examples
#
# Assuming `Kiba::Extend.nullvalue` = `%NULLVALUE%`, and input table:
#
# ```
# | data |
# |------------------|
# | abc;;;d e f |
# | ;;abc |
# | def;;;; |
# | ;;;;; |
# | ;;;%NULLVALUE%;; |
# | |
# | nil |
# ```
#
# Used in pipeline as:
#
# ```
# transform Delete::EmptyFieldValues, fields: [:data], sep: ';'
# ```
#
# Results in:
#
# ```
# | data |
# |-------------|
# | abc;d e f |
# | abc |
# | def |
# | |
# | %NULLVALUE% |
# | |
# | nil |
# ```
#
# Used in pipeline as:
#
# ```
# transform Delete::EmptyFieldValues, fields: [:data], sep: ';', usenull: true
# ```
#
# Results in:
#
# ```
# | data |
# |-----------|
# | abc;d e f |
# | abc |
# | def |
# | |
# | |
# | |
# | nil |
# ```
#
class EmptyFieldValues
# @note `sep` will be removed in a future version. **DO NOT USE**
# @param fields [Array<Symbol>,Symbol] field(s) to delete from
# @param sep [String] **DEPRECATED; DO NOT USE**
# @param delim [String] on which to split multivalued fields. Defaults to `Kiba::Extend.delim` if not provided.
# @param usenull [Boolean] whether to treat `Kiba::Extend.nullvalue` string as an empty value
def initialize(fields:, sep: nil, delim: nil, usenull: false)
@fields = [fields].flatten
@usenull = usenull

if sep && delim
puts %Q[#{Kiba::Extend.warning_label}: Do not use both `sep` and `delim`. Prefer `delim`]
elsif sep
puts %Q[#{Kiba::Extend.warning_label}: The `sep` keyword is being deprecated in a future version. Change it to `delim` in your ETL code.]
@delim = sep
else
@delim = delim ? delim : Kiba::Extend.delim
end
end

# @private

def process(row)
fields.each do |field|
val = row.fetch(field)
next if val.nil?

row[field] = val.split(delim)
.compact
.reject{ |str| Helpers.empty?(str, usenull) }
.join(delim)
end
row
end

private

attr_reader :fields, :delim, :usenull
end
end
end
end
end
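
To check the documented tables against the split/reject/join chain in `process`, here is a self-contained sketch that runs the same steps without kiba-extend. The names `clean_multival`, `resolve_delim`, `NULLVALUE`, and `DEFAULT_DELIM` are hypothetical, and the `'|'` default delimiter is an assumed value for `Kiba::Extend.delim`. One caveat: in the constructor as shown, the `sep && delim` branch only prints a warning and does not assign `@delim`. The sketch assumes that falling back to `delim` in that case is the intent.

```ruby
# Stand-ins for Kiba::Extend.nullvalue and Kiba::Extend.delim (values assumed).
NULLVALUE = '%NULLVALUE%'
DEFAULT_DELIM = '|'

# Picks the delimiter: explicit `delim` wins, then deprecated `sep`,
# then the default.
def resolve_delim(sep: nil, delim: nil)
  delim || sep || DEFAULT_DELIM
end

# Drops empty segments (and, optionally, null-value segments) from a
# delimited string. nil passes through untouched, as in the transform.
def clean_multival(val, delim:, usenull: false)
  return nil if val.nil?

  val.split(delim)
     .reject { |str| str.empty? || (usenull && str == NULLVALUE) }
     .join(delim)
end

delim = resolve_delim(sep: ';')
inputs = ['abc;;;d e f', ';;abc', 'def;;;;', ';;;;;', ';;;%NULLVALUE%;;', '', nil]

# Prints each input with its result without and with usenull.
inputs.each do |val|
  plain = clean_multival(val, delim: delim)
  nulled = clean_multival(val, delim: delim, usenull: true)
  puts "#{val.inspect} -> #{plain.inspect} / #{nulled.inspect}"
end
```

`String#split` without a limit already discards trailing empty strings, so a value such as `def;;;;` needs no extra handling. The `reject` step matters for leading and interior empties.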