Skip to content
This repository has been archived by the owner on Apr 21, 2020. It is now read-only.

Build ka scrub types #1

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
Gemfile.lock
.idea
38 changes: 29 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,33 +16,44 @@ gem 'acts_as_scrubbable'

## Usage

Simple add the configuration for your fields that map directly to your columns
Add the configuration for your fields that map directly to your columns and a scrub_type
for those columns.

Default Scrub types include:
- `scrub` - scrub the field's value based on it's name or mapping (see mapping case below)
- `skip` - do not scrub the field's value
- `wipe` - set the field's value to nil on scrub
- `sterilize` - delete all records for this model on scrub

```ruby
class User < ActiveRecord::Base
class ScrubExample < ActiveRecord::Base
...

acts_as_scrubbable :first_name, :last_name
acts_as_scrubbable :scrub, :first_name # first_name will be random after `scrub!`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like "faker-generated" is clearer than "random". i had to look at the source code to figure that out.

acts_as_scrubbable :skip, :middle_name # middle_name will be original value after `scrub!`
Copy link

@jakeonfire jakeonfire Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"will be unchanged"?

acts_as_scrubbable :wipe, :last_name # last_name will be `nil` after `scrub!`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't love the DSL calling the same method 3 times, but otherwise the concept is sound. is there an easy way to come up with a way to create one dataset for the complete class configuration? maybe something like:

acts_as_scrubbable scrub: [:first_name, lat: :latitude],
                   skip: [:middle_name],
                   wipe: [:last_name]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this technique would need to be paired with some validation that when any fields are configured to be scrubbed, skipped or wiped, that all fields are configured, right? otherwise 'skip' and literally skipping are the same... or is that something AnonCat would do?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, anonymizer_check!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on further thought, it'd be really cool to be able to define scrub procs here, too. e.g.:

acts_as_scrubbable scrub: [gender: -> { %w(M F).sample }]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:wipe scrub type could as easily be a :nullify scrub strategy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fan of the array update and would push for going more in that direction.

also agree that defining scrub procs would be neat (maybe a well placed is_a?(Proc)?)

went with wipe for the theme :D, also thought it left room for fancier wiping (empty string?). Fine with nullify



# optionally you can add a scope to limit the rows to update
scope :scrubbable_scope, -> { where(some_value: true) }

...
end
```

class SterilizeExample < ActiveRecord::Base
acts_as_scrubbable :sterilize # table will contain no records after `scrub!`
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool


```

Incase the mapping is not straight forward

```ruby
class Address
acts_as_scrubbable :lng => :longitude, :lat => :latitude
acts_as_scrubbable :scrub, :lng => :longitude, :lat => :latitude
end
```


### To run

The confirmation message will be the db host
Expand Down Expand Up @@ -75,22 +86,31 @@ If you want to limit the classes you to be scrubbed you can set the `SCRUB_CLASS
rake scrub SCRUB_CLASSES=Blog,Post
```

If you want to skip the afterhook
If you want to skip the beforehook

```
rake scrub SKIP_AFTERHOOK=true
rake scrub SKIP_BEFOREHOOK=true
```

If you want to skip the afterhook

```
rake scrub SKIP_AFTERHOOK=true
```

### Extending

You may find the need to extend or add additional generators or an after_hook
You may find the need to extend or add additional generators or an before_hook/after_hook

```ruby
ActsAsScrubbable.configure do |c|
c.add :email_with_prefix, -> { "prefix-#{Faker::Internet.email}" }

c.before_hook do
puts "Running before scrub"
raise "Don't run in production" if Rails.env.production?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the env ever be production? the other PR assumes it is always development in the anonymizer_check task...
is there a way to instead look at something that is not configurable, like database name, or server, or?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on further thought, this suggestion makes no sense. it's good as-is.

end

c.after_hook do
puts "Running after commit"
ActiveRecord::Base.connection.execute("TRUNCATE some_table")
Expand Down
15 changes: 12 additions & 3 deletions lib/acts_as_scrubbable.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,18 @@ module ActsAsScrubbable
autoload :Scrub
autoload :VERSION


def self.configure(&block)
yield self
end

def self.before_hook(&block)
@before_hook = block
end

def self.execute_before_hook
@before_hook.call if @before_hook
end

def self.after_hook(&block)
@after_hook = block
end
Expand Down Expand Up @@ -45,16 +52,18 @@ def self.scrub_map
:state_abbr => -> { Faker::Address.state_abbr },
:state => -> { Faker::Address.state },
:city => -> { Faker::Address.city },
:full_address => -> { Faker::Address.full_address },
:latitude => -> { Faker::Address.latitude },
:longitude => -> { Faker::Address.longitude },
:username => -> { Faker::Internet.user_name },
:boolean => -> { [true, false ].sample },
:school => -> { Faker::University.name }
:school => -> { Faker::University.name },
:bs => -> { Faker::Company.bs },
:phone_number => -> { Faker::PhoneNumber.phone_number }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i personally found using these built-ins just some of the time seemed confusing, so didn't use them at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after looking at the other PR i see you used :bs a bunch, so makes sense to put it either here or in the initializer config in that case.

}
end
end


ActiveSupport.on_load(:active_record) do
extend ActsAsScrubbable::Scrubbable
end
14 changes: 10 additions & 4 deletions lib/acts_as_scrubbable/scrub.rb
Original file line number Diff line number Diff line change
@@ -1,23 +1,29 @@
module ActsAsScrubbable
module Scrub

def scrub!
return unless self.class.scrubbable?

run_callbacks(:scrub) do
if self.class.sterilizable?
self.destroy!
Copy link

@jakeonfire jakeonfire Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think delete is safer, since the other updates also don't run callbacks, but now that i think about it maybe we do want them in case of dependent: :destroys? dunno...

Copy link

@jakeonfire jakeonfire Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also worried that calling destroy! on every row in a table will be pretty slow. can you think of a way to truncate it instead via SQL a-la the after-hook example in the README?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the tasks.rb:69 there's an initial check for sterilizable models that will call delete_all. you are both correct that it avoids the destroy calls and that there are dependent destroy issues. had realized it running the migration, brought it up in the story. feel like we kind of need the destroy_all b/c of the dependent destroy. side note this also creates an issue for large tables (thinking weight table of 70M records :( )

next
end

_updates = {}

scrubbable_fields.each do |_field, value|
unless self.respond_to?(_field)
raise ArgumentError, "#{self.class} do not respond to #{_field}"
end
next if self.send(_field).blank?
next if self.send(_field).blank? || value == :skip

if ActsAsScrubbable.scrub_map.keys.include?(value)
_updates[_field] = ActsAsScrubbable.scrub_map[value].call
elsif value == :wipe
_updates[_field] = nil
else
puts "Undefined scrub: #{value} for #{self.class}#{_field}"
end
puts "Undefined scrub: #{value} for #{self.class}.#{_field}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#{self.class}##{_field}" is a bit more ruby-doc

end
end

self.update_columns(_updates) unless _updates.empty?
Expand Down
25 changes: 12 additions & 13 deletions lib/acts_as_scrubbable/scrubbable.rb
Original file line number Diff line number Diff line change
@@ -1,36 +1,35 @@
module ActsAsScrubbable
module Scrubbable


def scrubbable?
false
end

def acts_as_scrubbable(scrub_type=:scrub, *scrubbable_fields, **mapped_fields)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you can have default values to the left of var-args?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can! it just isn't respected. Was having trouble finding a good way to fit in the scrub_type while not creating too breaking of a change. I think the approach you listed at the top is better (kwargs all the way, arrays of symbols or kwargs as values

unless self.respond_to?(:scrubbable_fields)
class_attribute :scrubbable_fields
self.scrubbable_fields = {}
end

def acts_as_scrubbable(*scrubbable_fields, **mapped_fields)

class_attribute :scrubbable_fields

self.scrubbable_fields = {}
scrubbable_fields.each do |_field|
self.scrubbable_fields[_field] = _field
unless self.respond_to?(:sterilizable?)
class_attribute :sterilizable
self.sterilizable = scrub_type == :sterilize
end

mapped_fields.each do |_field|
self.scrubbable_fields[_field.first] = _field.last
scrubbable_fields.each do |field_name|
self.scrubbable_fields[field_name] = scrub_type == :scrub ? field_name : scrub_type
end

mapped_fields.each { |field_name, field_type| self.scrubbable_fields[field_name] = field_type }

class_eval do
define_callbacks :scrub

def self.scrubbable?
true
end

end

include Scrub
end

end
end
34 changes: 17 additions & 17 deletions lib/acts_as_scrubbable/tasks.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,33 +2,34 @@
require 'rake'

namespace :scrub do

desc "scrub all"
task all: :environment do

require 'highline/import'
require 'term/ansicolor'
require 'logger'
require 'parallel'


include Term::ANSIColor

@logger = Logger.new($stdout)
@logger.formatter = proc do |severity, datetime, progname, msg|
"#{datetime}: [#{severity}] - #{msg}\n"
"#{datetime}: [#{severity}] - #{msg}\n"
end

db_host = ActiveRecord::Base.connection_config[:host]
db_name = ActiveRecord::Base.connection_config[:database]
if ENV["SKIP_BEFOREHOOK"].blank?
@logger.info "Running before hook".red
ActsAsScrubbable.execute_before_hook
end

db_host, db_name = ActiveRecord::Base.connection_config.values_at(:host, :database)

@logger.warn "Please verify the information below to continue".red
@logger.warn "Host: ".red + " #{db_host}".white
@logger.warn "Database: ".red + "#{db_name}".white

unless ENV["SKIP_CONFIRM"] == "true"

answer = ask("Type '#{db_host}' to continue. \n".red + '-> '.white)

unless answer == db_host
@logger.error "exiting ...".red
exit
Expand All @@ -41,21 +42,17 @@

@total_scrubbed = 0

ar_classes = ActiveRecord::Base.descendants.select{|d| d.scrubbable? }.sort_by{|d| d.to_s }

ar_classes = ActiveRecord::Base.descendants.select(&:scrubbable?).sort_by(&:to_s)

# if the ENV variable is set

unless ENV["SCRUB_CLASSES"].blank?
class_list = ENV["SCRUB_CLASSES"].split(",")
class_list = class_list.map {|_class_str| _class_str.constantize }
ar_classes = ar_classes & class_list
ar_classes &= class_list.map {|_class_str| _class_str.constantize }
end

@logger.info "Srubbable Classes: #{ar_classes.join(', ')}".white

Parallel.each(ar_classes) do |ar_class|

# Removing any find or initialize callbacks from model
ar_class.reset_callbacks(:initialize)
ar_class.reset_callbacks(:find)
Expand All @@ -65,10 +62,12 @@
scrubbed_count = 0

ActiveRecord::Base.connection_pool.with_connection do
if ar_class.respond_to?(:scrubbable_scope)
relation = ar_class.send(:scrubbable_scope)
else
relation = ar_class.all
relation = ar_class.respond_to?(:scrubbable_scope) ? ar_class.send(:scrubbable_scope) : ar_class.all

if relation.sterilizable?
scrubbed_count += relation.count
relation.delete_all
next
end

relation.find_in_batches(batch_size: 1000) do |batch|
Expand All @@ -83,6 +82,7 @@

@logger.info "#{scrubbed_count} #{ar_class} objects scrubbed".blue
end

ActiveRecord::Base.connection.verify!

if ENV["SKIP_AFTERHOOK"].blank?
Expand Down
6 changes: 5 additions & 1 deletion spec/db/schema.rb
Original file line number Diff line number Diff line change
@@ -1,9 +1,13 @@
ActiveRecord::Schema.define(version: 20150421224501) do

create_table "scrubbable_models", force: true do |t|
t.string "first_name"
t.string "middle_name"
t.string "last_name"
t.string "address1"
t.string "lat"
end

create_table "sterilizable_models", force: true do |t|
t.string "irrelevant"
end
end
27 changes: 25 additions & 2 deletions spec/lib/acts_as_scrubbable/scrub_spec.rb
Original file line number Diff line number Diff line change
@@ -1,11 +1,10 @@
require 'spec_helper'

RSpec.describe ActsAsScrubbable::Scrub do

describe '.scrub' do

# update_columns cannot be run on a new record
subject{ ScrubbableModel.new }

before(:each) { subject.save }

it 'changes the first_name attribute when scrub is run' do
Expand All @@ -31,10 +30,34 @@
expect(subject.address1).to be_nil
end

it "doesn't update the field if the scrub type is `:skip`" do
subject.middle_name = "Edward"
subject.save
subject.scrub!
expect(subject.middle_name).to eq "Edward"
end

it "updates the field to nil if the scrub type is :wipe" do
subject.last_name = "Cortéz"
subject.save
subject.scrub!
expect(subject.last_name).to be_nil
end

it 'runs scrub callbacks' do
subject.scrub!
expect(subject.scrubbing_begun).to be(true)
expect(subject.scrubbing_finished).to be(true)
end

context 'when sterilizable? is true' do
subject { SterilizableModel.new }

it 'deletes all records' do
subject.save
subject.scrub!
expect(subject.class.all).to be_empty
end
end
end
end
Loading