Make row validation optional, parse XLSX files, add worksheet column to Excel parser #1

Open
wants to merge 2 commits into
base: master

tylergannon

Hi, I added the following features.

  • The CSV and Excel parsers now accept :validate_rows as a parameter on their source object. If it is turned off, the process won't die when a row has the wrong number of fields.
    • This is because in my application it's pretty annoying to have the whole process die: I have varied result sets in my input files and would rather validate and exclude rows than have to preprocess my inputs before giving them to the ETL.
  • Added a dependency on Roo, which is under more active development than the Spreadsheet gem and can parse XLSX files (big bonus for me).
  • If the Excel parser is given worksheet_column: :column_name in the definition configuration, it will add the name of the worksheet to each row in a field by that name.
    • This is great for me since I have a multitude of sheets where the name of the sheet differentiates the data logically.
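
To illustrate the :validate_rows behaviour described in the first bullet, here is a minimal plain-Ruby sketch. It does not use the gem's parser classes: `parse_rows` and `RowCountError` are hypothetical names invented for this example, and the real option is set on the source object rather than passed as a keyword argument.

```ruby
# Hypothetical stand-in for a parser's row check; not the gem's actual API.
class RowCountError < StandardError; end

# rows: array of arrays; expected: number of fields per row.
# validate_rows: when true, a malformed row raises; when false, the row is
# passed through and the caller decides what to do with it.
def parse_rows(rows, expected, validate_rows: true)
  rows.each_with_index.map do |row, index|
    if validate_rows && row.size != expected
      raise RowCountError,
            "row #{index + 1}: expected #{expected} fields, got #{row.size}"
    end
    row
  end
end

rows = [%w[a b c], %w[d e], %w[f g h]]

# With validation off, nothing raises; bad rows can be excluded downstream.
lenient = parse_rows(rows, 3, validate_rows: false)
usable  = lenient.select { |row| row.size == 3 }

# With validation on (the default here), the short row stops processing.
strict_failed = begin
  parse_rows(rows, 3)
  false
rescue RowCountError
  true
end
```

With validation off, the short row survives parsing and is filtered out by the caller; with it on, the same input aborts the run.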

My change to ETL::Engine is for compatibility with ActiveRecord 3.x, which won't allow mass assignment to fields that have not been declared as accessible in the model. I didn't add any attr_accessible calls on the model because I don't want to mess with backward compatibility.
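
The actual ETL::Engine change is not shown in this conversation. As a rough sketch of one common way to avoid mass assignment, attributes can be set one at a time through their writer methods. The `Batch` struct and its `batch_file` and `status` fields below are hypothetical stand-ins for an ActiveRecord model, chosen only so the example runs on its own.

```ruby
# Stand-in for an ActiveRecord model; Struct provides per-attribute writers.
# The class name and field names are assumptions for illustration only.
Batch = Struct.new(:batch_file, :status)

attributes = { batch_file: "import.ctl", status: "executing" }

# Rather than passing the whole hash to the constructor (mass assignment),
# call each attribute's own setter.
batch = Batch.new
attributes.each { |name, value| batch.public_send("#{name}=", value) }
```

Because each value goes through an individual setter, no attr_accessible declaration is needed on the model.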

Lastly, I am not having an easy time getting the test suite to work. I'm happy to do the legwork of getting these changes properly tested, but I wanted to see if you're interested in these changes -- and if so, whether you would mind helping me out with the test environment.

I have also developed a cool new DSL, and if I can get my test environment working I'll pull it out of my application and send another pull request to see about adding it to the gem.

Cheers!
Tyler

Configuration of the Excel parser can now include:

    :worksheet_column => {{column_name}}

Resulting output has {{column_name}} field with the current worksheet
name in each row.
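
A minimal plain-Ruby sketch of the effect of :worksheet_column, assuming the workbook has already been read into a hash of worksheet name to row hashes. `rows_with_worksheet`, the sheet names, and the `:year_sheet` column are hypothetical; the real parser reads the sheets itself.

```ruby
# Hypothetical helper: sheets maps worksheet name => array of row hashes.
# When worksheet_column is given, each row gains a field with that name
# holding the name of the worksheet it came from.
def rows_with_worksheet(sheets, worksheet_column: nil)
  sheets.flat_map do |sheet_name, rows|
    rows.map do |row|
      worksheet_column ? row.merge(worksheet_column => sheet_name) : row
    end
  end
end

sheets = {
  "2012" => [{ amount: 10 }],
  "2013" => [{ amount: 20 }, { amount: 30 }]
}

tagged   = rows_with_worksheet(sheets, worksheet_column: :year_sheet)
untagged = rows_with_worksheet(sheets)
```

Each row in `tagged` carries its sheet name, so rows from different worksheets stay distinguishable after they are merged into one stream.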
Member

thbar commented Jun 24, 2013

Hello Tyler!

It took me quite a while to answer; I was very busy launching https://www.wisecashhq.com.

I'm sorry this was not clear, but this specific repository is experimental and is not really maintained!

The maintained repo is here:

https://github.com/activewarehouse/activewarehouse-etl

I'm not sure how much work it would be to adapt things to that repo (and some of it is probably already integrated there, like the mass assignment issue).

What do you think? Would you like to adapt the code so I can merge it over there?
