Commit

feat(jdbc): add jdbc input support (#88)
* refactor(Input, ConfigurationInput): new structure

* feat(FileDateRangeInput): added

* feat(jdbc): add jdbc input

* feat(jdbc): change ReadableInput to reader and add options to jdbc reader

* scala version should be fixed to 2.11

* add file date range input type to sample yaml
lyogev authored Aug 1, 2018
1 parent 76cafe8 commit 3f85c9f
Showing 32 changed files with 9,336 additions and 64 deletions.
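The commit message above describes moving inputs to a new structure in which each configured input exposes a reader. As a purely illustrative sketch of that shape (the trait and class names below are hypothetical and are not the ones added in this commit):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch of an "input exposes a reader" structure; the names are
// illustrative and do not come from the Metorikku source.
trait Reader {
  def read(spark: SparkSession): DataFrame
}

// A plain file input simply delegates to Spark's built-in readers.
case class FileReader(path: String) extends Reader {
  override def read(spark: SparkSession): DataFrame = spark.read.parquet(path)
}

object ReaderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("reader-sketch").getOrCreate()
    // Each configured input resolves to a Reader and is registered as a table for the SQL metrics.
    val df = FileReader("parquet/input_1.parquet").read(spark)
    df.createOrReplaceTempView("input_1")
  }
}
```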
11 changes: 7 additions & 4 deletions README.md
@@ -54,7 +54,7 @@ You can check out a full example file for all possible values in the [sample YAM
##### Supported input/output:
Currently Metorikku supports the following inputs:
-**CSV, JSON, parquet**
+**CSV, JSON, parquet, JDBC**
And the following outputs:
**CSV, JSON, parquet, Redshift, Cassandra, Segment, JDBC, Kafka**<br />
@@ -68,9 +68,12 @@ There are currently 3 options to run Metorikku.
* Run the following command:
`spark-submit --class com.yotpo.metorikku.Metorikku metorikku.jar -c config.yaml`

-#### *JDBC writer
-When using the JDBC writer, provide the path of the driver jar in both jars and driver-class-path params. For example for Mysql:
-`spark-submit --driver-class-path mysql-connector-java-5.0.8-bin.jar --jars mysql-connector-java-5.0.8-bin.jar --class com.yotpo.metorikku.Metorikku metorikku.jar -c config.yaml`
+#### Using JDBC
+When using the JDBC writer or input, you must provide the path to the driver JAR in both the `--jars` and `--driver-class-path` parameters.
+For example, to run with spark-submit and a MySQL driver:
+`spark-submit --driver-class-path mysql-connector-java-5.1.45.jar --jars mysql-connector-java-5.1.45.jar --class com.yotpo.metorikku.Metorikku metorikku.jar -c config.yaml`
+To run the same with the standalone JAR, add the driver to the classpath:
+`java -Dspark.master=local[*] -cp metorikku-standalone.jar:mysql-connector-java-5.1.45.jar com.yotpo.metorikku.Metorikku -c config.yaml`

#### JDBC query
JDBC query output allows running a query for each record in the dataframe.
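To make "running a query for each record" concrete, here is a minimal, hypothetical sketch of that pattern in Spark (one connection per partition, one statement per row). It is not the Metorikku implementation, and the table and column names are made up for illustration:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical sketch only: execute one SQL statement per record, reusing a
// single JDBC connection per partition.
object JdbcQuerySketch {
  def writePerRecord(df: DataFrame, url: String, user: String, password: String): Unit = {
    df.foreachPartition { rows: Iterator[Row] =>
      val conn = DriverManager.getConnection(url, user, password)
      val stmt = conn.prepareStatement("INSERT INTO some_table (id, name) VALUES (?, ?)")
      try {
        rows.foreach { row =>
          stmt.setLong(1, row.getAs[Long]("id"))
          stmt.setString(2, row.getAs[String]("name"))
          stmt.executeUpdate()
        }
      } finally {
        stmt.close()
        conn.close()
      }
    }
  }
}
```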
36 changes: 22 additions & 14 deletions config/sample.yaml
@@ -5,23 +5,31 @@ metrics:
  - /path/to/metric-1
  - /path/to/metric-2

-# Parquet or JSON Tables files paths
+# Input configuration
inputs:
-  input_1: parquet/input_1.parquet
-  input_2: parquet/input_2.parquet
-  input_3: parquet/input_3.parquet
-
-# dateRange section allows defining dynamic date range for multiple folder names
-# in case a dynamic date parameter was defined in 'inputs' section as '%s' , for example- userAggregatedData: /path/to/data/%s/
-dateRange:
-  input_1:
-    format: yyyy/MM/dd
-    startDate: 2017/09/01
-    endDate: 2017/09/20
-  input_2:
-    format: yyyy/MM/dd
-    startDate: 2017/09/01
-    endDate: 2017/09/20
+  input_1:
+    file:
+      path: parquet/input_1.parquet
+  input_2:
+    file:
+      path: parquet/input_2.parquet
+  input_3:
+    file_date_range:
+      template: parquet/%s/input_1.parquet
+      date_range:
+        format: yyyy/MM/dd
+        startDate: 2017/09/01
+        endDate: 2017/09/03
+  input_4:
+    jdbc:
+      connectionUrl: jdbc:mysql://localhost/db?zeroDateTimeBehavior=convertToNull
+      user: user
+      password: pass
+      table: some_table
+      # You can optionally add here any supported option from https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
+      options:
+        numPartitions: 100
+        driver: com.mysql.jdbc.Driver

# Set custom variables that would be accessible from the SQL
variables:
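The `jdbc` block above (input_4) presumably maps onto Spark's built-in JDBC data source, whose standard option keys are `url`, `dbtable`, `user`, and `password`. A minimal sketch of that mapping, using the sample's connection values; the wrapper object and its parameter names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of reading a JDBC input with Spark's "jdbc" data source.
// The wrapper is hypothetical; the option keys are standard Spark JDBC options.
object JdbcInputSketch {
  def read(spark: SparkSession,
           connectionUrl: String,
           user: String,
           password: String,
           table: String,
           extraOptions: Map[String, String] = Map.empty): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", connectionUrl)
      .option("user", user)
      .option("password", password)
      .option("dbtable", table)
      .options(extraOptions) // e.g. numPartitions, driver
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("jdbc-input-sketch").getOrCreate()
    val df = read(spark,
      connectionUrl = "jdbc:mysql://localhost/db?zeroDateTimeBehavior=convertToNull",
      user = "user", password = "pass", table = "some_table",
      extraOptions = Map("numPartitions" -> "100", "driver" -> "com.mysql.jdbc.Driver"))
    df.createOrReplaceTempView("input_4")
  }
}
```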
5,249 changes: 5,249 additions & 0 deletions examples/file_date_range_inputs/2017/09/01/movies.csv


2,316 changes: 2,316 additions & 0 deletions examples/file_date_range_inputs/2017/09/02/movies.csv

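The example files above sit under date-partitioned folders (2017/09/01, 2017/09/02, ...), which is the layout the `file_date_range` input targets: the `%s` in the template is replaced with every date between startDate and endDate, rendered with the given format. A rough, hypothetical sketch of that expansion; only the template, format, and date values come from the sample config:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical sketch of expanding a file_date_range template into concrete paths.
object DateRangePaths {
  def expand(template: String, format: String, startDate: String, endDate: String): Seq[String] = {
    val fmt = DateTimeFormatter.ofPattern(format)
    val start = LocalDate.parse(startDate, fmt)
    val end = LocalDate.parse(endDate, fmt)
    Iterator.iterate(start)(_.plusDays(1))
      .takeWhile(!_.isAfter(end))
      .map(d => template.format(d.format(fmt)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Prints parquet/2017/09/01/input_1.parquet through parquet/2017/09/03/input_1.parquet
    expand("parquet/%s/input_1.parquet", "yyyy/MM/dd", "2017/09/01", "2017/09/03").foreach(println)
  }
}
```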

