This example shows how to fetch data from a database and write it to a CSV file. Let's say you have a mysql database with customer table and an order table. You want to generate a single file for customer's orders by joining data from both the tables.
Table 1: ORDERS
order_id | customer_id | order_date | total_amount |
---|---|---|---|
101 | 12 | 12-09-2023 | 234 |
102 | 13 | 12-09-2023 | 198 |
103 | 12 | 13-09-2023 | 789 |
104 | 15 | 13-09-2023 | 236 |
105 | 10 | 15-09-2023 | 100 |
Table 2: CUSTOMERS
customer_id | customer_name | customer_email | customer_address |
---|---|---|---|
10 | Andrew Symonds | [email protected] | 1C 104, Viviana Villas, Riverdale |
11 | Lilly Charms | [email protected] | 12B, Westwoods, Hillcrest |
12 | Pat Ronald | [email protected] | C-2, Green woods, Pinehurst |
13 | Mathew Josh | [email protected] | 112, Amar Heights, Oakwood |
14 | Elizabeth | [email protected] | 23, South Park, Amsteryork |
15 | Rahul Bajaj | [email protected] | 14, Indiana Heights, New Town |
In the output file we want to print customer's order details along with their name and address. So the output file will look like this:
order_id | customer_id | order_date | total_amount | customer_name | customer_address |
---|---|---|---|---|---|
101 | 12 | 12-09-2023 | 234 | Pat Ronald | C-2, Green woods, Pinehurst |
102 | 13 | 12-09-2023 | 198 | Mathew Josh | 112, Amar Heights, Oakwood |
103 | 12 | 13-09-2023 | 789 | Pat Ronald | C-2, Green woods, Pinehurst |
104 | 15 | 13-09-2023 | 236 | Rahul Bajaj | 14, Indiana Heights, New Town |
105 | 10 | 15-09-2023 | 100 | Andrew Symonds | 1C 104, Viviana Villas, Riverdale |
The configuration file defines the database connection properties.
datasource.mysql.host=<host_ip_or_url>
datasource.mysql.user=<username>
datasource.mysql.password=<password>
For a mysql database, we declare mysql
as type in sources and its properties like database
and query
to be executed. The
following code snippet lists our example sources:
sources:
- id: SAMPLE
type: mysql
database: SAMPLE_DATABASE
query: 'select
o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'
The output generated by running InterfaceGenerator is called an interface, in our example it is a CSV file created using joining two data sets from single database we defined above. This requires defining the sources, columns required in the output, and output properties. The following code snippet lists our example interface:
interfaces:
sample:
sources: [ 'SAMPLE' ]
columns:
- src_col_name: 'order_id'
- src_col_name: 'customer_id'
- src_col_name: 'order_date'
- src_col_name: 'total_amount'
- src_col_name: 'customer_name'
- src_col_name: 'customer_address'
output:
type: delimited_file
props:
delimiter: '|'
path: ./test/customer_orders.csv
header:
type: delimited_result_header
In the above example, output properties contains type
and props
.
The output will be written to a file called customer_orders.psv
which is a pipe separated file. The file will also
contain a header row, which will be just the column names in our example.
Here's the complete yaml file with interface and source definition:
interfaces:
sample:
sources: [ 'SAMPLE' ]
columns:
- src_col_name: 'order_id'
- src_col_name: 'customer_id'
- src_col_name: 'order_date'
- src_col_name: 'total_amount'
- src_col_name: 'customer_name'
- src_col_name: 'customer_address'
output:
type: delimited_file
props:
delimiter: '|'
path: ./test/customer_orders.csv
header:
type: delimited_result_header
sources:
- id: SAMPLE
type: mysql
database: SAMPLE_DATABASE
query: 'select
o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'