Skip to content

Latest commit

 

History

History
116 lines (102 loc) · 4.81 KB

extract_data_from_database.md

File metadata and controls

116 lines (102 loc) · 4.81 KB

Example 2: Fetch database data

This example shows how to fetch data from a database and write it to a CSV file. Let's say you have a mysql database with customer table and an order table. You want to generate a single file for customer's orders by joining data from both the tables.

Table 1: ORDERS

order_id customer_id order_date total_amount
101 12 12-09-2023 234
102 13 12-09-2023 198
103 12 13-09-2023 789
104 15 13-09-2023 236
105 10 15-09-2023 100

Table 2: CUSTOMERS

customer_id customer_name customer_email customer_address
10 Andrew Symonds [email protected] 1C 104, Viviana Villas, Riverdale
11 Lilly Charms [email protected] 12B, Westwoods, Hillcrest
12 Pat Ronald [email protected] C-2, Green woods, Pinehurst
13 Mathew Josh [email protected] 112, Amar Heights, Oakwood
14 Elizabeth [email protected] 23, South Park, Amsteryork
15 Rahul Bajaj [email protected] 14, Indiana Heights, New Town

In the output file we want to print customer's order details along with their name and address. So the output file will look like this:

order_id customer_id order_date total_amount customer_name customer_address
101 12 12-09-2023 234 Pat Ronald C-2, Green woods, Pinehurst
102 13 12-09-2023 198 Mathew Josh 112, Amar Heights, Oakwood
103 12 13-09-2023 789 Pat Ronald C-2, Green woods, Pinehurst
104 15 13-09-2023 236 Rahul Bajaj 14, Indiana Heights, New Town
105 10 15-09-2023 100 Andrew Symonds 1C 104, Viviana Villas, Riverdale

Configuration

The configuration file defines the database connection properties.

datasource.mysql.host=<host_ip_or_url>
datasource.mysql.user=<username>
datasource.mysql.password=<password>

Sources

For a mysql database, we declare mysql as type in sources and its properties like database and query to be executed. The following code snippet lists our example sources:

sources:
  - id: SAMPLE
    type: mysql
    database: SAMPLE_DATABASE
    query: 'select 
    o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
    from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'

Interfaces

The output generated by running InterfaceGenerator is called an interface, in our example it is a CSV file created using joining two data sets from single database we defined above. This requires defining the sources, columns required in the output, and output properties. The following code snippet lists our example interface:

interfaces:
  sample:
    sources: [ 'SAMPLE' ]
    columns:
      - src_col_name: 'order_id'
      - src_col_name: 'customer_id'
      - src_col_name: 'order_date'
      - src_col_name: 'total_amount'
      - src_col_name: 'customer_name'
      - src_col_name: 'customer_address'
    output:
      type: delimited_file
      props:
        delimiter: '|'
        path: ./test/customer_orders.csv
        header:
          type: delimited_result_header

In the above example, output properties contains type and props. The output will be written to a file called customer_orders.psv which is a pipe separated file. The file will also contain a header row, which will be just the column names in our example.

Here's the complete yaml file with interface and source definition:

interfaces:
  sample:
    sources: [ 'SAMPLE' ]
    columns:
      - src_col_name: 'order_id'
      - src_col_name: 'customer_id'
      - src_col_name: 'order_date'
      - src_col_name: 'total_amount'
      - src_col_name: 'customer_name'
      - src_col_name: 'customer_address'
    output:
      type: delimited_file
      props:
        delimiter: '|'
        path: ./test/customer_orders.csv
        header:
          type: delimited_result_header
sources:
  - id: SAMPLE
    type: mysql
    database: SAMPLE_DATABASE
    query: 'select 
    o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
    from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'