Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DB] Import or load data from an external CSV file to a table #680

Open
authorjapps opened this issue Sep 14, 2024 · 11 comments · May be fixed by #686 or authorjapps/zerocode-tdd-docs#24
Open

[DB] Import or load data from an external CSV file to a table #680

authorjapps opened this issue Sep 14, 2024 · 11 comments · May be fixed by #686 or authorjapps/zerocode-tdd-docs#24
Assignees
Labels
feature-request New feature which can be introduced

Comments

@authorjapps
Copy link
Owner

authorjapps commented Sep 14, 2024

AC1:

Given a datafile in CSV format, matching the Database table columns,
provide mechanism to load these data(rows and columns) in the that table.

Example:

  • Table "players" has three columns as below:
ID:
NAME:
AGE:

and data in CSV is as below:

1001, Ronaldo, 23
1002, Devaldo, 24
1003, Trevaldo, 35

Note- There are no headers present in the above CSV

AC2:

Same as AC1, but with headers.

Example:

and data in CSV is as below:

ID, AGE, NAME <----------- 1st row as header
1001, 23, Ronaldo
1002, 24, Devaldo
1003, 35, Trevaldo

AC3 :

Required for Postgres DB (v9 to v12 should be supported )

AC4:

File could be present in the local files system(eg /test/resources)

optional:
(or in a git repo ==> implement this via a separate ticket.)

AC5 :

{
    "scenarioName": "Sample interaction with the db executor ( data from AC2)",
    "steps": [
        {
            "name": "Loading a csv file",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "LOADCSV", //<--- in uppercase (for consistency), but support both cases
            "request": {
                "tableName": "players",
                "csvSource": "path/db_sample.csv",
                "withHeaders" : false, //<--false:  this is the default value most cases. (optional)
                "nullString" : "" //<-- used to insert null values, this is the default value (optional)
            },
            "verify": { 
                "size" : 36 //<-- 36 records: number of records loaded. (optional field as usual)
             } 
       },
        {
            "name": "assert_loaded_data_via_select",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "EXECUTE", //<---- executes an in-line SQL (in this case a SELECT sql)
            "request": {
                "sql": "SELECT id, name, age FROM players WHERE age > ? ORDER BY age ASC",
                "sqlParams" : [ 23 ]
            },
            "verify": {
                "rows.SIZE": 2, //<--- Number of rows in the output (This is OOTB available, don't have to implement)
                "rows": [
                    {
                        "id": 1002,
                        "name": "Devaldo",
                        "age": 24
                    },
                    {
                        "id": 1003,
                        "name": "Trevaldo",
                        "age": 35
                    }
                ]
            }
        }
    ]
}

AC6:

From @javiertuya :
This is a proposal for discussion before starting the implementation:

  • Where to place the code? Ans: org.jsmart.zerocode.core.db package

  • Use Apache Commons DbUtils that is a very stable and lightweight library.
    This will allow to implement something that does not depend on any particular DBMS.

  • Add an executor (e.g. DbSqlExecutor) with the operation explained in the example
    as well as EXECUTE (or SELECT and UPDATE separately) to handle SQL queries with parameters

  • Tests would use H2, as it is already included as test scoped dependency.

  • Postgress requires a container(an example of Docker already present), this should be tested manually, or added to the CI pipeline if required.

More >>

  • CSV reading will be based/inspired on the domain Parametrized.java class. : What does this mean? Please explain this or point to the PR when implemented.

On the AC6 or AC5, please discuss with @javiertuya if you have any questions or suggestions.

ACRONYMS:

  • OOTB : Out Of The Box
@authorjapps authorjapps added the feature-request New feature which can be introduced label Sep 14, 2024
@nirmalchandra
Copy link
Collaborator

nirmalchandra commented Sep 15, 2024

Suggestion: (WITHOUT HEADERS)

Try this for "WITH HEADERS": (replace the relevant line)

String sql = "COPY " + tableName + " FROM STDIN WITH (FORMAT csv, HEADER true)";

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Objects;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Autowired;

@Service
public class ImportCsvToPostgres {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    public void importCsvToTable(String fileName, String tableName) {
        Connection connection = null;
        BufferedReader reader = null;

        // COPY command without the HEADER true option
        String sql = "COPY " + tableName + " FROM STDIN WITH (FORMAT csv)";

        try {
            // Load CSV file from resources
            ClassLoader classLoader = getClass().getClassLoader();
            reader = new BufferedReader(new InputStreamReader(
                    Objects.requireNonNull(classLoader.getResourceAsStream(fileName))));

            // Get PostgreSQL connection and execute the COPY command
            connection = jdbcTemplate.getDataSource().getConnection();
            org.postgresql.copy.CopyManager copyManager = connection.unwrap(org.postgresql.PGConnection.class).getCopyManagerAPI();
            copyManager.copyIn(sql, reader);

        } catch (IOException e) {
            throw new RuntimeException("IOException reading the file from resources: " + fileName, e);
        } catch (SQLException e) {
            throw new RuntimeException("SQLException during COPY command execution", e);
        } finally {
            // Close the resources manually //
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (connection != null) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

@authorjapps
Copy link
Owner Author

@javiertuya
Copy link
Collaborator

Hi @authorjapps @nirmalchandra, I find very useful the ability to interact with the database. I will like to contribute to this issue.

I have reviewed the repository and docs to get understanding of how zerocode works as it is my first contribution. I noted that the Sample-DB-SQL-Executor depends on Spring and specific DBMSs, and other classes like org.jsmart.zerocode.zerocodejavaexec.DbSqlExecutor.java do not interact with a database. I think that a DBMS vendor independent approach would be worth.

This is a proposal for discussion before starting the implementation:

  • Use Apache Commons DbUtils that is a very stable and lightweight library.
    This will allow to implement something that does not depend on any particular DBMS.
  • Add an executor (e.g. DbSqlExecutor) whith the loadCsv operation that implements this issue,
    as well as executeQuery and executeUpdate to handle SQL queries with parameters
  • Csv reading will be based/inspired on the domain Parametrized.java class.
  • Tests would use H2, as it is already included as test scoped dependency. Postgress requires a container, this should be tested manually, or added to the pipeline if required.
  • Where to place the code? maybe in a org.jsmart.zerocode.core.db namespace?
  • Below are some examples
{
    "scenarioName": "Sample interaction with the db executor ( data from AC2)",
    "steps": [
        {
            "name": "Loading a csv file",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "loadCsv",
            "request": {
                "tableName": "players",
                "csvSource": "path/db_sample.csv",
                "noHeaders" : false, //<-- this is the default value, guess that using header may be the most used
                "nullString" : "" //<-- used to insert null values, this is the default value
            },
            "verify": { } //<-- no data returned, log and fail the step if anything fails
       },
        {
            "name": "Read the loaded data using SQL",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "executeQuery",
            "request": {
                "sqlStatement": "SELECT id, name, age FROM players WHERE age > ? ORDER BY age ASC",
                "sqlParameters" : [ 23 ]
            },
            "verify": {
                "rows.SIZE": 2,
                "rows": [
                    { "id": 1002, "name": "Devaldo", "age": 24 },
                    { "id": 1003, "name": "Trevaldo", "age": 35 }
                ]
            }
        }
    ]
}

@authorjapps
Copy link
Owner Author

Hi @authorjapps @nirmalchandra, I find very useful the ability to interact with the database. I will like to contribute to this issue.

I have reviewed the repository and docs to get understanding of how zerocode works as it is my first contribution. I noted that the Sample-DB-SQL-Executor depends on Spring and specific DBMSs, and other classes like org.jsmart.zerocode.zerocodejavaexec.DbSqlExecutor.java do not interact with a database. I think that a DBMS vendor independent approach would be worth.

This is a proposal for discussion before starting the implementation:

  • Use Apache Commons DbUtils that is a very stable and lightweight library.
    This will allow to implement something that does not depend on any particular DBMS.
  • Add an executor (e.g. DbSqlExecutor) whith the loadCsv operation that implements this issue,
    as well as executeQuery and executeUpdate to handle SQL queries with parameters
  • Csv reading will be based/inspired on the domain Parametrized.java class.
  • Tests would use H2, as it is already included as test scoped dependency. Postgress requires a container, this should be tested manually, or added to the pipeline if required.
  • Where to place the code? maybe in a org.jsmart.zerocode.core.db namespace?
  • Below are some examples
{
    "scenarioName": "Sample interaction with the db executor ( data from AC2)",
    "steps": [
        {
            "name": "Loading a csv file",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "loadCsv",
            "request": {
                "tableName": "players",
                "csvSource": "path/db_sample.csv",
                "noHeaders" : false, //<-- this is the default value, guess that using header may be the most used
                "nullString" : "" //<-- used to insert null values, this is the default value
            },
            "verify": { } //<-- no data returned, log and fail the step if anything fails
       },
        {
            "name": "Read the loaded data using SQL",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "executeQuery",
            "request": {
                "sqlStatement": "SELECT id, name, age FROM players WHERE age > ? ORDER BY age ASC",
                "sqlParameters" : [ 23 ]
            },
            "verify": {
                "rows.SIZE": 2,
                "rows": [
                    { "id": 1002, "name": "Devaldo", "age": 24 },
                    { "id": 1003, "name": "Trevaldo", "age": 35 }
                ]
            }
        }
    ]
}

@javiertuya 👋 , Thanks for the suggestions! Great thought!

All of it has great details and sounds like great Idea.
Let me capture some ACs from your proposal and put it to the main ticket description, which would be consistent with the codebase and framework.

@nirmalchandra
Copy link
Collaborator

Also, please take into consideration in this ticket(if possible or in a new ticket later), how to LOAD/IMPORT data or execute code from a private URL(unlike GitHub raw.github... which is not a public URL). That will be of great help to fit more usecaes. Hope this makes sense!

(I think better to separete out this feature into a new ticket and impl in a separte PR)

@authorjapps
Copy link
Owner Author

AC1:

Given a datafile in CSV format, matching the Database table columns, provide mechanism to load these data(rows and columns) in the that table.

Example:

  • Table "players" has three columns as below:
ID:
NAME:
AGE:

and data in CSV is as below:

1001, Ronaldo, 23
1002, Devaldo, 24
1003, Trevaldo, 35

Note- There are no headers present in the above CSV

AC2:

Same as AC1, but with headers.

Example:

and data in CSV is as below:

ID, AGE, NAME <----------- 1st row as header
1001, 23, Ronaldo
1002, 24, Devaldo
1003, 35, Trevaldo

AC3 :

Required for Postgres DB (v9 to v12 should be supported )

AC4: (optional)

File could be present in the local files system(eg /test/resources) or in a git repo. Or implement this via a separate ticket.

AC5 :

{
    "scenarioName": "Sample interaction with the db executor ( data from AC2)",
    "steps": [
        {
            "name": "Loading a csv file",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "LOADCSV", //<--- in uppercase (for consistency), but support both cases
            "request": {
                "tableName": "players",
                "csvSource": "path/db_sample.csv",
                "withHeaders" : false, //<--false:  this is the default value most cases. (optional)
                "nullString" : "" //<-- used to insert null values, this is the default value (optional)
            },
            "verify": { 
                "size" : 36 //<-- 36 records: number of records loaded. (optional field as usual)
             } 
       },
        {
            "name": "assert_loaded_data_via_select",
            "url": "org.jsmart.zerocode.core.db.DbSqlExecutor",
            "operation": "EXECUTE", //<---- executes an in-line SQL (in this case a SELECT sql)
            "request": {
                "sql": "SELECT id, name, age FROM players WHERE age > ? ORDER BY age ASC",
                "sqlParams" : [ 23 ]
            },
            "verify": {
                "SIZE": 2, //<--- Number of rows in the output
                "rows": [
                    { "id": 1002, "name": "Devaldo", "age": 24 },
                    { "id": 1003, "name": "Trevaldo", "age": 35 }
                ]
            }
        }
    ]
}

AC6:

From @javiertuya : This is a proposal for discussion before starting the implementation:

  • Where to place the code? Ans: org.jsmart.zerocode.core.db package
  • Use Apache Commons DbUtils that is a very stable and lightweight library.
    This will allow to implement something that does not depend on any particular DBMS.
  • Add an executor (e.g. DbSqlExecutor) with the operation explained in the example
    as well as EXECUTE (or SELECT and UPDATE separately) to handle SQL queries with parameters
  • Tests would use H2, as it is already included as test scoped dependency.
  • Postgress requires a container(an example of Docker already present), this should be tested manually, or added to the CI pipeline if required.

More >>

  • CSV reading will be based/inspired on the domain Parametrized.java class. : What does this mean? Please explain this or point to the PR when implemented.

On the AC6 or AC5, please discuss with @javiertuya if you have any questions or suggestions.

@javiertuya , I have captured almost all of it now into the ticket. Please proceed with implementation when ready.
Assigned this ticket to you.

(mainly AC5, AC6 with minor updates)

@javiertuya
Copy link
Collaborator

@authorjapps Great, working on it.

More >>

  • CSV reading will be based/inspired on the domain Parametrized.java class. : What does this mean? Please explain this or point to the PR when implemented.

I was referring to using the same approach to get the content of the csvSource. This could result in some code to refactor later.

@authorjapps
Copy link
Owner Author

authorjapps commented Oct 1, 2024

@authorjapps Great, working on it.

More >>

  • CSV reading will be based/inspired on the domain Parametrized.java class. : What does this mean? Please explain this or point to the PR when implemented.

I was referring to using the same approach to get the content of the csvSource. This could result in some code to refactor later.

Oh, ok.
I think it's better to render the output "rows[]" as simple JSON like below
instead of { "id": 1002, "name": "Devaldo", "age": 24 } in the same line.

This will make the assertions mechanism very easy and natural (also no additional overhead to implement this too imo )

Also, for this you non't need to implement Parametrized etc.

"rows": [
                    {
                        "id": 1002,
                        "name": "Devaldo",
                        "age": 24
                    },
                    {
                        "id": 1003,
                        "name": "Trevaldo",
                        "age": 35
                    }
]

@javiertuya
Copy link
Collaborator

@authorjapps

"operation": "LOADCSV", //<--- in uppercase (for consistency), but support both cases

I found that the JavaMethodExecutor is case sensitive when matching the operation methods. It seems that it is a single line change. Should I submit an issue+PR, just a new PR, or include in this PR (in a separate commit)?

@authorjapps
Copy link
Owner Author

authorjapps commented Oct 3, 2024

@authorjapps

"operation": "LOADCSV", //<--- in uppercase (for consistency), but support both cases

I found that the JavaMethodExecutor is case sensitive when matching the operation methods. It seems that it is a single line change. Should I submit an issue+PR, just a new PR, or include in this PR (in a separate commit)?

You don't have to change in the JavaMethodExecutor.

You can create something like below, a passthrough mechanism in the DBExecutor which you're going to impl.
Both, with public visibility should satisfy the AC.

LOADCSV(...){
   loadcsv(...);
}

loadcsv(...){
   ...
}

@authorjapps
Copy link
Owner Author

The documentation PR is here :
authorjapps/zerocode-tdd-docs#24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request New feature which can be introduced
Projects
None yet
3 participants