diff --git a/mojo-db-udf/README.md b/mojo-db-udf/README.md index e69de29..6a9ee92 100644 --- a/mojo-db-udf/README.md +++ b/mojo-db-udf/README.md @@ -0,0 +1,103 @@ +# DriverlessAI Mojo +Some customers need to select rows from a Database, score and write back the predictions. + +Before using DAI, they would convert a model to SQL and execute it, the time to write the SQL was long. + +Using DAI and then this program enables them to quickly build and use models. + +![DAIMojoRunner_DB.png](./docs/img/DAIMojoRunner_DB.png) + +## Distribution +Both the compiled distribution and source are available. + +Available 'as-is' in [Distribution](Distribution) + +See the source directory for the code [Source](Source). + + +## Execution +The execution is simple: + +java -cp /PostgresData/postgresql-42.2.5.jar:dist/DAIMojoRunner_DB.jar daimojorunner_db.DAIMojoRunner_DB + +This is a generic JDBC approach, that allows any JDBC type 4 driver to be specified on the java classpath. The DAIMojoRunner_DB.properties file then defines what driver to use. + +You must have a DAI license to score, add the license in one of the standard ways, as a parameter, as a environment vailable for example: + + -Dai.h2o.mojos.runtime.license.file=/Driverless-AI/license/license.sig + +## Configuration + + +- Support encrypted password + + Store the encrypted password in the DAIMojoRunner_DB.properties + + Encrypt the password from Linux command line: (type in the password): + + echo h2oh2o | base64 -i – + + Encrypt the password from Windows PowerShell (password passed as h2oh2o) +[convert]::toBASe64String([Text.Encoding]::UTF8.GetBytes("h2oh2o")) + + Save the encrypted string to the SQLPassword parameter + +- Prompting for password + Set SQLPrompt= Enables prompting for the password in the DAIMojoRunner_DB.properties + + Add the username to the SQLUser line make sure SQLPassword, is set SQLPassword= +Set SQLPrompt=true + +- Support returning the single or multiple class predictions. +SQLKey is the DB key to use on the update statement +If SQLPrediction= then the returned model class names are used and have to match the DB schema. + + For example if the target variable is Model_Prediction and it’s a binomial classification the model returns Model_Prediction.0 and Model_Prediction.1 these two columns would need to exist in the DB. + + If SQLPrediction=string then that is the name of the column the Model_Prediction will be updated. + +- Handles nulls in the columns +Automatic detection + +- Logging option +To track the execution, add -Dlogging=true to the java command line, this will write the original “row : target prediction” as it runs. + +## Parameters +ModelName=pipeline.mojo +SQLConnectionString= jdbc:sqlserver://:;databaseName=;user=;password= +SQLUser= +SQLPassword= +SQLKey= +SQLPrediction= +SQLSelect=select statement +SQLWrite=update set = where = + +### Example +ModelName=pipeline.mojo +SQLConnectionString=jdbc:postgresql://192.168.1.171:5432/LendingClub +SQLUser=postgres +SQLPassword=aDJvaDJvCg== +SQLPrompt= +SQLKey=id +SQLPrediction= +SQLSelect=select id, loan_amnt, term, int_rate, installment, emp_length, home_ownership, annual_inc, verification_status, addr_state, dti, delinq_2yrs, inq_last_6mths, pub_rec, revol_bal, revol_util, total_acc from "import".loanstats4 +SQLWrite=update "import".loanstats4 set where id= + + The above will score using pipeline.mojo and use the key 'id' to update the column name from the mojo into the table, to bad_loan.0 and bad_loan.1 must be in the table schema. + + If SQLPrediction was set to ModelPrediction (for example) then bad_loan.0 would be written to the table column ModelPrediction. + +## Performance + The driver starts a thread pre core, but you can override this with the -Dthreads= option and -Dcapacity= is the queue size. + + On a MacBook to a remote DB, I have seen it achieve 814TPS when setting the threads to 4x the cores and the capacity to 2x the threads setting. + +## Testing +This has been used with a Postgres DB and Microsoft SQL Server 12 + + + + +## Questions +Feel free to email me and I will try to help + diff --git a/mojo-db-udf/build.gradle b/mojo-db-udf/build.gradle index 76250dd..de54e25 100644 --- a/mojo-db-udf/build.gradle +++ b/mojo-db-udf/build.gradle @@ -34,12 +34,24 @@ dependencies { application { mainClassName = "ai.h2o.mojos.db.MojoDbScorer" + applicationName = "dbscorer" } installDist { into file("${projectDir}/dist/${project.name}") } +distributions { + main { + contents { + from('config') { + into 'config' + } + } + } + +} + clean { delete file("${projectDir}/dist") } diff --git a/mojo-db-udf/config/lending_club.properties b/mojo-db-udf/config/lending_club_pg.conf similarity index 90% rename from mojo-db-udf/config/lending_club.properties rename to mojo-db-udf/config/lending_club_pg.conf index 2976a68..0f0023d 100644 --- a/mojo-db-udf/config/lending_club.properties +++ b/mojo-db-udf/config/lending_club_pg.conf @@ -10,8 +10,8 @@ mojo-db-scoring-app { user = "postgres" // User password password = "aDJvaDJvCg==" - // Password prompt in case the password is not specified - prompt = "" + // Ask for password + prompt = false } sql { diff --git a/mojo-db-udf/docs/img/DAIMojoRunner_DB.png b/mojo-db-udf/docs/img/DAIMojoRunner_DB.png new file mode 100644 index 0000000..72767bf Binary files /dev/null and b/mojo-db-udf/docs/img/DAIMojoRunner_DB.png differ