Connecting to native data
Anton Ivanov edited this page Feb 17, 2023 · 12 revisions
To start working on ETL, you need to scan a native database or, if you already have one, load a scan report generated by White Rabbit.
The typical sequence:
- Connect to the source database (enter the database credentials) or to a delimited text file, then click "Test Connection".
The following source types are supported: delimited text files, SAS files, MySQL, SQL Server, Oracle, PostgreSQL, Microsoft Access, Amazon Redshift, PDW, Teradata, Google BigQuery, Azure SQL Database, Databricks.
- Select the tables to be scanned.
- Upload the scan report to Perseus or download it to a local machine.
Scan options:
- "Scan field values" - when checked, the scan analyzes the raw values in every column of the tables selected for the scan (e.g. if the user selects Table A, the contents of each Table A column are analyzed). Uncheck "Scan field values" to skip this analysis and leave raw values out of the report.
- "Min cell count" - by default, this parameter is set to 5, meaning that source values occurring fewer than 5 times are excluded from the report.
- "Rows per table" - by default, the scan takes a random sample of 100,000 rows from each table. The user can also choose 500,000 rows, 1,000,000 rows, or all rows in the table.
- "Max distinct values" - by default, this parameter is set to 1,000, meaning that the scan report contains at most 1,000 distinct values per field. It can also be set to 100 or 10,000.
- "Numeric stats" - if checked, a set of statistics is calculated for all fields of integer, real and date data types.
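To make the interaction of "Rows per table", "Min cell count" and "Max distinct values" more concrete, here is a minimal Python sketch of how such limits could be applied to the values of a single field. It is an illustration only, not the White Rabbit or Perseus implementation; the function `scan_field` and its parameter names are hypothetical, and the order in which the limits are applied is an assumption.

```python
import random
from collections import Counter


def scan_field(values, min_cell_count=5, max_distinct_values=1000,
               rows_per_table=100_000, seed=0):
    """Hypothetical helper: summarize one field the way the scan options describe.

    Returns a list of (value, frequency) pairs, most frequent first.
    """
    values = list(values)

    # "Rows per table": sample randomly when the table exceeds the limit.
    # rows_per_table=None stands for "all rows" in this sketch.
    if rows_per_table is not None and len(values) > rows_per_table:
        values = random.Random(seed).sample(values, rows_per_table)

    counts = Counter(values)

    # "Min cell count": drop values that occur fewer than min_cell_count times.
    frequent = [(v, n) for v, n in counts.most_common() if n >= min_cell_count]

    # "Max distinct values": keep at most this many distinct values per field.
    return frequent[:max_distinct_values]


column = ["a"] * 6 + ["b"] * 5 + ["c"] * 4
print(scan_field(column))  # [('a', 6), ('b', 5)]
```

With the default "Min cell count" of 5, the value "c" (4 occurrences) is left out, while "a" and "b" are reported with their frequencies.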