diff --git a/notebooks/load-json-files-s3/meta.toml b/notebooks/load-json-files-s3/meta.toml
index 7f3a525c..c82e5c0d 100644
--- a/notebooks/load-json-files-s3/meta.toml
+++ b/notebooks/load-json-files-s3/meta.toml
@@ -1,9 +1,11 @@
 [meta]
 title="Load JSON files with Pipeline from S3"
-description="This notebook will help you load JSON files from a public open AWS S3 bucket. You will see two modes:
-*) where you map the JSON elements to columns in a relational table
-*) where you just ingest all documents ito a JSON column. In that mode we also show how you can use persisted computed column for extracting JSON fields
-"
+description="""\
+    This notebook will help you load JSON files from a public
+    AWS S3 bucket. You will see two modes:
+    *) mapping the JSON elements to columns in a relational table
+    *) ingesting all documents into a JSON column; in that mode we also show how to use a persisted computed column to extract JSON fields
+"""
 icon="chart-network"
 tags=["pipeline", "json", "s3"]
-destinations=["spaces"]
\ No newline at end of file
+destinations=["spaces"]
diff --git a/notebooks/load-json-files-s3/notebook.ipynb b/notebooks/load-json-files-s3/notebook.ipynb
index a8067ff3..9f0ad7ff 100644
--- a/notebooks/load-json-files-s3/notebook.ipynb
+++ b/notebooks/load-json-files-s3/notebook.ipynb
@@ -1 +1,445 @@
-{"cells":[{"cell_type":"markdown","id":"deb8dbf4-2368-41b4-9f09-b14c96ccb344","metadata":{"language":"sql"},"source":"
| DATABASE_NAME | PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE | ERROR_KIND | STD_ERROR | LOAD_DATA_LINE | LOAD_DATA_LINE_NUMBER | BATCH_ID | ERROR_ID | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | HOST | PORT | PARTITION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

*(empty result set)*
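The empty errors table above presumably comes from querying SingleStore's pipeline error metadata. A hedged sketch of such a check (the pipeline name is a placeholder, not taken from the notebook):

```sql
-- Check for ingestion errors after starting a pipeline.
-- PIPELINES_ERRORS is SingleStore's information_schema view for this;
-- 'actors_pipeline' is a hypothetical name.
SELECT *
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'actors_pipeline';
```

An empty result set here means the pipeline loaded every batch without errors.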
| name | age | born_at | Birthdate | photo | wife | weight | haschildren | hasGreyHair | children |
|---|---|---|---|---|---|---|---|---|---|
| Robert Downey Jr. | 53 | New York City, NY | April 4, 1965 | https://jsonformatter.org/img/Robert-Downey-Jr.jpg | Susan Downey | 77.1 | 1 | 0 | ['Indio Falconer', 'Avri Roel', 'Exton Elias'] |
| Tom Cruise | 56 | Syracuse, NY | July 3, 1962 | https://jsonformatter.org/img/tom-cruise.jpg | None | 67.5 | 1 | 0 | ['Suri', 'Isabella Jane', 'Connor'] |
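The relational table above reflects the first mode, where JSON elements are mapped to individual columns. A hedged sketch of how this mode might look (table, pipeline name, and bucket URL are placeholders; the `col <- keypath` mapping is SingleStore's `FORMAT JSON` syntax):

```sql
-- Mode 1: map JSON elements to columns in a relational table.
-- Hypothetical names; replace the S3 URL with the notebook's bucket.
CREATE TABLE actors (
    name TEXT,
    age INT,
    born_at TEXT,
    Birthdate TEXT,
    photo TEXT,
    wife TEXT,
    weight FLOAT,
    haschildren BOOLEAN,
    hasGreyHair BOOLEAN,
    children JSON
);

CREATE PIPELINE actors_pipeline AS
LOAD DATA S3 's3://<bucket>/<prefix>'          -- placeholder bucket/prefix
CONFIG '{"region": "us-east-1"}'               -- assumed region
INTO TABLE actors
FORMAT JSON (
    name <- name,
    age <- age,
    born_at <- `Born At`,
    Birthdate <- Birthdate,
    photo <- photo,
    wife <- wife,
    weight <- weight,
    haschildren <- hasChildren,
    hasGreyHair <- hasGreyHair,
    children <- children
);

START PIPELINE actors_pipeline;
```

Each `column <- keypath` pair pulls one JSON field into its own typed column, which is what produces the flat table shown above.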
| DATABASE_NAME | PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE | ERROR_KIND | STD_ERROR | LOAD_DATA_LINE | LOAD_DATA_LINE_NUMBER | BATCH_ID | ERROR_ID | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | HOST | PORT | PARTITION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

*(empty result set)*
| json_data |
|---|
| {'Birthdate': 'July 3, 1962', 'Born At': 'Syracuse, NY', 'age': 56, 'children': ['Suri', 'Isabella Jane', 'Connor'], 'hasChildren': True, 'hasGreyHair': False, 'name': 'Tom Cruise', 'photo': 'https://jsonformatter.org/img/tom-cruise.jpg', 'weight': 67.5, 'wife': None} |
| {'Birthdate': 'April 4, 1965', 'Born At': 'New York City, NY', 'age': 53, 'children': ['Indio Falconer', 'Avri Roel', 'Exton Elias'], 'hasChildren': True, 'hasGreyHair': False, 'name': 'Robert Downey Jr.', 'photo': 'https://jsonformatter.org/img/Robert-Downey-Jr.jpg', 'weight': 77.1, 'wife': 'Susan Downey'} |
| {'Birthdate': 'April 4, 1965', 'Born At': 'New York City, NY', 'age': 53, 'children': ['Indio Falconer', 'Avri Roel', 'Exton Elias'], 'hasChildren': True, 'hasGreyHair': False, 'name': 'Robert Downey Jr.', 'photo': 'https://jsonformatter.org/img/Robert-Downey-Jr.jpg', 'weight': 77.1, 'wife': 'Susan Downey'} |
| {'Birthdate': 'July 3, 1962', 'Born At': 'Syracuse, NY', 'age': 56, 'children': ['Suri', 'Isabella Jane', 'Connor'], 'hasChildren': True, 'hasGreyHair': False, 'name': 'Tom Cruise', 'photo': 'https://jsonformatter.org/img/tom-cruise.jpg', 'weight': 67.5, 'wife': None} |
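The table above shows the second mode, where each whole document lands in a single JSON column. A hedged sketch of that mode, including the persisted computed column mentioned in the description (all identifiers are placeholders; `col AS expr PERSISTED type` is SingleStore's persisted computed column syntax, and `<- %` maps the entire document):

```sql
-- Mode 2: ingest whole documents into a JSON column, and use a
-- persisted computed column to extract a field at write time.
-- Hypothetical names; replace the S3 URL with the notebook's bucket.
CREATE TABLE actors_json (
    json_data JSON NOT NULL,
    name AS json_data::$name PERSISTED TEXT   -- extracted once, stored with the row
);

CREATE PIPELINE actors_json_pipeline AS
LOAD DATA S3 's3://<bucket>/<prefix>'         -- placeholder bucket/prefix
CONFIG '{"region": "us-east-1"}'              -- assumed region
INTO TABLE actors_json
FORMAT JSON (json_data <- %);

START PIPELINE actors_json_pipeline;
```

Because the computed column is persisted, queries that filter or group by `name` read a plain column instead of re-parsing `json_data` on every row.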
**Note**

For this tutorial, we recommend using a workspace of size S4 to ingest data faster and to see the difference and performance gain you can get from a distributed architecture.
**Action Required**

Make sure to select the s2_tpch_unoptimized database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
**Action Required**

Make sure to select the s2_tpch_optimized database from the drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.