diff --git a/notebooks/managing-stage-files-with-fusion-sql/notebook.ipynb b/notebooks/managing-stage-files-with-fusion-sql/notebook.ipynb index 613a7e1..2703e22 100644 --- a/notebooks/managing-stage-files-with-fusion-sql/notebook.ipynb +++ b/notebooks/managing-stage-files-with-fusion-sql/notebook.ipynb @@ -19,53 +19,55 @@ { "attachments": {}, "cell_type": "markdown", + "id": "c33d5542", "metadata": {}, "source": [ "Fusion SQL can be used to manage your workspace groups and workspaces, but it\n", "can also be used to upload, download, and manage files in your workspace group\n", - "Stage. We'll show you how to work with files in Stage in this notebook." - ], - "id": "c33d5542" + "or starter workspace Stage. We'll show you how to work with files in Stage in\n", + "this notebook." + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "0afd983c", "metadata": {}, "source": [ "## Displaying the Stage Fusion SQL commands\n", "\n", "The `SHOW FUSION COMMANDS` displays the commands that are handled by the Fusion\n", "engine. You can use the `LIKE` to filter the commands." - ], - "id": "0afd983c" + ] }, { "cell_type": "code", "execution_count": 1, + "id": "13eb0b1f", "metadata": {}, "outputs": [], "source": [ "commands = %sql SHOW FUSION COMMANDS LIKE '%stage%'\n", "for cmd in commands:\n", " print(*cmd, '\\n')" - ], - "id": "13eb0b1f" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "91485576", "metadata": {}, "source": [ "## Creating a workspace group\n", "\n", "We'll start by creating a workspace group. We can get a region in the US by using the `SHOW REGIONS`\n", "command and the `random` package." 
- ], - "id": "91485576" + ] }, { "cell_type": "code", "execution_count": 2, + "id": "4ef2337d", "metadata": {}, "outputs": [], "source": [ @@ -76,23 +78,23 @@ "\n", "region_id = random.choice(us_regions).ID\n", "region_id" - ], - "id": "4ef2337d" + ] }, { "cell_type": "code", "execution_count": 3, + "id": "b00cf79d", "metadata": {}, "outputs": [], "source": [ "wg_name = 'Fusion Notebook'\n", "password = secrets.token_urlsafe(20) + '-x&'" - ], - "id": "b00cf79d" + ] }, { "cell_type": "code", "execution_count": 4, + "id": "ea62190e", "metadata": {}, "outputs": [], "source": [ @@ -100,31 +102,31 @@ "CREATE WORKSPACE GROUP '{{ wg_name }}'\n", " IN REGION ID '{{ region_id }}' WITH PASSWORD '{{ password }}'\n", " WITH FIREWALL RANGES '0.0.0.0/0'" - ], - "id": "ea62190e" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "0567f05d", "metadata": {}, "source": [ "## Uploading and downloading Stage files\n", "\n", "Uploading and downloading files to your Stage is easy with Fusion SQL. The commands are shown below.\n", "```\n", - "DOWNLOAD STAGE FILE '' [ IN GROUP { ID '' | '' } ] [ TO '' ]\n", + "DOWNLOAD STAGE FILE '' [ IN { ID '' | '' } ] [ TO '' ]\n", " [ OVERWRITE ] [ ENCODING '' ];\n", "\n", - "UPLOAD FILE TO STAGE '' [ IN GROUP { ID '' | '' } ] FROM '' [ OVERWRITE ];\n", + "UPLOAD FILE TO STAGE '' [ IN { ID '' | '' } ] FROM '' [ OVERWRITE ];\n", "```\n", "\n", "First we'll create a data file locally that we can work with." - ], - "id": "0567f05d" + ] }, { "cell_type": "code", "execution_count": 5, + "id": "44f5d066", "metadata": {}, "outputs": [], "source": [ @@ -134,103 +136,103 @@ "Joe,32,70\n", "Max,44,69\n", "Ann,33,64" - ], - "id": "44f5d066" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "b333c9e8", "metadata": {}, "source": [ "We can now upload our data file to our workspace group Stage." 
- ], - "id": "b333c9e8" + ] }, { "cell_type": "code", "execution_count": 6, + "id": "63ffcdad", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "UPLOAD FILE TO STAGE 'stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv'" - ], - "id": "63ffcdad" + "UPLOAD FILE TO STAGE 'stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv'" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "a5fd7a60", "metadata": {}, "source": [ "We can list the files in a Stage with the `SHOW STAGE FILES` command." - ], - "id": "a5fd7a60" + ] }, { "cell_type": "code", "execution_count": 7, + "id": "5bd84f4e", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}'" - ], - "id": "5bd84f4e" + "SHOW STAGE FILES IN '{{ wg_name }}'" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "ebc693df", "metadata": {}, "source": [ "Downloading the file is just as easy as uploading." - ], - "id": "ebc693df" + ] }, { "cell_type": "code", "execution_count": 8, + "id": "43c7827d", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "DOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' TO 'stats.csv' OVERWRITE" - ], - "id": "43c7827d" + "DOWNLOAD STAGE FILE 'stats.csv' IN '{{ wg_name }}' TO 'stats.csv' OVERWRITE" + ] }, { "cell_type": "code", "execution_count": 9, + "id": "ee3c4d33", "metadata": {}, "outputs": [], "source": [ "!cat stats.csv" - ], - "id": "ee3c4d33" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "f5e5c776", "metadata": {}, "source": [ "If you just want to display the contents of the Stage file without saving it to a local\n", "file, you simply leave the `TO` option off the `DOWNLOAD STAGE FILE`." 
- ], - "id": "f5e5c776" + ] }, { "cell_type": "code", "execution_count": 10, + "id": "60984ea0", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "DOWNLOAD STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}' ENCODING 'utf-8'" - ], - "id": "60984ea0" + "DOWNLOAD STAGE FILE 'stats.csv' IN '{{ wg_name }}' ENCODING 'utf-8'" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "ac909bca", "metadata": {}, "source": [ "## Creating folders\n", @@ -238,7 +240,7 @@ "Up to this point we have just worked with files at the root of our Stage. We can use Fusion SQL\n", "to create folder structures as well. This is done with the `CREATE STAGE FOLDER` command.\n", "```\n", - "CREATE STAGE FOLDER '' [ IN GROUP { ID '' | '' } ] [ OVERWRITE ];\n", + "CREATE STAGE FOLDER '' [ IN { ID '' | '' } ] [ OVERWRITE ];\n", "```\n", "\n", "The following code will create this folder structure:\n", @@ -248,129 +250,129 @@ "project-2/\n", "project-2/data/\n", "```" - ], - "id": "ac909bca" + ] }, { "cell_type": "code", "execution_count": 11, + "id": "f5fac755", "metadata": {}, "outputs": [], "source": [ "for name in ['project-1', 'project-1/data', 'project-2', 'project-2/data']:\n", - " %sql CREATE STAGE FOLDER '{{ name }}' IN GROUP '{{ wg_name }}';" - ], - "id": "f5fac755" + " %sql CREATE STAGE FOLDER '{{ name }}' IN '{{ wg_name }}';" + ] }, { "cell_type": "code", "execution_count": 12, + "id": "cc98d1f2", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE" - ], - "id": "cc98d1f2" + "SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "79703772", "metadata": {}, "source": [ "Now that we have a folder structure we can put files into those folders." 
- ], - "id": "79703772" + ] }, { "cell_type": "code", "execution_count": 13, + "id": "5fc4df07", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "UPLOAD FILE TO STAGE 'project-1/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';\n", - "UPLOAD FILE TO STAGE 'project-2/data/stats.csv' IN GROUP '{{ wg_name }}' FROM 'mydata.csv';" - ], - "id": "5fc4df07" + "UPLOAD FILE TO STAGE 'project-1/data/stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv';\n", + "UPLOAD FILE TO STAGE 'project-2/data/stats.csv' IN '{{ wg_name }}' FROM 'mydata.csv';" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "eaca1ab2", "metadata": {}, "source": [ "Now when we do a recursive listing of our Stage, we'll see the newly created files." - ], - "id": "eaca1ab2" + ] }, { "cell_type": "code", "execution_count": 14, + "id": "9261cffa", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE" - ], - "id": "9261cffa" + "SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "0290c32a", "metadata": {}, "source": [ "We can list the files at a specific path as well." - ], - "id": "0290c32a" + ] }, { "cell_type": "code", "execution_count": 15, + "id": "e439eee3", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-2/data'" - ], - "id": "e439eee3" + "SHOW STAGE FILES IN '{{ wg_name }}' AT 'project-2/data'" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "883a9656", "metadata": {}, "source": [ "## Loading data from Stage\n", "\n", "We are going to load data from a Stage into a database table. For this, we need to\n", "have a workspace and a database." 
- ], - "id": "883a9656" + ] }, { "cell_type": "code", "execution_count": 16, + "id": "b3c4e207", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "CREATE WORKSPACE 'stage-loader' IN GROUP '{{ wg_name }}' WITH SIZE 'S-00' WAIT ON ACTIVE" - ], - "id": "b3c4e207" + ] }, { "cell_type": "code", "execution_count": 17, + "id": "36d0e56b", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SHOW WORKSPACES IN GROUP 'Fusion Notebook'" - ], - "id": "36d0e56b" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "43d88cde", "metadata": {}, "source": [ "
\n", @@ -380,23 +382,23 @@ "

Make sure to select the stage-loader workspace from the drop-down menu at the top of this notebook.

\n", "
\n", "" - ], - "id": "43d88cde" + ] }, { "cell_type": "code", "execution_count": 18, + "id": "c97e381f", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "CREATE DATABASE IF NOT EXISTS stage_loader" - ], - "id": "c97e381f" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "3deb2065", "metadata": {}, "source": [ "
\n", @@ -407,12 +409,12 @@ " It updates the connection_url to connect to that database.

\n", "
\n", "" - ], - "id": "3deb2065" + ] }, { "cell_type": "code", "execution_count": 19, + "id": "28c4dab8", "metadata": {}, "outputs": [], "source": [ @@ -423,21 +425,21 @@ " age INT,\n", " height INT\n", ");" - ], - "id": "28c4dab8" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "62ac5718", "metadata": {}, "source": [ "Load the data from the Stage using a pipeline." - ], - "id": "62ac5718" + ] }, { "cell_type": "code", "execution_count": 20, + "id": "cbef048d", "metadata": {}, "outputs": [], "source": [ @@ -453,24 +455,24 @@ " FORMAT CSV;\n", "START PIPELINE stage_test FOREGROUND;\n", "DROP PIPELINE stage_test;" - ], - "id": "cbef048d" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "f87ce322", "metadata": {}, "source": [ "We can now query the table and select the output into a Stage. Note that the\n", "`GROUP BY 1` is used here to combine the outputs from all of the database partitions\n", "into a single file. If you don't use that, you'll get multiple output files,\n", "each with a portion of the result set." 
- ], - "id": "f87ce322" + ] }, { "cell_type": "code", "execution_count": 21, + "id": "e04ea9c9", "metadata": {}, "outputs": [], "source": [ @@ -478,34 +480,34 @@ "SELECT * FROM stats GROUP BY 1 INTO STAGE 'project-3/data/stats.csv'\n", " FIELDS TERMINATED BY ','\n", " LINES TERMINATED BY '\\n'" - ], - "id": "e04ea9c9" + ] }, { "cell_type": "code", "execution_count": 22, + "id": "4cf83faf", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}' AT 'project-3' RECURSIVE" - ], - "id": "4cf83faf" + "SHOW STAGE FILES IN '{{ wg_name }}' AT 'project-3' RECURSIVE" + ] }, { "cell_type": "code", "execution_count": 23, + "id": "b41add98", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "DOWNLOAD STAGE FILE 'project-3/data/stats.csv' ENCODING 'utf-8'" - ], - "id": "b41add98" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "11ec3a9f", "metadata": {}, "source": [ "## Deleting Stage files and folders\n", @@ -513,93 +515,93 @@ "Files and folders can be deleted from a workspace Stage as well.\n", "This is done with the `DROP STAGE FILE` and `DROP STAGE FOLDER` commands.\n", "```\n", - "DROP STAGE FILE '' [ IN GROUP { ID '' | '' } ];\n", + "DROP STAGE FILE '' [ IN { ID '' | '' } ];\n", "\n", - "DROP STAGE FOLDER '' [ IN GROUP { ID '' | '' } ] [ RECURSIVE ];\n", + "DROP STAGE FOLDER '' [ IN { ID '' | '' } ] [ RECURSIVE ];\n", "```\n", "\n", "Let's delete the `stats.csv` file at the root of our Stage." 
- ], - "id": "11ec3a9f" + ] }, { "cell_type": "code", "execution_count": 24, + "id": "058ab079", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "DROP STAGE FILE 'stats.csv' IN GROUP '{{ wg_name }}'" - ], - "id": "058ab079" + "DROP STAGE FILE 'stats.csv' IN '{{ wg_name }}'" + ] }, { "cell_type": "code", "execution_count": 25, + "id": "96516b35", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}'" - ], - "id": "96516b35" + "SHOW STAGE FILES IN '{{ wg_name }}'" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "2e95a34e", "metadata": {}, "source": [ "Now let's delete the `project-2` folder including all of the files in it." - ], - "id": "2e95a34e" + ] }, { "cell_type": "code", "execution_count": 26, + "id": "112632ed", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "DROP STAGE FOLDER 'project-2' IN GROUP '{{ wg_name }}' RECURSIVE" - ], - "id": "112632ed" + "DROP STAGE FOLDER 'project-2' IN '{{ wg_name }}' RECURSIVE" + ] }, { "cell_type": "code", "execution_count": 27, + "id": "58410c8c", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "SHOW STAGE FILES IN GROUP '{{ wg_name }}' RECURSIVE" - ], - "id": "58410c8c" + "SHOW STAGE FILES IN '{{ wg_name }}' RECURSIVE" + ] }, { "cell_type": "code", "execution_count": 28, + "id": "0c1291e3", "metadata": {}, "outputs": [], "source": [ "%%sql\n", - "DROP STAGE FOLDER 'project-1' IN GROUP '{{ wg_name }}' RECURSIVE;\n", - "DROP STAGE FOLDER 'project-3' IN GROUP '{{ wg_name }}' RECURSIVE;" - ], - "id": "0c1291e3" + "DROP STAGE FOLDER 'project-1' IN '{{ wg_name }}' RECURSIVE;\n", + "DROP STAGE FOLDER 'project-3' IN '{{ wg_name }}' RECURSIVE;" + ] }, { "attachments": {}, "cell_type": "markdown", + "id": "9708218d", "metadata": {}, "source": [ "## Conclusion\n", "\n", "We have demonstrated how to create and delete files and folders in a workspace group Stage\n", - "using Fusion SQL. 
It is also possible to work with Stage files using the SingleStoreDB\n", - "Python SDK, see the [API documentation](https://singlestoredb-python.labs.singlestore.com/api.html#stage)\n", + "using Fusion SQL. Note that Fusion SQL also supports managing Stage for starter workspaces. It is\n", + "also possible to work with Stage files using the SingleStoreDB Python SDK; see the\n", + "[API documentation](https://singlestoredb-python.labs.singlestore.com/api.html#stage)\n", "for more details." - ], - "id": "9708218d" + ] }, { "id": "43ecb8fb",