From 021b5ef901716cb84856ecf0ee4b6f04ffcaeba0 Mon Sep 17 00:00:00 2001
From: Jonathan Rosenberg <96974219+Jonathan-Rosenberg@users.noreply.github.com>
Date: Sun, 29 Oct 2023 17:17:50 +0200
Subject: [PATCH] Glue Metastore catalog: Docs- rephrasing some sections
 (#6898)

---
 docs/howto/hooks/lua.md             |  2 +-
 docs/integrations/glue_metastore.md | 67 ++++++++++++++++++-----------
 2 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/docs/howto/hooks/lua.md b/docs/howto/hooks/lua.md
index aba2cd045ca..e6220e18fc5 100644
--- a/docs/howto/hooks/lua.md
+++ b/docs/howto/hooks/lua.md
@@ -420,7 +420,7 @@ Parameters:
 - `glue`: AWS glue client
 - `db(string)`: glue database name
 - `table_src_path(string)`: path to table spec (i.e _lakefs_tables/my_table.yaml)
-- `create_table_input(Table)`: Input equal mapping to [table_input](https://docs.aws.amazon.com/glue/latest/webapAPI_CreateTable.html#API_CreateTable_RequestSyntax) in AWS, the same as we use for `glue.create_table`.
+- `create_table_input(Table)`: Input equal mapping to [table_input](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html#API_CreateTable_RequestSyntax) in AWS, the same as we use for `glue.create_table`.
 should contain inputs describing the data format (i.e InputFormat, OutputFormat, SerdeInfo) since the exporter is agnostic to this.
 by default this function will configure table location and schema.
 - `action_info(Table)`: the global action object.
diff --git a/docs/integrations/glue_metastore.md b/docs/integrations/glue_metastore.md
index 0884ba106c2..d11fece88d2 100644
--- a/docs/integrations/glue_metastore.md
+++ b/docs/integrations/glue_metastore.md
@@ -14,7 +14,7 @@ redirect_from: /using/glue_metastore.html
 
 The integration between Glue and lakeFS is based on [Data Catalog Exports]({% link howto/catalog_exports.md %}).
 
-This guide will show you how to use lakeFS with the Glue Data Catalog.
+This guide describes how to use lakeFS with the Glue Data Catalog.
 You'll be able to query your lakeFS data by specifying the repository, branch and commit in your SQL query.
 Currently, only read operations are supported on the tables.
 You will set up the automation required to work with lakeFS on top of the Glue Data Catalog, including:
@@ -22,7 +22,7 @@ You will set up the automation required to work with lakeFS on top of the Glue D
 2. Write an exporter script that will:
    * Mirror your branch's state into [Hive Symlink](https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.1.1/api/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.html) files readable by Athena.
    * Export the table descriptors from your branch to the Glue Catalog.
-3. Set up lakeFS [hooks]({% link howto/catalog_exports.md %}#running-an-exporter) that will run the above script when specific events occur.
+3. Set up lakeFS [hooks]({% link howto/catalog_exports.md %}#running-an-exporter) to trigger the above script when specific events occur.
 
 ## Example: Using Athena to query lakeFS data
 
@@ -32,9 +32,10 @@ Before starting, make sure you have:
 1. An active lakeFS installation with S3 as the backing storage, and a repository in this installation.
 2. A database in Glue Data Catalog (lakeFS does not create one).
 3. AWS Credentials with permission to manage Glue, Athena Query and S3 access.
+
 ### Add table descriptor
 
-Let's define a table and commit to lakeFS.
+Let's define a table, and commit it to lakeFS.
 Save the YAML below as `animals.yaml` and upload it to lakeFS.
 
 ```bash
@@ -70,8 +71,8 @@ schema:
 
 ### Write some table data
 
-Insert data under the table path, using your preferred method (i.e [Spark]({% link integrations/spark.md %})) and commit the data when done..
-In this example we used CSV and the files added to lakeFS should look something like this:
+Insert data into the table path, using your preferred method (e.g. [Spark]({% link integrations/spark.md %})), and commit upon completion.
+This example uses CSV files, and the files added to lakeFS should look like this:
 
 ![lakeFS Uploaded CSV Files]({{ site.baseurl }}/assets/img/csv_export_hooks_data.png)
 
@@ -104,8 +105,8 @@ local res = glue_exporter.export_glue(glue, db, table_path, table_input, action,
 
 ### Configure Action Hooks
 
-The hooks are the mechanism that will trigger exporter execution.
-To learn more about how to configure exporter hooks read [Running an Exporter]({% link howto/catalog_exports.md %}#running-an-exporter).
+Hooks serve as the mechanism that triggers the execution of the exporter.
+For more detailed information on how to configure exporter hooks, you can refer to [Running an Exporter]({% link howto/catalog_exports.md %}#running-an-exporter).
 
 {: .note}
 > The `args.catalog.table_input` argument in the Lua script is assumed to be passed from the action arguments, that way the same script can be reused for different tables. Check the [example]({% link howto/hooks/lua.md %}#lakefscatalogexportglue_exporterexport_glueglue-db-table_src_path-create_table_input-action_info-options) to construct the table input in the lua code.
@@ -119,7 +120,7 @@ To learn more about how to configure exporter hooks read [Running an Exporter]({