
Unity table registration docs: table metadata #7531

Merged · 1 commit · Mar 7, 2024
8 changes: 5 additions & 3 deletions docs/howto/hooks/lua.md
@@ -284,7 +284,7 @@ local client = databricks.client("https://my-host.cloud.databricks.com", "my-ser
local schema_name = client.create_schema("main", "mycatalog", true)
```

### `databricks/client.register_external_table(table_name, physical_path, warehouse_id, catalog_name, schema_name)`
### `databricks/client.register_external_table(table_name, physical_path, warehouse_id, catalog_name, schema_name, metadata)`

Registers an external table under the provided warehouse ID, catalog name, and schema name.
In order for this method call to succeed, an external location should be configured in the catalog, with the
@@ -299,6 +299,8 @@ Parameters:
`Connection Details`, or by running `databricks warehouses get`, choosing your SQL warehouse and fetching its ID).
- `catalog_name(string)`: The name of the catalog under which a schema will be created (or fetched from).
- `schema_name(string)`: The name of the schema under which the table will be created.
- `metadata(table)`: A table of metadata to be added to the table's registration. The metadata table should be of the form:
`{key1 = "value1", key2 = "value2", ...}`.

Example:
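A hypothetical sketch of a call using the new `metadata` argument (the host, token, warehouse ID, catalog, schema, and metadata keys below are placeholders, not values from the original docs):

```lua
local databricks = require("databricks")

-- Placeholder host and token; use your own Databricks workspace credentials.
local client = databricks.client("https://my-host.cloud.databricks.com", "my-service-principal-oauth-token")

-- The last argument is the new flat key/value metadata table.
local status = client.register_external_table(
    "my_table",                                    -- table_name
    "s3://mybucket/my_table",                      -- physical_path
    "my-warehouse-id",                             -- warehouse_id
    "mycatalog",                                   -- catalog_name
    "myschema",                                    -- schema_name
    { owner = "data-team", source = "lakefs" }     -- metadata (keys are illustrative)
)
```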

@@ -730,7 +732,7 @@ Parameters:

A package used to register exported Delta Lake tables to Databricks' Unity catalog.

### `lakefs/catalogexport/unity_exporter.register_tables(action, table_descriptors_path, delta_table_paths, databricks_client, warehouse_id)`
### `lakefs/catalogexport/unity_exporter.register_tables(action, table_descriptors_path, delta_table_details, databricks_client, warehouse_id)`

The function used to register exported Delta Lake tables in Databricks' Unity Catalog.
The registration will use the following paths to register the table:
@@ -741,7 +743,7 @@ Parameters:

- `action(table)`: The global action table
- `table_descriptors_path(string)`: The path under which the table descriptors of the provided `table_paths` reside.
- `delta_table_paths(table)`: Table names to physical paths mapping (e.g. `{ table1 = "s3://mybucket/mytable1", table2 = "s3://mybucket/mytable2" }`)
- `delta_table_details(table)`: Table names to physical paths mapping and table metadata (e.g. `{table1 = {path = "s3://mybucket/mytable1", metadata = {id = "table_1_id", name = "table1", ...}}, table2 = {path = "s3://mybucket/mytable2", metadata = {id = "table_2_id", name = "table2", ...}}}`.)
Contributor

I hope I don't understand this: right now it looks like we broke backwards compatibility of Lua scripts earlier, and now we're documenting it?

Contributor Author

You understand correctly.
The change was made at #7527.
What would you suggest doing?

Contributor

In a minor version bump, I would expect the older form (a string that is a path) to remain valid, maybe deprecated, and the new form also to be allowed. Once that happened, the documentation would pretty much solve itself, something like:

Suggested change
- `delta_table_details(table)`: Table names to physical paths mapping and table metadata (e.g. `{table1 = {path = "s3://mybucket/mytable1", metadata = {id = "table_1_id", name = "table1", ...}}, table2 = {path = "s3://mybucket/mytable2", metadata = {id = "table_2_id", name = "table2", ...}}}`.)
- `delta_table_details(table)`: Map of table names to physical paths as a string, or to physical paths and table metadata as a JSON object (e.g. `{table1 = {path = "s3://mybucket/mytable1", metadata = {id = "table_1_id", name = "table1", ...}}, table2 = "s3://mybucket/mytable2"}}`.)

It might be water under the bridge, particularly if this feature is new enough and has not had many users.
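A rough sketch of the backward-compatible handling the reviewer describes — accepting either the older string form (a physical path) or the newer table form with `path` and `metadata` — could look like this inside the registration loop (hypothetical; this is not what the PR implements):

```lua
for table_name_yaml, table_details in pairs(delta_table_details) do
    local physical_path, metadata
    if type(table_details) == "string" then
        -- Older form: the value is the physical path itself; no metadata.
        physical_path = table_details
        metadata = {}
    else
        -- Newer form: a table with `path` and an optional `metadata` table.
        physical_path = table_details.path
        metadata = table_details.metadata or {}
    end
    -- ... register physical_path with metadata as before ...
end
```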

Contributor Author
@Jonathan-Rosenberg Jonathan-Rosenberg Mar 6, 2024

> It might be water under the bridge, particularly if this feature is new enough and has not had many users.

I'm quite positive that this is the case. If anything, this script is used mainly as part of the Unity Catalog registration flow, and it won't be necessary to change the existing code (so the flow of using the delta exporter's output is not broken).

- `databricks_client(table)`: A Databricks client that implements `create_or_get_schema: function(id, catalog_name)` and `register_external_table: function(table_name, physical_path, warehouse_id, catalog_name, schema_name)`
- `warehouse_id(string)`: Databricks warehouse ID.
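Putting the parameters together, a hypothetical hook snippet calling `register_tables` with the new `delta_table_details` shape might look like this (paths, IDs, and metadata values are placeholders):

```lua
local unity_export = require("lakefs/catalogexport/unity_exporter")
local databricks = require("databricks")

-- Placeholder workspace credentials.
local databricks_client = databricks.client("https://my-host.cloud.databricks.com", "my-token")

-- New shape: each table maps to a physical path plus a metadata table.
local delta_table_details = {
    table1 = {
        path = "s3://mybucket/mytable1",
        metadata = { id = "table_1_id", name = "table1" },
    },
}

local statuses = unity_export.register_tables(
    action, "_lakefs_tables", delta_table_details, databricks_client, "my-warehouse-id")
```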

4 changes: 2 additions & 2 deletions docs/integrations/unity-catalog.md
@@ -132,11 +132,11 @@ local sc = aws.s3_client(args.aws.access_key_id, args.aws.secret_access_key, arg

-- Export Delta Lake tables export:
local delta_client = formats.delta_client(args.lakefs.access_key_id, args.lakefs.secret_access_key, args.aws.region)
local delta_table_locations = delta_export.export_delta_log(action, args.table_defs, sc.put_object, delta_client, "_lakefs_tables")
local delta_table_details = delta_export.export_delta_log(action, args.table_defs, sc.put_object, delta_client, "_lakefs_tables")

-- Register the exported table in Unity Catalog:
local databricks_client = databricks.client(args.databricks_host, args.databricks_token)
local registration_statuses = unity_export.register_tables(action, "_lakefs_tables", delta_table_locations, databricks_client, args.warehouse_id)
local registration_statuses = unity_export.register_tables(action, "_lakefs_tables", delta_table_details, databricks_client, args.warehouse_id)

for t, status in pairs(registration_statuses) do
print("Unity catalog registration for table \"" .. t .. "\" completed with commit schema status : " .. status .. "\n")
4 changes: 2 additions & 2 deletions pkg/actions/lua/lakefs/catalogexport/unity_exporter.lua
@@ -18,15 +18,15 @@ local extractor = require("lakefs/catalogexport/table_extractor")

Returns a "<table name>: status" map for registration of provided tables.
]]
local function register_tables(action, table_descriptors_path, delta_table_paths, databricks_client, warehouse_id)
local function register_tables(action, table_descriptors_path, delta_table_details, databricks_client, warehouse_id)
local repo = action.repository_id
local commit_id = action.commit_id
if not commit_id then
error("missing commit id")
end
local branch_id = action.branch_id
local response = {}
for table_name_yaml, table_details in pairs(delta_table_paths) do
for table_name_yaml, table_details in pairs(delta_table_details) do
local tny = table_name_yaml
if not strings.has_suffix(tny, ".yaml") then
tny = tny .. ".yaml"