Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta Lake Catalog Exporter #7078

Merged
merged 38 commits into from
Dec 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
c446901
delta client impl
Jonathan-Rosenberg Nov 23, 2023
8e8ddca
delta exporter
Jonathan-Rosenberg Nov 23, 2023
8bb3f1c
Use the earliest available version to build the Delta log
Jonathan-Rosenberg Nov 27, 2023
5adf15d
pass listen address to hooks
Jonathan-Rosenberg Nov 27, 2023
37adb94
add stat_object api call to the lakefs lua client
Jonathan-Rosenberg Nov 27, 2023
cd7b8b6
add get_repo to eventually fetch its storage namespace
Jonathan-Rosenberg Nov 27, 2023
c3bc982
delta lake exporter example
Jonathan-Rosenberg Nov 29, 2023
8a81e4e
add error return. remove redundant lines. add godocs for some funcitons
Jonathan-Rosenberg Nov 29, 2023
c4f520a
extract get_storage_namespace to the internal lua file.
Jonathan-Rosenberg Nov 29, 2023
b2b81aa
remove comment
Jonathan-Rosenberg Nov 29, 2023
587ed18
go mod tidy
Jonathan-Rosenberg Nov 29, 2023
bde6581
pass aws properties directly to fetchS3Table
Jonathan-Rosenberg Nov 29, 2023
4df6b5a
json encoding refactor
Jonathan-Rosenberg Nov 29, 2023
c49c42e
fix load test
Jonathan-Rosenberg Nov 29, 2023
e5013b9
fix tests
Jonathan-Rosenberg Nov 29, 2023
7318c64
fix tests
Jonathan-Rosenberg Nov 29, 2023
2bac82e
goimports
Jonathan-Rosenberg Nov 29, 2023
9bd7a76
add indentation to lua test
Jonathan-Rosenberg Nov 30, 2023
855a443
remove storage_utils.lua
Jonathan-Rosenberg Nov 30, 2023
74eb76b
Rename `ListeningAddress` (and `listeningAddress`) to `ServerAddress`.
Jonathan-Rosenberg Nov 30, 2023
0866a3c
fix test
Jonathan-Rosenberg Nov 30, 2023
d0bb96e
lint
Jonathan-Rosenberg Nov 30, 2023
e5ca6b8
pass a delta client to the export_delta function
Jonathan-Rosenberg Nov 30, 2023
0b6853d
upgrade aws-sdk-v2
Jonathan-Rosenberg Nov 30, 2023
a028ac8
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 3, 2023
e32dc16
go mod tidy
Jonathan-Rosenberg Dec 3, 2023
aa476a8
use get_storage_prefix from the internal package. Remove it from docs
Jonathan-Rosenberg Dec 3, 2023
0bae4ba
remove get_storage_namespace and the `get_repo` method from lakeFS cl…
Jonathan-Rosenberg Dec 3, 2023
a603196
use get_storage_prefix from the internal package
Jonathan-Rosenberg Dec 3, 2023
9e522f5
use correct error when returning
Jonathan-Rosenberg Dec 3, 2023
178107d
comment the delta_log_entry_key_generator function and return a compl…
Jonathan-Rosenberg Dec 3, 2023
a687c91
use a writer_client function instead of a storage client when passed …
Jonathan-Rosenberg Dec 3, 2023
98c3ac2
updated version of delta go
Jonathan-Rosenberg Dec 3, 2023
590f2e3
severAddress -> lakeFSAddr + comment
Jonathan-Rosenberg Dec 3, 2023
9de27d8
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 6, 2023
d9cde83
change delta client builder API and drop storage type (since always u…
Jonathan-Rosenberg Dec 6, 2023
dd4c81d
rename the lakeFS address field and add clarifying comment
Jonathan-Rosenberg Dec 6, 2023
2b9f8a2
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 6, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions cmd/lakefs/cmd/run.go
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,7 @@ var runCmd = &cobra.Command{
idGen,
bufferedCollector,
actions.Config(cfg.Actions),
cfg.ListenAddress,
)

// wire actions into entry catalog
Expand Down
7 changes: 0 additions & 7 deletions docs/howto/hooks/lua.md
Original file line number Diff line number Diff line change
Expand Up @@ -403,13 +403,6 @@ local s3 = aws.s3_client(args.aws.aws_access_key_id, args.aws.aws_secret_access_
exporter.export_s3(s3, args.table_descriptor_path, action, {debug=true})
```

### `lakefs/catalogexport/symlink_exporter.get_storage_uri_prefix(storage_ns, commit_id, action_info)`

Generate prefix for Symlink file(s) structure that represents a `ref` and a `commit` in lakeFS.
The output pattern `${storage_ns}_lakefs/exported/${ref}/${commit_id}/`.
The `ref` is deduced from the action event in `action_info` (i.e branch name).


### `lakefs/catalogexport/glue_exporter`

A Package for automating the export process from lakeFS stored tables into Glue catalog.
Expand Down
25 changes: 25 additions & 0 deletions examples/hooks/delta_lake_S3_export.lua
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
--[[
action_args:
- repo
- commit_id

export_delta_args:
- table_paths: ["path/to/table1", "path/to/table2", ...]
- lakefs_key
- lakefs_secret
- region

storage_client:
- put_object: function(bucket, key, data)
]]
local aws = require("aws")
local formats = require("formats")
local delta_export = require("lakefs/catalogexport/delta_exporter")

local sc = aws.s3_client(args.aws.aws_access_key_id, args.aws.aws_secret_access_key, args.aws.aws_region)
Isan-Rivkin marked this conversation as resolved.
Show resolved Hide resolved

local delta_client = formats.delta_client(args.lakefs.access_key_id, args.lakefs.secret_access_key, args.aws.aws_region)
local delta_table_locations = delta_export.export_delta_log(action, args.table_paths, sc.put_object, delta_client)
for t, loc in pairs(delta_table_locations) do
print("Delta Lake exported table \"" .. t .. "\"'s location: " .. loc .. "\n")
end
118 changes: 70 additions & 48 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -3,23 +3,23 @@ module github.com/treeverse/lakefs
go 1.21

require (
cloud.google.com/go v0.110.8 // indirect
cloud.google.com/go/storage v1.33.0
cloud.google.com/go v0.111.0 // indirect
cloud.google.com/go/storage v1.35.1
github.com/apache/thrift v0.19.0
github.com/cockroachdb/pebble v0.0.0-20230106151110-65ff304d3d7a
github.com/cubewise-code/go-mime v0.0.0-20200519001935-8c5762b177d8
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
github.com/deepmap/oapi-codegen v1.5.6
github.com/dgraph-io/ristretto v0.1.1
github.com/fsnotify/fsnotify v1.6.0
github.com/fsnotify/fsnotify v1.7.0
github.com/getkin/kin-openapi v0.53.0
github.com/go-chi/chi/v5 v5.0.10
github.com/go-openapi/swag v0.19.14
github.com/go-test/deep v1.1.0
github.com/gobwas/glob v0.2.3
github.com/golang/mock v1.6.0
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/uuid v1.3.1
github.com/google/uuid v1.4.0
github.com/hashicorp/go-multierror v1.1.1
github.com/hnlq715/golang-lru v0.3.0
github.com/jamiealquiza/tachymeter v2.0.0+incompatible
Expand All @@ -45,42 +45,43 @@ require (
github.com/vbauerster/mpb/v5 v5.4.0
github.com/xitongsys/parquet-go v1.6.2
github.com/xitongsys/parquet-go-source v0.0.0-20230607234618-40034c8066df
golang.org/x/crypto v0.14.0
golang.org/x/oauth2 v0.13.0
golang.org/x/term v0.13.0
google.golang.org/api v0.147.0
golang.org/x/crypto v0.16.0
golang.org/x/oauth2 v0.15.0
golang.org/x/term v0.15.0
google.golang.org/api v0.152.0
google.golang.org/protobuf v1.31.0
gopkg.in/natefinch/lumberjack.v2 v2.0.0
gopkg.in/yaml.v3 v3.0.1
)

require (
cloud.google.com/go/compute v1.23.0 // indirect
golang.org/x/sync v0.4.0
cloud.google.com/go/compute v1.23.3 // indirect
golang.org/x/sync v0.5.0
)

require (
cloud.google.com/go/compute/metadata v0.2.3
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.8.0
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.9.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.4.0
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos v0.3.6
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.2.0
github.com/IBM/pgxpoolprometheus v1.1.1
github.com/Shopify/go-lua v0.0.0-20221004153744-91867de107cf
github.com/alitto/pond v1.8.3
github.com/antonmedv/expr v1.15.3
github.com/aws/aws-sdk-go-v2 v1.23.4
github.com/aws/aws-sdk-go-v2/config v1.25.10
github.com/aws/aws-sdk-go-v2/credentials v1.16.8
github.com/aws/aws-sdk-go-v2 v1.23.5
github.com/aws/aws-sdk-go-v2/config v1.25.11
github.com/aws/aws-sdk-go-v2/credentials v1.16.9
github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.12.7
github.com/aws/aws-sdk-go-v2/feature/dynamodb/expression v1.6.7
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.15.3
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.15.4
github.com/aws/aws-sdk-go-v2/service/dynamodb v1.26.1
github.com/aws/aws-sdk-go-v2/service/glue v1.71.1
github.com/aws/aws-sdk-go-v2/service/s3 v1.47.1
github.com/aws/aws-sdk-go-v2/service/sts v1.26.1
github.com/aws/aws-sdk-go-v2/service/s3 v1.47.2
github.com/aws/aws-sdk-go-v2/service/sts v1.26.2
github.com/aws/smithy-go v1.18.1
github.com/benburkert/dns v0.0.0-20190225204957-d356cf78cdfc
github.com/csimplestring/delta-go v0.0.0-20231105162402-9b93ca02cedf
github.com/dgraph-io/badger/v4 v4.2.0
github.com/georgysavva/scany/v2 v2.0.0
github.com/go-co-op/gocron v1.35.2
Expand All @@ -98,54 +99,74 @@ require (
)

require (
cloud.google.com/go/iam v1.1.2 // indirect
cloud.google.com/go/iam v1.1.5 // indirect
github.com/Azure/azure-sdk-for-go v68.0.0+incompatible // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.1.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.5.0 // indirect
github.com/Azure/go-autorest v14.2.0+incompatible // indirect
github.com/Azure/go-autorest/autorest/to v0.4.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.2.0 // indirect
github.com/ahmetb/go-linq/v3 v3.2.0 // indirect
github.com/apache/arrow/go/arrow v0.0.0-20200730104253-651201b0f516 // indirect
github.com/aws/aws-sdk-go v1.48.11 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.5.3 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.14.8 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.2.7 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.5.7 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.14.9 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.5.8 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.7.1 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.2.7 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/service/dynamodbstreams v1.18.1 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.10.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.2.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.8.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.10.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.16.7 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.18.1 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.21.1 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.10.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.16.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.18.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.21.2 // indirect
github.com/barweiss/go-tuple v1.1.2 // indirect
github.com/benbjohnson/clock v1.3.0 // indirect
github.com/deckarep/golang-set/v2 v2.5.0 // indirect
github.com/fraugster/parquet-go v0.12.0 // indirect
github.com/getsentry/sentry-go v0.16.0 // indirect
github.com/go-sql-driver/mysql v1.7.0 // indirect
github.com/golang-jwt/jwt/v5 v5.0.0 // indirect
github.com/golang/glog v1.1.0 // indirect
github.com/go-logr/logr v1.3.0 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/golang-jwt/jwt/v5 v5.2.0 // indirect
github.com/golang/glog v1.1.2 // indirect
github.com/google/flatbuffers v2.0.0+incompatible // indirect
github.com/google/s2a-go v0.1.7 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.1 // indirect
github.com/google/wire v0.5.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/yamux v0.0.0-20180604194846-3520598351bb // indirect
github.com/jackc/puddle/v2 v2.2.1 // indirect
github.com/klauspost/cpuid/v2 v2.2.5 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lib/pq v1.10.9 // indirect
github.com/mitchellh/go-testing-interface v0.0.0-20171004221916-a61a99592b77 // indirect
github.com/mitchellh/hashstructure/v2 v2.0.2 // indirect
github.com/oklog/run v1.0.0 // indirect
github.com/pelletier/go-toml/v2 v2.1.0 // indirect
github.com/pierrec/lz4/v4 v4.1.8 // indirect
github.com/pkg/browser v0.0.0-20210911075715-681adbf594b8 // indirect
github.com/repeale/fp-go v0.11.1 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/rogpeppe/go-internal v1.10.0 // indirect
github.com/rotisserie/eris v0.5.4 // indirect
github.com/rs/dnscache v0.0.0-20211102005908-e0241e321417 // indirect
github.com/sagikazarmark/locafero v0.3.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
github.com/samber/mo v1.11.0 // indirect
github.com/shopspring/decimal v1.3.1 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
go.uber.org/multierr v1.9.0 // indirect
github.com/ulule/deepcopier v0.0.0-20200430083143-45decc6639b6 // indirect
github.com/xhit/go-str2duration/v2 v2.1.0 // indirect
go.opentelemetry.io/otel v1.21.0 // indirect
go.opentelemetry.io/otel/metric v1.21.0 // indirect
go.opentelemetry.io/otel/trace v1.21.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
gocloud.dev v0.34.1-0.20231122211418-53ccd8db26a1 // indirect
golang.org/x/time v0.5.0 // indirect
gonum.org/v1/gonum v0.9.3 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20231002182017-d307bd883b97 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231009173412-8bfb1ae86b6c // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20231127180814-3a041ad873d4 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231127180814-3a041ad873d4 // indirect
)

require (
Expand Down Expand Up @@ -175,7 +196,6 @@ require (
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/google/go-cmp v0.5.9 // indirect
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
github.com/googleapis/gax-go/v2 v2.12.0 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
Expand Down Expand Up @@ -218,16 +238,18 @@ require (
github.com/xeipuuv/gojsonschema v1.2.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.uber.org/atomic v1.11.0
golang.org/x/exp v0.0.0-20230905200255-921286631fa9
golang.org/x/mod v0.12.0 // indirect
golang.org/x/net v0.17.0
golang.org/x/sys v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/tools v0.13.0 // indirect
golang.org/x/xerrors v0.0.0-20220907171357-04be3eba64a2 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20231002182017-d307bd883b97 // indirect
google.golang.org/grpc v1.58.3
golang.org/x/exp v0.0.0-20231127185646-65229373498e
golang.org/x/mod v0.14.0 // indirect
golang.org/x/net v0.19.0
golang.org/x/sys v0.15.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/tools v0.16.0 // indirect
golang.org/x/xerrors v0.0.0-20231012003039-104605ab7028 // indirect
google.golang.org/appengine v1.6.8 // indirect
google.golang.org/genproto v0.0.0-20231127180814-3a041ad873d4 // indirect
google.golang.org/grpc v1.59.0
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
)

replace github.com/csimplestring/delta-go => github.com/treeverse/delta-go v0.0.0-20231203131847-a5acc36c8ba5
Loading
Loading