Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta Lake Catalog Exporter #7078

Merged
merged 38 commits into from
Dec 11, 2023
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
c446901
delta client impl
Jonathan-Rosenberg Nov 23, 2023
8e8ddca
delta exporter
Jonathan-Rosenberg Nov 23, 2023
8bb3f1c
Use the earliest available version to build the Delta log
Jonathan-Rosenberg Nov 27, 2023
5adf15d
pass listen address to hooks
Jonathan-Rosenberg Nov 27, 2023
37adb94
add stat_object api call to the lakefs lua client
Jonathan-Rosenberg Nov 27, 2023
cd7b8b6
add get_repo to eventually fetch its storage namespace
Jonathan-Rosenberg Nov 27, 2023
c3bc982
delta lake exporter example
Jonathan-Rosenberg Nov 29, 2023
8a81e4e
add error return. remove redundant lines. add godocs for some funcitons
Jonathan-Rosenberg Nov 29, 2023
c4f520a
extract get_storage_namespace to the internal lua file.
Jonathan-Rosenberg Nov 29, 2023
b2b81aa
remove comment
Jonathan-Rosenberg Nov 29, 2023
587ed18
go mod tidy
Jonathan-Rosenberg Nov 29, 2023
bde6581
pass aws properties directly to fetchS3Table
Jonathan-Rosenberg Nov 29, 2023
4df6b5a
json encoding refactor
Jonathan-Rosenberg Nov 29, 2023
c49c42e
fix load test
Jonathan-Rosenberg Nov 29, 2023
e5013b9
fix tests
Jonathan-Rosenberg Nov 29, 2023
7318c64
fix tests
Jonathan-Rosenberg Nov 29, 2023
2bac82e
goimports
Jonathan-Rosenberg Nov 29, 2023
9bd7a76
add indentation to lua test
Jonathan-Rosenberg Nov 30, 2023
855a443
remove storage_utils.lua
Jonathan-Rosenberg Nov 30, 2023
74eb76b
Rename `ListeningAddress` (and `listeningAddress`) to `ServerAddress`.
Jonathan-Rosenberg Nov 30, 2023
0866a3c
fix test
Jonathan-Rosenberg Nov 30, 2023
d0bb96e
lint
Jonathan-Rosenberg Nov 30, 2023
e5ca6b8
pass a delta client to the export_delta function
Jonathan-Rosenberg Nov 30, 2023
0b6853d
upgrade aws-sdk-v2
Jonathan-Rosenberg Nov 30, 2023
a028ac8
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 3, 2023
e32dc16
go mod tidy
Jonathan-Rosenberg Dec 3, 2023
aa476a8
use get_storage_prefix from the internal package. Remove it from docs
Jonathan-Rosenberg Dec 3, 2023
0bae4ba
remove get_storage_namespace and the `get_repo` method from lakeFS cl…
Jonathan-Rosenberg Dec 3, 2023
a603196
use get_storage_prefix from the internal package
Jonathan-Rosenberg Dec 3, 2023
9e522f5
use correct error when returning
Jonathan-Rosenberg Dec 3, 2023
178107d
comment the delta_log_entry_key_generator function and return a compl…
Jonathan-Rosenberg Dec 3, 2023
a687c91
use a writer_client function instead of a storage client when passed …
Jonathan-Rosenberg Dec 3, 2023
98c3ac2
updated version of delta go
Jonathan-Rosenberg Dec 3, 2023
590f2e3
severAddress -> lakeFSAddr + comment
Jonathan-Rosenberg Dec 3, 2023
9de27d8
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 6, 2023
d9cde83
change delta client builder API and drop storage type (since always u…
Jonathan-Rosenberg Dec 6, 2023
dd4c81d
rename the lakeFS address field and add clarifying comment
Jonathan-Rosenberg Dec 6, 2023
2b9f8a2
Merge branch 'master' into feature/catalog-export/delta
Jonathan-Rosenberg Dec 6, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions cmd/lakefs/cmd/run.go
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,7 @@ var runCmd = &cobra.Command{
idGen,
bufferedCollector,
actions.Config(cfg.Actions),
cfg.ListenAddress,
)

// wire actions into entry catalog
Expand Down
30 changes: 30 additions & 0 deletions examples/hooks/delta_lake_S3_export.lua
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
--[[
action_args:
- repo
- commit_id

export_delta_args:
- table_paths: ["path/to/table1", "path/to/table2", ...]
- lakefs_key
- lakefs_secret
- region

storage_client:
- put_object: function(bucket, key, data)
]]

local export_delta_args = {
table_paths = args.table_paths,
lakefs_key = args.lakefs.access_key_id,
lakefs_secret = args.lakefs.secret_access_key,
region = args.aws.aws_region
}

local aws = require("aws")
local delta = require("lakefs/catalogexport/delta_exporter")
local sc = aws.s3_client(args.aws.aws_access_key_id, args.aws.aws_secret_access_key, args.aws.aws_region)
Isan-Rivkin marked this conversation as resolved.
Show resolved Hide resolved

local delta_table_locations = delta.export_delta_log(action, export_delta_args, sc)
for t, loc in pairs(delta_table_locations) do
print("Delta Lake exported table \"" .. t .. "\"'s location: " .. loc .. "\n")
end
22 changes: 19 additions & 3 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ require (
github.com/aws/aws-sdk-go-v2/service/sts v1.23.2
github.com/aws/smithy-go v1.15.0
github.com/benburkert/dns v0.0.0-20190225204957-d356cf78cdfc
github.com/csimplestring/delta-go v0.0.0-20231105162402-9b93ca02cedf
github.com/dgraph-io/badger/v4 v4.2.0
github.com/georgysavva/scany/v2 v2.0.0
github.com/go-co-op/gocron v1.35.2
Expand All @@ -102,7 +103,10 @@ require (
cloud.google.com/go/iam v1.1.2 // indirect
github.com/Azure/azure-sdk-for-go v68.0.0+incompatible // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
github.com/Azure/go-autorest v14.2.0+incompatible // indirect
github.com/Azure/go-autorest/autorest/to v0.4.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.1.1 // indirect
github.com/ahmetb/go-linq/v3 v3.2.0 // indirect
github.com/apache/arrow/go/arrow v0.0.0-20200730104253-651201b0f516 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.14 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.13.13 // indirect
Expand All @@ -118,32 +122,42 @@ require (
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.15.6 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.15.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.17.3 // indirect
github.com/barweiss/go-tuple v1.1.1 // indirect
github.com/benbjohnson/clock v1.3.0 // indirect
github.com/deckarep/golang-set/v2 v2.3.1 // indirect
github.com/fraugster/parquet-go v0.12.0 // indirect
github.com/getsentry/sentry-go v0.16.0 // indirect
github.com/go-sql-driver/mysql v1.7.0 // indirect
github.com/golang-jwt/jwt/v5 v5.0.0 // indirect
github.com/golang/glog v1.1.0 // indirect
github.com/google/flatbuffers v2.0.0+incompatible // indirect
github.com/google/s2a-go v0.1.7 // indirect
github.com/google/wire v0.5.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.1 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/yamux v0.0.0-20180604194846-3520598351bb // indirect
github.com/jackc/puddle/v2 v2.2.1 // indirect
github.com/klauspost/cpuid/v2 v2.2.5 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lib/pq v1.10.9 // indirect
github.com/mitchellh/go-testing-interface v0.0.0-20171004221916-a61a99592b77 // indirect
github.com/mitchellh/hashstructure/v2 v2.0.2 // indirect
github.com/oklog/run v1.0.0 // indirect
github.com/pelletier/go-toml/v2 v2.1.0 // indirect
github.com/pierrec/lz4/v4 v4.1.8 // indirect
github.com/pkg/browser v0.0.0-20210911075715-681adbf594b8 // indirect
github.com/repeale/fp-go v0.11.1 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/rogpeppe/go-internal v1.10.0 // indirect
github.com/rotisserie/eris v0.5.4 // indirect
github.com/rs/dnscache v0.0.0-20211102005908-e0241e321417 // indirect
github.com/sagikazarmark/locafero v0.3.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
github.com/samber/mo v1.10.0 // indirect
github.com/shopspring/decimal v1.3.1 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
go.uber.org/multierr v1.9.0 // indirect
github.com/ulule/deepcopier v0.0.0-20200430083143-45decc6639b6 // indirect
github.com/xhit/go-str2duration/v2 v2.1.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
gocloud.dev v0.34.0 // indirect
gonum.org/v1/gonum v0.9.3 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20231002182017-d307bd883b97 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231009173412-8bfb1ae86b6c // indirect
Expand Down Expand Up @@ -232,3 +246,5 @@ require (
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
)

replace github.com/csimplestring/delta-go => github.com/treeverse/delta-go v0.0.0-20231128165607-87acc8332e05
Loading
Loading