Manifest List/Entry Creation #172

Open
dwilson1988 opened this issue Oct 15, 2024 · 1 comment

@dwilson1988
Contributor

Feature Request / Improvement

Hello, I'm working on a use case where I need to act as my own catalog and create my own Iceberg tables purely in Go. I understand that table creation through a catalog is one of the design goals, but direct creation of manifests (snapshots, manifest lists/entries, data file metadata) does not appear to be supported unless I'm missing something. My use case is fairly straightforward:

  1. crawl a filesystem/object store for parquet files
  2. gather column level statistics and file level metadata
  3. build up a single snapshot for the results
  4. create table metadata for this snapshot and keep track of this in a separate store.

I could be missing something, but it appears all of the concrete structs are unexported and I don't see any external interface to create them. Is this within the design goals of this module? If so, where does it stand on the priorities? I will be starting work on this in fairly short order and plan to use this module to at least read tables. I'd like to be able to use it to write as well.

I'm more than happy to contribute this, if desired and would love some guidance on how you'd like to see behavior like this implemented.

In addition, I notice a gocloud CDK PR that seems to have stalled out. Seeing as I also need this functionality, I'm happy to help take this across the finish line (though I might take a step back and rethink the design a little bit)

@zeroshade
Member

Thanks for filing this!

> I understand that table creation through a catalog is one of the design goals, but direct creation of manifests (snapshots, manifest lists/entries, data file metadata) does not appear to be supported unless I'm missing something.

Currently we have concrete manifest builder objects in manifest.go for constructing manifest files, while #146 is adding more generalized manifest building, snapshot additions, data file handling, etc.

> Is this within the design goals of this module? If so, where does it stand on the priorities? I will be starting work on this in fairly short order and plan to use this module to at least read tables. I'd like to be able to use it to write as well.
>
> I'm more than happy to contribute this, if desired and would love some guidance on how you'd like to see behavior like this implemented.

It is definitely within the design goals of this module to have full write support to construct metadata, snapshots, partitions and everything. In general, a Builder-pattern approach seems to be the safest for the APIs in this package, since it ensures all of the moving parts are updated appropriately and consistently. A source of inspiration for this package has been to use pyiceberg as a starting point for developing interfaces and then make them more idiomatic for Go.
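
As a rough illustration of what a Builder-style API can look like in Go, here is a minimal sketch. All of the names (`DataFile`, `Snapshot`, `SnapshotBuilder`, `AddFile`, `Build`) are hypothetical and do not reflect the actual iceberg-go API or the work in #146; the point is only that validation and derived totals live in one place.

```go
package main

import (
	"errors"
	"fmt"
)

// DataFile and Snapshot are hypothetical types for illustration only.
type DataFile struct {
	Path        string
	RecordCount int64
}

type Snapshot struct {
	ID           int64
	Files        []DataFile
	TotalRecords int64
}

// SnapshotBuilder accumulates files and remembers the first error,
// so callers can chain calls and check the error once in Build.
type SnapshotBuilder struct {
	id    int64
	files []DataFile
	err   error
}

func NewSnapshotBuilder(id int64) *SnapshotBuilder {
	return &SnapshotBuilder{id: id}
}

// AddFile validates and records one data file.
func (b *SnapshotBuilder) AddFile(path string, records int64) *SnapshotBuilder {
	if b.err != nil {
		return b
	}
	if path == "" || records < 0 {
		b.err = fmt.Errorf("invalid data file %q with %d records", path, records)
		return b
	}
	b.files = append(b.files, DataFile{Path: path, RecordCount: records})
	return b
}

// Build computes derived totals so they cannot drift from the file list.
func (b *SnapshotBuilder) Build() (*Snapshot, error) {
	if b.err != nil {
		return nil, b.err
	}
	if len(b.files) == 0 {
		return nil, errors.New("snapshot has no data files")
	}
	var total int64
	for _, f := range b.files {
		total += f.RecordCount
	}
	files := make([]DataFile, len(b.files))
	copy(files, b.files) // the result does not alias builder state
	return &Snapshot{ID: b.id, Files: files, TotalRecords: total}, nil
}

func main() {
	snap, err := NewSnapshotBuilder(1).
		AddFile("data/a.parquet", 100).
		AddFile("data/b.parquet", 250).
		Build()
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(snap.ID, len(snap.Files), snap.TotalRecords) // prints "1 2 350"
}
```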

I would happily review any PRs that are put up and help get things implemented. My current priorities are on the read side, as you can see from my recent PRs, with write support planned afterwards. But if you are going to be developing it anyway, I'd love the contribution.

> In addition, I notice a gocloud CDK PR that seems to have stalled out. Seeing as I also need this functionality, I'm happy to help take this across the finish line (though I might take a step back and rethink the design a little bit)

That would be fantastic! I would greatly appreciate it.
