Add parameters to function copyTable like ignoredColumns, selected column #29

Open
ilyasse05 opened this issue Jan 24, 2023 · 3 comments

Comments

@ilyasse05

Hi all,
This is a proposed improvement to the copyTable function: add optional parameters such as
SelectedColumns: the columns we want to copy
IgnoredColumns: the columns we don't need and want to leave out
Filter: a filter to apply when we only need to copy part of the data

I think this will be useful.
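
To make the proposal concrete, here is a minimal sketch of how the selected and ignored column lists could be combined into the final list of columns to copy. It is plain Python, independent of Spark, and the function name `resolve_columns` and its parameters are hypothetical; nothing here is part of the existing copyTable API.

```python
from typing import Iterable, List, Optional


def resolve_columns(
    all_columns: Iterable[str],
    selected_columns: Optional[Iterable[str]] = None,
    ignored_columns: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the columns to copy, in the source table's column order.

    Hypothetical helper: selected_columns defaults to every column,
    ignored_columns defaults to none.
    """
    all_columns = list(all_columns)
    selected = list(selected_columns) if selected_columns is not None else all_columns
    ignored = set(ignored_columns or [])

    # Assumption: a misspelled column name should fail loudly rather than
    # silently copy more or less data than intended.
    unknown = [c for c in selected + sorted(ignored) if c not in all_columns]
    if unknown:
        raise ValueError(f"Unknown columns: {unknown}")

    selected_set = set(selected)
    return [c for c in all_columns if c in selected_set and c not in ignored]
```

With a Spark DataFrame `df`, the result could be passed to `df.select(...)` using `df.columns` as the full column list, and the optional filter could be applied with `df.where(...)` before writing to the destination.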

@MrPowers
Collaborator

Seems like a reasonable suggestion, but perhaps we'd need a new name for that functionality. When I copy "something", I expect the source and destination to have the same data.

The copyTable function is a weak abstraction in the first place and the Delta Lake CLONE functionality seems a lot better. When you replicate a Delta Table in a different location, you really want to copy over both the data and the transaction log.

The current implementation copies over the data, but you lose all the transaction log info.

@brayanjuls - thoughts?

@brayanjuls
Collaborator

The use case proposed here seems like TL (transform and load) to me, where the transformations are simple ones such as filtering rows or adding and removing columns. Do you see any advantage in abstracting this into a function? I think the way we do this in Spark is already simple. @ilyasse05, could you please describe some use cases for these additions?

If we get to implement this, I agree that it needs to be a new function with another name.

Regarding the copy function, I think it would be good to have a full implementation: not only a shallow clone, but also an option that lets us do a deep clone. With that in place, we should rename the function to match the functionality that already exists in DB. @MrPowers, do you think we can open a ticket to implement that?

@ilyasse05
Author

@MrPowers @brayanjuls
Ah okay, I understand the objective, so I think it's better to rename the function to replicateTable() or cloneTable. The use case for that functionality is cloning the whole table plus its log to another environment (Dev, Test, PreProd) or to an archive.

The use case I want to suggest is similar, with some differences. For example, we need to copy data from Production to an exploration environment while hashing the values of some columns or simply ignoring some columns, and we need to add a technical column such as the timestamp of the copy.

These are some of the real problems that my team and I have run into across different projects and use cases.
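
As an illustration of that use case, here is a minimal sketch in plain Python that works on rows represented as dictionaries rather than on a real Delta table. The function name `mask_rows`, its parameters, the `copied_at` column name, and the choice of SHA-256 are all assumptions made for the example; a Spark version would apply the same steps to DataFrame columns.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Collection, Dict, Iterable, List, Optional


def mask_rows(
    rows: Iterable[Dict[str, Any]],
    hashed_columns: Collection[str] = (),
    ignored_columns: Collection[str] = (),
    copied_at: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Copy rows, hashing some columns, dropping others, and stamping the copy time.

    Hypothetical helper for illustration only.
    """
    # One timestamp for the whole copy, so every row carries the same value.
    stamp = (copied_at or datetime.now(timezone.utc)).isoformat()

    result = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if name in ignored_columns:
                continue  # leave this column out of the copy
            if name in hashed_columns and value is not None:
                # Replace the value with its SHA-256 hex digest; nulls stay null.
                value = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            out[name] = value
        out["copied_at"] = stamp  # technical column added by the copy
        result.append(out)
    return result
```

Hashing keeps equal values equal, so joins and group-bys on the masked column still work in the exploration environment, though a plain unsalted hash of low-cardinality values can be reversed by brute force.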
