Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Table Extractor Hook and _lakefs_tables format #6589
Table Extractor Hook and _lakefs_tables format #6589
Changes from 11 commits
f51a49e
22c67d1
a44293e
a6bec34
8755a70
0327daa
3096211
83ef166
4f02c4d
9892648
ac5978f
e1d8d03
72dc0d1
ef891ae
9dbb36f
0429e67
ce76e29
f4c3a96
ec5ddd6
c85c426
eb21f45
77bd52d
bdd7c2b
82fde2f
be77969
9f1de19
7e80978
e4e55bd
f31675d
cbc6ee0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you're saying, please explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
lakefs_paginiated_api
is the money maker - I can use it with multiple api supplied by the lakefs package.The
common
package undercatalogexport
use it just for listing object and expose this functionality only.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
lakefs_paginiated_api
is still not exportedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When someone needs
lakefs_paginiated_api
outside of the package, we can expose it - no? what am i missing?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning (data, err) is Go practice not Lua.
Indicate invalid state by calling an error or by return nil or false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we export a function from a different module I can require?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the end goal is for the user to require 1 single package and
lib.lua
will be the interface for that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we like single package we need to expose it as catalogexport?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ipairs as we run over split result is index based array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you switch between the inner and outer loop you will perform
perfix
concat one per column.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed that code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when there is a match we can break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed that code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I got it right the
extract_partition_prefix_from_path
usersget_parition_values
and the end result is getting the part of the object path based on the paritions / columns.I assume that the columns are ordered based on the same structure as the object path.
Also, that the number of partitions must match the path information.
Base on the above I suggest the following:
It should return the relevant part of the path based on the partitioning used to write the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better thanks! Did almost that (added comment in the code) 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is relevant to some functions here - how do we plan to support delimiter?
As I see it, we can use delimiter and pass it to list object as long as it is not the default delimiter.
When using a delimiter for listing we will not get recursive listing under prefix - it means that we will not get any partition information.
Please verify that we need recursive listing, and if this is the case I prefer not to pass any delimiter to these APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the support for delimiter, we only need recursive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition_entries should be the same level as target_partition? or it means that the caller will have to handle multiple calls that will have the same target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be like this, not sure what you mean? From testing it looks fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of recursive listing, all are objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we are not using yield and trust our after it can have very bad impact on our users as we may do call per record where each call we pull a page of records and continue to request per record in case of unique values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed but, when we need we can add coroutine (yield) into the runtime and improve this, for now I think that's the tradeoff of paging over partitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the issue is not with paging is multiple listing over lakefs because of partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see a reason to generate a class for hive table - we can just return a table with this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getter method that perform marshal should name differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module should extract this function - I don't see any reason to have this one as an object and initialize it with repo, commit, ref while for listing I don't use ref and for get table in specific case I use it.
Should be enable to get a list of tables from the function.
About get_table - it does more than get it uses the ref we provide to
HiveTable
- so I think getting a table is just parse yaml object, the rest is related to hive and should be handle by hive function.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed everything :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the code now a bit to make sure the rest of the path is considered.
Trying to make it work with only is_hidden params (suffix, separator) makes it look very confusing and convoluted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local
for temporary variablesif
is hard to read we can extract the checkThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 👍