Python Wrapper: Make ImportManager single use #7108
Changes from 1 commit
054bc16
db8eccb
f7a0d6c
1e956f8
ff87133
73c0ca9
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -153,7 +153,7 @@ def uncommitted(self, max_amount: Optional[int], after: Optional[str] = None, pr | |
**kwargs): | ||
yield Change(**diff.dict()) | ||
|
||
def import_data(self, commit_message: str, metadata: Optional[dict] = None) -> ImportManager: | ||
def import_data(self, commit_message: Optional[str] = "", metadata: Optional[dict] = None) -> ImportManager: | ||
If it's optional, it probably shouldn't default to an empty string.

It's optional in the SDK but a required parameter in the auto-generated code. An empty string means the server's default commit message for imports is used.

This comment makes me suddenly uncertain: What happens if I pass an empty

Passing commit_message=None will result in an error that the parameter is missing. Adding this to the docstring.
||
""" | ||
Import data to lakeFS | ||
|
||
|
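The thread above distinguishes three cases for `commit_message`: a real message, the empty string (the server picks its default import message), and `None` (rejected as a missing parameter). A minimal sketch of that contract follows. `build_commit_payload` is a hypothetical helper written for illustration, and raising `ValueError` is an assumption; the actual error raised by the lakeFS SDK is not shown in this PR.

```python
from typing import Optional


def build_commit_payload(commit_message: Optional[str] = "",
                         metadata: Optional[dict] = None) -> dict:
    """Hypothetical helper illustrating the contract discussed in the review.

    ""   -> forwarded as-is; per the thread, the server then uses its default message
    None -> rejected, because the generated client requires the field
    """
    if commit_message is None:
        # Assumption: ValueError stands in for whatever the real client raises.
        raise ValueError("commit_message is required; pass '' to use the server default")
    return {"message": commit_message, "metadata": metadata or {}}
```

Keeping `""` as the default lets callers write `branch.import_data()` with no arguments, which is how the second import in the test below is started.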
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -85,6 +85,8 @@ def start(self) -> str: | |
""" | ||
if self._in_progress: | ||
raise ImportManagerException("Import in progress") | ||
if self._import_id is not None: | ||
If it's a single-use object used like this, why isn't it a function?

Several reasons:

AFAIK the Pythonic way (and also in many other cases) of keeping state for async execution is to return a state object that holds the coro, as asyncio.create_task does, for instance. In fact an import could be a Task, and I imagine it will end up at least duck-typing like a Task.

A good way to achieve fluidity, but only before calling a "start" method, is the builder pattern: you build a descriptor for the executor, and call

AFAICT each important method of your single class can only be called either before execution or after execution. That's why ImportManager needs a run-time state. If we split

That's a valid point and we can perhaps refactor this in the future. I'm deferring it because it requires changes in the implementation.
||
raise ImportManagerException("Import Manager can only be used once") | ||
|
||
creation = lakefs_sdk.ImportCreation(paths=self.sources, | ||
commit=lakefs_sdk.CommitCreation(message=self.commit_message, | ||
|
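The run-time guard added in this hunk and the builder-style alternative raised in the review can be compared side by side. This is a sketch only: `SingleUseManager`, `ImportSpec`, and `ImportHandle` are hypothetical names, the exception class is a local stand-in, and no lakeFS calls are made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ImportManagerException(Exception):
    """Local stand-in for lakefs.exceptions.ImportManagerException."""


class SingleUseManager:
    """Run-time guard, mirroring the checks shown in this hunk."""

    def __init__(self) -> None:
        self._import_id: Optional[str] = None
        self._in_progress = False

    def start(self) -> str:
        if self._in_progress:
            raise ImportManagerException("Import in progress")
        if self._import_id is not None:
            raise ImportManagerException("Import Manager can only be used once")
        self._import_id = "import-1"  # placeholder; a real ID would come from the server
        self._in_progress = True
        return self._import_id


@dataclass
class ImportHandle:
    """Post-start state only: it has no methods for adding sources or starting."""
    import_id: str


@dataclass
class ImportSpec:
    """Pre-start state only: a builder that describes the import."""
    sources: List[str] = field(default_factory=list)

    def prefix(self, uri: str) -> "ImportSpec":
        self.sources.append(uri)
        return self  # fluent chaining, in the style used by the tests

    def start(self) -> ImportHandle:
        return ImportHandle(import_id="import-1")  # placeholder ID
```

In the split version the "before" and "after" methods live on different types, so misuse surfaces as a missing attribute rather than a custom exception. The cost, as noted in the reply, is a change to the existing implementation.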
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,8 @@ | ||
from time import sleep | ||
|
||
import pytest | ||
|
||
from lakefs import Client | ||
from lakefs.exceptions import ImportManagerException, ConflictException | ||
from tests.utests.common import expect_exception_context | ||
|
||
|
@@ -14,8 +17,14 @@ | |
"nested/prefix-7/file000101", ] | ||
|
||
|
||
def skip_on_unsupported_blockstore(clt: Client, supported_blockstores: [str]): | ||
if clt.storage_config.blockstore_type not in supported_blockstores: | ||
pytest.skip(f"Unsupported blockstore type for test: {clt.storage_config.blockstore_type}") | ||
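One detail worth checking in this helper: the parameter is annotated as a list, but the tests below pass the bare string `"s3"`. With a string on the right-hand side, `not in` performs a substring test rather than list membership. The snippet below only illustrates that Python behaviour; `is_supported` is a hypothetical function and the blockstore names are examples, not a list of lakeFS blockstore types.

```python
def is_supported(blockstore_type: str, supported) -> bool:
    # Same membership test as the helper, without the pytest.skip side effect.
    return blockstore_type in supported


# List on the right-hand side: only exact matches count.
print(is_supported("s3", ["s3"]))  # True
print(is_supported("s", ["s3"]))   # False

# String on the right-hand side: any substring matches.
print(is_supported("s", "s3"))     # True
```

Passing `["s3"]` at the call sites would make the behaviour match the annotation.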
|
||
|
||
def test_import_manager(setup_repo): | ||
_, repo = setup_repo | ||
clt, repo = setup_repo | ||
skip_on_unsupported_blockstore(clt, "s3") | ||
branch = repo.branch("import-branch").create("main") | ||
mgr = branch.import_data(commit_message="my imported data", metadata={"foo": "bar"}) | ||
|
||
|
@@ -32,17 +41,22 @@ def test_import_manager(setup_repo): | |
assert res.commit.metadata.get("foo") == "bar" | ||
assert res.ingested_objects == 0 | ||
|
||
# Expect failure trying to run manager twice | ||
with expect_exception_context(ImportManagerException): | ||
mgr.run() | ||
Comment on lines +45 to +46: 👍
||
|
||
# Import with objects and prefixes | ||
mgr = branch.import_data() | ||
dest_prefix = "imported/new-prefix/" | ||
mgr.prefix(_IMPORT_PATH + "prefix-1/", | ||
dest_prefix + "prefix-1/").prefix(_IMPORT_PATH + "prefix-2/", | ||
dest_prefix + "prefix-2/") | ||
for o in _FILES_TO_CHECK: | ||
mgr.object(_IMPORT_PATH + o, dest_prefix + o) | ||
|
||
mgr.commit_message = "new commit" | ||
mgr.commit_metadata = None | ||
res = mgr.run() | ||
|
||
assert res.error is None | ||
assert res.completed | ||
assert res.commit.id == branch.commit_id() | ||
|
@@ -56,7 +70,8 @@ def test_import_manager(setup_repo): | |
|
||
|
||
def test_import_manager_cancel(setup_repo): | ||
_, repo = setup_repo | ||
clt, repo = setup_repo | ||
skip_on_unsupported_blockstore(clt, "s3") | ||
branch = repo.branch("import-branch").create("main") | ||
expected_commit_id = branch.commit_id() | ||
expected_commit_message = branch.commit_message() | ||
|
@@ -66,6 +81,10 @@ def test_import_manager_cancel(setup_repo): | |
|
||
mgr.start() | ||
sleep(1) | ||
|
||
with expect_exception_context(ImportManagerException): | ||
mgr.start() | ||
|
||
mgr.cancel() | ||
|
||
status = mgr.status() | ||
|
Creating these is probably out of scope for this PR. But doing so manually is a recipe for partial implementation; we're bound to miss some top-level class.

I'm not sure I understand the comment.
We are doing this only for the Repository class, and just to align the syntax

The comment is twofold:

instead of
Other classes are dependent on the context of their "parent" class; we expect that they will usually be initialized by a call to the respective method of the parent.
If it was up to me I wouldn't have done even the first, but we got some comments about it from users.
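The convention described in the last comment, where a child object is obtained through a method on its parent (as `repo.branch(...)` is in the tests above), can be sketched as follows. The `Repository` and `Branch` classes here are simplified stand-ins written for illustration, not the lakeFS SDK classes.

```python
class Branch:
    """Depends on its parent repository for context."""

    def __init__(self, repo_id: str, branch_id: str) -> None:
        self.repo_id = repo_id
        self.id = branch_id


class Repository:
    """Top-level class: may be constructed directly by the user."""

    def __init__(self, repo_id: str) -> None:
        self.id = repo_id

    def branch(self, branch_id: str) -> Branch:
        # The parent supplies its own context, so callers never pass repo_id by hand.
        return Branch(self.id, branch_id)


branch = Repository("example-repo").branch("import-branch")
print(branch.repo_id, branch.id)  # example-repo import-branch
```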