Catalog to config #4323

Open
wants to merge 79 commits into
base: main
ElenaKhaustova
Contributor

@ElenaKhaustova ElenaKhaustova commented Nov 12, 2024

Description

Made on top of #4347 (review it first)

Implementation of #4329

Full context: #3932 (comment)

TODO:

Development notes

To test, run pytest tests/io/test_kedro_data_catalog.py::TestKedroDataCatalog::TestKedroDataCatalogFromConfig, or see the example in the "How to test" section of #4329.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Elena Khaustova <[email protected]>
@ElenaKhaustova ElenaKhaustova changed the title Catalog from/to dict prototype Catalog to config prototype Nov 13, 2024
@ElenaKhaustova
Contributor Author

Is it worth splitting the version validation PR separately? Or is it tied to the "catalog to config" prototype in any way?

#4347

@ElenaKhaustova ElenaKhaustova marked this pull request as draft November 27, 2024 15:47
@ElenaKhaustova ElenaKhaustova changed the title Catalog to config prototype Catalog to_config() Nov 27, 2024
@ElenaKhaustova ElenaKhaustova changed the title Catalog to_config() Catalog to config Nov 27, 2024
@ElenaKhaustova ElenaKhaustova marked this pull request as ready for review November 27, 2024 20:36
Member

@merelcht merelcht left a comment

Thanks @ElenaKhaustova, really great work! 🌟

@@ -28,6 +28,7 @@ def _create_session(package_name: str, **kwargs: Any) -> KedroSession:


def is_parameter(dataset_name: str) -> bool:
    # TODO: when breaking change move it to kedro/io/core.py
Member

Is this todo still needed?

"""Converts the `KedroDataCatalog` instance into a configuration format suitable for
serialization. This includes datasets, credentials, and versioning information.
This method is only applicabe to catalogs that contain datasets initialized with static, primitive
Member

Suggested change
This method is only applicabe to catalogs that contain datasets initialized with static, primitive
This method is only applicable to catalogs that contain datasets initialized with static, primitive

@@ -237,8 +237,9 @@ def _extract_patterns(

return sorted_patterns, user_default

@classmethod
def _resolve_config_credentials(
Member

Can we drop the word config from the name of the method as well as the method below? What other credentials can they be except config credentials in the context of this class?

# Declares a class-level attribute that will store the initialization
# arguments of an instance. Initially, it is set to None, but it will
# hold a dictionary of arguments after initialization.
_init_args: dict[str, Any] | None = None
Member

Isn't this shared across all instances? How do we make sure we're not overwriting everything with the latest instance's args?
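For reference on the question above, a minimal sketch of how a class-level default behaves when each instance assigns its own value. The `Dataset` class here is a hypothetical stand-in, not the class from this PR, and it assumes the arguments are stored by assignment through `self` rather than by mutating a shared object.

```python
from typing import Any, Dict, Optional


class Dataset:
    # Class-level default shared by every instance until one assigns its own value.
    _init_args: Optional[Dict[str, Any]] = None

    def __init__(self, **kwargs: Any) -> None:
        # Assigning through self creates an instance attribute that shadows
        # the class-level default; the class attribute itself is untouched.
        self._init_args = dict(kwargs)


first = Dataset(filepath="a.csv")
second = Dataset(filepath="b.csv")

print(first._init_args)    # {'filepath': 'a.csv'}
print(second._init_args)   # {'filepath': 'b.csv'}
print(Dataset._init_args)  # None
```

Overwriting would only become a risk if the code assigned to the class (for example `cls._init_args = ...`) or mutated a shared mutable default in place.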

method of the instance is called with the arguments used to initialize
the object.
"""

Member

Suggested change
@wraps(init_func)

You also need to import it at the top: from functools import wraps
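To illustrate what the suggested `@wraps` buys, here is a small standalone sketch. The function names `original`, `plain_wrapper`, and `wrapped` are made up for the example and do not come from the PR.

```python
from functools import wraps


def original(self, filepath: str) -> None:
    """Create the dataset from a file path."""


def plain_wrapper(self, *args, **kwargs):
    # Without wraps, the wrapper keeps its own name and has no docstring.
    return original(self, *args, **kwargs)


@wraps(original)
def wrapped(self, *args, **kwargs):
    # With wraps, __name__, __doc__ and related metadata are copied over,
    # and the original function is reachable via __wrapped__.
    return original(self, *args, **kwargs)


print(plain_wrapper.__name__)  # plain_wrapper
print(wrapped.__name__)        # original
print(wrapped.__doc__)         # Create the dataset from a file path.
```

Tools that follow `__wrapped__`, such as `inspect.signature`, will then report the original parameters instead of `(*args, **kwargs)`.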

# Save the original __init__ method of the subclass
init_func: Callable = cls.__init__

def init_decorator(previous_init: Callable) -> Callable:
Member

I believe you can drop the extra wrapping function. This decorator is never applied as a decorator in the code; you just assign the function it returns to cls.__init__, so only the new_init function is required.
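A rough sketch of the flattened shape being suggested, under several assumptions: `Recorded` and `CsvLike` are hypothetical classes, the hook lives in `__init_subclass__`, and `inspect.signature(...).bind` is swapped in for `getcallargs` to build the name-to-value mapping. It is not the PR's implementation.

```python
import inspect
from functools import wraps
from typing import Any, Callable


class Recorded:
    """Hypothetical base class that records constructor arguments per instance."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Save the original (possibly inherited) __init__ of the subclass.
        init_func: Callable = cls.__init__

        @wraps(init_func)
        def new_init(self: Any, *args: Any, **kw: Any) -> None:
            # Call the original __init__ first.
            init_func(self, *args, **kw)
            # Bind the call to the original signature to map names to values.
            bound = inspect.signature(init_func).bind(self, *args, **kw)
            bound.apply_defaults()
            call_args = dict(bound.arguments)
            call_args.pop("self", None)
            # Stored on the instance, so instances do not overwrite each other.
            self._init_args = call_args

        # No outer decorator function: the wrapper is assigned directly.
        cls.__init__ = new_init


class CsvLike(Recorded):
    def __init__(self, filepath: str, sep: str = ",") -> None:
        self.filepath = filepath
        self.sep = sep


ds = CsvLike("data.csv")
print(ds._init_args)  # {'filepath': 'data.csv', 'sep': ','}
```

Because the arguments are saved after the original `__init__` returns, the outermost wrapper in an inheritance chain writes last, which is one way the captured arguments could end up matching the concrete class.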

"""

# Call the original __init__ method
previous_init(self, *args, **kwargs)
Member

Suggested change
previous_init(self, *args, **kwargs)
init_func(self, *args, **kwargs)

Comment on lines +394 to +397
if type(self) is cls:
    # Capture and process the arguments passed to the original __init__
    call_args = getcallargs(init_func, self, *args, **kwargs)
    # Call the custom post-initialization method to save captured arguments
Member

I believe this might not be needed if you set up the function as suggested.

# hold a dictionary of arguments after initialization.
_init_args: dict[str, Any] | None = None

def __post_init__(self, call_args: dict[str, Any]) -> None:
Member

Suggested change
def __post_init__(self, call_args: dict[str, Any]) -> None:
def __post_init__(self, *args, **kwargs) -> None:

Btw, do we even need this method at all? Can't all of this be done by the decorator itself instead of delegating it to a separate method?

@@ -484,14 +602,14 @@ def parse_dataset_definition(
config = copy.deepcopy(config)

# TODO: remove when removing old catalog as moved to KedroDataCatalog
if "type" not in config:
if TYPE_KEY not in config:
Member

👍

load_versions: dict[str, str | None] = {}

for ds_name, ds in self._lazy_datasets.items():
if _is_memory_dataset(ds.config.get("type", "")):
Member

Suggested change
if _is_memory_dataset(ds.config.get("type", "")):
if _is_memory_dataset(ds.config.get(TYPE_KEY, "")):

...maybe?


def _is_memory_dataset(ds_or_type: AbstractDataset | str) -> bool:
    """Check if dataset or str type provided is a MemoryDataset."""
    if isinstance(ds_or_type, AbstractDataset):
Member

Can't we check directly for a MemoryDataset instance here, or is this to avoid circular dependencies?
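To make the trade-off in this question concrete, here is a sketch comparing a name-based check with a direct `isinstance` check. All classes below are stand-ins defined for the example, and `is_memory_by_name` is only a guess at what a name-based helper might look like; the real `_is_memory_dataset` body is not shown in this thread.

```python
class AbstractDataset:
    """Stand-in for the real base class; hypothetical, for illustration only."""


class MemoryDataset(AbstractDataset):
    """Stand-in for the in-memory dataset."""


class CachedMemoryDataset(MemoryDataset):
    """Hypothetical subclass, used to show where the two checks differ."""


class CSVDataset(AbstractDataset):
    """Stand-in for a file-backed dataset."""


def is_memory_by_name(ds_or_type) -> bool:
    # Name-based check: needs no import of MemoryDataset, so it cannot
    # create a circular import, and it also accepts a type string.
    if isinstance(ds_or_type, AbstractDataset):
        return type(ds_or_type).__name__ == "MemoryDataset"
    return ds_or_type.rsplit(".", 1)[-1] == "MemoryDataset"


def is_memory_by_instance(ds) -> bool:
    # Direct check: requires MemoryDataset to be importable here,
    # but it also recognises subclasses.
    return isinstance(ds, MemoryDataset)


print(is_memory_by_name(MemoryDataset()))            # True
print(is_memory_by_name("kedro.io.MemoryDataset"))   # True
print(is_memory_by_name(CachedMemoryDataset()))      # False
print(is_memory_by_instance(CachedMemoryDataset()))  # True
```

The name comparison misses subclasses, while the direct check cannot handle the string form taken from a dataset config, so the choice depends on which inputs the helper has to support.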

4 participants