[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

cdkini · 2024-12-19T16:20:31Z

Refer to ADR 0005 - we should always have docstrings when dealing with public objects that are rendered in our docs

Description of PR changes above includes a link to an existing GitHub issue
PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
Code is linted - run invoke lint (uses ruff format + ruff check)
Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

netlify · 2024-12-19T16:20:51Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/676498a924c64e6a95f22f4a
😎 Deploy Preview	https://deploy-preview-10799.docs.greatexpectations.io
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

codecov · 2024-12-19T16:53:51Z

❌ 4 Tests Failed:

Tests completed	Failed	Passed	Skipped
2966	4	2962	47

View the top 3 failed tests by shortest run time

tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_s3:PandasS3Datasource]

Stack Traces | 0.2s run time

fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_s3_datasource.PandasS3Datasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasS3Datasource is a Pand#x01 
E                                          #x00+asDatasource that uses Amazon S#x01 
E                                          #x00+3 as a data store.',#x01            
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError

tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_gcs:PandasGoogleCloudStorageDatasource]

Stack Traces | 0.202s run time

fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_google_cloud_storage_datasource.PandasGoogleCloudStorageDatasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasGoogleCloudStorageData#x01 
E                                          #x00+source is a PandasDatasource th#x01 
E                                          #x00+at uses Google Cloud Storage as#x01 
E                                          #x00+ a\ndata store.',#x01               
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError

tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_abs:PandasAzureBlobStorageDatasource]

Stack Traces | 0.206s run time

fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_azure_blob_storage_datasource.PandasAzureBlobStorageDatasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasAzureBlobStorageDataso#x01 
E                                          #x00+urce is a PandasDatasource that#x01 
E                                          #x00+ uses Azure Blob Storage as a\n#x01 
E                                          #x00+data store.',#x01                   
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📢 Thoughts on this report? Let us know!

…m/great-expectations/great_expectations into m/_/enforce_public_api_docstrings

first pass

45dbaa8

cdkini added 4 commits December 19, 2024 11:28

add issues

83e01df

move to test

fc27905

more detailed name

5f7a050

delete issues

b90e50c

cdkini and others added 12 commits December 19, 2024 13:11

some more

5c81607

some more

81454a3

some more

9b039d6

Add docstrings add_asset methods on the sqlite datasource.

5a5716e

some more

8110bd2

Merge branch 'm/_/enforce_public_api_docstrings' of https://github.co…

b9c8e87

…m/great-expectations/great_expectations into m/_/enforce_public_api_docstrings

some more

57a737c

some more

47d5de7

some more

80ddda4

update logic

85677b1

Add docstrings to sql_datasource.py

fb9c09e

schemas

a25b2f0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

cdkini commented Dec 19, 2024 •

edited

Loading

netlify bot commented Dec 19, 2024 •

edited

Loading

codecov bot commented Dec 19, 2024 •

edited

Loading

[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

Are you sure you want to change the base?

[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

Conversation

cdkini commented Dec 19, 2024 • edited Loading

netlify bot commented Dec 19, 2024 • edited Loading

✅ Deploy Preview for niobium-lead-7998 ready!

codecov bot commented Dec 19, 2024 • edited Loading

❌ 4 Tests Failed:

cdkini commented Dec 19, 2024 •

edited

Loading

netlify bot commented Dec 19, 2024 •

edited

Loading

codecov bot commented Dec 19, 2024 •

edited

Loading