Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINTENANCE] Enforce mandatory docstrings for public API decorated objects #10799

Open
wants to merge 17 commits into
base: develop
Choose a base branch
from

Conversation

cdkini
Copy link
Member

@cdkini cdkini commented Dec 19, 2024

Refer to ADR 0005 - we should always have docstrings when dealing with public objects that are rendered in our docs

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

Copy link

netlify bot commented Dec 19, 2024

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/676498a924c64e6a95f22f4a
😎 Deploy Preview https://deploy-preview-10799.docs.greatexpectations.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Dec 19, 2024

❌ 4 Tests Failed:

Tests completed Failed Passed Skipped
2966 4 2962 47
View the top 3 failed tests by shortest run time
tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_s3:PandasS3Datasource]
Stack Traces | 0.2s run time
fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_s3_datasource.PandasS3Datasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasS3Datasource is a Pand#x01 
E                                          #x00+asDatasource that uses Amazon S#x01 
E                                          #x00+3 as a data store.',#x01            
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError
tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_gcs:PandasGoogleCloudStorageDatasource]
Stack Traces | 0.202s run time
fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_google_cloud_storage_datasource.PandasGoogleCloudStorageDatasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasGoogleCloudStorageData#x01 
E                                          #x00+source is a PandasDatasource th#x01 
E                                          #x00+at uses Google Cloud Storage as#x01 
E                                          #x00+ a\ndata store.',#x01               
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError
tests.datasource.fluent.test_schemas::test_vcs_schemas_match[pandas_abs:PandasAzureBlobStorageDatasource]
Stack Traces | 0.206s run time
fluent_ds_or_asset_model = <class 'great_expectations.datasource.fluent.pandas_azure_blob_storage_datasource.PandasAzureBlobStorageDatasource'>
schema_dir = PosixPath('.../datasource/fluent/schemas')

    @pytest.mark.skipif(
        min_supported_python() < PYTHON_VERSION,
        reason=f"_sort_any_of() keys needs to be fixed for py {PYTHON_VERSION}",
    )
    @pytest.mark.timeout(
        2.0  # this is marked as unit so that it will run on different versions of python
    )
    @pytest.mark.unit
    @pytest.mark.parametrize(
        ["fluent_ds_or_asset_model", "schema_dir"],
        [pytest.param(t[0], t[1], id=t[2]) for t in _models_and_schema_dirs()],
    )
    def test_vcs_schemas_match(  # noqa: C901
        fluent_ds_or_asset_model: Type[Datasource | DataAsset], schema_dir: pathlib.Path
    ):
        """
        Test that json schemas for each DataSource and DataAsset match the current schema
        under version control.
    
        This is important because some of these classes are generated dynamically and
        monitoring these schemas files can clue us into problems that could otherwise go
        unnoticed.
    
        If this test is failing run `invoke schema --sync` to update schemas and commit the
        changes.
    
        Note: if the installed version of pandas doesn't match the one used in the standard
        test pipeline, the test will be marked a `xfail` because the schemas will not match.
        """
    
        def _sort_any_of(d: dict) -> str:
            if items := d.get("items"):
                _sort_any_of(items)
            if ref := d.get("$ref"):
                return ref
            if type_ := d.get("type"):
                return type_
            # return any string for sorting
            return "z"
    
        def _sort_lists(schema_as_dict: dict) -> None:
            """Sometimes "required" and "anyOf" lists come unsorted, causing misleading assertion failures; this corrects the issue.
    
            Args:
                schema_as_dict: source dictionary (will be modified "in-situ")
    
            """  # noqa: E501
            key: str
            value: Any
    
            for key, value in schema_as_dict.items():
                if key == "required":
                    schema_as_dict[key] = sorted(value)
                elif key == "anyOf":
                    if isinstance(value, list):
                        for val in value:
                            if isinstance(value, dict):
                                schema_as_dict[key] = sorted(val, key=_sort_any_of)
                    else:
                        schema_as_dict[key] = sorted(value, key=_sort_any_of)
    
                if isinstance(value, dict):
                    _sort_lists(schema_as_dict=value)
    
        if "Pandas" in str(schema_dir) and _PANDAS_SCHEMA_VERSION != PANDAS_VERSION:
            pytest.xfail(reason=f"schema generated with pandas {_PANDAS_SCHEMA_VERSION}")
    
        print(f"python version: {sys.version.split()[0]}")
        print(f"pandas version: {PANDAS_VERSION}")
    
        schema_path = schema_dir.joinpath(f"{fluent_ds_or_asset_model.__name__}.json")
        print(schema_path)
    
        json_str = schema_path.read_text().rstrip()
    
        schema_as_dict = json.loads(json_str)
        _sort_lists(schema_as_dict=schema_as_dict)
        # we have tuples in our schema, which are mutated to lists when dumped to json
        # dump and reload the schema dict to ensure we are comparing
        fluent_ds_or_asset_model_as_dict = json.loads(fluent_ds_or_asset_model.schema_json())
        _sort_lists(schema_as_dict=fluent_ds_or_asset_model_as_dict)
    
        if "Excel" in str(schema_path):
            pytest.xfail(reason="Sorting of nested anyOf key")
    
>       assert (
            schema_as_dict == fluent_ds_or_asset_model_as_dict
        ), "Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version."
E       AssertionError: Schemas are out of sync. Run `invoke schema --sync`. Also check your pandas version.
E       assert equals failed
E               'required': ['key'],             'required': ['key'],      
E               'title': 'Sorter',               'title': 'Sorter',        
E               'type': 'object',                'type': 'object',         
E             },                               },                          
E           },                               },                            
E         #x00-  'description': '--Public API-#x01  #x00+  'description': '--Public API-#x01 
E         #x00--',#x01                              #x00+-\nPandasAzureBlobStorageDataso#x01 
E                                          #x00+urce is a PandasDatasource that#x01 
E                                          #x00+ uses Azure Blob Storage as a\n#x01 
E                                          #x00+data store.',#x01                   
E           'properties': {                  'properties': {               
E             'assets': {                      'assets': {                 
E               'default': [],                   'default': [],            
E               'items': {                       'items': {                
E                 '$ref': '#/definitions/          '$ref': '#/definitions/ 
E         FileDataAsset',                  FileDataAsset',

.../datasource/fluent/test_schemas.py:135: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants