Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1461192: Fix StringMethods and DatetimeProperties docs #1741

Merged
merged 7 commits into from
Jun 11, 2024

Conversation

sfc-gh-joshi
Copy link
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1461192

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

SNOW-1445536 removed our copies of the StringMethods and DatetimeProperties classes in favor of their upstream modin versions. After this PR, doctests (run via pytest src/snowflake/snowpark/modin/plugin/docstrings/series_utils.py) still passed, but the __doc__ attributes of these classes' methods were not correctly updated since we did not use modin.config.DocModule (only snowflake.snowpark.modin.config.DocModule).

This PR sets modin's DocModule attribute so the docstrings are appropriately overridden, and changes some module initialization order to accommodate this change.

@sfc-gh-joshi sfc-gh-joshi requested a review from a team as a code owner June 5, 2024 22:57
@sfc-gh-joshi sfc-gh-joshi added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Jun 5, 2024
Copy link
Contributor

@sfc-gh-mvashishtha sfc-gh-mvashishtha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Collaborator

@sfc-gh-azhan sfc-gh-azhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not showing any implemented APIs from src/snowflake/snowpark/modin/plugin/docstrings/series_utils.py in our doc. Can you help to fix it too?

@sfc-gh-joshi sfc-gh-joshi removed the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Jun 6, 2024
@sfc-gh-joshi
Copy link
Contributor Author

@sfc-gh-azhan It seems like this is nontrivial to do since sphinx isn't currently autogenerating docstrings for those Series.str or Series.dt. Since this isn't a regression (as far as I can tell, the docs haven't been generated since before we migrated from the snowpandas repo), I'll follow up with it later (SNOW-1464571).

@sfc-gh-joshi sfc-gh-joshi added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Jun 6, 2024
Copy link
Contributor

@sfc-gh-dpetersohn sfc-gh-dpetersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1461192-dt-docs branch from 0b789bd to ac46f41 Compare June 7, 2024 00:07
Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@sfc-gh-joshi
Copy link
Contributor Author

I'm having some struggles getting doctests to work properly with this new import order. Running individual doctests (for example, pytest pytest src/snowflake/snowpark/modin/pandas/base.py::snowpark.modin.pandas.base.BasePandasDataset.all) has no issues, but running the whole suite of doctests like CI does (pytest -m doctest src/snowflake/snowpark/modin/ tests/unit/modin -x) seems to cause an importlib.reload somewhere that overwrites our loaded modin extensions. Moreover, running just pytest -m doctest src/snowflake/snowpark/modin/ (not specifying the unit tests) causes no tests to be run at all, so it's very confusing what order the various conftest + module initialization modules are run in.

I'll continue investigating to see if there's a fix for this, otherwise we might have to find a more primitive workaround to avoid incurring importlib.reload. This may be a bug that needs to be fixed upstream.

@sfc-gh-yzou
Copy link
Collaborator

@sfc-gh-joshi does that mean if customer runs importlib.reload in their notebook, it could potentially cause problem to their workload also?

@sfc-gh-joshi
Copy link
Contributor Author

@sfc-gh-yzou Yes, but that's kind of an unavoidable architectural consequence of the extension plugin system. If a user calls importlib.reload(modin) or importlib.reload(modin.pandas) (or on some other submodule of modin), then the reloaded version of modin will have the dictionaries tracking our registered extension methods erased. Calling importlib.reload(snowflake.snowpark.modin.plugin) afterwards would probably fix this, though I haven't tested this interaction.

From what I can tell, the issue here is occurring because when we call modin's DocModule.put, it calls importlib.reload here in order to rerun all of modin's _inherit_docstrings annotations with the newly specified docstring classes as parents. In a REPL or in our other tests, this isn't an issue since we deliberately run import snowflake.snowpark.modin.pandas for the first time only after DocModule.put is called, which means the module reload occurs before any extensions are loaded. However, it seems like something in the way our tox doctest environment is set up breaks this order, and the reload occurs after extensions are loaded.

@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1461192-dt-docs branch from ac46f41 to e00f714 Compare June 10, 2024 23:44
@sfc-gh-joshi
Copy link
Contributor Author

The fix I've settled on is to manually call modin.utils._inherit_docstrings for StringMethods and DatetimeProperties. Since these are the only two classes that we pull directly from upstream modin, I've elected to hardcode this for the time being; we can revisit a more robust solution as we remove more frontend classes from our fork. This was the only relatively simple solution I tried that worked, and since the cause is importlib.reload calls from upstream modin, I think it is much better to keep this simple workaround until importlib.reload calls are removed upstream.

I've asked @sfc-gh-mvashishtha to merge a fix for the offending issue (modin-project/modin#7138) some time this week, which will hopefully become available in the next modin release. Until then, we should continue to directly call modin.utils._inherit_docstrings as a workaround.

import modin.utils # type: ignore[import] # isort: skip # noqa: E402
import modin.pandas.series_utils # type: ignore[import] # isort: skip # noqa: E402

modin.utils._inherit_docstrings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good solution!

@sfc-gh-joshi sfc-gh-joshi merged commit fea1faf into main Jun 11, 2024
35 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi-SNOW-1461192-dt-docs branch June 11, 2024 17:05
@github-actions github-actions bot locked and limited conversation to collaborators Jun 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants