Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sphinx AutoAPI >=3.2 generates duplicate definitions of class attributes #476

Open
zaneselvans opened this issue Aug 9, 2024 · 4 comments
Labels

Comments

@zaneselvans
Copy link

With the update to the conda-forge feedstock in #465 I've just upgraded from v3.0.0 to v3.2.1 of sphinx-autoapi, and am now getting errors like these when I run sphinx-build, which I never got before:

sphinx.errors.SphinxWarning: /Users/zane/code/catalyst/pudl/docs/autoapi/pudl/analysis/record_linkage/link_cross_year/index.rst:96:duplicate object description of pudl.analysis.record_linkage.link_cross_year.DistanceMatrix.distance_matrix, other instance in autoapi/pudl/analysis/record_linkage/link_cross_year/index, use :no-index: for one of them

or

sphinx.errors.SphinxWarning: /Users/zane/code/catalyst/pudl/docs/autoapi/pudl/analysis/timeseries_cleaning/index.rst:297:duplicate object description of pudl.analysis.timeseries_cleaning.Timeseries.xi, other instance in autoapi/pudl/analysis/timeseries_cleaning/index, use :no-index: for one of them

By installing previous versions of sphinx-autoapi in my environment using pip I see that the problem first appears in v3.2.0. In v3.1.2 the docs build fine.

This sounds vaguely like the error that was coming up in #452?

The RST generated by v3.2.1, including the class and its attributes:

.. py:class:: Timeseries(x: numpy.ndarray | pandas.DataFrame)

   Multivariate timeseries for anomalies detection and imputation.

   .. attribute:: xi

      Reference to the original values (can be null).
      Many methods assume that these represent chronological, regular timeseries.

   .. attribute:: x

      Copy of :attr:`xi` with any flagged values replaced with null.

   .. attribute:: flags

      Flag label for each value, or null if not flagged.

   .. attribute:: flagged

      Running list of flags that have been checked so far.

   .. attribute:: index

      Row index.

   .. attribute:: columns

      Column names.


   .. py:attribute:: xi
      :type:  numpy.ndarray


   .. py:attribute:: index
      :type:  pandas.Index


   .. py:attribute:: columns
      :type:  pandas.Index


   .. py:attribute:: x
      :type:  numpy.ndarray


   .. py:attribute:: flags
      :type:  numpy.ndarray


   .. py:attribute:: flagged
      :type:  list[str]
      :value: []

vs. by v3.1.2:

.. py:class:: Timeseries(x: numpy.ndarray | pandas.DataFrame)

   Multivariate timeseries for anomalies detection and imputation.

   .. attribute:: xi

      Reference to the original values (can be null).
      Many methods assume that these represent chronological, regular timeseries.

   .. attribute:: x

      Copy of :attr:`xi` with any flagged values replaced with null.

   .. attribute:: flags

      Flag label for each value, or null if not flagged.

   .. attribute:: flagged

      Running list of flags that have been checked so far.

   .. attribute:: index

      Row index.

   .. attribute:: columns

      Column names.

So it seems like AutoAPI is generating two versions of the attributes -- one with the py: prefix (which includes type information) and one without (which includes the attribute docstrings).

@AWhetter
Copy link
Collaborator

It should be possible to change AutoAPI to inspect the class docstring and omit the separate documenting of attributes if they're already covered in the class docstring.

@zaneselvans
Copy link
Author

Hmm, I guess the attributes are kind of separately defined in the docstring and the body of the class. I wonder what changed in 3.2 that triggers this now. And I wonder what would happen if I added the types to the docstring. Maybe as a stopgap I'll attach docstrings to the attributes instead of having them in the class docstring.

class Timeseries:
    """Multivariate timeseries for anomalies detection and imputation.

    Attributes:
        xi: Reference to the original values (can be null).
            Many methods assume that these represent chronological, regular timeseries.
        x: Copy of :attr:`xi` with any flagged values replaced with null.
        flags: Flag label for each value, or null if not flagged.
        flagged: Running list of flags that have been checked so far.
        index: Row index.
        columns: Column names.
    """

    def __init__(self, x: np.ndarray | pd.DataFrame) -> None:
        """Initialize a multivariate timeseries.

        Args:
            x: Timeseries with shape (n observations, m variables).
                If :class:`pandas.DataFrame`, :attr:`index` and :attr:`columns`
                are equal to `x.index` and `x.columns`, respectively.
                Otherwise, :attr:`index` and :attr:`columns` are the default
                `pandas.RangeIndex`.
        """
        self.xi: np.ndarray
        self.index: pd.Index
        self.columns: pd.Index
        if isinstance(x, pd.DataFrame):
            self.xi = x.to_numpy()
            self.index = x.index
            self.columns = x.columns
        else:
            self.xi = x
            self.index = pd.RangeIndex(x.shape[0])
            self.columns = pd.RangeIndex(x.shape[1])
        self.x: np.ndarray = self.xi.copy()
        self.flags: np.ndarray = np.empty(self.x.shape, dtype=object)
        self.flagged: list[str] = []

@AWhetter
Copy link
Collaborator

AutoAPI used to ignore instance attributes that didn't have a docstring. That was counter to how every other member worked so it was changed in v3.2 such that instance attributes are documented using the normal rules.

Instance attributes are a bit of an oddball though because how they're documented varies between tools.
AutoAPI and some other tools like autodoc accept a docstring for attributes by putting a docstring directly after the assignment in __init__. This is the preferred way of doing it because things like documenting inherited attributes works as expected.
But Python itself doesn't recognise these types of docstrings and so it practically requires the documenting of attributes in the docstring. A downside with this is that if you don't document the attributes as the last thing in the docstring -- which is particularly problematic with autoapi_python_class_content and autodoc_class_content because it will append the __init__ docstring to the end of the class docstring -- then everything is rendered in an odd way where you get part of the class docstring rendered, then all of the attributes, then the remainder of the docstring, then every other member.

What you're aiming for with your Timeseries class though is certainly a valid use case. Ideally AutoAPI would append the type information into the already existing definitions of the attributes.
Plus the AutoAPI docs could be doing a better job of explaining this behaviour, and informing users about the options available to them when it comes to documenting attributes.

@zaneselvans
Copy link
Author

My stopgap solution of only documenting via docstrings on the attributes themselves seems to work, but documentation of this behavior would be helpful. I was very confused!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants