Improve documentation for load_data function #372

Open
drbenvincent opened this issue Jun 26, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers

Comments

@drbenvincent
Collaborator

  • Improve the docstring
  • Describe what the function does
  • List the available datasets, potentially with brief descriptions
@drbenvincent drbenvincent added documentation Improvements or additions to documentation good first issue Good for newcomers labels Jun 26, 2024
@Ishaanjolly

Whereabouts is the load_function?

@drbenvincent
Collaborator Author

Hi @Ishaanjolly. It's the load_data function, and it's defined in datasets.py.

@Ishaanjolly

@drbenvincent The function below currently has a docstring, and I presume that DATASETS holds the names of the datasets. Do you want the docstring to be more detailed, e.g. describing how the data directory is located and the CSV file is read?

As for the DATASETS dictionary, do you want short descriptions of the datasets to be added?


def load_data(dataset: str = None) -> pd.DataFrame:
    """Loads the requested dataset and returns a pandas DataFrame.

    :param dataset: The desired dataset to load
    """

    if dataset in DATASETS:
        data_dir = _get_data_home()
        datafile = DATASETS[dataset]
        file_path = data_dir / datafile["filename"]
        return pd.read_csv(file_path)
    else:
        raise ValueError(f"Dataset {dataset} not found!")


@drbenvincent
Collaborator Author

So the idea is simply to provide a more informative docstring that lists out all the datasets (valid values of the dataset kwarg).

Actually, that might involve quite a lot of duplication and could be error-prone as new datasets are added. Just an idea, but how about a helper function called something like list_datasets, which prints out the info in DATASETS in a nice way? The docstring of load_data could then refer to list_datasets for the full list of available datasets.
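A rough sketch of what such a helper might look like. The only thing confirmed by the snippet above is that each DATASETS entry has a "filename" key; the "description" key and the two entries here are placeholders I made up for illustration, not the real contents of datasets.py.

```python
# Hypothetical stand-in for the real DATASETS dict in datasets.py.
# Only the "filename" key is confirmed; "description" is an assumption.
DATASETS = {
    "example_a": {
        "filename": "example_a.csv",
        "description": "First placeholder dataset",
    },
    "example_b": {
        "filename": "example_b.csv",
        "description": "Second placeholder dataset",
    },
}


def list_datasets(datasets: dict = DATASETS) -> str:
    """Return a human-readable listing of the available datasets."""
    lines = []
    for name in sorted(datasets):
        info = datasets[name]
        # Fall back gracefully if an entry has no description yet.
        description = info.get("description", "no description")
        lines.append(f"{name} ({info['filename']}): {description}")
    return "\n".join(lines)


print(list_datasets())
```

Returning a string rather than printing directly keeps the helper easy to test and to reuse elsewhere, for instance in an error message when an unknown dataset name is requested.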

That idea isn't so good actually, because for the API docs on the website to be useful, the info has to be in the docstring. I wonder if there's a Sphinx command to automatically generate docstring text based on the DATASETS dict? If not, then I guess it's a case of manually entering the info into the docstring.
