Improve documentation for `load_data` function #372

drbenvincent · 2024-06-26T11:42:09Z

Improve the docstring
Describe what the function does
List the available datasets, potentially with brief descriptions

Ishaanjolly · 2024-07-26T08:23:05Z

Whereabouts is the load_function?

drbenvincent · 2024-07-26T11:51:07Z

Hi @Ishaanjolly. It's the load_data function and that's defined in datasets.py

Ishaanjolly · 2024-08-09T16:31:43Z

@drbenvincent The function below already has a docstring, and I presume the DATASETS dict holds the names of the datasets. Do you want the docstring to be more detailed, for example describing how it looks up the data directory and then reads the CSV?

As for the DATASETS dictionary, do you want short descriptions of each dataset to be added in?


def load_data(dataset: str = None) -> pd.DataFrame:
    """Loads the requested dataset and returns a pandas DataFrame.

    :param dataset: The desired dataset to load
    """

    if dataset in DATASETS:
        data_dir = _get_data_home()
        datafile = DATASETS[dataset]
        file_path = data_dir / datafile["filename"]
        return pd.read_csv(file_path)
    else:
        raise ValueError(f"Dataset {dataset} not found!")

drbenvincent · 2024-08-09T16:53:40Z

So the idea is simply to provide a more informative docstring that lists out all the datasets (valid values of the dataset kwarg).

Actually, that might involve quite a lot of duplication and potentially be error-prone as new datasets are added. Just an idea, but how about a helper function called something like list_datasets which prints out the info in DATASETS in a nice way? Then the docstring of load_data could refer to the list_datasets function for a full list of available datasets.
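A minimal sketch of what such a list_datasets helper could look like. It assumes each DATASETS entry is a dict with a "filename" key (as load_data implies) plus a hypothetical "description" key that does not exist yet. The dataset names are placeholders, not the real ones, and the sketch returns a string rather than printing directly so the output is easier to reuse.

```python
# Hypothetical stand-in for the DATASETS dict defined in datasets.py.
# Only the "filename" key appears in the thread; "description" is an assumption.
DATASETS = {
    "example_a": {
        "filename": "example_a.csv",
        "description": "First placeholder dataset",
    },
    "example_b": {
        "filename": "example_b.csv",
        "description": "Second placeholder dataset",
    },
}


def list_datasets() -> str:
    """Return one line per dataset: its name followed by its description."""
    # Pad names to a common width so the descriptions line up in a column.
    width = max(len(name) for name in DATASETS)
    lines = [
        f"{name.ljust(width)}  {info.get('description', '(no description)')}"
        for name, info in sorted(DATASETS.items())
    ]
    return "\n".join(lines)


print(list_datasets())
```

The names shown are exactly the keys of DATASETS, so the listing cannot drift out of sync with what load_data accepts.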

That idea isn't so good actually, because for the API docs on the website to be useful, the info will have to be in the docstring itself. I wonder if there's a Sphinx command to automatically generate docstring text based on the DATASETS dict? If not, then I guess it's a case of manually entering the info into the docstring.
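One possible way to avoid manual entry, sketched under assumptions: build the dataset list into the docstring at import time. As far as I know, Sphinx autodoc imports the module and reads the function's `__doc__` attribute, so text appended this way should show up in the rendered API docs, though that would need checking against the project's docs build. The `document_datasets` decorator, the "description" key, and the dataset names are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for the DATASETS dict defined in datasets.py.
# Only the "filename" key appears in the thread; "description" is an assumption.
DATASETS = {
    "example_a": {
        "filename": "example_a.csv",
        "description": "First placeholder dataset",
    },
    "example_b": {
        "filename": "example_b.csv",
        "description": "Second placeholder dataset",
    },
}


def document_datasets(func: Callable) -> Callable:
    """Append a bullet list of the DATASETS entries to func's docstring."""
    entries = "\n".join(
        f"    * ``{name}``: {info.get('description', 'no description')}"
        for name, info in sorted(DATASETS.items())
    )
    base = (func.__doc__ or "").rstrip()
    func.__doc__ = f"{base}\n\n    Available datasets:\n\n{entries}\n"
    return func


@document_datasets
def load_data(dataset: str = None):
    """Loads the requested dataset and returns a pandas DataFrame.

    :param dataset: The desired dataset to load
    """
    # Body omitted in this sketch; the real implementation is quoted
    # earlier in the thread.
    raise NotImplementedError


print(load_data.__doc__)
```

The docstring is generated from the same dict that load_data validates against, so adding a dataset updates the docs without a second edit.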

drbenvincent added the "documentation" (Improvements or additions to documentation) and "good first issue" (Good for newcomers) labels · Jun 26, 2024

drbenvincent assigned Ishaanjolly Jul 29, 2024
