
Bug in building DictModule with old version of PyTorch. #91

Closed
jintuzhang opened this issue Oct 16, 2023 · 0 comments · Fixed by #100
Labels
bug Something isn't working

Comments

@jintuzhang

When using older versions of PyTorch (e.g., 1.10), building an mlcolvars.data.DictModule may raise the following error:

raise ValueError("Sum of input lengths does not equal the length of the input dataset!")

This is caused by a change in the torch.utils.data.random_split method:

random_split in PyTorch 2.1:

def random_split(dataset: Dataset[T], lengths: Sequence[Union[int, float]],
                 generator: Optional[Generator] = default_generator) -> List[Subset[T]]:
    r"""
    Randomly split a dataset into non-overlapping new datasets of given lengths.

    If a list of fractions that sum up to 1 is given,
    the lengths will be computed automatically as
    floor(frac * len(dataset)) for each fraction provided.

    After computing the lengths, if there are any remainders, 1 count will be
    distributed in round-robin fashion to the lengths
    until there are no remainders left.

    Optionally fix the generator for reproducible results, e.g.:

    Example:
        >>> # xdoctest: +SKIP
        >>> generator1 = torch.Generator().manual_seed(42)
        >>> generator2 = torch.Generator().manual_seed(42)
        >>> random_split(range(10), [3, 7], generator=generator1)
        >>> random_split(range(30), [0.3, 0.3, 0.4], generator=generator2)

    Args:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths or fractions of splits to be produced
        generator (Generator): Generator used for the random permutation.
    """

random_split in PyTorch 1.10:

def random_split(dataset: Dataset[T], lengths: Sequence[int],
                 generator: Optional[Generator] = default_generator) -> List[Subset[T]]:
    r"""
    Randomly split a dataset into non-overlapping new datasets of given lengths.
    Optionally fix the generator for reproducible results, e.g.:

    >>> random_split(range(10), [3, 7], generator=torch.Generator().manual_seed(42))

    Args:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
        generator (Generator): Generator used for the random permutation.
    """

This method is invoked by

def _split(self, dataset):

The _split method passes dataset-length fractions to random_split, but the old random_split accepts only explicit integer lengths. It would therefore be reasonable to convert the fractions to actual lengths before the call.
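Concretely, a backward-compatible _split could normalize the fractions into explicit integer lengths before handing them to random_split. The sketch below is an illustration, not the actual mlcolvars patch; the helper name to_lengths and the attribute self.lengths are assumptions:

```python
import math

def to_lengths(dataset_len, lengths):
    """Pass explicit integer lengths through unchanged; convert
    fractions to integer counts so the result is also accepted by
    the integer-only random_split of PyTorch 1.10."""
    if all(isinstance(l, int) for l in lengths):
        return list(lengths)
    counts = [math.floor(frac * dataset_len) for frac in lengths]
    for i in range(dataset_len - sum(counts)):
        counts[i % len(counts)] += 1  # round-robin remainder
    return counts

# Inside _split, the call would then become something like:
#   torch.utils.data.random_split(dataset, to_lengths(len(dataset), self.lengths))
```

Because the conversion mirrors the floor-plus-round-robin rule that PyTorch 2.1 applies internally, the resulting splits have the same sizes on both old and new versions.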

@EnricoTrizio EnricoTrizio mentioned this issue Oct 24, 2023
@EnricoTrizio EnricoTrizio reopened this Oct 25, 2023
@EnricoTrizio EnricoTrizio added the bug Something isn't working label Nov 14, 2023