ValueError: buffer size must be a multiple of element size #102

Open
awaelchli opened this issue Apr 18, 2024 · 0 comments
Labels: bug, help wanted


🐛 Bug

A very simple example of optimizing a tensor dataset, inspired by the README, does not work.

To Reproduce

Steps to reproduce the behavior:

Install litdata 0.2.3 or the main branch, then run the code below.

Code sample

import torch
from torch.utils.data import DataLoader

from litdata import StreamingDataset, optimize


def random_images(index):
    # Each sample is a random 32x32x3 integer tensor (default dtype int64).
    return torch.randint(0, 256, (32, 32, 3))


if __name__ == "__main__":
    # Write 10 samples into an optimized (chunked) dataset on disk.
    optimize(
        fn=random_images,
        inputs=list(range(10)),
        output_dir="my_optimized_dataset",
        num_workers=2,
        chunk_bytes="5MB",
    )

    # Stream the optimized dataset back and fetch the first batch.
    dataset = StreamingDataset("my_optimized_dataset")
    dataloader = DataLoader(dataset)

    iterator = iter(dataloader)
    next(iterator)

Error:

Traceback (most recent call last):
  File "/teamspace/studios/this_studio/main.py", line 25, in <module>
    next(iterator)
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/dataloader.py", line 598, in __iter__
    for batch in super().__iter__():
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/dataset.py", line 298, in __next__
    data = self.__getitem__(
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/dataset.py", line 268, in __getitem__
    return self.cache[index]
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/cache.py", line 128, in __getitem__
    return self._reader.read(index)
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/reader.py", line 252, in read
    item = self._item_loader.load_item_from_chunk(
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/item_loader.py", line 126, in load_item_from_chunk
    return self.deserialize(data)
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/item_loader.py", line 136, in deserialize
    data.append(serializer.deserialize(data_bytes))
  File "/teamspace/studios/this_studio/litdata/src/litdata/streaming/serializers.py", line 185, in deserialize
    shape.append(np.frombuffer(data[8 + 4 * shape_idx : 8 + 4 * (shape_idx + 1)], np.uint32).item())
ValueError: buffer size must be a multiple of element size
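
For context, the final frame fails inside NumPy itself: np.frombuffer raises this ValueError whenever the byte buffer's length is not a multiple of the requested dtype's item size. A minimal sketch of that failure mode, independent of litdata (the 3-byte buffer is only an illustration):

import numpy as np

# np.uint32 items are 4 bytes wide, so a 3-byte buffer cannot be reinterpreted
# as uint32 values; NumPy raises the same error seen in serializers.py.
np.frombuffer(b"\x00\x01\x02", dtype=np.uint32)
# ValueError: buffer size must be a multiple of element size

Since the slice taken in serializers.py should always be exactly 4 bytes, the error suggests the slice is being truncated, i.e. it runs past the end of the serialized item's data.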

Expected behavior

The iterator returns the tensor.
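
Concretely, with the default DataLoader settings (batch_size=1), the first next(iterator) call would be expected to yield a single-sample batch, roughly:

batch = next(iterator)
print(batch.shape)  # expected: torch.Size([1, 32, 32, 3])
print(batch.dtype)  # expected: torch.int64, the default dtype of torch.randint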

Environment

Fresh CPU Studio.

awaelchli added the bug and help wanted labels on Apr 18, 2024