tif inferred as pickle #423

Closed
robmarkcole opened this issue Nov 27, 2024 · 1 comment · Fixed by #425
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

Contributor

robmarkcole commented Nov 27, 2024

🐛 Bug

My dataset returns paths to .tif files, but their data format is inferred as pickle:

# dataset yields paths 

test_dataset[0] == {'image_path': '/teamspace/studios/this_studio/dataset/test/T20MMT_20200505T142729_tile_17_11.tif',
 'mask_path': '/teamspace/studios/this_studio/dataset/test/T20MMT_20200505T142729_tile_17_11_mask.tif',
 'image_id': 'T20MMT_20200505T142729_tile_17_11'}

Logs

Rank 1 inferred the following `['str', 'pickle', 'pickle']` data format.

Expected:

Rank 1 inferred the following `['str', 'tif', 'tif']` data format. 

The script also generates the same number of chunk files for both train and test, although train has significantly more tifs.

litdata==0.2.19

Reproducible example:

from litdata.streaming.serializers import _SERIALIZERS

image_path = '/teamspace/studios/this_studio/dataset/test/T10VDK_20200519T193911_tile_0_7.tif'

def evaluate_serializer(filename: str) -> str:
    """
    Evaluate which serializer would be used for a given filename.

    Args:
        filename: The name of the file to evaluate.

    Returns:
        The name of the selected serializer.
    """
    # Iterate through serializers in the order defined in _SERIALIZERS
    for serializer_name, serializer in _SERIALIZERS.items():
        if serializer.can_serialize(filename):
            return serializer_name

    # If no serializer can handle the file, raise an exception
    raise ValueError(f"No suitable serializer found for filename: {filename}")

assert evaluate_serializer(image_path) == 'pickle'
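
To illustrate why a first-match lookup like the one above can end up at pickle, here is a minimal, self-contained sketch. The three toy classes and their rules are hypothetical stand-ins written for this example; they are not litdata's actual serializers or checks. The only thing borrowed from the issue is the `can_serialize` method name and the idea of walking an ordered registry.

```python
from typing import Any, Dict


class ToyStr:
    """Toy rule (an assumption): accept strings that do not look like file paths."""

    def can_serialize(self, data: Any) -> bool:
        return isinstance(data, str) and "/" not in data


class ToyTif:
    """Toy rule (an assumption): accept strings ending in a TIFF extension."""

    def can_serialize(self, data: Any) -> bool:
        return isinstance(data, str) and data.lower().endswith((".tif", ".tiff"))


class ToyPickle:
    """Catch-all: accepts anything, so it only makes sense as the last entry."""

    def can_serialize(self, data: Any) -> bool:
        return True


def first_match(registry: Dict[str, Any], data: Any) -> str:
    # Dicts preserve insertion order, so the first serializer that claims the data wins.
    for name, serializer in registry.items():
        if serializer.can_serialize(data):
            return name
    raise ValueError(f"No suitable serializer found for: {data!r}")


without_tif = {"str": ToyStr(), "pickle": ToyPickle()}
with_tif = {"str": ToyStr(), "tif": ToyTif(), "pickle": ToyPickle()}

path = "/data/tile_0_7.tif"  # hypothetical path, not from the issue
print(first_match(without_tif, path))      # pickle
print(first_match(with_tif, path))         # tif
print(first_match(with_tif, "tile_0_7"))   # str
```

In this toy setup, nothing before the catch-all claims the path, so it falls through to pickle. Registering a tif-aware entry ahead of the catch-all changes the result.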
robmarkcole added the bug and help wanted labels Nov 27, 2024
Contributor Author

robmarkcole commented Nov 27, 2024

OK, I see this comment:

# FileSerializer will be removed in the future.

With this going away, it appears that tifs need a new serializer? I prototyped a solution using tifffile, which supports multispectral data (Pillow is limited here).
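
For readers wondering what a tifffile-based approach might look like, below is a rough sketch under stated assumptions. It is not the prototype mentioned above and not litdata's API: the class name `TifPathSerializer` and the `serialize`/`deserialize` signatures are made up for illustration, and only `can_serialize` mirrors the method name used in the reproducible example. It assumes the file bytes are stored verbatim and decoded later with `tifffile.imread`, which accepts a binary file-like object.

```python
import io
from typing import Any


class TifPathSerializer:
    """Hypothetical sketch: store the raw bytes of a .tif file given its path."""

    EXTENSIONS = (".tif", ".tiff")

    def can_serialize(self, data: Any) -> bool:
        # Claim only strings that end in a TIFF extension (case-insensitive).
        # Assumption: the extension alone is enough; no check that the file exists.
        return isinstance(data, str) and data.lower().endswith(self.EXTENSIONS)

    def serialize(self, path: str) -> bytes:
        # Read the file verbatim so all bands and metadata are kept as-is.
        with open(path, "rb") as f:
            return f.read()

    def deserialize(self, data: bytes) -> Any:
        # Imported lazily so the class can be defined without tifffile installed.
        import tifffile

        # Decode the stored bytes into a NumPy array.
        return tifffile.imread(io.BytesIO(data))
```

Storing the original bytes rather than re-encoding is one possible design choice; it avoids any conversion on write and defers decoding to read time.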
