To create the Fashion12K German Queries dataset, we sampled 12k images from the Fashion200K dataset and annotated them with German and English queries using Toloka.
Each row in the dataset consists of three entries:
- image URL (link to the S3 bucket where the original image is hosted),
- English query,
- German query.
The dataset can be downloaded:
- directly via tsv file,
- or by using DocArray from Jina AI (our collaborator on the project) via this Python script.
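
For readers working with the TSV file directly, here is a minimal sketch of how the three-entry rows could be parsed. It assumes each line holds the image URL, the English query, and the German query in that order, separated by tabs; whether the real file has a header row is not stated here, so check before relying on it. The `QueryRow` type, the `parse_rows` helper, and the sample URLs and queries are made up for illustration and are not part of the dataset.

```python
import csv
import io
from typing import List, NamedTuple


class QueryRow(NamedTuple):
    """One dataset row (hypothetical type; field order is an assumption)."""
    image_url: str
    english_query: str
    german_query: str


def parse_rows(tsv_text: str) -> List[QueryRow]:
    """Parse tab-separated text, skipping lines that do not have 3 fields."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for fields in reader:
        if len(fields) != 3:
            continue  # blank or malformed line
        rows.append(QueryRow(*(field.strip() for field in fields)))
    return rows


# Made-up sample lines standing in for the real file contents.
sample = (
    "https://example.com/images/0001.jpeg\tred summer dress\trotes Sommerkleid\n"
    "https://example.com/images/0002.jpeg\tblack leather jacket\tschwarze Lederjacke\n"
)

rows = parse_rows(sample)
for row in rows:
    print(row.image_url, "|", row.english_query, "|", row.german_query)
```

To use this with the downloaded file, read it with `open(path, encoding="utf-8", newline="")` and pass the contents to `parse_rows`; UTF-8 is assumed here because the German queries may contain umlauts.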
The Fashion200K dataset is used under the Apache License 2.0.