Describe the bug
Loading a Hugging Face dataset with several subsets into a single Argilla dataset overwrites records when the dataset has no row id. The subsets share the same structure, so they can be loaded into the same Argilla dataset, but users currently have to build the record.id attribute themselves to avoid overwriting records.
Stacktrace and Code to create the bug
import argilla as rg
from datasets import load_dataset

dataset = rg.Dataset(...)
for config in ["gpqa_diamond", "gpqa_extended", "gpqa_main"]:
    hf_ds = load_dataset("some-dataset-with-several-subsets", name=config, token=HF_TOKEN, split="train")
    # row_id + split are used to build the record id, so each subset overwrites the previous one
    dataset.records.log(hf_ds.map(lambda r: {"subset": config}))
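A possible workaround, sketched under assumptions: build an id per row that includes the subset name, so rows with the same index in different subsets no longer collide. The helpers `build_record_id` and `with_unique_ids` are hypothetical names used for illustration, not part of Argilla or `datasets`, and plain dictionaries stand in for the rows of each subset. Whether an `id` key is picked up as `record.id` when logging should be checked against the installed Argilla version.

```python
from typing import Dict, Iterable, List


def build_record_id(subset: str, split: str, row_idx: int) -> str:
    """Combine subset name, split, and row index into one id string."""
    return f"{subset}-{split}-{row_idx}"


def with_unique_ids(rows: Iterable[Dict], subset: str, split: str = "train") -> List[Dict]:
    """Return copies of the rows with 'subset' and a subset-aware 'id' added."""
    return [
        {**row, "subset": subset, "id": build_record_id(subset, split, idx)}
        for idx, row in enumerate(rows)
    ]


# Stand-in rows; in practice each subset would come from load_dataset(...).
rows = [{"question": "q0"}, {"question": "q1"}]

records = []
for config in ["gpqa_diamond", "gpqa_extended", "gpqa_main"]:
    records.extend(with_unique_ids(rows, config))

ids = [r["id"] for r in records]
assert len(ids) == len(set(ids))  # no id is shared between subsets
```

With a real Hugging Face dataset, the same idea can be expressed with `hf_ds.map(..., with_indices=True)`, which passes the row index to the mapping function as a second argument.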
Expected behavior
Environment:
Argilla Version [e.g. 1.0.0]:
ElasticSearch Version [e.g. 7.10.2]:
Docker Image (optional) [e.g. argilla:v1.0.0]:
Additional context