
I am getting this error: ValueError: Trainer: evaluation requires an eval_dataset. #22

Open
furkansherwani opened this issue Mar 7, 2024 · 3 comments

Comments

@furkansherwani
ValueError Traceback (most recent call last)
in <cell line: 4>()
2 get_ipython().system(' pip install -U accelerate')
3 get_ipython().system(' pip install -U transformers')
----> 4 model_trainer = t5_exp.train(id_tokenized_ds, **training_args)

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_eval_dataloader(self, eval_dataset)
886 """
887 if eval_dataset is None and self.eval_dataset is None:
--> 888 raise ValueError("Trainer: evaluation requires an eval_dataset.")
889 eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
890 data_collator = self.data_collator

ValueError: Trainer: evaluation requires an eval_dataset.

@kevinscaria
Owner

The script requires an eval dataset to be provided as an argument. However, please try to debug this yourself, as this repo is not actively maintained.
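For context, here is a minimal plain-Python sketch of the guard that raises this error, modeled on the `get_eval_dataloader` lines in the traceback above (the real class lives in `transformers/trainer.py`; this stand-in just illustrates why the `ValueError` fires when neither the call nor the constructor supplies an eval dataset):

```python
class TrainerSketch:
    """Toy stand-in for transformers.Trainer, reproducing only the
    eval_dataset guard shown in the traceback above."""

    def __init__(self, eval_dataset=None):
        self.eval_dataset = eval_dataset

    def get_eval_dataloader(self, eval_dataset=None):
        # Mirrors trainer.py: fail fast if no eval dataset was given
        # either at construction time or at call time.
        if eval_dataset is None and self.eval_dataset is None:
            raise ValueError("Trainer: evaluation requires an eval_dataset.")
        return eval_dataset if eval_dataset is not None else self.eval_dataset
```

So the fix is to make sure an eval/validation split reaches the trainer, as discussed below.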

@furkansherwani
Author

But there is no .csv file in the validation folder under Datasets; it is a .json file. Please help me understand this.

@cyborgrob

@furkansherwani Just split the test dataset in half (or split it however you want, really):

# Split the test dataset in half
train_test_split = id_tokenized_ds['test'].train_test_split(test_size=0.5)

Then rename one portion to 'validation':

id_tokenized_ds['test'] = train_test_split['train']
id_tokenized_ds['validation'] = train_test_split['test'] # Use 'test' as the validation set
id_tokenized_ds

DatasetDict({
    train: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', '__index_level_0__', 'input_ids', 'attention_mask'],
        num_rows: 590
    })
    test: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
    validation: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
})

This solved the error in my case.
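For anyone who wants to check the row counts above without installing the `datasets` library, here is a plain-Python sketch of what the 50/50 `train_test_split` does (the seeded shuffle is an assumption standing in for the library's internal shuffling; only the splitting arithmetic matters):

```python
import random

def split_half(records, seed=42):
    """Shuffle a list of records and split it into two halves,
    mimicking a test_size=0.5 train/test split."""
    rng = random.Random(seed)
    shuffled = records[:]        # copy so the input list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

With 254 test rows, each half ends up with 127 rows, matching the `DatasetDict` output shown above.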
