
Unable to Run SFT #66

Open
Rui-Yuan91 opened this issue Feb 13, 2024 · 3 comments

@Rui-Yuan91

When I run the SFT script from the example, choosing BasicTrainer instead of FSDPTrainer and disabling wandb logging to avoid unrelated issues:

python -u train.py model=pythia28 datasets=[hh] loss=sft exp_name=anthropic_dpo_pythia28 gradient_accumulation_steps=2 batch_size=64 eval_batch_size=32 trainer=BasicTrainer sample_during_eval=false wandb.enabled=False

I encountered the following error:

building policy
starting single-process worker
Creating trainer on process 0 with world size 1
Loading tokenizer EleutherAI/pythia-2.8b
Loaded train data iterator
Loading HH dataset (test split) from Huggingface...
Error executing job with overrides: ['model=pythia28', 'datasets=[hh]', 'loss=sft', 'exp_name=anthropic_dpo_pythia28', 'gradient_accumulation_steps=2', 'batch_size=64', 'eval_batch_size=32', 'trainer=BasicTrainer', 'sample_during_eval=false', 'wandb.enabled=False']
Traceback (most recent call last):
File "/shared/home/se79359/direct-preference-optimization/train.py", line 114, in main
worker_main(0, 1, config, policy, reference_model)
File "/shared/home/se79359/direct-preference-optimization/train.py", line 42, in worker_main
trainer = TrainerClass(policy, config, config.seed, config.local_run_dir, reference_model=reference_model, rank=rank, world_size=world_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/home/se79359/direct-preference-optimization/trainers.py", line 179, in __init__
self.eval_batches = list(self.eval_iterator)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/home/se79359/direct-preference-optimization/preference_datasets.py", line 320, in get_batch_iterator
for prompt, data in get_dataset(name, split, silent=silent, cache_dir=cache_dir).items():
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/home/se79359/direct-preference-optimization/preference_datasets.py", line 168, in get_dataset
data = get_hh(split, silent=silent, cache_dir=cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared/home/se79359/direct-preference-optimization/preference_datasets.py", line 142, in get_hh
dataset = datasets.load_dataset('Anthropic/hh-rlhf', split=split, cache_dir=cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/load.py", line 1773, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/load.py", line 1502, in load_dataset_builder
dataset_module = dataset_module_factory(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/load.py", line 1219, in dataset_module_factory
raise e1 from None
File "/usr/local/lib/python3.11/site-packages/datasets/load.py", line 1203, in dataset_module_factory
).get_module()
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/load.py", line 769, in get_module
else get_data_patterns_in_dataset_repository(hfh_dataset_info, self.data_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/data_files.py", line 662, in get_data_patterns_in_dataset_repository
return _get_data_files_patterns(resolver)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/data_files.py", line 223, in _get_data_files_patterns
data_files = pattern_resolver(pattern)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/data_files.py", line 473, in _resolve_single_pattern_in_dataset_repository
glob_iter = [PurePath(filepath) for filepath in fs.glob(PurePath(pattern).as_posix()) if fs.isfile(filepath)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 606, in glob
pattern = glob_translate(path + ("/" if ends_with_sep else ""))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fsspec/utils.py", line 734, in glob_translate
raise ValueError(
ValueError: Invalid pattern: '**' can only be an entire path component

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

@BigBinnie

Have you solved the problem? I ran into the same issue.

@BigBinnie

I think it's caused by the version of the datasets package. Upgrading datasets fixed it for me.
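For reference, the suggestion above can be applied either by upgrading `datasets` or, if `datasets` must stay pinned, by downgrading `fsspec` instead. The fsspec pin below is an assumption based on the `fsspec/utils.py` frame in the traceback, not something stated in this thread:

```shell
# Option 1: upgrade datasets so it is compatible with the installed fsspec
pip install -U datasets

# Option 2 (assumption): keep the current datasets version and pin fsspec
# to a release from before the stricter '**' glob handling
pip install "fsspec<=2023.9.2"
```

After either change, rerunning the `train.py` command should get past the `Loading HH dataset` step if the version mismatch was the cause.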

@Yanfors

Yanfors commented May 13, 2024

@BigBinnie What version of the datasets package are you using? I still have the same problem after updating.
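When comparing setups across machines, it helps to report the exact installed versions of the two packages implicated in the traceback. A small stdlib-only sketch (the helper name `installed_version` is my own, not from the thread):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

# The two packages implicated in this traceback:
for pkg in ("datasets", "fsspec"):
    print(pkg, installed_version(pkg))
```

Posting this output alongside the error makes it much easier to tell whether an upgrade actually took effect in the environment that runs `train.py`.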
