Hi, I'd like to start with a big thanks for your amazing work. I would like to use your library to fine-tune GPT-NEO on a Text2Text task instead of TextGeneration. I'm trying to adapt your run_clm.py script to handle not just a dataset with [text], but one with a [text, label] structure.
So I'm now trying to create a train_dataset from these two new tokenized datasets, built this way:
```python
def tokenize_function_text(examples):
    return tokenizer(examples["text"])

tokenized_datasets_text = datasets.map(
    tokenize_function_text,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)

def tokenize_function_label(examples):
    return tokenizer(examples["label"])

tokenized_datasets_label = datasets.map(
    tokenize_function_label,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)
```
But I'm really struggling to combine them into a single `train_dataset` object that I can pass to the Trainer. Do you have any tips or suggestions?
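For context, one direction I've been experimenting with (just a rough sketch, I'm not sure it's correct) is to skip the two separate map calls and instead tokenize text and label together, masking the prompt tokens in `labels` with -100 so the causal-LM loss is only computed on the label part. The `tokenizer`, `datasets`, `data_args`, and `column_names` variables are the ones already defined in run_clm.py; `max_length=512` is an arbitrary value I picked:

```python
def tokenize_text_and_label(examples):
    # Build "prompt + answer" strings so the model learns to generate
    # the label right after the text.
    prompts = [t + tokenizer.eos_token for t in examples["text"]]
    full_texts = [p + l + tokenizer.eos_token for p, l in zip(prompts, examples["label"])]

    # 512 is an arbitrary choice; adjust to the model's context size.
    model_inputs = tokenizer(full_texts, truncation=True, max_length=512)

    # Tokenize the prompts alone to know how many leading tokens to mask.
    prompt_lens = [len(ids) for ids in tokenizer(prompts)["input_ids"]]

    labels = []
    for ids, plen in zip(model_inputs["input_ids"], prompt_lens):
        plen = min(plen, len(ids))  # in case truncation cut into the prompt
        # -100 is ignored by the cross-entropy loss, so only the label
        # tokens contribute to training.
        labels.append([-100] * plen + ids[plen:])
    model_inputs["labels"] = labels
    return model_inputs

tokenized_datasets = datasets.map(
    tokenize_text_and_label,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)
train_dataset = tokenized_datasets["train"]
```

I guess I would then also need a collator that pads the labels (something like `DataCollatorForSeq2Seq` with `label_pad_token_id=-100`) instead of the default one, but I'm not sure this is the intended way to do it with your Trainer.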
Thank you very much!