Hi, I'd like to start with a big thanks for your amazing work. I would like to use your library to fine-tune GPT-NEO on a Text2Text task instead of TextGeneration. I'm trying to adapt your run_clm.py script to handle not just a dataset with [text], but one with a [text, label] structure.
So I'm now trying to create a train_dataset from these two new tokenized datasets, built this way:
```python
def tokenize_function_text(examples):
    return tokenizer(examples["text"])

tokenized_datasets_text = datasets.map(
    tokenize_function_text,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)

def tokenize_function_label(examples):
    return tokenizer(examples["label"])

tokenized_datasets_label = datasets.map(
    tokenize_function_label,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)
```
But I'm really struggling to combine them into a single `train_dataset` object that I can pass to the Trainer. Do you have any tips or suggestions?
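For context, one direction I've been experimenting with (just a rough sketch, I'm not sure it's correct) is to skip the two separate map calls and instead tokenize text and label together, masking the prompt tokens in `labels` with -100 so the causal-LM loss is only computed on the label part. The `tokenizer`, `datasets`, `data_args`, and `column_names` variables are the ones already defined in run_clm.py; `max_length=512` is an arbitrary value I picked:

```python
def tokenize_text_and_label(examples):
    # Build "prompt + answer" strings so the model learns to generate
    # the label right after the text.
    prompts = [t + tokenizer.eos_token for t in examples["text"]]
    full_texts = [p + l + tokenizer.eos_token for p, l in zip(prompts, examples["label"])]

    # 512 is an arbitrary choice; adjust to the model's context size.
    model_inputs = tokenizer(full_texts, truncation=True, max_length=512)

    # Tokenize the prompts alone to know how many leading tokens to mask.
    prompt_lens = [len(ids) for ids in tokenizer(prompts)["input_ids"]]

    labels = []
    for ids, plen in zip(model_inputs["input_ids"], prompt_lens):
        plen = min(plen, len(ids))  # in case truncation cut into the prompt
        # -100 is ignored by the cross-entropy loss, so only the label
        # tokens contribute to training.
        labels.append([-100] * plen + ids[plen:])
    model_inputs["labels"] = labels
    return model_inputs

tokenized_datasets = datasets.map(
    tokenize_text_and_label,
    batched=True,
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    load_from_cache_file=not data_args.overwrite_cache,
)
train_dataset = tokenized_datasets["train"]
```

I guess I would then also need a collator that pads the labels (something like `DataCollatorForSeq2Seq` with `label_pad_token_id=-100`) instead of the default one, but I'm not sure this is the intended way to do it with your Trainer.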
Thank you very much!