Hello, I'm interested in adding this feature: a function in text2csv.py that takes a folder of text files, and then, in run_clm.py, padding and truncating each example instead of applying the group_text function.
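A minimal sketch of the text2csv.py side, assuming one song per .txt file and a single CSV column named `text` (the function name `folder_to_csv` and the column name are hypothetical, not part of the existing script). Each file becomes one row. Python's csv writer quotes fields that contain newlines, so the line breaks inside a song are preserved:

```python
import csv
from pathlib import Path


def folder_to_csv(folder, csv_path, column="text"):
    """Write one CSV row per .txt file in `folder` (hypothetical helper).

    Returns the number of rows written, excluding the header.
    """
    files = sorted(Path(folder).glob("*.txt"))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row
        for path in files:
            # Read the whole song as one string; csv.writer quotes the field
            # automatically, so embedded newlines survive a round trip.
            writer.writerow([path.read_text(encoding="utf-8")])
    return len(files)
```

Reading the file back with `csv.DictReader` (or any CSV reader that honors quoting) yields each song as a single field with its original line breaks.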
I'm using songs as my data, and the newline spacing is important. I would like the songs to stay separate during fine-tuning, so that the end of one song isn't the start of another.
I have the script create the CSVs so that each row is a song, but when group_text is applied, it concatenates them all and splits the result into blocks of 1024 tokens. I'm looking into adding DataCollatorWithPadding but haven't had much luck so far.
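One way to replace group_text is to pad or truncate every song to a fixed length on its own. The sketch below works on plain lists of token ids (assumed to come from the tokenizer already), so the logic is visible without the library. `pad_and_truncate` and `pad_id` are hypothetical names; GPT-2 has no pad token by default, so a common workaround is to reuse the eos id (50256) as `pad_id`. Padded positions get label -100, the default ignore index of PyTorch's cross-entropy loss, so they do not contribute to training. Because the masking is by position rather than by token id, a real <|endoftext|> inside a song would still be learned even when `pad_id` equals the eos id.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def pad_and_truncate(token_ids: List[int], block_size: int,
                     pad_id: int) -> Dict[str, List[int]]:
    """Turn one song into exactly `block_size` positions (hypothetical helper)."""
    ids = token_ids[:block_size]          # truncate long songs
    n_pad = block_size - len(ids)         # how many pad positions are needed
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 = real token the model attends to, 0 = padding
        "attention_mask": [1] * len(ids) + [0] * n_pad,
        # Only real tokens are used as targets; padding is ignored by the loss.
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }
```

In run_clm.py this logic would sit inside the function passed to `dataset.map(...)`, applied to each example, in place of the call that concatenates and chunks the texts.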
I also noticed that it uses <|endoftext|> as both bos_token and eos_token. I'm wondering how that affects things, and whether what I'm doing is even needed or whether I should just put this token between my examples.
From the config.json in the model:
"bos_token_id": 50256,
"embed_dropout": 0,
"eos_token_id": 50256,
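The alternative raised above, keeping group_text but putting <|endoftext|> between songs, can be sketched as follows. It appends the eos id (50256, from the config above) after each song, concatenates everything, and cuts fixed-size blocks, dropping the final partial block as run_clm.py's grouping does. A block may still span two songs, but the boundary is explicitly marked, so the model can learn that a song ends there. `concat_with_eos` is a hypothetical name, and the input is assumed to be one list of token ids per song. At the text level, the same effect comes from appending `tokenizer.eos_token` to each song before tokenizing.

```python
from typing import List

EOS_ID = 50256  # eos_token_id from the GPT-2 config.json


def concat_with_eos(songs: List[List[int]], block_size: int,
                    eos_id: int = EOS_ID) -> List[List[int]]:
    """Join songs with an eos separator, then chunk (hypothetical helper)."""
    stream: List[int] = []
    for ids in songs:
        stream.extend(ids)
        stream.append(eos_id)  # mark where this song ends
    # Keep only whole blocks; the leftover tail shorter than block_size is dropped.
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]
```

This keeps every training position filled with real text, at the cost of some blocks mixing the end of one song with the start of the next. The padding approach avoids that mixing but spends compute on padded positions.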