Titanic https://www.kaggle.com/competitions/titanic
Concatenate each feature as natural language:
instead of using special tokens like [UNUSED01],
assign existing vocabulary words so their semantic embeddings can be reused.
pclass = 'seat class'
sibsp = 'with sibling'
parch = 'parents onboard' # deliberately avoided the sibsp wording, hoping the model won't confuse the two
cabin = 'cabin number'
embarked = 'embarked port'
If a feature name already carries a semantic connotation, it is used as is (e.g. sex, age, ticket, fare).
Did not use the 'PassengerId' and 'Name' features, assuming they would hurt accuracy. A conversion sketch follows below.
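A minimal sketch of the row-to-text conversion, assuming the standard Kaggle train.csv column names; the phrase wording and the `FEATURE_NAMES`/`row_to_text` helpers are hypothetical, not the original code:

```python
import pandas as pd

# Hypothetical mapping from Titanic columns to the natural-language names above.
FEATURE_NAMES = {
    "Pclass": "seat class",
    "SibSp": "with sibling",
    "Parch": "parents onboard",
    "Cabin": "cabin number",
    "Embarked": "embarked port",
    "Sex": "sex",
    "Age": "age",
    "Ticket": "ticket",
    "Fare": "fare",
}

def row_to_text(row: pd.Series) -> str:
    """Concatenate features as 'name value' phrases, skipping missing values."""
    parts = []
    for col, name in FEATURE_NAMES.items():
        value = row.get(col)
        if pd.notna(value):
            parts.append(f"{name} {value}")
    return ", ".join(parts)

train = pd.read_csv("train.csv")           # Kaggle Titanic training data
texts = train.apply(row_to_text, axis=1)   # one natural-language line per passenger
labels = train["Survived"].tolist()
```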
Infer whether each passenger survived with a GPT-2 sequence classification model.
Train for 200 epochs with cosine annealing with warmup.
Good fit on the train set.
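Model setup sketch, assuming the HuggingFace `gpt2` checkpoint and `GPT2ForSequenceClassification` as in the fine-tuning notebook linked at the end; `texts` and `labels` come from the conversion sketch above:

```python
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 ships without a pad token, so padding reuses the EOS token (see the question below).
tokenizer.pad_token = tokenizer.eos_token

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Tokenize the natural-language passenger descriptions.
encodings = tokenizer(texts.tolist(), truncation=True, padding=True, return_tensors="pt")
dataset = torch.utils.data.TensorDataset(
    encodings["input_ids"], encodings["attention_mask"], torch.tensor(labels)
)
```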
epochs = 200
learning_rate = 0.026
eps = 1e-8
weight_decay = 0
batch_size = 8
cycles = 9 (cosine annealing with warmup)
warmup = training steps/10
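Optimizer and scheduler sketch matching the hyperparameters above; the exact schedule function is an assumption (here transformers' `get_cosine_with_hard_restarts_schedule_with_warmup` stands in for "cosine annealing with warmup" with 9 cycles), and `model`/`dataset` come from the setup sketch:

```python
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_cosine_with_hard_restarts_schedule_with_warmup

epochs, learning_rate, eps, weight_decay, batch_size, cycles = 200, 0.026, 1e-8, 0.0, 8, 9

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
total_steps = len(loader) * epochs
warmup_steps = total_steps // 10   # warmup = training steps / 10 (assumed to mean total steps)

optimizer = AdamW(model.parameters(), lr=learning_rate, eps=eps, weight_decay=weight_decay)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps, num_cycles=cycles
)

model.train()
for epoch in range(epochs):
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```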
- Does setting pad_token = eos_token make sense?
- Does the special token length matter? (1 token or more)
🤮Score: 0.67464🤮 (200 epochs)
🤮Score: 0.69377🤮 (50 epochs)
- maybe insufficient training data
- maybe an inappropriate model for the task (this task doesn't seem closely tied to word embeddings)
- maybe wrong hyperparameters (some overfitting, I guess)
- GPT2 fine tuning for classification: https://github.com/gmihaila/ml_things/blob/master/notebooks/pytorch/gpt2_finetune_classification.ipynb
- Simple Chit-Chat based on KoGPT2: https://github.com/haven-jeon/KoGPT2-chatbot