Fix and rework GPT-TF.js #807

JulienVig · 2024-10-16T16:51:51Z

Closes #654

Fix weight initialization (zeros replaced by random uniform) -> fixes the GPT-2 NaN loss
Implement weight sharing between the token embeddings and the language modeling head -> -20% memory footprint
Improve generation with a top-k sampling option
Add a seed for deterministic runs
Implement text loaders that read by byte chunk rather than by line, which avoids padding each line to the context length
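A minimal sketch of why chunked loading avoids padding. It assumes the text has already been tokenized into one continuous stream of token ids. The stream is cut into windows of `contextLength` inputs, and each window is paired with the same tokens shifted by one position as next-token labels. `Sample` and `toContextWindows` are hypothetical names for illustration, not the discojs loader API, and the non-overlapping stride is an assumption.

```typescript
// Hypothetical types/helpers for illustration; not the discojs API.
interface Sample {
  xs: number[]; // input token ids, length === contextLength
  ys: number[]; // next-token labels: xs shifted left by one position
}

// Cut a continuous token stream into fixed-size training windows.
// Because windows come from the concatenated stream rather than from
// individual lines, every window is full and no padding is required.
function toContextWindows(tokens: number[], contextLength: number): Sample[] {
  if (!Number.isInteger(contextLength) || contextLength < 1) {
    throw new RangeError("contextLength must be a positive integer");
  }
  const samples: Sample[] = [];
  // Each window needs contextLength + 1 tokens (the extra one is the last label).
  // Stepping by contextLength makes the label of one window's last position
  // the first input of the next window, so no token is skipped.
  for (let start = 0; start + contextLength + 1 <= tokens.length; start += contextLength) {
    samples.push({
      xs: tokens.slice(start, start + contextLength),
      ys: tokens.slice(start + 1, start + contextLength + 1),
    });
  }
  // A tail shorter than contextLength + 1 tokens is dropped, not padded.
  return samples;
}
```

With the token ids 0 to 9 and `contextLength = 3`, this yields three windows whose inputs start at 0, 3 and 6; any tail too short for a full window would be dropped rather than padded.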
Waiting on Transformers.js #984 and #1019 to migrate to @huggingface/transformers v3.
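The top-k sampling and seeding items can be illustrated together with a dependency-free sketch. It assumes logits arrive as a plain `number[]` rather than a tfjs tensor. The sketch keeps the `k` highest logits, applies a softmax over them, and draws from that distribution with a seeded generator, so the same seed replays the same choices. `mulberry32` is a small, widely used PRNG swapped in for illustration. `sampleTopK` and its `temperature` parameter are hypothetical and do not reflect the discojs generation API.

```typescript
// Illustrative sketch only: plain arrays instead of tfjs tensors.

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// The same seed always produces the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: sample a token id among the k most likely candidates.
function sampleTopK(
  logits: number[],
  k: number,
  rand: () => number,
  temperature = 1,
): number {
  if (logits.length === 0 || k < 1 || temperature <= 0) {
    throw new RangeError("need non-empty logits, k >= 1 and temperature > 0");
  }
  // Rank candidates by logit (descending) and keep the top k.
  const top = logits
    .map((logit, index) => ({ logit, index }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, Math.min(k, logits.length));
  // Softmax over the kept logits; subtracting the max avoids overflow in exp().
  const maxLogit = top[0].logit;
  const weights = top.map(({ logit }) => Math.exp((logit - maxLogit) / temperature));
  const total = weights.reduce((sum, w) => sum + w, 0);
  // Inverse-CDF draw: walk the cumulative weights until the random mass is used up.
  let remaining = rand() * total;
  for (let i = 0; i < top.length; i++) {
    remaining -= weights[i];
    if (remaining < 0) return top[i].index;
  }
  return top[top.length - 1].index; // fallback for floating-point round-off
}
```

With `k = 1` this reduces to greedy decoding. Larger `k` adds diversity while the seed keeps the sampled sequence reproducible; this sketch covers only the sampling side of determinism, not weight initialization.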

tharvik

superb work, thanks for clearing the GPT's mud, every comment makes it more understandable!
yeah, sadly, as I forgot to merge the processing PR (#781) before you branched off, the whole processing pipeline changed a lot. sorry for stepping on your toes (hopefully, it will simplify this PR).

btw, it seems that @xenova/transformers has recently moved to @huggingface/transformers. did you try it out? maybe it'll help with the tokenizer usage (it doesn't look much changed to me, but you know best)

discojs-node/src/loaders/text.ts

cli/src/benchmark_gpt.ts

discojs-node/src/loaders.spec.ts

docs/CONTRIBUTING.md

webapp/cypress.config.ts

webapp/src/components/dataset_input/TextDatasetInput.vue

webapp/src/components/testing/TestSteps.vue

martinjaggi · 2024-11-14T09:03:47Z

maybe rename block size to context length; that would be more specific

tharvik

only cosmetic comments, thanks for the huge amount of work 🎉

discojs-node/src/loaders/text.ts

discojs/src/dataset/dataset.ts

discojs/src/processing/text.ts

discojs/src/dataset/dataset.ts

docs/CONTRIBUTING.md

JulienVig · 2024-12-03T12:05:04Z

I'll wait for #840 and #825 to be merged before this one.

martinjaggi · 2024-12-06T13:45:29Z

well done!

JulienVig self-assigned this Oct 16, 2024

JulienVig force-pushed the 654-improve-gpt-julien branch 20 times, most recently from 1d88d35 to 03e5c7d on October 23, 2024 11:58

JulienVig marked this pull request as ready for review October 24, 2024 15:58

JulienVig requested a review from tharvik October 24, 2024 15:58

tharvik requested changes Nov 1, 2024

JulienVig force-pushed the 654-improve-gpt-julien branch from 0a28b81 to f7f96dc on November 12, 2024 15:49

JulienVig force-pushed the 654-improve-gpt-julien branch 3 times, most recently from 68d957b to 8cbc96e on November 14, 2024 16:49

JulienVig added this to the v4.0.0 milestone Nov 14, 2024

JulienVig requested review from tharvik and removed request for tharvik November 14, 2024 16:54

JulienVig mentioned this pull request Nov 19, 2024

Benchmark GPT-tfjs #659

Merged

JulienVig requested a review from tharvik December 2, 2024 10:53

tharvik approved these changes Dec 2, 2024

JulienVig force-pushed the 654-improve-gpt-julien branch from 8cbc96e to 0781b7c on December 3, 2024 11:55

JulienVig added 18 commits December 5, 2024 12:29

docs/CONTRIBUTING: add documentation on debug statements and cypress

33b9fcf

discojs/src/default_tasks: change default batch size and block size

b31485f

cli: add train_gpt script

52663c5

discojs/src/models/gpt/models: fix loss averaging across iterations

a72a3c6

discojs & cli: change gpt2 vocab size to 50257 instead of 50258

0a219e0

discojs/src/models/gpt: allow model init config to be partial

04dd6c3

discojs/src/models/gpt/index: link source repo

0370a20

discojs/src/models/gpt: always use the token embeddings, a language modeling head and attention bias

340e171

discojs/src/models/gpt: generation code: clean, document and improve. Implement topk sampling

40d51d7

discojs/src/models/gpt/layers: document tensor operations, rename layers following GPT2 convention, use LogLayer

f8ede3a

discojs/src/models/gpt/layers: share weights between token embeddings and language modeling head

84cee62

discojs/src/models/gpt/layers: fix weight initializations

d3c0689

discojs/src/models/gpt: add seed, loss is now identical between runs

884e14e

discojs/src/task/training_information: make maxSequenceLength a required task parameter for text tasks

ed73afb

discojs/src/default_tasks/wikitext: use task's maxSequenceLength in the model init config

9af6b1f

*: replace line by line text loaders by chunk by chunk text loaders

8806b7a

discojs*: rename .unbatch() to .flatten()

40eb86c

discojs*,cli*: rename blockSize and maxSequenceLength to contextLength

30de4fb
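Commit 84cee62 shares weights between the token embeddings and the language modeling head. The idea can be sketched with plain arrays, assuming a single matrix `E` of shape `[vocabSize][dModel]`: the input side reads row `E[t]` for token `t`, and the output side scores every vocabulary entry with a dot product against the same rows, i.e. `logits = h · Eᵀ`. `TiedEmbedding` is a hypothetical class for illustration; the PR's implementation lives in the tfjs-based GPT layers, and the initialization range below is arbitrary, not GPT-2's scheme.

```typescript
// Hypothetical sketch of weight tying with plain arrays (not the tfjs layers).
// One matrix serves as both the embedding table and the LM head.
class TiedEmbedding {
  readonly weights: number[][]; // shape [vocabSize][dModel]

  constructor(
    readonly vocabSize: number,
    readonly dModel: number,
    rand: () => number = Math.random,
  ) {
    // Small random values centred on zero (illustrative init only).
    this.weights = Array.from({ length: vocabSize }, () =>
      Array.from({ length: dModel }, () => (rand() - 0.5) * 0.04),
    );
  }

  // Input side: look up the embedding row of a token id.
  embed(tokenId: number): number[] {
    return this.weights[tokenId].slice();
  }

  // Output side: logits[v] = dot(E[v], hidden), i.e. hidden · Eᵀ,
  // reading the very same rows used by embed().
  logits(hidden: number[]): number[] {
    return this.weights.map((row) =>
      row.reduce((sum, w, j) => sum + w * hidden[j], 0),
    );
  }

  // Parameters stored once; an untied model would hold twice this many.
  parameterCount(): number {
    return this.vocabSize * this.dModel;
  }
}
```

Because both roles read the same storage, an update to `E` is seen by both, and the vocabulary-sized matrix is stored only once. For GPT-2's 50257-token vocabulary and an assumed width of 768, tying avoids a second matrix of 50257 × 768 = 38,597,376 parameters.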

JulienVig force-pushed the 654-improve-gpt-julien branch from 0781b7c to 30de4fb on December 5, 2024 11:29

JulienVig merged commit 7844f97 into develop Dec 6, 2024
23 checks passed

JulienVig deleted the 654-improve-gpt-julien branch December 6, 2024 13:35
