Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and rework GPT-TF.js #807

Merged
merged 18 commits into from
Dec 6, 2024
Merged

Fix and rework GPT-TF.js #807

merged 18 commits into from
Dec 6, 2024

Conversation

JulienVig
Copy link
Collaborator

@JulienVig JulienVig commented Oct 16, 2024

Closes #654

  • Fix weight initialization from zero to random uniform -> Fixes GPT2 NaN loss
  • Implement weight sharing between token embeddings and the language modeling head -> -20% memory footprint
  • Improve generation with top k sampling option
  • Add seed for deterministic runs
  • Implement text loaders by byte chunk rather than by line which doesn't require to pad each line to the context length
  • Waiting on Transformers.js #984 and #1019 to migrate to @huggingface/transformers v3.

@JulienVig JulienVig self-assigned this Oct 16, 2024
@JulienVig JulienVig force-pushed the 654-improve-gpt-julien branch 20 times, most recently from 1d88d35 to 03e5c7d Compare October 23, 2024 11:58
@JulienVig JulienVig marked this pull request as ready for review October 24, 2024 15:58
@JulienVig JulienVig requested a review from tharvik October 24, 2024 15:58
Copy link
Collaborator

@tharvik tharvik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superb work, thanks for clearing the GPT's mud, every comments makes it more understandable!
yeah, sadly, as I forgot to merge the processing PR (#781) before you branched off, the whole processing pipeline changed a lot. sorry for the toes stepping (hopefully, it will simplify this PR).

btw, it seems that @xenova/transformer has been recently updated to @huggingface/transformer. did you try it out? maybe it'll help with the tokenizer usage (doesn't look much changed to me but you know best)

discojs-node/src/loaders/text.ts Outdated Show resolved Hide resolved
discojs-node/src/loaders/text.ts Outdated Show resolved Hide resolved
discojs-node/src/loaders/text.ts Outdated Show resolved Hide resolved
cli/src/benchmark_gpt.ts Outdated Show resolved Hide resolved
discojs-node/src/loaders.spec.ts Outdated Show resolved Hide resolved
docs/CONTRIBUTING.md Outdated Show resolved Hide resolved
webapp/cypress.config.ts Outdated Show resolved Hide resolved
webapp/src/components/dataset_input/TextDatasetInput.vue Outdated Show resolved Hide resolved
webapp/src/components/dataset_input/TextDatasetInput.vue Outdated Show resolved Hide resolved
webapp/src/components/testing/TestSteps.vue Outdated Show resolved Hide resolved
@JulienVig JulienVig force-pushed the 654-improve-gpt-julien branch from 0a28b81 to f7f96dc Compare November 12, 2024 15:49
@martinjaggi
Copy link
Member

maybe rename block size to context len, that would be more specific

@JulienVig JulienVig force-pushed the 654-improve-gpt-julien branch 3 times, most recently from 68d957b to 8cbc96e Compare November 14, 2024 16:49
@JulienVig JulienVig added this to the v4.0.0 milestone Nov 14, 2024
@JulienVig JulienVig requested review from tharvik and removed request for tharvik November 14, 2024 16:54
@JulienVig JulienVig mentioned this pull request Nov 19, 2024
@JulienVig JulienVig requested a review from tharvik December 2, 2024 10:53
Copy link
Collaborator

@tharvik tharvik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only cosmetic comments, thanks for the huge amount of work 🎉

discojs-node/src/loaders/text.ts Outdated Show resolved Hide resolved
discojs/src/dataset/dataset.ts Outdated Show resolved Hide resolved
discojs/src/dataset/dataset.ts Show resolved Hide resolved
discojs/src/dataset/dataset.ts Outdated Show resolved Hide resolved
discojs/src/processing/text.ts Outdated Show resolved Hide resolved
discojs/src/processing/text.ts Show resolved Hide resolved
discojs/src/dataset/dataset.ts Outdated Show resolved Hide resolved
docs/CONTRIBUTING.md Outdated Show resolved Hide resolved
@JulienVig JulienVig force-pushed the 654-improve-gpt-julien branch from 8cbc96e to 0781b7c Compare December 3, 2024 11:55
@JulienVig
Copy link
Collaborator Author

I'll wait for #840 and #825 to be merged before this one.

@JulienVig JulienVig force-pushed the 654-improve-gpt-julien branch from 0781b7c to 30de4fb Compare December 5, 2024 11:29
@JulienVig JulienVig merged commit 7844f97 into develop Dec 6, 2024
23 checks passed
@JulienVig JulienVig deleted the 654-improve-gpt-julien branch December 6, 2024 13:35
@martinjaggi
Copy link
Member

well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve and rework GPT-tfjs
3 participants