
Would it be possible to make a smaller version of the model? #1

Open
flatsiedatsie opened this issue Feb 4, 2024 · 4 comments

@flatsiedatsie

Hey Edwin

Super cool project!

I'm trying to use your model in a web-based environment, but apparently the browser doesn't like models that are 4 GB in size. I was trying to create a really easy-to-use tool for my niece that would allow her to summarize PDFs.

Similarly, I've been trying to use your model on a Raspberry Pi 5, but there too it would be great if there were a model that was, say, 2 GB in size. That way it could still comfortably run on devices with 4 GB of RAM too.

Curious to hear if that's even possible. Or could I make the model smaller myself? I don't have any experience with that yet, but I'm having fun learning.

@Rijgersberg
Owner

Have you tried running any quantized versions? 2 GB is a bit of a stretch, but 3 GB is possible. I don't know what the quality will be.

Quantized models can be found on Hugging Face, for example TheBloke's GGUF conversions of GEITje.

See https://github.com/ggerganov/llama.cpp and https://huggingface.co/blog/4bit-transformers-bitsandbytes for more details about quantization.
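
If you want to try quantization from Python instead, loading in 4-bit with transformers + bitsandbytes looks roughly like this. A minimal sketch, not a tested recipe: it assumes a CUDA GPU, and the model id is just an example, substitute whichever GEITje checkpoint you use.

```python
# Sketch: load a GEITje checkpoint in 4-bit with bitsandbytes.
# Assumptions: a CUDA GPU is available; the model id is an example checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Rijgersberg/GEITje-7B-chat"  # example; substitute your checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4, see the blog post above
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Schrijf een kort gedicht over een geit."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that at 4 bits a 7B model still needs roughly 3.5-4 GB for the weights alone, so for a 2-3 GB footprint the Q3/Q2 GGUF quants via llama.cpp are the more promising route.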

To get an even better small LLM, someone could reproduce the training process of GEITje on a smaller base model, such as Phi 2 2.7B. The compute cost should scale down about linearly with parameter count.
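
For a rough idea of what that reproduction involves, here is a sketch of continued pretraining of Phi-2 with the Hugging Face Trainer. Everything in it is an assumption for illustration: the Dutch subset of C4 stands in for a proper Dutch corpus, and the hyperparameters are placeholders, not GEITje's actual settings.

```python
# Sketch: continued pretraining of Phi-2 on Dutch text with the HF Trainer.
# Assumptions: allenai/c4 "nl" as a stand-in corpus; illustrative hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stream the Dutch portion of C4 so the corpus never has to fit on disk.
raw = load_dataset("allenai/c4", "nl", split="train", streaming=True)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_ds = raw.map(tokenize, batched=True,
                   remove_columns=["text", "timestamp", "url"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phi-2-dutch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        bf16=True,
        max_steps=10_000,  # placeholder; required when streaming a dataset
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With the same token budget, the linear scaling above suggests roughly 2.7/7 of GEITje's training compute for a 2.7B-parameter model.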

@flatsiedatsie
Author

flatsiedatsie commented Feb 5, 2024

Yes, I have tried that.

I first tried the smallest TheBloke version, and yesterday I took the smallest `geitje-7b-ultra.Q3_K_S` model and used `split -b 500M -a 2` to create 7 parts of 500 MB each. But in the browser I get a `RangeError: Array buffer allocation failed` error, which seems to indicate the model is too big. I've tried without splitting too, but the browser (Brave) won't load files larger than 1500 MB.

Interestingly I seem to get a bit further on Safari:

```
RuntimeError: Unreachable code should not be executed (evaluating 'malloc(arg.length * 1, 1)')
```

This seems to indicate that Safari does allow files that big. There it seems I will have to create a specific WASM build, which is what I expected.

Creating a Dutch version of Phi-2 would rock. Just out of curiosity, what would you expect the compute costs for something like that to be?

@Rijgersberg
Owner

If you train on the same amount of data, you could probably get both the pretraining and finetuning done for under 1000 euros total.

However: Mistral-7B already spoke quite a bit of Dutch, indicating that at least some Dutch was part of its training data. In my quick tests just now, I can't seem to get any Dutch responses out of Phi-2. I'm not sure that applying the same training as GEITje would be enough to turn Phi-2 into a useful Dutch chatbot.

@flatsiedatsie
Author

Super interesting and informative. And thank you for trying Phi-2 already.

If Phi-2 would be "relatively cheap" to train, it must have cost quite a bit of money to make GEITje. Thank you for that!

For now I'll continue trying to create a WASM build for GEITje and see if I can get it running on Safari. And somewhere on my list is testing GEITje as-is on the Raspberry Pi 5 8 GB.

I'll let you know how it goes.
