
Would it be possible to make a smaller version of the model? #1

Open
flatsiedatsie opened this issue Feb 4, 2024 · 4 comments

@flatsiedatsie

Hey Edwin

Super cool project!

I'm trying to use your model in a web-based environment, but apparently the browser doesn't like models that are 4 GB in size. I was trying to create a really easy-to-use tool for my niece that would allow her to summarize PDFs.

Similarly, I've been trying to use your model on a Raspberry Pi 5, but there too it would be great if there were a model that was, say, 2 GB in size. That way it could still comfortably run on devices with 4 GB of RAM too.

Curious to hear if that's even possible. Or could I make the model smaller myself? I don't have any experience with that yet, but I'm having fun learning.

@Rijgersberg
Owner

Have you tried running any quantized versions? 2 GB is a bit of a stretch, but 3 GB is possible. I don't know what the quality will be.

Quantized models can be found on Hugging Face, for example TheBloke's GGUF conversions of GEITje.

See https://github.com/ggerganov/llama.cpp and https://huggingface.co/blog/4bit-transformers-bitsandbytes for more details about quantization.
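
If you want to try quantization from Python instead, loading in 4-bit with transformers + bitsandbytes looks roughly like this. A minimal sketch, not a tested recipe: it assumes a CUDA GPU, and the model id is just an example, substitute whichever GEITje checkpoint you use.

```python
# Sketch: load a GEITje checkpoint in 4-bit with bitsandbytes.
# Assumptions: a CUDA GPU is available; the model id is an example checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Rijgersberg/GEITje-7B-chat"  # example; substitute your checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4, see the blog post above
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Schrijf een kort gedicht over een geit."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that at 4 bits a 7B model still needs roughly 3.5-4 GB for the weights alone, so for a 2-3 GB footprint the Q3/Q2 GGUF quants via llama.cpp are the more promising route.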

To get an even better small LLM, someone could reproduce the training process of GEITje on a smaller base model, such as Phi 2 2.7B. The compute cost should scale down about linearly with parameter count.
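
For a rough idea of what that reproduction involves, here is a sketch of continued pretraining of Phi-2 with the Hugging Face Trainer. Everything in it is an assumption for illustration: the Dutch subset of C4 stands in for a proper Dutch corpus, and the hyperparameters are placeholders, not GEITje's actual settings.

```python
# Sketch: continued pretraining of Phi-2 on Dutch text with the HF Trainer.
# Assumptions: allenai/c4 "nl" as a stand-in corpus; illustrative hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stream the Dutch portion of C4 so the corpus never has to fit on disk.
raw = load_dataset("allenai/c4", "nl", split="train", streaming=True)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_ds = raw.map(tokenize, batched=True,
                   remove_columns=["text", "timestamp", "url"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phi-2-dutch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        bf16=True,
        max_steps=10_000,  # placeholder; required when streaming a dataset
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With the same token budget, the linear scaling above suggests roughly 2.7/7 of GEITje's training compute for a 2.7B-parameter model.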

@flatsiedatsie
Author

flatsiedatsie commented Feb 5, 2024

Yes, I have tried that.

I first tried the smallest TheBloke version, and yesterday I took the smallest `geitje-7b-ultra.Q3_K_S` model and used `split -b 500M -a 2` to create 7 parts of 500 MB each. But in the browser I get a `RangeError: Array buffer allocation failed` error, which seems to indicate the model is too big. I've tried without splitting too, but the browser (Brave) won't load files larger than 1500 MB.

Interestingly I seem to get a bit further on Safari:

```
RuntimeError: Unreachable code should not be executed (evaluating 'malloc(arg.length * 1, 1)')
```

This seems to indicate that Safari does allow files that big. There it seems I will have to create a specific WASM build, which is what I expected.

Creating a Dutch version of Phi-2 would rock. Just out of curiosity, what would you expect the compute costs for something like that to be?

@Rijgersberg
Owner

If you train on the same amount of data, you could probably get both the pretraining and finetuning done for under 1000 euros total.

However: Mistral-7B already spoke quite a bit of Dutch, indicating that at least some Dutch was part of its training data. In my quick tests just now, I can't seem to get any Dutch responses out of Phi-2. I'm not sure that applying the same training as GEITje would be enough to turn Phi-2 into a useful Dutch chatbot.

@flatsiedatsie
Author

Super interesting and informative. And thank you for trying Phi-2 already.

If Phi-2 would be "relatively cheap" to train, it must have cost quite a bit of money to make GEITje. Thank you for that!

For now I'll continue trying to create a WASM build for GEITje and see if I can get it running on Safari. And somewhere on my list is testing GEITje as-is on the Raspberry Pi 5 8 GB.

I'll let you know how it goes.
