WizardCoder-15B-V1.0-q4f32_1 failing to load #179

Closed
jcosta33 opened this issue Aug 24, 2023 · 11 comments

jcosta33 commented Aug 24, 2023

Following the available examples in the WebLLM repo, such as next-simple-chat, I have added the model URL and ID:

```js
{
  model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/",
  local_id: "WizardCoder-15B-V1.0-q4f32_1",
}
```

then added the libmap entry:

"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",

but I get this error immediately after loading the model in the browser:

```
Init error, Error: Unknown conv template wizard_coder_or_math
```
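For context, here is a minimal sketch of how I am wiring this up, following the next-simple-chat example (the `AppConfig` field names and the `ChatModule.reload` signature below reflect my understanding of the current web-llm API, so treat them as assumptions rather than a spec):

```ts
import { ChatModule } from "@mlc-ai/web-llm";

// Custom app config: register the model weights URL and map its local_id
// to a prebuilt wasm runtime (shapes follow the next-simple-chat example).
const appConfig = {
  model_list: [
    {
      model_url:
        "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/",
      local_id: "WizardCoder-15B-V1.0-q4f32_1",
    },
  ],
  model_lib_map: {
    "WizardCoder-15B-V1.0-q4f32_1":
      "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",
  },
};

const chat = new ChatModule();
// The error above is thrown during this reload, while the model's
// mlc-chat-config.json (whose conv_template is "wizard_coder_or_math")
// is being parsed.
await chat.reload("WizardCoder-15B-V1.0-q4f32_1", undefined, appConfig);
```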

jcosta33 changed the title from "WizardCoder-15B-V1.0-q4f16_1 failing to load" to "WizardCoder-15B-V1.0-q4f32_1 failing to load" on Aug 24, 2023
CharlieFRuan (Contributor) commented:

Thanks for bringing this up! This should be fixed soon by #174. If you'd like, you could add the template in src/conversation.ts as shown in the draft PR.
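To illustrate the general pattern without waiting for the PR: the chat runtime resolves the `conv_template` name from the model's mlc-chat-config.json through a template registry, and the error simply means that name has no entry yet. The sketch below is a hypothetical, self-contained illustration of that lookup; the field names and prompt text are placeholders, not the actual definitions in src/conversation.ts, so copy the real template from the draft PR:

```ts
// Hypothetical illustration only; the real fix lives in web-llm's
// src/conversation.ts (see the draft PR).
interface ConvTemplate {
  system: string;            // system prompt prepended to the conversation
  roles: [string, string];   // names for the user / assistant turns
  sep: string;               // separator inserted between turns
}

const convTemplates: Record<string, ConvTemplate> = {
  // Adding an entry keyed by the missing name is what makes
  // "Unknown conv template wizard_coder_or_math" go away.
  wizard_coder_or_math: {
    system:
      "Below is an instruction that describes a task. " +
      "Write a response that appropriately completes the request.",
    roles: ["Instruction", "Response"],
    sep: "\n\n### ",
  },
};

function getConvTemplate(name: string): ConvTemplate {
  const tmpl = convTemplates[name];
  if (tmpl === undefined) {
    throw new Error("Unknown conv template " + name);
  }
  return tmpl;
}
```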

jcosta33 (Author) commented:

Thank you! While I have you: in my UI I'd like the user to be able to control all the LLM settings, but from what I see, different models require different configs. Of the available configs, which do you think I should expose to the user, and which do you feel ought to be hardcoded? Lastly, lately there has been a lot of talk about smaller specialised models taking over the LLM space. Is there a roadmap or list of LLMs you are intending to add support for? Code Llama comes to mind!

Thank you, and I know this goes beyond the scope of this issue, but this tech has me fired up; very exciting stuff!

CharlieFRuan (Contributor) commented:

Thanks for the questions!

You could look at the documentation regarding the config. Like you said, some configs are model-specific, but others, such as the system prompt, can be conveniently customized by users.
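As a rough sketch of what that split could look like in a UI (this assumes the `ChatModule.reload(localId, chatOpts, appConfig)` signature and option names like `temperature`, `top_p`, and `conv_config.system`; please double-check them against the ChatOptions/ChatConfig documentation):

```ts
import { ChatModule } from "@mlc-ai/web-llm";

// Knobs that are reasonable to expose to end users: sampling settings and
// the system prompt. Model-specific fields (conv_template, tokenizer files,
// context window sizes, ...) are better left to the model's own
// mlc-chat-config.json and kept out of the UI.
const userSettings = {
  temperature: 0.7,
  top_p: 0.95,
  repetition_penalty: 1.0,
  conv_config: {
    system: "You are a helpful coding assistant.",
  },
};

const chat = new ChatModule();
// The second argument overrides fields from the model's default chat config.
await chat.reload("WizardCoder-15B-V1.0-q4f32_1", userSettings);
```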

I believe we just added support for Code Llama in mlc-llm: mlc-ai/mlc-llm#809

Regarding the roadmap, we have a tracker here: mlc-ai/mlc-llm#692, but it can be out of date at times. The best way is probably to follow the issues and PRs in the mlc-llm repo; they are usually tagged with [New Models].

jcosta33 (Author) commented Aug 28, 2023

Awesome! Any word on whether f32 and wasm variants will be made available?

CharlieFRuan (Contributor) commented:

I added the request in mlc-ai/mlc-llm#692! We apologize that it may be hard to follow up with all requests promptly, but feel free to follow the tutorials mentioned in the tracker and see if you could compile it yourself. :)

jcosta33 (Author) commented Aug 29, 2023

Honestly, as a FE dev I'm a little intimidated by the lower-level stuff and haven't had the time to delve that far out of my comfort zone, but I would love to learn; it would give me a lot of autonomy. It is my understanding that, with the right knowledge, one could even train a custom version of a supported LLM and package it into wasm? That would be of tremendous value.

I'm currently working for one of the large energy companies, and we're considering building a POC with this in an attempt to democratise LLMs across the organisation. I can't say more than that at this point as it's all still very speculative. My primary focus is UI/UX design and implementation, along with, though less frequently, backend work with NodeJS. If this moves forward we will bring in people with the right skillset to explore things more fully.

Are there any resources you would recommend I follow, knowing my skillset is primarily on the FE side, to learn the basics of this process?

CharlieFRuan (Contributor) commented:

This is exciting to hear, thanks for sharing!

> It is my understanding that with the right knowledge one could even train a custom version of a supported LLM and package it into wasm?

Yep! As long as the trained LLM is then packaged like a Hugging Face model, with the required format such as a config file. Afterwards, you can compile it and chat with it; the tutorial on Extensions to More Model Variants may be relevant here. Similarly, any existing Hugging Face model (say, Code Llama) can be compiled into an MLC LLM model (quantized weights plus a wasm file, in the web case) without much specialized knowledge: it is mostly a workflow to follow, and as long as the environment is set up correctly it shouldn't be too much work, though it can be bumpy.

Note that the tutorial above is for CUDA/Vulkan; for wasm, some additional work (hopefully not too much) is needed; see the tutorial here.

> to learn the basics of this process

As for resources, web-llm goes hand-in-hand with the mlc-llm project, which is fully documented here. I believe it'd be helpful to start with mlc-llm. If you run into problems or have questions, please let us know!

jcosta33 (Author) commented Aug 29, 2023

> any existing huggingface model (say Code Llama) can be compiled into a MLC LLM model

Sorry for harping on this, but I have to confirm I got it right: I can grab just about any model from Hugging Face, compile it to wasm using MLC, and load it myself into WebLLM? Is that what you're saying? Because if so, that's quite remarkable, especially since we keep hearing more and more about the virtues of smaller LLMs for specialized tasks. I have so many other questions, but I don't want to pester you any further. Is there a Discord channel or community forum I can join to find out more?

Thank you for your help, I really appreciate it.

CharlieFRuan (Contributor) commented Aug 29, 2023

> I can grab just about any model from huggingface and compile it to wasm using MLC and load it myself into Web LLM?

Yes, that is correct, as long as the architecture is supported by us. For instance, WizardMath has the same architecture as Llama, so you could compile these models yourself (following the tutorial on Extensions to More Model Variants).

Models with architectures we don't yet support require much more work. You can see the list of currently supported model variants/architectures here: https://mlc.ai/mlc-llm/docs/prebuilt_models.html.

> is there a discord channel or community forum I can join to find out more

You can find the link to the discord server here: https://mlc.ai/mlc-llm/docs/index.html

jcosta33 (Author) commented:

Thank you again. I'll see if I can give this a try soon!

CharlieFRuan (Contributor) commented:

Good luck!

tqchen closed this as completed Nov 27, 2023