WizardCoder-15B-V1.0-q4f32_1 failing to load #179
Comments
Thanks for bringing this up! This should be fixed soon by #174. If you'd like, you could add the template in [...]
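In case it helps while #174 is pending, here is a rough, untested sketch of that kind of stop-gap: overriding the model's conversation template at reload time. Treat the `conv_template` override in the chat options and the `"vicuna_v1.1"` stand-in template as assumptions rather than confirmed web-llm behavior:

```typescript
// Untested stop-gap sketch: override the model's unregistered conversation
// template ("wizard_coder_or_math") at reload time with one that already
// exists. Assumes the chat options accept a conv_template override,
// mirroring mlc-llm's chat config; "vicuna_v1.1" is a stand-in choice.
import { ChatModule } from "@mlc-ai/web-llm";

async function loadWithTemplateOverride(appConfig: any): Promise<ChatModule> {
  const chat = new ChatModule();
  await chat.reload(
    "WizardCoder-15B-V1.0-q4f32_1",
    { conv_template: "vicuna_v1.1" }, // replaces the unregistered template
    appConfig, // the AppConfig with model_url / local_id / model_lib_map
  );
  return chat;
}
```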
Thank you! While I have you: in my UI I'd like the user to be able to control all the LLM settings, but from what I see, different models require different configs. Of the available configs, which do you think I should expose to the user, and which ought to be hardcoded?

Lastly, there has been a lot of talk lately about smaller specialised models taking over the LLM space. Is there a roadmap or list of LLMs you intend to add support for? Code Llama comes to mind!

Thank you. I know this goes beyond the scope of this issue, but this tech has me on fire, very exciting stuff!
Thanks for the questions! You could take a look at the documentation regarding the config. Like you said, some configs are model-specific, but some other configs (say, sampling settings such as temperature) apply across models.

I believe we just added support for Code Llama in mlc-llm: mlc-ai/mlc-llm#809.

Regarding the roadmap, we made a tracker here: mlc-ai/mlc-llm#692, but it can be out of date at times. I guess the best way is to follow the issues and PRs in the mlc-llm repo. They are usually tagged with [...]
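To make the split concrete, here is a minimal sketch of one reasonable grouping. The field names follow mlc-chat-config.json; treat the grouping itself as a suggestion rather than a rule:

```typescript
// A suggested split between user-facing and hardcoded settings.
// Field names follow mlc-chat-config.json.

// Reasonable to expose in the UI: generic sampling knobs that apply
// across models.
interface UserTunableSettings {
  temperature: number;        // randomness of sampling, e.g. 0.7
  top_p: number;              // nucleus sampling cutoff, e.g. 0.95
  repetition_penalty: number; // penalizes repeated tokens, e.g. 1.0
}

// Better hardcoded per model: tied to a specific model build.
interface ModelFixedSettings {
  conv_template: string; // conversation format the model expects
  local_id: string;      // which compiled weights/wasm pair to load
}

const defaultSettings: UserTunableSettings = {
  temperature: 0.7,
  top_p: 0.95,
  repetition_penalty: 1.0,
};
```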
Awesome! Any word on whether f32 and wasm variants will be made available?
I added the request in mlc-ai/mlc-llm#692! We apologize that it may be hard to keep up with all requests promptly, but feel free to follow the tutorials mentioned in the tracker and see if you can compile it yourself :)
Honestly, as a FE dev I'm a little intimidated by the lower-level stuff and haven't had the time to really delve that far out of my comfort zone, but I would love to learn; it would give me a lot of autonomy. It is my understanding that, with the right knowledge, one could even train a custom version of a supported LLM and package it into wasm? That would be of tremendous value.

I'm currently working for one of the large energy companies, and we're considering building a POC with this in an attempt to democratise LLMs across the organisation. I can't say more than that at this point, as it's all still very speculative. My primary focus is UI/UX design and implementation, as well as, less frequently, backend work with NodeJS. If this moves forward, we will bring in people with the right skillset to play with things more fully.

Any resources you would recommend I follow, knowing my skillset is primarily on the FE side, to learn the basics of this process?
This is exciting to hear, thanks for sharing!
Yep! As long as the LLM is packaged like a huggingface model, with the required format such as a config file. Afterwards, you can compile it and chat with it; perhaps this tutorial on Extensions to More Model Variants may be relevant. Similarly, any existing huggingface model (say Code Llama) can be compiled into an MLC LLM model (quantized weights and a wasm file in the web case) without too much knowledge required: it is just a workflow to follow, and as long as the environment is set up correctly, it shouldn't be too much work, though it can be bumpy. Note that the tutorial above is for cuda/vulkan; for wasm, some other work (not too much, hopefully) needs to be done; see the tutorial here.
As for resources, web-llm goes hand-in-hand with the mlc-llm project, which is fully documented here. I believe it'd be helpful to start from mlc-llm. If you run into problems or have questions, please let us know!
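To give a flavor of the web-llm end of that workflow: once you have the two compiled artifacts (quantized weights uploaded somewhere like Hugging Face, plus the wasm), registering them is just a config entry. In this sketch, every URL and the model id are placeholders for your own artifacts:

```typescript
// Sketch: registering a self-compiled model with web-llm. The URLs and the
// "MyModel-q4f32_1" id are placeholders for your own artifacts.
const appConfig = {
  model_list: [
    {
      // Where the quantized weights + mlc-chat-config.json were uploaded.
      model_url: "https://huggingface.co/<your-org>/<your-repo>/resolve/main/",
      local_id: "MyModel-q4f32_1",
    },
  ],
  model_lib_map: {
    // The wasm produced by the wasm build step of the tutorial above.
    "MyModel-q4f32_1": "https://<your-host>/MyModel-q4f32_1-webgpu.wasm",
  },
};
```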
Sorry for harping on you, but I have to confirm I got this right: I can grab just about any model from huggingface, compile it to wasm using MLC, and load it myself into Web LLM? Is that what you're saying? Because if so, that's quite remarkable, especially since we keep hearing more and more about the virtues of smaller LLMs for specialised tasks. I have so many other questions, but I don't want to pester you any further; is there a Discord channel or community forum I can join to find out more? Thank you for your help, I really appreciate it.
Yes, that is correct! As long as the architecture is supported by us. For instance, WizardMath has the same architecture as Llama, so you could compile these models yourself (following this tutorial on Extensions to More Model Variants). For models with architectures we don't yet support, it requires much more work. You can see the list of model variants/architectures currently supported here: https://mlc.ai/mlc-llm/docs/prebuilt_models.html.
You can find the link to the Discord server here: https://mlc.ai/mlc-llm/docs/index.html
Thank you again. I'll see if I can give this a try soon!
Good luck!
Following the available examples in the WebLLM repo, such as next-simple-chat, I have added the model URL and ID:
{
  model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/",
  local_id: "WizardCoder-15B-V1.0-q4f32_1",
}
then added the libmap:
"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",
but I end up getting this error immediately after loading the model in the browser:
Init error, Error: Unknown conv template wizard_coder_or_math
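For completeness, this is roughly how the pieces above are wired together in my setup (paraphrased; the reload call follows the web-llm examples):

```typescript
// Roughly how the snippets above are wired together, following the
// next-simple-chat example (paraphrased from my setup).
import { ChatModule } from "@mlc-ai/web-llm";

const appConfig = {
  model_list: [
    {
      model_url:
        "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/",
      local_id: "WizardCoder-15B-V1.0-q4f32_1",
    },
  ],
  model_lib_map: {
    "WizardCoder-15B-V1.0-q4f32_1":
      "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",
  },
};

async function init() {
  const chat = new ChatModule();
  // The "Unknown conv template" error surfaces here, while the model loads.
  await chat.reload("WizardCoder-15B-V1.0-q4f32_1", undefined, appConfig);
}
```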