HuggingFace Space to Compare Different Approaches #32

codelion · 2024-09-24T12:02:10Z

codelion
Sep 24, 2024
Maintainer

You can now use this space to try out optillm.

rhohndorf · 2024-11-13T21:37:24Z

rhohndorf
Nov 13, 2024

The only free model I could get to work on the hf space was the qwen2 vl instruct model and interestingly the base model (i.e. approach = None) outperformed all other methods (cot reflection, leap, rstar) in my admittedly very unscientific tests.
Can somebody verify that ?
Do these methods only work well with larger models ?

0 replies

codelion · 2024-11-16T02:33:39Z

codelion
Nov 16, 2024
Maintainer Author

Yes, the models needs to be good at instruction following for the techniques to work as most of the them are implemented via clever prompting. The HF space has had a lot of traffic recently so it was running out of the free OpenRouter API calls. You can duplicate the space and try it under your own OpenRouter API key if you get errors while running it.

The approaches has been tested on a variety of benchmark over the last few months as we implemented them you can see the results in this discussion - #31

For a few examples that show the differences v/s the base model there are some screenshots posted on the HF space - https://huggingface.co/spaces/codelion/optillm/discussions/2

I am also doing a more comprehensive eval now with the AIME 2024 problems dataset for a variety of models will be sharing the results next week.

0 replies

rhohndorf · 2024-11-18T06:20:50Z

rhohndorf
Nov 18, 2024

Thanks for your response. This time I was able to run meta-llama/llama-3.2-1b-instruct:free.
The benchmark results you posted look promising.

Is it possible to run the gradio app locally? It doesn't seem to be included in the repo.
Also, can you give a client code example how to run optillm with llama.cpp. I don't know what to add in the model field.
I itried "Qwen/Qwen2-7B-Instruct-GGUF" but nope.

I couldn't get the builtin model server to run with Qwen/Qwen2.5-1.5B-Instruct-AWQ
autoawq is installed.
I can create an issue if you think this might be bug. Don't want to spam this thread more than I did already.

1 reply

codelion Nov 19, 2024
Maintainer Author

All HF spaces are git repos. you can clone and run the gradio app locally, here is the code - https://huggingface.co/spaces/codelion/optillm/tree/main (git clone https://huggingface.co/spaces/codelion/optillm)

When running with llama.cpp you may need to set the model name via --alias and the use that in the OpenAI client.

I couldn't get the builtin model server to run with Qwen/Qwen2.5-1.5B-Instruct-AWQ
autoawq is installed.

This looks like a bug, can you please create an issue for it. Thanks.

rhohndorf · 2024-11-18T06:27:17Z

rhohndorf
Nov 18, 2024

Also, my result with the hf spaces demo vary from yours:

1 reply

codelion Nov 19, 2024
Maintainer Author

There will be some variation in individual examples across runs, that's why we report results across large benchmarks like GSM8k, Arena Auto Hard, LiveCodeBench and LiveBench.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HuggingFace Space to Compare Different Approaches #32

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 4 comments 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

HuggingFace Space to Compare Different Approaches #32

codelion Sep 24, 2024 Maintainer

Replies: 4 comments · 2 replies

rhohndorf Nov 13, 2024

codelion Nov 16, 2024 Maintainer Author

rhohndorf Nov 18, 2024

codelion Nov 19, 2024 Maintainer Author

rhohndorf Nov 18, 2024

codelion Nov 19, 2024 Maintainer Author

codelion
Sep 24, 2024
Maintainer

Replies: 4 comments 2 replies

rhohndorf
Nov 13, 2024

codelion
Nov 16, 2024
Maintainer Author

rhohndorf
Nov 18, 2024

codelion Nov 19, 2024
Maintainer Author

rhohndorf
Nov 18, 2024

codelion Nov 19, 2024
Maintainer Author