HuggingFace Space to Compare Different Approaches #32
Replies: 4 comments 2 replies
-
The only free model I could get to work on the hf space was the qwen2 vl instruct model and interestingly the base model (i.e. approach = None) outperformed all other methods (cot reflection, leap, rstar) in my admittedly very unscientific tests. |
Beta Was this translation helpful? Give feedback.
-
Yes, the models needs to be good at instruction following for the techniques to work as most of the them are implemented via clever prompting. The HF space has had a lot of traffic recently so it was running out of the free OpenRouter API calls. You can duplicate the space and try it under your own OpenRouter API key if you get errors while running it. The approaches has been tested on a variety of benchmark over the last few months as we implemented them you can see the results in this discussion - #31 For a few examples that show the differences v/s the base model there are some screenshots posted on the HF space - https://huggingface.co/spaces/codelion/optillm/discussions/2 I am also doing a more comprehensive eval now with the AIME 2024 problems dataset for a variety of models will be sharing the results next week. |
Beta Was this translation helpful? Give feedback.
-
Thanks for your response. This time I was able to run meta-llama/llama-3.2-1b-instruct:free. Is it possible to run the gradio app locally? It doesn't seem to be included in the repo. I couldn't get the builtin model server to run with Qwen/Qwen2.5-1.5B-Instruct-AWQ |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
You can now use this space to try out optillm.
Beta Was this translation helpful? Give feedback.
All reactions