Implementing Stop Token #8
Comments
Exllama doesn't expose this as a parameter; instead, it takes it from the tokenizer: https://github.com/turboderp/exllama/blob/e9da6205f432a86c6446755e8454c1d9a89f96db/example_chatbot.py#L207 If the tokenizer in your model is set correctly, it should stop fine. Are you using the correct prompt format for your model?
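For reference, here is a minimal sketch of the idea behind that EOS-based stopping behaviour. The helper names (`encode`, `decode`, `sample_next`) are assumptions for illustration, not exllama's actual API; the point is only that the loop compares each sampled token against the tokenizer's `eos_token_id`.

```python
# Minimal sketch of EOS-based stopping (not exllama's exact code):
# generation ends when the model emits the tokenizer's EOS token, so no
# separate "stop token" parameter is needed if eos_token_id is set correctly.
def generate(model, tokenizer, prompt, max_new_tokens=256):
    ids = tokenizer.encode(prompt)          # prompt -> list of token ids
    for _ in range(max_new_tokens):
        next_id = model.sample_next(ids)    # hypothetical: sample one token id
        if next_id == tokenizer.eos_token_id:
            break                           # model emitted EOS -> stop here
        ids.append(next_id)
    return tokenizer.decode(ids)
```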
Same issue here; it doesn't stop generating for me either, and yes, I am using the prompt template given in the model card.
The issue I see here is all of the extra quotation marks, which are probably messing up the tokenization. I would double-check how you're actually constructing the prompt string.
I'm not quite sure I understand what you mean by this. Here's what I'm currently using: prompt = f"SYSTEM: Your name is Mindy. You are the smartest assistant in the world. You should always communicate in a professional way. Keep messages short. USER: {user_input}, ASSISTANT:"
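To illustrate the point about stray quotation marks, the sketch below builds the same prompt with explicit turn separators and no extra quotes. The exact SYSTEM/USER/ASSISTANT wording and whether turns are separated by newlines or spaces is model-specific, so treat this layout as an assumption and defer to the model card.

```python
# Sketch only: build the prompt without stray quotation marks, using newlines
# between turns. The exact template is model-specific; check the model card
# rather than copying this verbatim.
def build_prompt(user_input: str) -> str:
    system = ("Your name is Mindy. You are the smartest assistant in the world. "
              "You should always communicate in a professional way. "
              "Keep messages short.")
    return f"SYSTEM: {system}\nUSER: {user_input}\nASSISTANT:"
```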
Hey, how do I implement a stop token to prevent the LLM from overgenerating?