
Separate models for each agent #5

Open
R-Dson opened this issue Apr 11, 2024 · 4 comments

Comments

@R-Dson

R-Dson commented Apr 11, 2024

Nice project. Open-source models are usually good at one task, such as coding or writing, so it would be interesting if we could set a parameter in the .env to specify which model each agent should use.

As an example, this could be done by adding a variable to .env such as ENGINEER_MODEL_NAME="gemma:7b-instruct-v1.1-fp16", and the same for the other agents. If none is set, fall back to the default. So if you run jemma --prompt "Trivia Game" --build-prototype --ollama dolphin-mistral:7b-v2.6-dpo-laser-fp16 with ENGINEER_MODEL_NAME set in .env to gemma, then the engineer uses gemma and all the other agents use dolphin-mistral.

This could potentially perform better than using only one local model, at the expense of time spent switching models, so it should be optional.
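A minimal sketch of the proposed fallback, assuming per-agent variables named like ENGINEER_MODEL_NAME (the variable naming scheme and function are illustrative, not jemma's actual code):

```python
import os

def model_for_agent(agent: str, default_model: str) -> str:
    """Return the model for an agent from .env-style variables,
    falling back to the model passed on the command line."""
    # e.g. ENGINEER_MODEL_NAME, TESTER_MODEL_NAME, ... (hypothetical names)
    return os.environ.get(f"{agent.upper()}_MODEL_NAME", default_model)

# ENGINEER_MODEL_NAME set => engineer uses gemma, others use the CLI model
os.environ["ENGINEER_MODEL_NAME"] = "gemma:7b-instruct-v1.1-fp16"
default = "dolphin-mistral:7b-v2.6-dpo-laser-fp16"
print(model_for_agent("engineer", default))  # gemma:7b-instruct-v1.1-fp16
print(model_for_agent("tester", default))    # dolphin-mistral:7b-v2.6-dpo-laser-fp16
```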

@tolitius
Owner

tolitius commented Apr 11, 2024

great idea!
was trying to figure out how to mix models better

I tried to generate requirements from Claude:

jemma --prompt "Trivia Game" --build-prototype --claude

and then write code with Ollama (codegemma, deepseek coder, etc.)

jemma --requirements requirements/learning-portal.2024-04-11.16-11-56-382.txt --build-prototype --ollama codegemma:7b-instruct-fp16

but so far I think local instruct / code models are trained to take in short instructions, vs. something like detailed requirements, and to generate short passages of code

I am sure it will change for the better in the very near future
and I am also looking at meta-prompting it better


as to your idea, I think a model should be assigned to an activity vs. a role
for example

a business owner can:

  • generate requirements from idea
  • generate user stories from idea
  • generate user stories from requirements
  • analyze a visual sketch (trying it right now) to generate requirements

an engineer can:

  • review a business story (language task)
  • implement a user story
  • refactor code
  • etc..

your idea still applies, but I think it makes sense to assign a model to each activity
and, if unassigned, use the top-level model:

[
 {"task": "idea-to-requirements",
  "model": "claude-3-haiku-20240307",
  "provider": "claude"},

 {"task": "review-user-story",
  "model": "mistral:7b-instruct-v0.2-fp16",
  "provider": "ollama"},

 {"task": "refactor-code",
  "model": "deepseek-coder:6.7b-instruct-fp16",
  "provider": "ollama"}
]
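A sketch of how that lookup could work, assuming the config above is loaded into a Python list (the function and variable names here are hypothetical):

```python
# per-activity assignments, mirroring the JSON config above
TASK_MODELS = [
    {"task": "idea-to-requirements",
     "model": "claude-3-haiku-20240307",
     "provider": "claude"},
    {"task": "review-user-story",
     "model": "mistral:7b-instruct-v0.2-fp16",
     "provider": "ollama"},
]

def resolve(task, default_model, default_provider):
    """Pick the model assigned to an activity; fall back to the top-level model."""
    for entry in TASK_MODELS:
        if entry["task"] == task:
            return entry["model"], entry["provider"]
    return default_model, default_provider

# assigned task -> its own model; unassigned task -> the top-level model
print(resolve("review-user-story", "claude-3-haiku-20240307", "claude"))
print(resolve("refactor-code", "claude-3-haiku-20240307", "claude"))
```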

@R-Dson
Author

R-Dson commented Apr 11, 2024

Yes, that seems like a reasonable approach.

As for Ollama and short instructions, you might need to add the num_ctx parameter to options to increase the context window. I believe the default is 2048.
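For reference, a sketch of an Ollama /api/generate request body that raises num_ctx via options (the prompt and the specific context sizes are illustrative):

```python
def ollama_payload(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build a request body for Ollama's /api/generate endpoint."""
    # "options" overrides Modelfile parameters for a single request;
    # num_ctx raises the context window above the 2048 default
    return {"model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": num_ctx}}

# POST this to http://localhost:11434/api/generate
payload = ollama_payload("codegemma:7b-instruct-fp16",
                         "Implement the user story below ...",
                         num_ctx=16384)
```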

@tolitius
Owner

yep, updated per docs

@R-Dson
Author

R-Dson commented Apr 12, 2024

Nice. You could also include a context length in this config, since you may want a 7b model to use a longer context than a 33b model, etc.

[
 {"task": "idea-to-requirements",
  "model": "claude-3-haiku-20240307",
  "provider": "claude"},

 {"task": "review-user-story",
  "model": "mistral:7b-instruct-v0.2-fp16",
  "provider": "ollama",
  "context": 16384},

 {"task": "refactor-code",
  "model": "deepseek-coder:33b-instruct-fp16",
  "provider": "ollama",
  "context": 4096}
]
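A sketch of mapping such an entry's optional "context" field onto Ollama's num_ctx option, falling back to the 2048 default when a task does not set one (the helper name is hypothetical):

```python
def ollama_options(entry: dict, default_ctx: int = 2048) -> dict:
    """Translate a task config entry's optional "context" into Ollama options."""
    return {"num_ctx": int(entry.get("context", default_ctx))}

# a task with an explicit context, and one without
review = {"task": "review-user-story",
          "model": "mistral:7b-instruct-v0.2-fp16",
          "provider": "ollama",
          "context": 16384}
print(ollama_options(review))                # {'num_ctx': 16384}
print(ollama_options({"task": "other"}))     # {'num_ctx': 2048}
```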
