Tools for Modal. Intended for personal use and experiments.
My Llamas is a Modal app that downloads models to a Modal volume. You can specify Ollama model names to have Ollama pull them into the `.ollama` directory in the volume, or specify Hugging Face paths to files, including multipart files.
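As a rough illustration, fetching a single GGUF file from Hugging Face into a mounted volume could look like the sketch below. This is not the repo's actual code: the `/models` mount point and the helper name are assumptions, and multipart files are not handled; only `hf_hub_download` is a real `huggingface_hub` call.

```python
# Hypothetical sketch: fetch one GGUF file from Hugging Face into a Modal volume.
# Assumes the volume is mounted at /models; names and paths are illustrative only.
from pathlib import Path

from huggingface_hub import hf_hub_download


def download_gguf(hf_path: str, volume_root: str = "/models") -> Path:
    # Split "org/repo/file.gguf" into the repo id and the filename inside the repo.
    org, repo, filename = hf_path.split("/", 2)
    local_path = hf_hub_download(
        repo_id=f"{org}/{repo}",
        filename=filename,
        local_dir=volume_root,
    )
    return Path(local_path)
```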
The app has a scale-to-zero Ollama inference server with token authentication through FastAPI. The FastAPI app just proxies requests to the Ollama REST server running in the container.
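The real proxy lives in the app code; the following is only a minimal sketch of the idea (check a shared bearer token, then forward to Ollama's local REST port). The Ollama port, the catch-all route, and reading the token from the `LLAMA_FOOD` env var are assumptions.

```python
# Minimal sketch of a bearer-token proxy in front of a local Ollama server.
# Assumptions: Ollama listens on localhost:11434 and the token is in $LLAMA_FOOD.
import os

import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import Response

app = FastAPI()
OLLAMA = "http://localhost:11434"


@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request):
    # Reject requests that don't carry the shared bearer token.
    if request.headers.get("authorization") != f"Bearer {os.environ['LLAMA_FOOD']}":
        raise HTTPException(status_code=401, detail="bad token")
    # Forward the request body to the local Ollama REST server and relay the reply.
    async with httpx.AsyncClient(timeout=None) as client:
        upstream = await client.request(
            request.method, f"{OLLAMA}/{path}", content=await request.body()
        )
    return Response(
        content=upstream.content,
        media_type=upstream.headers.get("content-type"),
    )
```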
Connect from a chat client of your choice, like open-webui. You can also use `client.py` for quick testing, which I pulled from the Modal examples.
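If your client speaks the OpenAI-compatible API that Ollama exposes, connecting might look roughly like this. The deployment URL is a placeholder and the `/v1` path is an assumption about how the proxy forwards requests; substitute your own Modal URL.

```python
# Hypothetical usage sketch: talk to the deployed proxy with the openai client.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-modal-app>.modal.run/v1",  # placeholder URL
    api_key=os.environ["LLAMA_FOOD"],  # bearer token from the llama-food secret
)
resp = client.chat.completions.create(
    model="qwq",
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=50,
)
print(resp.choices[0].message.content)
```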
You get $30 of free credits per month from Modal.
All of the models you want to store, whether through Ollama or Hugging Face, are declared in `args.py`. Create this file in the project root and fill it with the following example. The nice part is that you can create as many configs as you like, and all of the models can be downloaded and built into the volume.
```python
from typing import Literal

# Models downloaded directly from Hugging Face ("org/repo/filename" GGUF paths).
DOWNLOAD = {
    "qwen": {
        "hf_path": "Qwen/Qwen2.5-0.5B-Instruct-GGUF/qwen2.5-0.5b-instruct-fp16.gguf",
        "pet_name": "Qwen",
        "modelfile": "qwen-test",
        "gpu": "t4:1",  # Modal GPU spec: type:count
    },
}
DOWNLOAD_DEFAULT = "qwen"

# Models pulled by Ollama itself (keys are Ollama model names).
PULL = {"qwq": {"gpu": "l4:1"}}
PULL_DEFAULT = "qwq"

# Which of the two sources the deploy commands should use.
CHOSEN_SOURCE: Literal["download"] | Literal["pull"] = "download"
```
Set `CHOSEN_SOURCE` to `pull` or `download` depending on whether you want to deploy a model you pulled or one you downloaded. The Modal app is created/used with the GPU size from the respective config.
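For intuition, the selection presumably amounts to something like the sketch below; the helper name is made up and not part of the repo.

```python
# Hypothetical helper showing how CHOSEN_SOURCE could pick the active config.
from args import CHOSEN_SOURCE, DOWNLOAD, DOWNLOAD_DEFAULT, PULL, PULL_DEFAULT


def active_config() -> dict:
    # "download" -> DOWNLOAD[DOWNLOAD_DEFAULT]; "pull" -> PULL[PULL_DEFAULT]
    if CHOSEN_SOURCE == "download":
        return DOWNLOAD[DOWNLOAD_DEFAULT]
    return PULL[PULL_DEFAULT]


print(active_config()["gpu"])  # e.g. "t4:1" with the example config above
```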
```bash
modal run tame_llama::pull
```

This will use the `PULL_DEFAULT` config.
```bash
modal run --detach tame_llama::download
modal run --detach tame_llama::compile
```

These will use the `DOWNLOAD_DEFAULT` config.
Test with `qwen` to make sure everything is working.
```bash
# first, create and activate a Python virtual environment
pip install modal
modal setup
# See the Config section to populate this file
touch args.py
modal secret create huggingface-secret HF_TOKEN=<secret>
modal secret create llama-food LLAMA_FOOD=<secret>  # Bearer auth for the FastAPI proxy
modal run --detach tame_llama::pull
```
See the Modal CLI for `app`, `shell`, `deploy`, `secret`, `volume` commands, etc.
To check the Ollama server logs (e.g. from a `modal shell` session in the container):

```bash
journalctl -u ollama --no-pager
```
A convenient alias for chatting with the deployed `qwq` model via `client.py`:

```bash
alias chat='python client.py \
  --app-name=myllamas-gpu-l4-1-myllamas \
  --function-name=serve \
  --model=qwq \
  --max-tokens 1000 \
  --api-key "$LLAMA_FOOD" \
  --temperature 0.9 \
  --frequency-penalty 1.03 \
  --chat'
```