Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers. Used by kiri.art, perfect for banana.dev.
Copyright (c) Gadi Cohen, 2022. MIT Licensed. Please give credit and link back to this repo if you use it in a public project.
- Pipelines: txt2img, img2img and inpainting in a single container
- Models: stable-diffusion, waifu-diffusion, and easy to add others (e.g. jp-sd)
- All model inputs supported, including setting nsfw filter per request
- Permute base config to multiple forks based on yaml config with vars
- Optionally send signed event logs / performance data to a REST endpoint
- Can automatically download a checkpoint file and convert to diffusers.
Note: This image was created for kiri.art. Everything is open source but there may be certain request / response assumptions. If anything is unclear, please open an issue.
Official help is available in our dedicated forum: https://banana-forums.dev/c/open-source/docker-diffusers-api/16
Firstly, fork and clone this repo.
Most of the configuration happens via docker build variables. You can see all the options in the Dockerfile and either edit them there directly or set them via the docker command line or, e.g., Banana's dashboard UI once support for build variables lands (any day now).
If you're only deploying one container, that's all you need! If you intend to deploy multiple containers, each with different variables (e.g. a few different models), you can edit the example scripts/permutations.yaml file and run scripts/permute.sh to create a number of sub-repos in the permutations directory.
Lastly, there's an option to set MODEL_ID=ALL, and all models will be downloaded and switched at request time (great for dev, useless for serverless).
Deploying to banana? That's it! You're done. Commit your changes and push.
Building
- Set the HF_AUTH_TOKEN environment var if you haven't set it elsewhere.
- docker build -t banana-sd --build-arg HF_AUTH_TOKEN=$HF_AUTH_TOKEN .
- Optionally add DOCKER_BUILDKIT=1 BUILDKIT_PROGRESS=plain to the start of the line, depending on your preferences. (Recommended if you're using the root-cache feature.)
- Note: your first build can take a really long time, depending on your PC & network speed, and especially when using the CHECKPOINT_URL feature. Great time to grab a coffee or take a walk.
Running
docker run -it --gpus all -p 8000:8000 banana-sd python3 server.py
- Note: the -it is optional but makes it a lot quicker/easier to stop the container using Ctrl-C.
- If you get a "CUDA initialization: CUDA unknown error" after suspend, just stop the container, run rmmod nvidia_uvm, and restart.
The container expects an HTTP POST request with the following JSON body:
{
"modelInputs": {
"prompt": "Super dog",
"num_inference_steps": 50,
"guidance_scale": 7.5,
"width": 512,
"height": 512,
"seed": 3239022079
},
"callInputs": {
"MODEL_ID": "runwayml/stable-diffusion-v1-5",
"PIPELINE": "StableDiffusionPipeline",
"SCHEDULER": "LMSDiscreteScheduler",
"safety_checker": true
}
}
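As a rough illustration, a request like the one above could be sent from Python using only the standard library. This is a sketch rather than code from this repo: it assumes the container is listening on http://localhost:8000/ (the port mapping used in the run command) and that the response body is JSON. The helper names build_payload and call_container are hypothetical.

```python
import json
import urllib.request


def build_payload(prompt, model_id="runwayml/stable-diffusion-v1-5",
                  scheduler="LMSDiscreteScheduler", seed=None):
    """Build the request body with separate modelInputs and callInputs keys."""
    model_inputs = {
        "prompt": prompt,
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "width": 512,
        "height": 512,
    }
    if seed is not None:
        model_inputs["seed"] = seed
    call_inputs = {
        "MODEL_ID": model_id,
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": scheduler,
        "safety_checker": True,
    }
    return {"modelInputs": model_inputs, "callInputs": call_inputs}


def call_container(payload, url="http://localhost:8000/"):
    """POST the payload as JSON and return the decoded response.

    Assumption: the container answers with a JSON body.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running locally, call_container(build_payload("Super dog", seed=3239022079)) would send the same body as the JSON example. Keeping payload construction separate from the network call makes it easy to inspect or log the body before sending it.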
If you're using banana's SDK, it looks something like this:
const out = await banana.run(apiKey, modelKey, { "modelInputs": modelInputs, "callInputs": callInputs });
NB: if you're coming from another banana starter repo, note that we explicitly name modelInputs above, and send a bigger object (with modelInputs and callInputs keys) for the banana-sdk's "modelInputs" argument.
If provided, init_image and mask_image should be base64 encoded.
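A minimal sketch of the encoding step, assuming the images are available as raw bytes or as files on disk. The helper names encode_image_bytes and encode_image_file are hypothetical and not part of this repo.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Return image bytes as a base64 ASCII string suitable for a JSON body."""
    return base64.b64encode(raw).decode("ascii")


def encode_image_file(path: str) -> str:
    """Read an image file from disk and base64-encode its contents."""
    with open(path, "rb") as handle:
        return encode_image_bytes(handle.read())
```

The resulting string would be sent as the init_image (and, for inpainting, mask_image) value. Which part of the request body these keys belong in is not stated here, so check test.py for a working example.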
Available schedulers: LMSDiscreteScheduler, DDIMScheduler, PNDMScheduler, EulerAncestralDiscreteScheduler, EulerDiscreteScheduler. These cover the most commonly used / requested schedulers, but we already have code in place to support every scheduler provided by diffusers, which will work in a later diffusers release when they have better defaults.
There are also very basic examples in test.py, which you can view, and run with python test.py if the container is already running on port 8000.
You can also specify a specific test, change some options, and run against a
deployed banana image:
$ python test.py
Usage: python3 test.py [--banana] [--xmfe=1/0] [--scheduler=SomeScheduler] [all / test1] [test2] [etc]
# Run against http://localhost:8000/ (Nvidia Quadro RTX 5000)
$ python test.py txt2img
Running test: txt2img
Request took 5.9s (init: 3.2s, inference: 5.9s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png
# Run against deployed banana image (Nvidia A100)
$ export BANANA_API_KEY=XXX
$ BANANA_MODEL_KEY=XXX python3 test.py --banana txt2img
Running test: txt2img
Request took 19.4s (init: 2.5s, inference: 3.5s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png
# Note that 2nd runs are much faster (ignore init, that isn't run again)
Request took 3.0s (init: 2.4s, inference: 2.1s)
The best example of course is https://kiri.art/ and its source code.
- 403 Client Error: Forbidden for url
  Make sure you've accepted the license on the model card of the HuggingFace model specified in MODEL_ID, and that you correctly passed HF_AUTH_TOKEN to the container.
You have two options.
- For a diffusers model, simply set the MODEL_ID docker build variable to the name of the model hosted on HuggingFace, and it will be downloaded automatically at build time.
- For a non-diffusers model, simply set the CHECKPOINT_URL docker build variable to the URL of a .ckpt file, which will be downloaded and converted to the diffusers format automatically at build time.
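To make the two options concrete, here is a small sketch that assembles the corresponding docker build command lines. The helper docker_build_command is hypothetical, the banana-sd tag is taken from the build command earlier in this README, and the checkpoint URL is a placeholder rather than a real file.

```python
import shlex


def docker_build_command(tag, build_args):
    """Return the argv list for `docker build` with one --build-arg per variable."""
    command = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        command += ["--build-arg", f"{name}={value}"]
    command.append(".")
    return command


# Option 1: a diffusers model hosted on HuggingFace.
hosted = docker_build_command(
    "banana-sd", {"MODEL_ID": "runwayml/stable-diffusion-v1-5"}
)

# Option 2: a .ckpt checkpoint to download and convert (placeholder URL).
checkpoint = docker_build_command(
    "banana-sd", {"CHECKPOINT_URL": "https://example.com/model.ckpt"}
)

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(hosted))
print(shlex.join(checkpoint))
```

Other build variables, such as HF_AUTH_TOKEN, can be added to the same dictionary. The argv list can also be passed directly to subprocess.run if you prefer to run the build from Python.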
Per your personal preferences, rebase or merge, e.g.
git fetch upstream
git merge upstream/main
git push
Or, if you're confident, do it in one step with no confirmations:
git fetch upstream && git merge upstream/main --no-edit && git push
Check scripts/permute.sh and your git remotes; some URLs are hardcoded. I'll make this easier in a future release.
Set the CALL_URL and SIGN_KEY environment variables to send timing data at the start and end of init and inference. You'll need to check the source code of this repo and of sd-mui, as the format is in flux.
This info is now logged regardless, and init() and inference() times are sent back via { $timings: { init: timeInMs, inference: timeInMs } }.
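A short sketch of reading those timings on the client side, assuming the response has already been decoded into a Python dict. The helper name timings_in_seconds and the sample values are hypothetical; only the $timings shape comes from the description above.

```python
def timings_in_seconds(response: dict) -> dict:
    """Convert the millisecond values under "$timings" to seconds."""
    timings = response.get("$timings", {})
    return {name: ms / 1000 for name, ms in timings.items()}


# Hypothetical response fragment with made-up millisecond values.
sample = {"$timings": {"init": 3200, "inference": 5900}}
print(timings_in_seconds(sample))
```

A response without a $timings key yields an empty dict rather than raising, which keeps the helper safe to call on any response.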
Originally based on https://github.com/bananaml/serverless-template-stable-diffusion.