We benchmarked SD v1.5 on 23 consumer GPUs by generating 460,000 fancy QR codes.
The best performing GPU/backend combination delivered almost 20,000 images generated per dollar (512x512 resolution).
You can read the full benchmark here: https://blog.salad.com/stable-diffusion-v1-5-benchmark/
Some key observations:
- Do not use the GTX series GPUs for production stable diffusion inference. Absolute performance and cost performance are dismal in the GTX series, and in many cases the benchmark could not be fully completed, with jobs repeatedly running out of CUDA memory. Additionally, many images generated on these GPUs came out all black instead of as the intended fancy QR codes.
- There are very few surprises about which GPU is fastest for each backend. Newer GPUs with higher model numbers are faster in nearly all situations.
- Batching saves time and money. In most situations, you can expect savings of 5-30% with batch size 4 compared to batch size 1.
- Generation time scales close to linearly with the number of pixels. A 768x768px image has 2.25x the pixels of a 512x512px image, and typically takes around 2x the time to generate.
- You can get surprisingly good cost performance out of the 20-series and 30-series RTX GPUs, regardless of the backend you choose.
- If you have a use-case that allows you to take advantage of the optimizations offered by Stable Fast, and the engineering availability to build and maintain an in-house solution, this is a great option that could save you a bunch of money while providing a fast and reliable image generation experience for your users.
- Many factors go into scannability for these stable diffusion QR codes, and consistently getting good results is no simple task. Shorter URLs lead to better results, as there is less data to encode. Using QR codes with lighter backgrounds leads to easier scanning, but less interesting images. Some prompts work much better than others, and some prompts can sustain much higher guidance than others. In addition, iOS and Android phones use different QR scanning implementations, so some codes scan fine on one platform but not the other.
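To make the arithmetic behind the cost-performance, batching, and resolution observations concrete, here is a minimal sketch. The function names and every input number below are hypothetical and chosen only for illustration; they are not measurements from the benchmark.

```python
def images_per_dollar(seconds_per_image: float, price_per_hour: float) -> float:
    """Images generated per dollar on a GPU billed by the hour."""
    images_per_hour = 3600.0 / seconds_per_image
    return images_per_hour / price_per_hour


def pixel_ratio(width_a: int, height_a: int, width_b: int, height_b: int) -> float:
    """How many times more pixels image A has than image B."""
    return (width_a * height_a) / (width_b * height_b)


def batch_savings(seconds_batch_1: float, seconds_batch_n: float, n: int) -> float:
    """Fraction of per-image time saved by generating n images in one batch."""
    per_image_batched = seconds_batch_n / n
    return 1.0 - per_image_batched / seconds_batch_1


if __name__ == "__main__":
    # Hypothetical GPU: 2.0 s per 512x512 image at $0.25/hour.
    print(images_per_dollar(2.0, 0.25))
    # 768x768 vs 512x512, the pixel ratio quoted in the observations.
    print(pixel_ratio(768, 768, 512, 512))
    # Hypothetical timings: 2.0 s for one image, 6.4 s for a batch of 4.
    print(batch_savings(2.0, 6.4, 4))
```

Because cost is time multiplied by hourly price, any per-image time saved through batching translates directly into the same fractional cost savings.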
Code and Docker Images
- stable-fast – https://github.com/chengzeyi/stable-fast
- Stable Fast QR Code Generator – https://github.com/SaladTechnologies/stable-fast-qr-demo
- Stable Fast QR Code Generator Docker Image: https://hub.docker.com/r/saladtechnologies/stable-fast-qr-code
- Automatic1111 – https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Automatic1111 Dockerization: https://github.com/SaladTechnologies/stable-diffusion-webui-docker
- Automatic1111 Model Management: https://github.com/SaladTechnologies/a1111-dynamic
- Automatic1111 Docker Image: https://hub.docker.com/r/saladtechnologies/a1111
- SD.Next – https://github.com/vladmandic/automatic
- SD.Next Dockerization: https://github.com/shawnrushefsky/automatic/
- SD.Next Model Management: https://github.com/SaladTechnologies/sdnext-dynamic
- SD.Next Docker Image: https://hub.docker.com/r/saladtechnologies/sdnext
- ComfyUI – https://github.com/comfyanonymous/ComfyUI
- ComfyUI Dockerization: https://github.com/ai-dock/comfyui/
- ComfyUI Model Management: https://github.com/SaladTechnologies/comfyui-dynamic
- ComfyUI Docker Image: https://hub.docker.com/r/saladtechnologies/comfyui
- Benchmark Worker Process: https://github.com/SaladTechnologies/qr-code-worker
- Queue Management Lambda: https://github.com/SaladTechnologies/benchmark-queues
- Database Access Lambda: https://github.com/SaladTechnologies/benchmark-api