README.md (+17, -4)
@@ -24,7 +24,7 @@ Inference of Stable Diffusion and Flux in pure C/C++
 - Full CUDA, Metal, Vulkan and SYCL backend for GPU acceleration.
 - Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
 - No need to convert to `.ggml` or `.gguf` anymore!
-- Flash Attention for memory usage optimization (only cpu for now)
+- Flash Attention for memory usage optimization
 - Original `txt2img` and `img2img` mode
 - Negative prompt
 - [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
@@ -182,11 +182,21 @@ Example of text2img by using SYCL backend:
 
 ##### Using Flash Attention
 
-Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.
+Enabling flash attention for the diffusion model reduces memory usage by a model-dependent amount.
+For example:
+ - Flux at 768x768: ~600 MB
+ - SD2 at 768x768: ~1400 MB
 
+For most backends it slows generation down, but with CUDA it generally speeds it up as well.
+At the moment it is only supported for some models and some backends (CPU, CUDA/ROCm, Metal).
+
+Run by adding `--diffusion-fa` to the arguments and watch for:
 ```
-cmake .. -DSD_FLASH_ATTN=ON
-cmake --build . --config Release
+[INFO ] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
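
For context, the diff removes the `-DSD_FLASH_ATTN=ON` build step and replaces it with a runtime switch, so flash attention no longer appears to require a special build. Below is a minimal usage sketch; only `--diffusion-fa` and the expected log line come from the diff itself, while the binary name, model file, and the other flags are placeholder assumptions about the stable-diffusion.cpp CLI.

```
# Hypothetical invocation: enable flash attention in the diffusion model at runtime.
# Binary name, model path, prompt, and size flags are assumptions; --diffusion-fa is from the diff above.
./sd -m sd_v2-1_768-ema-pruned.safetensors \
     -p "a lovely cat" \
     -W 768 -H 768 \
     --diffusion-fa
# On success, the log should include:
#   [INFO ] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
```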