## Multi-Image Generation Streamlit App: Chaining Llama & Stable Diffusion using TorchServe, torch.compile & OpenVINO

This Multi-Image Generation Streamlit app generates multiple images from a single text prompt. Instead of using Stable Diffusion directly, the app chains Llama and Stable Diffusion to enhance the image generation process. Here's how it works:
- The app takes a user prompt and uses [Meta-Llama-3.2](https://huggingface.co/meta-llama) to create multiple interesting and relevant prompts.
- These generated prompts are then sent to Stable Diffusion, using the [latent-consistency/lcm-sdxl](https://huggingface.co/latent-consistency/lcm-sdxl) model, to generate images.
- For performance optimization, the models are compiled using [torch.compile with the OpenVINO backend](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html); see the sketch after the workflow image below.
- The application leverages [TorchServe](https://pytorch.org/serve/) for efficient model serving and management.


![Multi-Image Generation App Workflow](./docker/img/multi-image-gen-app.png)

## Quick Start Guide

**Prerequisites**:
- Docker installed on your system
- Hugging Face Token: Create a Hugging Face account and obtain a token with access to the [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model (a quick access check is sketched below).

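Optionally, before building, you can confirm that your token actually has access to the gated Llama repo. A small sketch using the `huggingface_hub` Python package (an extra dependency, not something the build script requires):

```python
# Optional: confirm your token can access the gated Llama repo before building.
from huggingface_hub import HfApi

api = HfApi(token="<HUGGINGFACE_TOKEN>")  # same token you export below
print(api.model_info("meta-llama/Llama-3.2-3B-Instruct").id)  # raises if access is missing
```
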
To launch the Multi-Image Generation App, follow these steps:
```bash
# 1: Set HF Token as Env variable
export HUGGINGFACE_TOKEN=<HUGGINGFACE_TOKEN>

# 2: Build Docker image for this Multi-Image Generation App
git clone https://github.com/pytorch/serve.git
cd serve
./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh

# 3: Launch the Streamlit app for server & client
# After the Docker build is successful, you will see a "docker run" command printed to the console.
# Run that "docker run" command to launch the Streamlit app for both the server and client.
```

#### Sample Output of Docker Build:

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ ./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh
EXAMPLE_DIR: .//examples/usecases/llm_diffusion_serving_app/docker
ROOT_DIR: /home/ubuntu/serve
DOCKER_BUILDKIT=1 docker buildx build --platform=linux/amd64 --file .//examples/usecases/llm_diffusion_serving_app/docker/Dockerfile --build-arg BASE_IMAGE="pytorch/torchserve:latest-cpu" --build-arg EXAMPLE_DIR=".//examples/usecases/llm_diffusion_serving_app/docker" --build-arg HUGGINGFACE_TOKEN=hf_<token> --build-arg HTTP_PROXY= --build-arg HTTPS_PROXY= --build-arg NO_PROXY= -t "pytorch/torchserve:llm_diffusion_serving_app" .
[+] Building 1.4s (18/18) FINISHED                    docker:default
 => [internal] load .dockerignore                               0.0s
 .
 .
 .
 => => naming to docker.io/pytorch/torchserve:llm_diffusion_serving_app  0.0s

Docker Build Successful !

............................ Next Steps ............................
--------------------------------------------------------------------
[Optional] Run the following command to benchmark Stable Diffusion:
--------------------------------------------------------------------

docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3

-------------------------------------------------------------------
Run the following command to start the Multi-Image generation App:
-------------------------------------------------------------------

docker run --rm -it --platform linux/amd64 \
        --name llm_sd_app \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:8084:8084 \
        -p 127.0.0.1:8085:8085 \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
        -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
        pytorch/torchserve:llm_diffusion_serving_app

Note: You can replace the model identifiers (MODEL_NAME_LLM, MODEL_NAME_SD) as needed.

```

</details>

## What to expect
After launching the Docker container using the `docker run ..` command displayed after a successful build, you can access two separate Streamlit applications:
1. TorchServe Server App (running at http://localhost:8084) to start/stop TorchServe, load/register models, and scale workers up or down.
2. Client App (running at http://localhost:8085) where you can enter a prompt for image generation.

> Note: You can also run a quick benchmark comparing the performance of Stable Diffusion in eager mode against torch.compile with the inductor and OpenVINO backends.
> Review the `docker run ..` command displayed after a successful build for the benchmark invocation.

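Once TorchServe is started and models are registered via the server app, you can optionally sanity-check the TorchServe endpoints from the host. A minimal Python sketch (ports follow the `docker run` port mapping above):

```python
# Quick sanity check of the TorchServe APIs exposed by the container.
# Assumes TorchServe was started and models were registered via the server app.
import requests

print(requests.get("http://localhost:8080/ping").json())    # inference API health check
print(requests.get("http://localhost:8081/models").json())  # management API: registered models
```
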
#### Sample Output of Starting the App:

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm -it --platform linux/amd64 \
        --name llm_sd_app \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:8084:8084 \
        -p 127.0.0.1:8085:8085 \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
        -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
        pytorch/torchserve:llm_diffusion_serving_app

Preparing meta-llama/Llama-3.2-1B-Instruct
/home/model-server/llm_diffusion_serving_app/llm /home/model-server/llm_diffusion_serving_app
Model meta-llama---Llama-3.2-1B-Instruct already downloaded.
Model archive for meta-llama---Llama-3.2-1B-Instruct exists.
/home/model-server/llm_diffusion_serving_app

Preparing stabilityai/stable-diffusion-xl-base-1.0
/home/model-server/llm_diffusion_serving_app/sd /home/model-server/llm_diffusion_serving_app
Model stabilityai/stable-diffusion-xl-base-1.0 already downloaded
Model archive for stabilityai---stable-diffusion-xl-base-1.0 exists.
/home/model-server/llm_diffusion_serving_app

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8085
  Network URL: http://123.11.0.2:8085
  External URL: http://123.123.12.34:8085


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8084
  Network URL: http://123.11.0.2:8084
  External URL: http://123.123.12.34:8084
```

</details>

#### Sample Output of Stable Diffusion Benchmarking:
To run Stable Diffusion benchmarking, use the `sd-benchmark.py` script. A sample invocation and its output are shown below.

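The three run modes in the summary correspond to eager execution, `torch.compile` with the inductor backend, and `torch.compile` with the OpenVINO backend. The sketch below times the three modes on a toy model to show the shape of the comparison; it is illustrative and not the actual code of `sd-benchmark.py`:

```python
# Illustrative timing of the three run modes (eager, tc_inductor, tc_openvino)
# on a toy model; sd-benchmark.py applies the same idea to Stable Diffusion.
import time
import torch
import openvino.torch  # noqa: F401  (registers the "openvino" torch.compile backend)

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU()).eval()
x = torch.randn(8, 512)

modes = {
    "eager": model,
    "tc_inductor": torch.compile(model, backend="inductor"),
    "tc_openvino": torch.compile(model, backend="openvino"),
}
with torch.no_grad():
    for name, m in modes.items():
        m(x)  # warm-up: triggers compilation for the torch.compile modes
        start = time.perf_counter()
        for _ in range(3):
            m(x)
        print(f"{name}: {(time.perf_counter() - start) / 3:.4f} s/iter")
```
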
<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3
.
.
.

Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 3 iter   |
+=============+================+===========================+
| eager       | 11.25 seconds  | 10.13 +/- 0.02 seconds    |
+-------------+----------------+---------------------------+
| tc_inductor | 85.40 seconds  | 8.85 +/- 0.03 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 52.57 seconds  | 2.58 +/- 0.04 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071103
Files in the /home/model-server/model-store/benchmark_results_20241123_071103 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.

```

</details>

#### Sample Output of Stable Diffusion Benchmarking with Profiling:
To run Stable Diffusion benchmarking with profiling, pass `--run_profiling` or `-rp`. A sample invocation and its output are shown below. Sample profiling output files are available in [assets/benchmark_results_20241123_044407/](./assets/benchmark_results_20241123_044407/).

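Profiling produces the per-mode `profile-*.txt` files listed below. As a rough idea of this kind of per-op profiling, here is a hedged sketch using `torch.profiler` on a toy model; the actual instrumentation in `sd-benchmark.py` may differ:

```python
# Sketch of per-op CPU profiling with torch.profiler; the output format of
# sd-benchmark.py's -rp mode may differ from this toy example.
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).eval()
x = torch.randn(8, 512)

with profile(activities=[ProfilerActivity.CPU]) as prof, torch.no_grad():
    model(x)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```
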
<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -rp
.
.
.
Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 1 iter   |
+=============+================+===========================+
| eager       | 9.33 seconds   | 8.57 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_inductor | 81.11 seconds  | 7.20 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 50.76 seconds  | 1.72 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071629
Files in the /home/model-server/model-store/benchmark_results_20241123_071629 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png
profile-eager.txt
profile-tc_inductor.txt
profile-tc_openvino.txt

num_iter is set to 1 as run_profiling flag is enabled !

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.

```

</details>

## Multi-Image Generation App UI

### App Workflow

![Multi-Image Generation App Workflow Gif](./docker/img/multi-image-gen-app.gif)

### App Screenshots

<details>

| Server App Screenshot 1 | Server App Screenshot 2 | Server App Screenshot 3 |
| --- | --- | --- |
| <img src="./docker/img/server-app-screen-1.png" width="400"> | <img src="./docker/img/server-app-screen-2.png" width="400"> | <img src="./docker/img/server-app-screen-3.png" width="400"> |

| Client App Screenshot 1 | Client App Screenshot 2 | Client App Screenshot 3 |
| --- | --- | --- |
| <img src="./docker/img/client-app-screen-1.png" width="400"> | <img src="./docker/img/client-app-screen-2.png" width="400"> | <img src="./docker/img/client-app-screen-3.png" width="400"> |

</details>