Commit 55c2f6b

Authored by ravi9, likholat, suryasidd, and agunapal

Adding Multi-Image generation usecase app (#3356)

* Added OV SDXL registration to chat_bot app
* sdxl image generation
* pass model params
* fixes
* fixes
* llm-sd pipeline
* store images
* need to fix sd_xl checkbox
* fix for num_of_img==1
* fix for 1 img, total time
* perf fixes
* fixes
* llm with torch.compile
* fixed tocken auth issue, ui fixes
* gpt fast version, bad quality of output prompts
* rm extra files, updated readme
* added llama params, sd default res 768, better prompts
* fix, updated default workers num
* button for prompts generation
* fix
* fix
* Changed SDXL to LCM SDXL
* updated lcm example
* updated lcm example
* updated lcm example
* add llm_sd_app
* Updated llm_diffusion_serving_app
* Updated llm_diffusion_serving_app
* Update llm_diffusion_serving_app
* Update llm_diffusion_serving_app
* Update examples/usecases/llm_diffusion_serving_app/Readme.md
  Co-authored-by: Ankith Gunapal <[email protected]>
* Update llm_diffusion_serving_app
* update llm_diffusion_serving_app
* update llm_diffusion_serving_app
* update llm_diffusion_serving_app
* Update llm_diffusion_serving_app
* Update llm_diffusion_serving_app
* Minor Updates, Added sd_benchmark
* Add docs for llm_diffusion_serving_app
* Apply suggestions from code review
  Co-authored-by: Ankith Gunapal <[email protected]>
* Update llm_diffusion_serving_app, fix linter issues
* Update img, add assests
* update readme

---------

Co-authored-by: likholat <[email protected]>
Co-authored-by: likholat <[email protected]>
Co-authored-by: suryasidd <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>

1 parent f4fbcbe · commit 55c2f6b

36 files changed: +2423 −1 lines
.gitignore (+1)

```diff
@@ -30,6 +30,7 @@ test/model_store/
 test/ts_console.log
 test/config.properties
 
+model-store-local/
 
 .vscode
 .scratch/
```

docs/genai_use_cases.md (+4)

```diff
@@ -5,3 +5,7 @@ This document shows interesting usecases with TorchServe for Gen AI deployments.
 ## [Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton](https://pytorch.org/serve/enhancing_llm_serving_compile_rag.html)
 
 In this blog, we show how to deploy a RAG Endpoint using TorchServe, increase throughput using `torch.compile` and improve the response generated by the Llama Endpoint. We also show how the RAG endpoint can be deployed on CPU using AWS Graviton, while the Llama endpoint is still deployed on a GPU. This kind of microservices-based RAG solution efficiently utilizes compute resources, resulting in potential cost savings for customers.
+
+## [Multi-Image Generation Streamlit App: Chaining Llama & Stable Diffusion using TorchServe, torch.compile & OpenVINO](https://pytorch.org/serve/llm_diffusion_serving_app.html)
+
+This Multi-Image Generation Streamlit app is designed to generate multiple images based on a provided text prompt. Instead of using Stable Diffusion directly, this app chains Llama and Stable Diffusion to enhance the image generation process. This multi-image generation use case exemplifies the powerful synergy of cutting-edge AI technologies: TorchServe, OpenVINO, Torch.compile, Meta-Llama, and Stable Diffusion.
```

docs/sphinx/Makefile (+2 −1)

```diff
@@ -26,5 +26,6 @@ docset: html
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
 	cp ../../SECURITY.md ../security.md
-	cp ../../examples//usecases/RAG_based_LLM_serving/README.md ../enhancing_llm_serving_compile_rag.md
+	cp ../../examples/usecases/RAG_based_LLM_serving/README.md ../enhancing_llm_serving_compile_rag.md
+	cp ../../examples/usecases/llm_diffusion_serving_app/README.md ../llm_diffusion_serving_app.md
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
examples/usecases/llm_diffusion_serving_app/README.md (new file, +281)

## Multi-Image Generation Streamlit App: Chaining Llama & Stable Diffusion using TorchServe, torch.compile & OpenVINO

This Multi-Image Generation Streamlit app is designed to generate multiple images based on a provided text prompt. Instead of using Stable Diffusion directly, this app chains Llama and Stable Diffusion to enhance the image generation process. Here's how it works:

- The app takes a user prompt and uses [Meta-Llama-3.2](https://huggingface.co/meta-llama) to create multiple interesting and relevant prompts.
- These generated prompts are then sent to Stable Diffusion with the [latent-consistency/lcm-sdxl](https://huggingface.co/latent-consistency/lcm-sdxl) model to generate images.
- For performance optimization, the models are compiled using [torch.compile with the OpenVINO backend](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html); see the sketch after the workflow diagram below.
- The application leverages [TorchServe](https://pytorch.org/serve/) for efficient model serving and management.

![Multi-Image Generation App Workflow](./docker/img/workflow-1.png)
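
As a rough illustration of the compilation step, here is a minimal sketch of compiling a diffusers pipeline component with the OpenVINO backend. This is not the app's actual handler code; it assumes the `openvino` and `diffusers` packages are installed:

```python
# Minimal sketch (not the app's actual TorchServe handler) of compiling the
# UNet of an SDXL pipeline with the OpenVINO backend of torch.compile.
import torch
import openvino.torch  # registers the "openvino" backend with torch.compile
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
# The UNet dominates inference time, so it is the usual compilation target.
pipe.unet = torch.compile(pipe.unet, backend="openvino")
```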

## Quick Start Guide

**Prerequisites**:
- Docker installed on your system
- Hugging Face Token: Create a Hugging Face account and obtain a token with access to the [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model.

To launch the Multi-Image Generation App, follow these steps:

```bash
# 1: Set HF Token as Env variable
export HUGGINGFACE_TOKEN=<HUGGINGFACE_TOKEN>

# 2: Build Docker image for this Multi-Image Generation App
git clone https://github.com/pytorch/serve.git
cd serve
./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh

# 3: Launch the streamlit app for server & client
# After the Docker build is successful, you will see a "docker run" command printed to the console.
# Run that "docker run" command to launch the Streamlit app for both the server and client.
```

#### Sample Output of Docker Build:

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ ./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh
EXAMPLE_DIR: .//examples/usecases/llm_diffusion_serving_app/docker
ROOT_DIR: /home/ubuntu/serve
DOCKER_BUILDKIT=1 docker buildx build --platform=linux/amd64 --file .//examples/usecases/llm_diffusion_serving_app/docker/Dockerfile --build-arg BASE_IMAGE="pytorch/torchserve:latest-cpu" --build-arg EXAMPLE_DIR=".//examples/usecases/llm_diffusion_serving_app/docker" --build-arg HUGGINGFACE_TOKEN=hf_<token> --build-arg HTTP_PROXY= --build-arg HTTPS_PROXY= --build-arg NO_PROXY= -t "pytorch/torchserve:llm_diffusion_serving_app" .
[+] Building 1.4s (18/18) FINISHED                              docker:default
 => [internal] load .dockerignore                                         0.0s
.
.
.
 => => naming to docker.io/pytorch/torchserve:llm_diffusion_serving_app   0.0s

Docker Build Successful !

............................ Next Steps ............................
--------------------------------------------------------------------
[Optional] Run the following command to benchmark Stable Diffusion:
--------------------------------------------------------------------

docker run --rm --platform linux/amd64 \
  --name llm_sd_app_bench \
  -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
  --entrypoint python \
  pytorch/torchserve:llm_diffusion_serving_app \
  /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3

-------------------------------------------------------------------
Run the following command to start the Multi-Image generation App:
-------------------------------------------------------------------

docker run --rm -it --platform linux/amd64 \
  --name llm_sd_app \
  -p 127.0.0.1:8080:8080 \
  -p 127.0.0.1:8081:8081 \
  -p 127.0.0.1:8082:8082 \
  -p 127.0.0.1:8084:8084 \
  -p 127.0.0.1:8085:8085 \
  -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
  -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
  -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
  pytorch/torchserve:llm_diffusion_serving_app

Note: You can replace the model identifiers (MODEL_NAME_LLM, MODEL_NAME_SD) as needed.

```

</details>

## What to expect

After launching the Docker container using the `docker run ..` command displayed after a successful build, you can access two separate Streamlit applications:
1. TorchServe Server App (running at http://localhost:8084) to start/stop TorchServe, load/register models, and scale workers up or down.
2. Client App (running at http://localhost:8085) where you can enter prompts for image generation.

> Note: You can also run a quick benchmark comparing the performance of Stable Diffusion in eager mode against torch.compile with the inductor and openvino backends.
> Review the `docker run ..` command displayed after a successful build for benchmarking.
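
Once the container is up, you can also check the underlying TorchServe endpoints directly. The sketch below uses two standard TorchServe routes (`/ping` on the inference API and `/models` on the management API) with the port mappings from the `docker run` command above; adjust if you remapped them:

```python
# Quick health check against the TorchServe APIs exposed by the container.
import requests

# Liveness check on the inference API (standard TorchServe route).
print(requests.get("http://localhost:8080/ping").json())   # e.g. {"status": "Healthy"}

# List registered models via the management API (standard TorchServe route).
print(requests.get("http://localhost:8081/models").json())
```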

#### Sample Output of Starting the App:

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm -it --platform linux/amd64 \
  --name llm_sd_app \
  -p 127.0.0.1:8080:8080 \
  -p 127.0.0.1:8081:8081 \
  -p 127.0.0.1:8082:8082 \
  -p 127.0.0.1:8084:8084 \
  -p 127.0.0.1:8085:8085 \
  -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
  -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
  -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
  pytorch/torchserve:llm_diffusion_serving_app

Preparing meta-llama/Llama-3.2-1B-Instruct
/home/model-server/llm_diffusion_serving_app/llm /home/model-server/llm_diffusion_serving_app
Model meta-llama---Llama-3.2-1B-Instruct already downloaded.
Model archive for meta-llama---Llama-3.2-1B-Instruct exists.
/home/model-server/llm_diffusion_serving_app

Preparing stabilityai/stable-diffusion-xl-base-1.0
/home/model-server/llm_diffusion_serving_app/sd /home/model-server/llm_diffusion_serving_app
Model stabilityai/stable-diffusion-xl-base-1.0 already downloaded
Model archive for stabilityai---stable-diffusion-xl-base-1.0 exists.
/home/model-server/llm_diffusion_serving_app

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8085
Network URL: http://123.11.0.2:8085
External URL: http://123.123.12.34:8085

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8084
Network URL: http://123.11.0.2:8084
External URL: http://123.123.12.34:8084
```

</details>

#### Sample Output of Stable Diffusion Benchmarking:

To run Stable Diffusion benchmarking, use the `sd-benchmark.py` script. See the sample run below.

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
  --name llm_sd_app_bench \
  -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
  --entrypoint python \
  pytorch/torchserve:llm_diffusion_serving_app \
  /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3
.
.
.

Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 3 iter   |
+=============+================+===========================+
| eager       | 11.25 seconds  | 10.13 +/- 0.02 seconds    |
+-------------+----------------+---------------------------+
| tc_inductor | 85.40 seconds  | 8.85 +/- 0.03 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 52.57 seconds  | 2.58 +/- 0.04 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071103
Files in the /home/model-server/model-store/benchmark_results_20241123_071103 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.
```

</details>
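
The "mean +/- std" figures in the summary come from the per-iteration timings that the benchmark stores (see the `benchmark_results.json` schema at the end of this commit). A minimal sketch of the computation, assuming a list of per-iteration latencies:

```python
# Minimal sketch (not the sd-benchmark.py source) of deriving a
# "mean +/- std" summary line from per-iteration timings.
import statistics

iterations = [10.13, 10.11, 10.15]  # hypothetical eager-mode timings, in seconds
mean = statistics.mean(iterations)
# stdev needs at least two samples; single-iteration runs report std = 0.0.
std = statistics.stdev(iterations) if len(iterations) > 1 else 0.0
print(f"eager: {mean:.2f} +/- {std:.2f} seconds")
```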

#### Sample Output of Stable Diffusion Benchmarking with Profiling:

To run Stable Diffusion benchmarking with profiling, use `--run_profiling` or `-rp`. See the sample run below. Sample profiling benchmark output files are available in [assets/benchmark_results_20241123_044407/](./assets/benchmark_results_20241123_044407/).

<details>

```console
ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
  --name llm_sd_app_bench \
  -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
  --entrypoint python \
  pytorch/torchserve:llm_diffusion_serving_app \
  /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -rp
.
.
.
Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 1 iter   |
+=============+================+===========================+
| eager       | 9.33 seconds   | 8.57 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_inductor | 81.11 seconds  | 7.20 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 50.76 seconds  | 1.72 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071629
Files in the /home/model-server/model-store/benchmark_results_20241123_071629 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png
profile-eager.txt
profile-tc_inductor.txt
profile-tc_openvino.txt

num_iter is set to 1 as run_profiling flag is enabled !

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.
```

</details>
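
The `profile-<mode>.txt` reports are the kind of output PyTorch's profiler produces. As an assumption-level sketch (the actual `sd-benchmark.py` implementation may differ), one way to generate such a report:

```python
# Assumption-level sketch of writing a profile-<mode>.txt style report with
# torch.profiler; sd-benchmark.py's actual implementation may differ.
import torch
from torch.profiler import profile, ProfilerActivity

def run_once():
    # Stand-in for a single Stable Diffusion inference call.
    _ = torch.randn(1024, 1024) @ torch.randn(1024, 1024)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    run_once()

with open("profile-eager.txt", "w") as f:
    f.write(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```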

## Multi-Image Generation App UI

### App Workflow

![Multi-Image Generation App Workflow Gif](./docker/img/multi-image-gen-app.gif)

### App Screenshots

<details>

| Server App Screenshot 1 | Server App Screenshot 2 | Server App Screenshot 3 |
| --- | --- | --- |
| <img src="./docker/img/server-app-screen-1.png" width="400"> | <img src="./docker/img/server-app-screen-2.png" width="400"> | <img src="./docker/img/server-app-screen-3.png" width="400"> |

| Client App Screenshot 1 | Client App Screenshot 2 | Client App Screenshot 3 |
| --- | --- | --- |
| <img src="./docker/img/client-app-screen-1.png" width="400"> | <img src="./docker/img/client-app-screen-2.png" width="400"> | <img src="./docker/img/client-app-screen-3.png" width="400"> |

</details>
examples/usecases/llm_diffusion_serving_app/assets/benchmark_results_20241123_044407/benchmark_results.json (new file, +54)

```json
{
  "timestamp": "2024-11-23T04:44:07.510110",
  "hardware_config": {
    "cpu_model": "Intel(R) Xeon(R) Platinum 8488C",
    "cpu_count": "64",
    "threads_per_core": "2",
    "cores_per_socket": "32",
    "socket_count": "1",
    "total_memory": "247.71 GB"
  },
  "software_versions": {
    "Python": "3.9.20",
    "TorchServe": "0.12.0",
    "OpenVINO": "2024.5.0",
    "PyTorch": "2.5.1+cpu",
    "Transformers": "4.46.3",
    "Diffusers": "0.31.0"
  },
  "benchmark_results": [
    {
      "run_mode": "eager",
      "warmup_time": 11.164182662963867,
      "statistics": {
        "mean": 10.437215328216553,
        "std": 0.0,
        "all_iterations": [
          10.437215328216553
        ]
      }
    },
    {
      "run_mode": "tc_inductor",
      "warmup_time": 83.48197150230408,
      "statistics": {
        "mean": 8.774884462356567,
        "std": 0.0,
        "all_iterations": [
          8.774884462356567
        ]
      }
    },
    {
      "run_mode": "tc_openvino",
      "warmup_time": 52.01788377761841,
      "statistics": {
        "mean": 2.633979082107544,
        "std": 0.0,
        "all_iterations": [
          2.633979082107544
        ]
      }
    }
  ]
}
```
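
The saved results can also be inspected programmatically. A minimal sketch, assuming the JSON file above has been copied into the current directory:

```python
# Minimal sketch: load a saved benchmark_results.json and print one summary
# line per run mode. Assumes the file sits in the current directory.
import json

with open("benchmark_results.json") as f:
    results = json.load(f)

for run in results["benchmark_results"]:
    stats = run["statistics"]
    print(f'{run["run_mode"]}: warm-up {run["warmup_time"]:.2f}s, '
          f'mean {stats["mean"]:.2f} +/- {stats["std"]:.2f}s')
```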
