Commit 0dcb5ca

TorchServe quickstart example
1 parent f0f97de commit 0dcb5ca

27 files changed, +1168 -0 lines changed

README.md (+14)
@@ -55,6 +55,20 @@ docker pull pytorch/torchserve-nightly

Refer to [torchserve docker](docker/README.md) for details.

### 🚀 Quick Start Example

```bash
./examples/getting_started/build_image.sh vit

docker run --rm -it --env TORCH_COMPILE=false --env MODEL_NAME=vit --platform linux/amd64 -p 127.0.0.1:8080:8080 -v /home/ubuntu/serve/model_store:/home/model-server/model-store pytorch/torchserve:demo

# In another terminal, run the following command for inference
curl http://127.0.0.1:8080/predictions/vit -T ./examples/image_classifier/kitten.jpg
```

Refer to [TorchServe Quick Start Example](https://github.com/pytorch/serve/blob/master/examples/getting_started/README.md) for details.

## ⚡ Why TorchServe
* Write once, run anywhere, on-prem, on-cloud, supports inference on CPUs, GPUs, AWS Inf1/Inf2/Trn1, Google Cloud TPUs, [Nvidia MPS](docs/nvidia_mps.md)
* [Model Management API](docs/management_api.md): multi model management with optimized worker to model allocation

examples/getting_started/Dockerfile (+31)
```dockerfile
ARG BASE_IMAGE=pytorch/torchserve:latest-cpu

FROM $BASE_IMAGE AS server
ARG BASE_IMAGE
ARG EXAMPLE_DIR
ARG HUGGINGFACE_TOKEN
# MODEL_NAME and TORCH_COMPILE must be declared as build args here as well;
# otherwise the ENV defaults below (and the BERT check further down) always
# see empty values at build time.
ARG MODEL_NAME
ARG TORCH_COMPILE
ENV MODEL_NAME=$MODEL_NAME
ENV TORCH_COMPILE=$TORCH_COMPILE

USER root

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
    apt-get update && \
    apt-get install jq wget -y

COPY $EXAMPLE_DIR/requirements.txt /home/model-server/getting_started/requirements.txt
RUN pip install -r /home/model-server/getting_started/requirements.txt

# Log in to HuggingFace only when building a BERT model (token required)
RUN \
    if echo "$MODEL_NAME" | grep -q "BERT"; then \
        huggingface-cli login --token $HUGGINGFACE_TOKEN; \
    fi

COPY $EXAMPLE_DIR /home/model-server/getting_started
COPY $EXAMPLE_DIR/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
COPY $EXAMPLE_DIR/config.properties /home/model-server/config.properties

WORKDIR /home/model-server/getting_started
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server
```
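For context, these build args are supplied by `build_image.sh`; below is a minimal sketch of the kind of invocation it presumably wraps (the flag values are assumptions, not the script's actual contents). Note that the `RUN --mount=type=cache` instruction requires BuildKit.

```bash
# Illustrative only: build_image.sh is the supported entry point, and these
# --build-arg values are assumptions matching the ARG declarations above.
DOCKER_BUILDKIT=1 docker build \
  --build-arg EXAMPLE_DIR=examples/getting_started \
  --build-arg MODEL_NAME=vit \
  --build-arg TORCH_COMPILE=false \
  --build-arg HUGGINGFACE_TOKEN=$HUGGINGFACE_TOKEN \
  -t pytorch/torchserve:demo \
  .
```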
examples/getting_started/… (+188) (file name not shown in this view; a Transformers model download script)
```python
"""Download a HuggingFace Transformers model and save it in either
``pretrained`` or TorchScript (traced) form, driven by a JSON setup config."""

import argparse
import json
import os

import torch
import transformers
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
    set_seed,
)

print("Transformers version", transformers.__version__)
set_seed(1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def dir_path(path_str):
    if os.path.isdir(path_str):
        return path_str
    else:
        print(f"{path_str} does not exist, creating directory")
        os.makedirs(path_str)
        return path_str


def transformers_model_downloader(
    mode,
    pretrained_model_name,
    num_labels,
    do_lower_case,
    max_length,
    torchscript,
    hardware,
    batch_size,
    model_path,
):
    """Save the checkpoint and config file, along with the tokenizer config
    and vocab files, of a transformer model of your choice.
    """
    print("Download model and tokenizer", pretrained_model_name)
    # Load the pre-trained model and tokenizer for the requested task.
    if mode == "sequence_classification":
        config = AutoConfig.from_pretrained(
            pretrained_model_name, num_labels=num_labels, torchscript=torchscript
        )
        model = AutoModelForSequenceClassification.from_pretrained(
            pretrained_model_name, config=config
        )
        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name, do_lower_case=do_lower_case
        )
    elif mode == "question_answering":
        config = AutoConfig.from_pretrained(
            pretrained_model_name, torchscript=torchscript
        )
        model = AutoModelForQuestionAnswering.from_pretrained(
            pretrained_model_name, config=config
        )
        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name, do_lower_case=do_lower_case
        )
    elif mode == "token_classification":
        config = AutoConfig.from_pretrained(
            pretrained_model_name, num_labels=num_labels, torchscript=torchscript
        )
        model = AutoModelForTokenClassification.from_pretrained(
            pretrained_model_name, config=config
        )
        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name, do_lower_case=do_lower_case
        )
    elif mode == "text_generation":
        config = AutoConfig.from_pretrained(
            pretrained_model_name, num_labels=num_labels, torchscript=torchscript
        )
        model = AutoModelForCausalLM.from_pretrained(
            pretrained_model_name, config=config
        )
        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name, do_lower_case=do_lower_case
        )

    # NOTE: for demonstration purposes, we do not go through fine-tuning here.
    # A fine-tuning step based on your needs can be added.
    # An example of a fine-tuned model is provided in the README.

    print(
        "Save model and tokenizer/ Torchscript model based on the setting from setup_config",
        pretrained_model_name,
        "in directory",
        model_path,
    )
    # save_mode and model_name are module-level settings loaded in __main__.
    if save_mode == "pretrained":
        model.save_pretrained(model_path)
        tokenizer.save_pretrained(model_path)
    elif save_mode == "torchscript":
        dummy_input = "This is a dummy input for torch jit trace"
        inputs = tokenizer.encode_plus(
            dummy_input,
            max_length=int(max_length),
            pad_to_max_length=True,
            add_special_tokens=True,
            return_tensors="pt",
        )
        model.to(device).eval()
        if hardware == "neuron":
            import torch_neuron

            input_ids = torch.cat([inputs["input_ids"]] * batch_size, 0).to(device)
            attention_mask = torch.cat([inputs["attention_mask"]] * batch_size, 0).to(
                device
            )
            traced_model = torch_neuron.trace(model, (input_ids, attention_mask))
            torch.jit.save(
                traced_model,
                os.path.join(
                    model_path,
                    "traced_{}_model_neuron_batch_{}.pt".format(model_name, batch_size),
                ),
            )
        elif hardware == "neuronx":
            import torch_neuronx

            input_ids = torch.cat([inputs["input_ids"]] * batch_size, 0).to(device)
            attention_mask = torch.cat([inputs["attention_mask"]] * batch_size, 0).to(
                device
            )
            traced_model = torch_neuronx.trace(model, (input_ids, attention_mask))
            torch.jit.save(
                traced_model,
                os.path.join(
                    model_path,
                    "traced_{}_model_neuronx_batch_{}.pt".format(
                        model_name, batch_size
                    ),
                ),
            )
        else:
            input_ids = inputs["input_ids"].to(device)
            attention_mask = inputs["attention_mask"].to(device)
            traced_model = torch.jit.trace(model, (input_ids, attention_mask))
            torch.jit.save(traced_model, os.path.join(model_path, "traced_model.pt"))
    return


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path",
        "-o",
        type=dir_path,
        default="model",
        help="Output directory for downloaded model files",
    )
    parser.add_argument("--cfg", "-c", type=str, required=True, help="Config")
    args = parser.parse_args()
    dirname = os.path.dirname(__file__)
    with open(args.cfg) as f:
        settings = json.load(f)
    mode = settings["mode"]
    model_name = settings["model_name"]
    num_labels = int(settings["num_labels"])
    do_lower_case = settings["do_lower_case"]
    max_length = settings["max_length"]
    save_mode = settings["save_mode"]
    torchscript = save_mode == "torchscript"
    hardware = settings.get("hardware")
    batch_size = int(settings.get("batch_size", "1"))

    transformers_model_downloader(
        mode,
        model_name,
        num_labels,
        do_lower_case,
        max_length,
        torchscript,
        hardware,
        batch_size,
        args.model_path,
    )
```
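For reference, the `--cfg` file this script consumes is a JSON object whose keys match what `__main__` reads above. A minimal sketch with illustrative values follows; the script's actual file name and the configs shipped with the example are not shown in this view, so both are assumptions.

```bash
# Hypothetical setup config; keys mirror the settings the script reads
# (mode, model_name, num_labels, do_lower_case, max_length, save_mode,
# plus optional hardware and batch_size).
cat > setup_config.json <<'EOF'
{
  "mode": "sequence_classification",
  "model_name": "bert-base-uncased",
  "num_labels": "2",
  "do_lower_case": true,
  "max_length": "150",
  "save_mode": "pretrained",
  "batch_size": "1"
}
EOF

# "download_model.py" stands in for this script's actual file name.
python download_model.py --cfg setup_config.json --model_path model
```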

examples/getting_started/README.md (+57)
# TorchServe Quick Start Examples

## Pre-requisites

1) Docker for CPU runs. To make use of an Nvidia GPU, please make sure you have nvidia-docker installed.

## Quick Start Example

To quickly get started with TorchServe, you can execute the following commands from the directory where `serve` is cloned.

```
./examples/getting_started/build_image.sh vit

docker run --rm -it --env TORCH_COMPILE=false --env MODEL_NAME=vit --platform linux/amd64 -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -v /home/ubuntu/serve/model_store_1:/home/model-server/model-store pytorch/torchserve:demo
```

You can point `/home/ubuntu/serve/model_store_1` to a volume where you want the model archives to be stored.

In another terminal, run the following command for inference:
```
curl http://127.0.0.1:8080/predictions/vit -T ./examples/image_classifier/kitten.jpg
```
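If you want to confirm the server is up before sending a prediction request, TorchServe's standard health check endpoint (part of the inference API, not specific to this example) can be queried:
```
curl http://127.0.0.1:8080/ping
```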
### Supported models

The following models are supported in this example:
```
resnet, densenet, vit, fasterrcnn, bertsc, berttc, bertqa, berttg
```

We use HuggingFace BERT models, so you need to set `HUGGINGFACE_TOKEN`:

```
export HUGGINGFACE_TOKEN=<your token>
```
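With the token exported, building one of the BERT variants follows the same pattern as the `vit` build above, for example:
```
./examples/getting_started/build_image.sh bertsc
```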
### `torch.compile`

To enable `torch.compile` with these models, pass the optional argument `--torch.compile`:

```
./examples/getting_started/build_image.sh resnet --torch.compile
```
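At run time, presumably the matching `TORCH_COMPILE` environment variable from the quick start `docker run` command needs to be flipped to `true` as well:
```
docker run --rm -it --env TORCH_COMPILE=true --env MODEL_NAME=resnet --platform linux/amd64 -p 127.0.0.1:8080:8080 -v /home/ubuntu/serve/model_store_1:/home/model-server/model-store pytorch/torchserve:demo
```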
## Register multiple models

TorchServe supports multi-model endpoints out of the box. Once you have loaded a model, you can register it along with any other model using TorchServe's management API.
Depending on the amount of memory (or GPU memory) you have on your machine, you can load as many models as will fit.

```
curl -X POST "127.0.0.1:8081/models?model_name=resnet&url=/home/ubuntu/serve/model_store_1/resnet"
```
You can check all the loaded models using:
```
curl -X GET "127.0.0.1:8081/models"
```

For other management APIs, please refer to the [management API documentation](https://github.com/pytorch/serve/blob/master/docs/management_api.md).
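For instance, the same API can scale the workers serving a registered model or unregister it, using standard TorchServe management endpoints (model name taken from the registration above):
```
# Scale the resnet model to at least two workers
curl -X PUT "127.0.0.1:8081/models/resnet?min_worker=2"

# Unregister the model when it is no longer needed
curl -X DELETE "127.0.0.1:8081/models/resnet"
```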
