Commit d47b14d

jagadeesh authored
feat: add session affinity to k8s TS (#2519)

* feat: add session affinity to k8s TS
* fix spell check
* fix docs

Signed-off-by: jagadeesh <[email protected]>
Co-authored-by: Geeta Chauhan <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>

1 parent 448aad3 · commit d47b14d

9 files changed: +132 −11 lines changed

.pre-commit-config.yaml (+2 −2)

```diff
@@ -25,12 +25,12 @@ repos:
       - id: python-no-log-warn
       - id: python-use-type-annotations
   - repo: https://github.com/hadialqattan/pycln
-    rev: v2.1.3
+    rev: v2.1.5
     hooks:
       - id: pycln
         args: [--all]
   - repo: https://github.com/psf/black
-    rev: 23.1.0
+    rev: 23.7.0
     hooks:
       - id: black
         additional_dependencies: ['click==8.0.4']
```

kubernetes/Helm/templates/torchserve.yaml (+5 −1)

```diff
@@ -20,7 +20,9 @@ spec:
     - name: metrics
       port: {{ .Values.torchserve.metrics_port }}
       targetPort: ts-metrics
-  type: LoadBalancer
+    - name: grpc
+      port: {{ .Values.torchserve.grpc_inference_port }}
+      targetPort: ts-grpc
   selector:
     app: torchserve
 ---
@@ -55,6 +57,8 @@ spec:
           containerPort: {{ .Values.torchserve.management_port }}
         - name: ts-metrics
           containerPort: {{ .Values.torchserve.metrics_port }}
+        - name: ts-grpc
+          containerPort: {{ .Values.torchserve.grpc_inference_port }}
         imagePullPolicy: IfNotPresent
         volumeMounts:
         - mountPath: {{ .Values.torchserve.pvd_mount }}
```
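With the chart's default values (`grpc_inference_port: 7070`), the new Service port block renders roughly as follows after template substitution (a hand-rendered sketch, not actual `helm template` output):

```yaml
# Hand-rendered sketch of the added Service port,
# assuming the chart default grpc_inference_port: 7070.
- name: grpc
  port: 7070
  targetPort: ts-grpc
```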

kubernetes/Helm/values.yaml (+3 −1)

```diff
@@ -8,13 +8,15 @@ torchserve:
   management_port: 8081
   inference_port: 8080
   metrics_port: 8082
+  grpc_inference_port: 7070
+
   pvd_mount: /home/model-server/shared/
   n_gpu: 4
   n_cpu: 16
   memory_limit: 32Gi
 
 deployment:
-  replicas: 1
+  replicas: 2
 
 persistentVolume:
   name: efs-claim
```

kubernetes/README.md (+46)

````diff
@@ -53,6 +53,7 @@ torchserve:
   management_port: 8081
   inference_port: 8080
   metrics_port: 8082
+  grpc_inference_port: 7070
   pvd_mount: /home/model-server/shared/
   n_gpu: 1
   n_cpu: 1
@@ -290,6 +291,51 @@ Follow the link for log aggregation with EFK Stack.\
 ## Autoscaling
 [Autoscaling with torchserve metrics](autoscale.md)
 
+## Session Affinity with Multiple TorchServe pods
+
+### Prerequisites
+
+- Follow the instructions above and deploy TorchServe with more than one replica to the Kubernetes cluster
+- Download Istio and add it to your path as shown [here](https://istio.io/latest/docs/setup/getting-started/#download)
+- Install Istio with the command below
+  - `istioctl install --set meshConfig.accessLogFile=/dev/stdout`
+
+### Steps
+
+Now that multiple replicas of TorchServe are running and Istio is installed, we can apply a gateway, virtual service, and destination rule to enable session affinity for user requests.
+
+- Apply the Istio gateway via `kubectl apply -f gateway.yaml`
+  - This gateway exposes all the hosts behind it on port 80, as defined in the YAML file.
+- Apply the virtual service with `kubectl apply -f virtual_service.yaml`
+  - This looks for a header named `protocol` in the incoming request and forwards the request to the TorchServe service. If the `protocol` header has the value `REST`, the request is forwarded to port `8080` of the TorchServe service; if it has the value `gRPC`, the request is forwarded to port `7070`.
+- Apply the destination rule with `kubectl apply -f destination_rule.yaml`
+  - The destination rule looks for an HTTP cookie with the key `session_id`. A request with a given `session_id` is served by the same pod that served the previous request with that `session_id`.
+
+### HTTP Inference
+
+- Fetch the external IP of the istio-ingressgateway service with the command below
+
+```bash
+ubuntu@ubuntu$ kubectl get svc -n istio-system
+NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                           PORT(S)                                                   AGE
+istio-ingressgateway   LoadBalancer   10.100.84.243   a918b2zzzzzzzzzzzzzzzzzzzzzz-1466623565.us-west-2.elb.amazonaws.com   15021:32270/TCP,80:31978/TCP,443:31775/TCP,70:31778/TCP   2d6h
+```
+
+- Make a request as shown below
+
+```bash
+curl -v -H "protocol: REST" --cookie "session_id=12345" http://a918b2d70dbddzzzzzzzzzzz49ec8cf03b-1466623565.us-west-2.elb.amazonaws.com:80/predictions/<model_name> -d "data=<input-string>"
+```
+
+### gRPC Inference
+
+- Refer to [grpc_api](../docs/grpc_api.md) to generate the Python files, then run
+
+```bash
+python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
+```
+
 ## Roadmap
 
 * [] Log / Metrics Aggregation using [AWS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)
````

kubernetes/destination_rule.yaml (+13)

```diff
@@ -0,0 +1,13 @@
+apiVersion: networking.istio.io/v1alpha3
+kind: DestinationRule
+metadata:
+  name: torchserve-dr
+spec:
+  host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
+  trafficPolicy:
+    loadBalancer:
+      consistentHash:
+        # httpHeaderName: x-user
+        httpCookie:
+          name: session_id
+          ttl: 60s
```
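The `consistentHash` policy above maps each `session_id` cookie value deterministically to one backend pod, which is why repeated requests with the same cookie land on the same replica. A minimal Python sketch of the idea (the pod list and hash function are illustrative assumptions, not Istio's actual ring-hash implementation):

```python
import hashlib

# Hypothetical pod names; in the cluster these would be the TorchServe replica endpoints.
PODS = ["torchserve-0", "torchserve-1"]

def pick_pod(session_id: str) -> str:
    """Map a session_id cookie value to a pod deterministically (illustration only)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return PODS[int.from_bytes(digest, "big") % len(PODS)]

# The same session_id always selects the same pod; different ids may differ.
print(pick_pod("12345") == pick_pod("12345"))  # True
```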

kubernetes/gateway.yaml (+14)

```diff
@@ -0,0 +1,14 @@
+apiVersion: networking.istio.io/v1beta1
+kind: Gateway
+metadata:
+  name: torchserve-gw
+spec:
+  selector:
+    istio: ingressgateway
+  servers:
+  - hosts:
+    - '*'
+    port:
+      name: http
+      number: 80
+      protocol: HTTP
```

kubernetes/virtual_service.yaml (+36)

```diff
@@ -0,0 +1,36 @@
+apiVersion: networking.istio.io/v1alpha3
+kind: VirtualService
+metadata:
+  name: torchserve-vs
+spec:
+  hosts:
+  - "*"
+  gateways:
+  - torchserve-gw
+  http:
+  - match:
+    - uri:
+        prefix: /metrics
+    route:
+    - destination:
+        host: torchserve.default.svc.cluster.local
+        port:
+          number: 8082
+  - match:
+    - headers:
+        protocol:
+          exact: REST
+    route:
+    - destination:
+        host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
+        port:
+          number: 8080
+  - match:
+    - headers:
+        protocol:
+          exact: gRPC
+    route:
+    - destination:
+        host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
+        port:
+          number: 7070
```
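The three match blocks form a simple routing table: `/metrics` goes to port 8082, a `protocol: REST` header to 8080, and `protocol: gRPC` to 7070. A small Python sketch of the equivalent decision logic (the function name and dict shape are my own, for illustration):

```python
def route(path, headers):
    """Mirror the VirtualService rules: return the TorchServe service port,
    or None if no rule matches (illustration of the routing table only)."""
    if path.startswith("/metrics"):
        return 8082  # metrics endpoint
    if headers.get("protocol") == "REST":
        return 8080  # HTTP inference port
    if headers.get("protocol") == "gRPC":
        return 7070  # gRPC inference port
    return None      # no matching rule

print(route("/predictions/mnist", {"protocol": "REST"}))  # 8080
```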

ts_scripts/spellcheck_conf/wordlist.txt (+3)

```diff
@@ -1065,6 +1065,9 @@ ActionSLAM
 statins
 ci
 chatGPT
+accessLogFile
+istioctl
+meshConfig
 baseimage
 cuDNN
 Xformer
```

ts_scripts/torchserve_grpc_client.py (+10 −7)

```diff
@@ -19,13 +19,14 @@ def get_management_stub():
     return stub
 
 
-def infer(stub, model_name, model_input):
+def infer(stub, model_name, model_input, metadata):
     with open(model_input, "rb") as f:
         data = f.read()
 
     input_data = {"data": data}
     response = stub.Predictions(
-        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
+        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
+        metadata=metadata,
     )
 
     try:
@@ -35,13 +36,14 @@ def infer(stub, model_name, model_input):
         exit(1)
 
 
-def infer_stream(stub, model_name, model_input):
+def infer_stream(stub, model_name, model_input, metadata):
     with open(model_input, "rb") as f:
         data = f.read()
 
     input_data = {"data": data}
     responses = stub.StreamPredictions(
-        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
+        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
+        metadata=metadata,
     )
 
     try:
@@ -92,7 +94,6 @@ def unregister(stub, model_name):
 
 
 if __name__ == "__main__":
-
     parent_parser = argparse.ArgumentParser(add_help=False)
     parent_parser.add_argument(
         "model_name",
@@ -141,10 +142,12 @@ def unregister(stub, model_name):
 
     args = parser.parse_args()
 
+    metadata = (("protocol", "gRPC"), ("session_id", "12345"))
+
     if args.action == "infer":
-        infer(get_inference_stub(), args.model_name, args.model_input)
+        infer(get_inference_stub(), args.model_name, args.model_input, metadata)
     elif args.action == "infer_stream":
-        infer_stream(get_inference_stub(), args.model_name, args.model_input)
+        infer_stream(get_inference_stub(), args.model_name, args.model_input, metadata)
     elif args.action == "register":
         register(get_management_stub(), args.model_name, args.mar_set)
     elif args.action == "unregister":
```
