Commit 0b4539f

lxning and mreso authored
Example llama3 on inf2 (#3133)
* add llama3 support
* delete model config yaml
* update model config
* fix typo

---------

Co-authored-by: Matthias Reso <[email protected]>
1 parent 239f91e commit 0b4539f

13 files changed, +475 -36 lines changed
@@ -1,6 +1,6 @@
 # Large model inference on Inferentia2

-This folder briefs on serving the [Llama 2](https://huggingface.co/meta-llama) model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
+This folder briefs on serving the [Llama 2 and Llama 3](https://huggingface.co/meta-llama) model on an [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:

 * demo1: [micro batching](https://github.com/pytorch/serve/tree/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/examples/micro_batching) and [streaming response](https://github.com/pytorch/serve/blob/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/docs/inference_api.md#curl-example-1) support in folder [streamer](streamer).
 * demo2: continuous batching support in folder [continuous_batching](continuous_batching)
