- * [Kubernetes](master/kubernetes) with support for [autoscaling](kubernetes#session-affinity-with-multiple-torchserve-pods), session-affinity, monitoring using Grafana works on-prem, AWS EKS, Google GKE, Azure AKS
+ * [Kubernetes](kubernetes) with support for [autoscaling](kubernetes#session-affinity-with-multiple-torchserve-pods), session-affinity, monitoring using Grafana works on-prem, AWS EKS, Google GKE, Azure AKS
  * [Kserve](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/): Supports both v1 and v2 API, [autoscaling and canary deployments](kubernetes/kserve/README.md#autoscaling) for A/B testing
@@ -71,11 +71,11 @@ Refer to [torchserve docker](docker/README.md) for details.
  * [Expressive handlers](CONTRIBUTING.md): An expressive handler architecture that makes it trivial to support inferencing for your use case with [many supported out of the box](https://github.com/pytorch/serve/tree/master/ts/torch_handler)
  * [Metrics API](docs/metrics.md): out-of-the-box support for system-level metrics with [Prometheus exports](https://github.com/pytorch/serve/tree/master/examples/custom_metrics), custom metrics,
  * [Large Model Inference Guide](docs/large_model_inference.md): With support for GenAI, LLMs including
-   * [SOTA GenAI performance](https://github.com/pytorch/serve/tree/docs/master/examples/pt2#torchcompile-genai-examples) using `torch.compile`
+   * [SOTA GenAI performance](https://github.com/pytorch/serve/tree/master/examples/pt2#torchcompile-genai-examples) using `torch.compile`
    * Fast Kernels with FlashAttention v2, continuous batching and streaming response
    * Microsoft [DeepSpeed](examples/large_models/deepspeed), [DeepSpeed-Mii](examples/large_models/deepspeed_mii)
-   * Hugging Face [Accelerate](large_models/Huggingface_accelerate), [Diffusers](examples/diffusers)
+   * Hugging Face [Accelerate](examples/large_models/Huggingface_accelerate), [Diffusers](examples/diffusers)
    * Running large models on AWS [Sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html) and [Inferentia2](https://pytorch.org/blog/high-performance-llama/)
    * Running [Llama 2 Chatbot locally on Mac](examples/LLM/llama2)
  * Monitoring using Grafana and [Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)
@@ -114,7 +114,7 @@ To learn more about how to contribute, see the contributor guide [here](https://
  ## 📰 News
  * [High performance Llama 2 deployments with AWS Inferentia2 using TorchServe](https://pytorch.org/blog/high-performance-llama/)
  * [Naver Case Study: Transition From High-Cost GPUs to Intel CPUs and oneAPI powered Software with performance](https://pytorch.org/blog/ml-model-server-resource-saving/)
- * [Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs](https://aws.amazon.com/blogs/machine-learning/run-multiple-generative-ai-models-on-gpu-using-amazon-sagemaker-multi-model-endpoints-with-torchserve-and-save-up-to-75-in-inference-costs/)
+ * [Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs](https://pytorch.org/blog/amazon-sagemaker-w-torchserve/)
  * [Deploying your Generative AI model in only four steps with Vertex AI and PyTorch](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)
  * [PyTorch Model Serving on Google Cloud TPU v5](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)
  * [Monitoring using Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)

docs/performance_guide.md (+1 −1)
@@ -17,7 +17,7 @@ Models which have been fully optimized with `torch.compile` show performance imp
  You can find all the examples of `torch.compile` with TorchServe [here](https://github.com/pytorch/serve/tree/master/examples/pt2)

- Details regarding `torch.compile` GenAI examples can be found in this [link](https://github.com/pytorch/serve/tree/docs/master/examples/pt2#torchcompile-genai-examples)
+ Details regarding `torch.compile` GenAI examples can be found in this [link](https://github.com/pytorch/serve/tree/master/examples/pt2#torchcompile-genai-examples)
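
For readers following the corrected links above, here is a minimal sketch of what enabling `torch.compile` in a custom TorchServe handler can look like. The handler name and compile mode are illustrative assumptions, not code taken from the repository's examples:

```python
# Minimal sketch (assumed names): compile the model once when the worker loads it.
import torch
from ts.torch_handler.base_handler import BaseHandler


class CompiledHandler(BaseHandler):
    def initialize(self, context):
        # BaseHandler.initialize loads the model from the model archive into self.model
        super().initialize(context)
        # Wrap the eager model; Inductor generates kernels lazily on the first request,
        # so the first inference is slower and steady-state latency improves afterwards.
        self.model = torch.compile(self.model, mode="reduce-overhead")
```

The linked pt2 examples configure compilation declaratively through the model archive's config rather than in handler code; the wrapper above is just the smallest self-contained illustration.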

examples/pt2/torch_export_aot_compile/README.md (+1 −1)
@@ -2,7 +2,7 @@
  This example shows how to run TorchServe with Torch exported model with AOTInductor

- To understand when to use `torch._export.aot_compile`, please refer to this [section](../README.md/#torchexportaotcompile)
+ To understand when to use `torch._export.aot_compile`, please refer to this [section](https://github.com/pytorch/serve/tree/master/examples/pt2#torch_exportaot_compile)
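
For context on the API the corrected link documents, a minimal sketch of ahead-of-time compilation with `torch._export.aot_compile` follows. The model, input shape, and output filename are assumptions for illustration, and the API is experimental, so details may differ between PyTorch releases:

```python
# Illustrative sketch (assumed model and paths): export a model ahead of time with AOTInductor.
import os

import torch
from torchvision.models import resnet18

model = resnet18().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

with torch.no_grad():
    # Produces a shared library containing the compiled kernels, which can then be
    # packaged into a model archive and loaded by a TorchServe handler at startup.
    so_path = torch._export.aot_compile(
        model,
        example_inputs,
        options={"aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so")},
    )

print(so_path)
```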