Document model server compatibility and config options #537
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: liu-cong. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Thanks
@@ -61,6 +61,7 @@ nav:
   - Getting started: guides/index.md
   - Adapter Rollout: guides/adapter-rollout.md
   - Metrics: guides/metrics.md
+  - Supported Model Servers: guides/model-server.md
why under guides?
where else do you suggest?
I would put it under overview after the implementations section
site-src/guides/model-server.md (Outdated)

## Use Triton with TensorRT-LLM Backend

You need to specify the metric names when starting the EPP container. Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/296247b07feed430458b8e0e3f496055a88f5e89/config/manifests/inferencepool.yaml#L48).
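The snippet with the actual args is not shown in this excerpt. As a rough sketch only (the flag names and Triton metric names below are assumptions for illustration, not confirmed by this thread), the EPP container args might look something like:

```yaml
# Sketch of EPP deployment container args mapping the scheduler's
# expected metrics to Triton/TensorRT-LLM metric names.
# Flag names and metric label selectors are illustrative assumptions.
args:
  - -totalQueuedRequestsMetric
  - "nv_trt_llm_request_metrics{request_type=waiting}"
  - -kvCacheUsagePercentageMetric
  - "nv_trt_llm_kv_cache_block_metrics{kv_cache_block_type=fraction}"
```

The idea is that each flag tells the EPP which model-server metric to scrape for a given scheduling signal, since metric names differ between vLLM and Triton.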
We should make this a flag in the Helm chart: the flag would be the model-server name, and the chart would set those metric flags automatically.
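One hypothetical shape for that suggestion (the value name below is invented for illustration, not an existing chart option) would be a single Helm value naming the model server, from which the chart templates derive the per-server metric flags:

```yaml
# Hypothetical Helm values sketch: select the model server by name;
# the chart would template the matching EPP metric flags automatically.
inferenceExtension:
  modelServer: triton-tensorrt-llm   # e.g. vllm | triton-tensorrt-llm
```

This would keep users from having to know the per-server metric names at all.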
Co-authored-by: Abdullah Gharaibeh <[email protected]>
@liu-cong: The following test failed; say `/retest` to rerun all failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/hold I will wait for the Triton metric PR to be merged.
Fixes #482
Part of #523