Workflow for nightly kubernetes tests #3017
# Check if the CPU cores exceed 2
if [ $(echo "$cpu" | sed 's/m$//') -gt $ACCEPTABLE_CPU_CORE_USAGE ]; then
  echo "✘ Test failed: CPU cores $(echo "$cpu" | sed 's/m$//') for $pod_name exceeded $ACCEPTABLE_CPU_CORE_USAGE" >&2
What's the context of this PR, and why is this an important test? Please add either a comment here or more details in the PR description.
Thanks. Added this in the description
ACCEPTABLE_CPU_CORE_USAGE=2
DOCKER_IMAGE=pytorch/torchserve-nightly:latest-gpu

# Get relative path of example dir
crazy stuff going on here lol, worth a comment or 2
lol..added
- cron: '15 6 * * *'

jobs:
  kubernetes-tests:
there seems to be a lot of code duplication between this workflow and https://github.com/pytorch/serve/blob/master/.github/workflows/kserve_cpu_tests.yml
Yes, there is. Not sure if I want to merge the two. They are testing different tech stacks. The KServe tests will also probably get bigger to test OIP. I won't be adding more tests for K8s unless a new issue is uncovered.
…/serve into feature/k8s_nightly_test
Description
Workflow for running nightly Kubernetes tests
Passing run
In addition to the functionality test, this PR also checks performance in terms of CPU usage in a k8s pod. This is important to prevent CPU throttling in a Kubernetes setup with CPU limits.
To run this test, we disable system GPU metrics (because CPU usage is higher with system GPU metrics enabled) and check the CPU usage when no model is loaded in TorchServe.
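For reference, the idle CPU check described above could be sketched roughly as follows. This is not the script from the PR: `check_cpu_usage` is a hypothetical helper, and it assumes the input looks like `kubectl top pod --no-headers` output (pod name, CPU, memory) with CPU reported in millicores such as `1m`, which is what the `sed 's/m$//'` in the reviewed snippet suggests.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): reads "NAME CPU MEMORY" lines on
# stdin and fails if any pod's CPU usage exceeds the given threshold.
# Assumption: CPU values carry a trailing "m" (millicores), e.g. "1m".
check_cpu_usage() {
  local threshold_millicores="$1"
  local status=0
  local pod_name cpu memory millicores

  while read -r pod_name cpu memory; do
    # Skip blank lines
    [ -z "$pod_name" ] && continue

    # Strip the trailing "m", e.g. "3m" -> "3"
    millicores="${cpu%m}"

    if [ "$millicores" -gt "$threshold_millicores" ]; then
      echo "✘ Test failed: CPU usage ${millicores}m for $pod_name exceeded ${threshold_millicores}m" >&2
      status=1
    else
      echo "✓ CPU usage ${millicores}m for $pod_name is within ${threshold_millicores}m"
    fi
  done

  return "$status"
}
```

On a cluster with metrics-server available, this might be invoked as `kubectl top pod --no-headers | check_cpu_usage "$ACCEPTABLE_CPU_CORE_USAGE"`. Using parameter expansion and quoted operands avoids the extra `echo | sed` subshells and guards against empty values in the comparison.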
Type of change
Feature/Issue validation/testing
Passing run is attached