-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API server] use loadbalancer to expose ingress by default #4926
Conversation
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks @aylei!
helm upgrade --install -n $NAMESPACE $RELEASE_NAME skypilot/skypilot-nightly --devel \ | ||
--set ingress.nodePortEnabled=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this also delete the old NodePort service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this command is solely for deleting the old NodePort service
# Use ClusterIP here to disable the LoadBalancer created by nginx. | ||
# If not using NodePort, set this to LoadBalancer. | ||
type: ClusterIP | ||
type: LoadBalancer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, in the future this can directly be set to NodePort for users who want to use nodeport? If so, we should probably remove the ingress.nodePortEnabled right before 0.9.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for charts/skypilot/templates/ingress-nodeport.yaml
- that file should be removed before 0.9.0. Good to have a TODO there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think setting ingress-nginx.controller.service.*
is recommended after this PR for new deployments. But it requires the similar migration steps for legacy deployments to abandon the additional node port service:
- Enable the nginx NodePort service, since 30050 and 30051 have been used, the new service should pick up new ports;
- Ask clients to migrate to the new server URL (with new ports)
- Set
ingress.nodePortEnabled=false
to delete the old NodePort service
Not sure whether we should keep the additional node port service to simplify upgrades for nightly users, but we can document the above steps and leave the choice to users. It is reasonable to take no action if they are happy with current setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new idea is that we can just error out to user if:
- There is an existing legacy node port service
- and
ingress.nodePortEnabled
is not set
Suggested message: "Service ${serviceName} is deprecated and will be removed in 0.9.0, follow the guide to migrate: ${URL}", in the guide we suggest user to set ingress.nodePortEnabled=true
explicitly first and gradually migrate their users to service managed by ingress-nginx.controller.service.*
and finally set ingress.nodePortEnabled=false
.
Also the message should be updated in 0.9.0 release: "Service ${serviceName} is deprecated and was removed in 0.9.0, migrate the service using the previous helm chart version before upgrade."
This will add some friction to our nightly users that have an API sever chart deployed, but should be okay since the adoption is not that wide.
@romilbhardwaj wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal sounds good to me - essentially we require current users to be explicit about their intent to use NodePort after this PR.
Co-authored-by: Romil Bhardwaj <[email protected]>
Signed-off-by: Aylei <[email protected]>
0111eef
to
f06c659
Compare
Signed-off-by: Aylei <[email protected]>
f06c659
to
cd8ee18
Compare
Signed-off-by: Aylei <[email protected]>
@romilbhardwaj this PR is ready for another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aylei!
$ NODE_PORT=$(kubectl get svc ${RELEASE_NAME}-ingress-controller-np -n $NAMESPACE -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}') | ||
$ NODE_IP=$(kubectl get nodes -o jsonpath='{ $.items[0].status.addresses[?(@.type=="ExternalIP")].address }') | ||
$ ENDPOINT=http://${WEB_USERNAME}:${WEB_PASSWORD}@${NODE_IP}:${NODE_PORT} | ||
$ ENDPOINT=$(kubectl get svc ${RELEASE_NAME}-ingress-nginx-controller -n $NAMESPACE -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably retain the ENDPOINT=http://${WEB_USERNAME}:${WEB_PASSWORD}@${NODE_IP}
format so that the output is printed as http://skypilot:[email protected]
and users can directly copy-paste the endpoint in their config.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
:sync: nodeport-tab | ||
|
||
1. Make sure ports 30050 and 30051 are open on your nodes. | ||
.. tab-item:: LoadBalancer (Default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also change the "A Kubernetes cluster with ports 30050 and 30051 available for NodePort services" prerequisite at L15 to "A Kubernetes cluster with LoadBalancer or NodePort service support"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Signed-off-by: Aylei <[email protected]>
close #4920
Motivation of changing the defaults: LoadBalancer (if available) is more stable and secure compared to NodePort, correspondingly better UX and lower support costs.
Note:
LoadBalancer
might be unavailable on some k8s infra, we still make it default because it is the de-facto way to expose L4 endpoint. As an example, nginx-ingress also useLoadBalancer
as default service type https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginxEdit
New behavior for users upgraded from earlier 0.8.0-nightly versions, an error will be raised:
This ensures the warning is acknowledged by users.
Tested (run the relevant ones):
bash format.sh
/smoke-test
(CI) orpytest tests/test_smoke.py
(local)/smoke-test -k test_name
(CI) orpytest tests/test_smoke.py::test_name
(local)/quicktest-core
(CI) orconda deactivate; bash -i tests/backward_compatibility_tests.sh
(local)