The InferencePool defines a new backend type. Ignoring a few things, this essentially defines a Service but with a different load balancing selection.
Gateway API already provides two mechanisms to augment the behavior of a backend: a filter on the backendRef, or a Policy attachment.
I would argue one of these methods may be more appropriate than defining a new backend type.
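As a rough illustration of the first mechanism, an HTTPRoute can keep a plain Service as its backend and attach an extension filter to that backendRef. The `filters` field and the `ExtensionRef` filter type are part of HTTPRoute; the `example.com` group, the `InferenceRouting` kind, and all object names below are hypothetical placeholders, not an existing API.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route                  # hypothetical name
spec:
  parentRefs:
  - name: my-gateway               # hypothetical Gateway
  rules:
  - backendRefs:
    - name: llm-service            # an ordinary Service, not a new backend type
      port: 9000
      filters:
      - type: ExtensionRef         # implementation-specific filter on this backendRef
        extensionRef:
          group: example.com       # hypothetical API group
          kind: InferenceRouting   # hypothetical kind carrying the LB behavior
          name: llm-routing
```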
A major problem with using the "new backend type" pattern for something like this, IMO, is the lack of composability.
To make things simpler, let me change from discussing InferencePool to a strawman backendRef: a RoundRobinBackend type, which controls how to load balance over a set of pods. For example:
kind: RoundRobinBackend
spec:
  targetPort: 9000
  podSelector:
    app: bar
Now a user has a use case to add TLS to the backend as well. One approach they could take is to build a new backend type: TLSOriginationBackend. But this has a problem:
If each backend type has a podSelector as its target, we cannot compose them at all. A user can choose to use only TLS or only round-robin LB, when they could perfectly reasonably want to use both.
Maybe I make my backend types more complex, so RoundRobinBackend can reference Pod|arbitrary backend. This quickly gets quite obnoxious. Not only could I end up with RoundRobinBackend<TLSOriginationBackend<....<Pods>>, but implementations also need to know about all of the types.
This is not hypothetical either: Envoy Gateway has an AIServiceBackend type, and for the implementation of InferencePool they are considering AIServiceBackend pointing to an InferencePool. So you have AIService<InferencePool<Pod>> (ref) (note: I am not involved in Envoy Gateway, so I am merely a bystander reading the issue).
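For contrast, here is a minimal sketch of how the strawman composes when both behaviors attach to the same Service instead of wrapping each other. BackendTLSPolicy is an existing Gateway API resource (its apiVersion depends on the installed Gateway API release, so treat the one shown as an assumption); `RoundRobinPolicy` and the `example.com` group are hypothetical and exist only to mirror the RoundRobinBackend strawman.

```yaml
# Existing Gateway API policy: TLS origination toward the Service "bar".
apiVersion: gateway.networking.k8s.io/v1alpha3   # may differ per release
kind: BackendTLSPolicy
metadata:
  name: bar-tls
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: bar
  validation:
    hostname: bar.example.com        # placeholder hostname
    wellKnownCACertificates: System
---
# Hypothetical policy: load balancing behavior for the same Service.
apiVersion: example.com/v1alpha1     # hypothetical group/version
kind: RoundRobinPolicy               # hypothetical kind
metadata:
  name: bar-round-robin
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: bar
```

Neither policy needs to know the other exists, and an implementation only needs to understand Service as the backend type.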
Another problem with this approach is the burden on implementations. Because we have made InferencePool essentially "Service lite + some other stuff", each controller needs to become a "Service controller lite". Typically, this job is delegated to the EndpointSlice controller in Kubernetes, and all existing Gateway API controllers read EndpointSlice to determine the endpoints to include in Service references.
This leaves a few options:
A controller must implement its own InferencePool<->Pod selection. This sounds simple but, speaking as an implementer who has done this for other resource types, it is incredibly challenging to do correctly.
A controller can create a Service object behind the scenes. This feels very hacky and not like the proper long-term solution.
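For reference, this is roughly the kind of EndpointSlice that Kubernetes already maintains for a Service and that gateway controllers consume today. The object name, pod name, and IP address are made-up placeholders; the field layout follows the `discovery.k8s.io/v1` API.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: bar-abc12                        # generated name; placeholder here
  labels:
    kubernetes.io/service-name: bar      # ties the slice to the Service "bar"
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 9000
endpoints:
- addresses:
  - "10.0.0.12"                          # placeholder pod IP
  conditions:
    ready: true                          # readiness is already tracked here
  targetRef:
    kind: Pod
    name: bar-7d4b9c-xk2lp               # placeholder pod name
```

With a new backend type that selects pods directly, each implementation has to reproduce this bookkeeping (readiness, ports, pod churn) on its own.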
I would propose that InferencePool should instead augment an existing backend type (Service). This allows better composition, simplifies controllers, and follows a more standard pattern in the ecosystem. Additionally, it avoids the long tail of feature requests users will probably have for functionality that already exists in Service (like named target ports, publishNotReadyAddresses, etc.).
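A minimal sketch of what that could look like, assuming a policy-style attachment: the Service is a standard core/v1 object using features it already has, while `InferencePolicy`, the `example.com` group, and the names are hypothetical and only illustrate the shape, not a proposed schema.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: bar
spec:
  selector:
    app: bar
  publishNotReadyAddresses: true   # existing Service feature
  ports:
  - name: http
    port: 9000
    targetPort: inference          # named target port (container port name; placeholder)
---
# Hypothetical resource carrying only the inference-specific behavior.
apiVersion: example.com/v1alpha1   # hypothetical group/version
kind: InferencePolicy              # hypothetical kind
metadata:
  name: bar-inference
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: bar
```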
cc @LiorLieberman @louiscryan @robscott @danehans