GEP: Firewall #3614

Open
shaneutt opened this issue Feb 13, 2025 · 13 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@shaneutt
Member

shaneutt commented Feb 13, 2025

What?

The ability to define firewall rules for ingress L3, L4 and L7 Gateway traffic.

Note: Egress will be considered as well, but may be best as a follow-up.
Note: Route level rules will be considered as well, but might be best as a follow-up.

Why?

Gateways are commonly exposed to the internet, which puts them at risk of attack. Internal networks can become compromised as well. We should provide tools for users to restrict and control access to their Gateways.

Note: If we are aligned that we want to move forward with something, one part of the motivation we will need to address very clearly in the GEP will be "why is this Gateway, and not NetworkPolicy?", as we are keenly aware that some of the use cases are covered by that today. Ultimately it comes down to either declining the GEP in favor of NetworkPolicy, or explaining exactly why it was insufficient in the alternatives considered section.

User Stories

  • As a cluster operator I want all ingress traffic for my Gateways to be restricted to my CDN.
  • As a cluster operator I want to be able to block traffic from specific geographical regions.
  • As a cluster operator I want to be able to identify and block malformed HTTP requests before they reach backend applications.
  • As a cluster operator I want to filter/sanitize requests to block XSS attacks before they reach my backend applications.
  • As a cluster operator I want to block all requests from specific user agents OR only allow specific user agents.
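
To make the first and last stories a bit more concrete, here is a minimal sketch of the kind of check they imply, written in Python purely for illustration. The `FirewallRules` type, its field names, and the `is_allowed` helper are all hypothetical; they are not a proposed API shape and say nothing about how an implementation would enforce the rules. Geo-blocking and XSS filtering are left out.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List


@dataclass
class FirewallRules:
    """Hypothetical rule set for illustration; not a Gateway API type."""
    allowed_source_cidrs: List[str] = field(default_factory=list)
    blocked_user_agents: List[str] = field(default_factory=list)


def is_allowed(rules: FirewallRules, source_ip: str, user_agent: str) -> bool:
    """Return True if a request passes both the L3 and the L7 check."""
    addr = ip_address(source_ip)
    # L3: if an allowlist is configured, the source must be inside one CIDR.
    if rules.allowed_source_cidrs and not any(
        addr in ip_network(cidr) for cidr in rules.allowed_source_cidrs
    ):
        return False
    # L7: reject user agents containing a blocked substring (case-insensitive).
    ua = user_agent.lower()
    return not any(b.lower() in ua for b in rules.blocked_user_agents)


# Example: only a (documentation-range) CDN CIDR may connect, and one
# made-up user agent is blocked.
rules = FirewallRules(
    allowed_source_cidrs=["203.0.113.0/24"],
    blocked_user_agents=["badbot"],
)
```

With these rules, a request from `203.0.113.7` with a browser user agent would pass, while one from outside the CIDR, or one whose user agent contains `BadBot`, would be rejected.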

Note: CORS has some relation to this. At the time of writing there is GEP-1767 for this, which we will need to keep in mind and ensure we have lots of cross-collaboration with as we progress.

How?

How this would be implemented is going to require a LOT of discussion and consensus building. Please do not focus on the "How?" for now, instead please see if you agree with the "What?" and the "Why?" and most importantly if you agree there should be an upstream standard for this.

Note: I expect implementations may have a variety of ways in which they would implement the specified rules, including (but not limited to):

  • cloud WAF integrations
  • internal WAF integrations
  • NetworkPolicies

However these would be implementation details and ultimately we would need to decide whether we want to prescribe any of them. (Currently a lot of Gateway API implementations implement Gateway by creating a Deployment and a Service, but we don't really prescribe that as the "here's how you start implementing" in our implementers guide today).

So most of the "How?" can wait, but I do think we'll have to be thinking about these specific implementation mechanism possibilities.

Note: I anticipate responses to this issue might be something like "What about Policies with policy attachment??".
I'm open to discussing that when we get to figuring out the how, but for now I really just want to see if we all agree
on the goals of creating upstream standards for these things first.

We do need to be careful about ending up with multiple very obvious and distinct ways to do the same thing, but
ultimately all options are on the table if we can navigate tacit rules like that.

Important!

If you have similar use cases, please prefer to respond with user stories and I'll incorporate them. In general though, please comment at least with support so we know who wants to be a stakeholder!

@shaneutt shaneutt added kind/feature Categorizes issue or PR as related to a new feature. triage/needs-information Indicates an issue needs more information in order to work on it. labels Feb 13, 2025
@shaneutt shaneutt self-assigned this Feb 13, 2025
@shaneutt shaneutt moved this to In Progress in Gateway API Pipeline Feb 13, 2025
@shaneutt shaneutt moved this from In Progress to Triage in Gateway API Pipeline Feb 13, 2025
@howardjohn
Contributor

👍 to most of this but I would argue CORS is out of place here. CORS is not really a server-side authorization policy; it's more like telling a browser it is authorized to access something, which is subtly different from actually allowing a request through.

Though it likely doesn't matter much given CORS just merged recently anyways #3435.

@youngnick
Contributor

In the past when we've talked about this, the question we haven't been able to answer is "Why do this at the Gateway level and not the NetworkPolicy level"?

Also, there's a strong crossover with what many vendors may consider paid features (particularly WAF style features). Moving forward with this in upstream says "these are table stakes features and should not be monetised". I'm interested to see what vendors who are building paid-for implementations think of that.

@mlavacca
Member

This proposal makes sense to me, and the definition of use cases already looks pretty rich. I'm very interested to see the reaction to Nick's comment above though.

@shaneutt
Member Author

@youngnick: In the past when we've talked about this, the question we haven't been able to answer is "Why do this at the Gateway level and not the NetworkPolicy level"?
@mlavacca: I'm very interested to see the reaction to Nick's comment above though.

The original description did point to the awareness that NetworkPolicy is something we need to look at and consider, but I added even more emphasis above in a note that we really need to be certain this is not just an alternative to NetworkPolicy, and have that reasoning very well defined before this could move out of Provisional. When I start the actual GEP this will be one of the first things I address.

@youngnick: Also, there's a strong crossover with what many vendors may consider paid features (particularly WAF style features). Moving forward with this in upstream says "these are table stakes features and should not be monetised". I'm interested to see what vendors who are building paid-for implementations think of that.

Indeed. The original description added some awareness, in a note at the bottom, that "cloud WAFs" are expected to be in the mix here. Currently my thinking is that being, in part, effectively an API front-end for cloud WAF integrations is something we will need to suss out in the motivation and goals of the GEP.

@zac-nixon

+1 -- We're rolling our own APIs to define what a Firewall looks like in K8s. Having a standard API to define this would be hugely beneficial.

@shraddhabang

+1 -- I agree that establishing upstream standards for this is helpful.

@guicassolato
Contributor

This proposal makes me wonder how far is too far for Gateway API. Not saying I see anything fundamentally wrong with it, but I can’t help wondering whether it shouldn’t focus on HTTP-related “firewall rules” first, or indeed bring it all under a single umbrella of a Firewall feature for the sake of improved UX.

Hear me out…

I can picture (long shot) a future where NetworkPolicies target Gateway objects, addressing some of the user stories listed here, such as restricting traffic to a CDN, while keeping the level of abstraction I believe is implied by this proposal. However, other cases (like handling malformed HTTP requests, blocking XSS attacks, or filtering user agents) would likely be better handled by gateway implementations anyway, making the NetworkPolicy API also feel somewhat out of place.

At the same time, I wonder, should Chiriro or Ana care about any of that?

Could it be that the reason why we spot an overlap with NetworkPolicy API is the leaking of a possible implementation detail, and the preconception that some of these "firewall" user stories belong at L3/L4 while others would typically be addressed at L7? Although justified, maybe this preconception lives more in the heads of the implementers than in those of the users.

If all these user stories indeed fall into the same category of "firewall rules", then perhaps good UX comes from having a standard, unified feature provided under a single API. The question "Is this a NetworkPolicy API issue or a Gateway API issue?" is therefore, IMO, irrelevant to users and, for implementations, a matter of which project (which group of implementations) can more easily grow to absorb these concerns. I would lean towards Gateway API here as it already provides functionality at all these layers.

Stretching a little further, could NetworkPolicy be an implementation detail and perhaps reappear in the "How" section, rather than being discussed too much in the "Why"? IOW, "to provide better UX, there will now be a 'Firewall' feature that aims to abstract and solve all these traffic management problems, including problems that could otherwise be fixed using NetworkPolicy API. Implementations MAY create 'derivative NetworkPolicy' resources to solve a subset of these problems."

Taken to the extreme, should Gateway API then completely swallow NetworkPolicy API, in a superset/subset kind of relation? (Rhetorical question.)

My point is: if all traffic management in Kubernetes can be a Gateway API issue (because ultimately all traffic could pass through a gateway), how can a user intuit when to write a Gateway API resource and when to fall back to a NetworkPolicy resource? Where do we draw the line?

Either there’s no line or we should break this list of user stories down and tackle them at a more granular level, rather than as one single “Firewall” feature.

@howardjohn
Contributor

My 2c is that Gateway API has to a large degree already solved this problem -- by providing a common platform in which to write implementation-specific policies against. Maybe we get some lowest-common-denominator common policy into core, but most of the user stories here are getting pretty deep into implementation-specific areas.

Unlike the rest of the Gateway API core, these features are pretty highly specialized, opinionated, and often proprietary. I am not sure there is actually much we need to do beyond giving a common language/shape for these implementations to use (which we have done).

@shaneutt
Member Author

shaneutt commented Feb 18, 2025

> This proposal makes me wonder how far is too far for Gateway API. Not saying I see anything fundamentally wrong with it, but I can’t help wondering whether it shouldn’t focus on HTTP-related “firewall rules” first, or indeed bring it all under a single umbrella of a Firewall feature for the sake of improved UX.
>
> Hear me out…
>
> I can picture (long shot) a future where NetworkPolicies target Gateway objects, addressing some of the user stories listed here, such as restricting traffic to a CDN, while keeping the level of abstraction I believe is implied by this proposal. However, other cases (like handling malformed HTTP requests, blocking XSS attacks, or filtering user agents) would likely be better handled by gateway implementations anyway, making the NetworkPolicy API also feel somewhat out of place.
>
> At the same time, I wonder, should Chiriro or Ana care about any of that?
>
> Could it be that the reason why we spot an overlap with NetworkPolicy API is the leaking of a possible implementation detail, and the preconception that some of these "firewall" user stories belong at L3/L4 while others would typically be addressed at L7? Although justified, maybe this preconception lives more in the heads of the implementers than in those of the users.
>
> If all these user stories indeed fall into the same category of "firewall rules", then perhaps good UX comes from having a standard, unified feature provided under a single API. The question "Is this a NetworkPolicy API issue or a Gateway API issue?" is therefore, IMO, irrelevant to users and, for implementations, a matter of which project (which group of implementations) can more easily grow to absorb these concerns. I would lean towards Gateway API here as it already provides functionality at all these layers.
>
> Stretching a little further, could NetworkPolicy be an implementation detail and perhaps reappear in the "How" section, rather than being discussed too much in the "Why"? IOW, "to provide better UX, there will now be a 'Firewall' feature that aims to abstract and solve all these traffic management problems, including problems that could otherwise be fixed using NetworkPolicy API. Implementations MAY create 'derivative NetworkPolicy' resources to solve a subset of these problems."
>
> Taken to the extreme, should Gateway API then completely swallow NetworkPolicy API, in a superset/subset kind of relation? (Rhetorical question.)
>
> My point is: if all traffic management in Kubernetes can be a Gateway API issue (because ultimately all traffic could pass through a gateway), how can a user intuit when to write a Gateway API resource and when to fall back to a NetworkPolicy resource? Where do we draw the line?
>
> Either there’s no line or we should break this list of user stories down and tackle them at a more granular level, rather than as one single “Firewall” feature.

As suggested in the description, I anticipate the possibility that implementations could employ NetworkPolicy to enact rules. In this case I don't know that it would make the API "out of place", it would just make it part of the implementation details.

For a parallel, note that many Gateway API implementations respond to a Gateway by creating a Deployment and a Service as an implementation detail. I don't see the fact that you could compose other APIs to provision for the declared intent as precluding the need for a more focused way to declare that intent, if we all agree the need goes beyond just one or two implementations.

> My 2c is that Gateway API has to a large degree already solved this problem -- by providing a common platform in which to write implementation-specific policies against. Maybe we get some lowest-common-denominator common policy into core, but most of the user stories here are getting pretty deep into implementation-specific areas.
>
> Unlike the rest of the Gateway API core, these features are pretty highly specialized, opinionated, and often proprietary. I am not sure there is actually much we need to do beyond giving a common language/shape for these implementations to use (which we have done).

If you're suggesting that policy attachment as a mechanism may be sufficient to consider this problem solved, I'm open to the suggestion that the "How?" ends up being policy attachment if we all agree that's the best route. I'm dedicated to working through the problem deeply, finding how much commonality there really is, and explaining (and documenting) incontrovertibly why that is ultimately our decision.

@guicassolato
Contributor

> I anticipate the possibility that implementations could employ NetworkPolicy to enact rules.

I didn't get that from the description TBH. I did get the Policy Attachment nod tho.

Anyway, I believe we are in agreement about using/recommending NP as an implementation detail.

By "out of place", I meant I don't believe the L7-related features in this proposal would make it into a hypothetical NP update. Hence, my acknowledgement to Gateway being a better host for it.

I reiterate however my concern of potentially ending up with 2 ways of doing the same thing. If this proposal is to thrive, then I don't see a reason to point to NP as one's primary option to solve any of the gateway firewall use cases anymore. So let's make sure to cover them well. Otherwise, it's just weird.

Thanks for taking the time to read all my rambling anyway. I tend to do that ;P

@shaneutt
Member Author

> > I anticipate the possibility that implementations could employ NetworkPolicy to enact rules.
>
> I didn't get that from the description TBH. I did get the Policy Attachment nod tho.

Good feedback, I changed the language in the description from:

> **Note**: I do expect implementations will want to provision rules with at least:
>
>  * cloud WAF integrations
>  * internal WAF integrations
>  * NetworkPolicies

to

> **Note**: I expect implementations may have a variety of ways in which they would implement the specified rules, including (but not limited to):
>
>  * cloud WAF integrations
>  * internal WAF integrations
>  * NetworkPolicies
>
> However these would be implementation details and ultimately we would need to decide whether we want to prescribe any of them. (Currently a lot of Gateway API implementations implement `Gateway` by creating a `Deployment` and a `Service`, but we don't really prescribe that as the "here's how you start implementing" in our [implementers guide today](https://github.com/kubernetes-sigs/gateway-api/blob/d936258b0ed256e2d4eaa31da01fa703f0e954b0/site-src/guides/implementers.md#gateway)).

> Anyway, I believe we are in agreement about using/recommending NP as an implementation detail.
> By "out of place", I meant I don't believe the L7-related features in this proposal would make it into a hypothetical NP update. Hence, my acknowledgement to Gateway being a better host for it.

👍

> I reiterate however my concern of potentially ending up with 2 ways of doing the same thing. If this proposal is to thrive, then I don't see a reason to point to NP as one's primary option to solve any of the gateway firewall use cases anymore. So let's make sure to cover them well. Otherwise, it's just weird.

Indeed, I've updated the description above with an explicit note about this. Your feedback is very much appreciated 🙇

@shaneutt
Member Author

shaneutt commented Feb 19, 2025

We've gotten a lot of positive feedback quickly. It seems that we at least want to explore this.

/triage accepted

My intention will be to move this forward as an initial provisional GEP PR. The initial PR will focus on establishing and building consensus on the goals and motivations.

/assign @shaneutt

From there we can iterate and look at the options for how it can be done and decide if we feel any of those are solid. Best case scenario we move forward with it, worst case scenario we decide against it and mark it declined, but then at least we can document our reasoning for posterity.

@kubernetes-sigs kubernetes-sigs deleted a comment from k8s-ci-robot Feb 19, 2025
@shaneutt shaneutt added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Feb 19, 2025
@shaneutt shaneutt moved this from Triage to In Progress in Gateway API Pipeline Feb 19, 2025
@shaneutt shaneutt added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 11, 2025
@shaneutt shaneutt added this to the v1.4.0 milestone Mar 11, 2025
@mpstefan

> Also, there's a strong crossover with what many vendors may consider paid features (particularly WAF style features). Moving forward with this in upstream says "these are table stakes features and should not be monetised". I'm interested to see what vendors who are building paid-for implementations think of that.

This is an interesting point, though I think it is still very worthwhile to define potentially paid feature functionality upstream in a generic way as "implementation-specific." Even if the implementations have different knobs to turn, they can always add extensions to cover them. If we can identify common functionality that fits a real use case for our personas, I would at least like to see what it might look like.

I see three questions we'll need to answer:

  • Across all of these firewall solutions, are there common features that we can standardize?
  • Do those features make sense in Gateway or NetworkPolicy?
  • What is the user experience going to look like to configure common and proprietary firewall features?

I could see NGINX Gateway Fabric doing both: looking at the Gateway/NetworkPolicy and using those fields if defined, but then overriding/throwing an error if an NGINX App Protect (our Firewall product) policy is specified. That way the user can use the common firewall features if they need basic functionality, but then they will need to switch over to our policy if they need capabilities specific to App Protect. We'd have to define that common list to see how common that is.

Projects
Status: In Progress
Development

No branches or pull requests

8 participants