GEP 3779 - East/West Identity-Based Authorization #3822

Merged
merged 11 commits into from
Jun 18, 2025
99 changes: 99 additions & 0 deletions geps/gep-3779/index.md
@@ -0,0 +1,99 @@
# GEP-3779: Identity Based Authz for East-West Traffic
Contributor

I am still not sure I understand the relationship to the work going on in NetworkPolicy around this, which has a ~100% overlap but is not mentioned at all in this GEP?

Contributor @mikemorris, Jun 17, 2025

We've had some preliminary joint discussions with NetPol WG stakeholders, but nothing identity-based is anywhere close to a proposal/KEP state over there AFAIK (AdminNetworkPolicy is still IP-focused).

The last place I remember leaving these discussions was that there was generally a preference for having these as separate layers with no "interleaving" (identity-based policy only applicable if IP-based policy has already allowed the connection) because most implementations - Cilium being the most notable exception - would only be able to or want to support one layer.

There was an open unanswered question on whether any sort of interop/signaling between layers to enable a belt-and-suspenders approach with restrictive IP-based policies would be feasible (or desirable) for use cases like deny-all at L3 but generating policies to allow IP connectivity if identity-based policy would allow the connection - something like mapping ServiceAccounts to pod IPs.

Contributor

I don't think this needs to block this GEP, but I think the scenario that comes to my mind if we do strict layering (i.e. NetPol first then identity) looks something like this: if a NetPol provider has an identity concept, I think this GEP means they must make 2 authz passes if the user wants to express something like: "only allow from this CIDR with identity X OR any other CIDR with identity Y". In fact, I don't know how the user effectively authors that policy in a strictly layered API.
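
The scenario above can be made concrete with a small sketch. Everything in it is hypothetical (the CIDR, the identity names `X` and `Y`, and both function names are illustrative assumptions, not part of any proposed API). It models strict layering as an IP-only predicate ANDed with an identity-only predicate, and shows that any such composition that admits both intended cases also admits the two unintended combinations.

```python
from ipaddress import ip_address, ip_network

# Hypothetical CIDR, for illustration only.
CIDR_A = ip_network("10.0.0.0/24")

def desired(src_ip: str, identity: str) -> bool:
    """Intent: identity X only from CIDR_A, identity Y only from outside it."""
    in_a = ip_address(src_ip) in CIDR_A
    return (in_a and identity == "X") or (not in_a and identity == "Y")

def layered(src_ip: str, identity: str) -> bool:
    """Strict layering: an IP-only pass followed by an identity-only pass.

    Each layer sees only its own attribute. To admit both intended cases,
    the IP layer must admit every source and the identity layer must
    admit both X and Y.
    """
    ip_layer_allows = True  # must admit CIDR_A and everything outside it
    identity_layer_allows = identity in {"X", "Y"}
    return ip_layer_allows and identity_layer_allows

cases = [
    ("10.0.0.5", "X"),
    ("10.0.0.5", "Y"),
    ("192.168.1.9", "X"),
    ("192.168.1.9", "Y"),
]
# Requests the layered model admits although the intent would reject them.
over_permitted = [c for c in cases if layered(*c) and not desired(*c)]
```

Here `over_permitted` holds the two crossed combinations (identity `Y` from inside the CIDR, identity `X` from outside it), which is the gap the comment describes.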

Member Author

Yep, +1. Thanks, Mike.

And I am working (slowly) on pushing more discussions (NetPol WG included) toward a broader proposal in this space, not flavored by who/what is implementing it, to allow identity-based authorization in Kubernetes.

If something like this ever lands, we'll see how we converge.

Contributor

@LiorLieberman -- what do you see as the convergence path? Would we have two APIs (whichever one gets completed faster?) or should we merge the two streams?

In my mind, there are separate cases:

  1. Simple case where we are talking about groups of Pods/service accounts talking to each other.
  2. More complex case where we are talking about service endpoints (higher level of abstraction) like a path in an HTTPRoute vs the underlying workload.


* Issue: [#3779](https://github.com/kubernetes-sigs/gateway-api/issues/3779)
* Status: Provisional

(See [status definitions](../overview.md#gep-states).)


## TLDR
Contributor

One thing that's not clear to me from this GEP is what types of traffic we are targeting.

Currently, GAMMA is specified to only be for HTTP traffic - that is, traffic that is routed using HTTP metadata, which requires the routing agent to have access to the unencrypted HTTP stream. So that rules out TCP, UDP, and TLS handling for GAMMA.

Is this proposal intended to cover other use cases than HTTP traffic? If so, that's a significant expansion of the current GAMMA spec, and we probably need to talk about it separately.

If not, we probably should say that explicitly, maybe via listing non-HTTP protocols in non-goals or something.

Member Author

This will work within existing GAMMA scope, but should be sufficiently flexible to support more protocols if/when GAMMA allows for that.

Member

I like this as a starting point. I hadn't realized the initial GAMMA scope was so narrow, so having an API that could grow beyond that limited scope as needed would be helpful here.

Member

I agree with Nick's suggestion that we add this as a non-goal for now, but I also agree with Rob's point that there's room for growth here. I suggest we add a non-goal as suggested, and then in "alternatives considered" we add a brief summary about the potential for growth here, but suggest that needs to be a follow-up GEP.

Contributor

Other protocols are already standard: https://gateway-api.sigs.k8s.io/geps/gep-1294/?h=xroute

Contributor @mikemorris, Jun 17, 2025

> Currently, GAMMA is specified to only be for HTTP traffic - that is, traffic that is routed using HTTP metadata, which requires the routing agent to have access to the unencrypted HTTP stream.

I don't think this is accurate?

> Is this proposal intended to cover other use cases than HTTP traffic? If so, that's a significant expansion of the current GAMMA spec, and we probably need to talk about it separately.

We've largely focused on HTTP traffic for mesh, but gRPC, TLS, TCP and UDP are defined in GAMMA https://gateway-api.sigs.k8s.io/geps/gep-1294/?h=xroute#route-types (I think John was intending to link this too, just the wrong subheading).

Member Author

I would rather leave this as is, and make sure we definitely support HTTP traffic. I would prefer to avoid adding this to the non-goals and to keep sufficient flexibility to support more protocols if we can. A later iteration of this GEP would obviously explore the implementation in more detail.


Provide a method for configuring Gateway API Mesh implementations to enforce east-west identity-based Authorization controls. At the time of writing, we leave Authentication to specific implementations; it is outside the scope of this proposal.


## Goals
Contributor

Are any attributes other than identity and port considered?

Member Author

This would mean entering the L7 space (which I think requires more research about where we can standardize). So for the first iteration -- no.

Member

I suggest making this clear in the non-goals, and then we can consider this resolved.

Member Author

Shane - it is in the deferred goals. Do you think we should add it to the Non-Goals as well?


(Using the [Gateway API Personas](../../concepts/roles-and-personas.md))

* A way for Ana the Application Developer to configure a Gateway API for Mesh implementation to enforce an authorization policy that **allows** or **denies** one or more identities to talk with some set of the workloads she controls.

Many users of k8s see the k8s namespace as a security boundary. Is a namespace policy (e.g. one that specifies which identities are allowed to talk to my namespace) a special case of the workload policy?


* A way for Chihiro, the Cluster Admin, to configure a Gateway API for Mesh implementation to enforce non-overridable, cluster-wide authorization policies that **allow** or **deny** one or more identities to talk with some set of the workloads in the cluster.

Since we're introducing hierarchical policies (cluster-wide overridable/non-overridable, namespace, workload, and port being the scopes I could make out), should we add a goal that Ana should be able to easily and deterministically figure out the effective policy for their workload (port)?

feel free to resolve if this doesn't need to be called out or if it doesn't make sense.

Member Author

This is a very good call. I am not sure this should be a part of this specific proposal. IMO there should be a different effort in gateway to actually help discoverability of policies, and policy attachment.

Will leave this open in case others have other opinions. But it's something the community has talked about many times in the past.

Member Author

I agree that having overridable/non-overridable policies makes the effective policy a little more complicated to understand, so that's a good point. I will review some of the prior art here, but I don't think it should be part of the first iteration.

Contributor @aryan16, Jun 16, 2025

@LiorLieberman Hierarchical policies are fine, but I am confused about the behavior of non-overridable policies. Can you explain it more?

I was thinking of making policies overridable (basically merge all if the policy somehow applies to the workloads).

Member Author

Moved to stretch goals. We can elaborate more in the community meeting and/or when we progress with the iterations of this GEP.


* A way for both Ana and Chihiro to restrict the scope of the policies they deploy to specific ports.

## Stretch Goals

* A way for Chihiro, the Cluster Admin, to configure a Gateway API for Mesh implementation to enforce default, overridable, cluster-wide authorization policies that **allow** or **deny** one or more identities to talk with some set of the workloads in the cluster.

## Non-Goals

* Support identity-based authorization for north-south traffic, or define how such authorization composes with this API.

## Deferred Goals

* (Potentially) Support enforcement on attributes beyond identities and ports.

## Introduction

Authorization is positioned as one of the core mesh values. Every mesh supports some kind of east/west authorization between the workloads it controls.

Kubernetes core provides NetworkPolicies as one way to do it. NetworkPolicies, however, fall short in many ways, including:

* Network policies leverage labels as identities.
* Labels are mutable at runtime. This opens a path for escalating privileges.
Member

this could also be placed under Scale, because watching and reconciling constantly changing labels is less efficient than using static, assigned-on-create identities

* Most implementations of network policies translate labels to IPs. This translation is eventually consistent, which can lead, and in the past has led, to over-permissiveness.
Member

I am not sure how translating labels to IPs is related to eventual consistency. When you use any CRD to configure policies eventual consistency is involved, and the previous problems with over permissiveness sound more like implementation problems and not API problems


* Scale. Network Policies are enforced using IPs (different selectors in the APIs get translated to IPs). This does not scale well with large clusters or beyond a single cluster.

An identity-based authorization API is essential because it provides a structured way to control authorization between identities within the cluster.
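
The label-to-IP concern in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any particular NetworkPolicy implementation: the pod names, labels, IP, and the `compile_allowlist` helper are all hypothetical. It shows how an allowlist of IPs computed from labels can briefly authorize the wrong pod when an IP is reused before the cached translation is reconciled.

```python
# Hypothetical cluster state: pod name -> (labels, IP).
pods = {
    "frontend-1": ({"app": "frontend"}, "10.0.0.7"),
}

def compile_allowlist(pods, selector):
    """Translate a label selector into the set of pod IPs it currently matches."""
    return {
        ip
        for labels, ip in pods.values()
        if all(labels.get(k) == v for k, v in selector.items())
    }

# The enforcement point caches the result of the translation.
allowed_ips = compile_allowlist(pods, {"app": "frontend"})

# frontend-1 is deleted and its IP is reused by an unrelated pod
# before the cached allowlist is reconciled.
del pods["frontend-1"]
pods["batch-9"] = ({"app": "batch"}, "10.0.0.7")

batch_ip = pods["batch-9"][1]
stale_allow = batch_ip in allowed_ips  # decision from the stale cache
fresh_allow = batch_ip in compile_allowlist(pods, {"app": "frontend"})  # after reconcile
```

In this model `stale_allow` is true while `fresh_allow` is false. A check keyed on an authenticated workload identity would not depend on which pod currently holds the IP.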

### State of the World


| Aspect | Istio | Linkerd | Cilium |
| ----- | ----- | ----- | ----- |
| **Policy CRDs** | `AuthorizationPolicy` (APIs `security.istio.io/v1`) | `AuthorizationPolicy` (CRD `policy.linkerd.io/v1alpha1`), plus supporting CRDs (`Server`, `HTTPRoute`, `MeshTLSAuthentication`) | `CiliumNetworkPolicy` and `CiliumClusterwideNetworkPolicy` (superset of K8s NetworkPolicy) |
| **Identity model** | Identities derived from mTLS peer certificates (bound to SA): <ul><li>SPIFFE-like principal `<trust-domain>/ns/<namespace>/sa/<serviceaccount>`.</li> <li>ServiceAccount name</li> <li>Namespaces</li></ul><br/>Identity within a JWT, derived from `request.auth.principal`<br/><br/>IPBlocks and x-forwarded-for ipBlocks | Identities derived from mTLS peer certificates (bound to SA), with trust domain `identity.linkerd.cluster.local`. Policies reference service accounts or explicit mesh identities (e.g. `webapp.identity.linkerd.cluster.local`). <br/><br/>Policies use `requiredAuthenticationRefs` to reference the entities that are granted authorization. This is a list of targetRefs and it can include: <ul><li>ServiceAccounts</li> <li>`MeshTLSAuthentication` - which represents a set of mesh identities, either as mesh identity strings or as references to ServiceAccounts</li> <li>`NetworkAuthentication` - represents sets of IPs or subnets.</li></ul> | Cilium service mesh can leverage SPIFFE identities in the certificates used for the handshake. These SPIFFE identities are mapped to CiliumIdentities. You can read more about Cilium identities in [CiliumIdentity](#CiliumIdentity). <br/><br/>Policies only target abstractions like label selectors, node selectors, CIDR blocks and Cilium predefined [entities](https://docs.cilium.io/en/stable/security/policy/language/#entities-based). |
| **Enforcement** | For Istio with sidecars - a proxy on each pod. For ambient, the ztunnel node agent enforces mTLS-based L4 authorization; L7 authorization is enforced in waypoints, if any. <br/><br/> Istio supports ALLOW, DENY, CUSTOM (often used for external authorization), and AUDIT. DENY policies in Istio's context are used to enforce higher-priority deny rules. The ALLOW semantics are that whatever is not explicitly allowed (assuming there is any policy for the same match) is implicitly denied. | Linkerd data-plane proxy (injected into each pod). The proxy enforces policies via mTLS identity checks. <br/><br/> Linkerd supports AUDIT and ALLOW. There are no DENY policies; whatever is not allowed (assuming there is any policy for the same match) is implicitly denied. | For L3/4 ingress rules, CiliumNetworkPolicy is enforced by an eBPF-based datapath in the Linux kernel on the destination node. If L7 HTTP rules are specified, the packet is redirected to a node-local Envoy for further enforcement.<br/><br/>Cilium service mesh also offers a kind of AuthN in which a Cilium agent on the source node talks to another agent on the destination node, using the pods' SPIFFE identities. |
| **Request Match criteria** | Policies can target a group of pods using a label selector, a Gateway/Service (this means targeting a waypoint proxy), or a GatewayClass, meaning all the gateways created from this class. A policy without a label selector in a namespace targets the whole namespace. <br/><br/> Fine-grained L7 and L4 matching: HTTP/gRPC methods, paths, headers, ports, SNI, etc. Policies use logical OR over rules. <br/><br/>All match criteria are inline in the policy. See https://istio.io/latest/docs/reference/config/security/authorization-policy/#Rule-To and https://istio.io/latest/docs/reference/config/security/authorization-policy/#Rule-when | Policies can target: <ul><li>A `Server`, which describes a set of pods (using fancy label match expressions) and a single port on those pods.</li> <li>A user can optionally restrict the authorization to a smaller subset of the traffic by targeting an HTTPRoute. (TODO: any plans to support sectionNames?)</li> <li>A namespace - this indicates that the policy applies to all traffic to all Servers and HTTPRoutes defined in the namespace.</li></ul> Note: We leave `ServerAuthorization` out of scope, as it is planned to be deprecated (per the Linkerd website). | Policies can target groups of pods using a label selector (`endpointSelector`) or node labels (`nodeSelector`). Cilium supports L7 via built-in HTTP parsing: rules can match HTTP methods, paths, Kafka, etc. For example, a CiliumNetworkPolicy can allow only specific HTTP methods/paths on a port. |
| **Default policies and admin policies** | If **no** ALLOW policy applies to a workload, traffic is **allowed** by default. You can deploy an overridable default-deny by deploying an **allow-nothing** policy in either the namespace or `istio-system`. <br/><br/>AuthorizationPolicies in the `istio-system` namespace apply to the whole mesh and take precedence. These are not overridable by namespace-level policies. | Default inbound policy can be set at install time using `proxy.defaultInboundPolicy`. Supported values are: <ul><li>`all-unauthenticated:` allow all traffic. This is the default.</li> <li>`all-authenticated:` allow traffic from meshed clients in the same cluster or in a different cluster (with multi-cluster).</li> <li>`cluster-authenticated:` allow traffic from meshed clients in the same cluster.</li> <li>`cluster-unauthenticated:` allow traffic from both meshed and non-meshed clients in the same cluster.</li> <li>`deny:` all traffic is denied.</li> <li>`audit:` same as all-unauthenticated, but requests get flagged in logs and metrics.</li> </ul> <br/>Users can override the default policy for namespaces or pods by setting the [config.linkerd.io/default-inbound-policy](http://config.linkerd.io/default-inbound-policy) annotation. There is no support for admin, non-overridable policies. | Follows Kubernetes NetworkPolicy semantics by default: if no `CiliumNetworkPolicy` selects an endpoint, traffic to it is allowed (no implicit deny). Operators must apply explicit deny rules or “default-deny” policies to block traffic. <br/><br/> `CiliumClusterwideNetworkPolicy` exists for admin enforcement. |
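
The Istio column above describes DENY policies taking priority and an implicit deny once any ALLOW policy exists for the same match. A simplified model of that evaluation order is sketched below. It is not Istio's implementation: the `Policy` type and `evaluate` function are hypothetical, matching is reduced to a set of principals, and the CUSTOM and AUDIT actions are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    action: str            # "ALLOW" or "DENY"
    principals: frozenset  # identities this policy matches

def evaluate(policies, principal) -> bool:
    """Return True if a request from `principal` is allowed."""
    # 1. Any matching DENY policy rejects the request.
    if any(p.action == "DENY" and principal in p.principals for p in policies):
        return False
    # 2. With no ALLOW policies at all, traffic is allowed by default.
    allows = [p for p in policies if p.action == "ALLOW"]
    if not allows:
        return True
    # 3. Otherwise the request must match at least one ALLOW policy.
    return any(principal in p.principals for p in allows)

web = "cluster.local/ns/shop/sa/web"      # hypothetical principals
batch = "cluster.local/ns/shop/sa/batch"

allow_web = Policy("ALLOW", frozenset({web}))
deny_web = Policy("DENY", frozenset({web}))
```

With only `allow_web` in place, `web` is admitted and `batch` is implicitly denied; adding `deny_web` rejects `web` as well, which mirrors the "higher priority deny" behavior in the table.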


Every mesh vendor has its own API for such authorization. Below we describe the UX for different implementations:

#### Istio
Link: [Istio authorization policy docs](https://istio.io/latest/docs/reference/config/security/authorization-policy/)

TODO

#### Linkerd

Link: [Linkerd authorization policy docs](https://linkerd.io/2-edge/reference/authorization-policy/)

TODO

#### Cilium

##### CiliumIdentity
Cilium has the concept of CiliumIdentity. Pods are assigned identities derived from their Kubernetes labels (namespace, app labels, etc.). Cilium’s policy matches based on these label-derived identities. The CiliumIdentity implementation maps an integer to a group of IP addresses (the pod IPs associated with a group of pods). This “integer” and its mapping to pod IP addresses represents the core identity primitive in Cilium.

More on https://docs.cilium.io/en/stable/internals/security-identities/ & https://docs.cilium.io/en/stable/security/network/identity/
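
The mapping described above (labels to an integer identity, and that integer to a set of pod IPs) can be sketched as follows. This is a toy model for illustration only, not Cilium's allocator: the `IdentityAllocator` class, its method names, the starting number, and the example labels and IPs are all assumptions.

```python
from collections import defaultdict
from itertools import count

class IdentityAllocator:
    """Toy model: one integer identity per distinct label set."""

    def __init__(self):
        self._next = count(1)        # arbitrary starting number for the sketch
        self._by_labels = {}         # frozenset of label pairs -> identity
        self.ips = defaultdict(set)  # identity -> pod IPs sharing it

    def identity_for(self, labels: dict) -> int:
        key = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = next(self._next)
        return self._by_labels[key]

    def add_pod(self, labels: dict, ip: str) -> int:
        ident = self.identity_for(labels)
        self.ips[ident].add(ip)
        return ident

alloc = IdentityAllocator()
web_a = alloc.add_pod({"app": "web", "ns": "shop"}, "10.0.1.4")
web_b = alloc.add_pod({"app": "web", "ns": "shop"}, "10.0.1.5")
db = alloc.add_pod({"app": "db", "ns": "shop"}, "10.0.2.9")
```

Pods with identical labels share one identity, so a policy decision can be keyed on the integer while the datapath tracks which IPs currently carry it.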


Contributor

It's probably also worth reminding everyone that GAMMA has some implicit authorization policy already - once one or more HTTPRoutes are added to a Service, then anything that does not match something in an HTTPRoute is expected to be denied.

Member

@LiorLieberman @aryan16 can one of you add a note about this in this GEP?

Contributor

I would recommend not coupling routing and authorization. At least in a sidecar architecture they are completely different, since routing is client-side and therefore cannot be used for security.

Contributor @mikemorris, Jun 17, 2025

> then anything that does not match something in a HTTPRoute is expected to be denied

This might be one of the things that needs clarification as per #3804 - the intent, as John said, is not to imply routing and authorization are linked. If an HTTPRoute is bound to a Service parentRef, Service-targeted traffic (like GET foo.local) not matching a route shouldn't be routed to the backend, but this is not intended to imply AuthZ controls that would prevent e.g. connections dialing the pod IP directly from succeeding.

Member Author

I am +1 to @howardjohn and @mikemorris comments.

@robscott @youngnick - any concerns if we leave this outside?

Member

LGTM, thanks for the follow ups everyone!

## API


Contributor

Is there an API being proposed?

Member

That's coming in the next phase - x-ref: #3824


## Conformance Details


#### Feature Names


### Conformance tests


## Alternatives


## References
19 changes: 19 additions & 0 deletions geps/gep-3779/metadata.yaml
@@ -0,0 +1,19 @@
apiVersion: internal.gateway.networking.k8s.io/v1alpha1
kind: GEPDetails
number: 3779
name: Identity Based Authz for east-west traffic
status: Provisional
# Any authors who contribute to the GEP in any way should be listed here using
# their GitHub handle.
authors:
- liorlieberman
- aryan16
# references is a list of hyperlinks to relevant external references.
# It's intended to be used for storing GitHub discussions, Google docs, etc.
references: {}
# featureNames is a list of the feature names introduced by the GEP, if there
# are any. This will allow us to track which feature was introduced by which GEP.
featureNames: {}
# changelog is a list of hyperlinks to PRs that make changes to the GEP, in
# ascending date order.
changelog: {}