
Add persistent-drainable option for affinity-mode ingress annotation to support draining sticky server sessions #13480


Draft: gulachek wants to merge 7 commits into main

Conversation

@gulachek commented Jun 5, 2025

What this PR does / why we need it:

Legacy web applications that rely on sticky session affinity are likely to store important application state in memory on the web server that owns the session. This statefulness can improve performance by reducing how much contextual information the browser must supply, or a persistent database must provide, to handle each request. That performance comes with a maintenance cost: the state on the web server needs to be handled carefully so that users are not disrupted during web server maintenance.

In a traditional model, where web server application management is hands-on, sysadmins can mark web servers as being in a "maintenance" or "draining" state to coordinate offloading user sessions from the draining web servers to other servers that are available. This may take several additional requests to the draining web server, but eventually the number of sessions on it approaches zero, at which point the admin can safely install OS updates, etc., on the server without disrupting users.

When these applications are ported to Kubernetes, there is a challenge. Kubernetes may dynamically mark a pod for deletion. While pods are given a preStop hook to gracefully terminate, if the pods' service is exposed via an ingress-nginx controller, even with affinity-mode: persistent, they will immediately stop receiving additional requests that may be necessary for gracefully migrating user sessions to other pods. This is because ingress-nginx removes Endpoint objects from the set of available upstreams when their Endpoint condition is no longer Ready, or in other words when it is Terminating.
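
For context, here is a rough sketch (not part of this PR; the deployment name, image, drain script, and grace period are all placeholders) of the kind of preStop hook such an application would pair with session draining:

# Illustrative only: "legacy-web-app" and wait-for-sessions.sh are hypothetical.
# The idea is that the pod keeps running during preStop and, with
# persistent-drainable, keeps receiving its existing sticky sessions until the
# application reports that they have drained.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: legacy-web-app
  template:
    metadata:
      labels:
        app: legacy-web-app
    spec:
      # The grace period covers the preStop hook plus shutdown, so make it
      # long enough for sessions to migrate.
      terminationGracePeriodSeconds: 300
      containers:
      - name: web
        image: php:8-apache
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              # Hypothetical script that blocks until the app's session count is zero.
              command: ["/bin/sh", "-c", "/usr/local/bin/wait-for-sessions.sh"]
EOF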

This PR addresses this problem by adding an affinity-mode: persistent-drainable option. I'll paste my updated documentation for the annotation here:

The annotation nginx.ingress.kubernetes.io/affinity-mode defines the stickiness of a session.

  • balanced (default)

    Setting this to balanced will redistribute some sessions if a deployment gets scaled up, therefore rebalancing the load on the servers.

  • persistent

    Setting this to persistent will not rebalance sessions to new servers, therefore providing greater stickiness. Sticky sessions will continue to be routed to the same server as long as its Endpoint's condition remains Ready. If the Endpoint stops being Ready, such as when a server pod receives a deletion timestamp, sessions will be rebalanced to another server.

  • persistent-drainable <-- NEW

    Setting this to persistent-drainable behaves like persistent, but sticky sessions will continue to be routed to the same server as long as its Endpoint's condition remains Serving, even after the server pod receives a deletion timestamp. This allows graceful session draining during the preStop lifecycle hook. New sessions will not be directed to these draining servers and will only be routed to a server whose Endpoint is Ready, except potentially when all servers are draining.
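
For example, the new mode is enabled the same way as the existing affinity annotations. This is a sketch rather than a file from the PR; the Ingress/Service names and cookie name are placeholders:

# Sketch only: names and the cookie are placeholders; the annotations are the
# existing affinity/session-cookie annotations plus the new affinity-mode value.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: legacy-web-app
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent-drainable"
    nginx.ingress.kubernetes.io/session-cookie-name: "SERVERID"
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: legacy-web-app
            port:
              number: 80
EOF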

This issue has been discussed since 2018 in nginx/kubernetes-ingress#5962, which should provide further motivation for this feature. You can see that many of those involved in the discussion have resorted to using jcmoraisjr/haproxy-ingress, which has a drain-support feature.

While haproxy-ingress might be a viable alternative, I would like to use ingress-nginx for my organization. There is a strong community of support here, and hopefully, as you will see while reviewing the PR, ingress-nginx is already almost all the way to supporting this drain feature, which is critical for the success of my organization's web application.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • CVE Report (Scanner found CVE and adding report)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation only

Which issue/s this PR fixes

I didn't find one specifically on this repo, but nginx/kubernetes-ingress#5962 looks like the OP was originally trying to use this ingress controller, which implicitly is an issue on this repo. 😄

How Has This Been Tested?

I ran the automated go, lua, and e2e tests to make sure I didn't break anything, and I added additional go and lua unit tests for the new feature.

To test that this worked end to end, I created a very basic PHP deployment that echoes back the pod's hostname. You can download it from affinity-test.tgz.

You can get it set up with:

make dev-env # start the ingress-nginx test cluster
tar xfvz affinity-test.tgz
kubectl create namespace affinity-test
kubectl apply -k affinity-test

You can then validate that it's running by running curl localhost

curl localhost
# Hostname: php-hostname-echo-747df85784-jwd79

You can tune the affinity-mode in the affinity-test/deployment.yaml file to what you want. I mostly tested between persistent and persistent-drainable.
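
If you'd rather not use a browser, the stickiness checks below can also be scripted with curl and a cookie jar (a sketch; the cookie name comes from whatever session-cookie annotations you configured):

# The first request stores the sticky session cookie; the following requests
# replay it, so the same Hostname should be echoed every time.
curl -s -c cookies.txt localhost
for i in 1 2 3 4 5; do curl -s -b cookies.txt localhost; done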

Below are the general test cases I ran.

1. persistent-drainable sticks sessions while draining

  1. Make sure persistent-drainable is applied
  2. Navigate to localhost in your browser
  3. Confirm the session is sticky by reloading several times and seeing the same hostname echoed
  4. Copy the pod's hostname (like php-hostname-echo-747df85784-dvpc8)
  5. Delete the pod with kubectl delete pod <your-pod> -n affinity-test
  6. While the pod is deleting (around 30 sec), continue to refresh your browser and confirm that it's hitting the same pod
  7. After the pod is deleted, confirm that refreshing hits a new pod

2. persistent-drainable avoids draining pods for new sessions

  1. Make sure persistent-drainable is applied
  2. Run curl localhost in your shell
  3. Delete the pod with kubectl delete pod <pod name> -n affinity-test
  4. Run curl localhost (from a new shell) many times while the pod is deleting. Confirm it's never the one you deleted

Note

It's possible that you can run curl before the ingress controller picks up the update, so the first one or two requests may still get routed to the old pod.
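
You can also watch the Endpoint conditions that the controller selects on while the pod terminates. This is just one way to print them and assumes the Service name from the test bundle:

kubectl get endpointslices -n affinity-test \
  -l kubernetes.io/service-name=php-hostname-echo \
  -o jsonpath='{range .items[*].endpoints[*]}{.targetRef.name}{": ready="}{.conditions.ready}{" serving="}{.conditions.serving}{" terminating="}{.conditions.terminating}{"\n"}{end}'
# A draining pod should show ready=false serving=true terminating=true.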

3. persistent avoids draining pods for new sessions

  1. Make sure persistent is applied
  2. Run the rest of the steps from test case 2 and confirm that you don't route to the draining pod

4. persistent does not stick sessions while draining

  1. Make sure persistent is applied
  2. Navigate to localhost in your browser
  3. Confirm the session is sticky by reloading several times and seeing the same hostname echoed
  4. Copy the pod's hostname (like php-hostname-echo-747df85784-dvpc8)
  5. Delete the pod with kubectl delete pod <your-pod> -n affinity-test
  6. While the pod is deleting (around 30 sec), continue to refresh your browser and confirm that you don't hit the same pod

5. persistent-drainable hits the default-backend after deleting a deployment

  1. Make sure persistent-drainable is applied
  2. Delete the deployment with kubectl delete deployment php-hostname-echo -n affinity-test
  3. Once the pods go down, confirm that curl localhost hits the default backend (the http response will tell you)

Note

While the pods are going down, new requests do get sent to the Serving upstreams. This seems acceptable. The application is already in a tragic state when all of its pods are terminating, and it is somewhat up in the air what to do with new traffic. Since the pods are still Serving, it seems ok to continue sending new traffic there until inevitably there are no more pods to serve the new requests.

6. persistent hits the default-backend immediately after deleting a deployment

  1. Make sure persistent is applied
  2. Delete the deployment with kubectl delete deployment php-hostname-echo -n affinity-test
  3. While pods are going down, confirm that curl localhost hits the default backend (the http response will tell you)

7. persistent-drainable warns when all upstreams for a service are draining

  1. Make sure persistent-drainable is applied

  2. Delete the deployment with kubectl delete deployment php-hostname-echo -n affinity-test

  3. Check the logs for the ingress-nginx-controller pod (an example command is given below)

  4. Confirm you see an entry resembling the following

    W0605 23:23:29.792168      11 controller.go:1138] All Endpoints for Service "affinity-test/php-hostname-echo" are draining.
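
One way to run the log check from step 3 (this assumes the dev-env controller runs in the ingress-nginx namespace with the standard labels; adjust if yours differ):

kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=-1 | grep 'are draining'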
    

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added unit and/or e2e tests to cover my changes.
  • All new and existing tests passed.

@k8s-ci-robot added the do-not-merge/work-in-progress label Jun 5, 2025

linux-foundation-easycla bot commented Jun 5, 2025

CLA Not Signed

@k8s-ci-robot added the cncf-cla: no and area/docs labels Jun 5, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gulachek
Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-triage and area/lua labels Jun 5, 2025
@k8s-ci-robot (Contributor) commented:

Welcome @gulachek!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-kind and needs-ok-to-test labels Jun 5, 2025
@k8s-ci-robot requested review from Gacko and strongjz June 5, 2025 22:01
@k8s-ci-robot (Contributor) commented:

Hi @gulachek. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-priority and size/L labels Jun 5, 2025

netlify bot commented Jun 5, 2025

Deploy Preview for kubernetes-ingress-nginx ready!

🔨 Latest commit d42a4ec
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-ingress-nginx/deploys/68422bab6ee24800082bc061
😎 Deploy Preview https://deploy-preview-13480--kubernetes-ingress-nginx.netlify.app

gulachek added 7 commits June 5, 2025 18:43
This currently behaves exactly like `persistent`
after this commit.
This wires up `getEndpointsFromSlices` to have different selection
modes for querying Endpoint objects. This is necessary so that
`persistent-drainable` can eventually select Endpoints that are
`Serving` instead of only `Ready`.

This adds a unit test to demonstrate the selection behavior; wiring it
up in the controller remains to be done.

The Endpoint type was also updated to have an `IsDraining` field
when the Endpoint is not Ready.
I believe that this is the only place that needs to check for
`Serving` Endpoint objects to implement this feature.

There are 6 total invocations of `getEndpointsFromSlices` in
the repo, all in `controller.go`.

2 of them relate to `L4Service`. I'm assuming L4 refers to the OSI
model; these seem to handle TCP and UDP, which have no knowledge of
the HTTP cookies that `persistent-drainable` relies on, so they can't
be the ones that need updating. (They appear to be for controller-level
config instead of ingress rules.)

A third is in `getDefaultUpstream`. My understanding is this is for
a fallback service should all of the primary services in the ingress
rules be down. This seems like a fallback service handler and not
useful for `persistent-drainable`.

A fourth one is used in `getBackendServers` when building the
`custom-default-backend-` upstream when no Endpoints are available
for the service. Again, this doesn't seem intended for the primary
application that needs session affinity, so excluding the draining
Endpoints in the default backend seems desirable.

The last two are used in `serviceEndpoints`, which itself is called
in two places, both in `createUpstreams`. One of them is for the
default backend service, which, as discussed above, should not be
included. The other one is for our main ingress rules and **is** the
one that we want to update.