✨ Operator managed quota. #199

Merged
merged 3 commits into from
Aug 30, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 70 additions & 0 deletions enhancements/operator-quota/README.md
@@ -0,0 +1,70 @@
---
title: operator-managed-resource-quota
authors:
- "@jortel"
reviewers:
- TBD
- "@alicedoe"
approvers:
- TBD
- "@jmontleo"
creation-date: 2024-08-15
last-updated: 2024-08-15
status: implementable
---

# Operator Managed Resource Quota

Enhance the operator to optionally manage a ResourceQuota in the namespace.

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] User-facing documentation is created

## Summary

Enhance the operator to optionally manage a ResourceQuota in the namespace.
The user may optionally specify a percentage of cluster resources to be utilized by konveyor.
This mainly applies to task pods.

## Motivation

Tasks may be created in batches of thousands. When this happens, the task manager creates
thousands of pods and relies on the node scheduler to run them as efficiently as possible.
This has the following drawbacks:
- Potentially _hogs_ all of the node resources.
- The task manager no longer has control over the order in which tasks/pods are executed (other than the order in which it created the pods).
- Running `oc get pod` in the namespace produces thousands of entries.
- Puts needless pressure on the k8s controllers and etcd.

A ResourceQuota is the Kubernetes mechanism for limiting aggregate resource consumption within a namespace and avoiding these problems.
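
For illustration, a minimal sketch of the kind of ResourceQuota the operator could manage (the name and limits below are placeholders, not part of this proposal):

```yaml
# Placeholder name and limits; shown only to illustrate the mechanism.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: konveyor-quota
  namespace: konveyor-tackle
spec:
  hard:
    requests.cpu: "40"       # aggregate CPU requested by pods in the namespace
    requests.memory: 160Gi   # aggregate memory requested by pods in the namespace
```

Note that once a quota constrains `requests.cpu`/`requests.memory`, pods in the namespace must declare those requests or their creation will be rejected.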

### Goals

- Maximize the task manager's scheduling capabilities.
- Minimize clutter in the namespace.
- Be a better citizen on the cluster.

### Non-Goals

## Proposal

Enhance the operator to optionally manage a ResourceQuota in the namespace.
Contributor (@shawn-hurley):

One thing that I would be concerned about, though I think there are options around it, is this scenario:

  1. Pods are created up to the MAX of the ResourceQuota.
  2. The hub pod is killed and is now unable to be scheduled.
  3. This leads to Konveyor being taken down for some period of time (as the pods complete, space frees up, IIUC).

I think one way around this is to have the hub/ui pods use a higher priority class and all the task pods use another priority class, then apply the ResourceQuota to just the task pods using the quota's priority class scope.

Just a thought, and it may be a premature optimization, but it is a concern.
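
Roughly what that could look like (a sketch only; the PriorityClass name and the limits are made up):

```yaml
# Sketch: scope the quota to task pods via a priority class.
# "konveyor-task" and the hard limits are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: task-pod-quota
  namespace: konveyor-tackle
spec:
  hard:
    requests.cpu: "40"
    requests.memory: 160Gi
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values: ["konveyor-task"]
```

With this, hub/ui pods in a different (higher) priority class would not be counted against the task-pod quota.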

Contributor Author @jortel, Aug 27, 2024:

The quota limits the creation of the Pod resource. So, if the hub crashes or is OOM killed, the quota would not impact restarting the container. I think the potentially concerning use case would be related to the operator needing to restart pods, such as an upgrade while tasks are in flight, or an edit of the Tackle CR causing the operator to update a deployment.

Some options (in no particular order), though I think #4 is the simplest/cleanest.

Option 1

Perhaps this could be mitigated by having the operator reconcile in the following order:

  - delete the quota
  - reconcile everything else
  - reconcile the quota

This seems fragile and does not address all of the use cases.

Option 2

Priority classes (suggested by @shawn-hurley)
Unfortunately, priority classes are cluster-scoped, and without a default priority class (which does not exist by default), pods without an assigned class default to zero (0). IIUC, this means that Tackle pods would run at a higher priority than all other pods in the cluster (including controllers). Also, our operator would need permissions to create the cluster-scoped priority classes, which I highly doubt would be acceptable. All of this adds up to: we can't use/rely on priority classes.

$ oc get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            33d
system-node-critical      2000001000   false            33d
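
For reference, a minimal PriorityClass manifest (the name and value are made up), showing that it is a cluster-scoped object with no namespace:

```yaml
# Cluster-scoped: creating this requires cluster-level permissions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: konveyor-task        # made-up name
value: 1000
globalDefault: false         # no global default exists out of the box
description: "Hypothetical priority class for Konveyor task pods."
```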

Option 3

It is also unfortunate that quotas don't support label selectors.

The only option I have considered to address this concern is for task pods (and the quota) to be created in another namespace. Example: konveyor-task. This would have the following benefits:

  - It would eliminate clutter in our main namespace, konveyor-tackle.
  - The quota would only affect task pods.

But with the following drawbacks:

  - Mainly that dealing with multiple namespaces would be kind of a pain, and perhaps confusing for users.

Option 4

Perhaps a better option would be to implement the quota in the hub. The hub would list the nodes and calculate the allocatable cpu and memory. Then, it would (optionally) create task pods within the requested quota, also expressed as a percentage passed to the hub as an ENVAR. I had a PR for this at one time but discarded it in favor of ResourceQuota. The algorithm/calculation is straightforward. Given the limitations of ResourceQuota, this may be the better way to go.

Contributor:

I think that if the other suggestions don't work, then what's described makes sense; sorry for the wild goose chase. I just thought there was maybe something we could do.

With that said, the use case I am worried about is an edge case. Barring good options (1-3 I agree are out), and since I don't want to re-create a scheduler (option 4), I think what we have is sufficient.

Contributor Author @jortel, Aug 27, 2024:

Well, we already have a scheduler. Option 4 only proposes to make it a little smarter.

The user would specify a percentage of cluster resources to be utilized. The operator
would list the allocatable resources on cluster nodes and dynamically deploy a
quota. The percentage would be a property on the Tackle CR with a default of 85% (open
to suggestions here). A percentage is used because it's a value that users can define
without knowing anything about node configurations. A percentage will automatically
adjust as nodes are added/removed or reconfigured.

Quotas are routinely used by Development and QE and have been very effective at managing
cluster resources and maximizing the task manager's scheduling features.
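
As a rough sketch of what this could look like (the CR field name and quota name are illustrative, not final; the group/version is assumed to be the existing Tackle CR's; and the numbers assume a cluster with 48 allocatable CPUs and 192Gi of allocatable memory):

```yaml
# Illustrative field name; the actual property on the Tackle CR is an implementation detail.
apiVersion: tackle.konveyor.io/v1alpha1
kind: Tackle
metadata:
  name: tackle
  namespace: konveyor-tackle
spec:
  resource_quota_percentage: 85    # % of allocatable cluster cpu/memory
---
# Derived by the operator: 85% of 48 CPU = 40.8 CPU, 85% of 192Gi ≈ 163Gi.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: konveyor-quota             # placeholder name
  namespace: konveyor-tackle
spec:
  hard:
    requests.cpu: 40800m
    requests.memory: 163Gi
```

The operator would recompute these values as nodes are added, removed, or resized.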

## Design Details

### Test Plan
Member:

Some kind of scale test should probably be defined to prove the quota helps handle higher loads.

Member:

Also, do we have any upper / lower bounds? 99%, 100% would probably give basically no protection, while 1%, 2% etc. might make it just about impossible to schedule anything.

Contributor Author:

Perhaps, but it's hard to say. 2% of a huge cluster may be enough. On the flip side, 100% is essentially worthless, but not broken.

Contributor Author @jortel, Aug 28, 2024:

> Some kind of scale test should probably be defined to prove the quota helps handle higher loads.

This would be very complicated.


### Upgrade / Downgrade Strategy