T7348: Add config CPU thread-count for accel-ppp services #4499
base: current
Conversation
Accel-ppp services should not use all CPU cores to process requests. At the moment, accel-ppp services use all available CPU cores to process requests from subscribers (session establishment, updates, etc.). During a mass connection of sessions this can saturate the CPU, leaving other services such as FRR without enough CPU time to run stably.

Affected services:
- L2TP
- SSTP
- PPPoE
- IPoE
- PPTP

Make this option configurable and use all cores if it is not set:
```
set service pppoe-server thread-count '2'
```
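For context, accel-ppp itself exposes a `thread-count` option in the `[core]` section of its configuration file, so the CLI node above would presumably be rendered into something like the following (a sketch; the exact template output may differ):

```
[core]
thread-count=2
```

If the CLI option is unset, the template would keep the current behavior of using all available cores.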
CI integration 👍 passed!
As discussed in today's meeting, I do not believe this is a good approach.
The root cause lies in the fact that, when thousands of clients connect, the control-plane CPU becomes heavily loaded. By default, every available CPU core is assigned to Accel-PPP, which causes bgpd to be restarted by watchfrr due to missed deadlines — the process is not scheduled in time to respond appropriately.
Introducing another CLI option to let users manually tweak or mask this behavior — assuming they have the necessary expertise — is, in my opinion, not the right solution. Instead, to safeguard all control-plane processes, we should consider configuring CPUQuota=80% in the systemd service file. This ensures that Accel-PPP receives sufficient CPU resources when needed, but prevents it from monopolizing the CPU, leaving room for other essential control-plane processes to be scheduled.
`CPUQuota=` governs the CPU controller under the unified cgroup hierarchy. With this in place, Accel-PPP will spawn multiple child processes to handle requests, each constrained to a percentage of CPU time, similar to how Kubernetes enforces CPU limits per pod. Even if CPU cycles are idle, these limits ensure fair resource allocation.
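A sketch of how such a quota could be applied via a systemd drop-in; the unit name `accel-ppp@pppoe.service` and the drop-in path are assumptions about the actual VyOS unit layout:

```
# /etc/systemd/system/accel-ppp@pppoe.service.d/cpu-quota.conf
# (unit name assumed; adjust to the actual accel-ppp unit)
[Service]
# Note: systemd interprets the percentage relative to one CPU:
# 80% caps the service below a single core; use e.g. 640% to
# allow 80% of an 8-core machine.
CPUQuota=80%
```

After creating the drop-in, `systemctl daemon-reload` followed by a restart of the service applies the quota.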
Regarding VPP and isolcpus: if a user employs VPP with the `isolcpus=` kernel option, those CPUs are excluded from the general Linux scheduler and require explicit task binding, which VPP handles. Meanwhile, Accel-PPP remains on the "normal" Linux control-plane cores. This allows us to meet both demands: fast dataplane processing through VPP and reliable control-plane operation (e.g., BRAS session handling, BGP/OSPF/IS-IS routing) with adequate CPU headroom.
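For illustration, reserving cores 2-7 for VPP while leaving 0-1 to the general scheduler could look like this on the kernel command line (the core numbers are arbitrary examples):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="isolcpus=2-7"
```

Tasks are only scheduled on the isolated cores when explicitly pinned there, which VPP does for its worker threads.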
Even if I have quotas, how do I ask accel-ppp not to use all CPUs but only 2 or 4 cores? Before, it was limited to half of the CPUs, but that was replaced in 6927c0b.

I disagree with a default limit of 80% across all cores. This can cause throttling under load, introducing latency and connection delays if accel-ppp uses more CPU than allowed. When multiple processes share the same cores, the operating system has to switch between them frequently; this context switching adds overhead that becomes more noticeable under heavy load. Each core switch requires fetching data from RAM (~100 ns latency vs. ~10 ns for L3 cache). 8 cores at 50% usage is not the same as 4 cores with full CPU cycles.

If this is not acceptable, it is better to close the PR or revert to the old behavior (6927c0b).
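One way to dedicate whole cores rather than cap a percentage, which is closer to what is being asked here, would be systemd's `CPUAffinity=` directive (again a sketch with an assumed unit name):

```
# /etc/systemd/system/accel-ppp@pppoe.service.d/cpu-affinity.conf
[Service]
# Pin all accel-ppp threads to cores 0 and 1
CPUAffinity=0 1
```

This gives accel-ppp full cycles on its assigned cores and avoids the context-switching overhead described above, at the cost of a hard upper bound on its parallelism.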
Change summary
Make the number of CPU threads used by the accel-ppp services (L2TP, SSTP, PPPoE, IPoE, PPTP) configurable; if the option is not set, all cores are used as before.
Types of changes
Related Task(s)
T7348
Related PR(s)
How to test / Smoketest result
Configure `thread-count` and check this value.
Smoketest:
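A minimal manual check, assuming VyOS renders the PPPoE accel-ppp configuration to `/run/accel-pppd/pppoe.conf` (this path is an assumption):

```
configure
set service pppoe-server thread-count '2'
commit
exit
grep thread-count /run/accel-pppd/pppoe.conf
```

The `[core]` section of the rendered file should then contain `thread-count=2`.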
Checklist: