
T7348: Add config CPU thread-count for accel-ppp services #4499


Open · wants to merge 1 commit into base: current

Conversation

@sever-sever sever-sever (Member) commented May 8, 2025

Change summary

Accel-ppp services should not use every CPU core to process requests. At the moment they use all available CPU cores to handle subscriber requests (session establishment, updates, etc.). During a mass connection of sessions this can saturate the CPU, leaving other services such as FRR without enough CPU time to run reliably.

Services:

  • L2TP
  • SSTP
  • PPPoE
  • IPoE
  • PPTP

Make this option configurable and fall back to all available cores when it is not set:

set service pppoe-server thread-count '2'
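For reference, this CLI value is rendered into the thread-count key of accel-ppp's [core] section (the smoketest below shows the generated file); with the command above, /run/accel-pppd/pppoe.conf would contain something like:

```
[core]
thread-count=2
```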

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe): Feature that fixes the bug

Related Task(s)

Related PR(s)

How to test / Smoketest result

Configure thread-count and verify that the value appears in the generated accel-ppp config:

set service pppoe-server access-concentrator 'vyos-ac'
set service pppoe-server authentication any-login
set service pppoe-server authentication local-users username one password 'one'
set service pppoe-server authentication mode 'local'
set service pppoe-server client-ip-pool FIRST range '100.64.0.0/18'
set service pppoe-server default-pool 'FIRST'
set service pppoe-server gateway-address '100.64.0.1'
set service pppoe-server interface eth0
set service pppoe-server name-server '192.0.2.1'
set service pppoe-server name-server '192.0.2.2'
set service pppoe-server ppp-options disable-ccp
set service pppoe-server thread-count '1'


vyos@r14# cat /run/accel-pppd/pppoe.conf | grep core -B1 -A 2

[core]
thread-count=1

[edit]
vyos@r14# 



vyos@r14# delete service pppoe-server thread-count 
[edit]
vyos@r14# commit
[edit]
vyos@r14# 
[edit]
vyos@r14# cat /run/accel-pppd/pppoe.conf | grep core -B1 -A 2

[core]
thread-count=4

[edit]
vyos@r14# 

Smoketest:

vyos@r14:~$ /usr/libexec/vyos/tests/smoke/cli/test_service_pppoe-server.py
test_accel_ipv4_pool (__main__.TestServicePPPoEServer.test_accel_ipv4_pool) ... ok
test_accel_ipv6_pool (__main__.TestServicePPPoEServer.test_accel_ipv6_pool) ... ok
test_accel_limits (__main__.TestServicePPPoEServer.test_accel_limits) ... ok
test_accel_local_authentication (__main__.TestServicePPPoEServer.test_accel_local_authentication) ... ok
test_accel_log_level (__main__.TestServicePPPoEServer.test_accel_log_level) ... ok
test_accel_name_servers (__main__.TestServicePPPoEServer.test_accel_name_servers) ... ok
test_accel_next_pool (__main__.TestServicePPPoEServer.test_accel_next_pool) ... ok
test_accel_ppp_options (__main__.TestServicePPPoEServer.test_accel_ppp_options) ... ok
test_accel_radius_authentication (__main__.TestServicePPPoEServer.test_accel_radius_authentication) ... ok
test_accel_shaper (__main__.TestServicePPPoEServer.test_accel_shaper) ... ok
test_accel_snmp (__main__.TestServicePPPoEServer.test_accel_snmp) ... ok
test_accel_wins_server (__main__.TestServicePPPoEServer.test_accel_wins_server) ... ok
test_pppoe_limits (__main__.TestServicePPPoEServer.test_pppoe_limits) ... ok
test_pppoe_server_accept_service (__main__.TestServicePPPoEServer.test_pppoe_server_accept_service) ... ok
test_pppoe_server_any_login (__main__.TestServicePPPoEServer.test_pppoe_server_any_login) ... ok
test_pppoe_server_authentication_protocols (__main__.TestServicePPPoEServer.test_pppoe_server_authentication_protocols) ... ok
test_pppoe_server_pado_delay (__main__.TestServicePPPoEServer.test_pppoe_server_pado_delay) ... ok
test_pppoe_server_shaper (__main__.TestServicePPPoEServer.test_pppoe_server_shaper) ... ok
test_pppoe_server_vlan (__main__.TestServicePPPoEServer.test_pppoe_server_vlan) ... ok

----------------------------------------------------------------------
Ran 19 tests in 153.131s

OK
vyos@r14:~$ 

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly


github-actions bot commented May 8, 2025

👍
No issues in PR Title / Commit Title


github-actions bot commented May 8, 2025

CI integration 👍 passed!

Details

CI logs

  • CLI Smoketests (no interfaces) 👍 passed
  • CLI Smoketests (interfaces only) 👍 passed
  • Config tests 👍 passed
  • RAID1 tests 👍 passed
  • TPM tests 👍 passed

@c-po c-po (Member) left a comment


As discussed in today's meeting, I do not believe this is a good approach.

The root cause lies in the fact that, when thousands of clients connect, the control-plane CPU becomes heavily loaded. By default, every available CPU core is assigned to Accel-PPP, which causes bgpd to be restarted by watchfrr due to missed deadlines — the process is not scheduled in time to respond appropriately.

Introducing another CLI option to let users manually tweak or mask this behavior — assuming they have the necessary expertise — is, in my opinion, not the right solution. Instead, to safeguard all control-plane processes, we should consider configuring CPUQuota=80% in the systemd service file. This ensures that Accel-PPP receives sufficient CPU resources when needed, but prevents it from monopolizing the CPU, leaving room for other essential control-plane processes to be scheduled.

CPUQuota= governs the CPU controller under the unified cgroup hierarchy. With this in place, Accel-PPP will spawn multiple child processes to handle requests, each constrained to a percentage of CPU time — similar to how Kubernetes enforces CPU limits per pod. Even if CPU cycles are idle, these limits ensure fair resource allocation.

Regarding VPP and isolcpus: if a user employs VPP with the isolcpus= kernel option, those CPUs are excluded from the general Linux scheduler and require explicit task binding — which VPP handles. Meanwhile, Accel-PPP remains on the "normal" Linux control-plane cores. This allows us to meet both demands: fast dataplane processing through VPP and reliable control-plane operation (e.g., BRAS session handling, BGP/OSPF/IS-IS routing) with adequate CPU headroom.
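
A minimal sketch of what the suggested CPUQuota approach could look like as a systemd drop-in (illustrative only; the unit name accel-ppp@pppoe and the drop-in path are assumptions, not something added by this PR):

```
# Assumed path: /etc/systemd/system/accel-ppp@pppoe.service.d/cpu.conf
[Service]
# CPUQuota= is expressed relative to a single CPU: 80% caps the unit's whole
# cgroup (all accel-ppp threads) at 0.8 of one core's time, and values above
# 100% allow CPU time on more than one CPU.
CPUQuota=80%
```

Applying it would take a `systemctl daemon-reload` followed by a restart of the service; the limit is enforced on the unit's cgroup as a whole rather than per thread.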

@sever-sever sever-sever (Member, Author) commented May 9, 2025

Even with quotas, how do I tell accel-ppp to use only 2 or 4 cores instead of all of them? It used to be limited to half of the CPUs, but that behaviour was replaced in 6927c0b.
In my opinion, quotas should be a configurable option where needed (SNMP/IPsec/Accel/etc.) and handled in a separate task.

I disagree with a default limit of 80% across all cores. It can cause throttling under load, introducing latency and connection delays whenever accel-ppp needs more CPU than it is allowed.
I need maximum performance for authenticating and processing subscribers per second, plus the ability to limit it where required, rather than a blanket limit applied to every server/config; it depends on the deployment.

When multiple processes share the same cores, the operating system has to switch between them frequently. This context switching adds overhead that becomes more noticeable under heavy load.

Each such switch can mean fetching data from RAM (~100 ns latency vs. ~10 ns for the L3 cache).

Eight cores at 50% utilization are not the same as four cores running at full speed.
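
For contrast, the "dedicated cores" model being argued for here could also be expressed at the systemd level with a CPU affinity mask; a purely illustrative sketch (not what this PR implements, which relies on accel-ppp's own thread-count, and using the same assumed unit name as above):

```
# Hypothetical alternative: confine accel-ppp to two whole cores it can drive
# at full speed, leaving the remaining cores to FRR and other control-plane
# daemons.
[Service]
CPUAffinity=2 3
```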

If this approach is not acceptable, it would be better to close the PR or revert to the old behaviour in 6927c0b.
