
help request: WebSocket Load Balancing Imbalance Issue After Upstream Node Scaling #12217

Open
@coder2z

Description

Issue Description
When using APISIX to proxy WebSocket requests, we've observed that when upstream nodes are scaled out, the load distribution of WebSocket connections becomes unbalanced.

Steps to Reproduce
1. Configure APISIX to proxy WebSocket requests to backend services
2. Start with 2 upstream nodes providing service
3. Establish a large number of long-lived WebSocket connections
4. Scale out the upstream (e.g., from 2 nodes to 3 or more)
5. Observe the connection distribution across the nodes (a client sketch for steps 3 and 5 follows this list)
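
For steps 3 and 5, a minimal client sketch along the following lines can be used to open many connections through APISIX and tally which upstream node served each one. The route URL ws://127.0.0.1:9080/ws, the connection count, and the assumption that each backend sends its own node id as the first message are all hypothetical and not part of this report:

```python
# Hypothetical reproduction helper: open many WebSocket connections through
# APISIX and count how many land on each upstream node. Assumes each backend
# replies with its own node id as the first message after the handshake.
import asyncio
from collections import Counter

import websockets  # pip install websockets

APISIX_WS_URL = "ws://127.0.0.1:9080/ws"  # assumed route with enable_websocket
TOTAL_CONNECTIONS = 200

async def open_and_identify(url):
    # Keep the connection object around so the connection stays open (long-lived).
    ws = await websockets.connect(url)
    node_id = await ws.recv()  # hypothetical: backend announces its node id
    return ws, node_id

async def main():
    results = await asyncio.gather(
        *(open_and_identify(APISIX_WS_URL) for _ in range(TOTAL_CONNECTIONS))
    )
    print("connections per node:", dict(Counter(node for _, node in results)))
    # Leave the connections open, scale the upstream from 2 to 3 nodes,
    # then run a second batch to compare where the *new* connections land.
    await asyncio.sleep(3600)

asyncio.run(main())
```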
Current Behavior
After scaling, new connections are distributed evenly across all nodes, but previously established long-lived WebSocket connections remain concentrated on the original two nodes, so the overall load distribution is unbalanced.

Expected Behavior
After scaling, the load balancer should take existing long-lived connections into account so that load is distributed more evenly across all nodes, including the newly added ones.

Root Cause Analysis
Based on observation, the issue appears to be caused by:

APISIX maintains a counter mechanism for load balancing. For example, when there are 2 nodes, each node's counter is initialized to 10000. When upstream nodes are scaled out, APISIX resets all counters, but previously established long-lived WebSocket connections are not reflected in the new counts, which makes the load calculation inaccurate.

Specifically:

• A large number of long-lived WebSocket connections have already been established on the original two nodes
• After scaling, the counters are reset and these existing connections are "forgotten" in load-balancing decisions
• New connections are distributed evenly, but combined with the existing connections the overall load distribution remains unbalanced (a simplified simulation of this follows the list)
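
To make the effect of the reset concrete, here is a deliberately simplified simulation of the counter behaviour described above. It is not APISIX's actual balancer code; only the node counts and the 10000 counter budget are taken from the example in this report:

```python
# Simplified model (not APISIX source code) of a counter-based round-robin
# picker that is rebuilt, with fresh counters, whenever the node list changes.
from collections import Counter

class RoundRobinPicker:
    def __init__(self, nodes, budget=10000):
        # Every node starts with the same counter, regardless of how many
        # live connections it already holds.
        self.counters = {node: budget for node in nodes}

    def pick(self):
        node = max(self.counters, key=self.counters.get)
        self.counters[node] -= 1
        return node

active = Counter()

# Phase 1: two nodes, 10000 long-lived WebSocket connections.
picker = RoundRobinPicker(["node-1", "node-2"])
for _ in range(10000):
    active[picker.pick()] += 1

# Phase 2: scale out to three nodes. The picker is rebuilt, so the 10000
# connections opened in phase 1 are invisible to the fresh counters.
picker = RoundRobinPicker(["node-1", "node-2", "node-3"])
for _ in range(3000):
    active[picker.pick()] += 1

# Prints {'node-1': 6000, 'node-2': 6000, 'node-3': 1000}: new connections are
# spread evenly, but the overall distribution stays skewed toward the old nodes.
print(dict(active))
```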
Environment Information
APISIX version: latest
Operating system: Linux
Deployment method: Kubernetes
Additional Information
This issue is particularly noticeable in high-concurrency WebSocket scenarios, especially when connections persist for extended periods. We hope the load-balancing algorithm can be improved to take existing long-lived connections into account when nodes are scaled out.
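
As an illustration of the requested behaviour, and only as a sketch (the class below is hypothetical, not an APISIX API or a proposed patch), a balancer that tracks live connections per node would naturally steer new connections to a freshly added node until the distribution evens out:

```python
# Sketch of a connection-aware picker (illustration only, not APISIX code):
# new connections go to the node with the fewest live connections, so a
# freshly added node absorbs new traffic until the distribution evens out.
from collections import Counter

class LeastConnectionPicker:
    def __init__(self, nodes):
        self.active = Counter({node: 0 for node in nodes})

    def add_node(self, node):
        # A newly scaled-out node starts at zero live connections.
        self.active.setdefault(node, 0)

    def pick(self):
        # Choose the node currently holding the fewest live connections.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        # Call when a WebSocket connection is closed.
        self.active[node] -= 1

picker = LeastConnectionPicker(["node-1", "node-2"])
for _ in range(10000):          # existing long-lived connections
    picker.pick()
picker.add_node("node-3")       # scale out
for _ in range(3000):           # new connections after scaling
    picker.pick()
print(dict(picker.active))      # {'node-1': 5000, 'node-2': 5000, 'node-3': 3000}
```

Note that even with connection-aware selection, connections established before the scale-out cannot be moved transparently; they stay on their original nodes until they are closed and re-established, so rebalancing existing traffic would also require some form of draining or client-side reconnection.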

Environment

  • APISIX version (run apisix version): latest
  • Operating system (run uname -a): Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V): latest
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): NA
  • APISIX Dashboard version, if relevant: NA
  • Plugin runner version, for issues related to plugin runners: NA
  • LuaRocks version, for installation issues (run luarocks --version): NA
