Good afternoon. Before filing a bug report, I wanted to make sure I am not configuring the server incorrectly.
When a second client queries the chat endpoint while the server is still generating a response for an earlier request, both clients error out with:
openai.APIConnectionError: Connection error.
or
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
The behavior I expect is that client A's request finishes first, client B waits until client A's completion is done, and client B is then served.
I am currently using the following JSON file to configure the server.
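It looks roughly like this (a minimal sketch in the standard llama-cpp-python config-file layout; the model path and values are placeholders rather than my exact settings):

```json
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "/path/to/model.gguf",
      "model_alias": "gpt-3.5-turbo",
      "n_gpu_layers": -1,
      "n_ctx": 4096
    }
  ]
}
```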
The server is run with:
python -m llama_cpp.server --config_file example.json
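For reference, here is a minimal sketch of the two-client scenario; the base URL, port, and model alias are assumptions matching the placeholder config above:

```python
# Two concurrent requests against the same llama-cpp-python server.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(prompt: str) -> str:
    # Each call blocks until the server returns a full completion.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fire two requests at once; the second arrives while the server
# is still generating for the first.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(ask, p) for p in ("Write a long story.", "Hello!")]
    for f in futures:
        print(f.result())  # in my setup, both raise the errors above
```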
The server is a near-fresh Ubuntu 24.04 install with 3090 GPUs.
Am I running into a problem with my expectations/configuration, or a bug in the code?
Thanks a bunch.