Good afternoon. Before filing a bug report, I wanted to make sure I am not configuring the server incorrectly.
When a second client queries the chat endpoint while the server is still generating a response for an earlier request, both clients error out with:
openai.APIConnectionError: Connection error.
or
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
The behavior I expect is that client A's request finishes first, client B waits until client A's completion is done, and client B is then served.
I am currently using the following JSON file to configure the server.
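It looks roughly like this (a minimal sketch in the standard llama-cpp-python config-file layout; the model path and values are placeholders rather than my exact settings):

```json
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "/path/to/model.gguf",
      "model_alias": "gpt-3.5-turbo",
      "n_gpu_layers": -1,
      "n_ctx": 4096
    }
  ]
}
```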
The server is run with:
python -m llama_cpp.server --config_file example.json
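For reference, here is a minimal sketch of the two-client scenario; the base URL, port, and model alias are assumptions matching the placeholder config above:

```python
# Two concurrent requests against the same llama-cpp-python server.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(prompt: str) -> str:
    # Each call blocks until the server returns a full completion.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fire two requests at once; the second arrives while the server
# is still generating for the first.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(ask, p) for p in ("Write a long story.", "Hello!")]
    for f in futures:
        print(f.result())  # in my setup, both raise the errors above
```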
The server is a near-fresh Ubuntu 24.04 install with 3090 GPUs.
Am I running into a problem with my expectations/configuration, or a bug in the code?
Thanks a bunch.