Description
Describe the bug
When deploying the MCP server in a Kubernetes environment with gunicorn's multi-process configuration, SSE connections disconnect after a period of time, resulting in subsequent messages receiving a 404 error "Could not find session for ID". This occurs because SSE sessions are created in one worker process, but subsequent requests may be routed to different worker processes where the session state is not shared.
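The failure mode can be sketched in a few lines (hypothetical names; each dict stands in for one worker's process-local session table):

```python
import uuid

# Each gunicorn worker holds its own in-process session table;
# these two dicts stand in for two separate worker processes.
worker_a_sessions = {}
worker_b_sessions = {}

# The GET /sse request lands on worker A, which creates the session.
session_id = str(uuid.uuid4())
worker_a_sessions[session_id] = {"connected": True}

def handle_post(sessions, session_id):
    """Simplified stand-in for the POST /messages handler."""
    if session_id not in sessions:
        return 404  # "Could not find session for ID"
    return 202

# A POST routed back to worker A succeeds, but the same POST
# load-balanced to worker B fails, because B never saw the session.
print(handle_post(worker_a_sessions, session_id))  # 202
print(handle_post(worker_b_sessions, session_id))  # 404
```

This matches the logs below: several POSTs happen to hit the worker that owns the session (202), until one is routed to a different worker (404).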
Steps to reproduce
- Deploy an MCP server in a Kubernetes environment using gunicorn
- Configure gunicorn with multiple workers (`workers > 1`)
- Connect a client to the MCP server and establish an SSE connection (the initial connection succeeds)
- Send the first message (successfully processed, returns 202)
- Wait a few seconds or try to send a second message
- Receive the error: `WARNING mcp.server.sse Could not find session for ID: xxx-xxx-xxx`
Logs
```
[2025-04-15 19:26:26 +0800] [32] [INFO] connection open
127.0.0.1:48270 - "GET /mcps/sse HTTP/1.1" 200
127.0.0.1:48280 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
127.0.0.1:48280 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
127.0.0.1:48280 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
127.0.0.1:48280 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
127.0.0.1:48280 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
127.0.0.1:39380 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 202
[2025-04-15 19:26:47 +0800] [32] [WARNING] mcp.server.sse Could not find session for ID: cb7ed84c-8f2f-4109-b571-2f9fb025a5c2
127.0.0.1:53124 - "POST /mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2 HTTP/1.1" 404
[2025-04-15 19:26:47 +0800] [32] [ERROR] mcp.client.sse Error in post_writer: Client error '404 Not Found' for url 'http://localhost:8000/mcps/messages/?session_id=cb7ed84c8f2f4109b5712f9fb025a5c2'
```
Expected behavior
All messages, initial and subsequent, should be processed normally without session-not-found errors. SSE sessions should remain valid for the lifetime of the connection, even when multiple workers are used.
Environment information
- Deployment environment: Kubernetes
- Web server: gunicorn + uvicorn.workers.UvicornWorker
- Session ID handling: the server converts the unhyphenated UUID from the URL to the hyphenated format
- Local environment (running directly with uvicorn, single process) does not exhibit this issue
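The session-ID normalization noted above can be reproduced with the standard library: `uuid.UUID` accepts the 32-character hex form from the URL and renders it in the hyphenated form seen in the server log.

```python
import uuid

raw = "cb7ed84c8f2f4109b5712f9fb025a5c2"  # session_id exactly as sent in the URL
normalized = str(uuid.UUID(hex=raw))       # hyphenated form the server logs
print(normalized)  # cb7ed84c-8f2f-4109-b571-2f9fb025a5c2
```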
Reproduction conditions
- Using gunicorn with multiple workers (`workers > 1`)
- Proxying multiple services (using ASGI routing)
- Each worker maintains its own session state, with no session sharing between processes
Solution
I resolved this issue by setting the worker count to 1:
```shell
# Before
workers=$(nproc --all)  # use all available CPU cores

# After
workers=1  # single worker keeps session state consistent
```
However, this is not an ideal solution as it limits the service's scalability.
Suggested improvements
- Implement distributed session storage (e.g., Redis) in the MCP server to allow multiple workers to share session state
- Document the MCP server's session management limitations in multi-process environments
- Provide session recovery/reconnection mechanisms to handle session disconnections
Additional context
- Server code (`session.py`) shows that session state is stored in process memory
- In our implementation, we worked around the problem by separating the MCP service from the main application and deploying it independently with a single worker
- Regular ping operations can mitigate the issue but cannot completely solve the session state sharing problem in multi-process environments
Potential solution approach
Could the MCP server be modified to add a session storage abstraction layer that allows users to configure different session storage backends (memory, Redis, file, etc.) to support distributed deployments?