Add support for HTTP(S) proxies to clients. #1580

Merged 8 commits on Feb 1, 2025
4 changes: 2 additions & 2 deletions docs/project/changelog.rst
@@ -35,12 +35,12 @@ notice.
 Backwards-incompatible changes
 ..............................

-.. admonition:: Client connections use SOCKS proxies automatically.
+.. admonition:: Client connections use SOCKS and HTTP proxies automatically.
     :class: important

     If a proxy is configured in the operating system or with an environment
     variable, websockets uses it automatically when connecting to a server.
-    This feature requires installing the third-party library `python-socks`_.
+    SOCKS proxies require installing the third-party library `python-socks`_.

     If you want to disable the proxy, add ``proxy=None`` when calling
     :func:`~asyncio.client.connect`. See :doc:`../topics/proxies` for details.
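Reviewer note: the changelog entry says a proxy configured through an environment variable is picked up automatically. The sketch below shows what that kind of discovery can look like with the standard library's `urllib.request.getproxies_environment()`. Whether websockets performs discovery in exactly this way is an assumption, and `discover_proxy` is a hypothetical helper, not part of the library.

```python
import os
import urllib.request
from typing import Optional


def discover_proxy(secure: bool) -> Optional[str]:
    """Return a proxy URL found in the environment, or None (hypothetical helper)."""
    # getproxies_environment() maps scheme names to the values of the
    # matching <scheme>_proxy environment variables (https_proxy -> "https").
    proxies = urllib.request.getproxies_environment()
    # Assumption: wss:// traffic follows the https setting and ws:// traffic
    # follows the http setting.
    return proxies.get("https" if secure else "http")


# Example: configure a proxy for this process only, then look it up.
os.environ["https_proxy"] = "http://proxy.example:8080"
print(discover_proxy(secure=True))
```

Passing ``proxy=None`` to ``connect()``, as the changelog entry describes, bypasses any setting found this way.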
3 changes: 1 addition & 2 deletions docs/reference/features.rst
@@ -166,12 +166,11 @@ Client
     | Perform HTTP Digest Authentication | ❌     | ❌     | ❌     | ❌     |
     | (`#784`_)                          |        |        |        |        |
     +------------------------------------+--------+--------+--------+--------+
-    | Connect via HTTP proxy (`#364`_)   | ❌     |        | —      | ❌     |
+    | Connect via HTTP proxy             | ✅     |        | —      | ❌     |
     +------------------------------------+--------+--------+--------+--------+
     | Connect via SOCKS5 proxy           | ✅     | ✅     | —      | ❌     |
     +------------------------------------+--------+--------+--------+--------+

-.. _#364: https://github.com/python-websockets/websockets/issues/364
 .. _#784: https://github.com/python-websockets/websockets/issues/784

 Known limitations
19 changes: 19 additions & 0 deletions docs/topics/proxies.rst
@@ -30,6 +30,9 @@ most common, for `historical reasons`_, and recommended.

 .. _historical reasons: https://unix.stackexchange.com/questions/212894/

+websockets authenticates automatically when the address of the proxy includes
+credentials e.g. ``http://user:password@proxy:8080/``.
+
 .. admonition:: Any environment variable can configure a SOCKS proxy or an HTTP proxy.
     :class: tip

@@ -64,3 +67,19 @@ SOCKS proxy is configured in the operating system, python-socks uses SOCKS5h.

 python-socks supports username/password authentication for SOCKS5 (:rfc:`1929`)
 but does not support other authentication methods such as GSSAPI (:rfc:`1961`).
+
+HTTP proxies
+------------
+
+When the address of the proxy starts with ``https://``, websockets secures the
+connection to the proxy with TLS.
+
+When the address of the server starts with ``wss://``, websockets secures the
+connection from the proxy to the server with TLS.
+
+These two options are compatible. TLS-in-TLS is supported.
+
+The documentation of :func:`~asyncio.client.connect` describes how to configure
+TLS from websockets to the proxy and from the proxy to the server.
+
+websockets supports proxy authentication with Basic Auth.
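Reviewer note: the new documentation ties two behaviours to the proxy address: credentials embedded in the URL trigger authentication, and an ``https://`` scheme triggers TLS to the proxy. A minimal sketch of extracting that information with `urllib.parse` follows. `ProxyInfo` and `parse_proxy_url` are hypothetical names (not the library's `Proxy` or `parse_proxy`), the default ports are an assumption, and percent-decoding of credentials is left out.

```python
import urllib.parse
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    """Hypothetical container for the parts of an HTTP(S) proxy address."""

    scheme: str
    host: str
    port: int
    username: Optional[str]
    password: Optional[str]

    @property
    def uses_tls(self) -> bool:
        # An https:// proxy address means TLS between client and proxy.
        return self.scheme == "https"


def parse_proxy_url(url: str) -> ProxyInfo:
    parts = urllib.parse.urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) proxy: {url!r}")
    if parts.hostname is None:
        raise ValueError(f"proxy address has no host: {url!r}")
    # Assumption: fall back to the scheme's well-known port.
    default_port = 443 if parts.scheme == "https" else 80
    return ProxyInfo(
        parts.scheme,
        parts.hostname,
        parts.port or default_port,
        parts.username,
        parts.password,
    )


info = parse_proxy_url("http://user:password@proxy:8080/")
print(info.host, info.port, info.username, info.uses_tls)
```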
185 changes: 165 additions & 20 deletions src/websockets/asyncio/client.py
@@ -4,20 +4,29 @@
 import logging
 import os
 import socket
+import ssl as ssl_module
 import traceback
 import urllib.parse
 from collections.abc import AsyncIterator, Generator, Sequence
 from types import TracebackType
-from typing import Any, Callable, Literal
+from typing import Any, Callable, Literal, cast

 from ..client import ClientProtocol, backoff
-from ..datastructures import HeadersLike
-from ..exceptions import InvalidMessage, InvalidStatus, ProxyError, SecurityError
+from ..datastructures import Headers, HeadersLike
+from ..exceptions import (
+    InvalidMessage,
+    InvalidProxyMessage,
+    InvalidProxyStatus,
+    InvalidStatus,
+    ProxyError,
+    SecurityError,
+)
 from ..extensions.base import ClientExtensionFactory
 from ..extensions.permessage_deflate import enable_client_permessage_deflate
-from ..headers import validate_subprotocols
+from ..headers import build_authorization_basic, build_host, validate_subprotocols
 from ..http11 import USER_AGENT, Response
 from ..protocol import CONNECTING, Event
+from ..streams import StreamReader
 from ..typing import LoggerLike, Origin, Subprotocol
 from ..uri import Proxy, WebSocketURI, get_proxy, parse_proxy, parse_uri
 from .compatibility import TimeoutError, asyncio_timeout
@@ -266,6 +275,16 @@ class connect:
       :meth:`~asyncio.loop.create_connection` method) to create a suitable
       client socket and customize it.

+    When using a proxy:
+
+    * Prefix keyword arguments with ``proxy_`` for configuring TLS between the
+      client and an HTTPS proxy: ``proxy_ssl``, ``proxy_server_hostname``,
+      ``proxy_ssl_handshake_timeout``, and ``proxy_ssl_shutdown_timeout``.
+    * Use the standard keyword arguments for configuring TLS between the proxy
+      and the WebSocket server: ``ssl``, ``server_hostname``,
+      ``ssl_handshake_timeout``, and ``ssl_shutdown_timeout``.
+    * Other keyword arguments are used only for connecting to the proxy.
+
     Raises:
         InvalidURI: If ``uri`` isn't a valid WebSocket URI.
         InvalidProxy: If ``proxy`` isn't a valid proxy.
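Reviewer note: a usage sketch for the keyword arguments described in this docstring, assuming a ``wss://`` server reached through an ``https://`` proxy. The endpoint names are hypothetical, `echo_via_https_proxy` is an illustrative helper, and the default TLS contexts are an assumption; a real deployment may need a private CA for the proxy.

```python
import asyncio
import ssl


async def echo_via_https_proxy(uri: str, proxy: str) -> str:
    """Send one message through an HTTPS proxy and return the reply."""
    # Imported lazily so this module loads without the third-party package.
    from websockets.asyncio.client import connect

    # Assumption: default trust settings are fine for both TLS layers.
    proxy_context = ssl.create_default_context()
    server_context = ssl.create_default_context()

    async with connect(
        uri,
        proxy=proxy,
        proxy_ssl=proxy_context,  # TLS between the client and the proxy
        ssl=server_context,  # TLS between the proxy and the server
    ) as websocket:
        await websocket.send("hello")
        reply = await websocket.recv()
        return reply if isinstance(reply, str) else reply.decode()


# Hypothetical endpoints, shown for illustration only:
# asyncio.run(
#     echo_via_https_proxy("wss://echo.example/ws", "https://proxy.example:8443")
# )
```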
@@ -383,16 +402,69 @@ def factory() -> ClientConnection:
         if kwargs.pop("unix", False):
             _, connection = await loop.create_unix_connection(factory, **kwargs)
         elif proxy is not None:
-            kwargs["sock"] = await connect_proxy(
-                parse_proxy(proxy),
-                ws_uri,
-                local_addr=kwargs.pop("local_addr", None),
-            )
-            _, connection = await loop.create_connection(factory, **kwargs)
+            proxy_parsed = parse_proxy(proxy)
+            if proxy_parsed.scheme[:5] == "socks":
+                # Connect to the server through the proxy.
+                sock = await connect_socks_proxy(
+                    proxy_parsed,
+                    ws_uri,
+                    local_addr=kwargs.pop("local_addr", None),
+                )
+                # Initialize WebSocket connection via the proxy.
+                _, connection = await loop.create_connection(
+                    factory,
+                    sock=sock,
+                    **kwargs,
+                )
+            elif proxy_parsed.scheme[:4] == "http":
+                # Split keyword arguments between the proxy and the server.
+                all_kwargs, proxy_kwargs, kwargs = kwargs, {}, {}
+                for key, value in all_kwargs.items():
+                    if key.startswith("ssl") or key == "server_hostname":
+                        kwargs[key] = value
+                    elif key.startswith("proxy_"):
+                        proxy_kwargs[key[6:]] = value
+                    else:
+                        proxy_kwargs[key] = value
+                # Validate the proxy_ssl argument.
+                if proxy_parsed.scheme == "https":
+                    proxy_kwargs.setdefault("ssl", True)
+                    if proxy_kwargs.get("ssl") is None:
+                        raise ValueError(
+                            "proxy_ssl=None is incompatible with an https:// proxy"
+                        )
+                else:
+                    if proxy_kwargs.get("ssl") is not None:
+                        raise ValueError(
+                            "proxy_ssl argument is incompatible with an http:// proxy"
+                        )
+                # Connect to the server through the proxy.
+                transport = await connect_http_proxy(
+                    proxy_parsed,
+                    ws_uri,
+                    **proxy_kwargs,
+                )
+                # Initialize WebSocket connection via the proxy.
+                connection = factory()
+                transport.set_protocol(connection)
+                ssl = kwargs.pop("ssl", None)
+                if ssl is True:
+                    ssl = ssl_module.create_default_context()
+                if ssl is not None:
+                    new_transport = await loop.start_tls(
+                        transport, connection, ssl, **kwargs
+                    )
+                    assert new_transport is not None  # help mypy
+                    transport = new_transport
+                connection.connection_made(transport)
+            else:
+                raise AssertionError("unsupported proxy")
         else:
+            # Connect to the server directly.
             if kwargs.get("sock") is None:
                 kwargs.setdefault("host", ws_uri.host)
                 kwargs.setdefault("port", ws_uri.port)
+            # Initialize WebSocket connection.
             _, connection = await loop.create_connection(factory, **kwargs)
         return connection
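Reviewer note: the keyword-argument split in the ``http`` branch is easy to misread in diff form, so here it is isolated as a pure function. `split_kwargs` is a hypothetical name used only for this illustration; the routing rules are copied from the loop in the hunk above.

```python
from typing import Any


def split_kwargs(all_kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (proxy_kwargs, server_kwargs) following the rules in the hunk."""
    proxy_kwargs: dict[str, Any] = {}
    server_kwargs: dict[str, Any] = {}
    for key, value in all_kwargs.items():
        if key.startswith("ssl") or key == "server_hostname":
            # TLS settings for the tunnel from the proxy to the server.
            server_kwargs[key] = value
        elif key.startswith("proxy_"):
            # proxy_ssl -> ssl, proxy_server_hostname -> server_hostname, ...
            proxy_kwargs[key[len("proxy_"):]] = value
        else:
            # Everything else configures the TCP connection to the proxy.
            proxy_kwargs[key] = value
    return proxy_kwargs, server_kwargs


proxy_kwargs, server_kwargs = split_kwargs(
    {"ssl": True, "server_hostname": "example.com", "proxy_ssl": None, "local_addr": ("127.0.0.1", 0)}
)
print(proxy_kwargs)
print(server_kwargs)
```

One consequence worth noting: an unprefixed argument such as ``local_addr`` applies to the proxy connection, which matches the docstring's statement that other keyword arguments are used only for connecting to the proxy.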

@@ -499,9 +571,9 @@ async def __await_impl__(self) -> ClientConnection:
                 else:
                     raise SecurityError(f"more than {MAX_REDIRECTS} redirects")

-        except TimeoutError:
+        except TimeoutError as exc:
             # Re-raise exception with an informative error message.
-            raise TimeoutError("timed out during handshake") from None
+            raise TimeoutError("timed out during opening handshake") from exc

     # ... = yield from connect(...) - remove when dropping Python < 3.10
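Reviewer note: the switch from ``from None`` to ``from exc`` changes what a caller can inspect on the re-raised exception. The sketch below demonstrates the difference with the built-in `TimeoutError`; the PR uses the alias imported from `.compatibility`, and `reraise` and `cause_of` are hypothetical helpers.

```python
from typing import Optional


def reraise(chain: bool) -> None:
    try:
        raise TimeoutError("inner")
    except TimeoutError as exc:
        if chain:
            # New behaviour: keep the original exception as __cause__.
            raise TimeoutError("timed out during opening handshake") from exc
        # Old behaviour: drop the original exception from the chain.
        raise TimeoutError("timed out during handshake") from None


def cause_of(chain: bool) -> Optional[BaseException]:
    try:
        reraise(chain)
    except TimeoutError as exc:
        return exc.__cause__
    return None


print(repr(cause_of(True)))
print(repr(cause_of(False)))
```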

@@ -645,14 +717,87 @@ async def connect_socks_proxy(
         raise ImportError("python-socks is required to use a SOCKS proxy")


-async def connect_proxy(
+def prepare_connect_request(proxy: Proxy, ws_uri: WebSocketURI) -> bytes:
+    host = build_host(ws_uri.host, ws_uri.port, ws_uri.secure, always_include_port=True)
+    headers = Headers()
+    headers["Host"] = build_host(ws_uri.host, ws_uri.port, ws_uri.secure)
+    if proxy.username is not None:
+        assert proxy.password is not None  # enforced by parse_proxy()
+        headers["Proxy-Authorization"] = build_authorization_basic(
+            proxy.username, proxy.password
+        )
+    # We cannot use the Request class because it supports only GET requests.
+    return f"CONNECT {host} HTTP/1.1\r\n".encode() + headers.serialize()
+
+
+class HTTPProxyConnection(asyncio.Protocol):
+    def __init__(self, ws_uri: WebSocketURI, proxy: Proxy):
+        self.ws_uri = ws_uri
+        self.proxy = proxy
+
+        self.reader = StreamReader()
+        self.parser = Response.parse(
+            self.reader.read_line,
+            self.reader.read_exact,
+            self.reader.read_to_eof,
+            include_body=False,
+        )
+
+        loop = asyncio.get_running_loop()
+        self.response: asyncio.Future[Response] = loop.create_future()
+
+    def run_parser(self) -> None:
+        try:
+            next(self.parser)
+        except StopIteration as exc:
+            response = exc.value
+            if 200 <= response.status_code < 300:
+                self.response.set_result(response)
+            else:
+                self.response.set_exception(InvalidProxyStatus(response))
+        except Exception as exc:
+            proxy_exc = InvalidProxyMessage(
+                "did not receive a valid HTTP response from proxy"
+            )
+            proxy_exc.__cause__ = exc
+            self.response.set_exception(proxy_exc)
+
+    def connection_made(self, transport: asyncio.BaseTransport) -> None:
+        transport = cast(asyncio.Transport, transport)
+        self.transport = transport
+        self.transport.write(prepare_connect_request(self.proxy, self.ws_uri))
+
+    def data_received(self, data: bytes) -> None:
+        self.reader.feed_data(data)
+        self.run_parser()
+
+    def eof_received(self) -> None:
+        self.reader.feed_eof()
+        self.run_parser()
+
+    def connection_lost(self, exc: Exception | None) -> None:
+        self.reader.feed_eof()
+        if exc is not None:
+            self.response.set_exception(exc)
+
+
+async def connect_http_proxy(
     proxy: Proxy,
     ws_uri: WebSocketURI,
     **kwargs: Any,
-) -> socket.socket:
-    """Connect via a proxy and return the socket."""
-    # parse_proxy() validates proxy.scheme.
-    if proxy.scheme[:5] == "socks":
-        return await connect_socks_proxy(proxy, ws_uri, **kwargs)
-    else:
-        raise AssertionError("unsupported proxy")
+) -> asyncio.Transport:
+    transport, protocol = await asyncio.get_running_loop().create_connection(
+        lambda: HTTPProxyConnection(ws_uri, proxy),
+        proxy.host,
+        proxy.port,
+        **kwargs,
+    )
+
+    try:
+        # This raises exceptions if the connection to the proxy fails.
+        await protocol.response
+    except Exception:
+        transport.close()
+        raise
+
+    return transport
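Reviewer note: to make the wire format concrete, here is a standalone approximation of what `prepare_connect_request` sends. It uses only the standard library. `build_connect_request` is a hypothetical name, IPv6 bracketing is omitted, header order is my choice, and the UTF-8 encoding of the credentials is an assumption about `build_authorization_basic`.

```python
import base64
from typing import Optional


def build_connect_request(
    host: str,
    port: int,
    secure: bool,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> bytes:
    # The CONNECT target always carries the port ...
    authority = f"{host}:{port}"
    # ... while the Host header omits it when it is the scheme's default.
    default_port = 443 if secure else 80
    host_header = host if port == default_port else authority

    lines = [f"CONNECT {authority} HTTP/1.1", f"Host: {host_header}"]
    if username is not None and password is not None:
        # Basic Auth: base64 of "username:password" (UTF-8 assumed here).
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        lines.append(f"Proxy-Authorization: Basic {token}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode()


print(build_connect_request("example.com", 443, True, "user", "password").decode())
```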
10 changes: 8 additions & 2 deletions src/websockets/headers.py
@@ -36,7 +36,13 @@
 T = TypeVar("T")


-def build_host(host: str, port: int, secure: bool) -> str:
+def build_host(
+    host: str,
+    port: int,
+    secure: bool,
+    *,
+    always_include_port: bool = False,
+) -> str:
     """
     Build a ``Host`` header.

@@ -53,7 +59,7 @@ def build_host(host: str, port: int, secure: bool) -> str:
         if address.version == 6:
             host = f"[{host}]"

-    if port != (443 if secure else 80):
+    if always_include_port or port != (443 if secure else 80):
         host = f"{host}:{port}"

     return host
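Reviewer note: a standalone sketch of how the new flag changes the result. The body is pieced together from the two hunks above; the `ipaddress` lookup in the elided middle of the function is an assumption inferred from the ``address.version`` check.

```python
import ipaddress


def build_host(
    host: str,
    port: int,
    secure: bool,
    *,
    always_include_port: bool = False,
) -> str:
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        pass  # host is a hostname, not an IP literal
    else:
        if address.version == 6:
            # IPv6 literals must be bracketed before a port is appended.
            host = f"[{host}]"

    if always_include_port or port != (443 if secure else 80):
        host = f"{host}:{port}"

    return host


print(build_host("example.com", 443, True))
print(build_host("example.com", 443, True, always_include_port=True))
print(build_host("::1", 8080, False))
```

The second call is the form a CONNECT request line needs, since the request target must name the port even when it is the default.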
10 changes: 7 additions & 3 deletions src/websockets/http11.py
@@ -210,6 +210,7 @@ def parse(
         read_line: Callable[[int], Generator[None, None, bytes]],
         read_exact: Callable[[int], Generator[None, None, bytes]],
         read_to_eof: Callable[[int], Generator[None, None, bytes]],
+        include_body: bool = True,
     ) -> Generator[None, None, Response]:
         """
         Parse a WebSocket handshake response.
@@ -265,9 +266,12 @@ def parse(

         headers = yield from parse_headers(read_line)

-        body = yield from read_body(
-            status_code, headers, read_line, read_exact, read_to_eof
-        )
+        if include_body:
+            body = yield from read_body(
+                status_code, headers, read_line, read_exact, read_to_eof
+            )
+        else:
+            body = b""

         return cls(status_code, reason, headers, body)
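Reviewer note: my reading of ``include_body=False``, which the PR does not spell out, is that after a successful CONNECT the bytes following the header section belong to the tunnel and not to an HTTP body. The sketch below illustrates that split on a byte string. `parse_response_head` is a hypothetical helper, far simpler than `Response.parse`, and does no validation beyond what is shown.

```python
def parse_response_head(data: bytes) -> tuple[int, dict[str, str], bytes]:
    """Return (status_code, headers, leftover) without touching any body."""
    head, separator, leftover = data.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("incomplete response head")

    status_line, *header_lines = head.split(b"\r\n")
    _version, status, *_reason = status_line.split(b" ", 2)

    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        headers[name.decode().strip().lower()] = value.decode().strip()

    # Whatever follows the blank line is handed back untouched: for a tunnel,
    # it is the first data from the far end (for example a TLS record).
    return int(status), headers, leftover


status, headers, leftover = parse_response_head(
    b"HTTP/1.1 200 Connection established\r\nVia: proxy\r\n\r\n\x16\x03\x01"
)
print(status, headers, leftover)
```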
