Generic workers do not propagate /keys/upload correctly #18146

Open
Stogas opened this issue Feb 10, 2025 · 1 comment
Stogas commented Feb 10, 2025

Description

In a multi-worker setup (we use 1 main worker, 3 generic workers and 3 sync workers), uploading keys via /keys/upload succeeds, but subsequent /sync requests (which reach a different worker) still respond with:

"device_one_time_keys_count": {
        "signed_curve25519": 0
    },

However, if the ingress load-balancer is reconfigured to route /keys/upload to the main worker, the /sync responses start reporting the correct key count.


The documentation lists /keys/upload among the routes available to generic workers, and further states:

Synapse 1.72 and older: if handling the ^/_matrix/client/v3/keys/upload endpoint, the HTTP URI for the main process (worker_main_http_uri). This config option is no longer required and is ignored when running Synapse 1.73 and newer.

This leads me to believe that with 1.118, generic workers should handle /keys/upload correctly without any specific configuration.

Since the key count is wrong when /keys/upload goes to a generic worker, but correct when the same requests are routed to the main worker, I believe this is a bug.

Steps to reproduce

  1. Set up a multi-worker Synapse deployment (main + generic + sync workers)
  2. Configure load-balancing to route /keys/upload to generic workers
  3. Use a Matrix client to upload one-time keys via /keys/upload, with the request routed to a generic worker
  4. Make a /sync request, routed to a sync worker, and check the reported count of one-time keys
  5. Observe that the count of keys is incorrectly zero
  6. Reconfigure the load-balancer to route /keys/upload to the main worker
  7. Perform steps 3 and 4 again
  8. Observe that the response from /sync is now correct
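
For anyone who wants to check steps 3 to 5 without a full client, here is a minimal sketch in Python. It assumes an existing access token for a device, and it assumes (from my reading of the client-server API, not verified against every Synapse version) that a POST to /keys/upload with an empty body adds no keys but still returns the current `one_time_key_counts`. The function names, the script name and the example URL are hypothetical.

```python
import json
import sys
import urllib.request

ALGORITHM = "signed_curve25519"


def otk_count_from_sync(sync_body, algorithm=ALGORITHM):
    """Count reported under device_one_time_keys_count in a /sync response."""
    return int(sync_body.get("device_one_time_keys_count", {}).get(algorithm, 0))


def otk_count_from_upload(upload_body, algorithm=ALGORITHM):
    """Count reported under one_time_key_counts in a /keys/upload response."""
    return int(upload_body.get("one_time_key_counts", {}).get(algorithm, 0))


def call(base_url, path, token, payload=None):
    """Send a GET (no payload) or a JSON POST and return the decoded JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method="GET" if payload is None else "POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def compare_counts(base_url, token):
    """Return (count seen via /keys/upload, count seen via /sync) for one device."""
    # Assumption: an empty upload adds no keys but still returns the current counts.
    upload = call(base_url, "/_matrix/client/v3/keys/upload", token, payload={})
    sync = call(base_url, "/_matrix/client/v3/sync?timeout=0", token)
    return otk_count_from_upload(upload), otk_count_from_sync(sync)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python check_otk.py https://matrix.example.org syt_...
    uploaded, synced = compare_counts(sys.argv[1], sys.argv[2])
    print("keys/upload reports:", uploaded)
    print("sync reports:", synced)
```

If the two numbers differ while /keys/upload is routed to a generic worker and agree once it is routed to the main worker, that matches the behaviour described above. Note that an initial /sync without a `since` token can be expensive on large accounts.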

Homeserver

Private homeserver

Synapse Version

1.118.0

Installation Method

Other (please mention below)

Database

bitnami/postgresql:15.4.0-debian-11-r45, single server, no restore or port

Workers

Multiple workers

Platform

Kubernetes v1.28.9, Hetzner Cloud VMs.

Deployed via helm/ananace-charts/matrix-synapse chart version 3.10.0

Routing via Traefik and HAProxy (for specific Synapse worker load-balancing)

Configuration

Helm chart relevant values used:

      generic_worker:
        enabled: true
        generic: true
        replicaCount: 3
      synchrotron:
        enabled: true
        generic: true
        replicaCount: 3
        listeners: [client]
        csPaths:
          - "/_matrix/client/(v2_alpha|r0|v3)/sync$"
          - "/_matrix/client/(api/v1|v2_alpha|r0|v3)/events$"
          - "/_matrix/client/(api/v1|r0|v3)/initialSync$"
          - "/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$"

Custom HAProxy load-balancer config, as per documentation:


    global
      log stdout format raw local0
      maxconn 10240
    defaults
      log global
      mode http
      option httplog
      option dontlognull
      timeout connect 30s
      timeout client 50s
      timeout server 50s
      option forwardfor except 127.0.0.1
    resolvers kubernetes
      nameserver dns1 10.208.0.10:53 # Change if kube-dns svc IP changes
      resolve_retries       3
      timeout resolve       5s
      timeout retry         5s
      hold valid            10s

    # Backend for synapse synchrotron workers with dynamic server selection

    backend synapse_sync
        balance hdr(x-mxid-localpart)
        http-request set-header x-mxid-localpart %[var(txn.mxid_localpart)]
        hash-type consistent
        option httpchk GET /health
        option log-health-checks
        retries 3
        server-template srv 10 _listener._tcp.synapse-sync-headless.matrix.svc.cluster.local resolvers kubernetes resolve-prefer ipv4 check

    # Backend for synapse generic workers with dynamic server selection

    backend synapse_generic
        balance roundrobin
        option httpchk GET /health
        option log-health-checks
        retries 3
        server-template srv 10 _listener._tcp.synapse-generic-worker-headless.matrix.svc.cluster.local resolvers kubernetes resolve-prefer ipv4 check

    # Backend for synapse generic workers with the same server selection for a URL

    backend synapse_generic_uri
        balance uri whole
        hash-type consistent
        option httpchk GET /health
        option log-health-checks
        retries 3
        server-template srv 10 _listener._tcp.synapse-generic-worker-headless.matrix.svc.cluster.local resolvers kubernetes resolve-prefer ipv4 check

    # Backend for main synapse worker as a fallback if no others match

    backend synapse_main
        option httpchk GET /health
        server main_server synapse-matrix-synapse.matrix.svc.cluster.local:8008 resolvers kubernetes resolve-prefer ipv4 check

    # Frontend configuration

    frontend http-in
        bind *:8080

        # Extract access token from URL parameter
        http-request set-var(txn.accesstoken_from_urlparam) url_param(access_token)
        # Extract username from access token in URL parameter using regex
        #http-request set-var(txn.username_from_urlparam) url_param(access_token),regsub("^syt_(.*?)_.*", "\1")

        # Extract access token from Authorization header
        #http-request set-var(txn.username_from_authorization) req.hdr(Authorization),regsub("^Bearer syt_(.*?)_.*", "\1")
        http-request set-var(txn.username_from_authorization) req.hdr(Authorization)

        # Set the final username based on Authorization header or URL parameter
        http-request set-var(txn.mxid_localpart) var(txn.username_from_authorization) if { var(txn.username_from_authorization) -m found }
        #http-request set-var(txn.mxid_localpart) var(txn.username_from_urlparam) if !{ var(txn.username_from_authorization) -m found } { var(txn.username_from_urlparam) -m found }
        http-request set-var(txn.mxid_localpart) var(txn.accesstoken_from_urlparam) if !{ var(txn.username_from_authorization) -m found } { var(txn.accesstoken_from_urlparam) -m found }

        # Define ACLs for URL path matching
        acl is_sync_path path_reg ^/_matrix/client/(r0|v3)/sync$
        acl is_events_path path_reg ^/_matrix/client/(api/v1|r0|v3)/events$
        acl is_initial_sync_path path_reg ^/_matrix/client/(api/v1|r0|v3)/initialSync$
        acl is_rooms_initial_sync_path path_reg ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$

        # Define ACL for 'since' parameter presence
        acl has_since_param url_param(since) -m found

        # Federation requests
        acl generic_all path_reg ^/_matrix/federation/v1/event/
        acl generic_all path_reg ^/_matrix/federation/v1/state/
        acl generic_all path_reg ^/_matrix/federation/v1/state_ids/
        acl generic_all path_reg ^/_matrix/federation/v1/backfill/
        acl generic_all path_reg ^/_matrix/federation/v1/get_missing_events/
        acl generic_all path_reg ^/_matrix/federation/v1/publicRooms
        acl generic_all path_reg ^/_matrix/federation/v1/query/
        acl generic_all path_reg ^/_matrix/federation/v1/make_join/
        acl generic_all path_reg ^/_matrix/federation/v1/make_leave/
        acl generic_all path_reg ^/_matrix/federation/(v1|v2)/send_join/
        acl generic_all path_reg ^/_matrix/federation/(v1|v2)/send_leave/
        acl generic_all path_reg ^/_matrix/federation/v1/make_knock/
        acl generic_all path_reg ^/_matrix/federation/v1/send_knock/
        acl generic_all path_reg ^/_matrix/federation/(v1|v2)/invite/
        acl generic_all path_reg ^/_matrix/federation/v1/event_auth/
        acl generic_all path_reg ^/_matrix/federation/v1/timestamp_to_event/
        acl generic_all path_reg ^/_matrix/federation/v1/exchange_third_party_invite/
        acl generic_all path_reg ^/_matrix/federation/v1/user/devices/
        acl generic_all path_reg ^/_matrix/key/v2/query
        acl generic_all path_reg ^/_matrix/federation/v1/hierarchy/

        # Inbound federation transaction request
        acl generic_all path_reg ^/_matrix/federation/v1/send/

        # Client API requests
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/createRoom$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/publicRooms$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/joined_members$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/context/.*$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/members$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$
        acl generic_all path_reg ^/_matrix/client/v1/rooms/.*/hierarchy$
        acl generic_all path_reg ^/_matrix/client/(v1|unstable)/rooms/.*/relations/
        acl generic_all path_reg ^/_matrix/client/v1/rooms/.*/threads$
        acl generic_all path_reg ^/_matrix/client/unstable/im.nheko.summary/summary/.*$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/account/3pid$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/account/whoami$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/devices$
        acl generic_all path_reg ^/_matrix/client/versions$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/voip/turnServer$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/event/
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_rooms$
        acl generic_all path_reg ^/_matrix/client/v1/rooms/.*/timestamp_to_event$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable/.*)/rooms/.*/aliases
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/search$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/user/.*/filter(/|$)
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/directory/room/.*$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/capabilities$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/notifications$

        # Encryption requests
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/keys/query$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/keys/changes$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/keys/claim$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/room_keys/
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/keys/upload$

        # Registration/login requests
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/login$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/register$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/register/available$
        acl generic_all path_reg ^/_matrix/client/v1/register/m.login.registration_token/validity$
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/password_policy$

        # Event sending requests
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/redact
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/send
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)$
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/join/
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/knock/
        acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/profile/

        # Account data requests
        # acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/.*/tags
        # acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/.*/account_data

        # Receipts requests
        # acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/rooms/.*/receipt
        # acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/rooms/.*/read_markers

        # Presence requests
        # acl generic_all path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/presence/

        # User directory search requests
        acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/user_directory/search$

        # Pushrules GET request
        acl is_get_method method GET
        acl is_pushrules_path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/pushrules/

        # ACL for pagination requests, ensuring consistent routing for a given room
        acl generic_pagination_messages path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/messages$

        # Routing logic based on ACLs
        use_backend synapse_sync if is_events_path
        use_backend synapse_sync if is_initial_sync_path
        use_backend synapse_sync if is_rooms_initial_sync_path

        # For sync path, choose backend based on 'since' parameter
        use_backend synapse_sync if is_sync_path has_since_param
        use_backend synapse_sync if is_sync_path !has_since_param

        # Use generic workers backend if it matches the generic routes
        use_backend synapse_generic_uri if generic_pagination_messages
        use_backend synapse_generic if generic_all
        use_backend synapse_generic if is_get_method is_pushrules_path

        # Fallback to main worker backend if no other backend was used
        default_backend synapse_main

Of particular note is the ACL for /keys/upload:

acl generic_all path_reg ^/_matrix/client/(r0|v3|unstable)/keys/upload$
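
To rule out the ACL itself as the cause, the pattern can be checked against a few sample paths. This is only a sketch: it uses Python's `re` module rather than HAProxy's regex engine, on the assumption that both treat this simple pattern the same way, and the sample paths are my own. HAProxy's `path` fetch excludes the query string, so only the path portion is tested here.

```python
import re

# Same pattern as the HAProxy ACL for /keys/upload.
KEYS_UPLOAD_ACL = re.compile(r"^/_matrix/client/(r0|v3|unstable)/keys/upload$")


def matches_keys_upload(path):
    """Return True if the request path would satisfy the /keys/upload ACL."""
    return KEYS_UPLOAD_ACL.search(path) is not None


# Hypothetical sample paths, chosen to cover matching and non-matching cases.
SAMPLE_PATHS = [
    "/_matrix/client/v3/keys/upload",
    "/_matrix/client/r0/keys/upload",
    "/_matrix/client/unstable/keys/upload",
    "/_matrix/client/v3/keys/upload/some_device",
    "/_matrix/client/v3/keys/query",
]

for sample in SAMPLE_PATHS:
    print(sample, "->", matches_keys_upload(sample))
```

The three versioned /keys/upload paths match, so those requests do land on the `synapse_generic` backend as intended. Because of the trailing `$`, a path with anything after /keys/upload does not match this ACL.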

Relevant log output

No relevant logs.

Anything else that would be useful to know?

No response


Stogas commented Feb 10, 2025
