Skip to content

feat: Migrate organizations list to control #92442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dcramer
Copy link
Member

@dcramer dcramer commented May 28, 2025

WIP

Here's the planned changes:

  • Add list_organizations RPC method
  • Migrate /api/0/organizations to list_organizations
  • Introduce context for RPC calls that give full auth/tenant information (request.auth/request.is_active_superuser concerns)
  • Utilize control when possible, and fall back to local
  • Introduce concept of degraded responses (metadata in responses that showcase soft errors)

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 28, 2025
order_by = "-date_added"
paginator_cls = DateTimePaginator

# return paginate(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would need migrated, shoudl be easy enough

show: Show | None = None,
sort_by: SortBy | None = "date",
# actor specific stuff
actor_user_id: int,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preferably this is a similar construct to 'request.auth' but something all RPC endpoints gain access to - "the current actor"

ALL = "all"


def list_organizations(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken out into its own module just because the giant impl file is insanity

order_by = "-member_count"
paginator_cls = OffsetPaginator
elif sort_by == "projects":
queryset = queryset.annotate(project_count=Count("project"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirm: are projects on control? i think they should be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projects are not available in Control Silo:

class Project(Model):

Similarly, organizations are not fully available in Control Silo either:

class Organization(ReplicatedRegionModel):

We do have a thin replicated version of organizations in Control though (OrganizationMapping)

I don't have the full context on that particular decision, though @markstory could probably weigh in on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projects are not available in Control Silo:

Projects are organization scoped, and needed to stay in the region for existing transactions and application logic to work. Because all of this logic will need to be region scoped (because of organization) you'll also be able to use projects.

ALL = "all"


def list_organizations(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the API would call this RPC method, and if its fails remotely (e.g. on control), could try/catch onto calling local?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the API would call this RPC method, and if its fails remotely (e.g. on control), could try/catch onto calling local?

That isn't typically how RPC services work. Instead they choose a mode (rpc proxy or local) based on the silo mode that the application is deployed in.

Copy link

codecov bot commented May 28, 2025

❌ 8 Tests Failed:

Tests completed Failed Passed Skipped
26009 8 26001 207
View the top 3 failed test(s) by shortest run time
tests.sentry.api.endpoints.test_organization_index.OrganizationsListTest::test_status_query
Stack Traces | 1.38s run time
#x1B[1m#x1B[.../api/endpoints/test_organization_index.py#x1B[0m:85: in test_status_query
    response = self.get_success_response(qs_params={"query": "status:pending_deletion"})
#x1B[1m#x1B[.../sentry/testutils/cases.py#x1B[0m:616: in get_success_response
    assert_status_code(response, status.HTTP_200_OK)
#x1B[1m#x1B[.../sentry/testutils/asserts.py#x1B[0m:45: in assert_status_code
    assert minimum <= response.status_code < maximum, (
#x1B[1m#x1B[31mE   AssertionError: (500, b'{"detail":"Internal Error","errorId":null}')#x1B[0m
#x1B[1m#x1B[31mE   assert 500 < 201#x1B[0m
#x1B[1m#x1B[31mE    +  where 500 = <Response status_code=500, "application/json">.status_code#x1B[0m
tests.sentry.api.endpoints.test_organization_index.OrganizationsListTest::test_show_all_without_superuser
Stack Traces | 2.13s run time
#x1B[1m#x1B[.../api/endpoints/test_organization_index.py#x1B[0m:56: in test_show_all_without_superuser
    response = self.get_success_response(qs_params={"show": "all"})
#x1B[1m#x1B[.../sentry/testutils/cases.py#x1B[0m:616: in get_success_response
    assert_status_code(response, status.HTTP_200_OK)
#x1B[1m#x1B[.../sentry/testutils/asserts.py#x1B[0m:45: in assert_status_code
    assert minimum <= response.status_code < maximum, (
#x1B[1m#x1B[31mE   AssertionError: (500, b'{"detail":"Internal Error","errorId":null}')#x1B[0m
#x1B[1m#x1B[31mE   assert 500 < 201#x1B[0m
#x1B[1m#x1B[31mE    +  where 500 = <Response status_code=500, "application/json">.status_code#x1B[0m
tests.sentry.api.endpoints.test_organization_index.OrganizationsListTest::test_member_id_query
Stack Traces | 2.47s run time
#x1B[1m#x1B[.../api/endpoints/test_organization_index.py#x1B[0m:99: in test_member_id_query
    response = self.get_success_response(qs_params={"member": 1})
#x1B[1m#x1B[.../sentry/testutils/cases.py#x1B[0m:616: in get_success_response
    assert_status_code(response, status.HTTP_200_OK)
#x1B[1m#x1B[.../sentry/testutils/asserts.py#x1B[0m:45: in assert_status_code
    assert minimum <= response.status_code < maximum, (
#x1B[1m#x1B[31mE   AssertionError: (500, b'{"detail":"Internal Error","errorId":null}')#x1B[0m
#x1B[1m#x1B[31mE   assert 500 < 201#x1B[0m
#x1B[1m#x1B[31mE    +  where 500 = <Response status_code=500, "application/json">.status_code#x1B[0m

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@dcramer dcramer force-pushed the feat/list-organizations-control-endpoint branch from 91f08bc to 9ebb327 Compare May 28, 2025 21:43
@dcramer dcramer force-pushed the feat/list-organizations-control-endpoint branch from 9ebb327 to 46392e1 Compare May 28, 2025 21:46
@dcramer
Copy link
Member Author

dcramer commented May 28, 2025

@markstory is there a reason I cant make this new method be:

list_organizations(organization_service, ...)

And not even add it to the existing mega class?

@dcramer
Copy link
Member Author

dcramer commented May 28, 2025

@markstory how do i know if somethings on control vs region? i see both org/project with @region_silo_model

@GabeVillalobos
Copy link
Member

I'm curious what the intention is behind this change, and how the full implementation will look.

In particular, these parts:

Utilize control when possible, and fall back to local
Introduce concept of degraded responses (metadata in responses that showcase soft errors)

Is the idea to have a control endpoint fan out to all available regions a user has data in, in order to retrieve the full list of organizations that user belongs to?

@dcramer
Copy link
Member Author

dcramer commented May 28, 2025

I'm curious what the intention is behind this change, and how the full implementation will look.

In particular, these parts:

Utilize control when possible, and fall back to local
Introduce concept of degraded responses (metadata in responses that showcase soft errors)

Is the idea to have a control endpoint fan out to all available regions a user has data in, in order to retrieve the full list of organizations that user belongs to?

The intention behind this change is to fix what is a broken API experience today. Right now the API no longer works for any metadata endpoint, as you'd have to query every region silo that might possibly exist, which is particularly a problem for core user and organization-style metadata. This exposed itself in the MCP server when users with DE-only orgs received empty responses.

My original thinking was:

  • This data must live on control, as its sentry metadata [clearly this is not how the world exists, which is a separate problem]
  • We dont want to rely on control being online if we can avoid it, so if control is offline, read only the local data

I might have to revisit that plan. I'm also going to push us to revisit why its structured this way in the first place as it creates a lot of undue complexity, but I'm sure there was a reason.

If you designed this from the ground up you'd have the following system:

  1. a control plane - core system metadata. This would have information like user accounts, organizations, some other stuff.
  2. the region silos

This is kind of what we have, but clearly not all the way given data appears to almost never? live in the control plane.

You'd then design two APIs: a control plane API (e.g. the /organizations endpoint) and the individual tenants APIs (the region silo endpoints). This is generally speaking how you'd want to design any of these tenant based systems, otherwise you have no way to actually identify data without magic knowledge in a system (e.g. I "know" there is DE and US tenants).

We are half way there. Organization responses give you their regionUrl. The problem is those responses cannot be achieved without.. regional responses.

I think the tl;dr is organization needs migrated to control, just like user is.

@GabeVillalobos
Copy link
Member

My original thinking was:

This data must live on control, as its sentry metadata [clearly this is not how the world exists, which is a separate problem]
We dont want to rely on control being online if we can avoid it, so if control is offline, read only the local data

Got it. I think it makes sense to revisit this decision, since this has caused a fair bit of friction in some of our other APIs (organization/subscription provisioning are the most obvious two I can think of). I do want to call out that this may be a considerable amount of work to actually achieve.

In the meantime, we do support a control-silo endpoint that supplies all available region URLs a user has data in via the https://sentry.sentry.io/api/0/users/me/regions/ endpoint as a point of discovery/bridge for client fan-outs, which is what we use in the CLI to list all available orgs. It's not perfect, but it's allowed us to sidestep this concern for now.

ALL = "all"


def list_organizations(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the API would call this RPC method, and if its fails remotely (e.g. on control), could try/catch onto calling local?

That isn't typically how RPC services work. Instead they choose a mode (rpc proxy or local) based on the silo mode that the application is deployed in.

order_by = "-member_count"
paginator_cls = OffsetPaginator
elif sort_by == "projects":
queryset = queryset.annotate(project_count=Count("project"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projects are not available in Control Silo:

Projects are organization scoped, and needed to stay in the region for existing transactions and application logic to work. Because all of this logic will need to be region scoped (because of organization) you'll also be able to use projects.

{"organization": serialize(org), "singleOwner": org.has_single_owner()}
)

return Response(org_results)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work as RPC services need to serialize their return values into JSON via pydantic models.

order_by=order_by,
on_results=lambda x: serialize(x, request.user),
paginator_cls=paginator_cls,
result = organization_service.list_organizations(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another wrinkle we'll run into with having org list be available on control is that we don't have a way for an endpoint to be partially in control and partially in region. The other methods on this endpoint need to stay in the region because they mutate the organization. Django might not like having two endpoints bound to the same URL as well.

@dcramer
Copy link
Member Author

dcramer commented May 30, 2025

Quick update here - I chatted more with Mark on this. I'm not totally sure what the right direction is right now. I think the outcome is clear to me: things like /organizations should just list all orgs you're a member of - no matter the region.

I have temporarily worked around this in the MCP, but all versions of fixing this in Sentry are a little complex. I think it MUST get fixed, but I also have a hard time saying it MUST get fixed tomorrow.

IMO the best course of action while this is in conversation: what do we want this to function like, write that down, then we work towards it when time permits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants