Cross Region RPC
When Sentry is deployed in a multi-region deployment (like sentry.io) there are many scenarios and workflows where a region silo requires information that is stored in Control Silo. Similarly there are flows where Control Silo operations need to read or mutate state stored in regions.
When data needs to be read, or mutated synchronously, we use Remote Procedure Calls (RPC).
Design Context
Before we get into specifics, it is useful to understand the requirements that the system was designed under. We needed the following attributes in our solution:
- Needs to have consistent parameter and return types in local development, CI and multi-region production.
- Serialization and transport concerns should be hidden from most application logic, as the transport depends on how the application is deployed.
- Strong support for Python typing
- Solid support for schema generation, and validation tooling.
Components of the RPC framework
The cross-region RPC framework is composed of several RPC services, and RPC models. Each RPC service provides a bundle of methods for different product domains. For example, organizations
, users
and integrations
are separate RPC services.
Each RPC service is composed of a service interface and local implementation.
from sentry.hybridcloud.rpc.resolvers import ByOrganizationSlug
from sentry.hybridcloud.rpc import RpcService, regional_rpc_method
from sentry.silo.base import SiloMode
class OrganizationService(RpcService):
key = "organization"
local_mode = SiloMode.REGION
@classmethod
def get_local_implementation(cls) -> RpcService:
from sentry.organizations.services.organization.impl import (
DatabaseBackedOrganizationService
)
return DatabaseBackedOrganizationService()
@regional_rpc_method(
resolve=ByOrganizationSlug(),
return_none_if_mapping_not_found=True
)
@abstractmethod
def get_org_by_slug(
self,
*,
slug: str,
user_id: int | None = None,
) -> RpcOrganizationSummary | None:
...
Service classes act as stubs that define the interface a service will have. In the above example, we see a few important pieces of the RPC machinery:
key
defines the service name that is used in URLs and as a key when building method indexes.local_mode
defines which silo mode this service uses it’s ‘local implementation’. Services can have local implementations in either silo mode.get_local_implementation
is used by the RPC machinery to find the implementation service when the silo mode matches or isMONOLITH
.
RPC methods like get_org_by_slug
must be defined as abstractmethod
and must have either rpc_method
or regional_rpc_method
applied. If a method has local_mode = REGION
it should use regional_rpc_method
with a resolve ‘resolver’. There are several resolvers that accomodate a variety of method call signatures:
ByOrganizationSlug
will extract theorganization_slug
parameter and use it to locate the region usingsentry_organizationmapping
.ByOrganizationId
will extract theorganization_id
parameter and use it to locate the organization’s region usingsentry_organizationmapping
ByRegionName
uses theregion_name
parameter to choose a destination region.ByOrganizationIsAttribute(param)
Will extractorganization_id
fromparam
to resolve the region.
The implementation for get_org_by_slug
looks like:
from sentry.organizations.services.organization.service import OrganizationService
from sentry.organizations.services.organization.model import RpcOrganizationSummary
from sentry.models.organization import Organization, OrganizationStatus
class DatabaseBackedOrganizationService(OrganizationService):
def get_org_by_slug(
self,
*,
slug: str,
user_id: int | None = None,
) -> RpcOrganizationSummary | None:
query = Organization.objects.filter(slug=slug)
if user_id is not None:
query = query.filter(
status=OrganizationStatus.ACTIVE,
member_set__user_id=user_id,
)
try:
return serialize_organization_summary(query.get())
except Organization.DoesNotExist:
return None
RPC method implementations are simple python methods, but there are a few rules that should be followed:
- All parameters and return values must either be simple values, or
RpcModel
instances. This ensures that requests and responses can be serialized. - Response values should be scalar values or
RpcModel
instances. Avoid tuples or lists that are not sequences of objects.
RPC Authentication
All cross-region RPC requests go to a single endpoint /api/0/internal/rpc/:service/:method
The RPC endpoint uses HMAC based signature comparison to validate authenticity. Before a request is made, the request body has an HMAC signature created using a secret that is shared by region and control silo instances.
When an RPC message is received, the signature header is compared with a locally generated HMAC using the receiver’s secret. The request is only processed if the checksums match.
RPC serialization and transport
Cross-Region RPC uses pydantic and openapi to provide typehinted parameter and response values, and fulfill our requirements for schema generation and validation. How RPC methods behave varies based on how the application is configured.
In monolith mode
- RPC methods are invoked synchronously on the implementation service.
In tests
- During tests we emulate as much of the RPC stack as we can.
- Request bodies and responses are serialized as JSON.
- Requests are made with
django.test.Client
to the RPC endpoint.
In siloed mode
- If a service is in the local silo mode, no serialization or network requests are made.
- Request bodies and response are serialized as JSON
- Requests are made using
requests
and the region configuration.
Adding and changing RPC methods
RPC methods must maintain backwards compatibility between deploys. Because deploys are not atomic, and updating all regions takes time, making changes to RPC methods requires additional care.
Adding RPC methods
You cannot add an RPC method and call it in the same pull request or deploy. Doing so can result in callers being updated before the receiving side has been updated. Instead do the following:
- Deploy the new RPC method to all regions.
- Deploy callers to the new RPC method.
Adding parameters to RPC methods
You cannot add new required parameters to existing RPC methods. To add a new parameter to an RPC method do the following:
- Deploy the new parameter with a default value.
- Update all callsites to provide a value for the parameter. Deploy those changes.
- Make the parameter required by removing the default value.
Removing an RPC method
- Remove all callers of the RPC method. Deploy those changes.
- Once there are no callers, the RPC method is safe to remove.
Removing parameters from an RPC method
- Make the parameter optional. Deploy that change.
- Remove all usage of the parameter. Deploy those changes.
- Remove the parameter.
Renaming RPC methods or parameters
Renaming methods and parameters requires adding a new method, shifting callers to the new method and removing the old method.