Modernizing GraphQL at scale: How Pantheon implemented secure, scalable federation with Grafbase

Pantheon, a leader in website operations for WordPress, Drupal and Next.js sites, had been relying on a custom-built Apollo Server gateway to power its GraphQL Federation for years. The system worked until it didn’t. As more teams adopted GraphQL and the number of GraphQL servers grew, changes became fragile, collaboration broke down, and operational overhead ballooned.

To move faster and operate more reliably, the team turned to Grafbase.

The Apollo Server gateway federated multiple domain-aligned subgraphs, but growth exposed several limits:

  • Manual schema stitching and fragile change management.
  • Minimal observability and limited OpenTelemetry exports.
  • Repeated, inconsistent implementations of auth and rate limiting across teams.
  • Steep learning curve: only a few engineers understood the full system.

We were spending entire sprints just making safe schema changes. Everyone had to understand how the whole system was stitched together. It was not scalable.
Principal Software Engineer Alejandro Baez, who leads GraphQL architecture at Pantheon

Pantheon is not in the business of building GraphQL infrastructure, so engineering time spent on platform maintenance was time lost on delivering customer value.

After a thorough evaluation, Pantheon chose Grafbase for several concrete capabilities that mapped directly to their pain points.

  • Native support for the Apollo Federation v2 spec.
  • Self-hosted, air-gapped deployment options suitable for enterprise security models.
  • Flexible extension system that lets teams add or customize behavior without forking core code.
  • Built-in observability, operational controls, and policy hooks.
  • A responsive engineering team able to collaborate on migration details.

Pantheon took an incremental path instead of a flag-day cutover. The team started with the core GraphQL service that fronts customer and provisioning data, which sits at the center of Pantheon’s domain model.

The core implementation took about a month or two. From there, we built extensions, onboarded new subgraphs, and began migrating old services one by one.
Principal Software Engineer Alejandro Baez

Key steps in the rollout included:

  1. Migrate the core service to Grafbase.
  2. Build extensions for authorization and JWT identity integration.
  3. Launch net new subgraphs directly on Grafbase.
  4. Incrementally migrate legacy subgraphs, cleaning up schemas as they move.

Grafbase’s support for multiple sandboxed instances was important: teams could validate schema and infrastructure changes in isolation, then promote them through staging to production with no cross-team disruption.

Pantheon now runs Grafbase as the federated gateway for its GraphQL estate. Federation v2 structures subgraphs by domain and ownership, including application, dashboard, and authorization-focused teams.

Collaboration flows through schema pull requests. Teams propose changes, attach the required permissions logic, and trigger automated validation before merge. They now use custom directives to scope authorization rules at the field level, which enables a more granular and maintainable way to enforce access control.
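
To make field-level scoping concrete, here is a minimal sketch of the idea. It models a schema as a mapping from `Type.field` to the scopes that a hypothetical `@permissions` directive would require, then splits a requested selection into allowed and denied fields. The directive name, the scope strings, the field names, and the `Caller` type are all assumptions for illustration; they are not Pantheon's actual schema or the Grafbase extension API.

```python
from dataclasses import dataclass

# Hypothetical mapping of field -> scopes required by a permissions directive.
# In SDL this might be declared as (assumed syntax):
#   billingEmail: String @permissions(scopes: ["billing:read"])
FIELD_SCOPES: dict[str, frozenset[str]] = {
    "Site.name": frozenset(),                          # no scope required
    "Site.billingEmail": frozenset({"billing:read"}),
    "Site.deployKey": frozenset({"site:admin"}),
}


@dataclass(frozen=True)
class Caller:
    subject: str
    scopes: frozenset[str]


def authorize_fields(caller: Caller, requested: list[str]) -> tuple[list[str], list[str]]:
    """Split requested fields into (allowed, denied) for this caller."""
    allowed: list[str] = []
    denied: list[str] = []
    for field_name in requested:
        required = FIELD_SCOPES.get(field_name)
        # Fail closed: a field with no declared policy is denied.
        # A field is allowed only if the caller holds every required scope.
        if required is not None and required <= caller.scopes:
            allowed.append(field_name)
        else:
            denied.append(field_name)
    return allowed, denied
```

Denying fields that have no declared policy is a design choice in this sketch: it means a new field cannot become readable by accident before someone attaches a rule to it.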

Schema evolution lifecycle:

  • Team proposes schema change by PR.
  • Authorization and other policies applied as directives.
  • Automated validation checks compatibility and policy coverage.
  • Independently deployable builds promote through environments.

Result: changes that once took 2-4 weeks now land in less than a day.
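
The automated validation step in the lifecycle above can be pictured as two checks: a compatibility check that flags removed fields or changed types, and a coverage check that flags fields with no policy attached. The following is a minimal sketch under assumed data shapes (a schema as a dict of `Type.field` to a hypothetical `FieldDef`). It does not reflect how Grafbase's schema checks are implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDef:
    type: str                     # GraphQL type as written, e.g. "String!"
    policy: Optional[str] = None  # assumed: scope named by a directive, None if absent


def breaking_changes(old: dict[str, FieldDef], new: dict[str, FieldDef]) -> list[str]:
    """Report fields that were removed or whose type changed.

    This is deliberately conservative: every type change is reported,
    even ones a real checker might accept as safe.
    """
    problems = []
    for name, old_def in sorted(old.items()):
        new_def = new.get(name)
        if new_def is None:
            problems.append(f"removed field {name}")
        elif new_def.type != old_def.type:
            problems.append(f"changed type of {name}: {old_def.type} -> {new_def.type}")
    return problems


def uncovered_fields(schema: dict[str, FieldDef]) -> list[str]:
    """Return fields that have no policy attached."""
    return sorted(name for name, d in schema.items() if d.policy is None)


def validate(old: dict[str, FieldDef], new: dict[str, FieldDef]) -> list[str]:
    """Run both checks; an empty list means the proposed schema may merge."""
    return breaking_changes(old, new) + [
        f"missing policy on {name}" for name in uncovered_fields(new)
    ]
```

Run in CI against the currently deployed schema and the schema proposed in a pull request, a check like this gives a team a list of problems it can fix without having to understand the whole graph.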

Grafbase’s extension framework is central to the Pantheon deployment. Teams write extensions in Rust and compile to WebAssembly.

Implemented and customized extensions include:

  • Permissions directive: attaches access scopes per field and routes to Pantheon identity services.
  • JWT auth (forked): adapted to Pantheon’s legacy identity model for backward compatibility.
  • Federated subscriptions via NATS: built on a customized Grafbase NATS extension to support live platform events.
  • Rate limiting and operation complexity controls: tuned per subgraph and user type to guard shared infrastructure.
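
As an illustration of rate limiting tuned per subgraph and user type, here is a minimal token-bucket sketch. Pantheon's extensions are written in Rust and their internals are not described here, so the algorithm choice, the `LIMITS` table, and all names below are assumptions. The `cost` parameter shows one way an operation complexity estimate could feed into the same decision.

```python
import time


class TokenBucket:
    """Classic token bucket: refills continuously up to a fixed capacity."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical limits: (capacity, refill per second) keyed by (subgraph, user type).
LIMITS = {
    ("sites", "free"): (2, 1.0),
    ("sites", "enterprise"): (100, 50.0),
}
DEFAULT_LIMIT = (1, 0.5)


class RateLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the limiter can be tested deterministically
        self._buckets = {}

    def allow(self, subgraph: str, user_type: str, user_id: str, cost: float = 1.0) -> bool:
        """Decide whether one operation of the given cost may proceed."""
        now = self._clock()
        key = (subgraph, user_type, user_id)
        bucket = self._buckets.get(key)
        if bucket is None:
            capacity, refill = LIMITS.get((subgraph, user_type), DEFAULT_LIMIT)
            bucket = self._buckets[key] = TokenBucket(capacity, refill, now)
        return bucket.allow(now, cost)
```

Each user gets a separate bucket per subgraph, so one noisy client exhausts only its own budget, and an expensive operation can be charged more than one token.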

Connecting Grafbase to Pantheon’s existing telemetry stack was an early milestone. OpenTelemetry exporters feed traces into Grafana, giving full request visibility from client through the gateway into downstream systems.

The ability to follow a request across the full stack is huge. We can see where latency comes from, whether it is in a subgraph or a downstream service.
Principal Software Engineer Alejandro Baez
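
Following a request across the stack depends on every hop forwarding the same trace identifier. A common mechanism is the W3C Trace Context `traceparent` header, which OpenTelemetry SDKs propagate by default; whether Pantheon's setup uses exactly this header is an assumption. The sketch below parses an incoming header and derives the header a gateway would forward to a subgraph. It handles only the version `00` layout, and the function names are illustrative.

```python
import re
import secrets
from typing import Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the parsed trace context, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(incoming: Optional[str]) -> str:
    """Build the header to forward downstream.

    Keeps the caller's trace ID so all spans join one trace, and issues a
    fresh span ID for this hop. Starts a new sampled trace if the incoming
    header is missing or invalid (the sampling choice is an assumption).
    """
    ctx = parse_traceparent(incoming) if incoming else None
    trace_id = ctx["trace_id"] if ctx else secrets.token_hex(16)
    flags = ctx["flags"] if ctx else "01"
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```

Because the trace ID survives every hop, a tracing backend can assemble client, gateway, subgraph, and downstream spans into a single timeline and show where the latency sits.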

Measured improvements:

  • Subgraph resolver execution in single-digit microseconds for common paths.
  • Gateway-level rate limiting decisions in under one millisecond.
  • Automatic persisted queries reduce duplicate traffic and improve cache hit rates.
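
For readers unfamiliar with automatic persisted queries, the sketch below shows the server side of the commonly used protocol. The client first sends only a SHA-256 hash of the query. If the server has not seen it, it replies `PersistedQueryNotFound`, and the client retries once with the full query text, which the server then registers. The in-memory store and the class name are assumptions for illustration; this is not Grafbase's implementation.

```python
import hashlib


def sha256_hex(query: str) -> str:
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Resolves a GraphQL request body to query text using the APQ handshake."""

    def __init__(self):
        self._by_hash = {}

    def resolve(self, request: dict) -> str:
        persisted = request.get("extensions", {}).get("persistedQuery")
        query = request.get("query")

        if persisted is None:
            # Plain request without APQ.
            if query is None:
                raise ValueError("request has neither query nor persistedQuery")
            return query

        digest = persisted["sha256Hash"]
        if query is None:
            # Hash-only request: succeeds only if the query was registered before.
            if digest not in self._by_hash:
                raise LookupError("PersistedQueryNotFound")
            return self._by_hash[digest]

        # Hash plus query: verify the hash before registering the text.
        if sha256_hex(query) != digest:
            raise ValueError("sha256Hash does not match query")
        self._by_hash[digest] = query
        return query
```

After the first round trip, clients send a 64-character hash instead of the full query text. Identical operations also share one stable key, which is what helps cache hit rates.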

Grafbase reduced the specialized knowledge needed to contribute. This has enabled:

  • Faster onboarding for new engineers.
  • Cleaner cross-team boundaries and collaboration through schema PRs.
  • Higher confidence in production deployments due to automated checks and isolated sandboxes.
  • Lower cognitive load on platform specialists.

Grafbase is now the standard for all new public-facing services at Pantheon. Any new API work must be federated through the Grafbase gateway, with auth, observability, and schema management built in.

Upcoming focus areas include:

  • Migrating the remaining legacy subgraphs.
  • Scaling federated subscriptions across more teams.
  • Continuing to expand the extension library for internal needs.

The benefits are…clear. Grafbase hasn’t just modernized our GraphQL stack. It’s helped us build a faster, more resilient engineering organization.
Principal Software Engineer Alejandro Baez

Ready to modernize your GraphQL infrastructure?

Get in touch with the Grafbase team to learn how we can help your engineering org move faster, improve reliability, and scale with confidence.