Logging and OpenTelemetry in the Grafbase Gateway
The Grafbase Gateway provides logs, traces, and metrics for monitoring gateway operations and errors. By default, it outputs logs to standard output. Additionally, the gateway can send monitoring data to an endpoint that implements the OpenTelemetry protocols.
You can define the level of information by setting the log level command line argument:
--log <LOG_LEVEL>
Set the logging level, this applies to all spans, logs and trace events.
Beware that *only* 'off', 'error', 'warn' and 'info' can be used safely in
production. More verbose levels, such as 'debug', will include sensitive
information like request variables, responses, etc.
Possible values are: 'off', 'error', 'warn', 'info', 'debug', 'trace' or a
custom string. In the last case, the string is passed on to
[`tracing_subscriber::EnvFilter`] as is and is only meant for debugging
purposes. No stability guarantee is made on the format.
[env: GRAFBASE_LOG=]
[default: info]
This setting affects both traces and logs. The default level is info. The debug and trace levels include sensitive details and should not be used in production.
If you want to silence all logs but still export them along with traces and metrics to an OpenTelemetry endpoint, redirect standard output and standard error to /dev/null.
The gateway can return the query plan and trace id in the GraphQL response extensions under the grafbase key for Pathfinder, our GraphQL query tool. This is enabled by default, and the extension is returned whenever the x-grafbase-telemetry request header is present:
# Default configuration
[telemetry.exporters.response_extension]
# Whether the traceId is exposed
trace_id = true
# Whether the queryPlan is exposed
query_plan = true
# Defines who can access the grafbase response extension.
[[telemetry.exporters.response_extension.access_control]]
rule = "header"
name = "x-grafbase-telemetry"
Access can be denied for everyone with:
[[telemetry.exporters.response_extension.access_control]]
rule = "deny"
It is also possible to require a specific value for the header. Only requests with the right header value receive the grafbase extension; all others do not.
[[telemetry.exporters.response_extension.access_control]]
rule = "header"
name = "x-grafbase-telemetry"
value = "must-be-this-value"
Environment variables can be used to parameterize the configuration:
[[telemetry.exporters.response_extension.access_control]]
rule = "header"
name = "{{ env.HEADER_NAME }}"
value = "{{ env.SECRET }}"
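The options above can be combined. The following is a minimal sketch, assuming you want to expose only the trace id (not the query plan) and only to callers that send a secret header value. The environment variable name TELEMETRY_SECRET is a hypothetical placeholder:

```toml
[telemetry.exporters.response_extension]
# Expose the trace id, but keep the query plan private.
trace_id = true
query_plan = false

# Only requests carrying the expected header value receive the extension.
[[telemetry.exporters.response_extension.access_control]]
rule = "header"
name = "x-grafbase-telemetry"
# TELEMETRY_SECRET is an illustrative variable name, not a built-in one.
value = "{{ env.TELEMETRY_SECRET }}"
```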
By default, the gateway outputs logs to standard output. Logs can appear in three different styles:
--log-style <LOG_STYLE>
Set the style of log output
[env: GRAFBASE_LOG_STYLE=]
[default: pretty]
Possible values:
- pretty: Pretty printed logs, used as the default in the terminal
- text: Standard text, used as the default when piping stdout to a file
- json: JSON objects
The default style inside a terminal is pretty, which provides ANSI-colored, human-friendly formatting. When piping to a file, text is used instead. The json style delivers logs as JSON objects, which can be useful if the logging platform supports structured data.
Logs can also be sent to an OpenTelemetry endpoint by enabling the OpenTelemetry exporter in the configuration:
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
You can send logs to a different endpoint than the global OpenTelemetry settings:
[telemetry.logs.exporters.otlp]
enabled = true
endpoint = "http://localhost:1235"
Read more about OpenTelemetry options in the configuration section.
Grafbase Gateway monitors the request lifecycle by providing traces. When you supply a valid access token in the GRAFBASE_ACCESS_TOKEN environment variable, the system automatically sends traces to the Grafbase Dashboard or Grafbase Enterprise platform. The dashboard only displays traces from the Grafbase Gateway. To send traces to a different OpenTelemetry endpoint, configure it in the configuration file. A third-party telemetry platform allows you to combine traces from the gateway with other services in your platform. Traces provide information on the request lifecycle and send data to the OpenTelemetry endpoint from the info level.
You can change settings for tracing in the gateway configuration:
[telemetry.tracing]
sampling = 1
parent_based_sampler = false
- sampling: Defines the fraction of requests to trace (a floating point number from 0 to 1). Set this to 1 for testing purposes or with low traffic. For high traffic, sampling every request can be expensive for network, CPU, and storage (default: 0.15).
- parent_based_sampler: Enables the parent-based sampler mechanism. When enabled, the gateway looks at the request headers to make trace sampling decisions. It falls back to its default sampling strategy when the request doesn't specify a sampling strategy. Only enable it if you control all the clients, because malicious actors could create more load by manipulating sampling (default: false).
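As a minimal sketch for a high-traffic deployment, assuming that tracing about 5% of requests is enough for your needs (the value 0.05 is illustrative, not a recommendation from this documentation), the tracing settings could look like this:

```toml
[telemetry.tracing]
# Trace roughly 1 in 20 requests to limit network, CPU, and storage cost.
sampling = 0.05
# Keep header-driven sampling off unless you control every client.
parent_based_sampler = false
```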
[telemetry.tracing.collect]
max_events_per_span = 128
max_attributes_per_span = 128
max_links_per_span = 128
max_attributes_per_event = 128
max_attributes_per_link = 128
- max_events_per_span: Maximum number of events recorded per span (default: 128).
- max_attributes_per_span: Maximum number of attributes recorded per span (default: 128).
- max_links_per_span: Maximum number of links recorded per span (default: 128).
- max_attributes_per_event: Maximum number of attributes one event can have (default: 128).
- max_attributes_per_link: Maximum number of attributes one link can have (default: 128).
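If spans grow too large for your collector, these limits can be lowered. The sketch below uses illustrative values (assumptions, not recommendations) and sets only two of the keys; the others are left at their documented default of 128:

```toml
[telemetry.tracing.collect]
# Illustrative limits; tune them to what your collector can store.
max_events_per_span = 32
max_attributes_per_span = 64
```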
The propagation options determine how the gateway propagates tracing context (trace id, parent span id, and extra context) when it receives requests and passes them to subgraphs. The gateway supports multiple common standards. Contact us if you need support for additional formats.
Propagation enables you to link spans created in the gateway with spans created in other services. This helps you debug and monitor the entire request lifecycle. The Grafbase dashboard doesn't propagate traces. To propagate traces, define an additional OpenTelemetry endpoint in the gateway configuration.
[telemetry.tracing.propagation]
trace_context = true
baggage = true
aws_xray = false
- trace_context: Enables TraceContext propagation through the traceparent header. This is the standard trace parent propagation mechanism in OpenTelemetry (default: false).
- baggage: Enables Baggage context propagation through the baggage header. This is the standard context propagation mechanism in OpenTelemetry (default: false).
- aws_xray: Enables AWS X-Ray propagation through the x-amzn-trace-id header. This is the built-in trace propagation mechanism in AWS X-Ray.
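For example, a gateway that sits behind services emitting X-Ray headers might enable only X-Ray propagation and export traces to its own collector. This is a sketch, assuming a collector listening on localhost:1234; the endpoint is a placeholder:

```toml
[telemetry.tracing.propagation]
# Only read and forward the x-amzn-trace-id header.
trace_context = false
baggage = false
aws_xray = true

# Propagated traces need an OpenTelemetry endpoint other than the Grafbase dashboard.
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
```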
Enable the OpenTelemetry exporter in the configuration to send traces to an additional OpenTelemetry endpoint:
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
You can also send traces to a different endpoint than the global value:
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://localhost:1235"
Read more about OpenTelemetry options in the OpenTelemetry configuration.
To write spans directly to standard output, turn on the global stdout exporter. This helps during evaluation and debugging:
[telemetry.exporters.stdout]
enabled = true
To enable the stdout exporter only for tracing, use:
[telemetry.tracing.exporters.stdout]
enabled = true
The Grafbase Gateway emits spans at certain points in the request execution.
All spans have the following default attributes:
- busy_ns: Time the span remained active in nanoseconds.
- code.filepath: Code file path.
- code.lineno: Line number in the code where this span originated.
- code.namespace: Module name.
- idle_ns: Time the span remained idle in nanoseconds.
- thread.id: Runtime thread ID.
- thread.name: Runtime thread name.
Span name: <VERB> <PATH>.
The root span monitors the complete request lifecycle. Additional spans descend from this root span.
Attributes:
- grafbase.kind: The span kind, which is always http-request for the root span.
- graphql.operations.name: The name or names of the executed operation(s).
- graphql.operations.type: The type or types of the executed operation(s).
- graphql.response.errors.count: The number of errors in the response.
- graphql.response.errors.count_by_code.codes: Distinct error codes in the response.
- graphql.response.errors.count_by_code.counts: The number of errors for each distinct error code.
- http.request.body.size: The size of the request body.
- http.request.header.x-forwarded-for: The client IP address.
- http.request.header.x-grafbase-client-name: The name of the client.
- http.request.header.x-grafbase-client-version: The version of the client.
- http.request.method: The HTTP method.
- http.response.body.size: The size of the response body.
- server.address: The server address.
- url.path: The URL path.
- user_agent.original: The user agent.
Span name: hook: on-gateway-request.
You'll only see this span if you define the on-gateway-request hook.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: authenticate.
You'll only see this span if you enable authentication.
Span name: rate limit.
You'll only see this span if you enable global rate limiting with Redis storage.
Span name: <OPERATION_NAME>.
Operation execution spans are created for each operation in the request.
Attributes:
- grafbase.kind: The span kind, which is always graphql-operation for operation execution spans.
- grafbase.operation.computed_name: The name of the operation. For named operations, this shows the operation name. For unnamed operations, this shows a name derived from the query.
- graphql.operation.document: The normalized query, with all possible data hidden.
- graphql.operation.type: The type of the operation: query, mutation or subscription.
- graphql.response_data.is_present: Whether the response data is present.
- graphql.response.errors.count_by_code: Distinct error codes in the response with their counts.
Span name: prepare operation.
This span plans the operation execution and operates as a child of the operation execution span.
Span name: hook: authorize-edge-pre-execution.
When edges use an @authorized directive with an arguments argument, the gateway shows this span under the operation execution span and repeats it on each accessed edge that has a matching directive.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: hook: authorize-node-pre-execution.
When nodes use an @authorized directive with an arguments argument, the gateway shows this span under the operation execution span and repeats it on each accessed node that has a matching directive.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: <SUBGRAPH_NAME>.
The gateway creates subgraph execution spans for each subgraph in the request. These spans become children of the operation execution span.
Attributes:
- subgraph.name: The name of the subgraph.
- grafbase.kind: The span kind, which is always subgraph-graphql-request for subgraph execution spans.
- graphql.operation.document: The query the subgraph receives. The system replaces all data with variables to prevent exposure of sensitive data.
- graphql.operation.type: The type of the operation: query, mutation or subscription.
- graphql.response_data.is_present: Whether the response data is present.
Span name: hook: on-subgraph-request.
This span appears when you define the on-subgraph-request hook and operates as a child of the subgraph execution span.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: rate limit.
This span appears when you enable rate limiting to the subgraph with Redis storage.
Attributes:
- subgraph.name: The name of the subgraph.
Span name: POST <PATH>.
This span tracks HTTP requests to the subgraph. When you enable retries and requests fail, the gateway creates multiple spans. All spans act as children of the subgraph execution span.
Attributes:
- http.request.method: The HTTP method.
- http.response.status_code: The HTTP status code.
- server.address: The server address.
- server.port: The server port.
Span name: hook: on-subgraph-response.
The gateway creates this span when you define the on-subgraph-response hook. This span acts as a child of the subgraph execution span.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: hook: on-operation-response.
The gateway creates this span when you define the on-operation-response hook. This span acts as a child of the operation execution span.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: hook: authorize-parent-edge-post-execution.
When edges use an @authorized directive with a fields argument, the gateway shows this span under the root span and repeats it on each accessed edge that has a matching directive.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: hook: authorize-edge-node-post-execution.
When nodes use an @authorized directive with a node argument, the gateway shows this span under the root span and repeats it on each accessed node that has a matching directive.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
Span name: hook: on-http-response.
The gateway creates this span when you define the on-http-response hook. This span acts as a child of the root span.
The span has two child spans:
- get instance from the pool: Taking the instance from a pool.
- call instance: Execution of the actual instance code.
The Grafbase Gateway delivers metrics for requests and operations to an OpenTelemetry endpoint. Metrics include counters, histograms, and gauges at various points in the system.
Enable the OpenTelemetry exporter in the configuration to send metrics to an OpenTelemetry endpoint:
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
You can send metrics to a separate endpoint as well:
[telemetry.metrics.exporters.otlp]
enabled = true
endpoint = "http://localhost:1235"
Read more about OpenTelemetry options in the configuration section.
You can also write metrics directly to standard output by enabling the global stdout exporter for evaluation and debugging:
[telemetry.exporters.stdout]
enabled = true
Enable it only for metrics if needed:
[telemetry.metrics.exporters.stdout]
enabled = true
The exponential histograms include a Count field, which lets any histogram double as a counter metric. If you can't find a specific counter, check if any of the histograms can serve that purpose.
Metric Name: http.server.request.duration
This exponential histogram measures the time in milliseconds for each HTTP request and helps you track the final response time for those requests. It includes the following attributes:
- http.response.status_code: The HTTP status code.
- http.request.method: The HTTP request method.
- http.route: The request path.
- network.protocol.version: The HTTP version of the request.
- server.address: The server's listen address.
- server.port: The server's listen port.
- url.scheme: Either http or https, depending on whether TLS is enabled in the gateway.
- http.headers.x-grafbase-client-name: The name of the client that triggered this request, if available.
- http.headers.x-grafbase-client-version: The version of the client that triggered this request, if available.
- graphql.response.status: Indicates whether the underlying GraphQL operation succeeded, if available.
Metric Name: http.server.connected.clients
This up/down counter tracks currently connected clients, incrementing on an incoming request and decrementing upon any response.
Metric Name: http.server.request.body.size
This exponential histogram measures request body sizes.
Metric Name: http.server.response.body.size
This exponential histogram measures response body sizes.
Metric Name: graphql.operation.duration
This exponential histogram measures the time in milliseconds for every valid operation in the GraphQL engine. The metric includes the following attributes:
- graphql.document: The normalized query of this operation, stripped of all variables. This value cannot contain any private data.
- graphql.operation.type: The type of the operation (either query, mutation, or subscription).
- graphql.operation.name: The name of the operation, if provided.
- graphql.response.status: Indicates if the response succeeded.
- http.headers.x-grafbase-client-name: The name of the client that triggered this request, if available.
- http.headers.x-grafbase-client-version: The version of the client that triggered this request, if available.
Metric Name: graphql.operation.errors
This counter tracks distinct GraphQL errors per request. The metric contains the following attributes:
- graphql.response.error.code: The error code returned to the user.
- graphql.operation.name: The name of the operation, if present.
- http.headers.x-grafbase-client-name: The name of the client, if present.
- http.headers.x-grafbase-client-version: The version of the client, if present.
Metric Name: graphql.operation.batch.size
This exponential histogram measures the size of request batches sent to the engine. Its count gives the total number of batched requests, while its recorded values give the number of requests in each batch.
Metric Name: graphql.subgraph.request.duration
This exponential histogram measures the time in milliseconds for every subgraph request. It helps track execution time and includes the following attributes:
- graphql.subgraph.name: The requested subgraph's name.
- graphql.subgraph.response.status: Indicates if the response succeeded.
- http.response.status_code: The HTTP status code.
Metric Name: graphql.subgraph.request.retries
This counter tracks retried subgraph requests. To enable this counter, you must enable retries. The counter increments when a subgraph request fails and the engine retries it. The metric includes the following attributes:
- graphql.subgraph.name: The requested subgraph's name.
- graphql.subgraph.aborted: Indicates whether the retries stopped and the request resulted in an error.
Metric Name: graphql.subgraph.request.body.size
This exponential histogram measures subgraph request body sizes in bytes. The metric includes the following attribute:
- graphql.subgraph.name: The requested subgraph's name.
Metric Name: graphql.subgraph.response.body.size
This exponential histogram measures successful subgraph response body sizes in bytes. The metric includes the following attribute:
- graphql.subgraph.name: The requested subgraph's name.
Metric Name: graphql.subgraph.request.inflight
This up/down counter tracks in-flight subgraph requests. It increments when requesting a subgraph and decrements upon any response. The metric includes the following attribute:
- graphql.subgraph.name: The requested subgraph's name.
Metric Name: graphql.subgraph.request.cache.hit
This counter tracks hits of subgraph entity caches. Enable this counter by activating entity caching. The metric includes the following attribute:
- graphql.subgraph.name: The requested subgraph's name.
Metric Name: graphql.subgraph.request.cache.miss
This counter tracks misses of subgraph entity caches. Enable this counter by activating entity caching. The metric includes the following attribute:
- graphql.subgraph.name: The requested subgraph's name.
Metric Name: graphql.operation.cache.hit
This counter tracks hits for operation plan caches.
Metric Name: graphql.operation.cache.miss
This counter tracks misses for operation plan caches.
Metric Name: graphql.operation.prepare.duration
This exponential histogram measures the time in milliseconds taken to prepare an operation. This includes:
- Fetching a trusted document, if enabled and available.
- Fetching a query plan from the in-memory cache.
- If the plan is not cached, parsing the query into an AST and then determining the plan.
The metric includes the following attributes:
- graphql.operation.name: The name of the operation, if present.
- graphql.document: The normalized operation if parsing succeeds.
- graphql.operation.success: Indicates if the preparation finished successfully.
Metric Name: grafbase.hook.duration
This exponential histogram measures the time in milliseconds taken to execute a hook. The metric includes the following attributes:
- grafbase.name.hook: The name of the hook function.
- grafbase.hook.status: Indicates if the hook call succeeded (SUCCESS), or if it failed due to errors from Grafbase code (HOST_ERROR), or from user code (GUEST_ERROR).
Metric Name: grafbase.hook.pool.instances.busy
This counter counts the number of active instances in the hook instance pool. Each instance processes one request at a time, which is why the gateway utilizes a pool to handle multiple requests concurrently. The metric includes the following attributes:
- grafbase.hook.interface: The instantiated hook interface.
Metric Name: grafbase.gateway.access_log.pending
This counter measures the number of access log events not yet written to the access log file. Read more on access logs.
Metric Name: grafbase.gateway.rate_limit.duration
This exponential histogram measures the time in milliseconds taken to query the current request rate from Redis. This metric requires Redis-based rate limiting to be enabled.
Metric Name: gdn.request.duration
This exponential histogram measures the time in milliseconds to fetch a graph from the Graph Delivery Network. This metric only activates in hybrid mode. The metric includes the following attributes:
- server.address: The Graph Delivery Network endpoint URL.
- gdn.response.kind: The response status kind, one of new, unchanged, http_error, or gdn_error.
- http.response.status_code: The status code of the request.
Define OpenTelemetry settings in the telemetry block of your Gateway configuration:
[telemetry]
service_name = "grafbase-gateway"
The service_name appears in all traces, metrics, and logs and should be unique in your system.
Grafbase sends a standard set of resource attributes for every user. You can also define your own attributes, available in all logs, traces, and metrics:
[telemetry.resource_attributes]
custom_key = "custom_value"
other_key = "other_value"
You can define exporter settings globally for traces, logs, and metrics. If you need different settings for logs, tracing, or metrics, prefix the exporter settings with the appropriate word. For instance, custom settings for tracing use the key telemetry.tracing.exporters.otlp.
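As a sketch of this prefixing rule, the configuration below sends logs and metrics to one collector and traces to another. Both endpoints are placeholders, and the reading that a signal-specific block applies in place of the global one for that signal is an interpretation of the description above, not verified behavior:

```toml
[telemetry]
service_name = "grafbase-gateway"

# Global exporter: used by every signal without its own settings.
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
protocol = "grpc"

# Tracing-specific exporter: traces go to a separate collector.
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://localhost:1235"
```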
Traces and metrics can also be sent to standard output (logs are always written there):
[telemetry.exporters.stdout]
enabled = true
timeout = 60
- enabled: Enables the stdout exporter (default: false).
- timeout: Time in seconds data remains in memory if the collector does not collect it promptly (default: 60).
Send traces, metrics, and logs to an external OpenTelemetry collector:
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
protocol = "grpc"
timeout = 60
- enabled: Enables the OpenTelemetry exporter (default: false).
- endpoint: Defines the URL for the OpenTelemetry collector.
- protocol: Either grpc or http (default: grpc).
- timeout: Time in seconds data remains in memory if the collector does not collect it promptly (default: 60).
Avoid triggering a request for every single span, trace, and metric event. Instead, batch requests and send data at regular intervals. Configure the OpenTelemetry batch settings:
[telemetry.exporters.otlp.batch_export]
scheduled_delay = 5
max_queue_size = 2048
max_export_batch_size = 512
max_concurrent_exports = 1
- scheduled_delay: Time in seconds between consecutive requests (default: 5).
- max_queue_size: Maximum queued items for delayed processing. If the queue fills, the system drops events (default: 2048).
- max_export_batch_size: Maximum number of events in a single batch. If more events are collected before the scheduled delay, the system queues them (default: 512).
- max_concurrent_exports: Number of concurrent senders processing batches (default: 1).
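If the gateway drops events under bursts of traffic, one option is to export more often and enlarge the queue. The values in this sketch are illustrative assumptions rather than tested recommendations, so measure the effect on your collector before relying on them:

```toml
[telemetry.exporters.otlp.batch_export]
# Export every 2 seconds instead of the default 5.
scheduled_delay = 2
# Hold up to four times the default number of events before dropping any.
max_queue_size = 8192
# Keep the default batch size, but allow two senders in parallel.
max_export_batch_size = 512
max_concurrent_exports = 2
```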
If using grpc as the protocol, the Gateway will use the following settings.
For collectors using TLS with a custom certificate, specify the TLS settings:
[telemetry.exporters.otlp.grpc.tls]
domain_name = "custom_name"
key = "/path/to/key.pem"
cert = "/path/to/cert.pem"
ca = "/path/to/ca.crt"
- domain_name: The domain name against which to verify the server's TLS certificate.
- key: Path to the secret key.
- cert: Path to the X509 certificate file in PEM format.
- ca: Path to the X509 CA certificate file in PEM format.
If needed, define custom headers for gRPC collectors:
[[telemetry.exporters.otlp.grpc.headers]]
authorization = "Bearer {{ env.GRPC_TOKEN }}"
[[telemetry.exporters.otlp.grpc.headers]]
custom = "static value"
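Put together, a gRPC exporter with TLS and an authorization header might look like the sketch below. The collector host name is hypothetical, the https scheme in the endpoint is an assumption, and 4317 is only the conventional OTLP gRPC port:

```toml
[telemetry.exporters.otlp]
enabled = true
# Hypothetical collector address; replace it with your own.
endpoint = "https://otel-collector.example.com:4317"
protocol = "grpc"

[telemetry.exporters.otlp.grpc.tls]
# Must match the name in the collector's certificate.
domain_name = "otel-collector.example.com"
ca = "/path/to/ca.crt"
cert = "/path/to/cert.pem"
key = "/path/to/key.pem"

[[telemetry.exporters.otlp.grpc.headers]]
authorization = "Bearer {{ env.GRPC_TOKEN }}"
```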
If you set the protocol to http, the Gateway will use the following settings. Define custom headers to send with every request:
[[telemetry.exporters.otlp.http.headers]]
authorization = "Bearer {{ env.GRPC_TOKEN }}"
[[telemetry.exporters.otlp.http.headers]]
custom = "static value"
Currently, the http exporter does not support TLS. If you need TLS, use the grpc exporter.
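For completeness, a full HTTP exporter configuration could look like the following sketch. The endpoint is a placeholder (4318 is the conventional OTLP HTTP port), and OTLP_TOKEN is a hypothetical environment variable name:

```toml
[telemetry.exporters.otlp]
enabled = true
# Plain http only: the http exporter does not support TLS.
endpoint = "http://localhost:4318"
protocol = "http"
timeout = 60

[[telemetry.exporters.otlp.http.headers]]
# OTLP_TOKEN is an illustrative variable name.
authorization = "Bearer {{ env.OTLP_TOKEN }}"
```

Because the request is not encrypted, sending a bearer token this way is only reasonable on a trusted network, such as a collector running on the same host.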