Metrics Reference

Complete reference of all Prometheus metrics exposed by the Rastir collector server on the /metrics endpoint.

All metrics are derived server-side from ingested span data. The client library does not expose a metrics endpoint.

Span Counters

Core counters tracking span ingestion and call volumes.

Metric	Type	Labels	Description
`rastir_spans_ingested_total`	Counter	service, env, span_type, status	Total spans ingested
`rastir_llm_calls_total`	Counter	service, env, model, provider, agent	LLM invocations
`rastir_tokens_input_total`	Counter	service, env, model, provider, agent	Input (prompt) tokens
`rastir_tokens_output_total`	Counter	service, env, model, provider, agent	Output (completion) tokens
`rastir_tool_calls_total`	Counter	service, env, tool_name, agent, model, provider	Tool invocations
`rastir_retrieval_calls_total`	Counter	service, env, agent, model, provider	Retrieval operations
`rastir_errors_total`	Counter	service, env, span_type, error_type, model, provider, agent	Error spans by normalised category

Histograms

Histograms track the distribution of values, enabling percentile calculations via PromQL’s histogram_quantile().

Metric	Type	Labels	Default Buckets	Unit
`rastir_duration_seconds`	Histogram	service, env, span_type, model, provider	0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0	seconds
`rastir_tokens_per_call`	Histogram	service, env, model, provider	10, 50, 100, 250, 500, 1000, 2000, 4000, 8000, 16000, 32000	tokens
`rastir_cost_per_call_usd`	Histogram	service, env, model	0.0001, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100	USD
`rastir_ttft_seconds`	Histogram	service, env, model, provider	0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10	seconds

Maximum of 20 buckets per histogram. Buckets are configurable via YAML or environment variables — see Configuration.

What Prometheus Exposes

For each histogram, Prometheus creates:

rastir_duration_seconds_bucket{..., le="0.25"} → count of spans ≤ 0.25s
rastir_duration_seconds_bucket{..., le="1.0"}  → count of spans ≤ 1.0s
rastir_duration_seconds_bucket{..., le="+Inf"} → total count
rastir_duration_seconds_sum{...}               → sum of all values
rastir_duration_seconds_count{...}             → same as +Inf bucket

Percentile Queries — P50, P95, P99

# P50 (median) LLM call duration
histogram_quantile(0.50,
  rate(rastir_duration_seconds_bucket{span_type="llm"}[5m])
)

# P95 LLM call duration
histogram_quantile(0.95,
  rate(rastir_duration_seconds_bucket{span_type="llm"}[5m])
)

# P99 LLM call duration — tail latency
histogram_quantile(0.99,
  rate(rastir_duration_seconds_bucket{span_type="llm"}[5m])
)

# P95 duration per model
histogram_quantile(0.95,
  sum by (model, le) (
    rate(rastir_duration_seconds_bucket{span_type="llm"}[5m])
  )
)

# P50 tokens per LLM call
histogram_quantile(0.50,
  rate(rastir_tokens_per_call_bucket[5m])
)

# P95 tool execution time
histogram_quantile(0.95,
  rate(rastir_duration_seconds_bucket{span_type="tool"}[5m])
)

Average & Throughput Queries

# Average LLM call duration
rate(rastir_duration_seconds_sum{span_type="llm"}[5m])
  /
rate(rastir_duration_seconds_count{span_type="llm"}[5m])

# LLM calls per second
rate(rastir_llm_calls_total[5m])

# Error rate as a percentage
rate(rastir_errors_total[5m])
  /
rate(rastir_spans_ingested_total[5m]) * 100

# Average tokens per call
rate(rastir_tokens_per_call_sum[5m])
  /
rate(rastir_tokens_per_call_count[5m])

Cost & TTFT Metrics

Metric	Type	Labels	Description
`rastir_cost_total`	Counter	service, env, model, provider, agent, pricing_profile	Accumulated USD cost
`rastir_cost_per_call_usd`	Histogram	service, env, model	Cost distribution per LLM call
`rastir_pricing_missing_total`	Counter	service, env, model, provider	LLM calls missing pricing data
`rastir_ttft_seconds`	Histogram	service, env, model, provider	Time-To-First-Token for streaming calls

Cost metrics are only recorded when the client sends cost_usd as a span attribute (requires enable_cost_calculation=True). TTFT metrics are only recorded for streaming LLM spans that include ttft_ms.

The pricing_profile label on rastir_cost_total is cardinality-guarded with a cap of 20 distinct values. The rastir_cost_per_call_usd histogram intentionally excludes pricing_profile to prevent cardinality explosion.

Guardrail Metrics

Metric	Type	Labels	Description
`rastir_guardrail_requests_total`	Counter	service, env, provider, model, agent, guardrail_id, guardrail_version	Guardrail-enabled LLM calls
`rastir_guardrail_violations_total`	Counter	service, env, provider, model, agent, guardrail_id, guardrail_action, guardrail_category	Guardrail interventions

Guardrail labels are cardinality-guarded with bounded enum validation:

Label	Allowed Values
`guardrail_category`	`CONTENT_POLICY`, `SENSITIVE_INFORMATION_POLICY`, `WORD_POLICY`, `TOPIC_POLICY`, `CONTEXTUAL_GROUNDING_POLICY`, `DENIED_TOPIC`
`guardrail_action`	`GUARDRAIL_INTERVENED`, `NONE`
`guardrail_id`	Subject to cardinality cap (default: 100)

Unknown values are replaced with __cardinality_overflow__. Validation runs on both the client adapter and the server.

Evaluation Metrics

Metric	Type	Labels	Description
`rastir_evaluation_runs_total`	Counter	service, env, model, provider, agent, evaluation_type, evaluator_model, evaluator_provider	Evaluation runs
`rastir_evaluation_failures_total`	Counter	service, env, model, provider, agent, evaluation_type, evaluator_model, evaluator_provider	Failed evaluations
`rastir_evaluation_latency_seconds`	Histogram	service, env, model, provider, agent, evaluation_type, evaluator_model, evaluator_provider	Evaluation execution time
`rastir_evaluation_score`	Gauge	service, env, model, provider, agent, evaluation_type, evaluator_model, evaluator_provider	Evaluation score
`rastir_evaluation_queue_size`	Gauge	—	Evaluation queue depth
`rastir_evaluation_queue_utilization_percent`	Gauge	—	Evaluation queue fill percentage
`rastir_evaluation_dropped_total`	Counter	service, env	Evaluations dropped due to full queue

Operational Metrics

Server health and performance metrics.

Metric	Type	Description
`rastir_queue_size`	Gauge	Current ingestion queue depth
`rastir_queue_utilization_percent`	Gauge	Queue fill percentage
`rastir_memory_bytes`	Gauge	Server process RSS memory
`rastir_trace_store_size`	Gauge	Total spans in trace store
`rastir_active_traces`	Gauge	Distinct trace count in store
`rastir_ingestion_rate`	Gauge	Spans ingested per second
`rastir_ingestion_rejections_total`	Counter	Rejected spans (backpressure)
`rastir_export_failures_total`	Counter	OTLP export failures
`rastir_redaction_applied_total`	Counter	Redaction rules applied (labels: service, env)
`rastir_redaction_failures_total`	Counter	Redaction processing failures (labels: service, env)

Sampling & Backpressure Metrics

Metric	Type	Description
`rastir_spans_sampled_total`	Counter	Spans retained after sampling
`rastir_spans_dropped_by_sampling_total`	Counter	Spans dropped by sampling
`rastir_backpressure_warnings_total`	Counter	Soft limit warnings
`rastir_spans_dropped_by_backpressure_total`	Counter	Spans dropped by backpressure
`rastir_rate_limited_total`	Counter	Rate-limited requests (by dimension)

Error Type Normalisation

The rastir_errors_total counter uses normalised error categories instead of raw exception class names, preventing unbounded label cardinality.

Normalised category	Matched exception patterns
`timeout`	`TimeoutError`, `asyncio.TimeoutError`, `httpx.TimeoutException`, `httpx.ReadTimeout`, `httpx.ConnectTimeout`, `requests.exceptions.Timeout`, `openai.APITimeoutError`
`rate_limit`	`RateLimitError`, `openai.RateLimitError`, `anthropic.RateLimitError`
`validation_error`	`ValueError`, `TypeError`, `ValidationError`, `pydantic.ValidationError`
`provider_error`	`openai.APIError`, `openai.APIConnectionError`, `openai.APIStatusError`, `anthropic.APIError`, `anthropic.APIConnectionError`, `anthropic.APIStatusError`, `botocore.exceptions.ClientError`
`internal_error`	`RuntimeError`, `Exception`
`unknown`	Any unrecognised exception type

Normalisation uses exact match first, then substring heuristics (e.g., any exception with “timeout” in the name maps to timeout).

Cardinality Guards

All high-cardinality labels are subject to per-dimension caps. Values exceeding the cap are replaced with __cardinality_overflow__.

Label	Default Cap	Applies To
`model`	50	`llm_calls`, `tokens_`, `cost_`, `guardrail_`, `evaluation_`
`provider`	10	Same as model
`tool_name`	200	`tool_calls`
`agent`	200	`llm_calls`, `tokens_`, `tool_calls`, `guardrail_`
`error_type`	50	`errors`
`guardrail_id`	100	`guardrail_*`
`pricing_profile`	20	`cost_total`

Caps are configurable via server config — see Configuration.

SRE Config Gauges

These gauges are set once at server startup from the sre: configuration section. They are consumed by Prometheus recording rules to compute error budgets, cost budgets, burn rates, and exhaustion estimates.

Metric	Type	Labels	Description
`rastir_slo_error_rate`	Gauge	agent	Configured SLO error rate (e.g. 0.01 = 1%). Set per agent with fallback via `agent="unknown"`
`rastir_cost_budget_usd`	Gauge	agent	Configured cost budget in USD per period. Set per agent with fallback via `agent="unknown"`

These gauges are not updated at runtime — they reflect the static configuration. The derived SRE metrics (error budget remaining, burn rates, days to exhaustion, etc.) are computed entirely by Prometheus recording rules. See Server — SRE Recording Rules for details.

Derived Recording Rules

The following recording-rule time-series are generated by Prometheus (not by the Rastir server) and consumed by the SRE Budgets dashboard:

Recording Rule	Description
`rastir:volume:{week,month}`	Request volume capped at month boundary
`rastir:expected_volume:{week,month}`	Rolling request volume for projection
`rastir:errors:{week,month}`	Error count capped at month boundary
`rastir:errors_by_model:{week,month}`	Error count by model (for pie charts)
`rastir:error_budget_total:{week,month}`	Total error budget (volume × SLO)
`rastir:error_budget_remaining:{week,month}`	Remaining error budget
`rastir:error_budget_consumed_pct:{week,month}`	Consumed error budget %
`rastir:sla_status:{week,month}`	SLA health (1 = healthy, 0 = breached)
`rastir:error_days_to_exhaustion:{week,month}`	Days until error budget is exhausted
`rastir:error_burn_rate:{1h,6h}`	Short/long window error burn rate
`rastir:cost:{week,month}`	Cost consumed in period
`rastir:cost_budget_total:{week,month}`	Configured cost budget
`rastir:cost_budget_remaining:{week,month}`	Remaining cost budget
`rastir:cost_budget_consumed_pct:{week,month}`	Cost budget consumed %
`rastir:cost_burn_rate_daily:{week,month}`	Average daily cost burn rate
`rastir:cost_days_to_exhaustion:{week,month}`	Days until cost budget is exhausted

Exemplar Support

Exemplars attach a trace_id to histogram observations and counter increments, creating a direct link from a Prometheus metric to the distributed trace that produced it.

Metrics That Carry Exemplars

Metric	Exemplar Label
`rastir_duration_seconds`	`trace_id`
`rastir_llm_calls_total`	`trace_id`

Enabling Exemplars

# server-config.yml
exemplars:
  enabled: true

Or: export RASTIR_SERVER_EXEMPLARS_ENABLED=true

When enabled, the /metrics endpoint automatically switches to OpenMetrics format (required for exemplars).

Output Format

# Without exemplars (classic Prometheus)
rastir_duration_seconds_bucket{...,le="1.0"} 42

# With exemplars (OpenMetrics)
rastir_duration_seconds_bucket{...,le="1.0"} 42 # {trace_id="a1b2c3d4"} 0.847 1709042400.0

Grafana Integration

Edit your Prometheus data source → enable Exemplars toggle
Set Internal link to your Jaeger/Tempo data source
Map label trace_id to trace ID field
In panel queries, toggle Exemplars on — they appear as diamond markers
Click a diamond to jump directly to the trace

Grafana: P95 latency spike at 14:32 →
  Click exemplar diamond →
    Tempo: trace_id=a1b2c3d4 →
      research_agent (2.3s)
        ├─ plan_step (0.8s)        OK
        ├─ web_search (1.2s)       ← slow!
        └─ synthesize (0.3s)       OK