Collector Server
The Rastir collector server is a FastAPI application that receives span batches from client libraries, derives Prometheus metrics, stores traces in memory, and optionally forwards data via OTLP.
Running the Server
# Console script (installed with pip install rastir[server])
rastir-server
# Python module
python -m rastir.server
# Docker
docker run -p 8080:8080 rastir-server
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/telemetry | Ingest span batches |
| GET | /metrics | Prometheus metrics exposition |
| GET | /v1/traces | Query recent traces |
| GET | /v1/traces/{trace_id} | Get spans for a specific trace |
| GET | /health | Liveness probe |
| GET | /ready | Readiness probe |
POST /v1/telemetry
Accepts JSON payloads with span batches:
{
"service": "my-app",
"env": "production",
"version": "1.0.0",
"spans": [
{
"name": "ask_llm",
"span_type": "llm",
"trace_id": "abc-123",
"status": "OK",
"duration_ms": 1234.5,
"attributes": {
"model": "gpt-4",
"provider": "openai",
"tokens_input": 150,
"tokens_output": 300
}
}
]
}
Response: 202 Accepted with {"status": "accepted", "spans_received": N}
Error responses:
- 400: Invalid JSON or missing spans
- 429: Rate limited or queue full
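As an illustration of the request shape described above, here is a minimal client sketch using only the Python standard library. The URL (localhost on port 8080, taken from the Docker example) and the helper names build_batch, build_request, and send_batch are assumptions for this example, not part of the Rastir client library.

```python
import json
import urllib.request

# Assumption: the collector is reachable on localhost:8080.
COLLECTOR_URL = "http://localhost:8080/v1/telemetry"


def build_batch(service, env, version, spans):
    """Assemble a payload with the top-level fields shown in the example."""
    return {"service": service, "env": env, "version": version, "spans": list(spans)}


def build_request(batch, url=COLLECTOR_URL):
    """Create the POST request object without sending it."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_batch(batch, url=COLLECTOR_URL, timeout=5.0):
    """Send the batch and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(batch, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


span = {
    "name": "ask_llm",
    "span_type": "llm",
    "trace_id": "abc-123",
    "status": "OK",
    "duration_ms": 1234.5,
    "attributes": {"model": "gpt-4", "provider": "openai"},
}
batch = build_batch("my-app", "production", "1.0.0", [span])
request = build_request(batch)  # call send_batch(batch) against a running server
```

With urllib, a 400 or 429 response surfaces as urllib.error.HTTPError, so a caller that wants to retry on 429 would catch that exception and inspect its code attribute.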
GET /v1/traces
Query parameters:
- trace_id: Look up a specific trace
- service: Filter traces by service name
- limit: Max results (default: 20)
GET /v1/traces/{trace_id}
Returns all spans for a specific trace by path parameter.
GET /ready
Returns 200 when healthy, 503 when degraded:
{
"status": "ready",
"queue_pct": 12.5
}
If unhealthy:
{
"status": "not_ready",
"queue_pct": 96.2,
"reasons": ["queue_pct=96.2% >= hard_limit=95.0%"]
}
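A probe or deployment script might interpret these two responses roughly as follows. This is a sketch only; parse_readiness is a hypothetical helper, and it assumes the body always carries the status and queue_pct fields shown above.

```python
import json


def parse_readiness(status_code, body_text):
    """Return (ready, queue_pct, reasons) for a /ready response."""
    body = json.loads(body_text)
    ready = status_code == 200 and body.get("status") == "ready"
    # "reasons" only appears in the not_ready example, so default to an empty list.
    return ready, body.get("queue_pct"), body.get("reasons", [])


ok = parse_readiness(200, '{"status": "ready", "queue_pct": 12.5}')
bad = parse_readiness(
    503,
    '{"status": "not_ready", "queue_pct": 96.2,'
    ' "reasons": ["queue_pct=96.2% >= hard_limit=95.0%"]}',
)
```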
Prometheus Metrics
The server derives all Prometheus metrics from ingested span data and exposes them on the /metrics endpoint. See the Metrics Reference page for the complete list of counters, histograms, gauges, cardinality guards, error normalisation rules, exemplar support, and PromQL examples.
Sampling
Control which spans are stored/exported (metrics are always recorded):
sampling:
rate: 0.1 # Keep 10% of spans
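The split between "always count" and "sometimes keep" can be sketched as below. How the server actually selects spans is not specified here; this example assumes simple independent random sampling per span, and process_span, metrics, and store are hypothetical names.

```python
import random


def process_span(span, rate, rng, metrics, store):
    """Count every span in the metrics, but keep only a sampled fraction."""
    # Metrics are updated unconditionally, before the sampling decision.
    metrics["spans_total"] = metrics.get("spans_total", 0) + 1
    # rng.random() is in [0, 1), so rate=1.0 keeps everything and rate=0.0 nothing.
    if rng.random() < rate:
        store.append(span)


rng = random.Random(0)
metrics, store = {}, []
for i in range(1000):
    process_span({"trace_id": f"t-{i}"}, 0.1, rng, metrics, store)
# metrics["spans_total"] is 1000; len(store) is roughly 10% of that.
```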
Backpressure
Configure queue-based flow control:
backpressure:
soft_limit_pct: 80 # Warning threshold
hard_limit_pct: 95 # Rejection/drop threshold
mode: reject # "reject" or "drop_oldest"
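To make the two thresholds and two modes concrete, here is a small in-memory sketch. It is an assumption about how such a queue could behave, not the server's implementation; BoundedQueue and the return strings are hypothetical.

```python
from collections import deque


class BoundedQueue:
    """Toy queue with a soft (warning) and hard (reject/drop) fill threshold."""

    def __init__(self, max_size, soft_limit_pct=80.0, hard_limit_pct=95.0, mode="reject"):
        if mode not in ("reject", "drop_oldest"):
            raise ValueError("mode must be 'reject' or 'drop_oldest'")
        if not 0 < soft_limit_pct <= hard_limit_pct <= 100:
            raise ValueError("expected 0 < soft_limit_pct <= hard_limit_pct <= 100")
        self.max_size = max_size
        self.soft_limit_pct = soft_limit_pct
        self.hard_limit_pct = hard_limit_pct
        self.mode = mode
        self.items = deque()

    @property
    def queue_pct(self):
        return 100.0 * len(self.items) / self.max_size

    def offer(self, item):
        """Try to enqueue; return 'accepted', 'warn', 'rejected' or 'dropped_oldest'."""
        if self.queue_pct >= self.hard_limit_pct:
            if self.mode == "reject":
                return "rejected"  # a server would answer 429 here
            self.items.popleft()  # make room by discarding the oldest entry
            self.items.append(item)
            return "dropped_oldest"
        self.items.append(item)
        return "warn" if self.queue_pct >= self.soft_limit_pct else "accepted"


q = BoundedQueue(max_size=100)
results = [q.offer(i) for i in range(96)]
```

In reject mode the producer sees the failure and can retry; in drop_oldest mode ingestion never fails but the oldest queued data is lost.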
Rate Limiting
Optional per-IP and per-service rate limits:
rate_limit:
enabled: true
per_ip_rpm: 600 # Requests per minute per IP
per_service_rpm: 3000 # Requests per minute per service
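A requests-per-minute limit keyed by IP or service name could be sketched with a fixed one-minute window, as below. The algorithm the server uses is not stated in this page, so the fixed-window choice and the FixedWindowLimiter class are assumptions for illustration.

```python
class FixedWindowLimiter:
    """Allow at most `limit_per_minute` calls per key in each 60-second window."""

    def __init__(self, limit_per_minute):
        self.limit = limit_per_minute
        self.windows = {}  # key -> (window_index, count)

    def allow(self, key, now):
        """`now` is a timestamp in seconds, e.g. time.time()."""
        window = int(now // 60)
        start, count = self.windows.get(key, (window, 0))
        if start != window:
            start, count = window, 0  # a new minute began: reset the counter
        if count >= self.limit:
            self.windows[key] = (start, count)
            return False
        self.windows[key] = (start, count + 1)
        return True


per_ip = FixedWindowLimiter(600)
per_service = FixedWindowLimiter(3000)


def allow_request(ip, service, now):
    # Simplification: the IP counter is consumed even if the service check fails.
    return per_ip.allow(ip, now) and per_service.allow(service, now)
```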
Multi-Tenant Mode
Inject a tenant label from HTTP headers:
multi_tenant:
enabled: true
header_name: X-Tenant-ID
OTLP Export
Forward spans to Jaeger, Tempo, or any OTLP-compatible backend:
exporter:
otlp_endpoint: http://jaeger:4317
batch_size: 200
flush_interval: 5
Graceful Shutdown
shutdown:
grace_period_seconds: 30
drain_queue: true
The server drains the ingestion queue and flushes exporter buffers before exiting.
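The drain-then-exit behaviour can be sketched with asyncio as follows. This is a simplified model under stated assumptions: a single worker consuming an asyncio.Queue, with worker, shutdown, and demo being hypothetical names rather than Rastir internals.

```python
import asyncio


async def worker(queue, processed):
    """Background consumer: handle batches until cancelled."""
    while True:
        batch = await queue.get()
        processed.append(batch)
        queue.task_done()


async def shutdown(queue, worker_task, grace_period_seconds=30, drain_queue=True):
    """Wait for the queue to empty (up to the grace period), then stop the worker."""
    drained = True
    if drain_queue:
        try:
            await asyncio.wait_for(queue.join(), timeout=grace_period_seconds)
        except asyncio.TimeoutError:
            drained = False  # grace period expired with work still queued
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass
    return drained


async def demo():
    queue = asyncio.Queue()
    processed = []
    for i in range(5):
        queue.put_nowait(i)
    task = asyncio.create_task(worker(queue, processed))
    drained = await shutdown(queue, task, grace_period_seconds=1)
    return drained, processed


drained, processed = asyncio.run(demo())
```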
SRE — Prometheus Recording Rules
Rastir’s SRE layer uses a Prometheus-native approach: the server exposes static config gauges (rastir_slo_error_rate, rastir_cost_budget_usd) and Prometheus recording rules compute all derived SRE metrics from raw counters.
Architecture
      Rastir Server                         Prometheus
┌────────────────────────┐        ┌──────────────────────────────┐
│ rastir_slo_error_rate  │───────▸│ Recording Rules              │
│ rastir_cost_budget_usd │ scrape │ ├─ rastir:errors:week/month  │
│ rastir_llm_calls_total │        │ ├─ rastir:error_budget_*     │
│ rastir_errors_total    │        │ ├─ rastir:cost:week/month    │
│ rastir_cost_total      │        │ ├─ rastir:cost_budget_*      │
└────────────────────────┘        │ ├─ rastir:error_burn_rate:*  │
                                  │ └─ rastir:*_days_to_exhaust* │
                                  └──────────────┬───────────────┘
                                                 │ query
                                                 ▼
                                  ┌──────────────────────────────┐
                                  │    Grafana SRE Dashboard     │
                                  └──────────────────────────────┘
Why Recording Rules?
- No server-side state — no in-memory rolling accumulators or snapshot files
- Survives server restarts — Prometheus retains all history
- Standard PromQL — budget calculations are transparent and auditable
- Alertable — recording rules can feed Alertmanager rules directly
Deploying the Rules
The recording rules file is at grafana/prometheus/rastir-sre-rules.yml. Mount it into Prometheus:
# docker-compose.yml (excerpt)
prometheus:
volumes:
- ./prometheus/rastir-sre-rules.yml:/etc/prometheus/rules/rastir-sre-rules.yaml
Ensure prometheus.yml includes the rules directory:
rule_files:
- /etc/prometheus/rules/*.yaml
After deploying, verify rules are loaded: http://localhost:9090/rules
Rule Groups
| Group | Interval | Description |
|---|---|---|
| rastir_sre_weekly | 15s | 7-day rolling error/cost budgets, volume, exhaustion |
| rastir_sre_monthly | 15s | 30-day rolling error/cost budgets, volume, exhaustion |
| rastir_sre_burn_rates | 15s | 1h and 6h error burn rate windows |
Month-Boundary Scaling
All period-based rules use day_of_month() scaling to produce correct values early in the month. For example, on March 3 with a 7-day window:
increase(counter[7d]) × clamp_max(day_of_month / 7, 1)
On March 3 the factor is 3/7 ≈ 0.43, so the increase(counter[7d]) value, which still reaches back into the previous month, is scaled down proportionally. From day 7 onward the factor is clamped to 1 and the value is used unchanged.
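The scaling factor in the formula above can be worked through numerically. The function names below are hypothetical, and the window_days parameter is an assumption so that the same arithmetic can be tried with other window lengths.

```python
def month_boundary_scale(day_of_month, window_days=7):
    """Equivalent of clamp_max(day_of_month / window_days, 1)."""
    return min(day_of_month / window_days, 1.0)


def scaled_increase(raw_increase, day_of_month, window_days=7):
    """Apply the scale factor to a raw increase(counter[7d]) value."""
    return raw_increase * month_boundary_scale(day_of_month, window_days)


factor_march_3 = month_boundary_scale(3)   # 3/7, about 0.43
factor_march_7 = month_boundary_scale(7)   # clamped at 1.0 from day 7 onward
errors_march_3 = scaled_increase(700, 3)   # about 300 of the 700 raw errors are kept
```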
Server Configuration
Enable SRE in rastir-server-config.yaml:
sre:
enabled: true
default_slo_error_rate: 0.01 # 1% error budget
default_cost_budget_usd: 25.0 # $25/period
agents:
my_agent:
slo_error_rate: 0.02 # agent-specific override
See Configuration — SRE for all options.
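To show how an slo_error_rate value relates to the derived metrics, here is the conventional SRE arithmetic for burn rate and remaining error budget. These are the textbook definitions, offered as an assumption; the exact expressions Rastir uses live in the recording rules file, and the function names are hypothetical.

```python
def error_rate(errors, calls):
    """Observed fraction of failed calls in a window."""
    return errors / calls if calls else 0.0


def burn_rate(errors, calls, slo_error_rate):
    """How fast the budget is being spent: 1.0 means exactly at the SLO."""
    return error_rate(errors, calls) / slo_error_rate


def error_budget_remaining(errors, calls, slo_error_rate):
    """Fraction of the allowed errors not yet used (negative when overspent)."""
    allowed = calls * slo_error_rate
    return 1.0 - errors / allowed if allowed else 1.0


# With the default 1% SLO, 5 errors in 1,000 calls spends half the budget.
rate = burn_rate(5, 1000, 0.01)
remaining = error_budget_remaining(5, 1000, 0.01)
```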
Structured Logging
Enable JSON-structured logs for production:
logging:
structured: true
level: INFO
Output:
{"timestamp": "2026-02-27 10:30:00", "level": "INFO", "logger": "rastir.server", "message": "Span batch ingested", "service": "my-app"}
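A log line of this shape can be produced with the standard logging module, as in the sketch below. It is not Rastir's own formatter; JsonFormatter is a hypothetical class, and the service field is assumed to arrive as an extra attribute on the log record.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra={...}` become attributes on the record.
        service = getattr(record, "service", None)
        if service is not None:
            entry["service"] = service
        return json.dumps(entry)


record = logging.makeLogRecord(
    {
        "name": "rastir.server",
        "levelno": logging.INFO,
        "levelname": "INFO",
        "msg": "Span batch ingested",
        "service": "my-app",
    }
)
line = JsonFormatter().format(record)
```

In an application, the formatter would be attached to a handler with handler.setFormatter(JsonFormatter()) and the field supplied as logger.info("Span batch ingested", extra={"service": "my-app"}).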
Capacity & Performance
The collector is a single-process async FastAPI application. The push endpoint (POST /v1/telemetry) is the hot path — it validates the payload, enqueues the span batch, and returns 202. All heavy work (metrics derivation, trace storage, OTLP export) happens asynchronously in a background worker.
Benchmark Results (single uvicorn worker)
Tested with 10 spans per request (typical for one agent invocation):
| Push Rate | p50 Latency | p95 Latency | p99 Latency | Spans/min |
|---|---|---|---|---|
| 100 req/s | 1.3 ms | 3.4 ms | 6 ms | 60,000 |
| 500 req/s | 0.6 ms | 1.9 ms | 3.9 ms | 300,000 |
| 1,000 req/s | 5.9 ms | 45 ms | 523 ms | 600,000 |
| 2,000 req/s | 191 ms | 344 ms | 19 s | 1,200,000 |
The practical ceiling on a single worker is ~1,000 requests/sec (600,000 spans/min). Beyond that, latency degrades significantly.
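The Spans/min column follows directly from the push rate and the 10 spans per request used in the benchmark. A quick check of that arithmetic (spans_per_minute is a hypothetical helper):

```python
def spans_per_minute(requests_per_second, spans_per_request=10):
    """Convert a push rate into span throughput per minute."""
    return requests_per_second * spans_per_request * 60


throughput = {rate: spans_per_minute(rate) for rate in (100, 500, 1000, 2000)}
```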
Sizing Recommendations
| Setup | Comfortable Push Rate | Concurrent Agents (1 call/min) |
|---|---|---|
| 1 vCPU / 512 MB | 500 req/s | ~5,000 |
| 1 vCPU / 1 GB | 500 req/s | ~8,000 |
| 2 vCPU / 2 GB | 1,000 req/s (2 workers) | ~20,000 |
| 4 vCPU / 4 GB | 2,000 req/s (4 workers) | ~50,000 |
Scaling tips:
- Run multiple uvicorn workers (--workers N) to scale linearly with CPU cores
- The ingestion queue (default 50,000) absorbs traffic bursts; tune max_queue_size for spike-heavy workloads
- Rate limiting (per_ip_rpm, per_service_rpm) protects against runaway clients
- Memory is driven by queue depth and trace store retention; each queued span batch is ~1–5 KB
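For a rough sense of the memory a full queue could hold, the figures above can be multiplied out. This sketch assumes the queue size counts span batches (not individual spans) and ignores trace store retention; queue_memory_mib is a hypothetical helper.

```python
def queue_memory_mib(max_queue_size, kb_per_batch):
    """Approximate memory for a full queue, in MiB (1 MiB = 1024 KB)."""
    return max_queue_size * kb_per_batch / 1024


# Default queue of 50,000 entries at 1 KB and 5 KB per batch.
low = queue_memory_mib(50_000, 1)    # about 49 MiB
high = queue_memory_mib(50_000, 5)   # about 244 MiB
```

Under these assumptions, a full default queue at the upper batch size would consume roughly half of a 512 MB instance before trace storage is counted.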