Collector Server

The Rastir collector server is a FastAPI application that receives span batches from client libraries, derives Prometheus metrics, stores traces in memory, and optionally forwards data via OTLP.


Running the Server

# Console script (installed with pip install rastir[server])
rastir-server

# Python module
python -m rastir.server

# Docker
docker run -p 8080:8080 rastir-server

Endpoints

Method Path Description
POST /v1/telemetry Ingest span batches
GET /metrics Prometheus metrics exposition
GET /v1/traces Query recent traces
GET /v1/traces/{trace_id} Get spans for a specific trace
GET /health Liveness probe
GET /ready Readiness probe

POST /v1/telemetry

Accepts JSON payloads with span batches:

{
  "service": "my-app",
  "env": "production",
  "version": "1.0.0",
  "spans": [
    {
      "name": "ask_llm",
      "span_type": "llm",
      "trace_id": "abc-123",
      "status": "OK",
      "duration_ms": 1234.5,
      "attributes": {
        "model": "gpt-4",
        "provider": "openai",
        "tokens_input": 150,
        "tokens_output": 300
      }
    }
  ]
}

Response: 202 Accepted with {"status": "accepted", "spans_received": N}

Error responses:

  • 400 — Invalid JSON or missing spans
  • 429 — Rate limited or queue full

GET /v1/traces

Query parameters:

  • trace_id — Look up a specific trace
  • service — Filter traces by service name
  • limit — Max results (default: 20)

GET /v1/traces/{trace_id}

Returns all spans for a specific trace by path parameter.

GET /ready

Returns 200 when healthy, 503 when degraded:

{
  "status": "ready",
  "queue_pct": 12.5
}

If unhealthy:

{
  "status": "not_ready",
  "queue_pct": 96.2,
  "reasons": ["queue_pct=96.2% >= hard_limit=95.0%"]
}

Prometheus Metrics

The server derives all Prometheus metrics from ingested span data and exposes them on the /metrics endpoint. See the Metrics Reference page for the complete list of counters, histograms, gauges, cardinality guards, error normalisation rules, exemplar support, and PromQL examples.


Sampling

Control which spans are stored/exported (metrics are always recorded):

sampling:
  rate: 0.1               # Keep 10% of spans

Backpressure

Configure queue-based flow control:

backpressure:
  soft_limit_pct: 80   # Warning threshold
  hard_limit_pct: 95   # Rejection/drop threshold
  mode: reject          # "reject" or "drop_oldest"

Rate Limiting

Optional per-IP and per-service rate limits:

rate_limit:
  enabled: true
  per_ip_rpm: 600       # Requests per minute per IP
  per_service_rpm: 3000 # Requests per minute per service

Multi-Tenant Mode

Inject a tenant label from HTTP headers:

multi_tenant:
  enabled: true
  header_name: X-Tenant-ID

OTLP Export

Forward spans to Jaeger, Tempo, or any OTLP-compatible backend:

exporter:
  otlp_endpoint: http://jaeger:4317
  batch_size: 200
  flush_interval: 5

Graceful Shutdown

shutdown:
  grace_period_seconds: 30
  drain_queue: true

The server drains the ingestion queue and flushes exporter buffers before exiting.


SRE — Prometheus Recording Rules

Rastir’s SRE layer uses a Prometheus-native approach: the server exposes static config gauges (rastir_slo_error_rate, rastir_cost_budget_usd) and Prometheus recording rules compute all derived SRE metrics from raw counters.

Architecture

Rastir Server                    Prometheus
┌──────────────────────┐        ┌──────────────────────────────┐
│ rastir_slo_error_rate │───────▸│ Recording Rules              │
│ rastir_cost_budget_usd│  scrape│ ├─ rastir:errors:week/month  │
│ rastir_llm_calls_total│       │ ├─ rastir:error_budget_*     │
│ rastir_errors_total   │       │ ├─ rastir:cost:week/month    │
│ rastir_cost_total     │       │ ├─ rastir:cost_budget_*      │
└──────────────────────┘        │ ├─ rastir:error_burn_rate:*  │
                                │ └─ rastir:*_days_to_exhaust* │
                                └──────────────┬───────────────┘
                                               │ query
                                               ▼
                                ┌──────────────────────────────┐
                                │ Grafana SRE Dashboard        │
                                └──────────────────────────────┘

Why Recording Rules?

  • No server-side state — no in-memory rolling accumulators or snapshot files
  • Survives server restarts — Prometheus retains all history
  • Standard PromQL — budget calculations are transparent and auditable
  • Alertable — recording rules can feed Alertmanager rules directly

Deploying the Rules

The recording rules file is at grafana/prometheus/rastir-sre-rules.yml. Mount it into Prometheus:

# docker-compose.yml (excerpt)
prometheus:
  volumes:
    - ./prometheus/rastir-sre-rules.yml:/etc/prometheus/rules/rastir-sre-rules.yaml

Ensure prometheus.yml includes the rules directory:

rule_files:
  - /etc/prometheus/rules/*.yaml

After deploying, verify rules are loaded: http://localhost:9090/rules

Rule Groups

Group Interval Description
rastir_sre_weekly 15s 7-day rolling error/cost budgets, volume, exhaustion
rastir_sre_monthly 15s 30-day rolling error/cost budgets, volume, exhaustion
rastir_sre_burn_rates 15s 1h and 6h error burn rate windows

Month-Boundary Scaling

All period-based rules use day_of_month() scaling to produce correct values early in the month. For example, on March 3 with a 7-day window:

increase(counter[7d]) × clamp_max(day_of_month / 7, 1)

This ensures that at the start of a new month (day 1–2), the increase(7d) value (which spans into the previous month) is scaled down proportionally.

Server Configuration

Enable SRE in rastir-server-config.yaml:

sre:
  enabled: true
  default_slo_error_rate: 0.01    # 1% error budget
  default_cost_budget_usd: 25.0   # $25/period
  agents:
    my_agent:
      slo_error_rate: 0.02        # agent-specific override

See Configuration — SRE for all options.


Structured Logging

Enable JSON-structured logs for production:

logging:
  structured: true
  level: INFO

Output:

{"timestamp": "2026-02-27 10:30:00", "level": "INFO", "logger": "rastir.server", "message": "Span batch ingested", "service": "my-app"}

Capacity & Performance

The collector is a single-process async FastAPI application. The push endpoint (POST /v1/telemetry) is the hot path — it validates the payload, enqueues the span batch, and returns 202. All heavy work (metrics derivation, trace storage, OTLP export) happens asynchronously in a background worker.

Benchmark Results (single uvicorn worker)

Tested with 10 spans per request (typical for one agent invocation):

Push Rate p50 Latency p95 Latency p99 Latency Spans/min
100 req/s 1.3 ms 3.4 ms 6 ms 60,000
500 req/s 0.6 ms 1.9 ms 3.9 ms 300,000
1,000 req/s 5.9 ms 45 ms 523 ms 600,000
2,000 req/s 191 ms 344 ms 19 s 1,200,000

The practical ceiling on a single worker is ~1,000 requests/sec (600,000 spans/min). Beyond that, latency degrades significantly.

Sizing Recommendations

Setup Comfortable Push Rate Concurrent Agents (1 call/min)
1 vCPU / 512 MB 500 req/s ~5,000
1 vCPU / 1 GB 500 req/s ~8,000
2 vCPU / 2 GB 1,000 req/s (2 workers) ~20,000
4 vCPU / 4 GB 2,000 req/s (4 workers) ~50,000

Scaling tips:

  • Run multiple uvicorn workers (--workers N) to scale linearly with CPU cores
  • The ingestion queue (default 50,000) absorbs traffic bursts — tune max_queue_size for spike-heavy workloads
  • Rate limiting (per_ip_rpm, per_service_rpm) protects against runaway clients
  • Memory is driven by queue depth and trace store retention — each queued span batch is ~1–5 KB

Rastir — LLM & Agent Observability Library