Configuration Reference
Rastir has two configuration surfaces: the client library (used in your application) and the collector server. This page covers all configuration options, environment variables, and YAML config.
Client Configuration
configure()
from rastir import configure
configure(
service="my-app",
env="production",
version="1.0.0",
push_url="http://localhost:8080",
api_key="secret",
batch_size=100,
flush_interval=5,
timeout=5,
max_retries=3,
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
service | str | "unknown" | Service name (global label) |
env | str | "development" | Environment name (global label) |
version | str | None | Application version |
push_url | str | None | Collector server URL. If unset, spans are queued locally |
api_key | str | None | API key for the collector (sent as X-API-Key header) |
batch_size | int | 100 | Spans per batch in the background exporter |
flush_interval | int | 5 | Seconds between background flushes |
timeout | int | 5 | HTTP request timeout in seconds |
max_retries | int | 3 | Retries on transient failures (5xx, 429, connection errors) |
retry_backoff | float | 0.5 | Initial backoff in seconds (doubles each retry) |
shutdown_timeout | float | 5.0 | Max seconds to wait for exporter thread on shutdown |
evaluation_enabled | bool | False | Enable evaluation metadata capture on @llm spans |
evaluation_types | list[str] | None | Evaluation types to request (e.g. ["relevance", "faithfulness"]) |
capture_prompt | bool | True | Capture prompt_text attribute in LLM spans |
capture_completion | bool | True | Capture completion_text attribute in LLM spans |
enable_cost_calculation | bool | False | Enable client-side cost calculation on @llm spans |
pricing_profile | str | "default" | Label identifying the pricing configuration used |
pricing_source | str | None | Path to pricing JSON file |
max_cost_per_call_alert | float | None | Per-call cost threshold for warning logs |
enable_ttft | bool | True | Enable Time-To-First-Token measurement on streaming spans |
Client Environment Variables
All client settings can be set via environment variables with the RASTIR_ prefix.
Precedence: configure() arguments > environment variables > defaults.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVICE | "unknown" | Logical service name attached to all spans and metrics |
RASTIR_ENV | "development" | Deployment environment (e.g. production, staging) |
RASTIR_VERSION | — | Application version string |
RASTIR_PUSH_URL | — | Collector server URL (e.g. http://localhost:8080). Push disabled if unset |
RASTIR_API_KEY | — | Authentication key sent as X-API-Key header to the collector |
RASTIR_BATCH_SIZE | 100 | Max spans per push batch |
RASTIR_FLUSH_INTERVAL | 5 | Seconds between background batch flushes |
RASTIR_TIMEOUT | 5 | HTTP request timeout in seconds |
RASTIR_MAX_RETRIES | 3 | Max retry attempts on transient failures (5xx, 429, connection errors) |
RASTIR_RETRY_BACKOFF | 0.5 | Initial backoff in seconds (doubles each retry) |
RASTIR_SHUTDOWN_TIMEOUT | 5.0 | Max seconds to wait for exporter thread on process shutdown |
RASTIR_EVALUATION_ENABLED | false | Enable evaluation metadata capture on @llm spans |
RASTIR_CAPTURE_PROMPT | true | Capture prompt_text attribute in LLM spans |
RASTIR_CAPTURE_COMPLETION | true | Capture completion_text attribute in LLM spans |
RASTIR_ENABLE_COST_CALCULATION | false | Enable client-side cost calculation on @llm spans |
RASTIR_PRICING_PROFILE | "default" | Label identifying the pricing configuration used |
RASTIR_PRICING_SOURCE | — | Path to pricing JSON file |
RASTIR_PRICING_DATA | — | Inline pricing JSON string (alternative to file) |
RASTIR_MAX_COST_PER_CALL_ALERT | — | Per-call cost threshold in USD for warning logs |
RASTIR_ENABLE_TTFT | true | Enable Time-To-First-Token measurement on streaming spans |
RASTIR_EVALUATION_TYPES | — | Comma-separated evaluation types (e.g. relevance,faithfulness) |
Example
export RASTIR_SERVICE=my-app
export RASTIR_ENV=production
export RASTIR_PUSH_URL=http://collector:8080/v1/telemetry
export RASTIR_API_KEY=secret-key
export RASTIR_BATCH_SIZE=200
export RASTIR_EVALUATION_ENABLED=true
export RASTIR_CAPTURE_PROMPT=false # disable prompt capture in production
export RASTIR_ENABLE_COST_CALCULATION=true
export RASTIR_PRICING_PROFILE=production_2025_q1
export RASTIR_PRICING_SOURCE=/etc/rastir/pricing.json
configure()can only be called once per process. After initialization, configuration is frozen and immutable. Calling it again raisesRuntimeError: rastir.configure() has already been called. This is by design — call it at application startup before any decorated functions run.
Server Configuration
The server loads configuration from three sources (in order of precedence):
- Environment variables (
RASTIR_SERVER_*) - YAML config file (path via
RASTIR_SERVER_CONFIGenv var) - Defaults
YAML Config File
# rastir-server.yml
server:
host: 0.0.0.0
port: 8080
limits:
max_traces: 10000
max_queue_size: 50000
max_span_attributes: 100
max_label_value_length: 128
cardinality_model: 50
cardinality_provider: 10
cardinality_tool_name: 200
cardinality_agent: 200
cardinality_error_type: 50
histograms:
duration_buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0]
tokens_buckets: [10, 50, 100, 250, 500, 1000, 2000, 4000, 8000, 16000, 32000]
trace_store:
enabled: true
max_spans_per_trace: 500
ttl_seconds: 0 # 0 = no expiration
exporter:
otlp_endpoint: null # Set to enable OTLP forwarding
batch_size: 200
flush_interval: 5
multi_tenant:
enabled: false
header_name: X-Tenant-ID
sampling:
rate: 1.0 # 0.0–1.0 probabilistic sampling rate
backpressure:
soft_limit_pct: 80.0
hard_limit_pct: 95.0
mode: reject # "reject" or "drop_oldest"
rate_limit:
enabled: false
per_ip_rpm: 600
per_service_rpm: 3000
exemplars:
enabled: false
redaction:
enabled: false
max_text_length: 50000
drop_on_failure: true
evaluation:
enabled: false
queue_size: 10000
drop_policy: drop_new
worker_concurrency: 4
default_sample_rate: 1.0
default_timeout_ms: 30000
max_evaluation_types: 20
judge_model: gpt-4o-mini
judge_provider: openai
shutdown:
grace_period_seconds: 30
drain_queue: true
logging:
structured: false
level: INFO
sre:
enabled: true
default_slo_error_rate: 0.01 # 1% error budget
default_cost_budget_usd: 25.0 # $25/period
agents:
my_agent:
slo_error_rate: 0.02 # 2% for this agent
critical_agent:
slo_error_rate: 0.005 # stricter 0.5%
cost_budget_usd: 50.0 # per-agent override
Server Environment Variables
All server settings can be overridden via RASTIR_SERVER_* environment variables.
Precedence: Environment variables > YAML config file > defaults.
Core
Network binding for the FastAPI collector server. The server exposes /v1/telemetry (span ingestion), /metrics (Prometheus scrape), /v1/traces (trace query), /health, and /ready.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_CONFIG | — | Path to YAML config file. All settings below can also be set in this file. Env vars always take precedence over YAML values |
RASTIR_SERVER_HOST | 0.0.0.0 | Server bind address. Use 0.0.0.0 in containers, 127.0.0.1 for local-only access |
RASTIR_SERVER_PORT | 8080 | Server bind port. Prometheus scrapes this port at /metrics, and clients push spans to /v1/telemetry on this port |
Resource Limits
Guardrails that prevent the server from consuming unbounded memory or creating Prometheus cardinality explosions. Cardinality caps limit how many distinct values a Prometheus label can have — once the cap is reached, new values are replaced with __other__. This protects Prometheus from high-cardinality time series that degrade query performance.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_LIMITS_MAX_TRACES | 10000 | Maximum number of traces held in the in-memory trace store. When exceeded, oldest traces are evicted (FIFO). Only relevant when TRACE_STORE_ENABLED=true |
RASTIR_SERVER_LIMITS_MAX_QUEUE_SIZE | 50000 | Maximum number of spans that can be buffered in the ingestion queue between HTTP receipt and processing. Controls memory usage — see Backpressure for what happens when the queue fills |
RASTIR_SERVER_LIMITS_MAX_SPAN_ATTRIBUTES | 100 | Maximum number of key-value attributes retained per span. Excess attributes are silently dropped |
RASTIR_SERVER_LIMITS_MAX_LABEL_VALUE_LENGTH | 128 | Maximum character length for any Prometheus label value. Longer values are truncated. Prevents excessively long model names or agent names from inflating metric storage |
RASTIR_SERVER_LIMITS_CARDINALITY_MODEL | 50 | Max distinct model label values (e.g. gpt-4o, claude-3). Increase if you use many fine-tuned model variants |
RASTIR_SERVER_LIMITS_CARDINALITY_PROVIDER | 10 | Max distinct provider label values (e.g. openai, anthropic). 10 covers all built-in adapters |
RASTIR_SERVER_LIMITS_CARDINALITY_TOOL_NAME | 200 | Max distinct tool_name label values. Increase if your agents use many dynamically-named tools |
RASTIR_SERVER_LIMITS_CARDINALITY_AGENT | 200 | Max distinct agent label values. Increase if you run hundreds of uniquely-named agents |
RASTIR_SERVER_LIMITS_CARDINALITY_ERROR_TYPE | 50 | Max distinct error_type label values. Rastir normalises errors to 6 categories, so 50 is generous |
Histogram Buckets
Customise the Prometheus histogram bucket boundaries for latency and token count distributions. The default buckets work well for most LLM workloads. Only change these if your latency or token distributions are unusual (e.g. very long batch jobs, or very small embeddings-only calls).
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_HISTOGRAMS_DURATION_BUCKETS | 0.01,0.05,0.1,0.25,0.5,1.0,2.0,5.0,10.0,30.0,60.0 | Comma-separated duration bucket boundaries in seconds. Used by rastir_duration_seconds histogram. The Grafana dashboards compute p50/p95/p99 latency from these buckets |
RASTIR_SERVER_HISTOGRAMS_TOKENS_BUCKETS | 10,50,100,250,500,1000,2000,4000,8000,16000,32000 | Comma-separated token count bucket boundaries. Used by rastir_tokens_input and rastir_tokens_output histograms |
Trace Store
An in-memory ring buffer that holds recent traces, queryable via GET /v1/traces. This is a lightweight debug tool — useful for curl http://localhost:8080/v1/traces to inspect recent spans without needing a full trace backend. It is not Tempo/OTLP-compatible and cannot be used as a Grafana datasource. For production trace visualization, use the OTLP exporter to forward traces to Tempo, Jaeger, X-Ray, etc.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_TRACE_STORE_ENABLED | true | Enable the in-memory trace store and /v1/traces query endpoint. Disable in production to save memory if you are forwarding traces via OTLP |
RASTIR_SERVER_TRACE_STORE_MAX_SPANS_PER_TRACE | 500 | Maximum spans retained per trace. Traces with more spans (e.g. large agent graphs) have their oldest spans dropped |
RASTIR_SERVER_TRACE_STORE_TTL_SECONDS | 0 | Time-to-live for traces in seconds. 0 = no expiration (traces are only evicted when max_traces is exceeded). Set to e.g. 3600 to auto-expire traces after 1 hour |
OTLP Export
Forwards processed spans from the Rastir server to an external trace backend via the OpenTelemetry Protocol (OTLP). This is the production trace pipeline — use it to send traces to Tempo, Jaeger, or any OTLP-compatible receiver. In cloud deployments, this typically points to a local OTel Collector sidecar (e.g. ADOT on AWS) which then forwards to the cloud trace service (X-Ray, Cloud Trace, etc.).
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_EXPORTER_OTLP_ENDPOINT | — | OTLP HTTP endpoint URL (e.g. http://tempo:4318, http://localhost:4318 for a sidecar). Export is disabled when unset. The server posts to {endpoint}/v1/traces |
RASTIR_SERVER_EXPORTER_BATCH_SIZE | 200 | Number of spans accumulated before sending an OTLP export batch. Larger values reduce HTTP overhead but increase latency to the trace backend |
RASTIR_SERVER_EXPORTER_FLUSH_INTERVAL | 5 | Maximum seconds to wait before flushing a partial batch. Ensures spans reach the trace backend even under low throughput |
Multi-Tenant
When enabled, the server extracts a tenant identifier from an HTTP header on each request and adds it as a Prometheus label. This allows a single Rastir instance to serve multiple teams or applications with per-tenant metric isolation.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_MULTI_TENANT_ENABLED | false | Enable multi-tenant label extraction. When enabled, every Prometheus metric gets an additional tenant label |
RASTIR_SERVER_MULTI_TENANT_HEADER_NAME | X-Tenant-ID | HTTP header name to read the tenant identifier from. Clients must include this header on every /v1/telemetry request |
Sampling
Controls probabilistic trace sampling on the server. Sampling affects trace storage, OTLP export, exemplars, and evaluation enqueue. It does not affect Prometheus metrics — all spans always contribute to counters, histograms, and gauges regardless of sampling. This means you can sample down to reduce storage/export costs while keeping 100% accurate metrics.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_SAMPLING_RATE | 1.0 | Probability that a trace is stored/exported (0.0–1.0). 1.0 = keep all traces, 0.1 = keep 10%. Set lower in high-throughput production to control Tempo/Jaeger storage costs while retaining full metric accuracy |
Backpressure
Safety valve for the ingestion queue. The server has a bounded queue (LIMITS_MAX_QUEUE_SIZE) between HTTP span receipt and the worker that processes spans (metrics, store, OTLP export). When clients send spans faster than the server can process them, the queue grows. Backpressure defines what happens:
- Soft limit — queue reaches this % → server logs a warning and exposes a metric. No spans are dropped yet. Use this as an early alert.
- Hard limit — queue reaches this % → server takes action based on the mode setting to prevent out-of-memory.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_BACKPRESSURE_SOFT_LIMIT_PCT | 80.0 | Queue usage percentage that triggers warning logs and a rastir_backpressure_soft_limit_reached metric. Set up a Grafana alert on this |
RASTIR_SERVER_BACKPRESSURE_HARD_LIMIT_PCT | 95.0 | Queue usage percentage that activates the backpressure mode. Must be greater than soft_limit_pct |
RASTIR_SERVER_BACKPRESSURE_MODE | reject | What to do when the hard limit is hit: reject — drop new incoming spans and return HTTP 429/503 to the client (protects server memory); drop_oldest — evict oldest spans from the head of the queue to make room (prioritises recency over completeness) |
Rate Limiting
Optional request-level rate limiting to protect the server from misbehaving clients. This is separate from backpressure (which operates on queue depth). Rate limiting operates at the HTTP layer before spans enter the queue.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_RATE_LIMIT_ENABLED | false | Enable rate limiting. When disabled, no rate checks are performed |
RASTIR_SERVER_RATE_LIMIT_PER_IP_RPM | 600 | Maximum requests per minute from a single client IP address. Protects against a single runaway client flooding the server |
RASTIR_SERVER_RATE_LIMIT_PER_SERVICE_RPM | 3000 | Maximum requests per minute from a single service (identified by the service field in the telemetry payload). Prevents one noisy service from starving others |
Exemplars
Prometheus exemplars attach a trace_id to individual histogram observations, allowing you to jump from a Grafana metric panel directly to the specific trace that caused a latency spike or error. Requires Prometheus ≥ 2.39 with --enable-feature=exemplar-storage and Grafana’s Tempo datasource configured.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_EXEMPLARS_ENABLED | false | Attach trace_id exemplars to duration and token histograms. Enable this when you have a Tempo/Jaeger backend configured, so Grafana can link metrics → traces |
Redaction
Server-side PII/sensitive data redaction for prompt_text and completion_text span attributes. Redaction runs after sampling but before trace storage, OTLP export, and evaluation enqueue — ensuring sensitive data never leaves the server. Built-in patterns detect common PII (SSNs, credit cards, emails, etc.). You can add custom patterns via JSON.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_REDACTION_ENABLED | false | Enable server-side redaction. When disabled, prompt_text and completion_text are stored/exported as-is |
RASTIR_SERVER_REDACTION_MAX_TEXT_LENGTH | 50000 | Maximum character length for prompt/completion text. Text exceeding this is truncated before redaction to bound CPU cost |
RASTIR_SERVER_REDACTION_DROP_ON_FAILURE | true | If redaction processing fails (e.g. regex timeout), drop the entire span rather than risk leaking unredacted data. Security-first default — set false only if availability matters more than data privacy |
RASTIR_SERVER_REDACTION_CUSTOM_PATTERNS_JSON | — | JSON array of custom regex patterns. Format: [{"pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "replacement": "[SSN]"}]. Each matched pattern is replaced with its replacement string. Patterns run in order after built-in redaction |
Evaluation
Async server-side LLM-as-a-judge evaluation. When enabled, the server uses a separate LLM (the “judge”) to evaluate the quality of LLM responses — checking for hallucination, relevance, toxicity, etc. Evaluation runs asynchronously in worker threads after the span is stored/exported, so it does not block ingestion.
How it works: The client-side @llm decorator captures prompt_text, completion_text, and evaluation_types in the span. The server picks up these spans, applies evaluation sampling, then sends prompt+completion to the judge LLM for each evaluation type. Results are emitted as new evaluation spans with scores.
Cost note: Evaluation calls the judge LLM for every sampled span × every evaluation type. With high throughput, this can be expensive. Use DEFAULT_SAMPLE_RATE to control cost — e.g. 0.1 evaluates only 10% of eligible spans. This is independent of trace sampling (SAMPLING_RATE), which controls storage/export. Both rates stack: with SAMPLING_RATE=0.5 and DEFAULT_SAMPLE_RATE=0.5, only ~25% of LLM spans are evaluated.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_EVALUATION_ENABLED | false | Enable async evaluation. Requires a judge LLM to be configured below |
RASTIR_SERVER_EVALUATION_QUEUE_SIZE | 10000 | Bounded queue capacity for spans awaiting evaluation. Sized independently from the ingestion queue |
RASTIR_SERVER_EVALUATION_DROP_POLICY | drop_new | What happens when the evaluation queue is full: drop_new — discard newly arriving spans (safe default); drop_oldest — evict oldest queued spans |
RASTIR_SERVER_EVALUATION_WORKER_CONCURRENCY | 4 | Number of concurrent worker threads making judge LLM API calls. Higher values = faster evaluation throughput but more API cost |
RASTIR_SERVER_EVALUATION_DEFAULT_SAMPLE_RATE | 1.0 | Probability that a sampled span is also evaluated (0.0–1.0). Applies after trace sampling. Per-span evaluation_sample_rate attribute (set by client decorator) overrides this |
RASTIR_SERVER_EVALUATION_DEFAULT_TIMEOUT_MS | 30000 | Timeout for each judge LLM API call in milliseconds. Timed-out evaluations are recorded as failures |
RASTIR_SERVER_EVALUATION_MAX_EVALUATION_TYPES | 20 | Cardinality cap for evaluation_type metric label. Prevents unbounded metric growth from dynamically-named evaluation types |
RASTIR_SERVER_EVALUATION_JUDGE_MODEL | gpt-4o-mini | LLM model used as the evaluation judge. Use a fast, cheap model for cost efficiency |
RASTIR_SERVER_EVALUATION_JUDGE_PROVIDER | openai | Provider for the judge model (openai, anthropic, gemini, bedrock, etc.) |
RASTIR_SERVER_EVALUATION_JUDGE_API_KEY | — | API key for the judge LLM provider. Required unless using IAM-based auth (e.g. Bedrock) |
RASTIR_SERVER_EVALUATION_JUDGE_BASE_URL | — | Custom base URL for the judge LLM API (e.g. Azure OpenAI endpoint, or a local proxy) |
Evaluation types (e.g.
hallucination,relevance,toxicity) are configured client-side, not on the server. Set them per-decorator (@llm(evaluation_types=[...])) or globally viaconfigure(evaluation_types=[...])/RASTIR_EVALUATION_TYPES=hallucination,relevance. The server evaluates whatever types each span requests.
Shutdown
Graceful shutdown behaviour when the server receives SIGTERM (e.g. during ECS task stop, Kubernetes pod termination, or docker stop). The server stops accepting new requests and optionally drains in-flight spans before exiting.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_SHUTDOWN_GRACE_PERIOD_SECONDS | 30 | Maximum seconds to wait during graceful shutdown. Should be less than the container orchestrator’s stop timeout (ECS default: 30s, Kubernetes default: 30s) |
RASTIR_SERVER_SHUTDOWN_DRAIN_QUEUE | true | Process remaining spans in the ingestion queue before shutdown. Set false for faster shutdowns at the cost of losing in-flight spans |
Logging
Server log output configuration. In containerised deployments (ECS, Kubernetes), use structured JSON logging so log aggregators (CloudWatch, Loki, etc.) can parse fields automatically.
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_LOGGING_STRUCTURED | false | Enable JSON structured logging. Recommended true for Docker/ECS/Kubernetes. Plain text is easier to read for local development |
RASTIR_SERVER_LOGGING_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL. Use DEBUG to see per-span processing steps (very verbose) |
RASTIR_SERVER_LOGGING_LOG_FILE | — | Path to an additional log file. Logs always go to stderr; this mirrors them to a file for debugging. Not typically needed in containers where logs go to stdout/stderr |
SRE
SRE (Site Reliability Engineering) configuration for error budgets, cost budgets, and burn rate tracking. When enabled, the server exposes config gauge metrics that Prometheus recording rules consume to compute derived SRE metrics (budget remaining, burn rate, days-to-exhaustion).
| Variable | Default | Description |
|---|---|---|
RASTIR_SERVER_SRE_ENABLED | false | Enable SRE config gauges. When enabled, rastir_slo_error_rate and rastir_cost_budget_usd gauges are registered and populated at startup |
RASTIR_SERVER_SRE_DEFAULT_SLO_ERROR_RATE | 0.01 | Default error rate SLO for agents without a per-agent override. 0.01 = 1% error budget — if more than 1% of calls fail, the error budget is consumed |
RASTIR_SERVER_SRE_DEFAULT_COST_BUDGET_USD | 0.0 | Default cost budget in USD per Prometheus evaluation period. 0 = cost budget tracking disabled. Set to e.g. 500.0 to track cost consumption against a $500 budget |
RASTIR_SERVER_SRE_AGENTS_JSON | — | JSON object for per-agent SLO and cost budget overrides. Format: {"my_agent": {"slo_error_rate": 0.02, "cost_budget_usd": 100.0}}. Agents not listed here use the defaults above |
Per-agent overrides can also be set in the YAML config file under sre.agents (see YAML example above).
When enabled, the server exposes two Prometheus Gauge metrics at startup:
| Gauge | Labels | Description |
|---|---|---|
rastir_slo_error_rate | agent | Configured SLO error rate per agent |
rastir_cost_budget_usd | agent | Configured cost budget in USD per agent |
These gauges are consumed by Prometheus recording rules (see Server — SRE Recording Rules) to derive error budgets, burn rates, cost budgets, and days-to-exhaustion metrics.
Testing & Script Variables
These variables are used by integration tests and load-testing scripts. They are not needed for normal Rastir operation.
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY | — | OpenAI API key for integration tests |
API_OPENAI_KEY | — | Fallback OpenAI API key (checked if OPENAI_API_KEY is unset) |
ANTHROPIC_API_KEY | — | Anthropic API key for integration tests |
API_ANTHROPIC_KEY | — | Fallback Anthropic API key |
LOAD_ROUNDS | 12 | Number of rounds for load test scripts |
ROUND_PAUSE | 6 | Pause between load test rounds in seconds |
BEDROCK_GUARDRAIL_ID | i3rttxfu7kow | AWS Bedrock guardrail ID for load tests |
Startup Validation
The server validates configuration at startup and refuses to start if:
- Histogram bucket count exceeds 20
- Histogram buckets contain non-positive or unsorted values
- Queue size exceeds 1,000,000 or is non-positive
- Max traces exceeds 500,000 or is non-positive
- Label value length exceeds 1,024 or is non-positive
- Cardinality caps are non-positive
- Sampling rate is outside 0.0–1.0
- Backpressure soft_limit >= hard_limit
- Rate limit RPM values are non-positive
- Max spans per trace is non-positive
- TTL seconds is negative
- Shutdown grace period is negative
- Logging level is not a valid Python log level
- SRE default_slo_error_rate is outside (0.0, 1.0]
- SRE default_cost_budget_usd is negative
- Per-agent SLO error rates are outside (0.0, 1.0]