LlamaIndex Integration
Rastir provides @llamaindex_agent — a single decorator that instruments LlamaIndex agent workflows. It auto-discovers and wraps the agent’s LLM and tools for per-call tracing — tokens, cost, model, provider, input/output — with no code changes inside your agents.
Tip: You can also use
@framework_agentwhich auto-detects LlamaIndex agents from function arguments. The dedicated@llamaindex_agentdecorator is still available for explicit control.
Quick Start
from rastir import configure, llamaindex_agent
from llama_index.llms.openai import OpenAI
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
configure(service="my-app", push_url="http://localhost:8080")
def add(a: int, b: int) -> int:
"""Add two numbers."""
return a + b
llm = OpenAI(model="gpt-4o-mini")
tools = [FunctionTool.from_defaults(fn=add)]
agent = ReActAgent(llm=llm, tools=tools, streaming=False)
@llamaindex_agent(agent_name="calc_agent")
async def run(agent, query):
return await agent.run(query)
result = asyncio.run(run(agent, "What is 3 + 5?"))
This produces:
calc_agent (AGENT)
├── llamaindex.ReActAgent.llm.achat (LLM) — model, provider, tokens, cost, input
├── add.acall (TOOL) — tool.input, tool.output
├── llamaindex.ReActAgent.llm.achat (LLM) — subsequent calls
└── llamaindex.ReActAgent.llm.achat (LLM) — output on final response
Why a Dedicated Decorator?
LlamaIndex controls the agent loop internally — your code calls agent.run() or agent.chat() and LlamaIndex manages all LLM calls, tool invocations, and reasoning steps inside. @llamaindex_agent wraps the agent’s LLM and tools before execution begins, and restores originals after.
API Reference
llamaindex_agent()
from rastir import llamaindex_agent
@llamaindex_agent
def run(agent): ...
@llamaindex_agent(agent_name="my_agent")
async def run(agent): ...
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
agent_name | str | Function name | Name for the outer agent span |
MCP tools: LlamaIndex handles MCP via llama-index-tools-mcp — MCP tools become regular FunctionTool objects, auto-discovered and wrapped like any other tool.
Supports:
- Bare usage (
@llamaindex_agent) and parameterized (@llamaindex_agent(...)) - Sync and async functions
- Agent passed as positional or keyword argument
Recognised Agent Types
The decorator auto-discovers these LlamaIndex agent classes (and subclasses via MRO):
ReActAgentFunctionAgentOpenAIAgentFunctionCallingAgentStructuredPlannerAgentAgentRunnerBaseAgent
Detection uses class name + module path (llama_index in module), including MRO walking for subclasses.
What Gets Wrapped
LLMs
Each agent’s llm attribute (via ._llm or .llm) is wrapped with a transparent proxy:
| Attribute | Value |
|---|---|
| Span name | llamaindex.<AgentClass>.llm.<method> (e.g., llamaindex.ReActAgent.llm.achat) |
| Span type | LLM |
| Methods wrapped | chat, achat, complete, acomplete, stream_chat, astream_chat, stream_complete, astream_complete, chat_with_tools, achat_with_tools, stream_chat_with_tools, astream_chat_with_tools |
LLM span attributes captured:
| Attribute | Source | Example |
|---|---|---|
model | Extracted from the underlying provider response | gpt-4o-mini-2024-07-18 |
provider | Auto-detected from LLM module path (e.g. llama_index.llms.openai → openai) | openai |
tokens_input | Extracted from the provider response’s usage object | 634 |
tokens_output | Extracted from the provider response’s usage object | 45 |
cost_usd | Calculated from tokens × pricing registry rates | 0.000122 |
input | Messages passed to the LLM — messages, user_msg, or chat_history kwargs | system: You are designed to help... |
output | Response text, or tool_call: func(args) for tool-calling responses | tool_call: add({"a": 3, "b": 5}) |
agent | Inherited from @llamaindex_agent span | calc_agent |
Tools
Each agent’s tools (via ._tools or .tools) are wrapped with a transparent proxy:
| Attribute | Value |
|---|---|
| Span name | <tool_name>.acall (e.g., add.acall, get_weather.acall) |
| Span type | TOOL |
| Methods wrapped | call, __call__, acall |
Tool span attributes captured:
| Attribute | Source | Example |
|---|---|---|
tool.input | Keyword arguments passed to .acall(**kwargs) | {'a': 3, 'b': 5} |
tool.output | Return value from the tool function | 8 |
agent | Inherited from @llamaindex_agent span | calc_agent |
Skip Already-Wrapped Objects
- LLMs with
_rastir_wrapped = Trueare not re-wrapped - Tools with
_rastir_wrapped = Trueare not re-wrapped
MCP Tool Tracing
How MCP Tools Work in LlamaIndex
LlamaIndex uses llama-index-tools-mcp to connect to MCP servers. McpToolSpec.to_tool_list_async() converts MCP tools into regular FunctionTool objects — the decorator wraps them like any other tool.
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
mcp_client = BasicMCPClient("http://localhost:8080/mcp")
mcp_tool_spec = McpToolSpec(client=mcp_client)
mcp_tools = await mcp_tool_spec.to_tool_list_async()
agent = ReActAgent(llm=llm, tools=mcp_tools, streaming=False)
@llamaindex_agent(agent_name="mcp_agent")
async def run(agent, query):
return await agent.run(query)
Trace Propagation to MCP Servers
@llamaindex_agent auto-discovers BasicMCPClient instances in agent tools and injects the traceparent header, enabling end-to-end distributed tracing with a single trace_id.
This produces a fully linked trace:
mcp_agent (AGENT)
├── llamaindex.ReActAgent.llm.achat (LLM)
├── mcpserver:get_weather (TOOL) ← server span (same trace)
│ └── get_weather.acall (TOOL) ← client tool span
├── llamaindex.ReActAgent.llm.achat (LLM) — final answer
Agent Types
ReActAgent
Uses a Thought → Action → Observation loop. Calls llm.achat() for each reasoning step.
agent = ReActAgent(llm=llm, tools=tools, streaming=False)
Note: Set streaming=False — streaming uses astream_chat which returns an async generator, and token extraction from streams requires additional handling.
FunctionAgent
Uses the LLM’s native function/tool calling capability. Calls llm.achat_with_tools().
from llama_index.core.agent import FunctionAgent
agent = FunctionAgent(llm=llm, tools=tools)
Both agent types are fully supported — the decorator wraps all relevant LLM methods.
Coding Patterns
Pattern 1: ReActAgent with local tools (most common)
agent = ReActAgent(llm=llm, tools=tools, streaming=False)
@llamaindex_agent(agent_name="calc_agent")
async def run(agent, query):
return await agent.run(query)
Pattern 2: FunctionAgent with MCP tools
mcp_tools = await mcp_tool_spec.to_tool_list_async()
agent = FunctionAgent(llm=llm, tools=mcp_tools)
@llamaindex_agent(agent_name="mcp_agent")
async def run(agent, query):
return await agent.run(query)
Pattern 3: Bare decorator (name defaults to function name)
@llamaindex_agent
async def research_pipeline(agent, query):
return await agent.run(query)
# Agent span name will be "research_pipeline"
Pattern 4: Cost tracking with pricing registry
from rastir import configure
from rastir.config import get_pricing_registry
configure(service="my-app", push_url="...", enable_cost_calculation=True)
pr = get_pricing_registry()
pr.register("openai", "gpt-4o-mini", input_price=0.15, output_price=0.60)
# Also register the dated variant that OpenAI returns
pr.register("openai", "gpt-4o-mini-2024-07-18", input_price=0.15, output_price=0.60)
@llamaindex_agent(agent_name="my_agent")
async def run(agent, query):
return await agent.run(query)
# Each LLM span will now include cost_usd
Pattern 5: Agent reuse across calls
@llamaindex_agent(agent_name="reusable")
async def run(agent, query):
return await agent.run(query)
# Safe to call multiple times — originals restored after each call
result1 = await run(agent, "Hello")
result2 = await run(agent, "World")
Restore After Execution
After the agent run completes (success or error), @llamaindex_agent restores:
- Original LLM on the agent (
._llmor.llm) - Original tools list on the agent (
._toolsor.tools)
This means the agent can be safely reused across multiple calls with no accumulated wrapping.
Error Handling
If the decorated function raises an exception:
- The agent span records the error (type + message)
- Span status is set to
ERROR - The exception is re-raised unchanged
- Originals are still restored (via
finallyblock)
Span Hierarchy
A typical LlamaIndex trace looks like this:
@llamaindex_agent agent span
│
├── llamaindex.ReActAgent.llm.achat (LLM) — model=gpt-4o-mini, tokens, cost
│ input=messages, output=tool_call
├── add.acall (TOOL) — input={'a': 3, 'b': 5}, output=8
├── llamaindex.ReActAgent.llm.achat (LLM) — tool result fed back to LLM
│ output=tool_call: multiply(...)
├── multiply.acall (TOOL) — input={'a': 8, 'b': 2}, output=16
├── llamaindex.ReActAgent.llm.achat (LLM) — final answer, has text output
With MCP tools:
@llamaindex_agent agent span
│
├── llamaindex.FunctionAgent.llm.achat_with_tools (LLM)
├── mcpserver:get_population (TOOL) — server span (via traceparent)
│ └── get_population.acall (TOOL) — client tool execution
├── llamaindex.FunctionAgent.llm.achat_with_tools (LLM) — final answer
All child spans inherit the agent label, so Prometheus metrics are grouped by agent.
Span Attributes in Tempo
Here’s what you’ll see in Tempo/Grafana for each span type:
Agent span
| Attribute | Example |
|---|---|
rastir.span_type | agent |
rastir.agent_name | calc_agent |
LLM span
| Attribute | Example |
|---|---|
rastir.span_type | llm |
rastir.model | gpt-4o-mini-2024-07-18 |
rastir.provider | openai |
rastir.tokens_input | 634 |
rastir.tokens_output | 45 |
rastir.cost_usd | 0.000122 |
rastir.input | system: You are designed to help... |
rastir.output | tool_call: add({"a": 3, "b": 5}) or The answer is 8 |
rastir.agent | calc_agent |
Tool span
| Attribute | Example |
|---|---|
rastir.span_type | tool |
rastir.tool.input | {'a': 3, 'b': 5} |
rastir.tool.output | 8 |
rastir.agent | calc_agent |
Prometheus Metrics Produced
| Metric | Source |
|---|---|
rastir_llm_calls_total{model, provider, agent} | Wrapped LLM method calls |
rastir_tokens_input_total{model, provider, agent} | Token extraction from provider response |
rastir_tokens_output_total{model, provider, agent} | Token extraction from provider response |
rastir_cost_total{model, provider, agent} | Cost calculation from pricing registry |
rastir_duration_seconds{span_type="llm"} | LLM call latency |
rastir_tool_calls_total{tool_name, agent} | Tool .acall() invocation |
rastir_duration_seconds{span_type="tool"} | Tool invocation latency |
rastir_duration_seconds{span_type="agent"} | Entire agent execution latency |
Technical Notes
LlamaIndex Token Extraction
LlamaIndex wraps provider responses in its own ChatResponse schema. The ChatResponse.message.raw field holds the original provider response (e.g., OpenAI ChatCompletion). The LlamaIndex adapter unwraps this and delegates to the provider-specific adapter (OpenAI, Anthropic, etc.) for token extraction.
LlamaIndex Output Extraction
- Text responses: Extracted from
ChatResponse.message.content - Tool-calling responses: When the LLM returns tool calls with no text content, Rastir extracts tool call blocks from
message.blocks(forToolCallBlock) ormessage.additional_kwargs["tool_calls"], formatting them astool_call: func_name(args)
LlamaIndex Input Extraction
- ReActAgent: Passes
messages=kwarg tollm.achat(messages=[...]) - FunctionAgent: Passes
chat_history=kwarg tollm.achat_with_tools(chat_history=[...]) - Rastir captures both patterns via
_capture_llm_input()
Pydantic Compatibility
LlamaIndex agents are Pydantic BaseModel subclasses (v2). Unlike CrewAI, LlamaIndex’s Pydantic models accept proxy wrappers for llm and tools fields — no in-place patching workaround is needed. The decorator sets wrapped objects directly via setattr().
Streaming Limitation
When streaming=True (the default for ReActAgent), LlamaIndex uses astream_chat which returns an async generator. Token extraction from async generators is a known limitation. Set streaming=False on agents to ensure full token capture.