LlamaIndex Integration

Rastir provides @llamaindex_agent — a single decorator that instruments LlamaIndex agent workflows. It auto-discovers and wraps the agent’s LLM and tools for per-call tracing — tokens, cost, model, provider, input/output — with no code changes inside your agents.

Tip: You can also use @framework_agent which auto-detects LlamaIndex agents from function arguments. The dedicated @llamaindex_agent decorator is still available for explicit control.


Quick Start

from rastir import configure, llamaindex_agent
from llama_index.llms.openai import OpenAI
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

configure(service="my-app", push_url="http://localhost:8080")

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

llm = OpenAI(model="gpt-4o-mini")
tools = [FunctionTool.from_defaults(fn=add)]
agent = ReActAgent(llm=llm, tools=tools, streaming=False)

@llamaindex_agent(agent_name="calc_agent")
async def run(agent, query):
    return await agent.run(query)

result = asyncio.run(run(agent, "What is 3 + 5?"))

This produces:

calc_agent (AGENT)
├── llamaindex.ReActAgent.llm.achat (LLM) — model, provider, tokens, cost, input
├── add.acall (TOOL) — tool.input, tool.output
├── llamaindex.ReActAgent.llm.achat (LLM) — subsequent calls
└── llamaindex.ReActAgent.llm.achat (LLM) — output on final response

Why a Dedicated Decorator?

LlamaIndex controls the agent loop internally — your code calls agent.run() or agent.chat() and LlamaIndex manages all LLM calls, tool invocations, and reasoning steps inside. @llamaindex_agent wraps the agent’s LLM and tools before execution begins, and restores originals after.


API Reference

llamaindex_agent()

from rastir import llamaindex_agent

@llamaindex_agent
def run(agent): ...

@llamaindex_agent(agent_name="my_agent")
async def run(agent): ...

Parameters:

Parameter Type Default Description
agent_name str Function name Name for the outer agent span

MCP tools: LlamaIndex handles MCP via llama-index-tools-mcp — MCP tools become regular FunctionTool objects, auto-discovered and wrapped like any other tool.

Supports:

  • Bare usage (@llamaindex_agent) and parameterized (@llamaindex_agent(...))
  • Sync and async functions
  • Agent passed as positional or keyword argument

Recognised Agent Types

The decorator auto-discovers these LlamaIndex agent classes (and subclasses via MRO):

  • ReActAgent
  • FunctionAgent
  • OpenAIAgent
  • FunctionCallingAgent
  • StructuredPlannerAgent
  • AgentRunner
  • BaseAgent

Detection uses class name + module path (llama_index in module), including MRO walking for subclasses.


What Gets Wrapped

LLMs

Each agent’s llm attribute (via ._llm or .llm) is wrapped with a transparent proxy:

Attribute Value
Span name llamaindex.<AgentClass>.llm.<method> (e.g., llamaindex.ReActAgent.llm.achat)
Span type LLM
Methods wrapped chat, achat, complete, acomplete, stream_chat, astream_chat, stream_complete, astream_complete, chat_with_tools, achat_with_tools, stream_chat_with_tools, astream_chat_with_tools

LLM span attributes captured:

Attribute Source Example
model Extracted from the underlying provider response gpt-4o-mini-2024-07-18
provider Auto-detected from LLM module path (e.g. llama_index.llms.openaiopenai) openai
tokens_input Extracted from the provider response’s usage object 634
tokens_output Extracted from the provider response’s usage object 45
cost_usd Calculated from tokens × pricing registry rates 0.000122
input Messages passed to the LLM — messages, user_msg, or chat_history kwargs system: You are designed to help...
output Response text, or tool_call: func(args) for tool-calling responses tool_call: add({"a": 3, "b": 5})
agent Inherited from @llamaindex_agent span calc_agent

Tools

Each agent’s tools (via ._tools or .tools) are wrapped with a transparent proxy:

Attribute Value
Span name <tool_name>.acall (e.g., add.acall, get_weather.acall)
Span type TOOL
Methods wrapped call, __call__, acall

Tool span attributes captured:

Attribute Source Example
tool.input Keyword arguments passed to .acall(**kwargs) {'a': 3, 'b': 5}
tool.output Return value from the tool function 8
agent Inherited from @llamaindex_agent span calc_agent

Skip Already-Wrapped Objects

  • LLMs with _rastir_wrapped = True are not re-wrapped
  • Tools with _rastir_wrapped = True are not re-wrapped

MCP Tool Tracing

How MCP Tools Work in LlamaIndex

LlamaIndex uses llama-index-tools-mcp to connect to MCP servers. McpToolSpec.to_tool_list_async() converts MCP tools into regular FunctionTool objects — the decorator wraps them like any other tool.

from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

mcp_client = BasicMCPClient("http://localhost:8080/mcp")
mcp_tool_spec = McpToolSpec(client=mcp_client)
mcp_tools = await mcp_tool_spec.to_tool_list_async()

agent = ReActAgent(llm=llm, tools=mcp_tools, streaming=False)

@llamaindex_agent(agent_name="mcp_agent")
async def run(agent, query):
    return await agent.run(query)

Trace Propagation to MCP Servers

@llamaindex_agent auto-discovers BasicMCPClient instances in agent tools and injects the traceparent header, enabling end-to-end distributed tracing with a single trace_id.

This produces a fully linked trace:

mcp_agent (AGENT)
├── llamaindex.ReActAgent.llm.achat (LLM)
├── mcpserver:get_weather (TOOL)           ← server span (same trace)
│   └── get_weather.acall (TOOL)           ← client tool span
├── llamaindex.ReActAgent.llm.achat (LLM)  — final answer

Agent Types

ReActAgent

Uses a Thought → Action → Observation loop. Calls llm.achat() for each reasoning step.

agent = ReActAgent(llm=llm, tools=tools, streaming=False)

Note: Set streaming=False — streaming uses astream_chat which returns an async generator, and token extraction from streams requires additional handling.

FunctionAgent

Uses the LLM’s native function/tool calling capability. Calls llm.achat_with_tools().

from llama_index.core.agent import FunctionAgent

agent = FunctionAgent(llm=llm, tools=tools)

Both agent types are fully supported — the decorator wraps all relevant LLM methods.


Coding Patterns

Pattern 1: ReActAgent with local tools (most common)

agent = ReActAgent(llm=llm, tools=tools, streaming=False)

@llamaindex_agent(agent_name="calc_agent")
async def run(agent, query):
    return await agent.run(query)

Pattern 2: FunctionAgent with MCP tools

mcp_tools = await mcp_tool_spec.to_tool_list_async()
agent = FunctionAgent(llm=llm, tools=mcp_tools)

@llamaindex_agent(agent_name="mcp_agent")
async def run(agent, query):
    return await agent.run(query)

Pattern 3: Bare decorator (name defaults to function name)

@llamaindex_agent
async def research_pipeline(agent, query):
    return await agent.run(query)

# Agent span name will be "research_pipeline"

Pattern 4: Cost tracking with pricing registry

from rastir import configure
from rastir.config import get_pricing_registry

configure(service="my-app", push_url="...", enable_cost_calculation=True)

pr = get_pricing_registry()
pr.register("openai", "gpt-4o-mini", input_price=0.15, output_price=0.60)
# Also register the dated variant that OpenAI returns
pr.register("openai", "gpt-4o-mini-2024-07-18", input_price=0.15, output_price=0.60)

@llamaindex_agent(agent_name="my_agent")
async def run(agent, query):
    return await agent.run(query)
# Each LLM span will now include cost_usd

Pattern 5: Agent reuse across calls

@llamaindex_agent(agent_name="reusable")
async def run(agent, query):
    return await agent.run(query)

# Safe to call multiple times — originals restored after each call
result1 = await run(agent, "Hello")
result2 = await run(agent, "World")

Restore After Execution

After the agent run completes (success or error), @llamaindex_agent restores:

  • Original LLM on the agent (._llm or .llm)
  • Original tools list on the agent (._tools or .tools)

This means the agent can be safely reused across multiple calls with no accumulated wrapping.


Error Handling

If the decorated function raises an exception:

  • The agent span records the error (type + message)
  • Span status is set to ERROR
  • The exception is re-raised unchanged
  • Originals are still restored (via finally block)

Span Hierarchy

A typical LlamaIndex trace looks like this:

@llamaindex_agent agent span
│
├── llamaindex.ReActAgent.llm.achat (LLM)     — model=gpt-4o-mini, tokens, cost
│                                                input=messages, output=tool_call
├── add.acall (TOOL)                           — input={'a': 3, 'b': 5}, output=8
├── llamaindex.ReActAgent.llm.achat (LLM)      — tool result fed back to LLM
│                                                output=tool_call: multiply(...)
├── multiply.acall (TOOL)                      — input={'a': 8, 'b': 2}, output=16
├── llamaindex.ReActAgent.llm.achat (LLM)      — final answer, has text output

With MCP tools:

@llamaindex_agent agent span
│
├── llamaindex.FunctionAgent.llm.achat_with_tools (LLM)
├── mcpserver:get_population (TOOL)            — server span (via traceparent)
│   └── get_population.acall (TOOL)            — client tool execution
├── llamaindex.FunctionAgent.llm.achat_with_tools (LLM) — final answer

All child spans inherit the agent label, so Prometheus metrics are grouped by agent.


Span Attributes in Tempo

Here’s what you’ll see in Tempo/Grafana for each span type:

Agent span

Attribute Example
rastir.span_type agent
rastir.agent_name calc_agent

LLM span

Attribute Example
rastir.span_type llm
rastir.model gpt-4o-mini-2024-07-18
rastir.provider openai
rastir.tokens_input 634
rastir.tokens_output 45
rastir.cost_usd 0.000122
rastir.input system: You are designed to help...
rastir.output tool_call: add({"a": 3, "b": 5}) or The answer is 8
rastir.agent calc_agent

Tool span

Attribute Example
rastir.span_type tool
rastir.tool.input {'a': 3, 'b': 5}
rastir.tool.output 8
rastir.agent calc_agent

Prometheus Metrics Produced

Metric Source
rastir_llm_calls_total{model, provider, agent} Wrapped LLM method calls
rastir_tokens_input_total{model, provider, agent} Token extraction from provider response
rastir_tokens_output_total{model, provider, agent} Token extraction from provider response
rastir_cost_total{model, provider, agent} Cost calculation from pricing registry
rastir_duration_seconds{span_type="llm"} LLM call latency
rastir_tool_calls_total{tool_name, agent} Tool .acall() invocation
rastir_duration_seconds{span_type="tool"} Tool invocation latency
rastir_duration_seconds{span_type="agent"} Entire agent execution latency

Technical Notes

LlamaIndex Token Extraction

LlamaIndex wraps provider responses in its own ChatResponse schema. The ChatResponse.message.raw field holds the original provider response (e.g., OpenAI ChatCompletion). The LlamaIndex adapter unwraps this and delegates to the provider-specific adapter (OpenAI, Anthropic, etc.) for token extraction.

LlamaIndex Output Extraction

  • Text responses: Extracted from ChatResponse.message.content
  • Tool-calling responses: When the LLM returns tool calls with no text content, Rastir extracts tool call blocks from message.blocks (for ToolCallBlock) or message.additional_kwargs["tool_calls"], formatting them as tool_call: func_name(args)

LlamaIndex Input Extraction

  • ReActAgent: Passes messages= kwarg to llm.achat(messages=[...])
  • FunctionAgent: Passes chat_history= kwarg to llm.achat_with_tools(chat_history=[...])
  • Rastir captures both patterns via _capture_llm_input()

Pydantic Compatibility

LlamaIndex agents are Pydantic BaseModel subclasses (v2). Unlike CrewAI, LlamaIndex’s Pydantic models accept proxy wrappers for llm and tools fields — no in-place patching workaround is needed. The decorator sets wrapped objects directly via setattr().

Streaming Limitation

When streaming=True (the default for ReActAgent), LlamaIndex uses astream_chat which returns an async generator. Token extraction from async generators is a known limitation. Set streaming=False on agents to ensure full token capture.


Rastir — LLM & Agent Observability Library