FastroAgent¶
FastroAgent wraps PydanticAI's Agent with automatic cost calculation, optional distributed tracing, and a consistent response format. Every response includes token counts and cost in microcents (exact integer math, no floating-point drift) so you can track usage, set budgets, and debug cost issues in production.
How It Works¶
sequenceDiagram
participant You
participant FastroAgent
participant PydanticAI
participant LLM API
You->>FastroAgent: run("Hello!")
FastroAgent->>PydanticAI: agent.run()
PydanticAI->>LLM API: HTTP request
LLM API-->>PydanticAI: response + token counts
PydanticAI-->>FastroAgent: result
FastroAgent->>FastroAgent: calculate cost
FastroAgent-->>You: ChatResponse
When you call agent.run(), FastroAgent passes your message to the underlying PydanticAI agent, which handles the actual LLM call. When the response comes back, FastroAgent extracts token counts, calculates cost using genai-prices (or your custom pricing), and packages everything into a ChatResponse with usage data.
The agent itself is stateless. Conversation history, user context, and retry logic are your responsibility. This keeps FastroAgent simple and predictable.
Creating an Agent¶
The simplest agent uses defaults for everything:
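from fastroai import FastroAgent

agent = FastroAgent()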
Add a system prompt and tweak parameters as needed:
agent = FastroAgent(
    model="openai:gpt-4o",
    system_prompt="You are a helpful financial advisor.",
    temperature=0.3,  # More consistent responses
    max_tokens=4096,
)
If you're passing configuration around (from environment variables, user settings, or a config file), use AgentConfig:
from fastroai import FastroAgent, AgentConfig

config = AgentConfig(
    model="anthropic:claude-3-5-sonnet",
    system_prompt="You are a code reviewer.",
    temperature=0.2,
)

agent = FastroAgent(config=config)
Configuration Reference¶
| Parameter | Default | Description |
|---|---|---|
| model | openai:gpt-4o | Model identifier with provider prefix |
| system_prompt | "You are a helpful AI assistant." | Instructions for the model |
| temperature | 0.7 | Sampling temperature (0.0-2.0) |
| max_tokens | 4096 | Maximum response tokens |
| timeout_seconds | 120 | Request timeout |
| max_retries | 3 | Retry attempts on failure |
Model names use PydanticAI's provider prefix format: openai:gpt-4o, anthropic:claude-3-5-sonnet, google:gemini-1.5-pro. The prefix tells PydanticAI which API client to use.
Running Queries¶
Use run() to send a message and get a response:
response = await agent.run("What is the capital of France?")
print(response.output) # "The capital of France is Paris."
print(response.cost_dollars) # 0.000234
The response includes everything you need for billing and debugging:
print(f"Model: {response.model}")
print(f"Input tokens: {response.input_tokens}")
print(f"Output tokens: {response.output_tokens}")
print(f"Cost: ${response.cost_dollars:.6f}")
print(f"Time: {response.processing_time_ms}ms")
What's in a Response¶
| Field | Type | What It's For |
|---|---|---|
| content | str | The response text |
| output | OutputT | Typed output (same as content for string agents) |
| model | str | Model that generated the response |
| input_tokens | int | Prompt tokens consumed |
| output_tokens | int | Completion tokens generated |
| cache_read_tokens | int | Tokens read from prompt cache (typically 90% cheaper) |
| cache_write_tokens | int | Tokens written to prompt cache |
| cost_microcents | int | Cost in 1/1,000,000 of a dollar (for calculations) |
| cost_dollars | float | Cost in dollars (for display) |
| processing_time_ms | int | Wall-clock time |
| tool_calls | list | Tools invoked during generation |
| tool_call_count | int | Number of tool invocations |
| request_count | int | Number of API requests (increases with tool use) |
| trace_id | str | Tracing correlation ID |
| usage_details | dict | Provider-specific details (e.g., reasoning tokens) |
Use cost_microcents when you need to aggregate costs across many calls - integer math won't drift. Use cost_dollars when displaying to users.
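For example, a billing counter can sum the integer field across many responses and convert only when displaying a total (a sketch; responses is a placeholder list of ChatResponse objects, and the conversion uses the 1/1,000,000-of-a-dollar definition from the table above):

# Exact integer aggregation across calls - no floating-point drift
total_microcents = sum(r.cost_microcents for r in responses)

# Convert only at display time, per the definition in the table above
total_dollars = total_microcents / 1_000_000
print(f"Total spend: ${total_dollars:.6f}")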
When prompt caching is enabled (Anthropic, OpenAI with long prompts), cache_read_tokens shows how many tokens came from cache at a 90% discount. This is automatically factored into cost_microcents.
Structured Output¶
Getting strings back means you have to parse them. For anything structured - extracting entities, classifying content, generating schemas - use Pydantic models instead:
from pydantic import BaseModel

from fastroai import FastroAgent


class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str


agent = FastroAgent(
    model="openai:gpt-4o",
    output_type=MovieReview,
)

response = await agent.run("Review the movie Inception")

# response.output is a MovieReview instance, not a string
print(response.output.title)    # "Inception"
print(response.output.rating)   # 9
print(response.output.summary)  # "A mind-bending thriller..."
PydanticAI handles the structured output extraction and validation. If the LLM output doesn't match your schema, it retries automatically. You pay for those retry tokens, so keep your schemas reasonable - don't ask for 50 required fields in one call.
Conversation History¶
FastroAgent doesn't store conversation history. This sounds annoying until you realize how many ways there are to mess up conversation storage, and how much your storage requirements differ from everyone else's.
You load history from your storage, pass it in, and save new messages yourself:
from pydantic_ai.messages import ModelMessage

# Load from your storage (database, Redis, files, whatever)
history: list[ModelMessage] = await my_storage.load(user_id)

# Pass to the agent
response = await agent.run(
    "Continue our conversation",
    message_history=history,
)

# Save the new user message and response
# You manage how messages are stored - FastroAgent doesn't dictate the format
await my_storage.append(user_id, "Continue our conversation", response.content)
This gives you full control. Store history in a database table, Redis with a TTL, or flat files - whatever fits your architecture. Keep conversations forever or clear them after each session. Store complete messages or compress them into summaries. FastroAI doesn't care; it just takes whatever history you pass in.
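If you want a concrete shape for my_storage, one hypothetical sketch is an in-memory store that rebuilds PydanticAI message objects from saved text (the class name and storage format are illustrative; it relies on PydanticAI's ModelRequest/ModelResponse message parts):

from pydantic_ai.messages import (
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextPart,
    UserPromptPart,
)


class InMemoryHistory:
    """Illustrative per-user history store backed by a plain dict."""

    def __init__(self) -> None:
        self._turns: dict[str, list[tuple[str, str]]] = {}

    async def load(self, user_id: str) -> list[ModelMessage]:
        # Rebuild PydanticAI messages from stored (user, assistant) text pairs
        messages: list[ModelMessage] = []
        for user_text, assistant_text in self._turns.get(user_id, []):
            messages.append(ModelRequest(parts=[UserPromptPart(content=user_text)]))
            messages.append(ModelResponse(parts=[TextPart(content=assistant_text)]))
        return messages

    async def append(self, user_id: str, user_text: str, assistant_text: str) -> None:
        self._turns.setdefault(user_id, []).append((user_text, assistant_text))


my_storage = InMemoryHistory()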
Streaming¶
For long responses, stream chunks as they arrive so users don't stare at a loading spinner:
async for chunk in agent.run_stream("Write a short story"):
    if chunk.is_final:
        # Last chunk has complete usage data
        print(f"\nCost: ${chunk.usage_data.cost_dollars:.6f}")
    else:
        # Print content as it arrives
        print(chunk.content, end="", flush=True)
Cost tracking still works - you just get it at the end instead of upfront. The final chunk has is_final=True and includes the complete usage_data with token counts and cost.
One gotcha: if your connection drops mid-stream, you lose both the partial response and the usage data. Consider logging partial streams if you need that level of tracking.
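One way to do that is to accumulate text as it streams and log whatever arrived if the stream breaks (a sketch; the logger name and the decision to re-raise are placeholders for your own error handling):

import logging

logger = logging.getLogger(__name__)

partial: list[str] = []
try:
    async for chunk in agent.run_stream("Write a short story"):
        if not chunk.is_final:
            partial.append(chunk.content)
except Exception:
    # Connection dropped mid-stream: keep what arrived for debugging
    logger.warning("Stream interrupted after %d characters", len("".join(partial)))
    raise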
Adding Tools¶
Give agents capabilities by passing toolsets. Here's a simple weather tool:
from pydantic_ai.toolsets import FunctionToolset


def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Your weather API call here
    return f"Sunny, 72°F in {city}"


toolset = FunctionToolset(tools=[get_weather])

agent = FastroAgent(
    model="openai:gpt-4o",
    system_prompt="You can check the weather.",
    toolsets=[toolset],
)

response = await agent.run("What's the weather in Paris?")
# Agent calls get_weather("Paris") and uses the result
Tool calls are tracked in the response:
for call in response.tool_calls:
    print(f"Called: {call['tool_name']}")
    print(f"Args: {call['args']}")
In production, tools fail. APIs time out. External services go down. Regular tools crash the whole request when that happens. The Safe Tools guide covers @safe_tool, which wraps tools with timeout, retry, and graceful error handling so failures become messages the LLM can work with instead of exceptions that kill your request.
Dependencies¶
Pass runtime dependencies to your tools - database connections, API clients, user context:
class MyDeps:
    def __init__(self, api_key: str, user_id: str):
        self.api_key = api_key
        self.user_id = user_id


response = await agent.run(
    "Search for news about AI",
    deps=MyDeps(api_key="sk-...", user_id="user_123"),
)
Your tools receive these deps as their first argument. This keeps tools testable (inject mock deps) and avoids global state.
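With PydanticAI's function tools, that first argument is a run context carrying your deps (a sketch; search_news is a made-up tool, and it assumes the MyDeps class from the example above):

from pydantic_ai import RunContext


def search_news(ctx: RunContext[MyDeps], query: str) -> str:
    """Search news on behalf of the current user."""
    # ctx.deps is the MyDeps instance passed to agent.run(..., deps=...)
    return f"Searching news about {query!r} for user {ctx.deps.user_id}"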
Tracing¶
In production, you need to correlate AI calls with the rest of your request flow. Pass a tracer:
from fastroai import SimpleTracer
tracer = SimpleTracer()
response = await agent.run("Hello", tracer=tracer)
print(response.trace_id) # "trace_abc123..."
SimpleTracer logs to Python's logging module - good for development. For production, implement the Tracer protocol to send spans to your observability platform (Datadog, Honeycomb, Logfire, etc.). The Tracing guide has examples for common platforms.
Using Existing PydanticAI Agents¶
Already have a configured PydanticAI agent with custom output types, validators, or complex tooling? Wrap it with FastroAgent to add cost tracking without changing anything else:
from pydantic_ai import Agent

from fastroai import FastroAgent

# Your existing PydanticAI agent
pydantic_agent = Agent(
    model="openai:gpt-4o",
    output_type=MyCustomType,
    # ... your configuration
)

# Wrap it with FastroAgent
fastro_agent = FastroAgent(agent=pydantic_agent)

# Now you get cost tracking on your existing agent
response = await fastro_agent.run("Hello")
print(response.cost_dollars)
The underlying PydanticAI agent stays accessible on the wrapper if you ever need to reach it directly.
Key Files¶
| Component | Location |
|---|---|
| FastroAgent | fastroai/agent/agent.py |
| AgentConfig | fastroai/agent/schemas.py |
| ChatResponse | fastroai/agent/schemas.py |
| StreamChunk | fastroai/agent/schemas.py |