Not logs. Not traces. The answer.
Exact step, root cause, one-line fix — in under 30 seconds.
$ agentlens diagnose cf3fcea8
AgentLens Diagnosis
═══════════════════════════════════════
ROOT CAUSE:
tool_selection
FAILED AT:
Step 2 (search_web)
WHY:
Both tools had identical descriptions — the agent treated
them as interchangeable and called search_web for a local
record lookup that only query_db can answer.
FIX:
Rewrite tool descriptions so search_web is clearly for
external web queries and query_db is clearly for internal
customer records. Make them mutually exclusive.
CONFIDENCE: 0.90
AgentLens patches your provider client at import time. No changes to existing agent code.
Call agentlens.init() before creating your Anthropic or OpenAI client. It patches the class — every call from that point is captured.
Add @agentlens.run(name="...") to your agent. All LLM calls, tool calls, errors, and costs inside it are grouped into one run.
Run agentlens diagnose <run_id>. Get the exact failure step, why the agent made that decision, and a one-line fix.
import agentlens import anthropic agentlens.init() # patches Anthropic + OpenAI + async variants client = anthropic.Anthropic() # captured automatically from here @agentlens.run(name="support_agent") # works with sync and async def run_agent(query: str): response = client.messages.create( model="claude-3-5-sonnet-latest", max_tokens=512, tools=[...], messages=[{"role": "user", "content": query}], ) # record tool results after executing them agentlens.record_tool_result(tool_name=..., output=..., tool_use_id=...) return response # LangGraph — one extra line agentlens.patch_langgraph() # call before graph.compile() app = graph.compile() # every node is now a span
No configuration. No labelling. AgentLens reads the trace and finds the pattern.
Agent picked the wrong tool because two tools had similar descriptions. The most common silent failure in multi-tool agents.
Agent called the same tool with the same inputs multiple times instead of changing strategy after a failure.
An upstream tool returned stale or null data. A downstream step used it and crashed. The real bug was 3 steps earlier.
Conflicting instructions in the system prompt diluted the agent's goal before it even started tool selection.
Agent started on the right task but lost track of the original goal across a long multi-step conversation.
Critical early context was pushed out of the context window before the key decision was made.
AgentLens diagnosed these from real traces across different frameworks — same output format every time.
AgentLens checks every tool call against the tool schema. If the agent invents a parameter that doesn't exist, it's flagged immediately.
HALLUCINATIONS DETECTED:
[HIGH] step 2 — invented param:
'query_db' was called with ['limit', 'sort_by']
which are not in its schema.
Valid params: ['customer_id', 'field']
[MED] step 4 — context contradiction:
LLM claims "order found" but get_order
returned status: "not found" at step 3.
No dashboard to sign in to. No data leaving your machine.
$ agentlens runs list # all recent runs with status + cost $ agentlens runs show <id> # full span detail — every LLM call, tool call, error $ agentlens diagnose <id> # root cause + fix $ agentlens stats # token usage, latency, cost across all runs $ agentlens anonymize <id> # redact secrets before sharing $ agentlens evaluate # accuracy check against your test cases
Free. Local. No account.