open source · local-first · no signup

When your AI agent breaks,
this tells you why.

Not logs. Not traces. The answer.
Exact step, root cause, one-line fix — in under 30 seconds.

$ pip install runlens

Python 3.9+ · no cloud · no API key required · clone first →
Anthropic · OpenAI · LangGraph · CrewAI · AutoGen · PydanticAI · raw API
agentlens — diagnose

$ agentlens diagnose cf3fcea8

AgentLens Diagnosis
═══════════════════════════════════════
ROOT CAUSE:   tool_selection
FAILED AT:    Step 2 (search_web)

WHY:
  Both tools had identical descriptions — the agent treated
  them as interchangeable and called search_web for a local
  record lookup that only query_db can answer.

FIX:
  Rewrite tool descriptions so search_web is clearly for
  external web queries and query_db is clearly for internal
  customer records. Make them mutually exclusive.

CONFIDENCE:   0.90
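The diagnosis above depends on noticing that two tool descriptions overlap. Below is a minimal sketch of one way to flag that. It assumes tool definitions carry a `name` and a `description`, it uses `difflib` string similarity, and it sets an arbitrary 0.8 threshold. `overlapping_tools` is a hypothetical helper, not AgentLens's actual detector.

```python
from difflib import SequenceMatcher
from itertools import combinations


def overlapping_tools(tools: list[dict], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (tool_a, tool_b, similarity) for tool pairs whose descriptions look alike.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    flagged = []
    for a, b in combinations(tools, 2):
        ratio = SequenceMatcher(None, a["description"].lower(), b["description"].lower()).ratio()
        if ratio >= threshold:
            flagged.append((a["name"], b["name"], round(ratio, 2)))
    return flagged


# two tools with identical descriptions, as in the diagnosis
tools = [
    {"name": "search_web", "description": "Look up information."},
    {"name": "query_db", "description": "Look up information."},
]
print(overlapping_tools(tools))  # → [('search_web', 'query_db', 1.0)]
```

Identical descriptions score 1.0. Rewriting them to be mutually exclusive, as the FIX suggests, drives the ratio down.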

// how it works

Two lines. That's it.

AgentLens patches your provider client when you call agentlens.init(). Your existing LLM calls are captured as-is, with no rewrites.
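Below is a minimal sketch of class-level patching, which is the mechanism this paragraph describes. A stand-in `Messages` class takes the place of the real Anthropic SDK, and `patch_method` and `CAPTURED` are hypothetical names, not AgentLens internals. The wrapper is installed on the class, so every instance that looks up `create` through the class goes through it.

```python
import functools
import time

CAPTURED: list[dict] = []  # hypothetical in-memory sink for captured calls


class Messages:
    """Stand-in for an SDK resource class such as a client's messages API."""

    def create(self, **kwargs):
        return {"content": f"echo: {kwargs.get('prompt', '')}"}


def patch_method(cls, name: str) -> None:
    """Replace cls.<name> with a wrapper that records inputs, output, and latency."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        CAPTURED.append({
            "method": f"{cls.__name__}.{name}",
            "kwargs": kwargs,
            "result": result,
            "latency_s": time.perf_counter() - start,
        })
        return result

    setattr(cls, name, wrapper)


patch_method(Messages, "create")   # the "init" step
messages = Messages()              # instance created after patching
messages.create(prompt="hi")
print(CAPTURED[0]["method"], CAPTURED[0]["result"])  # → Messages.create {'content': 'echo: hi'}
```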

01

Init before your client

Call agentlens.init() before creating your Anthropic or OpenAI client. It patches the class — every call from that point is captured.

02

Wrap your agent function

Add @agentlens.run(name="...") to your agent. All LLM calls, tool calls, errors, and costs inside it are grouped into one run.

03

Diagnose when it breaks

Run agentlens diagnose <run_id>. Get the exact failure step, why the agent made that decision, and a one-line fix.

python quickstart.py
import agentlens
import anthropic

agentlens.init()                     # patches Anthropic + OpenAI + async variants

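client = anthropic.Anthropic()       # captured automatically from here

# example tool (stub): replace with your own definitions
TOOLS = [{
    "name": "query_db",
    "description": "Look up an internal customer record by customer ID.",
    "input_schema": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}]

def execute_tool(name: str, args: dict) -> str:
    if name == "query_db":                            # stub: swap in a real lookup
        return f"customer {args['customer_id']}: status=active"
    raise ValueError(f"unknown tool: {name}")

@agentlens.run(name="support_agent")  # works with sync and async
def run_agent(query: str):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=512,
        tools=TOOLS,
        messages=[{"role": "user", "content": query}],
    )
    # execute each tool the model requested, then record its result
    for block in response.content:
        if block.type == "tool_use":
            output = execute_tool(block.name, block.input)
            agentlens.record_tool_result(
                tool_name=block.name, output=output, tool_use_id=block.id
            )
    return response

# LangGraph: one extra line
agentlens.patch_langgraph()          # call before graph.compile()
app = graph.compile()                # graph: your StateGraph builder; every node is now a span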
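According to the page, patch_langgraph() turns every graph node into a span. Below is a minimal, framework-free sketch of that idea. It wraps each node function so that its name, duration, and any exception are recorded. Plain dicts and a toy linear executor stand in for LangGraph's graph builder, and `wrap_nodes`, `run_linear`, and `SPANS` are hypothetical names.

```python
import time
from typing import Any, Callable

SPANS: list[dict] = []  # hypothetical span sink

State = dict[str, Any]
Node = Callable[[State], State]


def wrap_nodes(nodes: dict[str, Node]) -> dict[str, Node]:
    """Return wrapped node functions that record one span per invocation."""
    def make_span(name: str, fn: Node) -> Node:
        def traced(state: State) -> State:
            span = {"node": name, "error": None}
            start = time.perf_counter()
            try:
                return fn(state)
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                SPANS.append(span)
        return traced
    return {name: make_span(name, fn) for name, fn in nodes.items()}


def run_linear(nodes: dict[str, Node], state: State) -> State:
    """Toy executor: run nodes in insertion order and merge each partial update."""
    for fn in nodes.values():
        state = {**state, **fn(state)}
    return state


nodes = wrap_nodes({
    "classify": lambda s: {"intent": "order_status"},
    "lookup": lambda s: {"order": f"order for {s['intent']}"},
})
final = run_linear(nodes, {"query": "where is my order?"})
print([s["node"] for s in SPANS])  # → ['classify', 'lookup']
```

In this sketch the wrapping has to happen before the executor holds references to the node functions. That is one plausible reason for a "call before compile" rule, but it is not confirmed here.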
// what it catches

Six failure categories,
detected automatically.

No configuration. No labelling. AgentLens reads the trace and finds the pattern.

tool_selection

Wrong tool chosen

Agent picked the wrong tool because two tools had similar descriptions. A common silent failure in multi-tool agents.

loop

Retried without exit

Agent called the same tool with the same inputs multiple times instead of changing strategy after a failure.

cascade

Bad data propagated

An upstream tool returned stale or null data. A downstream step used it and crashed. The real bug was 3 steps earlier.

context_pollution

Contradictory instructions

Conflicting instructions in the system prompt diluted the agent's goal before it even started tool selection.

state_drift

Goal abandoned mid-run

Agent started on the right task but lost track of the original goal across a long multi-step conversation.

overflow

Context window truncated

Critical early context was pushed out of the context window before the key decision was made.
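For the loop category, below is a minimal sketch of one heuristic: flag any tool call whose (tool name, arguments) pair already appeared earlier in the run. The trace shape (a list of steps with `type`, `tool`, and `args`) and `find_repeated_calls` are assumptions for illustration, not AgentLens's trace schema or detector.

```python
import json


def find_repeated_calls(steps: list[dict]) -> list[tuple[int, int]]:
    """Return (repeat_step, first_step) pairs for identical tool calls.

    Assumed trace shape: steps are numbered from 1 in list order. Arguments are
    serialized with sorted keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} match.
    """
    first_seen: dict[tuple[str, str], int] = {}
    repeats = []
    for number, step in enumerate(steps, start=1):
        if step.get("type") != "tool_call":
            continue
        key = (step["tool"], json.dumps(step["args"], sort_keys=True))
        if key in first_seen:
            repeats.append((number, first_seen[key]))
        else:
            first_seen[key] = number
    return repeats


trace = [
    {"type": "llm_call"},
    {"type": "llm_call"},
    {"type": "tool_call", "tool": "fetch_data", "args": {"id": 42}},
    {"type": "llm_call"},
    {"type": "tool_call", "tool": "fetch_data", "args": {"id": 43}},
    {"type": "llm_call"},
    {"type": "tool_call", "tool": "fetch_data", "args": {"id": 42}},
]
print(find_repeated_calls(trace))  # → [(7, 3)]
```

An exact-match check like this misses near-duplicate retries, such as a reworded query. It does catch the "same tool, same inputs" pattern described above.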

// real examples

Same tool. Five frameworks.

AgentLens diagnosed these from real traces across different frameworks — same output format every time.

LangGraph · tool_selection
step 2 (search_web) · confidence 0.90
Both tools had identical descriptions. Agent picked external web search for a local record lookup.

AutoGen · loop
step 7 (fetch_data) · confidence 0.95
Step 7 repeats the same tool call first made at step 3. No exit condition after repeated failure.

PydanticAI · cascade
step 3 (get_user_profile) · confidence 0.90
Profile returned email: null (stale cache). send_email used it at step 6 and crashed.
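The PydanticAI cascade above traces a crash at step 6 back to a null field returned at step 3. Below is a minimal sketch of that backward search. It assumes each step records the arguments it received, its output, and any error. The trace shape and `find_null_origin` are hypothetical.

```python
from typing import Optional


def find_null_origin(steps: list[dict]) -> Optional[dict]:
    """Trace the first failed step back to the earlier step that produced a null it used.

    Assumed trace shape: each step has a "step" number, the "args" it received,
    an "output" dict (or None), and an "error" string (or None).
    """
    failed_at = next((i for i, s in enumerate(steps) if s.get("error")), None)
    if failed_at is None:
        return None
    failed = steps[failed_at]
    for key, value in failed["args"].items():
        if value is not None:
            continue
        # walk backwards from the failure to the most recent step that returned key=None
        for earlier in reversed(steps[:failed_at]):
            output = earlier.get("output") or {}
            if key in output and output[key] is None:
                return {"field": key, "origin_step": earlier["step"], "failed_step": failed["step"]}
    return None


trace = [  # excerpt of a run: steps 3, 4 and 6
    {"step": 3, "tool": "get_user_profile", "args": {"user_id": 7},
     "output": {"name": "Ada", "email": None}, "error": None},
    {"step": 4, "tool": "get_order", "args": {"user_id": 7},
     "output": {"status": "shipped"}, "error": None},
    {"step": 6, "tool": "send_email", "args": {"email": None, "body": "Your order shipped"},
     "output": None, "error": "ValueError: recipient address is required"},
]
print(find_null_origin(trace))
# → {'field': 'email', 'origin_step': 3, 'failed_step': 6}
```

Matching by field name is a simplification. If the downstream argument is renamed (for example `email` passed as `to`), a real detector would also need to track how values flow between steps.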


// also detects

Hallucinations, too.

AgentLens checks every tool call against the tool's schema and flags any parameter the agent invented. It also flags claims that contradict what an earlier tool actually returned.

hallucination detection

HALLUCINATIONS DETECTED:

[HIGH] step 2 — invented param:
  'query_db' was called with ['limit', 'sort_by']
  which are not in its schema.
  Valid params: ['customer_id', 'field']

[MED] step 4 — context contradiction:
  LLM claims "order found" but get_order
  returned status: "not found" at step 3.
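Below is a minimal sketch of the invented-parameter check. It assumes tool schemas use the JSON Schema object shape that Anthropic (`input_schema`) and OpenAI (`parameters`) tool definitions both use, with a `properties` map. `invented_params` is a hypothetical helper, and the values mirror the example above.

```python
def invented_params(schema: dict, args: dict) -> list[str]:
    """Return argument names the model passed that the tool's JSON Schema does not declare."""
    allowed = set(schema.get("properties", {}))
    return sorted(name for name in args if name not in allowed)


query_db_schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "field": {"type": "string"},
    },
    "required": ["customer_id"],
}
call_args = {"customer_id": "c_123", "limit": 10, "sort_by": "date"}
print(invented_params(query_db_schema, call_args))  # → ['limit', 'sort_by']
print(sorted(query_db_schema["properties"]))         # → ['customer_id', 'field']
```

This sketch treats every undeclared key as invented. Plain JSON Schema permits extra keys unless `additionalProperties` is `false`, so the sketch is stricter than the schema itself.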

// cli

Everything from the terminal.

No dashboard to sign in to. No data leaving your machine.

shell
$ agentlens runs list          # all recent runs with status + cost
$ agentlens runs show <id>     # full span detail — every LLM call, tool call, error
$ agentlens diagnose <id>      # root cause + fix
$ agentlens stats              # token usage, latency, cost across all runs
$ agentlens anonymize <id>     # redact secrets before sharing
$ agentlens evaluate           # accuracy check against your test cases
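The anonymize command redacts secrets before you share a run. Below is a minimal sketch of pattern-based redaction. It uses a few illustrative regexes for `sk-`-prefixed API keys, AWS access key IDs, and email addresses. The patterns and `redact` are hypothetical, not the rules AgentLens ships.

```python
import re

# Illustrative patterns only (assumed); real secret scanners use many more rules.
PATTERNS = {
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace every match with a labelled placeholder such as [REDACTED:EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


line = "key=sk-ant-abc123def456ghi789 user=ada@example.com"
print(redact(line))  # → key=[REDACTED:API_KEY] user=[REDACTED:EMAIL]
```

Labelled placeholders keep the trace readable after redaction, so a shared run still shows which field held a key or an address.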

// get started

Your agent broke.
Find out why in 30 seconds.

Free. Local. No account.

$ pip install runlens
