open source · local-first · no signup

When your AI agent breaks,
this tells you why.

Not logs. Not traces. The answer.
Exact step, root cause, one-line fix — in under 30 seconds.

$ pip install runlens

Python 3.9+ · no cloud · no API key required · clone first →

→ See a live diagnosis first

Anthropic · OpenAI · LangGraph · raw API · CrewAI* · AutoGen* · PydanticAI*

* traced through their underlying Anthropic/OpenAI calls — LLM + tool calls captured, framework-level spans on the roadmap

agentlens — diagnose

$ agentlens diagnose cf3fcea8

AgentLens Diagnosis
═══════════════════════════════════════

ROOT CAUSE:
tool_selection

FAILED AT:
Step 2 (search_web)

WHY:
Both tools had identical descriptions; the agent treated
them as interchangeable and called search_web for a local
record lookup that only query_db can answer.

FIX:
Rewrite tool descriptions so search_web is clearly for
external web queries and query_db is clearly for internal
customer records. Make them mutually exclusive.

CONFIDENCE: 0.90
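
As an illustration of the FIX in the diagnosis above, here is a minimal sketch of two mutually exclusive tool descriptions, written in the Anthropic tools format (name, description, input_schema). The description wording and the `query` parameter on search_web are assumptions made for this example; `customer_id` and `field` come from the query_db schema shown later on this page. The `duplicate_descriptions` helper is hypothetical and is not part of AgentLens.

```python
# Hypothetical tool definitions; the description text is illustrative only.
TOOLS = [
    {
        "name": "search_web",
        "description": (
            "Search the public internet. Use ONLY for external information "
            "such as news, documentation, or public websites. "
            "Never use for customer records; use query_db for those."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},  # assumed parameter
            "required": ["query"],
        },
    },
    {
        "name": "query_db",
        "description": (
            "Look up a field on an internal customer record by customer ID. "
            "Use ONLY for data stored in the internal database. "
            "Never use for public web information; use search_web for that."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "field": {"type": "string"},
            },
            "required": ["customer_id", "field"],
        },
    },
]


def duplicate_descriptions(tools):
    """Return pairs of tool names whose descriptions match after
    normalising case and whitespace."""
    seen = {}      # normalised description -> first tool name that used it
    clashes = []
    for tool in tools:
        key = " ".join(tool["description"].lower().split())
        if key in seen:
            clashes.append((seen[key], tool["name"]))
        else:
            seen[key] = tool["name"]
    return clashes
```

A check like this could run in a test suite, so that identical descriptions are caught before an agent ever has to choose between the two tools.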

// how it works

Two lines. That's it.

AgentLens patches your provider client classes when you call agentlens.init(), so the calls your agent already makes need no changes.

Init before your client

Call agentlens.init() before creating your Anthropic or OpenAI client. It patches the class — every call from that point is captured.

Wrap your agent function

Add @agentlens.run(name="...") to your agent. All LLM calls, tool calls, errors, and costs inside it are grouped into one run.

Diagnose when it breaks

Run agentlens diagnose <run_id>. Get the exact failure step, why the agent made that decision, and a one-line fix.

python quickstart.py

import agentlens
import anthropic

agentlens.init()                     # patches Anthropic + OpenAI + async variants

client = anthropic.Anthropic()       # captured automatically from here

@agentlens.run(name="support_agent")  # works with sync and async
def run_agent(query: str):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=512,
        tools=[...],
        messages=[{"role": "user", "content": query}],
    )
    # record tool results after executing them
    agentlens.record_tool_result(tool_name=..., output=..., tool_use_id=...)
    return response

# LangGraph — one extra line
agentlens.patch_langgraph()          # call before graph.compile()
app = graph.compile()               # every node is now a span

// what it catches

Six failure categories,
detected automatically.

No configuration. No labelling. AgentLens reads the trace and finds the pattern.

tool_selection

Wrong tool chosen

Agent picked the wrong tool because two tools had similar descriptions. The most common silent failure in multi-tool agents.

loop

Retried without exit

Agent called the same tool with the same inputs multiple times instead of changing strategy after a failure.

cascade

Bad data propagated

An upstream tool returned stale or null data. A downstream step used it and crashed. The real bug was 3 steps earlier.

context_pollution

Contradictory instructions

Conflicting instructions in the system prompt diluted the agent's goal before it even started tool selection.

state_drift

Goal abandoned mid-run

Agent started on the right task but lost track of the original goal across a long multi-step conversation.

overflow

Context window truncated

Critical early context was pushed out of the context window before the key decision was made.
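
To make the loop category above concrete, here is a minimal sketch of how a repeated tool call could be spotted in a trace. It is not AgentLens's actual detector: the trace format (a list of dicts with `step`, `tool`, and `input` keys) and the function name are assumptions for illustration.

```python
import json


def find_repeated_tool_calls(steps, min_repeats=2):
    """Return (tool, first_step, repeat_steps) for every tool call that
    appears at least min_repeats times with identical inputs."""
    seen = {}  # (tool name, canonical JSON of inputs) -> step numbers
    for step in steps:
        # sort_keys makes the key independent of dict ordering
        key = (step["tool"], json.dumps(step["input"], sort_keys=True))
        seen.setdefault(key, []).append(step["step"])
    return [
        (tool, nums[0], nums[1:])
        for (tool, _), nums in seen.items()
        if len(nums) >= min_repeats
    ]


# Hypothetical trace: fetch_data is retried with the same input.
trace = [
    {"step": 3, "tool": "fetch_data", "input": {"id": 42}},
    {"step": 5, "tool": "fetch_data", "input": {"id": 42}},
    {"step": 6, "tool": "parse", "input": {"raw": "x"}},
    {"step": 7, "tool": "fetch_data", "input": {"id": 42}},
]
repeats = find_repeated_tool_calls(trace)
```

Comparing canonicalised inputs rather than tool names alone avoids flagging an agent that legitimately calls the same tool with different arguments.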

// example diagnoses

Same answer format. Any framework.

Three failure patterns AgentLens detects — whatever your agent is built with, the diagnosis looks like this.

LangGraph tool_selection

step 2 (search_web)

conf 0.90

Both tools had identical descriptions. Agent picked external web search for a local record lookup.


AutoGen loop

step 7 (fetch_data)

conf 0.95

Step 7 repeats the same tool call first made at step 3. No exit condition after repeated failure.


PydanticAI cascade

step 3 (get_user_profile)

conf 0.90

Profile returned email: null (stale cache). send_email used it at step 6 and crashed.

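
The PydanticAI cascade example above fails three steps after the bad data appears. One way to surface such a failure at its source is to validate tool results before passing them on. The sketch below assumes tool results are plain dicts; `require_fields` and `UpstreamDataError` are hypothetical names, not part of AgentLens or PydanticAI.

```python
class UpstreamDataError(ValueError):
    """Raised when a tool result lacks a field that a later step depends on."""


def require_fields(tool_name, result, fields):
    """Fail at the step that produced bad data, not the step that consumes it."""
    missing = [f for f in fields if result.get(f) is None]
    if missing:
        raise UpstreamDataError(
            f"{tool_name} returned no value for: {', '.join(missing)}"
        )
    return result


# Stale cache entry, mirroring the example: email is null.
profile = {"user_id": "u_1", "email": None}
try:
    require_fields("get_user_profile", profile, ["email"])
    error = None
except UpstreamDataError as exc:
    error = str(exc)
```

With a guard like this, the run would stop at get_user_profile with a message naming the missing field, instead of crashing later inside send_email.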

// also detects

Hallucinations, too.

AgentLens checks every tool call against the tool schema. If the agent invents a parameter that doesn't exist, it's flagged immediately.

hallucination detection

HALLUCINATIONS DETECTED:

[HIGH] step 2 - invented param:
  'query_db' was called with ['limit', 'sort_by']
  which are not in its schema.
  Valid params: ['customer_id', 'field']

[MED] step 4 - context contradiction:
  LLM claims "order found" but get_order
  returned status: "not found" at step 3.
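
The invented-parameter check described above can be approximated with a set difference between the arguments of a tool call and the properties declared in the tool's JSON Schema. This is a minimal sketch under that assumption, not AgentLens's implementation; `invented_params`, the `SCHEMAS` mapping, and the sample argument values are hypothetical.

```python
def invented_params(tool_schemas, tool_name, call_input):
    """Return the sorted parameter names in call_input that the tool's
    schema does not declare."""
    valid = set(tool_schemas[tool_name]["properties"])
    return sorted(set(call_input) - valid)


# Schema for query_db, using the valid params listed in the output above.
SCHEMAS = {
    "query_db": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "field": {"type": "string"},
        },
    }
}

# A call that adds two parameters the schema does not define.
call = {"customer_id": "c_17", "field": "email", "limit": 10, "sort_by": "name"}
extra = invented_params(SCHEMAS, "query_db", call)
```

The second kind of finding in the output, a context contradiction, needs the model's text compared against earlier tool results and is not covered by this sketch.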

// cli

Everything from the terminal.

No dashboard to sign in to. No data leaving your machine.

shell

$ agentlens demo               # see a full diagnosis in 10 seconds — no API key
$ agentlens watch              # live mode — spans stream as your agent runs
$ agentlens runs list          # all recent runs with status + cost
$ agentlens runs view <id>    # visual timeline in the browser
$ agentlens diagnose <id>     # root cause + fix
$ agentlens stats              # token usage, latency, cost across all runs

Your agent broke.
Find out why in 30 seconds.