The Problem
Most AI research tools are demos. They look impressive in a video—type a question, watch the agent browse the web, get a summary. In production they break in predictable ways: they hallucinate citations, loop endlessly when a URL times out, produce different output for the same input, and leave no trace of what they actually fetched.
A production research agent needs to be deterministic, observable, and recoverable. This one is.
What Was Built
The Deep Research Agent is a LangGraph-based multi-step agent that takes a research question (and optional seed URLs or documents) and produces a structured, cited report.
The agent runs in four phases:
- Planning — generates a research plan (plan.md) with scoped sub-questions and target source types
- Fetching — retrieves URLs and documents, normalises them to text
- Note-taking — extracts relevant information per sub-question into notes.md with source attribution
- Synthesis — writes the final report.md with inline citations referencing sources.json
Every phase writes to the run's artifact directory (runs/<thread_id>/). If a run fails mid-way, you can inspect exactly where it stopped.
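The run loop can be sketched as follows. This is an illustrative outline, not the project's actual graph definition: the phase bodies are placeholders (the real agent calls the model and fetchers), and the runs/ root and function name are assumptions.

```python
# Sketch of the four-phase run loop and its artifact layout (illustrative only).
import json
import uuid
from pathlib import Path

def run_research(question: str, root: Path = Path("runs")) -> Path:
    """Execute the four phases in order, persisting each artifact as it completes."""
    run_dir = root / str(uuid.uuid4())  # the thread ID doubles as the artifact namespace
    run_dir.mkdir(parents=True, exist_ok=True)

    # Each phase writes its artifact before the next phase starts, so a
    # mid-run failure leaves an inspectable trail of what was completed.
    (run_dir / "plan.md").write_text(f"# Plan\n\nQuestion: {question}\n")   # Planning
    (run_dir / "sources.json").write_text(json.dumps([], indent=2))         # Fetching
    (run_dir / "notes.md").write_text("# Notes\n")                          # Note-taking
    (run_dir / "report.md").write_text("# Report\n")                        # Synthesis
    return run_dir
```

Because each artifact is flushed as its phase finishes, inspecting runs/<thread_id>/ after a crash shows exactly which phase was in flight.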
The Fetch Pipeline
Fetching is where most research agent implementations break. This pipeline handles:
- HTML — clean text extraction, JS-rendered pages via headless fetch where needed
- PDF — text extraction with page boundary awareness
- DOCX, TXT, MD, CSV — each with appropriate parsers
Every fetch has a configurable timeout and size cap. A single large document cannot stall the agent—it is truncated to the token budget and flagged in the source manifest.
Failed fetches are recorded in sources.json with a failure reason, not silently dropped.
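A minimal sketch of that contract, assuming illustrative constants and a hypothetical fetch helper (the real pipeline also dispatches per format and can use a headless browser): every call returns a record for sources.json, whether it succeeded, was truncated, or failed.

```python
# Sketch: a fetch that never raises and never stalls (names/limits are assumptions).
import urllib.request

MAX_BYTES = 2_000_000   # size cap: one large document cannot stall the run
TIMEOUT_S = 15          # per-request timeout

def fetch(url: str) -> dict:
    """Fetch a URL; always return a manifest record, even on failure."""
    record = {"url": url, "status": "ok", "truncated": False, "error": None, "text": ""}
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            raw = resp.read(MAX_BYTES + 1)          # read one extra byte to detect overflow
        if len(raw) > MAX_BYTES:
            raw = raw[:MAX_BYTES]
            record["truncated"] = True              # flagged in the source manifest
        record["text"] = raw.decode("utf-8", errors="replace")
    except Exception as exc:                        # failure recorded, not silently dropped
        record["status"] = "failed"
        record["error"] = f"{type(exc).__name__}: {exc}"
    return record
```

The key property is the shape of the return value: downstream phases always see a record per attempted source, so sources.json is a complete audit trail.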
Guardrails
The agent operates within strict limits defined at invocation:
- max_sources — caps total sources fetched per run
- max_links_per_source — prevents recursive link-following explosions
- max_tokens_per_note — keeps context within model limits
- HTTP and model timeouts — no hanging requests
- Retry limits with exponential backoff
These guardrails are not afterthoughts. They are the primary mechanism that makes the agent safe to run in production without human supervision.
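The limits above can be collected into a single immutable config passed at invocation. A minimal sketch, with default values that are assumptions rather than the project's actual numbers:

```python
# Sketch: guardrail config fixed at invocation time (defaults are illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class Guardrails:
    max_sources: int = 25            # caps total sources fetched per run
    max_links_per_source: int = 10   # prevents recursive link-following explosions
    max_tokens_per_note: int = 2_000 # keeps context within model limits
    http_timeout_s: float = 15.0     # no hanging fetches
    model_timeout_s: float = 120.0   # no hanging model calls
    max_retries: int = 3
    backoff_base_s: float = 1.0

    def backoff(self, attempt: int) -> float:
        """Exponential backoff delay for a 0-based retry attempt."""
        return self.backoff_base_s * (2 ** attempt)
```

Freezing the dataclass means no phase can quietly loosen a limit mid-run; the invocation-time values are the values for the whole run.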
Artifact Completion
If a run terminates abnormally (OOM, timeout, upstream error) before report.md is written, an artifact completion step runs. This is a lightweight, tool-free model call that reads the available notes.md and writes a best-effort summary report. The run is never in a state where artifacts are partially written with no report.
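The fallback logic is simple to state precisely. In this sketch, summarize stands in for the tool-free model call (here, any str-to-str callable); the function name and signature are assumptions:

```python
# Sketch: best-effort report completion for abnormally terminated runs.
from pathlib import Path
from typing import Callable

def complete_artifacts(run_dir: Path, summarize: Callable[[str], str]) -> bool:
    """Write a best-effort report.md from notes.md if report.md is missing.

    Returns True if a fallback report was written, False if the run
    already completed normally.
    """
    report = run_dir / "report.md"
    if report.exists():
        return False                                  # normal completion; nothing to do
    notes_file = run_dir / "notes.md"
    notes = notes_file.read_text() if notes_file.exists() else ""
    report.write_text("# Report (best-effort)\n\n" + summarize(notes))
    return True
```

The invariant this enforces is the one the text states: after this step, no run directory lacks a report.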
Key Engineering Decisions
LangGraph over a custom loop. LangGraph's explicit state machine made it straightforward to define phase transitions, handle conditional edges (retry vs fail vs complete), and inspect intermediate state during debugging.
Thread ID as the artifact namespace. Every run gets a UUID thread ID. All artifacts live under runs/<thread_id>/. This makes runs independently inspectable, comparable, and replayable from any step.
Determinism as a design goal. Given the same inputs and the same model version, the agent produces the same plan. Source selection is ordered, not random. This makes output variance debuggable rather than mysterious.
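Ordered source selection reduces to sorting on a fixed key rather than relying on dict/set iteration order or sampling. A minimal sketch; the rank field and record shape are assumptions:

```python
# Sketch: deterministic source ordering via a stable sort on a fixed key tuple.
def order_sources(sources: list[dict]) -> list[dict]:
    """Return sources in a reproducible order: by rank, then URL as a tiebreak."""
    return sorted(sources, key=lambda s: (s.get("rank", 0), s["url"]))
```

Because the key is total over the inputs, the same set of candidate sources always yields the same ordering, which is what makes output variance attributable to inputs or model version rather than to the agent.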
Scope
- ✓ Tool-using workflows via LangChain / LangGraph with a Deep Agents-inspired design.
- ✓ Fetch pipeline that handles HTML and the top 5 document formats with safe timeouts and size caps.
- ✓ Runs produce plan.md, notes.md, sources.json, report.md and normalized source files under runs/<thread_id>/.
- ✓ Guardrails: caps on max_sources, max_links_per_source, HTTP/model timeouts, token limits and retries.
- ✓ Automatic artifact completion if report.md is missing after a run, using a tool-free model call.
Waqas Raza
AI-Native Full-Stack Engineer. Top Rated on Upwork · $180K+ earned · 93% job success. I build production AI agents, LLM systems, Web3 platforms, and full-stack applications.