Overview
The hard part of building agents (or any LLM application) is making them reliable enough. While they may work for a prototype, they often fail in real-world use cases.Why do agents fail?
When agents fail, it’s usually because the LLM call inside the agent took the wrong action / didn’t do what we expected. LLMs fail for one of two reasons:- The underlying LLM is not capable enough
- The “right” context was not passed to the LLM
New to context engineering? Start with the conceptual overview to understand the different types of context and when to use them.
The agent loop
A typical agent loop consists of two main steps:- Model call - calls the LLM with a prompt and available tools, returns either a response or a request to execute tools
- Tool execution - executes the tools that the LLM requested, returns tool results

What you can control
To build reliable agents, you need to control what happens at each step of the agent loop, as well as what happens between steps.| Context Type | What You Control | Transient or Persistent |
|---|---|---|
| Model Context | What goes into model calls (instructions, message history, tools, response format) | Transient |
| Tool Context | What tools can access and produce (reads/writes to state, store, runtime context) | Persistent |
| Life-cycle Context | What happens between model and tool calls (summarization, guardrails, logging, etc.) | Persistent |
Transient context
What the LLM sees for a single call. You can modify messages, tools, or prompts without changing what’s saved in state.
Persistent context
What gets saved in state across turns. Life-cycle hooks and tool writes modify this permanently.
Data sources
Throughout this process, your agent accesses (reads / writes) different sources of data:| Data Source | Also Known As | Scope | Examples |
|---|---|---|---|
| Runtime Context | Static configuration | Conversation-scoped | User ID, API keys, database connections, permissions, environment settings |
| State | Short-term memory | Conversation-scoped | Current messages, uploaded files, authentication status, tool results |
| Store | Long-term memory | Cross-conversation | User preferences, extracted insights, memories, historical data |
How it works
LangChain middleware is the mechanism under the hood that makes context engineering practical for developers using LangChain. Middleware allows you to hook into any step in the agent lifecycle and:- Update context
- Jump to a different step in the agent lifecycle
Model Context
Control what goes into each model call - instructions, available tools, which model to use, and output format. These decisions directly impact reliability and cost.System Prompt
Base instructions from the developer to the LLM.
Messages
The full list of messages (conversation history) sent to the LLM.
Tools
Utilities the agent has access to to take actions.
Model
The actual model (including configuration) to be called.
Response Format
Schema specification for the model’s final response.
System Prompt
The system prompt sets the LLM’s behavior and capabilities. Different users, contexts, or conversation stages need different instructions. Successful agents draw on memories, preferences, and configuration to provide the right instructions for the current state of the conversation.- State
- Store
- Runtime Context
Access message count or conversation context from state:
Messages
Messages make up the prompt that is sent to the LLM. It’s critical to manage the content of messages to ensure that the LLM has the right information to respond well.- State
- Store
- Runtime Context
Inject uploaded file context from State when relevant to current query:
Transient vs Persistent Message Updates:The examples above use
wrap_model_call to make transient updates - modifying what messages are sent to the model for a single call without changing what’s saved in state.For persistent updates that modify state (like the summarization example in Life-cycle Context), use life-cycle hooks like before_model or after_model to permanently update the conversation history. See the middleware documentation for more details.Tools
Tools let the model interact with databases, APIs, and external systems. How you define and select tools directly impacts whether the model can complete tasks effectively.Defining tools
Each tool needs a clear name, description, argument names, and argument descriptions. These aren’t just metadata—they guide the model’s reasoning about when and how to use the tool.Selecting tools
Not every tool is appropriate for every situation. Too many tools may overwhelm the model (overload context) and increase errors; too few limit capabilities. Dynamic tool selection adapts the available toolset based on authentication state, user permissions, feature flags, or conversation stage.- State
- Store
- Runtime Context
Enable advanced tools only after certain conversation milestones:
Model
Different models have different strengths, costs, and context windows. Select the right model for the task at hand, which might change during an agent run.- State
- Store
- Runtime Context
Use different models based on conversation length from State:
Response Format
Structured output transforms unstructured text into validated, structured data. When extracting specific fields or returning data for downstream systems, free-form text isn’t sufficient. How it works: When you provide a schema as the response format, the model’s final response is guaranteed to conform to that schema. The agent runs the model / tool calling loop until the model is done calling tools, then the final response is coerced into the provided format.Defining formats
Schema definitions guide the model. Field names, types, and descriptions specify exactly what format the output should adhere to.Selecting formats
Dynamic response format selection adapts schemas based on user preferences, conversation stage, or role—returning simple formats early and detailed formats as complexity increases.- State
- Store
- Runtime Context
Configure structured output based on conversation state:
Tool Context
Tools are special in that they both read and write context. In the most basic case, when a tool executes, it receives the LLM’s request parameters and returns a tool message back. The tool does its work and produces a result. Tools can also fetch important information for the model that allows it to perform and complete tasks.Reads
Most real-world tools need more than just the LLM’s parameters. They need user IDs for database queries, API keys for external services, or current session state to make decisions. Tools read from state, store, and runtime context to access this information.- State
- Store
- Runtime Context
Read from State to check current session information:
Writes
Tool results can be used to help an agent complete a given task. Tools can both return results directly to the model and update the memory of the agent to make important context available to future steps.- State
- Store
Write to State to track session-specific information using Command:
Life-cycle Context
Control what happens between the core agent steps - intercepting data flow to implement cross-cutting concerns like summarization, guardrails, and logging. As you’ve seen in Model Context and Tool Context, middleware is the mechanism that makes context engineering practical. Middleware allows you to hook into any step in the agent lifecycle and either:- Update context - Modify state and store to persist changes, update conversation history, or save insights
- Jump in the lifecycle - Move to different steps in the agent cycle based on context (e.g., skip tool execution if a condition is met, repeat model call with modified context)

Example: Summarization
One of the most common life-cycle patterns is automatically condensing conversation history when it gets too long. Unlike the transient message trimming shown in Model Context, summarization persistently updates state - permanently replacing old messages with a summary that’s saved for all future turns. LangChain offers built-in middleware for this:SummarizationMiddleware automatically:
- Summarizes older messages using a separate LLM call
- Replaces them with a summary message in State (permanently)
- Keeps recent messages intact for context
For a complete list of built-in middleware, available hooks, and how to create custom middleware, see the Middleware documentation.
Best practices
- Start simple - Begin with static prompts and tools, add dynamics only when needed
- Test incrementally - Add one context engineering feature at a time
- Monitor performance - Track model calls, token usage, and latency
- Use built-in middleware - Leverage
SummarizationMiddleware,LLMToolSelectorMiddleware, etc. - Document your context strategy - Make it clear what context is being passed and why
- Understand transient vs persistent: Model context changes are transient (per-call), while life-cycle context changes persist to state
Related resources
- Context conceptual overview - Understand context types and when to use them
- Middleware - Complete middleware guide
- Tools - Tool creation and context access
- Memory - Short-term and long-term memory patterns
- Agents - Core agent concepts
Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.