Scoped Memory for Agent Systems: Cross-Run Persistence Without Global State

Most agent frameworks treat each run as stateless. The agent starts fresh, does its work, and the output is consumed by whatever called it. If you run the same workflow again next week, the agent has no memory of what it produced last time.

For some use cases that is fine. For others — recurring research tasks, iterative drafting, accumulated domain knowledge — you want the agent to remember what it learned in previous runs and build on it.

The question is how to add cross-run memory without introducing global shared state that makes the system hard to reason about.

AgentEnsemble uses named memory scopes. Each task declares which scopes it reads from and writes to. A task can only see memory from scopes it explicitly declares.

MemoryStore store = MemoryStore.inMemory();

Task researchTask = Task.builder()
    .description("Research current AI trends")
    .expectedOutput("A research report")
    .agent(researcher)
    .memory("ai-research")
    .build();

Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .memoryStore(store)
    .build()
    .run();

After the run, the task’s output is stored in the "ai-research" scope. On a second run with the same store, the agent’s prompt automatically includes entries from the first run under a ## Memory: ai-research section.

The scope name is the isolation boundary. Task A storing into "research" and task B declaring only "drafts" means task B never sees task A’s output. This is not a security mechanism — it is an attention mechanism. It controls what context an agent receives, keeping prompts focused on relevant history rather than everything that ever happened.

The mechanics are straightforward:

  1. At task startup, the framework retrieves entries from every declared scope and injects them into the agent’s prompt.
  2. At task completion, the framework stores the task output into every declared scope.
  3. Because entries persist in the MemoryStore across runs, agents in later runs automatically see outputs from earlier runs.
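The three steps above can be sketched in plain Java, with a `Map` standing in for the MemoryStore and a string literal standing in for the LLM call. `ScopedMemorySketch` and `runTask` are illustrative names for this sketch, not AgentEnsemble API:

```java
import java.util.*;

// Sketch of the scope lifecycle: inject declared scopes into the prompt,
// run the task, then store the output back into those scopes.
public class ScopedMemorySketch {
    // scope name -> entries, in insertion order (stand-in for MemoryStore)
    static final Map<String, List<String>> store = new HashMap<>();

    public static String runTask(String description, List<String> scopes) {
        StringBuilder prompt = new StringBuilder();
        // 1. At startup, retrieve entries from every declared scope.
        for (String scope : scopes) {
            List<String> entries = store.getOrDefault(scope, List.of());
            if (!entries.isEmpty()) {
                prompt.append("## Memory: ").append(scope).append("\n");
                for (String entry : entries) prompt.append(entry).append("\n");
            }
        }
        prompt.append("## Task\n").append(description);
        String output = "output for: " + description; // stand-in for the LLM call
        // 2. At completion, store the output into every declared scope.
        for (String scope : scopes) {
            store.computeIfAbsent(scope, s -> new ArrayList<>()).add(output);
        }
        return prompt.toString();
    }

    public static void main(String[] args) {
        runTask("Research current AI trends", List.of("ai-research"));
        // 3. A later run against the same store sees the earlier output.
        System.out.println(runTask("Summarise the findings", List.of("ai-research")));
    }
}
```

Because the second call reads from the same store, its prompt already contains the first call's output under the scope heading.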

The prompt injection looks like this:

## Memory: ai-project
The following information from scope "ai-project" may be relevant:
---
Research findings from previous run: AI is accelerating in healthcare...
---
## Task
Analyse the research findings

There is no magic retrieval. The framework puts the memory content into the prompt, and the LLM uses it (or ignores it) during reasoning.
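As a sketch, assembling that section is plain string building. `formatMemorySection` is a hypothetical helper written to match the template above, not part of the framework:

```java
import java.util.List;

public class MemoryPromptFormat {
    // Builds a "## Memory: <scope>" section in the shape shown above,
    // with entries separated by "---" markers.
    public static String formatMemorySection(String scope, List<String> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("## Memory: ").append(scope).append("\n");
        sb.append("The following information from scope \"").append(scope)
          .append("\" may be relevant:\n");
        for (String entry : entries) {
            sb.append("---\n").append(entry).append("\n");
        }
        sb.append("---\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatMemorySection("ai-project",
            List.of("Research findings from previous run: AI is accelerating in healthcare...")));
    }
}
```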

MemoryStore has two built-in implementations:

In-memory stores entries in insertion order per scope. Retrieval returns the most recent entries without semantic search. Suitable for development, testing, and single-JVM runs. Entries do not survive JVM restarts.

MemoryStore store = MemoryStore.inMemory();
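The recency behaviour can be sketched like this; `RecencyStore` is illustrative, not the actual in-memory implementation:

```java
import java.util.*;

// Sketch of recency-based retrieval: entries kept in insertion order per
// scope, retrieval returns the most recent N with no semantic ranking.
public class RecencyStore {
    private final Map<String, List<String>> scopes = new HashMap<>();

    public void store(String scope, String entry) {
        scopes.computeIfAbsent(scope, s -> new ArrayList<>()).add(entry);
    }

    // Most recent first; just list slicing, no embeddings involved.
    public List<String> retrieve(String scope, int limit) {
        List<String> entries = scopes.getOrDefault(scope, List.of());
        int from = Math.max(0, entries.size() - limit);
        List<String> recent = new ArrayList<>(entries.subList(from, entries.size()));
        Collections.reverse(recent);
        return recent;
    }
}
```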

Embedding-based stores entries via an embedding model and retrieves them via semantic similarity search. The backing EmbeddingStore controls durability — Chroma, Qdrant, Pinecone, pgvector, or any LangChain4j-compatible store.

EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("text-embedding-3-small")
    .build();

EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("agentensemble-memory")
    .build();

MemoryStore store = MemoryStore.embeddings(embeddingModel, embeddingStore);

The design tradeoff is explicit. In-memory is fast and simple but loses data on restart and does not do semantic retrieval. Embedding-based is durable and semantically aware but requires an embedding model and a vector store. You choose based on your operational requirements.

Unbounded memory is a prompt-size problem. Every stored entry adds tokens to the next run’s prompt. Scopes support optional eviction to keep sizes bounded:

// Retain only the 5 most recent entries
MemoryScope.builder()
    .name("research")
    .keepLastEntries(5)
    .build();

// Retain only entries from the past 7 days
MemoryScope.builder()
    .name("research")
    .keepEntriesWithin(Duration.ofDays(7))
    .build();

Eviction is applied after each task stores its output. For embedding-based stores, eviction is a no-op since most embedding stores do not support deletion of individual entries.
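What the two policies amount to can be sketched in a few lines; `Entry` and the two static methods here are illustrative, mirroring the builder options above rather than reproducing the framework's internals:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Sketch of the two eviction policies, applied after each store.
public class EvictionSketch {
    public record Entry(String value, Instant storedAt) {}

    // Retain only the N most recent entries (drop from the front).
    public static void keepLastEntries(List<Entry> entries, int n) {
        while (entries.size() > n) entries.remove(0);
    }

    // Retain only entries stored within the last maxAge.
    public static void keepEntriesWithin(List<Entry> entries, Duration maxAge, Instant now) {
        entries.removeIf(e -> e.storedAt().isBefore(now.minus(maxAge)));
    }
}
```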

In addition to the automatic scope-based mechanism, agents can interact with memory directly during their ReAct loop using MemoryTool:

Agent researcher = Agent.builder()
    .role("Researcher")
    .goal("Research and remember important facts")
    .tools(MemoryTool.of("research", store))
    .build();

MemoryTool provides two tool methods the LLM can call: storeMemory(key, value) to store an arbitrary fact, and retrieveMemory(query) to retrieve relevant memories by query.

When the same MemoryStore instance is used for both MemoryTool and Ensemble.builder().memoryStore(...), explicit tool access and automatic scope-based access share the same backing store. This means an agent can both receive automatic context from previous runs and actively query or store additional facts during execution.
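Why the shared instance matters can be sketched with a single map behind both access paths; `SharedStoreSketch` and its method names are illustrative, not the MemoryTool API:

```java
import java.util.*;

// Sketch: one backing map serves both the tool path (explicit calls during
// the ReAct loop) and the automatic path (scope injection at task startup),
// so a fact written through either path is visible through the other.
public class SharedStoreSketch {
    private final Map<String, List<String>> backing = new HashMap<>();

    // Tool path: what a storeMemory-style call boils down to.
    public void storeMemory(String scope, String fact) {
        backing.computeIfAbsent(scope, s -> new ArrayList<>()).add(fact);
    }

    // Automatic path: what scope injection reads at task startup.
    public List<String> entriesForPrompt(String scope) {
        return backing.getOrDefault(scope, List.of());
    }
}
```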

Multiple tasks can declare the same scope name. Each task writes its output to the scope after it completes, so later tasks in a sequential workflow see earlier tasks’ outputs:

Task research = Task.builder()
    .description("Research AI trends")
    .memory("ai-project")
    .build();

Task analysis = Task.builder()
    .description("Analyse the research findings")
    .memory("ai-project")
    .build();

Ensemble.builder()
    .task(research)
    .task(analysis)
    .memoryStore(store)
    .build()
    .run();

This is within-run memory sharing. The analysis task sees the research task’s output because they share the "ai-project" scope. On the next run, both tasks see outputs from the previous run’s research and analysis.

The key design decision is that memory is opt-in and scoped, not global and automatic. An agent does not remember everything by default. Each task explicitly declares what it wants to remember and what it wants to recall.

This makes the system easier to reason about. You can look at a task definition and know exactly what memory context it will receive. You can test a task with a pre-populated store and verify that it uses the memory correctly. You can clear a scope without affecting other scopes.

The tradeoff is that you have to think about memory design upfront. Which tasks share scopes? How many entries should be retained? Should you use semantic search or recency-based retrieval? These are design decisions that the framework surfaces explicitly rather than hiding behind defaults.


The full memory guide is in the documentation.

I’d be interested in how you handle the prompt-size tension — whether bounded eviction is sufficient, or whether you have needed more sophisticated retrieval strategies for production memory systems.