
Self-Optimizing Agent Tasks: Persistent Reflection Loops in Java

Task definitions are fixed in code. You describe what the task should do, wire up a model, and run. The prompt stays the same unless you go back and edit it.

In practice, you often discover after a few runs that the instructions could be more precise. The LLM misses an edge case you didn’t anticipate. The output format drifts in ways you didn’t specify. You revise the description, redeploy, and try again.

The harder version of this problem is: what if the instructions could improve themselves?

Task reflection is a persistent, automated feedback loop built into the task execution lifecycle. After a task completes successfully — output accepted, guardrails passed, reviews approved — an LLM-backed analysis step reviews whether the task’s instructions could be improved. Improvements are stored in a ReflectionStore and injected into the task’s prompt on subsequent runs. The original task definition is never modified.

This post covers how reflection works, what the API looks like, and where the tradeoffs sit.


Phase review and task reflection are often confused because both involve quality analysis. The distinction is in scope and timing:

| | Phase Review | Task Reflection |
|---|---|---|
| Trigger | After phase completes | After task output accepted |
| Scope | Within a single Ensemble.run() | Across multiple Ensemble.run() calls |
| Purpose | Fix inadequate output this run | Improve instructions for future runs |
| Persistence | Transient (lost after run) | Persistent (stored between runs) |
| Initiated by | External reviewer | Automated LLM analysis |

Phase review fixes output within a run. Task reflection improves instructions across runs. They compose: a task can have both.


Enable reflection with .reflect(true) on a task, and configure a ReflectionStore on the ensemble:

ReflectionStore store = new InMemoryReflectionStore();

Task research = Task.builder()
    .description("research the top 5 trends in cloud-native Java for 2026")
    .reflect(true)
    .chatModel(model)
    .build();

// Run 1: no prior reflections; task executes normally
EnsembleOutput run1 = Ensemble.builder()
    .tasks(List.of(research))
    .reflectionStore(store)
    .chatModel(model)
    .build()
    .run();

// Reflection fires after run 1 completes; improvements stored in `store`

// Run 2: prior reflections injected into the prompt automatically
EnsembleOutput run2 = Ensemble.builder()
    .tasks(List.of(research))
    .reflectionStore(store)
    .chatModel(model)
    .build()
    .run();

The store is the key. Pass the same store instance across run() calls and the accumulated reflections persist. The task definition — the Task object — is the same every run. The difference is what the reflection store contributes to the prompt.


Reflection fires at the end of the task lifecycle, after all other post-processing:

1. Task executes (LLM call)
2. Guardrails evaluate output
3. Review gate runs (if configured)
4. Memory scopes write
5. [Reflection] LLM analyzes output; generates improvement; stores in ReflectionStore

If the task fails, guardrails reject the output, or the review gate retries, reflection does not fire. Reflection only fires on a fully accepted output.

On the next run of the same task:

1. ReflectionStore loads prior reflections for this task identity
2. Reflections injected into prompt
3. Task executes with improved instructions
4. [Reflection] New analysis fires; improvement stored

The task is identified by a TaskIdentity derived from its description. Two tasks with the same description share the same reflection history.
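Description-keyed identity can be sketched as a hash of the description string. This is an illustration only: the method name and the choice of SHA-256 are assumptions, not the framework's actual derivation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch only: key reflections by a hash of the task description, so
// equal descriptions map to the same reflection history. SHA-256 is an
// assumption; AgentEnsemble's actual hash may differ.
public class IdentitySketch {
    public static String identityKey(String description) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(description.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under this scheme, two Task objects built with the same description string resolve to the same key and therefore share stored reflections.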


The reflection store contributes an additional section to the task’s prompt:

[original task description]
## Instruction Refinements
Based on previous runs, the following refinements have been found to improve output quality:
- Be specific about the time range: results should cover events within the last 12 months only.
- Structure the output as a numbered list with a one-sentence summary per trend.
- For each trend, cite at least one concrete project or company as evidence.

The injection is additive. Original instructions are preserved. Reflections narrow, clarify, or extend them based on what previous outputs revealed.
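The additive assembly reduces to string concatenation. A sketch, with the section wording taken from the example above (the framework's internal API for this step is not shown here):

```java
import java.util.List;

// Sketch of additive prompt injection: the original description is kept
// verbatim and a refinement section is appended after it.
public class PromptSketch {
    public static String assemble(String description, List<String> reflections) {
        if (reflections.isEmpty()) {
            return description; // no prior runs: prompt is unchanged
        }
        StringBuilder sb = new StringBuilder(description)
                .append("\n\n## Instruction Refinements\n")
                .append("Based on previous runs, the following refinements ")
                .append("have been found to improve output quality:\n");
        for (String r : reflections) {
            sb.append("- ").append(r).append("\n");
        }
        return sb.toString();
    }
}
```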


By default, the framework uses all stored reflections for a task. You can bound the number injected via ReflectionConfig:

Task analysis = Task.builder()
    .description("analyze customer sentiment from support tickets")
    .reflect(true)
    .reflectionConfig(ReflectionConfig.builder()
        .maxReflections(5)
        .build())
    .chatModel(model)
    .build();

With maxReflections(5), only the 5 most recent reflections are injected. Older reflections remain in the store but are not included in the prompt. This prevents prompt bloat as the number of runs grows.
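The trimming behavior amounts to keeping the N most recent entries at injection time. A sketch, using a local TaskReflection stand-in (the real record lives in the framework and its field names may differ):

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Local stand-in for the framework's TaskReflection record.
record TaskReflection(String content, Instant createdAt) {}

public class TrimSketch {
    // Keep only the `max` most recent reflections, as maxReflections does
    // at injection time; older entries stay in the store, untouched.
    public static List<TaskReflection> mostRecent(List<TaskReflection> all, int max) {
        return all.stream()
                .sorted(Comparator.comparing(TaskReflection::createdAt).reversed())
                .limit(max)
                .toList();
    }
}
```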

The default strategy uses an LLM call to analyze the task output and generate an improvement. You can substitute a custom ReflectionStrategy:

public class DomainReflectionStrategy implements ReflectionStrategy {
    @Override
    public Optional<String> reflect(ReflectionInput input) {
        String output = input.taskOutput();
        // custom analysis: check for required sections, format, length
        if (!output.contains("## Summary")) {
            return Optional.of("Always include a ## Summary section as the first heading");
        }
        // no improvement identified this time
        return Optional.empty();
    }
}

Task.builder()
    .description("write a technical design document")
    .reflect(true)
    .reflectionConfig(ReflectionConfig.builder()
        .strategy(new DomainReflectionStrategy())
        .build())
    .chatModel(model)
    .build();

A custom strategy can use deterministic rules, call a different model, or apply domain-specific analysis. Returning Optional.empty() skips storage for that run.


ReflectionStore is an interface with two methods:

public interface ReflectionStore {
    List<TaskReflection> load(TaskIdentity identity);
    void store(TaskIdentity identity, TaskReflection reflection);
}

InMemoryReflectionStore is included for development and testing. It holds reflections in a ConcurrentHashMap and loses state when the process stops.
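A minimal version of that shape looks like the following, with local stand-ins for the framework types (record fields and the map layout are assumptions for illustration, not InMemoryReflectionStore's actual source):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-ins for the framework types; names and fields are assumptions.
record TaskIdentity(String key) {}
record TaskReflection(String content, Instant createdAt) {}

interface ReflectionStore {
    List<TaskReflection> load(TaskIdentity identity);
    void store(TaskIdentity identity, TaskReflection reflection);
}

// Sketch in the spirit of InMemoryReflectionStore: a ConcurrentHashMap
// keyed by identity; all state is lost when the process stops.
class InMemoryStoreSketch implements ReflectionStore {
    private final Map<TaskIdentity, List<TaskReflection>> reflections = new ConcurrentHashMap<>();

    @Override
    public List<TaskReflection> load(TaskIdentity identity) {
        return List.copyOf(reflections.getOrDefault(identity, List.of()));
    }

    @Override
    public void store(TaskIdentity identity, TaskReflection reflection) {
        reflections.computeIfAbsent(identity, k -> Collections.synchronizedList(new ArrayList<>()))
                   .add(reflection);
    }
}
```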

For production, implement ReflectionStore against whatever persistence layer makes sense for your system — a relational database, a document store, or a key-value store:

public class JdbcReflectionStore implements ReflectionStore {
    private final DataSource dataSource;

    @Override
    public List<TaskReflection> load(TaskIdentity identity) {
        // SELECT content FROM task_reflections WHERE task_id = ?
        // ORDER BY created_at DESC LIMIT maxReflections
    }

    @Override
    public void store(TaskIdentity identity, TaskReflection reflection) {
        // INSERT INTO task_reflections (task_id, content, created_at) VALUES (?, ?, ?)
    }
}

The TaskReflection record holds the improvement text and a timestamp. TaskIdentity includes the task description hash used for keying.


Reflection is opt-in per task. Tasks without .reflect(true) are unaffected even if a ReflectionStore is configured on the ensemble. You can enable reflection for high-value tasks and leave it off for tasks where the instructions are stable or where the cost of an extra LLM call isn’t justified.

Ensemble.builder()
    .tasks(List.of(
        stableDataFetchTask,   // no reflection
        evolvingAnalysisTask,  // .reflect(true)
        stableFormattingTask   // no reflection
    ))
    .reflectionStore(store)
    .chatModel(model)
    .build()
    .run();

The store is queried and written only for tasks with reflection enabled.


Reflection adds an LLM call per reflective task per run. For tasks that run thousands of times, this adds up. The cost is bounded if reflections converge — if the task’s instructions become stable after a few runs, reflections may produce no new improvements and Optional.empty() returns more often.

Reflections can drift. If a task’s purpose changes — the description is updated, the downstream context changes, the data it processes shifts — earlier reflections may no longer apply. maxReflections helps here by aging out old improvements. For significant task changes, clearing the stored reflections for that task is reasonable.

Reflection is not a substitute for good initial instructions. A task with fundamentally unclear instructions will accumulate reflections that patch around the ambiguity. The better use is to start with reasonable instructions and use reflection to sharpen them in response to real outputs over time.

The original task definition is never modified. All improvements live in the store. This is a deliberate choice: the source of truth for what a task does remains in code, not in a mutable prompt that silently drifts over time.


Guide: Task Reflection | Design: Task Reflection | GitHub

AgentEnsemble is open-source under the MIT license.