
Token-Efficient Context Passing: Pluggable Serialization for Multi-Agent Pipelines

In a multi-agent pipeline, structured data flows between tasks at every step. Task outputs become context for downstream tasks. Tool results are appended to the conversation. Memory entries are injected into prompts. All of this data is serialized as text and counted against the model’s context window.

The default serialization format is JSON. JSON is familiar, well-supported, and universally understood by LLMs. It is also verbose. Curly braces, quoted keys, commas, colons, and brackets consume tokens that carry no semantic value for the model. In a short pipeline with small payloads, this overhead is negligible. In a long pipeline with rich context — multiple tool calls per task, structured outputs flowing forward, memory entries accumulating — it compounds quickly.

The question I kept coming back to was: where does the serialization format actually matter in this pipeline, and can the framework make it pluggable without leaking complexity?


In a typical multi-agent workflow, structured data appears in four places:

| Location | What gets serialized | Token impact |
| --- | --- | --- |
| Task context | Outputs from prior tasks injected into downstream prompts | Medium — depends on output size |
| Tool results | JSON payloads returned by tool executions during ReAct loops | High — tool results are often large and accumulate across iterations |
| Memory entries | Structured content from memory scopes | Medium — grows with pipeline length |
| Trace export | Execution traces serialized for analysis | None — not sent to the LLM |

Tool results tend to dominate. A tool that queries a database or calls an API returns a JSON payload. In a ReAct loop with multiple iterations, these payloads accumulate in the conversation history. Each iteration adds more serialized data to the context window.

The framework already controls every one of these serialization points. It builds prompts, formats tool results, injects memory, and exports traces. That means a single configuration point can control the format everywhere, without requiring changes to task definitions, tool implementations, or agent logic.


The goal was a pluggable serialization layer with three properties:

  1. Opt-in: JSON remains the default. No existing code changes behavior.
  2. Fail-fast: If a format is selected but its runtime dependency is missing, the ensemble fails at build time with a clear error, not at runtime mid-pipeline.
  3. Single configuration point: One builder call controls all serialization points. No per-task or per-tool format settings.

The API surface is small:

public enum ContextFormat {
    JSON,
    TOON
}

public interface ContextFormatter {
    String format(Object value);
    String formatJson(String json);
}

ContextFormat is an enum selecting the serialization strategy. ContextFormatter is an interface with two methods: format(Object) for Java objects and formatJson(String) for re-encoding existing JSON strings. The distinction matters because tool results arrive as JSON strings that need to be converted, while task outputs and memory entries are Java objects that need to be serialized from scratch.

A factory class resolves the correct implementation:

ContextFormatter formatter = ContextFormatters.forFormat(ContextFormat.TOON);
String encoded = formatter.format(myObject);
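To make the two methods concrete, here is a toy stand-in for the JSON-backed formatter. Everything except the ContextFormatter signatures is illustrative; a real implementation would delegate to a proper JSON library such as Jackson rather than hand-rolling serialization:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Interface repeated here so the sketch compiles standalone.
interface ContextFormatter {
    String format(Object value);
    String formatJson(String json);
}

// Toy JSON formatter: illustrative only, no escaping or edge-case handling.
final class ToyJsonFormatter implements ContextFormatter {
    @Override
    public String format(Object value) { // serialize a Java object from scratch
        if (value instanceof Map<?, ?> m) {
            return m.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + format(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List<?> l) {
            return l.stream().map(this::format)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        if (value instanceof String s) {
            return "\"" + s + "\""; // toy: no escaping
        }
        return String.valueOf(value); // numbers, booleans, null
    }

    @Override
    public String formatJson(String json) {
        return json; // input is already JSON, so no re-encoding is needed
    }
}
```

The asymmetry is visible here: format(Object) does real work, while the JSON implementation of formatJson(String) is a pass-through. A TOON implementation would invert that, re-encoding the incoming JSON string.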

TOON (Token-Oriented Object Notation) is a compact, human-readable serialization format designed specifically for LLM contexts. It combines YAML-like indentation with CSV-like tabular arrays and achieves 30-60% token reduction versus JSON.

JSON:

{"items":[{"sku":"A1","qty":2,"price":9.99},{"sku":"B2","qty":1,"price":14.5}]}

TOON:

items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5

The token savings come from eliminating repeated key names in arrays (declared once in the header), removing quotes around keys, and using indentation instead of braces. For tabular data — which tool results and structured outputs frequently contain — the reduction is substantial.
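As a sketch of where those savings come from, here is a toy encoder for just the tabular-array case. The class and method names are illustrative, and JToon implements the full format; this only shows the header-once, rows-as-CSV idea:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy encoder for TOON's tabular-array form: field names are declared once
// in the header, then each row is emitted as a comma-separated line.
final class ToyToonEncoder {
    static String encodeArray(String name, List<Map<String, Object>> rows) {
        // Field names come from the first row; all rows are assumed uniform.
        List<String> fields = List.copyOf(rows.get(0).keySet());
        StringBuilder sb = new StringBuilder();
        sb.append(name)
          .append('[').append(rows.size()).append(']')
          .append('{').append(String.join(",", fields)).append("}:\n");
        for (Map<String, Object> row : rows) {
            sb.append("  ")
              .append(fields.stream()
                            .map(f -> String.valueOf(row.get(f)))
                            .collect(Collectors.joining(",")))
              .append('\n');
        }
        return sb.toString();
    }
}
```

For the two-item example above, the encoder writes the keys sku, qty, and price exactly once, whereas JSON repeats them in every array element.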

JToon is the Java implementation. It is MIT-licensed, available on Maven Central, requires Java 17+, and supports Jackson annotations.


Enabling TOON is one builder call:

EnsembleOutput result = Ensemble.builder()
    .chatLanguageModel(model)
    .contextFormat(ContextFormat.TOON)
    .task(researchTask)
    .task(analysisTask)
    .task(reportTask)
    .build()
    .run();

At build time, if TOON is selected, the framework verifies that the JToon class is loadable. If not, it throws an IllegalStateException with Maven and Gradle coordinates:

TOON context format requires the JToon library on the classpath.
Add to your build:
Gradle: implementation("dev.toonformat:jtoon")
Maven: <dependency><groupId>dev.toonformat</groupId><artifactId>jtoon</artifactId></dependency>

This is a deliberate design choice. JToon is declared as compileOnly in the framework, so applications that never use TOON pay no dependency cost. Applications that do use it add one dependency and one builder call.
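The check itself can be sketched with Class.forName. The probed class name below is an assumption for illustration, not necessarily JToon's actual entry point:

```java
// Illustrative fail-fast probe run at ensemble build time. The probed class
// name is an assumption; the framework would target a real JToon class.
final class ToonClasspathCheck {
    static void requireJToon() {
        try {
            Class.forName("dev.toonformat.jtoon.JToon"); // hypothetical class name
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                    "TOON context format requires the JToon library on the classpath.\n"
                    + "Gradle: implementation(\"dev.toonformat:jtoon\")\n"
                    + "Maven: <dependency><groupId>dev.toonformat</groupId>"
                    + "<artifactId>jtoon</artifactId></dependency>", e);
        }
    }
}
```

Because the probe runs inside build(), a missing dependency surfaces before any task executes, rather than partway through a pipeline run.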

The resolved ContextFormatter is stored in ExecutionContext and passed to every component that serializes data for the LLM: the prompt builder, the tool result formatter, and the memory injector. No component needs to know which format is active — it just calls formatter.format(value).


When the prompt builder constructs the user message, context from prior tasks and memory entries flows through the configured formatter. If the format is TOON, the data arrives in the prompt as TOON. The LLM reads it as context — it does not need to produce TOON output.

One important boundary: structured output schemas remain in JSON regardless of the context format. If a task has an outputType, the JSON schema in the prompt stays in JSON because the LLM needs to produce parseable JSON that the framework can deserialize. The context around the schema uses whatever format is configured.

Tool execution results are the highest-impact integration point. When a tool returns a JSON string, the framework can re-encode it via contextFormatter.formatJson(toolResultText) before appending it to the conversation. In a ReAct loop with multiple tool calls, each iteration’s results are formatted, and the savings compound across the conversation.
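A minimal sketch of that compounding, with a whitespace-stripping operator standing in for a real formatJson re-encoding into TOON (all names here are illustrative, not the framework's actual types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: in a ReAct loop, every iteration's tool result passes through the
// formatter before joining the history, so per-result savings compound.
final class ReactHistorySketch {
    static int historyLength(List<String> toolResults,
                             UnaryOperator<String> formatJson) {
        List<String> history = new ArrayList<>();
        for (String result : toolResults) {
            history.add(formatJson.apply(result)); // re-encoded once per iteration
        }
        return history.stream().mapToInt(String::length).sum();
    }
}
```

Each tool call pays the serialization overhead once per iteration, so a more compact encoding shrinks the whole accumulated history, not just the latest result.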

ExecutionTrace gains TOON export methods alongside the existing JSON ones:

EnsembleOutput result = ensemble.run();
// JSON (always available)
result.getTrace().toJson(Path.of("trace.json"));
// TOON (requires JToon on classpath)
result.getTrace().toToon(Path.of("trace.toon"));

Trace export is not sent to the LLM, so the token savings do not apply here. The benefit is smaller trace files for storage and analysis.


Compatibility vs savings: JSON is universally understood by every LLM. TOON is newer and less widely tested across models. For models that handle structured text well (GPT-4o, Claude, Gemini), TOON works reliably as context. For smaller or less capable models, JSON may be safer.

Debuggability vs compactness: When you are inspecting prompts during development, JSON is the format you already know how to read. TOON is human-readable but takes a moment to parse by eye. You might use JSON during development and switch to TOON for production workloads where cost matters.

Cost vs complexity: TOON reduces token usage, which directly reduces API cost. In a production pipeline processing thousands of runs, 30-60% fewer tokens in context passing translates to measurable cost savings. The complexity cost is one dependency and one builder call.

Schema boundary: The structured output schema stays in JSON. This means a prompt with TOON context and a JSON schema contains mixed formats. In practice, this works because the LLM treats the context section and the schema section as separate concerns. But it is worth being aware of if you are debugging prompt construction.

The format is opt-in and the default is unchanged. If you never set contextFormat, nothing changes. The pluggable design means future formats can be added — a custom ContextFormatter implementation for a domain-specific format, or a future format optimized for a specific model family — without changing the public API.


The TOON format guide covers setup, configuration, and usage patterns. The design document covers the architectural decisions.

AgentEnsemble is open-source under the MIT license.