
A Control Plane for Long-Running Agent Services

An earlier post in this series covered running agent ensembles as long-running services — always-on processes that accept work over WebSocket, HTTP, queues, or topics instead of running once and exiting. Once an ensemble is a service, a new category of problem appears: how do external systems interact with it?

The existing WebSocket dashboard streams execution events and handles review decisions. That covers observability and human review. What it doesn’t cover is run submission. There’s no way for a CI pipeline, orchestrator, or custom UI to kick off a run, pass runtime parameters, query what’s currently executing, or cancel something that’s gone wrong — without a WebSocket connection and custom client code.

The Ensemble Control API fills that gap.

Before getting into the API itself, one design distinction is worth stating explicitly.

The v3 network module handles ensemble-to-ensemble communication: tasks delegating work to remote peers, capability registries, federation across namespaces. That’s the data plane — ensemble-internal traffic, designed for ensemble peers.

The Control API is the control plane: CI pipelines, orchestrators, and custom UIs talking to an ensemble service. Different audience, different semantics. External systems shouldn’t need a WebSocket client, shouldn’t need to understand the ensemble networking protocol, and shouldn’t be treated as ensemble peers. The REST-first design reflects that distinction.

Four endpoints on the same Javalin server as the WebSocket dashboard — no new port, no new process:

POST /api/runs Submit a run with input variables
GET /api/runs List recent runs (filterable by status, tag)
GET /api/runs/{runId} Get full run detail (status, task outputs, metrics)
GET /api/capabilities List registered tools, models, and preconfigured tasks
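Since these are plain REST endpoints, submission needs nothing beyond standard HTTP. A minimal client-side sketch using only java.net.http (the RunSubmission class name is illustrative; the port and path follow the examples in this post):

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds the POST /api/runs request described above using only the JDK's
// built-in HTTP client -- no SDK required. Sending it is a single call to
// HttpClient.send at the call site.
public final class RunSubmission {
    public static HttpRequest buildSubmit(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/api/runs"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildSubmit("http://localhost:7329",
            "{\"inputs\":{\"topic\":\"AI safety\",\"year\":\"2025\"}}");
        System.out.println(req.method() + " " + req.uri()); // POST http://localhost:7329/api/runs
    }
}
```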

The API is activated by adding catalogs to WebDashboard.builder():

ToolCatalog tools = ToolCatalog.builder()
    .tool("web_search", webSearchTool)
    .tool("calculator", calculatorTool)
    .build();

ModelCatalog models = ModelCatalog.builder()
    .model("sonnet", claudeSonnetModel)
    .model("haiku", claudeHaikuModel)
    .build();

WebDashboard dashboard = WebDashboard.builder()
    .port(7329)
    .toolCatalog(tools)
    .modelCatalog(models)
    .maxConcurrentRuns(5)
    .maxRetainedCompletedRuns(100)
    .build();

The ensemble wires in the dashboard:

Ensemble.builder()
    .chatLanguageModel(claudeSonnetModel)
    .webDashboard(dashboard)
    .task(Task.builder()
        .description("Research {topic} focusing on recent developments in {year}")
        .tools(webSearchTool)
        .build())
    .task(Task.builder()
        .description("Write a concise executive summary of the research")
        .build())
    .build()
    .start(7329);

ToolCatalog and ModelCatalog serve two purposes. They make the API transport-agnostic (JSON refers to tools and models by name, not class). And they act as allowlists — only registered tools and models can be used. Dynamic task creation in later phases cannot instantiate arbitrary code.
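The allowlist behavior reduces to a name lookup that fails closed. A sketch of that idea (the CatalogLookup class is illustrative, with the catalog reduced to a Map; the real ToolCatalog is richer):

```java
import java.util.Map;

// Sketch of catalog-enforced resolution: JSON refers to tools and models by
// name, and resolution only succeeds for pre-registered entries.
public final class CatalogLookup {
    public static <T> T resolve(Map<String, T> catalog, String name) {
        T entry = catalog.get(name);
        if (entry == null) {
            // Unknown names are rejected instead of instantiating arbitrary code.
            throw new IllegalArgumentException("not registered: " + name);
        }
        return entry;
    }

    public static void main(String[] args) {
        Map<String, String> tools = Map.of("web_search", "WebSearchTool");
        System.out.println(resolve(tools, "web_search")); // resolves by name
        try {
            resolve(tools, "shell_exec"); // never registered -> rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```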

POST /api/runs submits the pre-configured ensemble tasks with variable substitution:

{
  "inputs": {
    "topic": "AI safety",
    "year": "2025"
  },
  "tags": {
    "triggeredBy": "ci-pipeline",
    "environment": "staging"
  }
}

Response (202 Accepted):

{
  "runId": "run-7f3a2b",
  "status": "ACCEPTED",
  "tasks": 2,
  "workflow": "SEQUENTIAL"
}

The run executes asynchronously — the response is immediate. Poll GET /api/runs/{runId} for completion. Tags are arbitrary metadata for filtering and auditing. An empty body submits the template ensemble with no substitution. If maxConcurrentRuns is reached, the response is 429 with a retryAfterMs hint.
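The wait-for-completion loop a CI pipeline would run can be sketched with the status source abstracted behind a Supplier, so the same loop works over HTTP or in a test (the RunPoller class is illustrative; the status values follow the examples in this post):

```java
import java.util.function.Supplier;

// Polls a run's status until it leaves the non-terminal states. In real use
// the Supplier would issue GET /api/runs/{runId} and extract "status".
public final class RunPoller {
    public static String awaitCompletion(Supplier<String> statusSource,
                                         long pollIntervalMs,
                                         int maxPolls) throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            String status = statusSource.get();
            // ACCEPTED and RUNNING are non-terminal; anything else is final.
            if (!status.equals("ACCEPTED") && !status.equals("RUNNING")) {
                return status;
            }
            Thread.sleep(pollIntervalMs);
        }
        throw new IllegalStateException("run did not finish within " + maxPolls + " polls");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated status sequence standing in for GET /api/runs/{runId}.
        java.util.Iterator<String> states =
            java.util.List.of("ACCEPTED", "RUNNING", "COMPLETED").iterator();
        System.out.println(awaitCompletion(states::next, 1, 10)); // COMPLETED
    }
}
```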

GET /api/capabilities exposes what’s registered:

{
  "tools": [
    { "name": "web_search", "description": "Search the web using Google" },
    { "name": "calculator", "description": "Evaluate mathematical expressions" }
  ],
  "models": [
    { "alias": "sonnet", "provider": "anthropic" },
    { "alias": "haiku", "provider": "anthropic" }
  ],
  "preconfiguredTasks": [
    { "description": "Research {topic} focusing on recent developments in {year}" },
    { "description": "Write a concise executive summary of the research" }
  ]
}

GET /api/runs/{runId} returns full run detail including task outputs and metrics. GET /api/runs lists recent runs filterable by ?status=RUNNING, ?status=COMPLETED, or ?tag=triggeredBy:ci-pipeline.

Phase 2: The Three-Level Run Submission Model

The most interesting design decision in the Control API is the graduated run submission model. There are three levels, each more dynamic than the last.

Level 1 (covered above): substitute template variables into the pre-configured ensemble. The simplest and most constrained option — the Java code defines what runs.

Level 2: override specific fields of individual tasks at runtime.

Level 3: define a new task list entirely in the POST body, without changing any Java code.

This graduated approach keeps the simple case simple while making the more dynamic cases possible without abandoning the safety properties of the catalog model.

To use Levels 2 and 3 effectively, tasks can be given logical names:

Task.builder()
    .name("researcher")
    .description("Research {topic} focusing on recent developments in {year}")
    .tools(webSearchTool)
    .build()

GET /api/capabilities returns task names alongside descriptions. Level 2 override keys match by exact name first, then by description prefix (first 50 characters, case-insensitive) as a fallback.
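One plausible reading of that matching rule, sketched as a standalone method (the OverrideMatcher class is illustrative, and the exact prefix semantics are an assumption inferred from the description above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Matches a Level 2 override key against the template tasks: exact task name
// first, then a case-insensitive match against the first 50 characters of
// each task's description as a fallback.
public final class OverrideMatcher {
    public static String match(Map<String, String> nameToDescription, String key) {
        if (nameToDescription.containsKey(key)) {
            return key; // exact name match wins
        }
        String probe = key.toLowerCase();
        for (Map.Entry<String, String> e : nameToDescription.entrySet()) {
            String prefix = e.getValue()
                .substring(0, Math.min(50, e.getValue().length()))
                .toLowerCase();
            if (prefix.startsWith(probe)) {
                return e.getKey(); // fallback: description-prefix match
            }
        }
        return null; // no match -> the request is rejected with 400
    }

    public static void main(String[] args) {
        Map<String, String> tasks = new LinkedHashMap<>();
        tasks.put("researcher", "Research {topic} focusing on recent developments in {year}");
        tasks.put("writer", "Write a concise executive summary of the research");
        System.out.println(match(tasks, "researcher"));      // researcher (exact name)
        System.out.println(match(tasks, "write a concise")); // writer (prefix fallback)
    }
}
```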

taskOverrides lets a caller change a specific task’s description, model, tools, or context without recompilation:

{
  "inputs": { "topic": "AI safety" },
  "taskOverrides": {
    "researcher": {
      "description": "Research {topic} focusing on EU AI Act compliance",
      "expectedOutput": "A regulatory analysis report with citations",
      "model": "sonnet",
      "maxIterations": 15,
      "additionalContext": "The EU AI Act was formally adopted in March 2024.",
      "tools": {
        "add": ["web_search"],
        "remove": ["calculator"]
      }
    }
  }
}

The override key ("researcher") is matched against the template ensemble’s task names. If no matching task exists, the request is rejected with 400. The original task objects are never mutated — Task.toBuilder() creates modified copies.

All tool references are resolved against the ToolCatalog and all model references against the ModelCatalog. A caller cannot inject a tool or model that was not pre-registered.

When tasks is provided in the request body, the template ensemble’s task list is replaced entirely. The template’s model, catalogs, and configuration are preserved — only the task list changes:

{
  "tasks": [
    {
      "name": "researcher",
      "description": "Research the competitive landscape for {product}",
      "expectedOutput": "A competitive analysis identifying 5 key competitors",
      "tools": ["web_search"],
      "model": "sonnet",
      "maxIterations": 20
    },
    {
      "name": "writer",
      "description": "Write an executive brief based on the research",
      "expectedOutput": "A 1-page executive summary suitable for C-suite",
      "context": ["$researcher"],
      "model": "sonnet"
    }
  ],
  "inputs": { "product": "AgentEnsemble" }
}

The context field declares dependencies between tasks. $researcher references the task named "researcher"; $0 references the task at index 0. The scheduler infers the workflow type from these dependencies — if context references exist and no workflow is explicitly set, PARALLEL (DAG-based) is used. Circular dependencies and unknown references are rejected at submission time.
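The submission-time check for unknown references and cycles amounts to a standard depth-first traversal. A sketch under the assumption that context references have already been resolved to task names (the ContextValidator class is illustrative, not the actual server code):

```java
import java.util.*;

// Validates a task dependency graph: every reference must name a known task,
// and the graph must be acyclic. deps maps task name -> names it depends on.
public final class ContextValidator {
    public static void validate(Map<String, List<String>> deps) {
        // Reject unknown references first.
        for (List<String> ds : deps.values()) {
            for (String d : ds) {
                if (!deps.containsKey(d)) {
                    throw new IllegalArgumentException("unknown reference: $" + d);
                }
            }
        }
        // DFS with an "in progress" set to detect cycles.
        Set<String> done = new HashSet<>(), inProgress = new HashSet<>();
        for (String task : deps.keySet()) dfs(task, deps, done, inProgress);
    }

    private static void dfs(String task, Map<String, List<String>> deps,
                            Set<String> done, Set<String> inProgress) {
        if (done.contains(task)) return;
        if (!inProgress.add(task)) {
            throw new IllegalArgumentException("circular dependency at " + task);
        }
        for (String d : deps.get(task)) dfs(d, deps, done, inProgress);
        inProgress.remove(task);
        done.add(task);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("researcher", List.of());
        deps.put("writer", List.of("researcher")); // "context": ["$researcher"]
        validate(deps); // acyclic, all references known
        System.out.println("valid");
    }
}
```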

REST isn’t the only submission channel. WebSocket clients can submit runs using the run_request message — useful for browser-based UIs that already have a dashboard connection:

{
  "type": "run_request",
  "requestId": "req-1",
  "inputs": { "topic": "AI safety" },
  "tags": { "env": "staging" }
}

The server acknowledges immediately with run_ack. On completion it sends run_result to the originating session only — the existing ensemble_completed broadcast continues to go to all connected clients unchanged.

Phase 3 adds two operations that apply to in-flight runs.

POST /api/runs/{runId}/cancel cancels a running or accepted run. This is cooperative cancellation — the current in-flight task completes normally; cancellation takes effect before the next task starts.

{ "runId": "run-abc", "status": "CANCELLING" }

The same operation is available over WebSocket: { "type": "run_control", "runId": "run-abc", "action": "cancel" }.

The cooperative model is intentional. A task mid-execution is mid-LLM-call. Interrupting that immediately would leave the ensemble in an undefined state. Completing the current task and stopping cleanly at the boundary gives deterministic behavior without losing progress already made.
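The boundary-check structure can be sketched in a few lines, with tasks reduced to Runnables (the CooperativeRunner class is illustrative; the real task type is richer):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Cooperative cancellation: the flag is checked only between tasks, so an
// in-flight task always completes before cancellation takes effect.
public final class CooperativeRunner {
    public static int runUntilCancelled(List<Runnable> tasks, AtomicBoolean cancelRequested) {
        int completed = 0;
        for (Runnable task : tasks) {
            if (cancelRequested.get()) break; // honored before the next task starts
            task.run();                       // never interrupted mid-task
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicBoolean cancel = new AtomicBoolean(false);
        List<Runnable> tasks = List.of(
            () -> cancel.set(true), // cancellation arrives while task 1 runs
            () -> { throw new AssertionError("second task should not start"); });
        System.out.println(runUntilCancelled(tasks, cancel)); // 1
    }
}
```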

POST /api/runs/{runId}/model switches which LLM subsequent tasks will use:

{ "model": "haiku" }

The switch takes effect on the next LLM call; the in-flight call completes with the previous model. The model alias must be registered in the ModelCatalog. This is useful when a long-running ensemble is partway through and you want subsequent tasks to use a cheaper or faster model.

The existing WebSocket dashboard broadcasts all execution events to all connected sessions. Phase 4 adds filtering and an HTTP-native alternative.

WebSocket clients can subscribe to a specific subset of events:

{ "type": "subscribe", "events": ["task_started", "task_completed", "run_result"] }

Or filter to a specific run:

{ "type": "subscribe", "events": ["run_result"], "runId": "run-abc" }

Reset to all events with "events": ["*"]. The server responds with a subscribe_ack confirming the effective subscription.
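The delivery decision implied by those messages is a two-part predicate. A sketch of one reading of the semantics (the SubscriptionFilter class is illustrative; the actual server behavior may differ in edge cases):

```java
import java.util.Set;

// An event is delivered if the session subscribed to its type (or to "*")
// and, when a runId filter is set, the event belongs to that run.
public final class SubscriptionFilter {
    public static boolean matches(Set<String> events, String runIdFilter,
                                  String eventType, String eventRunId) {
        boolean typeOk = events.contains("*") || events.contains(eventType);
        boolean runOk = runIdFilter == null || runIdFilter.equals(eventRunId);
        return typeOk && runOk;
    }

    public static void main(String[] args) {
        Set<String> sub = Set.of("run_result");
        // { "type": "subscribe", "events": ["run_result"], "runId": "run-abc" }
        System.out.println(matches(sub, "run-abc", "run_result", "run-abc"));   // true
        System.out.println(matches(sub, "run-abc", "task_started", "run-abc")); // false
        System.out.println(matches(sub, "run-abc", "run_result", "run-xyz"));   // false
    }
}
```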

For HTTP-only clients — curl scripts, serverless functions, server-side integrations — a WebSocket connection is awkward. The SSE endpoint offers the same event stream over a regular HTTP connection:

GET /api/runs/{runId}/events
Accept: text/event-stream

For completed runs, stored events replay immediately and the connection closes. For in-progress runs, events stream until the run completes. A from parameter supports reconnection by resuming from a specific position in the stored output.

Phase 5 rounds out the API with three operations that were previously only available through the WebSocket dashboard or by interacting with a running Java process directly.

The human-in-the-loop system generates review gates where a reviewer approves, edits, or rejects task output before the ensemble proceeds. Phase 5 exposes this over REST, so server-side systems (Slack bots, CI pipelines) can automate or route review decisions:

POST /api/reviews/{reviewId}
{ "decision": "CONTINUE" }

For edits:

{ "decision": "EDIT", "revisedOutput": "Updated output..." }

Discover pending reviews:

GET /api/reviews
GET /api/reviews?runId=run-abc

Inject a directive into a running ensemble’s DirectiveStore. The directive is picked up on the next LLM iteration of any agent in the ensemble:

POST /api/runs/{runId}/inject
{ "content": "Focus on EU AI Act compliance", "target": "researcher" }

This is the REST equivalent of what the dashboard allows through the live run view — useful for server-side automation that needs to steer a run mid-execution.

Execute a registered tool from the ToolCatalog without running a full ensemble:

POST /api/tools/calculator/invoke
{ "input": "What is 42 * 17?" }

Response:

{ "tool": "calculator", "status": "SUCCESS", "output": "714", "durationMs": 2 }

This is useful for integration testing, for validating tool configuration, and for pipeline steps that need a single tool call without the overhead of an ensemble run.

The interesting question in a feature like this is where the boundary sits between the control plane and the data plane.

The v3 network module already has capability queries (CapabilityQueryMessage), task delegation (NetworkTask/NetworkTool), and directives (DirectiveMessage). The Control API exposes similar operations — but over HTTP, for a different audience, with different security and access semantics.

The key distinction is the audience. External systems are operators, not ensemble peers: they shouldn't need a WebSocket client, and they shouldn't need to understand the ensemble networking protocol. The REST-first design, catalog-enforced allowlists, and graduated Level 1/2/3 submission model reflect that distinction throughout.


The Ensemble Control API is documented in the control API guide. The underlying design doc is design/28. Source is on GitHub.

I’d be interested in where the three-level submission model feels right or falls short. The boundary between Level 2 (override existing tasks) and Level 3 (define new tasks) is where the most design tension sits — curious whether that separation is useful or whether most real use cases collapse to one or the other.