From Run-and-Exit to Always-On: When Agent Ensembles Become Services

Every multi-agent framework works the same way at its core. You define some agents, give them tasks, press go, get output. The agents exist for the duration of the run and then disappear.

This is fine for bounded problems: “research this topic and write a report.” But it does not model how real work gets done in production systems that need to be always-on, multi-domain, and human-augmented.

The question I kept coming back to was: what changes when an ensemble stops being a script and starts being a service?


A script runs and exits. You invoke it, it does work, it returns a result, the process terminates. Every multi-agent framework today — CrewAI, AutoGen, LangGraph, AgentEnsemble v2.x — operates in this mode.

A service runs continuously. It handles work as it arrives, communicates with peers, maintains state between requests, and survives restarts. The difference is not just about uptime — it changes the entire interaction model.

When an ensemble is a script, it is invoked by something external. When an ensemble is a service, it participates in a network of other services. It can accept work from multiple sources, share capabilities with peers, and run proactive tasks on a schedule — all without an external orchestrator telling it what to do.


Consider a hotel. It is composed of departments: front desk, housekeeping, kitchen, room service, maintenance, procurement. Each department is autonomous — it has its own staff, processes, and expertise. These departments communicate with each other directly. Room service calls the kitchen to prepare a meal. Maintenance calls procurement to order spare parts.

The hotel runs continuously. The manager comes in at 8am, walks around, checks on things, gives some direction, handles decisions that require authority, and goes home at 6pm. The hotel does not stop when the manager leaves.

This maps directly to a distributed agent architecture:

| Hotel concept | Agent system equivalent |
| --- | --- |
| A department | An ensemble — long-running, autonomous |
| Staff within a department | Agents and tasks within the ensemble |
| The intercom / phone system | WebSocket mesh — the message transport |
| A work order | A WorkRequest — the standard message envelope |
| The hotel directory | Service registry — ensembles discover each other |
| The duty manager | A human who connects via the dashboard to observe and intervene |

The key observation: the hotel is not centrally orchestrated. There is no “manager agent” that routes every message. Departments handle their domain and communicate laterally.


The existing one-shot mode remains unchanged:

```java
EnsembleOutput output = Ensemble.run(model,
        Task.of("Research AI trends"),
        Task.of("Write a report"));
```

Tasks execute, output is returned, the ensemble is done. This is a “gig” — a bounded unit of work.

The new long-running mode turns the ensemble into a service:

```java
Ensemble kitchen = Ensemble.builder()
        .name("kitchen")
        .chatLanguageModel(model)
        .task(Task.of("Manage kitchen operations"))
        // Share capabilities to the network
        .shareTask("prepare-meal", Task.builder()
                .description("Prepare a meal as specified")
                .expectedOutput("Confirmation with preparation details and timing")
                .build())
        .shareTool("check-inventory", inventoryTool)
        // Scheduled proactive task
        .scheduledTask(ScheduledTask.builder()
                .name("inventory-report")
                .task(Task.of("Check current inventory levels and report shortages"))
                .schedule(Schedule.every(Duration.ofHours(1)))
                .broadcastTo("hotel.inventory")
                .build())
        .build();
kitchen.start(7329); // WebSocket server, K8s Service fronts this
```

In long-running mode, the ensemble:

  • Registers shared tasks and tools on the network
  • Accepts incoming work requests via WebSocket, queue, HTTP, or topic subscription
  • Processes work through a priority queue
  • Delivers results via the caller-specified delivery method
  • Runs scheduled proactive tasks on configured intervals
  • Continues until explicitly stopped or drained

The start(port) call is the boundary between script and service. Before it, the ensemble is a configuration. After it, the ensemble is an active participant in a network.


When an ensemble becomes a service, work can arrive from multiple sources simultaneously:

| Source | Description |
| --- | --- |
| WebSocket | Direct from another ensemble (real-time) |
| Queue | Pull from a durable queue (Kafka, SQS, Redis Streams) |
| HTTP API | POST /api/work (external systems, scripts, CI pipelines) |
| Topic subscription | React to events from other ensembles |
| Schedule | Internal cron/interval (proactive tasks) |

All sources normalize into the same internal format before entering the ensemble’s priority queue. The ensemble processes work by priority (CRITICAL > HIGH > NORMAL > LOW), with FIFO ordering within the same priority level.
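The framework's internal queue is not shown in this post, but the priority-with-FIFO-tiebreak behavior can be sketched in a few lines of plain Java. The `WorkQueueSketch`, `WorkItem`, and `Priority` names below are illustrative, not AgentEnsemble API: a monotonic sequence number breaks ties so that items of equal priority leave the queue in arrival order.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: priority ordering with FIFO within each level.
public class WorkQueueSketch {
    enum Priority { CRITICAL, HIGH, NORMAL, LOW } // ordinal = rank, lower is sooner

    record WorkItem(Priority priority, long seq, String payload)
            implements Comparable<WorkItem> {
        @Override
        public int compareTo(WorkItem other) {
            int byPriority = Integer.compare(priority.ordinal(), other.priority.ordinal());
            // The sequence number breaks ties so equal priorities stay FIFO.
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private static final AtomicLong SEQ = new AtomicLong();
    private final PriorityBlockingQueue<WorkItem> queue = new PriorityBlockingQueue<>();

    public void submit(Priority p, String payload) {
        queue.put(new WorkItem(p, SEQ.getAndIncrement(), payload));
    }

    public String takeNext() {
        WorkItem item = queue.poll();
        return item == null ? null : item.payload();
    }

    public static void main(String[] args) {
        WorkQueueSketch q = new WorkQueueSketch();
        q.submit(Priority.NORMAL, "report-a");
        q.submit(Priority.CRITICAL, "outage");
        q.submit(Priority.NORMAL, "report-b");
        // CRITICAL drains first; the two NORMAL items keep submission order.
        System.out.println(q.takeNext()); // outage
        System.out.println(q.takeNext()); // report-a
        System.out.println(q.takeNext()); // report-b
    }
}
```

Because every source normalizes into the same envelope before enqueueing, this one ordering policy applies uniformly regardless of where the work came from.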

This means an ensemble can simultaneously handle direct requests from peer ensembles, pull batch work from a queue, respond to events, and run scheduled health checks — without any of these mechanisms knowing about each other.


Each ensemble deploys as a Kubernetes service — one or more pods behind a K8s Service resource. Ensembles discover each other via DNS name. This is standard infrastructure that operations teams already know how to manage.

```
Namespace: hotel-downtown
+-- Service: kitchen
+-- Service: room-service
+-- Service: maintenance
+-- Service: front-desk
+-- Service: dashboard
```

Scaling is handled by Kubernetes HPA watching queue depth or request latency. Conference weekend with heavy kitchen load? Scale kitchen to 3 replicas. Off-peak Tuesday? Scale back to 1. The ensemble handles replica coordination through broadcast-claim delivery: a work request is offered to all replicas, and the first to claim it processes it.
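The broadcast-claim mechanism reduces to a compare-and-set race: all replicas see the offer, one atomic claim succeeds, the rest back off. The sketch below simulates it with an in-process map; in a real multi-pod deployment the claim store would be shared state (a consumer group, a database row, or a key-value store), and `tryClaim` is a hypothetical name, not the framework's API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of broadcast-claim delivery: a work request is offered
// to every replica, and the first atomic claim wins.
public class BroadcastClaimSketch {
    // requestId -> replicaId of the claimant
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    /** Returns true if this replica won the claim for the request. */
    public boolean tryClaim(String requestId, String replicaId) {
        // putIfAbsent is atomic: exactly one caller sees null (no prior claim).
        return claims.putIfAbsent(requestId, replicaId) == null;
    }

    public static void main(String[] args) {
        BroadcastClaimSketch store = new BroadcastClaimSketch();
        // The same request is broadcast to three kitchen replicas; one wins.
        System.out.println(store.tryClaim("req-42", "kitchen-0")); // true
        System.out.println(store.tryClaim("req-42", "kitchen-1")); // false
        System.out.println(store.tryClaim("req-42", "kitchen-2")); // false
    }
}
```

Losing replicas simply drop the offer, so scaling a service up or down never duplicates work.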


The shift from script to service changes several things:

Lifecycle management matters. A script that crashes restarts from scratch. A service that crashes needs graceful shutdown, drain logic, and state recovery. The ensemble supports a drain mode where it stops accepting new work, finishes in-flight tasks, and shuts down cleanly. On restart, it picks up queued work from durable sources.
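The drain semantics described above map onto a familiar pattern: stop admission, await in-flight work, then release resources. This is a minimal sketch using a plain `ExecutorService` rather than the framework's actual drain API, which this post does not show.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of drain mode: reject new work, finish in-flight
// tasks, and report whether shutdown completed within the deadline.
public class DrainSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public void submit(Runnable task) {
        workers.submit(task);
    }

    /** Returns true if all in-flight work finished within the timeout. */
    public boolean drain(long timeoutSeconds) {
        workers.shutdown(); // stop accepting new work; queued tasks still run
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DrainSketch svc = new DrainSketch();
        svc.submit(() -> System.out.println("finishing in-flight task"));
        System.out.println("clean shutdown: " + svc.drain(10));
    }
}
```

Recovery after restart is the complementary half: because queued work lives in durable sources (Kafka, SQS, Redis Streams), a fresh pod resumes by re-polling rather than by replaying local state.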

Proactive work becomes possible. A script only does what you tell it to do. A service can schedule its own work — periodic inventory checks, health assessments, report generation. These scheduled tasks run on internal timers and broadcast results to interested subscribers.
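An internal timer of this kind can be sketched with a `ScheduledExecutorService`, analogous to the `scheduledTask(...)` builder call shown earlier. The method name and parameters below are illustrative, the interval is shortened from the post's one hour for the demo, and the task body stands in for "check inventory and broadcast to subscribers".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a proactive scheduled task on an internal timer.
public class ScheduledTaskSketch {
    /** Fires a task at a fixed rate and returns after at least `times` runs. */
    public static int runProactiveTask(int times, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(times);
        AtomicInteger fired = new AtomicInteger();
        timer.scheduleAtFixedRate(() -> {
            fired.incrementAndGet(); // stand-in for the inventory check + broadcast
            latch.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return fired.get();
    }

    public static void main(String[] args) {
        System.out.println("task fired " + runProactiveTask(2, 100) + " times");
    }
}
```

The point is that nothing external triggers these runs: the service owns its own clock, which is exactly what a run-and-exit script cannot do.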

Observability changes. A script that runs for 30 seconds needs a log. A service that runs for months needs a dashboard. The existing web module (WebSocket server, live trace streaming, late-join snapshot) extends naturally to the long-running model.

The human relationship changes. A script blocks on human input and times out. A service has humans who connect and disconnect. They observe the current state, give direction, handle decisions that need authority, and leave. The system keeps running. This is a deep enough topic that the next post in this series will cover it in detail.


Complexity vs capability. A script is simple: invoke it, get a result. A service requires infrastructure — Kubernetes, queues, monitoring, lifecycle management. If your workload is “run this pipeline once and give me the output,” the service model is unnecessary overhead.

Always-on cost. A script uses resources only while it runs. A service uses resources continuously, even when idle. For intermittent workloads, the cost calculus favors one-shot execution with on-demand scaling.

State management. Scripts are stateless by nature — they start fresh every time. Services accumulate state: queued work, scheduled tasks, shared memory, connection state. This state needs to be durable, recoverable, and observable.

When to use which. The one-shot mode is right for discrete, bounded problems. The long-running mode is right when the workload is continuous, when multiple domains need to communicate, when humans need to observe and participate without blocking, and when the system needs to be always-on.

Both modes coexist. An ensemble that runs as a long-running service can still execute individual tasks in one-shot mode internally. The architecture does not force a choice — it extends the existing model.


This is the first post in a three-part arc on the Ensemble Network architecture planned for v3.0.0. The next post covers cross-ensemble delegation — how ensembles share tasks and tools across service boundaries, and why the contract between them is natural language, not typed schemas.

The design document covers the full architecture.

AgentEnsemble is open-source under the MIT license.