From Run-and-Exit to Always-On: When Agent Ensembles Become Services

Every multi-agent framework works the same way at its core. You define some agents, give them tasks, press go, get output. The agents exist for the duration of the run and then disappear.

This is fine for bounded problems: “research this topic and write a report.” But it does not model how real work gets done in production systems that need to be always-on, multi-domain, and human-augmented.

The question I kept coming back to was: what changes when an ensemble stops being a script and starts being a service?


A script runs and exits. You invoke it, it does work, it returns a result, the process terminates. Every multi-agent framework today — CrewAI, AutoGen, LangGraph, AgentEnsemble v2.x — operates in this mode.

A service runs continuously. It handles work as it arrives, communicates with peers, maintains state between requests, and survives restarts. The difference is not just about uptime — it changes the entire interaction model.

When an ensemble is a script, it is invoked by something external. When an ensemble is a service, it participates in a network of other services. It can accept work from multiple sources, share capabilities with peers, and run proactive tasks on a schedule — all without an external orchestrator telling it what to do.


Consider a hotel. It is composed of departments: front desk, housekeeping, kitchen, room service, maintenance, procurement. Each department is autonomous — it has its own staff, processes, and expertise. These departments communicate with each other directly. Room service calls the kitchen to prepare a meal. Maintenance calls procurement to order spare parts.

The hotel runs continuously. The manager comes in at 8am, walks around, checks on things, gives some direction, handles decisions that require authority, and goes home at 6pm. The hotel does not stop when the manager leaves.

This maps directly to a distributed agent architecture:

| Hotel concept | Agent system equivalent |
| --- | --- |
| A department | An ensemble — long-running, autonomous |
| Staff within a department | Agents and tasks within the ensemble |
| The intercom / phone system | WebSocket mesh — the message transport |
| A work order | A WorkRequest — the standard message envelope |
| The hotel directory | Service registry — ensembles discover each other |
| The duty manager | A human who connects via the dashboard to observe and intervene |

The key observation: the hotel is not centrally orchestrated. There is no “manager agent” that routes every message. Departments handle their domain and communicate laterally.


The existing one-shot mode remains unchanged:

```java
EnsembleOutput output = Ensemble.run(model,
        Task.of("Research AI trends"),
        Task.of("Write a report"));
```

Tasks execute, output is returned, the ensemble is done. This is a “gig” — a bounded unit of work.

The new long-running mode turns the ensemble into a service:

```java
Ensemble kitchen = Ensemble.builder()
        .name("kitchen")
        .chatLanguageModel(model)
        .task(Task.of("Manage kitchen operations"))
        // Share capabilities to the network
        .shareTask("prepare-meal", Task.builder()
                .description("Prepare a meal as specified")
                .expectedOutput("Confirmation with preparation details and timing")
                .build())
        .shareTool("check-inventory", inventoryTool)
        // Scheduled proactive task
        .scheduledTask(ScheduledTask.builder()
                .name("inventory-report")
                .task(Task.of("Check current inventory levels and report shortages"))
                .schedule(Schedule.every(Duration.ofHours(1)))
                .broadcastTo("hotel.inventory")
                .build())
        .build();
kitchen.start(7329); // WebSocket server, K8s Service fronts this
```

In long-running mode, the ensemble:

  • Registers shared tasks and tools on the network
  • Accepts incoming work requests via WebSocket, queue, HTTP, or topic subscription
  • Processes work through a priority queue
  • Delivers results via the caller-specified delivery method
  • Runs scheduled proactive tasks on configured intervals
  • Continues until explicitly stopped or drained

The start(port) call is the boundary between script and service. Before it, the ensemble is a configuration. After it, the ensemble is an active participant in a network.


When an ensemble becomes a service, work can arrive from multiple sources simultaneously:

| Source | Description |
| --- | --- |
| WebSocket | Direct from another ensemble (real-time) |
| Queue | Pull from a durable queue (Kafka, SQS, Redis Streams) |
| HTTP API | POST /api/work (external systems, scripts, CI pipelines) |
| Topic subscription | React to events from other ensembles |
| Schedule | Internal cron/interval (proactive tasks) |

All sources normalize into the same internal format before entering the ensemble’s priority queue. The ensemble processes work by priority (CRITICAL > HIGH > NORMAL > LOW), with FIFO ordering within the same priority level.
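The framework's internal queue is not shown in this post, but the priority-with-FIFO-tiebreak behavior can be sketched in a few lines of plain Java. The `WorkQueueSketch`, `WorkItem`, and `Priority` names below are illustrative, not AgentEnsemble API: a monotonic sequence number breaks ties so that items of equal priority leave the queue in arrival order.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: priority ordering with FIFO within each level.
public class WorkQueueSketch {
    enum Priority { CRITICAL, HIGH, NORMAL, LOW } // ordinal = rank, lower is sooner

    record WorkItem(Priority priority, long seq, String payload)
            implements Comparable<WorkItem> {
        @Override
        public int compareTo(WorkItem other) {
            int byPriority = Integer.compare(priority.ordinal(), other.priority.ordinal());
            // The sequence number breaks ties so equal priorities stay FIFO.
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private static final AtomicLong SEQ = new AtomicLong();
    private final PriorityBlockingQueue<WorkItem> queue = new PriorityBlockingQueue<>();

    public void submit(Priority p, String payload) {
        queue.put(new WorkItem(p, SEQ.getAndIncrement(), payload));
    }

    public String takeNext() {
        WorkItem item = queue.poll();
        return item == null ? null : item.payload();
    }

    public static void main(String[] args) {
        WorkQueueSketch q = new WorkQueueSketch();
        q.submit(Priority.NORMAL, "report-a");
        q.submit(Priority.CRITICAL, "outage");
        q.submit(Priority.NORMAL, "report-b");
        // CRITICAL drains first; the two NORMAL items keep submission order.
        System.out.println(q.takeNext()); // outage
        System.out.println(q.takeNext()); // report-a
        System.out.println(q.takeNext()); // report-b
    }
}
```

Because every source normalizes into the same envelope before enqueueing, this one ordering policy applies uniformly regardless of where the work came from.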

This means an ensemble can simultaneously handle direct requests from peer ensembles, pull batch work from a queue, respond to events, and run scheduled health checks — without any of these mechanisms knowing about each other.


Each ensemble deploys as a Kubernetes service — one or more pods behind a K8s Service resource. Ensembles discover each other via DNS name. This is standard infrastructure that operations teams already know how to manage.

```
Namespace: hotel-downtown
+-- Service: kitchen
+-- Service: room-service
+-- Service: maintenance
+-- Service: front-desk
+-- Service: dashboard
```

Scaling is handled by Kubernetes HPA watching queue depth or request latency. Conference weekend with heavy kitchen load? Scale kitchen to 3 replicas. Off-peak Tuesday? Scale back to 1. The ensemble handles replica coordination through broadcast-claim delivery: a work request is offered to all replicas, and the first to claim it processes it.
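The broadcast-claim mechanism reduces to a compare-and-set race: all replicas see the offer, one atomic claim succeeds, the rest back off. The sketch below simulates it with an in-process map; in a real multi-pod deployment the claim store would be shared state (a consumer group, a database row, or a key-value store), and `tryClaim` is a hypothetical name, not the framework's API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of broadcast-claim delivery: a work request is offered
// to every replica, and the first atomic claim wins.
public class BroadcastClaimSketch {
    // requestId -> replicaId of the claimant
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    /** Returns true if this replica won the claim for the request. */
    public boolean tryClaim(String requestId, String replicaId) {
        // putIfAbsent is atomic: exactly one caller sees null (no prior claim).
        return claims.putIfAbsent(requestId, replicaId) == null;
    }

    public static void main(String[] args) {
        BroadcastClaimSketch store = new BroadcastClaimSketch();
        // The same request is broadcast to three kitchen replicas; one wins.
        System.out.println(store.tryClaim("req-42", "kitchen-0")); // true
        System.out.println(store.tryClaim("req-42", "kitchen-1")); // false
        System.out.println(store.tryClaim("req-42", "kitchen-2")); // false
    }
}
```

Losing replicas simply drop the offer, so scaling a service up or down never duplicates work.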


The shift from script to service changes several things:

Lifecycle management matters. A script that crashes restarts from scratch. A service that crashes needs graceful shutdown, drain logic, and state recovery. The ensemble supports a drain mode where it stops accepting new work, finishes in-flight tasks, and shuts down cleanly. On restart, it picks up queued work from durable sources.
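The drain semantics described above map onto a familiar pattern: stop admission, await in-flight work, then release resources. This is a minimal sketch using a plain `ExecutorService` rather than the framework's actual drain API, which this post does not show.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of drain mode: reject new work, finish in-flight
// tasks, and report whether shutdown completed within the deadline.
public class DrainSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public void submit(Runnable task) {
        workers.submit(task);
    }

    /** Returns true if all in-flight work finished within the timeout. */
    public boolean drain(long timeoutSeconds) {
        workers.shutdown(); // stop accepting new work; queued tasks still run
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DrainSketch svc = new DrainSketch();
        svc.submit(() -> System.out.println("finishing in-flight task"));
        System.out.println("clean shutdown: " + svc.drain(10));
    }
}
```

Recovery after restart is the complementary half: because queued work lives in durable sources (Kafka, SQS, Redis Streams), a fresh pod resumes by re-polling rather than by replaying local state.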

Proactive work becomes possible. A script only does what you tell it to do. A service can schedule its own work — periodic inventory checks, health assessments, report generation. These scheduled tasks run on internal timers and broadcast results to interested subscribers.
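An internal timer of this kind can be sketched with a `ScheduledExecutorService`, analogous to the `scheduledTask(...)` builder call shown earlier. The method name and parameters below are illustrative, the interval is shortened from the post's one hour for the demo, and the task body stands in for "check inventory and broadcast to subscribers".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a proactive scheduled task on an internal timer.
public class ScheduledTaskSketch {
    /** Fires a task at a fixed rate and returns after at least `times` runs. */
    public static int runProactiveTask(int times, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(times);
        AtomicInteger fired = new AtomicInteger();
        timer.scheduleAtFixedRate(() -> {
            fired.incrementAndGet(); // stand-in for the inventory check + broadcast
            latch.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return fired.get();
    }

    public static void main(String[] args) {
        System.out.println("task fired " + runProactiveTask(2, 100) + " times");
    }
}
```

The point is that nothing external triggers these runs: the service owns its own clock, which is exactly what a run-and-exit script cannot do.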

Observability changes. A script that runs for 30 seconds needs a log. A service that runs for months needs a dashboard. The existing web module (WebSocket server, live trace streaming, late-join snapshot) extends naturally to the long-running model.

The human relationship changes. A script blocks on human input and times out. A service has humans who connect and disconnect. They observe the current state, give direction, handle decisions that need authority, and leave. The system keeps running. This is a deep enough topic that the next post in this series will cover it in detail.


Complexity vs capability. A script is simple: invoke it, get a result. A service requires infrastructure — Kubernetes, queues, monitoring, lifecycle management. If your workload is “run this pipeline once and give me the output,” the service model is unnecessary overhead.

Always-on cost. A script uses resources only while it runs. A service uses resources continuously, even when idle. For intermittent workloads, the cost calculus favors one-shot execution with on-demand scaling.

State management. Scripts are stateless by nature — they start fresh every time. Services accumulate state: queued work, scheduled tasks, shared memory, connection state. This state needs to be durable, recoverable, and observable.

When to use which. The one-shot mode is right for discrete, bounded problems. The long-running mode is right when the workload is continuous, when multiple domains need to communicate, when humans need to observe and participate without blocking, and when the system needs to be always-on.

Both modes coexist. An ensemble that runs as a long-running service can still execute individual tasks in one-shot mode internally. The architecture does not force a choice — it extends the existing model.


This is the first post in a three-part arc on the Ensemble Network architecture planned for v3.0.0. The next post covers cross-ensemble delegation — how ensembles share tasks and tools across service boundaries, and why the contract between them is natural language, not typed schemas.

The design document covers the full architecture.

AgentEnsemble is open-source under the MIT license.