Humans as Participants, Not Controllers: Designing Agent Systems That Run Without You
Most human-in-the-loop designs treat humans as gatekeepers. The agent pipeline pauses, a notification fires, a human reviews and approves, the pipeline continues. If the human is not there, the system waits. If the human takes too long, the system times out.
This works for simple approval workflows. It does not work for systems that need to run autonomously for hours or days while humans come and go.
The harder design problem is: how do you build agent systems where humans are participants in the system rather than controllers of it? Where the system runs without them, benefits from their presence, and does not break when they leave?
The Controller Model vs the Participant Model
In the controller model, the human is a required step in the pipeline. The system cannot proceed without them. If the human is unavailable, the system blocks. Every approval gate is a potential bottleneck.
In the participant model, the human connects to a running system, observes its current state, provides input where useful, makes decisions that require their authority, and disconnects. The system keeps running.
The distinction is not about removing humans from the loop. It is about changing the default from “blocked, waiting for human” to “running autonomously, human welcome.”
The Interaction Spectrum
Not all human interactions have the same urgency or the same blocking requirement. The design uses a five-level spectrum:
| Level | Example | Behavior |
|---|---|---|
| Autonomous | Housekeeping cleans rooms after checkout | No human needed |
| Advisory | Manager says “prioritize VIP guest” | Human input welcomed but not required |
| Notifiable | "Water leak detected in room 305" | Alert a human, proceed with best-effort response |
| Approvable | Guest requests late checkout | Ask human if available, auto-approve on timeout |
| Gated | Opening the hotel safe | Cannot proceed without human authorization |
Most interactions in a well-designed system should fall in the first three levels. The system handles them autonomously. Humans are notified of important events but do not need to take action for the system to continue.
The gated level is reserved for decisions that genuinely require human authority — security decisions, compliance gates, large financial commitments. These are intentionally rare and intentionally blocking.
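The five levels can be sketched as a small enum with the blocking behavior each one implies. This is an illustrative sketch, not the framework's actual API; the enum and its predicate methods are assumptions:

```java
// Hypothetical sketch of the interaction spectrum. Only GATED may block
// the pipeline indefinitely; APPROVABLE and GATED route through reviews.
enum InteractionLevel {
    AUTONOMOUS,   // no human needed
    ADVISORY,     // human input welcomed but not required
    NOTIFIABLE,   // alert a human, proceed with best-effort response
    APPROVABLE,   // ask human if available, auto-approve on timeout
    GATED;        // cannot proceed without human authorization

    /** Only the gated level blocks when no human is present. */
    boolean blocksWithoutHuman() {
        return this == GATED;
    }

    /** Levels that go through the review queue at all. */
    boolean requiresReview() {
        return this == APPROVABLE || this == GATED;
    }
}
```

Encoding the levels explicitly makes the design pressure visible in code review: any task classified above NOTIFIABLE has to justify why.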
Gated Reviews with Role Requirements
When a task requires human authorization, the review specifies who can approve:
```java
Task openSafe = Task.builder()
    .description("Open the hotel safe for cash reconciliation")
    .review(Review.builder()
        .prompt("Manager authorization required to open the safe")
        .requiredRole("manager")
        .timeout(Duration.ZERO) // no timeout -- wait until a human decides
        .build())
    .build();
```

When this review fires and no qualified human is connected:
- The review is queued
- An out-of-band notification is sent (Slack, email, webhook)
- The task waits
- When a qualified human connects to the dashboard, they see the pending review immediately
- They approve or reject, and the task resumes
The key design choice: `timeout(Duration.ZERO)` means the system waits indefinitely. This is appropriate for decisions that genuinely cannot be made without human authority. For less critical approvals, a timeout with auto-approve provides the fallback:
```java
Review.builder()
    .prompt("Guest requests late checkout -- approve?")
    .requiredRole("front-desk")
    .timeout(Duration.ofMinutes(10))
    .timeoutDecision(ReviewDecision.APPROVE)
    .build()
```

If no human responds within 10 minutes, the system auto-approves. The human can still intervene within the window, but the system does not block indefinitely for a non-critical decision.
Human Directives
Humans can inject guidance into any ensemble they have access to:
```json
{
  "type": "directive",
  "to": "room-service",
  "from": "manager:human",
  "content": "Guest in 801 is VIP, prioritize all their requests"
}
```

Directives are non-blocking. They do not pause the system or wait for acknowledgment. They are injected as additional context for future task executions. The next time room service processes a request related to room 801, the directive is included in the prompt context.
This models how human managers actually work. A hotel manager does not approve every room service order. They walk through the hotel, observe what is happening, and give occasional direction: “That table needs attention.” “The VIP in the penthouse gets priority.” Then they move on.
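A minimal sketch of the non-blocking injection described above, assuming a simple per-ensemble context store. The class and method names here are hypothetical, not the framework's API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical directive store: inject() returns immediately (nothing
// pauses, nothing waits for an ack); directives are prepended to the
// prompt the next time the ensemble builds one.
class DirectiveStore {
    private final Map<String, List<String>> byEnsemble = new ConcurrentHashMap<>();

    void inject(String ensemble, String directive) {
        byEnsemble.computeIfAbsent(ensemble, k -> new CopyOnWriteArrayList<>())
                  .add(directive);
    }

    // Called when the ensemble assembles the prompt for its next task.
    String contextFor(String ensemble, String taskPrompt) {
        List<String> directives = byEnsemble.getOrDefault(ensemble, List.of());
        if (directives.isEmpty()) return taskPrompt;
        return "Standing directives:\n- " + String.join("\n- ", directives)
               + "\n\n" + taskPrompt;
    }
}
```

The important property is that injection and consumption are decoupled: the manager's directive lands whether or not any task is currently running.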
Control Plane Directives
Beyond natural language guidance, humans (or automated policies) can send structured control plane directives:
```json
{
  "type": "directive",
  "to": "kitchen",
  "from": "cost-policy:automated",
  "action": "SET_MODEL_TIER",
  "value": "FALLBACK"
}
```

This switches the kitchen ensemble to a cheaper LLM model without restarting. The ensemble has configurable model tiers:
```java
Ensemble.builder()
    .chatLanguageModel(gpt4)   // primary
    .fallbackModel(gpt4Mini)   // cheaper fallback
    .build();
```

Other control plane actions include pausing an ensemble, adjusting priority weights, enabling or disabling specific shared tasks, and changing queue depth limits. These are operational controls that affect ensemble behavior at runtime without redeployment.
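How an ensemble dispatches these actions might look like the sketch below. `SET_MODEL_TIER` comes from the example above; the other action names and the runtime fields are assumptions based on the capabilities just listed, not the real API:

```java
// Hypothetical control plane dispatch. Each action mutates runtime state
// in place -- no restart, no redeploy.
enum ControlAction { SET_MODEL_TIER, PAUSE, RESUME, SET_QUEUE_DEPTH }

class EnsembleRuntime {
    volatile boolean paused = false;
    volatile String modelTier = "PRIMARY";
    volatile int maxQueueDepth = 100;

    void apply(ControlAction action, String value) {
        switch (action) {
            case SET_MODEL_TIER  -> modelTier = value;                  // e.g. "FALLBACK"
            case PAUSE           -> paused = true;                      // stop taking new tasks
            case RESUME          -> paused = false;
            case SET_QUEUE_DEPTH -> maxQueueDepth = Integer.parseInt(value);
        }
    }
}
```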
Late-Join State Synchronization
When a human connects to the dashboard — whether it is their first time today or they are reconnecting after a network interruption — they need to see the current state of the system immediately. They should not have to wait for events to stream in before understanding what is happening.
The existing late-join mechanism (from v2.1.0’s agentensemble-web module) extends to the network level. When a human connects:
- The dashboard sends a `hello` message with the human's identity and roles
- Each ensemble the human has access to sends a `snapshotTrace` — the current state of all active tasks, pending reviews, queue depths, and recent events
- Live events start streaming immediately
The human is caught up within seconds of connecting. Pending reviews that match their role are highlighted. They can start making decisions immediately without waiting for context to accumulate.
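The handshake can be sketched with plain records. The message names (`hello`, `snapshotTrace`) come from the protocol above; the field layout and the role-filtering helper are illustrative assumptions:

```java
import java.util.List;

// Hypothetical late-join messages as records.
record Hello(String humanId, List<String> roles) {}
record PendingReview(String taskId, String requiredRole, String prompt) {}
record SnapshotTrace(String ensemble, int queueDepth,
                     List<PendingReview> pendingReviews) {}

class LateJoin {
    // On connect: reduce the snapshot's pending reviews to those the
    // connecting human is qualified to decide, so they are highlighted first.
    static List<PendingReview> reviewsFor(Hello hello, SnapshotTrace snapshot) {
        return snapshot.pendingReviews().stream()
                .filter(r -> hello.roles().contains(r.requiredRole()))
                .toList();
    }
}
```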
Operational Resilience
The participant model enables several operational patterns that the controller model cannot support:
Elastic scaling with human oversight. A conference weekend means higher load. The system scales automatically (K8s HPA watching queue depth). The human manager connects, observes the scaled-up state, adjusts priorities if needed, and disconnects. The system handles the load autonomously.
Operational profiles. Predefined configurations for known scenarios:
```java
NetworkProfile sportingEvent = NetworkProfile.builder()
    .name("sporting-event-weekend")
    .ensemble("front-desk", Capacity.replicas(4).maxConcurrent(50))
    .ensemble("kitchen", Capacity.replicas(3).maxConcurrent(100))
    .ensemble("room-service", Capacity.replicas(3).maxConcurrent(80))
    .preload("kitchen", "inventory", "Extra beer and ice stocked")
    .build();

network.applyProfile(sportingEvent);
```

A human can apply a profile, or profiles can activate on a schedule or via rules.
Simulation and chaos engineering. Before the conference, simulate the expected load: “What happens if kitchen goes down during peak dinner service?” Run a simulation with mock LLMs, time-compressed. Get a capacity report. Then inject a kitchen failure as a chaos test. Assert that room service’s circuit breaker opens within 30 seconds and the fallback activates within 1 minute. These are built into the framework, not bolted on.
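The timing assertion behind "circuit breaker opens within 30 seconds" can be sketched with a minimal breaker driven by simulated timestamps, which is what makes time-compressed chaos tests possible. All names here are illustrative, not the framework's actual API:

```java
import java.time.Duration;

// Hypothetical breaker for a time-compressed chaos test: failures are
// reported with simulated clock values, so a 30-second window can be
// tested in microseconds of wall time.
class SimulatedBreaker {
    private static final int FAILURE_THRESHOLD = 3;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1; // -1 means still closed

    // Each failed downstream call reports the simulated time it occurred.
    void recordFailure(long nowMillis) {
        if (++consecutiveFailures >= FAILURE_THRESHOLD && openedAtMillis < 0) {
            openedAtMillis = nowMillis;
        }
    }

    // Did the breaker open within the window after fault injection?
    boolean openedWithin(long faultInjectedAtMillis, Duration window) {
        return openedAtMillis >= 0
            && openedAtMillis - faultInjectedAtMillis <= window.toMillis();
    }
}
```

A chaos test would inject the kitchen failure at a known simulated time, replay the resulting call failures, and assert on `openedWithin`.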
Federation. Hotel A is at capacity. Hotel B across town has idle kitchen capacity. Overflow requests route to Hotel B automatically. The human manager sees both hotels on the same dashboard. This is the network-of-networks level — multiple independent agent systems sharing capacity when needed.
Tradeoffs
Autonomy vs oversight. The more autonomous the system, the less opportunity for human correction before a mistake propagates. The mitigation is observability: the system runs autonomously but every decision is traced, logged, and visible. Humans review after the fact and inject directives to adjust future behavior.
Gating cost. Every gated review is a potential bottleneck and a source of latency. The design pressure is to minimize gated interactions — reserve them for decisions that genuinely require human authority. If you find yourself gating routine operations, the system design needs revision, not more human approvals.
Notification fatigue. A system that notifies humans about everything trains them to ignore notifications. The notification levels (autonomous, advisory, notifiable, approvable, gated) exist to keep the signal-to-noise ratio high. Most things should be autonomous. Notifications should be reserved for things that actually need attention.
Simulation fidelity. Simulations use mock LLMs and time compression. The behavior will not perfectly match production. The value is in finding structural problems — capacity bottlenecks, missing fallbacks, broken circuit breakers — not in predicting exact outcomes.
This is the third and final post in the Ensemble Network architecture arc. The architecture is planned for AgentEnsemble v3.0.0. The previous posts cover ensembles as services and cross-ensemble delegation.
The design document covers the full architecture including discovery, error handling, versioning, security, testing, and the phased delivery plan.
AgentEnsemble is open-source under the MIT license.