
Task Sharing vs Tool Sharing: Cross-Ensemble Delegation in Distributed Agent Systems

MCP (Model Context Protocol) gives agents the ability to call tools hosted by other services. This is useful — it is function-level interoperability. An agent calls a function, gets a result, continues.

But there is a level above function calls that most frameworks have not addressed: what happens when one autonomous agent system needs to delegate a complex, multi-step process to another autonomous agent system?

The distinction matters. Calling a tool is like borrowing a calculator. Delegating a task is like hiring a department.


When agent ensembles run as long-lived services on a network (as described in the previous post), they need to share capabilities with each other. There are two fundamentally different kinds of sharing:

Tool sharing exposes a single function. The calling agent invokes it in its ReAct loop, gets a result, and continues reasoning. The tool executes atomically — there is no multi-step process, no internal agents, no review gates. This is what MCP provides.

Task sharing exposes a complete process. The calling ensemble delegates work to another ensemble, which runs its own agents, tools, memory, and review gates to produce the result. The caller does not know or control the internal process. It hands off work and gets back a result.

// Room service uses both kinds of sharing from kitchen
Ensemble roomService = Ensemble.builder()
    .name("room-service")
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle guest room service request")
        .tools(
            // Task sharing: delegates the full meal preparation process
            NetworkTask.from("kitchen", "prepare-meal"),
            // Tool sharing: calls a single function for inventory check
            NetworkTool.from("kitchen", "check-inventory"),
            NetworkTool.from("kitchen", "dietary-check"),
            // Task sharing: delegates repair work to maintenance
            NetworkTask.from("maintenance", "repair-request"))
        .build())
    .build();

Both NetworkTask and NetworkTool implement the same AgentTool interface. The agent calling them does not know whether a tool is local or remote, or whether it triggers a single function or an entire pipeline. The existing ReAct loop, tool executor, metrics, and tracing all work unchanged.
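Location transparency falls out of that shared interface. As a sketch only, assuming a minimal AgentTool shape (the actual framework signature is not shown in this post), the ReAct loop can treat a local tool, a NetworkTool, and a NetworkTask identically:

```java
// Hypothetical sketch of the shared interface. The method names here are
// assumptions for illustration, not the framework's actual API: the point
// is that the calling agent depends only on this contract, so local and
// remote implementations are interchangeable.
interface AgentTool {
    String name();
    String execute(String input);
}

class LocalEchoTool implements AgentTool {
    public String name() { return "echo"; }
    public String execute(String input) { return input; }
}
```

Because `NetworkTool` and `NetworkTask` sit behind the same contract, the executor, metrics, and tracing layers never need to branch on "local vs remote".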


When an agent calls a shared tool, the flow is straightforward:

  1. Agent calls check-inventory("wagyu beef")
  2. NetworkTool serializes the call into a WorkRequest
  3. Request is sent to the kitchen ensemble (WebSocket or queue)
  4. Kitchen executes inventoryTool.execute("wagyu beef") locally
  5. Result flows back: "Yes, 3 portions available"
  6. Agent continues its ReAct loop
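Step 2 is where the network boundary appears. A minimal sketch of the caller side, using a trimmed version of the WorkRequest envelope (field set reduced for brevity; the builder method is illustrative, not the framework's API):

```java
import java.time.Duration;
import java.util.UUID;

// Trimmed envelope for illustration; the full record appears later in the post
record WorkRequest(String requestId, String from, String task,
                   String context, String priority, Duration deadline) {}

class NetworkToolSketch {
    // Hypothetical helper: package a single tool invocation into an envelope.
    // The transport (WebSocket or queue) would then carry it to the provider.
    static WorkRequest buildRequest(String from, String tool, String args) {
        // The random request id doubles as a correlation and idempotency key
        return new WorkRequest(UUID.randomUUID().toString(), from, tool,
                args, "NORMAL", Duration.ofMinutes(5));
    }
}
```

The provider side simply unwraps the envelope, runs the named tool locally, and sends the result back under the same `requestId`.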

When an agent calls a shared task, the flow involves a full pipeline on the other side:

  1. Agent calls prepare-meal("Wagyu steak, medium-rare, room 403")
  2. NetworkTask serializes a WorkRequest with the full task context
  3. Request is sent to kitchen
  4. Kitchen runs its complete task pipeline — agent synthesis, tool calls, execution, review gates
  5. Result flows back: "Preparing now, estimated 25 minutes, ticket #4071"
  6. Agent continues

The critical difference: in step 4 of the task delegation, the kitchen ensemble is running its own agents with its own tools and its own review gates. The room service agent is not involved in any of that. It delegated the work and is waiting for a result — or continuing with other work if the request was async.


Every cross-ensemble message uses a standardized envelope:

public record WorkRequest(
    String requestId,        // Correlation + idempotency key
    String from,             // Requesting ensemble name
    String task,             // Shared task or tool name to execute
    String context,          // Natural language input/context
    Priority priority,       // CRITICAL / HIGH / NORMAL / LOW
    Duration deadline,       // Caller's SLA ("I need this within...")
    DeliverySpec delivery,   // How and where to return the result
    String traceContext,     // W3C traceparent for distributed tracing
    CachePolicy cachePolicy, // USE_CACHED / FORCE_FRESH
    String cacheKey          // Optional, for result caching
) {}

A few design choices in this envelope are worth noting:

The context field is natural language. When maintenance asks procurement to order parts, the context is: “Order replacement valve for building 2 boiler.” Not a typed JSON schema. Not a protobuf message. Natural language that the receiving ensemble’s LLM interprets.

The deadline belongs to the caller, not the provider. The requester sets the SLA: “I need this within 30 minutes.” The provider responds with an estimated completion time. If the estimate exceeds the deadline, the caller decides: accept the longer wait, try another provider (federation), or continue without.

Delivery is caller-specified. The requester tells the provider how to return the result — WebSocket for real-time, a durable queue for reliability, a webhook for external integration, or a shared store for polling.
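The deadline negotiation described above reduces to a small decision on the caller side. A sketch, assuming the three outcomes named in the text (the enum and method names are illustrative):

```java
import java.time.Duration;

class DeadlineNegotiation {
    enum Decision { ACCEPT, TRY_OTHER_PROVIDER, CONTINUE_WITHOUT }

    // Hypothetical caller-side check: compare the provider's estimate
    // against the caller's SLA and pick one of the three options.
    static Decision decide(Duration deadline, Duration estimate, boolean critical) {
        if (estimate.compareTo(deadline) <= 0) return Decision.ACCEPT;
        // Estimate exceeds the SLA: critical work fails over to another
        // provider (federation); non-critical work proceeds without the result
        return critical ? Decision.TRY_OTHER_PROVIDER : Decision.CONTINUE_WITHOUT;
    }
}
```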


This is the design choice I find most interesting and most debatable.

In traditional microservice architectures, services communicate via typed schemas — protobuf, OpenAPI, GraphQL. Schema versioning is a constant source of friction. A field name change breaks callers. A new required field breaks backwards compatibility. Teams spend significant effort on schema evolution, versioning policies, and migration tooling.

In the Ensemble Network, the contract between services is natural language. When maintenance tells procurement “order replacement parts for the boiler valve,” it does not matter whether procurement’s internal schema changed. The LLM on the receiving side interprets the request. Minor changes in wording do not break callers.

This works because the participants are LLMs, not deterministic parsers. An LLM that receives “order parts for the boiler” and an LLM that receives “purchase replacement components for the heating system” will generally produce equivalent behavior. The semantic intent is preserved even when the exact phrasing varies.

The tradeoff is real: you lose type safety. A typed schema guarantees that the data conforms to a specific shape. Natural language does not. If the receiving ensemble misinterprets the request, you get a wrong result, not a compile error. The mitigation is the same as elsewhere in agent systems: review gates, guardrails, and observability.


The caller decides how to wait for the result:

| Mode | Behavior | Use case |
|------|----------|----------|
| Await | Block until result | Critical path: “Can’t continue without this” |
| Async | Submit and continue; result delivered later | Non-critical: “Order towels when you get to it” |
| Await with deadline | Wait up to N; then continue with partial/no result | Balanced: “Wait 30 min, then proceed with what I know” |

The await-with-deadline mode is the most operationally useful. It lets the caller set a budget for how long to wait before continuing. If the provider delivers within the deadline, the caller uses the result. If not, it makes a decision: retry, use a fallback, or proceed without.
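In Java terms, await-with-deadline maps naturally onto a bounded `CompletableFuture.get`. A sketch (the method is illustrative, not the framework's API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AwaitWithDeadline {
    // Hypothetical helper: wait up to the budget, then continue with a
    // fallback instead of blocking indefinitely.
    static String awaitOrFallback(CompletableFuture<String> result,
                                  long budgetMillis, String fallback) {
        try {
            return result.get(budgetMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // deadline passed: proceed with what we know
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // remote call failed: same fallback path
        }
    }
}
```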


The provider’s default response to load is accept and queue, not reject. LLM tasks are not real-time request/response — they take seconds to hours. Everyone expects latency. The provider accepts the work into a priority queue and returns an estimated completion time:

{
  "type": "task_accepted",
  "requestId": "maint-7721",
  "queuePosition": 7,
  "estimatedCompletion": "PT45M"
}

Rejection only happens at hard limits — the queue itself is full. This “bend, don’t break” approach matches the reality of LLM workloads: capacity is elastic, latency is expected, and it is almost always better to queue work than to reject it.

Priority queuing ensures critical requests are processed first (CRITICAL > HIGH > NORMAL > LOW). Within the same priority, FIFO. Low-priority items age over time to prevent starvation.
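The ordering rule can be sketched as a comparator: priority first, FIFO within a priority, with an age bonus so low-priority items eventually rise. The weights below are illustrative assumptions, not the framework's actual tuning:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class PriorityAgingQueue {
    enum Priority { CRITICAL, HIGH, NORMAL, LOW }
    record Item(Priority priority, long enqueuedAtMillis, long seq) {}

    // Lower rank = served first. Each 10 minutes of waiting bumps an
    // item up roughly one priority level (hypothetical aging rate).
    static double effectiveRank(Item item, long nowMillis) {
        long ageMinutes = (nowMillis - item.enqueuedAtMillis()) / 60_000;
        return item.priority().ordinal() - ageMinutes / 10.0;
    }

    static PriorityQueue<Item> queueAt(long nowMillis) {
        return new PriorityQueue<>(
            Comparator.comparingDouble((Item i) -> effectiveRank(i, nowMillis))
                      .thenComparingLong(Item::seq)); // FIFO tie-break
    }
}
```

With this rule a LOW item that has waited 40 minutes outranks a freshly enqueued CRITICAL one, which is the starvation-prevention behavior described above; a production queue would recompute ranks as time passes rather than fixing them at enqueue.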


Every WorkRequest carries a W3C traceparent header. When maintenance delegates to procurement, which delegates to logistics, the trace context propagates across all three. Open Jaeger (or any W3C-compatible tracing backend) and you see the full chain: which ensemble originated the request, how long each step took, where the bottleneck was.

This is standard distributed tracing, not a custom solution. The same infrastructure teams use for HTTP microservices works here. The difference is that each span may represent an LLM call that takes 30 seconds instead of a database query that takes 3 milliseconds.
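Propagation itself is mechanical: the 32-hex-digit trace-id is preserved across every hop, while each ensemble mints a fresh 16-hex-digit parent-id for its own span. A sketch of that rule (helper name is illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

class TraceContext {
    // Hypothetical helper: derive the traceparent for the next hop.
    // Format per W3C Trace Context: version-traceId-parentId-flags.
    static String childOf(String traceparent) {
        String[] parts = traceparent.split("-");
        // Keep version, trace-id, and flags; mint a new non-zero parent-id
        String newParentId = String.format("%016x",
                ThreadLocalRandom.current().nextLong() | 1L);
        return parts[0] + "-" + parts[1] + "-" + newParentId + "-" + parts[3];
    }
}
```

Because the trace-id survives every delegation, Jaeger can stitch maintenance, procurement, and logistics spans into one trace.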


Loose coupling vs type safety. Natural language contracts are resilient to change but do not guarantee correctness. Typed schemas guarantee correctness but are brittle to change. The right choice depends on how stable the interface is. For evolving, exploratory agent systems, natural language is pragmatic. For stable, high-volume interfaces, a typed schema wrapper may be worth the friction.

Latency tolerance. Cross-ensemble delegation adds network hops and queuing delays. A task that takes 10 seconds locally may take 2 minutes when delegated across a network. The architecture assumes latency tolerance — if your use case requires sub-second responses, delegation is the wrong pattern.

Failure modes. When the kitchen ensemble is down, room service’s prepare-meal call fails. The circuit breaker opens. The agent needs a fallback — suggest alternatives, queue the request for later, or inform the guest. Distributed systems fail in distributed ways. The framework provides the circuit breaker and fallback mechanisms, but the failure strategy is application-specific.
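A minimal sketch of that circuit-breaker-plus-fallback shape, assuming a simple consecutive-failure threshold (real breakers add half-open probing and time windows; the class here is illustrative, not the framework's implementation):

```java
import java.util.function.Supplier;

class CircuitBreakerSketch {
    private int consecutiveFailures = 0;
    private final int threshold;

    CircuitBreakerSketch(int threshold) { this.threshold = threshold; }

    boolean isOpen() { return consecutiveFailures >= threshold; }

    // Route through the remote call while closed; serve the
    // application-specific fallback while open or on failure.
    String call(Supplier<String> remote, String fallback) {
        if (isOpen()) return fallback; // short-circuit: don't hammer a down provider
        try {
            String result = remote.get();
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback;
        }
    }
}
```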

Observability cost. Every cross-ensemble request generates trace data, metrics, and log entries. In a busy network with many delegations, the observability overhead is non-trivial. The tracing infrastructure needs to handle the volume, and teams need dashboards that make sense of the flow.


This is the second post in a three-part arc on the Ensemble Network architecture. The next post covers human participation — how humans connect to and interact with a network of autonomous ensembles without becoming bottlenecks.

The design document covers the full architecture.

AgentEnsemble is open-source under the MIT license.