Task Sharing vs Tool Sharing: Cross-Ensemble Delegation in Distributed Agent Systems
MCP (Model Context Protocol) gives agents the ability to call tools hosted by other services. This is useful — it is function-level interoperability. An agent calls a function, gets a result, continues.
But there is a level above function calls that most frameworks have not addressed: what happens when one autonomous agent system needs to delegate a complex, multi-step process to another autonomous agent system?
The distinction matters. Calling a tool is like borrowing a calculator. Delegating a task is like hiring a department.
Two Kinds of Sharing
When agent ensembles run as long-lived services on a network (as described in the previous post), they need to share capabilities with each other. There are two fundamentally different kinds of sharing:
Tool sharing exposes a single function. The calling agent invokes it in its ReAct loop, gets a result, and continues reasoning. The tool executes atomically — there is no multi-step process, no internal agents, no review gates. This is what MCP provides.
Task sharing exposes a complete process. The calling ensemble delegates work to another ensemble, which runs its own agents, tools, memory, and review gates to produce the result. The caller does not know or control the internal process. It hands off work and gets back a result.
```java
// Room service uses both kinds of sharing from kitchen
Ensemble roomService = Ensemble.builder()
    .name("room-service")
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle guest room service request")
        .tools(
            // Task sharing: delegates the full meal preparation process
            NetworkTask.from("kitchen", "prepare-meal"),

            // Tool sharing: calls a single function for inventory check
            NetworkTool.from("kitchen", "check-inventory"),
            NetworkTool.from("kitchen", "dietary-check"),

            // Task sharing: delegates repair work to maintenance
            NetworkTask.from("maintenance", "repair-request"))
        .build())
    .build();
```

Both `NetworkTask` and `NetworkTool` implement the same `AgentTool` interface. The agent calling them does not know whether a tool is local or remote, or whether it triggers a single function or an entire pipeline. The existing ReAct loop, tool executor, metrics, and tracing all work unchanged.
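To make the transparency concrete, here is a minimal sketch of what a shared tool interface might look like. The names (`AgentTool`, `LocalTool`, the `execute` signature, the stubbed transport) are assumptions for illustration, not the framework's actual API:

```java
import java.util.function.Function;

public class ToolTransparency {
    // A single interface for anything the ReAct loop can invoke (assumed shape).
    interface AgentTool {
        String name();
        String execute(String input);
    }

    // A local tool wraps a plain in-process function.
    record LocalTool(String name, Function<String, String> fn) implements AgentTool {
        public String execute(String input) { return fn.apply(input); }
    }

    // A remote tool would serialize the call into a WorkRequest and send it
    // over the network; the transport is stubbed out here.
    record NetworkTool(String ensemble, String name) implements AgentTool {
        public String execute(String input) {
            // Real implementation: send WorkRequest, await the result.
            return "[" + ensemble + "/" + name + "] handled: " + input;
        }
    }

    // The calling agent treats both identically.
    static String invoke(AgentTool tool, String input) {
        return tool.execute(input);
    }

    public static void main(String[] args) {
        AgentTool local = new LocalTool("echo", s -> "echo: " + s);
        AgentTool remote = new NetworkTool("kitchen", "check-inventory");
        System.out.println(invoke(local, "hello"));
        System.out.println(invoke(remote, "wagyu beef"));
    }
}
```

Because both implementations satisfy the same interface, the ReAct loop needs no special case for remote capabilities.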
How Delegation Works
When an agent calls a shared tool, the flow is straightforward:
1. Agent calls `check-inventory("wagyu beef")`
2. `NetworkTool` serializes the call into a `WorkRequest`
3. The request is sent to the kitchen ensemble (WebSocket or queue)
4. Kitchen executes `inventoryTool.execute("wagyu beef")` locally
5. The result flows back: `"Yes, 3 portions available"`
6. The agent continues its ReAct loop
When an agent calls a shared task, the flow involves a full pipeline on the other side:
1. Agent calls `prepare-meal("Wagyu steak, medium-rare, room 403")`
2. `NetworkTask` serializes a `WorkRequest` with the full task context
3. The request is sent to the kitchen
4. The kitchen runs its complete task pipeline: agent synthesis, tool calls, execution, review gates
5. The result flows back: `"Preparing now, estimated 25 minutes, ticket #4071"`
6. The agent continues
The critical difference: in step 4 of the task delegation, the kitchen ensemble is running its own agents with its own tools and its own review gates. The room service agent is not involved in any of that. It delegated the work and is waiting for a result — or continuing with other work if the request was async.
The WorkRequest Envelope
Every cross-ensemble message uses a standardized envelope:
```java
public record WorkRequest(
    String requestId,        // Correlation + idempotency key
    String from,             // Requesting ensemble name
    String task,             // Shared task or tool name to execute
    String context,          // Natural language input/context
    Priority priority,       // CRITICAL / HIGH / NORMAL / LOW
    Duration deadline,       // Caller's SLA ("I need this within...")
    DeliverySpec delivery,   // How and where to return the result
    String traceContext,     // W3C traceparent for distributed tracing
    CachePolicy cachePolicy, // USE_CACHED / FORCE_FRESH
    String cacheKey          // Optional, for result caching
) {}
```

A few design choices in this envelope are worth noting:
The context field is natural language. When maintenance asks procurement to order parts, the context is: “Order replacement valve for building 2 boiler.” Not a typed JSON schema. Not a protobuf message. Natural language that the receiving ensemble’s LLM interprets.
The deadline belongs to the caller, not the provider. The requester sets the SLA: “I need this within 30 minutes.” The provider responds with an estimated completion time. If the estimate exceeds the deadline, the caller decides: accept the longer wait, try another provider (federation), or continue without.
Delivery is caller-specified. The requester tells the provider how to return the result — WebSocket for real-time, a durable queue for reliability, a webhook for external integration, or a shared store for polling.
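Putting those three choices together, constructing the envelope for the boiler-valve example might look like the sketch below. The simplified `Priority`, `CachePolicy`, and `DeliverySpec` stand-ins, the `order-parts` task name, and the queue address are illustrative assumptions, not the framework's real types:

```java
import java.time.Duration;

public class EnvelopeExample {
    enum Priority { CRITICAL, HIGH, NORMAL, LOW }
    enum CachePolicy { USE_CACHED, FORCE_FRESH }
    // DeliverySpec reduced to a mode + address for illustration.
    record DeliverySpec(String mode, String address) {}

    record WorkRequest(
        String requestId, String from, String task, String context,
        Priority priority, Duration deadline, DeliverySpec delivery,
        String traceContext, CachePolicy cachePolicy, String cacheKey) {}

    static WorkRequest orderBoilerValve() {
        return new WorkRequest(
            "maint-7721",                      // correlation + idempotency key
            "maintenance",                     // requesting ensemble
            "order-parts",                     // shared task name (hypothetical)
            "Order replacement valve for building 2 boiler", // natural-language context
            Priority.HIGH,
            Duration.ofMinutes(30),            // caller's SLA, not the provider's
            new DeliverySpec("queue", "maintenance.results"), // durable return path
            "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01", // W3C traceparent
            CachePolicy.FORCE_FRESH,
            null);                             // no cache key for a one-off order
    }

    public static void main(String[] args) {
        System.out.println(orderBoilerValve());
    }
}
```

Note that the `context` carries the intent in prose while everything operational (priority, deadline, delivery, tracing) stays structured.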
Natural Language as Contract
This is the design choice I find most interesting and most debatable.
In traditional microservice architectures, services communicate via typed schemas — protobuf, OpenAPI, GraphQL. Schema versioning is a constant source of friction. A field name change breaks callers. A new required field breaks backwards compatibility. Teams spend significant effort on schema evolution, versioning policies, and migration tooling.
In the Ensemble Network, the contract between services is natural language. When maintenance tells procurement “order replacement parts for the boiler valve,” it does not matter whether procurement’s internal schema changed. The LLM on the receiving side interprets the request. Minor changes in wording do not break callers.
This works because the participants are LLMs, not deterministic parsers. An LLM that receives “order parts for the boiler” and an LLM that receives “purchase replacement components for the heating system” will produce equivalent behavior. The semantic intent is preserved even when the exact phrasing varies.
The tradeoff is real: you lose type safety. A typed schema guarantees that the data conforms to a specific shape. Natural language does not. If the receiving ensemble misinterprets the request, you get a wrong result, not a compile error. The mitigation is the same as elsewhere in agent systems: review gates, guardrails, and observability.
Three Request Modes
The caller decides how to wait for the result:
| Mode | Behavior | Use case |
|---|---|---|
| Await | Block until result | Critical path: “Can’t continue without this” |
| Async | Submit and continue; result delivered later | Non-critical: “Order towels when you get to it” |
| Await with deadline | Wait up to N; then continue with partial/no result | Balanced: “Wait 30 min, then proceed with what I know” |
The await-with-deadline mode is the most operationally useful. It lets the caller set a budget for how long to wait before continuing. If the provider delivers within the deadline, the caller uses the result. If not, it makes a decision: retry, use a fallback, or proceed without.
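The await-with-deadline shape maps naturally onto standard concurrency primitives. Here is a minimal sketch using `CompletableFuture`, with a simulated provider standing in for the network round-trip (the method names and fallback behavior are illustrative, not the framework's API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RequestModes {
    // Simulated provider that takes `workMillis` to produce a result.
    static CompletableFuture<String> submit(String task, long workMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(workMillis); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done: " + task;
        });
    }

    // Await with deadline: wait up to `deadlineMillis`, then use the fallback.
    static String awaitWithDeadline(String task, long workMillis,
                                    long deadlineMillis, String fallback) {
        try {
            return submit(task, workMillis).get(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // deadline expired: proceed with what we know
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Fast provider: result arrives within the budget.
        System.out.println(awaitWithDeadline("prepare-meal", 50, 2000, "fallback"));
        // Slow provider: deadline expires, caller continues with the fallback.
        System.out.println(awaitWithDeadline("prepare-meal", 2000, 50, "fallback"));
    }
}
```

Plain await is the degenerate case with an unbounded timeout; async is the same future with the result consumed by a callback instead of a blocking `get`.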
Capacity Management
The provider’s default response to load is accept and queue, not reject. LLM tasks are not real-time request/response — they take seconds to hours. Everyone expects latency. The provider accepts the work into a priority queue and returns an estimated completion time:
```json
{
  "type": "task_accepted",
  "requestId": "maint-7721",
  "queuePosition": 7,
  "estimatedCompletion": "PT45M"
}
```

Rejection only happens at hard limits — the queue itself is full. This “bend, don’t break” approach matches the reality of LLM workloads: capacity is elastic, latency is expected, and it is almost always better to queue work than to reject it.
Priority queuing ensures critical requests are processed first (CRITICAL > HIGH > NORMAL > LOW). Within the same priority, FIFO. Low-priority items age over time to prevent starvation.
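One way to express that ordering is an effective-priority comparator: strict priority first, FIFO within a level, and a boost earned by waiting. The aging rate (one level per minute here) and the `Queued` record are assumptions for illustration; the framework's actual scheduler may work differently:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityScheduling {
    enum Priority { CRITICAL, HIGH, NORMAL, LOW }

    record Queued(Priority priority, long seq, long enqueuedAtMillis) {}

    // Illustrative aging rate: one priority level per minute waited.
    static final long AGE_BOOST_MILLIS = 60_000;

    // Effective priority = declared level minus levels earned by waiting,
    // floored at CRITICAL (ordinal 0).
    static int effective(Queued q, long nowMillis) {
        long boost = (nowMillis - q.enqueuedAtMillis()) / AGE_BOOST_MILLIS;
        return (int) Math.max(0, q.priority().ordinal() - boost);
    }

    static Comparator<Queued> order(long nowMillis) {
        return Comparator
            .comparingInt((Queued q) -> effective(q, nowMillis)) // CRITICAL first
            .thenComparingLong(Queued::seq);                     // FIFO within a level
    }

    public static void main(String[] args) {
        long now = 120_000; // two minutes into the run
        PriorityQueue<Queued> queue = new PriorityQueue<>(order(now));
        queue.add(new Queued(Priority.LOW, 0, 0));            // waited 2 min: aged to HIGH-equivalent
        queue.add(new Queued(Priority.NORMAL, 1, 119_000));   // fresh: stays NORMAL
        queue.add(new Queued(Priority.CRITICAL, 2, 119_000)); // still processed first
        System.out.println(queue.poll().priority());
    }
}
```

The floor at ordinal 0 means an aged item can match but never outrank a genuinely critical request.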
Distributed Tracing
Every WorkRequest carries a W3C traceparent header. When maintenance delegates to procurement, which delegates to logistics, the trace context propagates across all three. Open Jaeger (or any W3C-compatible tracing backend) and you see the full chain: which ensemble originated the request, how long each step took, where the bottleneck was.
This is standard distributed tracing, not a custom solution. The same infrastructure teams use for HTTP microservices works here. The difference is that each span may represent an LLM call that takes 30 seconds instead of a database query that takes 3 milliseconds.
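The propagation mechanics are simple. Per the W3C Trace Context format (`version-traceid-parentid-flags`), each hop keeps the 16-byte trace-id and mints a new 8-byte parent-id. A minimal sketch, not the framework's actual tracing code:

```java
import java.util.concurrent.ThreadLocalRandom;

public class TraceContextDemo {
    static String hex(int bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes; i++)
            sb.append(String.format("%02x", ThreadLocalRandom.current().nextInt(256)));
        return sb.toString();
    }

    // New root trace when a request originates.
    static String newTraceparent() {
        return "00-" + hex(16) + "-" + hex(8) + "-01";
    }

    // Child span: same trace-id, fresh parent-id.
    static String childOf(String traceparent) {
        String traceId = traceparent.split("-")[1];
        return "00-" + traceId + "-" + hex(8) + "-01";
    }

    public static void main(String[] args) {
        String maintenance = newTraceparent();     // originates the request
        String procurement = childOf(maintenance); // delegated hop 1
        String logistics = childOf(procurement);   // delegated hop 2
        // All three share one trace-id, so Jaeger renders a single chain.
        System.out.println(maintenance + "\n" + procurement + "\n" + logistics);
    }
}
```

In practice a tracing library (OpenTelemetry, for example) handles this; the point is that the WorkRequest's `traceContext` string is all a hop needs to stay in the chain.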
Tradeoffs
Loose coupling vs type safety. Natural language contracts are resilient to change but do not guarantee correctness. Typed schemas guarantee correctness but are brittle to change. The right choice depends on how stable the interface is. For evolving, exploratory agent systems, natural language is pragmatic. For stable, high-volume interfaces, a typed schema wrapper may be worth the friction.
Latency tolerance. Cross-ensemble delegation adds network hops and queuing delays. A task that takes 10 seconds locally may take 2 minutes when delegated across a network. The architecture assumes latency tolerance — if your use case requires sub-second responses, delegation is the wrong pattern.
Failure modes. When the kitchen ensemble is down, room service’s prepare-meal call fails. The circuit breaker opens. The agent needs a fallback — suggest alternatives, queue the request for later, or inform the guest. Distributed systems fail in distributed ways. The framework provides the circuit breaker and fallback mechanisms, but the failure strategy is application-specific.
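The failure-handling shape can be sketched as a minimal circuit breaker with a caller-supplied fallback. This is an illustration of the pattern under assumed names, not the framework's actual breaker, and it omits the half-open probe state a production breaker would need:

```java
import java.util.function.Supplier;

public class BreakerSketch {
    enum State { CLOSED, OPEN }

    static class CircuitBreaker {
        private final int failureThreshold;
        private int consecutiveFailures = 0;
        private State state = State.CLOSED;

        CircuitBreaker(int failureThreshold) { this.failureThreshold = failureThreshold; }

        State state() { return state; }

        String call(Supplier<String> remote, Supplier<String> fallback) {
            if (state == State.OPEN) return fallback.get(); // fail fast, skip the network
            try {
                String result = remote.get();
                consecutiveFailures = 0; // success resets the count
                return result;
            } catch (RuntimeException e) {
                if (++consecutiveFailures >= failureThreshold) state = State.OPEN;
                return fallback.get(); // application-specific fallback strategy
            }
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        Supplier<String> kitchenDown = () -> { throw new RuntimeException("kitchen unreachable"); };
        Supplier<String> fallback = () -> "Kitchen unavailable; suggesting alternatives to the guest.";
        for (int i = 0; i < 4; i++) System.out.println(breaker.call(kitchenDown, fallback));
        System.out.println(breaker.state());
    }
}
```

The breaker decides when to stop calling; what the fallback says to the guest remains the application's decision, as the paragraph above argues.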
Observability cost. Every cross-ensemble request generates trace data, metrics, and log entries. In a busy network with many delegations, the observability overhead is non-trivial. The tracing infrastructure needs to handle the volume, and teams need dashboards that make sense of the flow.
This is the second post in a three-part arc on the Ensemble Network architecture. The next post covers human participation — how humans connect to and interact with a network of autonomous ensembles without becoming bottlenecks.
The design document covers the full architecture.
AgentEnsemble is open-source under the MIT license.