Transport SPI: Making Agent Network Infrastructure Pluggable

May 7, 2026

When agent ensembles become long-running services that communicate over a network, the communication layer becomes infrastructure. Infrastructure has a property that application code should not have: it varies by deployment environment.

Development uses in-process queues. Staging might use Redis. Production runs Kafka. The application code — the agents, tasks, workflows — should not change between these environments. The question is where to draw the abstraction line.

The transport problem

An ensemble network needs several communication primitives:

Request queues — how work requests arrive at an ensemble
Delivery registries — how responses get routed back to the requester
Capability registries — how ensembles advertise and discover shared tasks and tools
Capacity tracking — how ensembles report their current load

Each of these has a natural in-process implementation (maps, queues, lists) and at least one distributed implementation (Kafka topics, Redis streams, service registries). If these are hardcoded to a specific backing store, every change of deployment environment requires a code change.
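To show how small the in-process side can be, here is a minimal sketch of a delivery registry backed by a map. The class name and the `expect` and `deliver` methods are illustrative assumptions, not a real library API, and responses are modeled as plain strings for brevity.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process delivery registry: routes a response back to
// whoever is waiting on the matching request ID.
final class InProcessDeliveryRegistry {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the requester before sending work; the future completes on delivery.
    CompletableFuture<String> expect(String requestId) {
        return pending.computeIfAbsent(requestId, id -> new CompletableFuture<>());
    }

    // Called when a response arrives; returns false if nobody is waiting for it.
    boolean deliver(String requestId, String response) {
        CompletableFuture<String> waiter = pending.remove(requestId);
        return waiter != null && waiter.complete(response);
    }
}
```

A distributed version would have to replace the map with something that survives process restarts and spans hosts, which is exactly the part that should not leak into application code.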

The SPI design

AgentEnsemble defines transport as a set of Java interfaces — a Service Provider Interface — with pluggable implementations:

Transport transport = Transport.websocket("kitchen");

// Or for production with delivery guarantees
Transport transport = Transport.simple("kitchen", deliveryRegistry);

The Transport interface provides access to the individual primitives:

Primitive	Interface	Purpose
Request queue	`RequestQueue`	Inbound work request buffering
Delivery registry	`DeliveryRegistry`	Response routing back to callers
Capability registry	`CapabilityRegistry`	Shared task/tool advertisement

Each interface has a simple contract. RequestQueue, for example:

public interface RequestQueue {
    void enqueue(WorkRequest request);
    Optional<WorkRequest> poll(Duration timeout);
    int size();
}

The in-process implementation uses a LinkedBlockingQueue. The Kafka implementation produces to a topic and consumes with manual offset commits. Same interface, different backing.
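As a rough sketch of what the in-process side might look like, the following wraps a LinkedBlockingQueue behind the contract above. The `WorkRequest` record is a placeholder with made-up fields, and `InProcessRequestQueue` is a hypothetical name; the actual AgentEnsemble classes may differ.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Placeholder for the real WorkRequest type, whose fields are not shown in this post.
record WorkRequest(String id, String payload) {}

// The contract from the post, repeated so the sketch compiles on its own.
interface RequestQueue {
    void enqueue(WorkRequest request);
    Optional<WorkRequest> poll(Duration timeout);
    int size();
}

// Hypothetical in-process implementation backed by an unbounded LinkedBlockingQueue.
final class InProcessRequestQueue implements RequestQueue {
    private final BlockingQueue<WorkRequest> queue = new LinkedBlockingQueue<>();

    @Override
    public void enqueue(WorkRequest request) {
        queue.add(request); // never blocks: the queue is unbounded
    }

    @Override
    public Optional<WorkRequest> poll(Duration timeout) {
        try {
            // Waits up to the timeout; a null result means nothing arrived in time.
            return Optional.ofNullable(queue.poll(timeout.toNanos(), TimeUnit.NANOSECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return Optional.empty();
        }
    }

    @Override
    public int size() {
        return queue.size();
    }
}
```

Treating an interrupt as "no request available" is one possible choice here; an implementation could equally surface it to the caller.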

Why this matters for agent systems

The transport SPI is not unusual as an architectural pattern — it is a standard dependency inversion. What makes it interesting in the agent context is what it enables.

Agent networks are inherently non-deterministic. Agents take variable time, produce variable output, and may fail in unpredictable ways. Adding infrastructure variability on top of that makes the system harder to reason about.

By isolating transport from application logic, you can:

Test with in-process transport — no containers, no network, deterministic ordering
Develop locally with WebSocket transport — real network behavior, zero infrastructure setup
Deploy to production with Kafka — durability, horizontal scaling, replay capability
Switch between environments — without touching agent code, task definitions, or workflow configuration
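The first and last points can be sketched together. In the example below, `Worker` stands in for application code and depends only on the queue interface, while `FifoTestQueue` is a hypothetical single-threaded test double. All type names other than `RequestQueue` and `WorkRequest` are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Stand-ins for the real types; only the members used here are sketched.
record WorkRequest(String id) {}

interface RequestQueue {
    void enqueue(WorkRequest request);
    Optional<WorkRequest> poll(Duration timeout);
    int size();
}

// Application-side code: depends on the interface only, never on a backing store.
final class Worker {
    static List<String> drain(RequestQueue queue) {
        List<String> handled = new ArrayList<>();
        Optional<WorkRequest> next;
        while ((next = queue.poll(Duration.ZERO)).isPresent()) {
            handled.add(next.get().id());
        }
        return handled;
    }
}

// Single-threaded test double: strict FIFO, no network, no timing dependence.
final class FifoTestQueue implements RequestQueue {
    private final Queue<WorkRequest> items = new ArrayDeque<>();

    @Override
    public void enqueue(WorkRequest request) { items.add(request); }

    @Override
    public Optional<WorkRequest> poll(Duration timeout) { return Optional.ofNullable(items.poll()); }

    @Override
    public int size() { return items.size(); }
}
```

A test can enqueue requests in a known order and assert on exactly what `Worker.drain` returns; pointing the same `Worker` at a Kafka-backed queue would require no change to `Worker` itself.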

The capability registry

One of the more interesting transport primitives is the capability registry. When an ensemble shares a task or tool on the network, that capability needs to be discoverable by other ensembles.

CapabilityRegistry registry = transport.capabilityRegistry();
registry.register("prepare-meal", CapabilityType.TASK, "kitchen");
registry.register("check-inventory", CapabilityType.TOOL, "kitchen");

Optional<String> provider = registry.findProvider("prepare-meal");

In simple mode, this is an in-memory map. In production, it could be backed by a service registry, a shared database, or Kafka’s consumer group protocol. The application code that registers and discovers capabilities does not change.
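A minimal sketch of the in-memory variant might look like the following. The interface shape is inferred from the calls above and may omit methods the real `CapabilityRegistry` has; `InMemoryCapabilityRegistry` and the last-registration-wins behavior are assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

enum CapabilityType { TASK, TOOL }

// Shape inferred from the calls in the post; the real interface may have more methods.
interface CapabilityRegistry {
    void register(String name, CapabilityType type, String ensemble);
    Optional<String> findProvider(String name);
}

// Hypothetical in-memory implementation: one map from capability name to provider.
final class InMemoryCapabilityRegistry implements CapabilityRegistry {
    private record Entry(CapabilityType type, String ensemble) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public void register(String name, CapabilityType type, String ensemble) {
        entries.put(name, new Entry(type, ensemble)); // last registration wins (an assumption)
    }

    @Override
    public Optional<String> findProvider(String name) {
        return Optional.ofNullable(entries.get(name)).map(Entry::ensemble);
    }
}
```

A real registry would also need to decide what happens when several ensembles offer the same capability, or when a provider disappears; this sketch ignores both.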

Tradeoffs

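Abstraction leaks. In-process queues have different ordering and delivery guarantees than Kafka topics: an in-process queue is strictly FIFO and loses its contents if the process dies, while Kafka preserves order only within a partition and, with manual offset commits, can redeliver a message after a failure. The SPI abstracts the interface but cannot fully abstract the semantics.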

Configuration complexity. Each transport implementation has its own configuration. The SPI does not unify configuration — you still need environment-specific setup for each backing store.

Performance characteristics vary. In-process queue operations take on the order of nanoseconds to microseconds. Kafka adds a network round trip, typically measured in milliseconds. If your agent workflow is latency-sensitive, the transport choice matters.

The design principle

The useful insight is that agent network communication has a small number of well-defined primitives, and these primitives have natural implementations at every scale. Defining the primitives as interfaces lets the infrastructure decision be made at deployment time rather than at development time.
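One way to picture a deployment-time decision is a selector that resolves the backend from configuration at startup. Everything in this sketch is hypothetical: `TransportBackend` is a stand-in that only carries a label, and the mode names are invented for illustration.

```java
// Stand-in for the transport abstraction; only a descriptive label is modeled here.
interface TransportBackend {
    String describe();
}

final class TransportSelector {
    // Resolves the backend once at startup; unknown modes fail fast.
    static TransportBackend select(String mode) {
        switch (mode) {
            case "in-process":
                return () -> "in-process queues";
            case "kafka":
                return () -> "Kafka topics";
            default:
                throw new IllegalArgumentException("Unknown transport mode: " + mode);
        }
    }
}
```

Application startup could then call something like `TransportSelector.select(System.getenv().getOrDefault("TRANSPORT_MODE", "in-process"))`, where `TRANSPORT_MODE` is an assumed variable name, and pass the result to code that only ever sees the interface.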

This is standard dependency inversion. It is not novel. But it is the foundation that makes everything else in the ensemble network possible — durable transport, discovery, federation, and capacity management all build on these same interfaces.

The transport SPI is part of AgentEnsemble. The durable transport guide covers the Kafka implementation in detail.