Transport SPI: Making Agent Network Infrastructure Pluggable
When agent ensembles become long-running services that communicate over a network, the communication layer becomes infrastructure. And infrastructure has a property that application code should not: it varies by deployment environment.
Development uses in-process queues. Staging might use Redis. Production runs Kafka. The application code — the agents, tasks, workflows — should not change between these environments. The question is where to draw the abstraction line.
The transport problem
An ensemble network needs several communication primitives:
- Request queues — how work requests arrive at an ensemble
- Delivery registries — how responses get routed back to the requester
- Capability registries — how ensembles advertise and discover shared tasks and tools
- Capacity tracking — how ensembles report their current load
Each of these has a natural in-process implementation (maps, queues, lists) and at least one distributed implementation (Kafka topics, Redis streams, service registries). If these are hardcoded to a specific backing store, every deployment environment change requires code changes.
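To make one of these primitives concrete, here is a minimal in-process sketch of a delivery registry. It is not AgentEnsemble's actual API: the class name `InProcessDeliveryRegistry`, the methods `expect` and `deliver`, and the use of plain strings for request IDs and responses are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: routes a response back to whoever is waiting on a request ID.
final class InProcessDeliveryRegistry {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the requester before sending work; the future completes when a response arrives.
    CompletableFuture<String> expect(String requestId) {
        return pending.computeIfAbsent(requestId, id -> new CompletableFuture<>());
    }

    // Called when a response arrives. Returns false if nobody is waiting for this ID.
    boolean deliver(String requestId, String response) {
        CompletableFuture<String> waiter = pending.remove(requestId);
        return waiter != null && waiter.complete(response);
    }
}
```

A distributed implementation would have to route the response across processes, which is the part that should stay out of the calling code.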
The SPI design
AgentEnsemble defines transport as a set of Java interfaces, a Service Provider Interface (SPI), with pluggable implementations:

    Transport transport = Transport.websocket("kitchen");

    // Or for production with delivery guarantees
    Transport transport = Transport.simple("kitchen", deliveryRegistry);

The Transport interface provides access to the individual primitives:
| Primitive | Interface | Purpose |
|---|---|---|
| Request queue | RequestQueue | Inbound work request buffering |
| Delivery registry | DeliveryRegistry | Response routing back to callers |
| Capability registry | CapabilityRegistry | Shared task/tool advertisement |
Each interface has a simple contract. RequestQueue, for example:

    public interface RequestQueue {
        void enqueue(WorkRequest request);
        Optional<WorkRequest> poll(Duration timeout);
        int size();
    }

The in-process implementation uses a LinkedBlockingQueue. The Kafka implementation produces to a topic and consumes with manual offset commits. Same interface, different backing.
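The in-process variant could look roughly like the following. This is a sketch, not the library's source: `InProcessRequestQueue` is an illustrative name, `WorkRequest` is reduced to a hypothetical two-field record, and the interface is redeclared (without `public`) only so the example compiles in a single file.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the library's WorkRequest type.
record WorkRequest(String id, String task) {}

// Same three methods as the contract above, redeclared so the sketch is self-contained.
interface RequestQueue {
    void enqueue(WorkRequest request);
    Optional<WorkRequest> poll(Duration timeout);
    int size();
}

// Assumed in-process implementation backed by an unbounded LinkedBlockingQueue.
final class InProcessRequestQueue implements RequestQueue {
    private final BlockingQueue<WorkRequest> queue = new LinkedBlockingQueue<>();

    @Override
    public void enqueue(WorkRequest request) {
        queue.add(request); // unbounded queue, so add() does not fail for capacity reasons
    }

    @Override
    public Optional<WorkRequest> poll(Duration timeout) {
        try {
            // Waits up to the timeout; a null result means nothing arrived in time.
            return Optional.ofNullable(queue.poll(timeout.toNanos(), TimeUnit.NANOSECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }

    @Override
    public int size() {
        return queue.size();
    }
}
```

Returning an empty `Optional` on interruption is a design choice of this sketch; an implementation could just as reasonably propagate the interruption to the caller.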
Why this matters for agent systems
The transport SPI is not unusual as an architectural pattern; it is a standard dependency inversion. What makes it interesting in the agent context is what it enables.
Agent networks are inherently non-deterministic. Agents take variable time, produce variable output, and may fail in unpredictable ways. Adding infrastructure variability on top of that makes the system harder to reason about.
By isolating transport from application logic, you can:
- Test with in-process transport — no containers, no network, deterministic ordering
- Develop locally with WebSocket transport — real network behavior, zero infrastructure setup
- Deploy to production with Kafka — durability, horizontal scaling, replay capability
- Switch between environments — without touching agent code, task definitions, or workflow configuration
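The first point can be sketched as a unit test fixture. Everything here is hypothetical except the `RequestQueue` method signatures, which are copied from the contract shown earlier: `FifoTestQueue` is a single-threaded test double, and `Worker` stands in for application code that only ever sees the interface.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the library's WorkRequest type.
record WorkRequest(String id, String task) {}

interface RequestQueue {
    void enqueue(WorkRequest request);
    Optional<WorkRequest> poll(Duration timeout);
    int size();
}

// Test double: strictly FIFO, never blocks, ignores the timeout. Not thread-safe.
final class FifoTestQueue implements RequestQueue {
    private final Deque<WorkRequest> items = new ArrayDeque<>();

    @Override
    public void enqueue(WorkRequest request) { items.addLast(request); }

    @Override
    public Optional<WorkRequest> poll(Duration timeout) {
        return Optional.ofNullable(items.pollFirst());
    }

    @Override
    public int size() { return items.size(); }
}

// Application-side code: depends on the interface, not on any backing store.
final class Worker {
    private final RequestQueue queue;

    Worker(RequestQueue queue) { this.queue = queue; }

    // Handles requests until a poll comes back empty; returns the handled IDs in order.
    List<String> drain(Duration pollTimeout) {
        List<String> handled = new ArrayList<>();
        Optional<WorkRequest> next = queue.poll(pollTimeout);
        while (next.isPresent()) {
            handled.add(next.get().id());
            next = queue.poll(pollTimeout);
        }
        return handled;
    }
}
```

Because `Worker` takes a `RequestQueue` in its constructor, the same class could be handed a network-backed queue in another environment without modification.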
The capability registry
One of the more interesting transport primitives is the capability registry. When an ensemble shares a task or tool on the network, that capability needs to be discoverable by other ensembles.

    CapabilityRegistry registry = transport.capabilityRegistry();
    registry.register("prepare-meal", CapabilityType.TASK, "kitchen");
    registry.register("check-inventory", CapabilityType.TOOL, "kitchen");

    Optional<String> provider = registry.findProvider("prepare-meal");

In simple mode, this is an in-memory map. In production, it could be backed by a service registry, a shared database, or Kafka’s consumer group protocol. The application code that registers and discovers capabilities does not change.
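The simple-mode behavior could be approximated with a concurrent map. In this sketch the `register` and `findProvider` signatures are inferred from the calls above; the class name, the `Entry` record, and the last-registration-wins policy are assumptions, not documented behavior.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

enum CapabilityType { TASK, TOOL }

// Signatures inferred from the usage shown above.
interface CapabilityRegistry {
    void register(String name, CapabilityType type, String ensemble);
    Optional<String> findProvider(String name);
}

// Hypothetical in-memory implementation keyed by capability name.
final class InMemoryCapabilityRegistry implements CapabilityRegistry {
    private record Entry(CapabilityType type, String ensemble) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public void register(String name, CapabilityType type, String ensemble) {
        // Assumption: a later registration under the same name replaces the earlier one.
        entries.put(name, new Entry(type, ensemble));
    }

    @Override
    public Optional<String> findProvider(String name) {
        return Optional.ofNullable(entries.get(name)).map(Entry::ensemble);
    }
}
```

A distributed backing would also need to decide what happens when two ensembles advertise the same capability, which this sketch sidesteps by keeping only the latest entry.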
Tradeoffs
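Abstraction leaks. In-process queues have different ordering and delivery guarantees than Kafka topics: a single in-process queue is strictly FIFO and loses its contents when the process exits, while Kafka preserves order only within a partition and, with manual offset commits, can redeliver a request after a failure. The SPI abstracts the interface but cannot fully abstract the semantics.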
Configuration complexity. Each transport implementation has its own configuration. The SPI does not unify configuration — you still need environment-specific setup for each backing store.
Performance characteristics vary. An in-process queue handoff takes nanoseconds to microseconds. Kafka adds milliseconds of network and broker latency. If your agent workflow is latency-sensitive, the transport choice matters.
The design principle
The useful insight is that agent network communication has a small number of well-defined primitives, and these primitives have natural implementations at every scale. Defining the primitives as interfaces lets the infrastructure decision be made at deployment time rather than at development time.
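One way to read "at deployment time" is that a single setting picks the backing implementation at startup. The sketch below assumes a hypothetical environment variable `ENSEMBLE_TRANSPORT` and a hypothetical `TransportKind` enum; neither is part of AgentEnsemble as far as this article shows.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical set of backings, mirroring the environments discussed in this article.
enum TransportKind { IN_PROCESS, WEBSOCKET, KAFKA }

final class TransportSelector {
    // Hypothetical variable name, chosen for this example only.
    static final String ENV_VAR = "ENSEMBLE_TRANSPORT";

    // Reads the setting from the given environment map, defaulting to in-process.
    static TransportKind select(Map<String, String> env) {
        String raw = env.getOrDefault(ENV_VAR, "in_process");
        try {
            return TransportKind.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Fail fast at startup instead of silently falling back to a default.
            throw new IllegalStateException("Unknown transport '" + raw + "'", e);
        }
    }
}
```

At startup, `TransportSelector.select(System.getenv())` would return the kind to wire in, and the rest of the application would receive only the SPI interfaces.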
This is standard dependency inversion. It is not novel. But it is the foundation that makes everything else in the ensemble network possible — durable transport, discovery, federation, and capacity management all build on these same interfaces.
The transport SPI is part of AgentEnsemble. The durable transport guide covers the Kafka implementation in detail.