Testing Distributed Agent Systems: Stubs, Recordings, and Isolation
Testing a single agent ensemble is already harder than testing most software: the output is non-deterministic, the execution path depends on LLM responses, and the number of iterations is unpredictable.
Testing a network of agent ensembles adds distributed system concerns on top of that: WebSocket connections between services, shared state across ensembles, capability discovery, and cross-ensemble delegation.
The testing problem
An ensemble that delegates work via NetworkTask or NetworkTool has external dependencies. In tests, you need control over what those dependencies return without running real ensembles.
Stubs for predictable behavior
NetworkTask.stub() returns canned responses without connecting to any real ensemble:
    // Canned response returned for every call to kitchen/prepare-meal
    StubNetworkTask mealStub = NetworkTask.stub(
        "kitchen",
        "prepare-meal",
        "Meal prepared: wagyu steak, medium-rare. Estimated 25 minutes.");

    // The stub is passed to the task like any other tool
    Ensemble roomService = Ensemble.builder()
        .chatLanguageModel(model)
        .task(Task.builder()
            .description("Handle room service request")
            .tools(mealStub)
            .build())
        .build();

The stub makes network behavior deterministic, while the ensemble’s own LLM interactions remain non-deterministic.
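Conceptually, a stub of this kind is a test double that ignores its input and always returns the same answer. The sketch below is a minimal hand-rolled version of that idea, not the library's implementation: `RemoteCapability`, `CannedCapability`, and `RoomServiceDesk` are hypothetical names introduced for illustration and are not part of the AgentEnsemble API.

```java
// Hypothetical stand-in for a remote capability; not an AgentEnsemble type.
interface RemoteCapability {
    String invoke(String request);
}

// A stub: ignores the request and always returns the same canned response.
final class CannedCapability implements RemoteCapability {
    private final String response;

    CannedCapability(String response) {
        this.response = response;
    }

    @Override
    public String invoke(String request) {
        return response;
    }
}

// Hypothetical caller that depends only on the interface, never on a connection.
final class RoomServiceDesk {
    private final RemoteCapability kitchen;

    RoomServiceDesk(RemoteCapability kitchen) {
        this.kitchen = kitchen;
    }

    String handle(String order) {
        return "Kitchen says: " + kitchen.invoke(order);
    }
}
```

Because the caller only sees the interface, a test can hand it a `CannedCapability` and the reply stays the same no matter what request is sent.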
Recordings for assertion
NetworkTask.recording() captures every request for later assertion:
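    RecordingNetworkTask recorder = NetworkTask.recording("kitchen", "prepare-meal");

    // Assumes roomService is built like the ensemble above, with recorder passed to .tools(...)
    roomService.run();

    // Verify how many requests were sent and what the last one contained
    assertThat(recorder.callCount()).isEqualTo(1);
    assertThat(recorder.lastRequest()).contains("wagyu");

Testing patterns summary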
| What to test | Tool | Approach |
|---|---|---|
| Ensemble uses network response correctly | NetworkTask.stub() | Canned response, deterministic |
| Ensemble sends correct request | NetworkTask.recording() | Capture and assert |
| Two ensembles work together | In-process transport | Real interaction, no network |
| End-to-end | WebSocket transport | Full integration test |
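The in-process row can be pictured as a transport that routes each request to a handler in the same JVM instead of sending it over a socket. The following is a minimal sketch of that idea under stated assumptions: the `Transport` interface and `InProcessTransport` class are hypothetical names for illustration, and AgentEnsemble's actual transport types may look quite different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical transport abstraction; illustrative only, not the AgentEnsemble API.
interface Transport {
    String send(String ensemble, String task, String request);
}

// Routes requests to handlers registered in the same JVM: no sockets involved.
final class InProcessTransport implements Transport {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // Registers the handler that plays the role of the remote ensemble's task.
    void register(String ensemble, String task, Function<String, String> handler) {
        handlers.put(ensemble + "/" + task, handler);
    }

    @Override
    public String send(String ensemble, String task, String request) {
        Function<String, String> handler = handlers.get(ensemble + "/" + task);
        if (handler == null) {
            throw new IllegalStateException("No handler for " + ensemble + "/" + task);
        }
        return handler.apply(request);
    }
}
```

With this shape, both sides of the interaction run real code, so the test exercises the request and response contract while staying free of connection setup and network flakiness.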
The design principle
Network behavior and business logic are separable concerns. Test doubles let you test business logic without infrastructure. In-process transport lets you test interaction without the network. Full integration tests verify everything works together.
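One way to picture this separation: the business logic depends only on an interface, and a recording double behind that interface captures what the logic sent. The sketch below assumes hypothetical names throughout (`KitchenClient`, `RecordingKitchenClient`, `OrderFormatter`). None of them come from AgentEnsemble; they only illustrate the principle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam between business logic and the network; not an AgentEnsemble type.
interface KitchenClient {
    String prepareMeal(String request);
}

// Recording double: stores every request and returns a fixed reply.
final class RecordingKitchenClient implements KitchenClient {
    private final List<String> requests = new ArrayList<>();

    @Override
    public String prepareMeal(String request) {
        requests.add(request);
        return "ok";
    }

    int callCount() {
        return requests.size();
    }

    String lastRequest() {
        if (requests.isEmpty()) {
            throw new IllegalStateException("No requests recorded");
        }
        return requests.get(requests.size() - 1);
    }
}

// Business logic under test: it knows the interface, not the transport behind it.
final class OrderFormatter {
    static String placeOrder(KitchenClient kitchen, String dish, String doneness) {
        return kitchen.prepareMeal(dish + ", " + doneness);
    }
}
```

A test can then call `OrderFormatter.placeOrder(recorder, "wagyu steak", "medium-rare")` and assert on `recorder.callCount()` and `recorder.lastRequest()` without any infrastructure running.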
Network testing tools are part of AgentEnsemble. The network testing guide covers the full API.