Part 2 of 2. Part 1 covers building the service from scratch, including when you have no access to the systems you're integrating with.

The service had been in production for a couple of months. Time to find out what it actually looked like under load.

Not a synthetic benchmark: a realistic test using actual API calls, real request parameters sourced from public business registries, and the full processing pipeline end to end. The goal: understand the behavior at scale, find the bottlenecks, and fix what needs fixing.

The results were not what I expected. Our system wasn't the problem.

## Building the Test

**💡 LLM Tip: Generate the tooling, not just the tests**

I needed a Python script with configurable batch submission, parallel async polling, timing collection (submission latency, end-to-end processing time), parameter verification, and a structured report with status breakdowns and p95 latencies. I also needed realistic test data: actual company and sole proprietor identifiers guaranteed to return results. These were sitting in reference documents in various formats.

Both the test framework and the data extraction were done in 2 hours with Claude. Without LLM assistance, that's a full day of setup before any testing could begin. The data parsing alone, extracting structured records from multi-format government reference documents, would have been tedious work.

## What the Numbers Showed

In about 3 minutes, with only minor delays, 100,000 requests were sent. The only synchronous part of the project, the API, worked without any issues.

| Metric | Value |
|----|----|
| CPU (all job containers, peak) | 0.047 cores |
| Memory (peak per container) | ~128 MB |
| Lock contention on the message queue | None observed |
| Requests completed within 1 hour | ~6,000 |
| Requests waiting on the external system | ~94,000 |
| Job replicas during test | 2 |
| Internal messages per request | ~10–15 |
| Peak queue depth | ~150k messages |

CPU at 0.047 cores: five hundredths of a core. The workers were not compute-bound.

The external system was the bottleneck. Just 6% of requests completed in the first hour. The rest were waiting on responses from the external infrastructure, some arriving hours later, some from a previous test run still pending days later.

Our queue was empty. We were just waiting. This is expected behavior for a system with a five-working-day SLA, but worth confirming empirically. The test answered the question: our processing capacity is not the constraint.

When responses did start arriving in bulk, thousands within a short window, a different picture emerged.

## Where Things Actually Got Slow

### The Real Bottleneck: Worker Count

Each request generates approximately 10–15 internal messages across its full lifecycle: state transitions, document processing, result dispatch, and audit events. So 100,000 requests = up to 1,500,000 internal messages.

With 2 job replicas, each processing one message at a time, the queue depth grew faster than it could be consumed when bulk responses arrived.

Adding replicas produced near-linear throughput improvement:

- 2 replicas → baseline throughput
- 4 replicas → ~2x throughput
- 8 replicas → ~4x throughput

Linear scaling is the result you want: it means the bottleneck is parallelism, not shared state or resource contention. The Doctrine transport's locking held up. The database was not the constraint.

Next step: Kubernetes HPA configured to scale on queue depth, so replica count adjusts automatically.
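Scaling on queue depth presupposes something can read the queue depth. With the Doctrine transport that is one SQL count away, since pending messages live in an ordinary table. Below is a minimal sketch, assuming Symfony's default `messenger_messages` table layout; the command name and wiring are hypothetical, and in practice a scaler such as KEDA's PostgreSQL scaler can run the same count query directly.

```php
<?php

declare(strict_types=1);

namespace App\Command;

use Doctrine\DBAL\Connection;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

// Hypothetical command: reports pending-message depth per queue so an
// external autoscaler or metrics exporter can act on it.
#[AsCommand(name: 'app:queue-depth')]
final class QueueDepthCommand extends Command
{
    public function __construct(private readonly Connection $connection)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // messenger_messages is the Doctrine transport's default table.
        // delivered_at IS NULL excludes messages currently locked by a worker;
        // available_at <= NOW() excludes delayed messages not yet due.
        $rows = $this->connection->fetchAllAssociative(
            'SELECT queue_name, COUNT(*) AS depth
               FROM messenger_messages
              WHERE delivered_at IS NULL
                AND available_at <= NOW()
              GROUP BY queue_name'
        );

        foreach ($rows as $row) {
            $output->writeln(sprintf('%s: %d', $row['queue_name'], $row['depth']));
        }

        return Command::SUCCESS;
    }
}
```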
### A Missing Composite Index

The timeout retry service scans pending requests periodically, re-queues those waiting too long, and marks requests as failed after exceeding retry limits. Query pattern:

```sql
SELECT * FROM requests
WHERE status_code = 'timeout'
ORDER BY created_at;
```

Separate indexes existed on `status_code` and `created_at`, but no composite index (an oversight, skipped in the rush). Under load with thousands of pending requests, PostgreSQL wasn't using both. One fix:

```sql
CREATE INDEX idx_requests_status_created
ON requests (status_code, created_at);
```

Rule: if a WHERE clause and an ORDER BY always appear together, they probably belong in one composite index. Obvious in retrospect.

## Problems Fixed Before the Load Test

A few issues found during development, documented here because they're instructive.

### The sleep(10) Anti-Pattern

When a response includes file references, the service needs all referenced documents processed before marking the request complete. Initial implementation:

```php
if ($request->getQuantityFiles() !== $processedFiles) {
    sleep(10);

    return new AsyncRequestMessage(requestId: $request->getId());
}
```

The worker blocked for 10 seconds doing nothing. Under load with many file-heavy requests, workers pile up, sleeping, starving other messages.

The fix: re-dispatch with a delay stamp, releasing the worker immediately.

```php
if ($request->getQuantityFiles() !== $processedFiles) {
    return new Envelope(
        new AsyncRequestMessage(requestId: $request->getId()),
        [new DelayStamp(10000), new RetryCountStamp(attempts: $attempts + 1)]
    );
}
```

The worker returns immediately. The message reappears after 10 seconds. The worker handles other messages in the meantime. Functionally identical behavior, completely different resource profile under load.

### The DispatchAfterCurrentBusStamp Trap

We used `DispatchAfterCurrentBusStamp` on the first dispatched message; the intent was to ensure the controller returns its HTTP response before processing begins.

This stamp is designed for use within message handlers, where it defers dispatch until after the current handler completes. From an HTTP controller, the behavior is different: if a worker picks up the message before the HTTP response is flushed, the controller can block. We saw this intermittently under load: the endpoint hung while a worker was processing the newly dispatched message.

Fix: a small initial delay instead.

```php
$this->messageBus->dispatch(
    new AsyncRequestMessage(requestId: $requestId),
    [new DelayStamp(1000)]
);
```

The controller returns the UUID. The message becomes available 1 second later. No race possible.

Lesson: understand what Symfony stamps actually do before reaching for them. The documentation describes the intent; the behavior under specific conditions requires testing.

### Streaming Document Delivery

Documents from external file storage are streamed: read in chunks, written directly to the response, never buffered in full. The external storage API was idiosyncratic: inconsistent Content-Length headers, occasional mid-stream errors, responses compressed as ZIP archives requiring on-the-fly extraction.

"Stream it" is not a complete implementation. A mid-stream failure, leaving a client with a partial response, is worse than a clean error. Graceful handling of connection drops, proper error propagation through the stream chain, and cleanup on partial delivery required explicit attention at every boundary.
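For illustration, here is the shape that guard takes. This is a simplified sketch rather than our production code: `ExternalStorageClient` is a hypothetical stand-in for the real client, and ZIP extraction is omitted. The decision that matters most is to fail before any bytes are written, because once headers are sent, a clean error response is no longer possible.

```php
<?php

declare(strict_types=1);

namespace App\Controller;

use App\Storage\ExternalStorageClient; // hypothetical client for the external store
use Symfony\Component\HttpFoundation\StreamedResponse;

final class DocumentDownloadAction
{
    public function __construct(private readonly ExternalStorageClient $storage)
    {
    }

    public function __invoke(string $documentId): StreamedResponse
    {
        // Open the upstream stream *before* any headers are sent: an exception
        // here surfaces as a clean HTTP error. The same failure mid-stream
        // would leave the client a truncated file under a 200 status.
        $stream = $this->storage->openStream($documentId); // assumed to return a PSR-7 stream or throw

        $response = new StreamedResponse(function () use ($stream): void {
            try {
                while (!$stream->eof()) {
                    echo $stream->read(8192); // fixed-size chunks, never the full file in memory
                    flush();
                }
            } finally {
                $stream->close(); // cleanup runs on success and on partial delivery alike
            }
        });

        // The upstream Content-Length was unreliable, so we don't forward it.
        $response->headers->set('Content-Type', 'application/octet-stream');

        return $response;
    }
}
```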
### Cleaning Doctrine After Handling

`EntityManager::clear()` between messages: long-running workers accumulate entities in Doctrine's Identity Map, and explicit clearing prevents memory growth.

## Honest Limitations

The service works well. It also has rough edges we've accepted.

The external system occasionally returns schema-invalid XML. We handle it (log it, mark the request for manual inspection) but we don't fully understand the pattern. Some edge cases in XAdES validation we still don't fully trust; our test coverage for malformed signature scenarios is thinner than I'd like. And a small percentage of requests with complex file structures still require occasional manual intervention when the document storage returns unexpected responses.

These aren't blockers. But they're real, and pretending otherwise would be dishonest.

## What's Next?

The service is in production. Immediate roadmap:

- Kubernetes HPA on queue depth: we confirmed linear scaling works, now automate it with KEDA
- Per-step timing instrumentation: we know the external system is the ceiling, but we don't have granular data on internal step latency. Profiling will tell us whether XML generation or the signing steps have optimization potential
- RabbitMQ/Redis transport evaluation: not because PostgreSQL is failing, but to understand the tradeoff empirically before we need it

Most integration architecture isn't about elegance. It's about building reliable software around systems you can't access, can't fully trust, and can't control, and about surviving long enough for the real system to finally reveal itself.

The load test confirmed what the architecture was designed to handle: our processing is correct and scales linearly. The ceiling is external, not internal. That's where you want it.

Building something similar? I'd be happy to discuss specifics in the comments.