Your AI Agent Wants Microservices. Should You Trust It?
Ask an AI agent for a production-ready backend and you'll get five services, a broker, and a folder of Kubernetes manifests — because microservices are easy to generate, not because your system needs them. Here's how to make the agent justify every network boundary, and why a modular monolith is the better default.
The architecture looks impressive. That's the problem.
Ask an AI coding agent to design a scalable, production-ready backend. Watch what comes back.
An API gateway. Five services. A message broker. A database per service. Containers. Distributed tracing. A folder of Kubernetes manifests. It compiles. It ships with a clean architecture diagram. It looks like something built to grow.
Before you accept it, answer one question:
What problem did the fifth service solve?
Not what responsibility it owns. Not whether the boxes line up neatly in the diagram. What concrete problem requires it to deploy, fail, scale, and get someone paged for it — independently?
If you can't answer that, the agent didn't design for your scale. It pattern-matched to the shape of every backend tutorial on the internet, and handed you distributed-systems complexity before you have distributed-systems problems.
Here's the architecture the way the agent imagines it — tidy, modular, impressive:
And here's the part the diagram leaves out: every one of those boxes is something you now operate at 2 a.m.
Why agents reach for microservices
Microservices are easy to generate. That's the whole reason, and it has nothing to do with your requirements.
A service has an obvious, repeatable shape:
- API endpoint
- business logic
- database
- Dockerfile
- deployment config

That pattern shows up in thousands of tutorials and repositories. The boundaries are explicit. Every component can be generated in isolation. The output looks professional and lands as a neat folder. For a model trained to produce plausible, complete-looking code, another service is the path of least resistance.
But the agent doesn't receive your future operational bill.
For the agent, another service is another folder. For you, it's another:
- deployment pipeline
- versioned API contract
- authentication and secrets path
- dashboard, alert, and on-call surface
- failure mode that didn't exist before
- version you have to keep backward-compatible
AI made writing the code cheaper. It did nothing to make operating a distributed system simpler — and those are not the same bill.
A module is not a service
Orders, Billing, Identity, Notifications — these should almost certainly be separate modules: distinct, well-owned areas of your domain. That's good design.
It does not follow that they need separate containers.
A modular monolith is a single deployable application split internally into strictly bounded modules that own their own data and communicate only through explicit interfaces: the ownership and boundaries of microservices without the network between them. Inside one, Orders calls Billing through an explicit interface: a function call, in the same process. Inside microservices, that exact same call now needs timeout rules, retries, authentication, tracing, idempotency keys, and partial-failure handling. Idempotency means that performing the same operation twice has the same effect as performing it once; it's essential across a network, where a client can't tell a lost request from a slow one and will retry. Partial failure is when one part of a distributed system is down or slow while the rest keeps running, leaving the overall system in a state that is neither fully working nor cleanly failed.
The business logic didn't get more valuable. The transport got more expensive.
billing.charge(order) across a network boundary: now it needs a timeout, a retry policy, a circuit breaker, authentication, request tracing, idempotency keys, and a plan for when Billing replies "...maybe." The round trip alone costs ~500,000 ns, before any of that logic runs.
billing.charge(order) in the same process: a function call. It returns in nanoseconds. If it fails, you get a stack trace, not a distributed-tracing investigation. The business logic is identical; the transport is free.
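To make that contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `Billing`, `FlakyNetwork`, and both `charge_*` helpers are invented for illustration, and the "network" is simulated in memory rather than using real I/O. It covers only retries and idempotency keys; a real client would still need the timeout, circuit breaker, authentication, and tracing from the list above.

```python
import uuid


class TransportTimeout(Exception):
    """Raised when the simulated network gives no answer in time."""


class Billing:
    """Hypothetical billing module; charges are keyed so replays are harmless."""

    def __init__(self):
        self.charges = {}  # idempotency key -> amount

    def charge(self, order_id, amount, idempotency_key=None):
        key = idempotency_key or order_id
        # A repeated key returns the original result instead of charging twice.
        if key not in self.charges:
            self.charges[key] = amount
        return {"order_id": order_id, "charged": self.charges[key]}


class FlakyNetwork:
    """Simulated transport: the first `lost_replies` calls reach Billing,
    but the reply is lost, so the caller only sees a timeout."""

    def __init__(self, billing, lost_replies):
        self.billing = billing
        self.lost_replies = lost_replies

    def call(self, order_id, amount, idempotency_key):
        result = self.billing.charge(order_id, amount, idempotency_key)
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TransportTimeout("no reply from billing")
        return result


def charge_in_process(billing, order_id, amount):
    # Same process: one function call, failures surface as a stack trace.
    return billing.charge(order_id, amount)


def charge_over_network(network, order_id, amount, max_attempts=3):
    # Across a boundary the caller cannot tell a lost request from a lost
    # reply, so it must retry, and the key is what makes the retry safe.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return network.call(order_id, amount, key)
        except TransportTimeout:
            if attempt == max_attempts:
                raise
```

The business rule lives in `Billing.charge` either way. Everything else in the networked path exists only because the call left the process.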
And the gap isn't a rounding error. An in-process call resolves in nanoseconds. A network round trip — same datacenter, nothing exotic — runs about 500,000 ns, roughly 5,000× slower than a main-memory reference, per the latency numbers every programmer eventually memorizes. Cross-region, it's about 150 milliseconds. You pay that tax on every hop, then stack retries and serialization on top.
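The arithmetic is short enough to check yourself. This sketch uses the figures quoted above, plus one assumption they imply: a main-memory reference of about 100 ns, the value in the commonly circulated latency list. The function names are made up for illustration, and the hop cost covers transport only.

```python
# Figures quoted in the text above, in nanoseconds.
MAIN_MEMORY_REFERENCE_NS = 100             # assumed: the usual "latency numbers" value
SAME_DATACENTER_ROUND_TRIP_NS = 500_000
CROSS_REGION_ROUND_TRIP_NS = 150_000_000   # about 150 ms


def slowdown_vs_memory(round_trip_ns):
    """How many main-memory references fit into one round trip."""
    return round_trip_ns / MAIN_MEMORY_REFERENCE_NS


def request_network_cost_ms(sequential_hops, round_trip_ns=SAME_DATACENTER_ROUND_TRIP_NS):
    """Transport time only: no retries, serialization, or queueing."""
    return sequential_hops * round_trip_ns / 1_000_000
```

Under these numbers, `slowdown_vs_memory(SAME_DATACENTER_ROUND_TRIP_NS)` is 5,000, and a request that crosses four services in sequence spends 2 ms on transport alone before any business logic runs.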
So before you accept any split, ask:
If these two components ran in the same process, what capability would we actually lose?
If the only answer is "microservices scale better," you don't have a reason yet. You have a reflex.
The receipts: teams that walked it back
This isn't theoretical, and it isn't anti-microservices. The most-cited cautionary tales come from strong engineering teams who had real reasons to go distributed — and still found the operational tax wasn't worth paying.
Segment consolidated over 140 services back into a single monolith. They were adding roughly three destinations a month, and each one meant a new repo, a new queue, and another service to scale and get paged about. After collapsing them, the test suite for all 140+ destinations ran in milliseconds — a single destination used to take minutes — and shared-library improvements jumped from 32 to 46 a year. The distributed version was actively slowing the team down.
Amazon Prime Video's video-quality monitoring service, built as a fleet of distributed serverless components, hit a hard scaling wall at about 5% of its expected load. The team rebuilt it as a single process and cut infrastructure cost by ~90%. To be precise: this was one team's monitoring service, not all of Prime Video. The lesson isn't "monoliths always win" — it's that the distributed default was simply wrong for that workload.
And Martin Fowler, no microservices skeptic, has long observed that almost every successful microservices system started as a monolith that grew and got broken up — while systems built as microservices from scratch have "almost all" ended up in serious trouble.
Every one of these teams could operate microservices. They had the headcount, the tooling, and the on-call maturity. They went back anyway, because the complexity wasn't paying for itself. If elite teams at real scale find the tax too high to keep, the bar for an AI agent to add a network boundary to your fresh project — on day one, before a single real user — should be a lot higher than "it's the standard pattern."
Give the agent boundaries, not infrastructure
The thing you actually want from microservices — strict ownership, clean boundaries, the freedom to extract later — you can have today, in a single deployment.
A modular monolith keeps one deployable while enforcing hard internal walls:
application/
├── orders/ # owns its schema, exposes a contract
├── billing/ # cannot import orders' internal types
├── identity/
└── notifications/

Each module owns its logic, its data, and its public contract. Orders cannot query Billing's tables. Billing cannot import Orders' internal classes. The boundary is real and enforced; it just isn't a network.
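One way to enforce the import rule is a small check that runs in CI. The sketch below is one possible approach, not a standard tool. It assumes a convention that is not part of the layout above: each module's only public surface is a submodule named `contract` (for example `billing.contract`). It uses Python's standard `ast` module to read imports without executing code, and it checks imports only, not table access.

```python
import ast

MODULES = {"orders", "billing", "identity", "notifications"}
PUBLIC_SUBMODULE = "contract"  # assumed convention: <module>.contract is the public surface


def imported_names(source):
    """Yield the dotted path of everything the given source code imports."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" is reported as "a.b.c".
            for alias in node.names:
                yield f"{node.module}.{alias.name}"


def boundary_violations(owner, source):
    """Return imports in `owner`'s code that bypass another module's contract."""
    violations = []
    for name in imported_names(source):
        parts = name.split(".")
        target = parts[0]
        if target not in MODULES or target == owner:
            continue  # stdlib, third-party, or the module's own code
        # Anything other than <target>.contract[...] is treated as internal,
        # including a bare "import billing".
        if parts[1:2] != [PUBLIC_SUBMODULE]:
            violations.append(name)
    return violations
```

Run against a file in `orders/`, an import such as `from billing.internal.models import Invoice` is reported, while `from billing.contract import charge` passes. Failing the build on a non-empty result turns the wall into something the agent cannot quietly remove.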
The difference is entirely in how you prompt the agent. Instead of asking for "a scalable backend" and accepting whatever sprawl comes back, ask for the constraint:
Design a modular monolith with a single deployment. Each module owns its own schema and exposes explicit public contracts. Prevent cross-module imports of internal types. Identify the seams where a module could later be extracted into a service — but do not introduce any network boundaries yet. For anything you would normally split into a separate service, state the concrete operational reason first.
Now the agent has to justify complexity instead of generating it by reflex.
Ask an agent to build the backend for a BIM data service — something that ingests an IFC or Revit export, parses it, computes quantities, and pushes results to a dashboard. You'll often get five services: ingestion, parsing, geometry, quantity-takeoff, and notifications, each with its own queue and database.
But parsing and quantity-takeoff operate on the same multi-hundred-megabyte model and always run together. Splitting them across a network means serializing that model and shipping it between two services that a function call could have connected — you've bought a latency penalty and a new failure mode for nothing. Start as one deployment with a parsing module that's ready to extract, and let the geometry engine become its own service only when it genuinely needs its own GPU nodes and scaling cadence.
When should you actually extract a service?
There are real reasons. Extract a module into a service when you need:
- Independent deployment — it has to ship on a different cadence than the rest of the app.
- Isolated failure — it must be able to fall over without taking everything else down, or vice versa.
- Independent scaling — it has a genuinely different load profile, ideally one you can put a number on (those GPU-bound geometry jobs; that one endpoint taking 100× the traffic).
- A hard security or compliance boundary — payment data or PII that needs to live behind its own wall.
- Separate team ownership — a different team needs its own release cycle and blast radius.
Notice what's not on that list: "it might be useful one day," "Kubernetes manifests are easy to generate," and "the diagram looks more mature."
Run each proposed service through one filter:
| Keep it a module | Make it a service |
|---|---|
| Function call, ~nanoseconds | Network round trip, ~500,000 ns + serialization |
| Refactor a boundary by moving code | Refactor a boundary by versioning a contract and migrating two deployments |
| One pipeline, one dashboard | +1 pipeline, +1 dashboard, +1 on-call surface |
| A bug is a stack trace | A bug is a distributed-tracing investigation |
| Deploys with everything else | Deploys independently (when you need it) |
| Scales with the whole app | Scales on its own (when you need it) |
The last two rows are the only things a service buys that a module can't — and they're worth real money when you actually need them. Everything above them is cost. A good architecture doesn't maximize the number of services; it delays the expensive, hard-to-reverse decisions until the system has earned them. A network boundary is one of the most expensive, hardest-to-reverse decisions you can make.
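The filter can also be written down as a checklist in code, which is handy when reviewing an agent's proposal. This is a sketch: `Proposal`, `decide`, and the reason keys are hypothetical names that mirror the five extraction reasons listed earlier. It checks only that a listed reason arrives with some written evidence, so a person still has to judge whether that evidence is real.

```python
from dataclasses import dataclass, field

# The five extraction reasons from the list above.
VALID_REASONS = {
    "independent_deployment",
    "isolated_failure",
    "independent_scaling",
    "compliance_boundary",
    "separate_team",
}


@dataclass
class Proposal:
    name: str
    reasons: dict = field(default_factory=dict)  # reason -> concrete evidence


def decide(proposal):
    """Return 'service' only if a listed reason comes with written evidence."""
    justified = [
        reason
        for reason, evidence in proposal.reasons.items()
        if reason in VALID_REASONS and evidence.strip()
    ]
    return "service" if justified else "module"
```

A proposal justified only by `{"scalability": "microservices scale better"}` comes back as a module, because that key is not one of the listed reasons. A geometry engine with `{"independent_scaling": "needs its own GPU nodes"}` comes back as a service.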
Make the AI justify the boundary
Generated architecture still creates real operational debt — the fact that a machine wrote it doesn't make it free to run. The agent is optimizing for code that looks complete and production-grade. You're the one who has to operate it, debug it across five services at 2 a.m., and keep six contracts backward-compatible.
So make every network boundary earn its place. One prompt does most of the work:
For each service in this design, name the specific operational problem that requires it to deploy, fail, and scale independently. If the honest answer is "scalability" with no number behind it, make it a module instead.
The agent will happily build you a distributed system. It's your job to ask what each service buys you — before you accept the bill.
