Your AI Agent Wants Microservices. Should You Trust It?

⚡ TL;DR

AI agents reach for microservices by default — not because your system needs them, but because a service has an obvious, templatable shape they can generate cleanly.
The agent optimizes for code that looks production-ready. It never sees your operational bill: every service is another pipeline, contract, dashboard, and failure mode you run forever.
A business boundary does not require a network boundary. A module gives you the same ownership as a service, minus the timeouts, retries, auth, and a network round trip that's roughly 5,000× slower than a main-memory reference.
Teams that went distributed early walked it back: Segment collapsed 140+ services into one; an Amazon Prime Video service cut costs ~90% by becoming a monolith.
Default to a modular monolith. Make the agent justify every network boundary with a concrete operational problem, and extract a service only when the system has earned it.

The architecture looks impressive. That's the problem.

Ask an AI coding agent to design a scalable, production-ready backend. Watch what comes back.

An API gateway. Five services. A message broker. A database per service. Containers. Distributed tracing. A folder of Kubernetes manifests. It compiles. It ships with a clean architecture diagram. It looks like something built to grow.

Before you accept it, answer one question:

What problem did the fifth service solve?

Not what responsibility it owns. Not whether the boxes line up neatly in the diagram. What concrete problem requires it to deploy, fail, scale, and get someone paged for it — independently?

If you can't answer that, the agent didn't design for your scale. It pattern-matched to the shape of every backend tutorial on the internet, and handed you distributed-systems complexity before you have distributed-systems problems.

Here's the architecture the way the agent imagines it — tidy, modular, impressive:

[Diagram: an API gateway in front of five services, a message broker, and a database per service.]

And here's the part the diagram leaves out: every one of those boxes is something you now operate at 2 a.m.

Why agents reach for microservices

Microservices are easy to generate. That's the whole reason, and it has nothing to do with your requirements.

A service has an obvious, repeatable shape:

API endpoint
business logic
database
Dockerfile
deployment config

That pattern shows up in thousands of tutorials and repositories. The boundaries are explicit. Every component can be generated in isolation. The output looks professional and lands as a neat folder. For a model trained to produce plausible, complete-looking code, another service is the path of least resistance.

But the agent doesn't receive your future operational bill.

For the agent, another service is another folder. For you, it's another:

deployment pipeline
versioned API contract you have to keep backward-compatible
authentication and secrets path
dashboard, alert, and on-call surface
failure mode that didn't exist before

AI made writing the code cheaper. It did nothing to make operating a distributed system simpler — and those are not the same bill.

A module is not a service

Orders, Billing, Identity, Notifications — these should almost certainly be separate modules: distinct, well-owned areas of your domain. That's good design.

It does not follow that they need separate containers.

Inside a modular monolith, Orders calls Billing through an explicit interface — a function call, in the same process. Inside microservices, that exact same call now needs timeout rules, retries, authentication, tracing, idempotency keys, and partial failure handling.

The business logic didn't get more valuable. The transport got more expensive.
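As a minimal sketch of the in-process version, the snippet below shows Orders depending only on a Billing contract and calling it directly. The names `BillingContract`, `InProcessBilling`, `OrderService`, `Order`, and `Receipt` are hypothetical, chosen for illustration; the article only names the modules and the `billing.charge(order)` call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int


@dataclass(frozen=True)
class Receipt:
    order_id: str
    charged_cents: int


class BillingContract(Protocol):
    """Public contract of the billing module (hypothetical shape)."""

    def charge(self, order: Order) -> Receipt: ...


class InProcessBilling:
    """Billing implemented as a module living in the same process."""

    def charge(self, order: Order) -> Receipt:
        if order.amount_cents <= 0:
            raise ValueError("amount must be positive")
        return Receipt(order.order_id, order.amount_cents)


class OrderService:
    """Orders sees only the contract, never billing's internals."""

    def __init__(self, billing: BillingContract) -> None:
        self._billing = billing

    def place(self, order: Order) -> Receipt:
        # A plain function call: no timeout, retry policy, or auth layer.
        # A failure here surfaces as an ordinary exception with a stack trace.
        return self._billing.charge(order)


receipt = OrderService(InProcessBilling()).place(Order("o-1", 4200))
print(receipt)
```

Because `OrderService` is written against the contract, a later extraction could swap in a network-backed implementation of `BillingContract` without touching Orders. That is the point at which the timeouts, retries, and idempotency keys would have to be added.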

✗ Before

billing.charge(order) across a network boundary. Now it needs a timeout, a retry policy, a circuit breaker, authentication, request tracing, idempotency keys, and a plan for when Billing replies "...maybe." The round trip alone costs ~500,000 ns — before any of that logic runs.

✓ After

billing.charge(order) in the same process. A function call. It returns in nanoseconds. If it fails, you get a stack trace, not a distributed-tracing investigation. The business logic is identical; the transport is free.

And the gap isn't a rounding error. An in-process call resolves in nanoseconds. A network round trip — same datacenter, nothing exotic — runs about 500,000 ns, roughly 5,000× slower than a main-memory reference, per the latency numbers every programmer eventually memorizes. Cross-region, it's about 150 milliseconds. You pay that tax on every hop, then stack retries and serialization on top.
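The arithmetic behind those figures can be checked in a few lines. This sketch assumes the commonly cited value of about 100 ns for a main-memory reference (implied by the 5,000× ratio above) and a hypothetical request that crosses five services one after another; real requests may overlap or fan out calls.

```python
# Figures from the commonly cited latency table; treat them as rough orders of magnitude.
MAIN_MEMORY_REFERENCE_NS = 100           # assumed baseline for the 5,000x ratio
SAME_DC_ROUND_TRIP_NS = 500_000          # same-datacenter network round trip
CROSS_REGION_ROUND_TRIP_NS = 150_000_000  # about 150 ms

slowdown = SAME_DC_ROUND_TRIP_NS // MAIN_MEMORY_REFERENCE_NS


def transport_cost_ms(hops: int, round_trip_ns: int = SAME_DC_ROUND_TRIP_NS) -> float:
    """Pure transport time for `hops` sequential round trips, in milliseconds."""
    return hops * round_trip_ns / 1_000_000


print(slowdown)              # 5000
print(transport_cost_ms(5))  # 2.5
print(transport_cost_ms(5, CROSS_REGION_ROUND_TRIP_NS))  # 750.0
```

Five sequential same-datacenter hops cost about 2.5 ms before any retries, serialization, or business logic run. The same chain across regions costs about three quarters of a second.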

So before you accept any split, ask:

If these two components ran in the same process, what capability would we actually lose?

If the only answer is "microservices scale better," you don't have a reason yet. You have a reflex.

The receipts: teams that walked it back

This isn't theoretical, and it isn't anti-microservices. The most-cited cautionary tales come from strong engineering teams who had real reasons to go distributed — and still found the operational tax wasn't worth paying.

Segment consolidated over 140 services back into a single monolith. They were adding roughly three destinations a month, and each one meant a new repo, a new queue, and another service to scale and get paged about. After collapsing them, the test suite for all 140+ destinations ran in milliseconds — a single destination used to take minutes — and shared-library improvements jumped from 32 to 46 a year. The distributed version was actively slowing the team down.

Amazon Prime Video's video-quality monitoring service, built as a fleet of distributed serverless components, hit a hard scaling wall at about 5% of its expected load. The team rebuilt it as a single process and cut infrastructure cost by ~90%. To be precise: this was one team's monitoring service, not all of Prime Video. The lesson isn't "monoliths always win" — it's that the distributed default was simply wrong for that workload.

And Martin Fowler, no microservices skeptic, has long observed that almost every successful microservices story started as a monolith that grew too big and was broken up, while "almost all" of the systems built as microservices from scratch ended up in serious trouble.

Segment, services collapsed: 140 → 1 (back to one monolith)
Prime Video, infrastructure cost: −90% (one service, rebuilt as a monolith)
Segment, shared-library improvements: 32 → 46 per year (after consolidating)

⚠️ Read these as walk-backs, not as 'monoliths win'

Every one of these teams could operate microservices. They had the headcount, the tooling, and the on-call maturity. They went back anyway, because the complexity wasn't paying for itself. If elite teams at real scale find the tax too high to keep, the bar for an AI agent to add a network boundary to your fresh project — on day one, before a single real user — should be a lot higher than "it's the standard pattern."

Give the agent boundaries, not infrastructure

The thing you actually want from microservices — strict ownership, clean boundaries, the freedom to extract later — you can have today, in a single deployment.

A modular monolith keeps one deployable while enforcing hard internal walls:

application/
├── orders/         # owns its schema, exposes a contract
├── billing/        # cannot import orders' internal types
├── identity/
└── notifications/

Each module owns its logic, its data, and its public contract. Orders cannot query Billing's tables. Billing cannot import Orders' internal classes. The boundary is real and enforced — it just isn't a network.
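One way such a wall could be enforced is a small check that runs in CI and fails the build on forbidden imports. The sketch below assumes a convention that is not in the layout above: each module keeps private code under an `internal` subpackage (for example `application.orders.internal`) and exposes its contract elsewhere (for example `application.orders.api`). The function name `find_violations` is likewise hypothetical.

```python
import ast

ROOT = "application"  # top-level package from the layout above


def find_violations(source: str, owner: str) -> list[str]:
    """Return imports in `source` (a file owned by module `owner`) that reach
    into another module's `internal` subpackage (assumed convention)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Join package and imported name so that
            # `from application.orders import internal` is also caught.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            # Pattern: application.<module>.internal[...] owned by someone else.
            if (len(parts) >= 3 and parts[0] == ROOT
                    and parts[2] == "internal" and parts[1] != owner):
                violations.append(name)
    return violations


sample = """
from application.orders.api import OrderSummary
from application.orders.internal.models import OrderRow
import application.identity.internal.tokens
from application.billing.internal import ledger
"""

for bad_import in find_violations(sample, owner="billing"):
    print("forbidden:", bad_import)
```

Here a billing file may import the Orders contract and its own internals, but the imports that reach into Orders' and Identity's internals are flagged. Schema ownership (Orders not querying Billing's tables) needs a separate mechanism, such as per-module database roles, which this sketch does not cover.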

The difference is entirely in how you prompt the agent. Instead of asking for "a scalable backend" and accepting whatever sprawl comes back, ask for the constraint:

💡 The prompt that changes what you get

Design a modular monolith with a single deployment. Each module owns its own schema and exposes explicit public contracts. Prevent cross-module imports of internal types. Identify the seams where a module could later be extracted into a service — but do not introduce any network boundaries yet. For anything you would normally split into a separate service, state the concrete operational reason first.

Now the agent has to justify complexity instead of generating it by reflex.

ℹ️ What this looks like in AEC tooling

Ask an agent to build the backend for a BIM data service — something that ingests an IFC or Revit export, parses it, computes quantities, and pushes results to a dashboard. You'll often get five services: ingestion, parsing, geometry, quantity-takeoff, and notifications, each with its own queue and database.

But parsing and quantity-takeoff operate on the same multi-hundred-megabyte model and always run together. Splitting them across a network means serializing that model and shipping it between two services that a function call could have connected — you've bought a latency penalty and a new failure mode for nothing. Start as one deployment with a parsing module that's ready to extract, and let the geometry engine become its own service only when it genuinely needs its own GPU nodes and scaling cadence.

When should you actually extract a service?

There are real reasons. Extract a module into a service when you need:

Independent deployment — it has to ship on a different cadence than the rest of the app.
Isolated failure — it must be able to fall over without taking everything else down, or vice versa.
Independent scaling — it has a genuinely different load profile, ideally one you can put a number on (those GPU-bound geometry jobs; that one endpoint taking 100× the traffic).
A hard security or compliance boundary — payment data or PII that needs to live behind its own wall.
Separate team ownership — a different team needs its own release cycle and blast radius.

Notice what's not on that list: "it might be useful one day," "Kubernetes manifests are easy to generate," and "the diagram looks more mature."
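The list above can be turned into a review gate for agent-proposed services. This is a sketch under stated assumptions: the `Reason`, `Proposal`, and `decide` names are hypothetical, and "concrete evidence" is reduced to a non-empty note, which a real review would scrutinize far more closely.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reason(Enum):
    INDEPENDENT_DEPLOYMENT = "independent deployment"
    ISOLATED_FAILURE = "isolated failure"
    INDEPENDENT_SCALING = "independent scaling"
    SECURITY_BOUNDARY = "security or compliance boundary"
    SEPARATE_TEAM = "separate team ownership"


@dataclass
class Proposal:
    name: str
    # Each claimed reason maps to the concrete evidence behind it.
    evidence: dict[Reason, str] = field(default_factory=dict)


def decide(proposal: Proposal) -> str:
    """Return "service" only if at least one listed reason has evidence."""
    backed = [reason for reason, why in proposal.evidence.items() if why.strip()]
    return "service" if backed else "module"


geometry = Proposal(
    "geometry",
    {Reason.INDEPENDENT_SCALING: "GPU-bound jobs need their own nodes"},
)
notifications = Proposal("notifications")  # "might be useful one day" is not a Reason

print(geometry.name, "->", decide(geometry))
print(notifications.name, "->", decide(notifications))
```

Because the only accepted keys are the five reasons, justifications such as "the diagram looks more mature" have nowhere to go, and the proposal defaults to a module.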

Run each proposed service through one filter:

Keep it a module	Make it a service
Function call, ~nanoseconds	Network round trip, ~500,000 ns + serialization
Refactor a boundary by moving code	Refactor a boundary by versioning a contract and migrating two deployments
One pipeline, one dashboard	+1 pipeline, +1 dashboard, +1 on-call surface
A bug is a stack trace	A bug is a distributed-tracing investigation
Deploys with everything else	Deploys independently (when you need it)
Scales with the whole app	Scales on its own (when you need it)

The last two rows are the only things a service buys that a module can't — and they're worth real money when you actually need them. Everything above them is cost. A good architecture doesn't maximize the number of services; it delays the expensive, hard-to-reverse decisions until the system has earned them. A network boundary is one of the most expensive, hardest-to-reverse decisions you can make.

Make the AI justify the boundary

Generated architecture still creates real operational debt — the fact that a machine wrote it doesn't make it free to run. The agent is optimizing for code that looks complete and production-grade. You're the one who has to operate it, debug it across five services at 2 a.m., and keep six contracts backward-compatible.

So make every network boundary earn its place. One prompt does most of the work:

For each service in this design, name the specific operational problem that requires it to deploy, fail, and scale independently. If the honest answer is "scalability" with no number behind it, make it a module instead.

The agent will happily build you a distributed system. It's your job to ask what each service buys you — before you accept the bill.

[Infographic: ArchBits summary of the post. An AI agent's microservices sprawl (API gateway; Orders, Billing, Identity, and Notifications plus a fifth service; a message broker; and a database per service) versus a modular monolith with the same modules over a single database, plus the hidden per-service costs (deploy pipeline, API contract, auth and secrets, observability, new failure mode, operational load), real-world walk-backs by Segment and Amazon Prime Video, and a checklist for when to actually extract a service.]

The whole argument on one page: same business boundaries either way; microservices just turn each one into a network call you operate forever.

🎯 Key Takeaways

Generated architecture isn't free architecture. Every service the agent emits is a pipeline, a contract, a dashboard, and a failure mode you operate forever.
Business boundaries and deployment boundaries are different decisions. A module gives you ownership; a service adds a network on top, and with it a round trip roughly 5,000× slower than a main-memory reference.
AI agents reach for microservices because they're easy to generate, not because they're easy to run. The model never sees your operational bill.
Default to a modular monolith: one deployment, strict module ownership, schemas that don't bleed, seams ready for future extraction.
Extract a service only for a concrete reason — independent deployment, isolated failure, independent scaling, a security boundary, or separate team ownership — never because the diagram looks more impressive.
Strong teams like Segment and Prime Video walked microservices back. Set your bar for adding a boundary higher than "it's the standard pattern."