Cloud Migration Without Downtime
The move is easy; staying correct is not
The lift-and-shift is the part of a cloud migration that demos well and means least. Copying servers to a new provider is a weekend. Doing it while the system keeps serving real users, real orders, and real money, and proving afterwards that nothing changed underneath them — that is the engagement.
A migration succeeds on the scaffolding around it, not on the move itself.
We led the cloud migration of a B2B commerce platform onto AWS while it carried hundreds of thousands of products in production. It did not go dark, and customers did not feel it. That outcome was not luck. It was the direct result of the verification work that surrounded the move.
Why migrations actually fail
Migrations rarely fail on the obvious step. They fail on the things the old environment was quietly doing that nobody wrote down: a cron job on a forgotten box, a file path that only resolves on the legacy host, a database setting that masked a latent bug for years. The new environment is correct, and the system breaks anyway, because the old one was correct by accident.
The second failure mode is the big-bang cutover. Move everything in one night, and when something is wrong at 03:00 you cannot tell which of fifty changes did it, and rollback means undoing all of them at once. The risk is not that any single step is hard. It is that you have no way to localise a fault.
You cannot roll back a surprise you never defined the correct state for.
Define correct, then move toward it
The reframe that makes migrations boring — which is the goal — is to treat “the system still works” as something you can assert and check, not something you hope for. The verification stack does that work.
End-to-end tests written against the old system become the definition of correct. They are the contract the new environment has to satisfy: the same order can still be booked, the same statement still reconciles, the same permission still holds. Run them against both old and new, and any difference between the two is a recorded fact rather than a matter of opinion.
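A minimal sketch of that old-versus-new comparison, in Python. Everything named here is hypothetical: the two backends are stand-in functions rather than real environments, and the handling fee is an invented example of behaviour the old system had that nobody wrote down. In practice each backend would be a client pointed at the legacy or the new environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A backend is modelled as a plain callable: request dict in, response dict out.
# In a real migration these would be clients pointed at each environment.
Backend = Callable[[dict], dict]


@dataclass
class Mismatch:
    case: str
    old: dict
    new: dict


def compare_environments(
    cases: Dict[str, dict], old: Backend, new: Backend
) -> List[Mismatch]:
    """Run every case against both backends and collect the responses that differ."""
    mismatches = []
    for name, request in cases.items():
        old_response = old(request)
        new_response = new(request)
        if old_response != new_response:
            mismatches.append(Mismatch(name, old_response, new_response))
    return mismatches


def legacy_backend(request: dict) -> dict:
    # Hypothetical old system: quietly adds a 500-cent handling fee to small orders.
    total = request["quantity"] * request["unit_price_cents"]
    if total < 1000:
        total += 500
    return {"status": "booked", "total_cents": total}


def migrated_backend(request: dict) -> dict:
    # Hypothetical new system: same booking logic, but the undocumented fee was lost.
    total = request["quantity"] * request["unit_price_cents"]
    return {"status": "booked", "total_cents": total}


# Hypothetical contract cases; amounts are integer cents to keep comparisons exact.
CASES = {
    "bulk_order": {"quantity": 10, "unit_price_cents": 2500},
    "small_order": {"quantity": 1, "unit_price_cents": 400},
}

if __name__ == "__main__":
    for m in compare_environments(CASES, legacy_backend, migrated_backend):
        print(f"{m.case}: old={m.old} new={m.new}")
```

Only the small order is reported, because that is the one case where the two stand-ins disagree. The new code is not wrong in isolation; the comparison simply exposes what the old system was doing.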
Infrastructure as code turns the new environment into a reviewable artefact instead of a pile of console clicks. The cloud setup lands in version control, goes through the same review as application code, and can be rebuilt identically. The forgotten cron job has nowhere to hide when the whole environment is a file you can read.
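To illustrate why a declared environment leaves the forgotten cron job nowhere to hide, here is a small Python sketch that diffs a declared environment against what is observed on a host. It does not use any particular infrastructure-as-code tool or its file format; the resource names, fields, and the shape of the inventory are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Resource = Dict[str, object]


@dataclass
class Drift:
    undeclared: List[str] = field(default_factory=list)  # running, but not in code
    missing: List[str] = field(default_factory=list)     # in code, but not running
    changed: List[str] = field(default_factory=list)     # present in both, differs


def find_drift(declared: Dict[str, Resource], observed: Dict[str, Resource]) -> Drift:
    """Compare the reviewed definition with an inventory of the real environment."""
    drift = Drift()
    drift.undeclared = sorted(observed.keys() - declared.keys())
    drift.missing = sorted(declared.keys() - observed.keys())
    for name in sorted(declared.keys() & observed.keys()):
        if declared[name] != observed[name]:
            drift.changed.append(name)
    return drift


# Hypothetical declared environment: what lives in version control.
DECLARED = {
    "web": {"type": "service", "instances": 3},
    "orders-db": {"type": "database", "engine": "postgres"},
}

# Hypothetical inventory of the legacy host, including a job nobody documented.
OBSERVED = {
    "web": {"type": "service", "instances": 2},
    "orders-db": {"type": "database", "engine": "postgres"},
    "nightly-export": {"type": "cron", "schedule": "0 2 * * *"},
}

if __name__ == "__main__":
    print(find_drift(DECLARED, OBSERVED))
```

The undeclared entry is the interesting one: it is the behaviour the old environment had that the code does not yet describe, and it has to be either declared or consciously retired before cutover.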
Incremental cutover moves one slice at a time behind a switch you can flip back in seconds. When something is wrong, the blast radius is one slice and the cause is obvious. Boring is the highest compliment a migration can earn.
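One way to picture the switch is as a small routing table with one entry per slice. The Python sketch below assumes hypothetical slice names and a hypothetical `verify` callback standing in for the end-to-end suite; a real switch would more likely live in a load balancer, DNS, or a feature-flag service than in application memory.

```python
from typing import Callable, Dict, Iterable


class CutoverSwitch:
    """Tracks which environment serves each slice; every slice starts on 'old'."""

    def __init__(self, slices: Iterable[str]) -> None:
        self._target: Dict[str, str] = {name: "old" for name in slices}

    def route(self, slice_name: str) -> str:
        return self._target[slice_name]

    def cut_over(self, slice_name: str) -> None:
        self._set(slice_name, "new")

    def roll_back(self, slice_name: str) -> None:
        self._set(slice_name, "old")

    def _set(self, slice_name: str, target: str) -> None:
        if slice_name not in self._target:
            raise KeyError(f"unknown slice: {slice_name}")
        self._target[slice_name] = target


def migrate_slice(
    switch: CutoverSwitch, slice_name: str, verify: Callable[[str], bool]
) -> bool:
    """Cut one slice over, verify it, and flip it back if verification fails."""
    switch.cut_over(slice_name)
    if verify(slice_name):
        return True
    switch.roll_back(slice_name)
    return False


if __name__ == "__main__":
    switch = CutoverSwitch(["catalogue", "checkout"])

    def verify(name: str) -> bool:
        # Hypothetical check: pretend the suite fails for the checkout slice only.
        return name != "checkout"

    for name in ["catalogue", "checkout"]:
        ok = migrate_slice(switch, name, verify)
        print(name, "->", switch.route(name), "(verified)" if ok else "(rolled back)")
```

Because each call touches exactly one slice, a failed verification points at that slice and nothing else, and the rollback is a single assignment rather than an all-night undo.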
Where AI earns its keep
Migrations are full of legacy code nobody has read in a decade, and that is where AI-assisted work genuinely helps: explaining a tangled module, drafting the characterisation tests that pin down what it does today, translating a config from one platform's dialect to another. Every one of those outputs lands in front of the test suite that defines correct, so a confident-but-wrong answer is caught rather than shipped.
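A characterisation test can be sketched as: record what the legacy code returns today, then hold any rewrite to that record. In the Python below, the shipping-band function and its thresholds are invented for illustration, and the candidate stands in for a plausible-looking drafted rewrite that gets one boundary wrong.

```python
from typing import Callable, Dict, Iterable, List


def legacy_shipping_band(weight_kg: int) -> str:
    # Hypothetical legacy logic nobody has read in years.
    if weight_kg <= 0:
        return "invalid"
    if weight_kg <= 5:
        return "small"
    if weight_kg <= 20:
        return "medium"
    return "freight"


def candidate_shipping_band(weight_kg: int) -> str:
    # Hypothetical rewrite: reads cleanly, but uses < 5 where the legacy code uses <= 5.
    if weight_kg <= 0:
        return "invalid"
    if weight_kg < 5:
        return "small"
    if weight_kg <= 20:
        return "medium"
    return "freight"


def record_characterisation(
    func: Callable[[int], str], inputs: Iterable[int]
) -> Dict[int, str]:
    """Pin down current behaviour as input -> output pairs, right or wrong."""
    return {value: func(value) for value in inputs}


def check_against_characterisation(
    func: Callable[[int], str], golden: Dict[int, str]
) -> List[int]:
    """Return the inputs where func disagrees with the recorded behaviour."""
    return [value for value, expected in golden.items() if func(value) != expected]


# Inputs chosen to sit on and either side of each threshold.
GOLDEN = record_characterisation(legacy_shipping_band, [0, 1, 5, 6, 20, 21])

if __name__ == "__main__":
    print("differs at:", check_against_characterisation(candidate_shipping_band, GOLDEN))
```

The recorded outputs make no claim that the legacy behaviour is right, only that it is what production does today. That is exactly the gate a generated rewrite has to pass before anyone decides to change the behaviour on purpose.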
That is the discipline from the code-factory piece applied to infrastructure: generation is cheap, the human-and-test gate on correctness is the part that matters, and a migration without that gate is just a faster way to break production somewhere new.
Proof, including our own
The B2B platform migration is the client proof: a four-year engagement, still on AWS, moved without the kind of outage that makes the trade press. We run the same way ourselves.
MushRoom is multi-tenant on AWS with its infrastructure in code and a test suite that defines correct, which is why it can deploy every day and change its own foundations without a maintenance window. It is not the headline. It is the evidence that the approach we bring to a migration is the one we trust with our own product.