Cloud • 9 min read

Cloud migration checklist: what to decide before moving workloads

A practical seven-step checklist for leaders planning cloud, hybrid, or multi-cloud migration without creating avoidable security, cost, and operating risk.

Cloud migration succeeds when the business understands what it is moving, why it is moving, what must remain stable, and what the target operating model will look like after cutover. The technology decisions are the easy part. The decisions that determine whether the migration is judged a success six months later are about scope, sequencing, governance, and operating readiness — and they need to be made before any workload moves.

This is the checklist we use in the assessment phase of every cloud migration engagement we run. It is deliberately platform-neutral. The same questions apply whether the destination is Azure, AWS, Google Cloud, a private cloud, or a hybrid stack.

1. Define the business outcome

Start with the reason for migration. Common drivers include data centre exit, cost optimisation, improved resilience, acquisition integration, security modernisation, platform standardisation, and faster delivery. Each of these implies a different design.

A data-centre-exit migration prioritises lift-and-shift speed because the deadline is fixed by the lease. A cost-optimisation migration prioritises rightsizing, reserved capacity, and platform-as-a-service replacements that reduce operational burden. A resilience-driven migration prioritises multi-region architecture, backup and restore validation, and DR testing. A security-modernisation migration prioritises identity, network segmentation, and detection capability before anything else.

If the team cannot articulate the primary outcome in one sentence, the migration will drift. We have seen migrations where finance treated it as cost reduction, security treated it as control improvement, and engineering treated it as platform modernisation — three legitimate goals, three different designs, and a result that satisfied none of them well. Pick the lead outcome, document it, and resolve trade-offs against it.

Without a clear outcome, migration becomes a technical relocation exercise rather than a business improvement programme.

2. Build the dependency map

Identify applications, servers, databases, identities, file shares, network routes, DNS records, certificates, scheduled tasks, integrations, and support ownership. Migration plans fail when hidden dependencies are discovered too late.

The hidden-dependency categories that cause the most disruption:

  • Service accounts that authenticate against the source environment using mechanisms that do not work after migration. Typical examples: hardcoded LDAP binds, Kerberos delegation, NTLM, and on-premises certificates.
  • Integrations between applications that everyone forgot existed. The HR system that pushes data nightly to a finance application via a mapped drive. The reporting tool that scrapes a SharePoint site at 6 AM.
  • DNS-coupled dependencies where an application has hardcoded a server name or IP address that does not exist in the target environment.
  • Scheduled tasks running on Windows servers under specific accounts, with specific permissions, calling specific scripts that nobody has read in five years.
  • External integrations with banks, suppliers, or partners that have whitelisted the current source IP. The migration breaks the whitelist relationship and nobody knew it existed.

Discovery tools (Azure Migrate, AWS Application Discovery Service, third-party CMDB scanners) get most of the network and runtime dependencies. They miss the application-to-application integrations and the people-and-process dependencies. Combine automated discovery with stakeholder interviews — the application owner usually knows about three dependencies the scanner missed.
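One cheap supplement to scanners and interviews is a text search of exported configuration files for addresses that belong to the source environment. The following is a minimal sketch, not a discovery tool: the function name, the sample config, and the 10.20.0.0/16 source range are all hypothetical, and it only catches IPv4 literals, not hardcoded hostnames.

```python
import ipaddress
import re

# Anything shaped like a dotted quad; validated properly below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ips(text: str, source_networks: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, address) for every IPv4 literal inside the source networks."""
    networks = [ipaddress.ip_network(n) for n in source_networks]
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_CANDIDATE.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # dotted quad that is not a valid address, e.g. 999.1.1.1
            if any(address in network for network in networks):
                hits.append((lineno, candidate))
    return hits


# Hypothetical config export; only the database host sits in the source range.
sample_config = """\
[database]
host = 10.20.4.17
port = 5432
[smtp]
relay = mail.example.internal
version = 1.2.3.4
"""

hits = find_hardcoded_ips(sample_config, ["10.20.0.0/16"])
print(hits)
```

Each hit is a question for the application owner rather than a finding in itself: the address may be dead configuration, or it may be the integration nobody remembered.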

3. Decide the landing zone model

A good landing zone includes network segmentation, identity, logging, monitoring, backup, policy, tagging, access control, cost management, and environment separation before workloads arrive.

The principle: the platform should be a controlled environment when the first workload lands, not a series of ad-hoc fixes applied after problems appear.

The minimum landing zone components:

  • Network: VPC or VNet structure, address space planning, segmentation between environments and workload tiers, hybrid connectivity (ExpressRoute, Direct Connect, VPN), egress control.
  • Identity: directory integration, role-based access, Conditional Access or equivalent, privileged access patterns, service principal governance.
  • Logging and monitoring: centralised log collection, retention policy, baseline alerts, tooling decisions (Sentinel, CloudWatch, Cloud Logging, third-party SIEM).
  • Backup and recovery: backup policies per workload tier, recovery target validation, immutable backup where appropriate.
  • Policy and tagging: required tags for cost allocation, ownership, environment, data classification. Policy enforcement (Azure Policy, AWS Service Control Policies, Google Cloud Organization Policies).
  • Cost management: budgets, alerts, showback or chargeback, regular reviews.
  • Environment separation: dev/test/prod boundary, automated provisioning, the rule that no human applies changes directly to production.
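The tagging requirement is easy to state and easy to let slip, so it helps to express it as a check that can run in a pipeline or a daily report. This is a platform-neutral sketch under our own assumptions: the tag keys, the allowed environment names, and the function name are illustrative, and in a real platform the provider's policy engine would enforce the rule.

```python
# Hypothetical tag keys covering cost allocation, ownership, environment, and data classification.
REQUIRED_TAGS = ("owner", "environment", "cost_centre", "data_classification")
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """List the problems with one resource's tags; an empty list means compliant."""
    problems = [
        f"missing tag: {key}"
        for key in REQUIRED_TAGS
        if not tags.get(key, "").strip()  # absent or blank counts as missing
    ]
    environment = tags.get("environment", "").strip()
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {environment}")
    return problems


# Hypothetical resources: one compliant, one not.
resources = {
    "vm-payroll-01": {
        "owner": "finance-platform",
        "environment": "prod",
        "cost_centre": "CC-1042",
        "data_classification": "confidential",
    },
    "storage-scratch": {"owner": "data-team", "environment": "sandbox"},
}

for name, tags in resources.items():
    print(name, tag_violations(tags))
```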

Microsoft, AWS, and Google all publish reference landing zone architectures. Use them as the starting point, not the destination. The reference designs are deliberately broad. Tailor to the operating model the business will actually run.

4. Plan identity and access early

Identity is usually the hardest part of modernisation. Confirm authentication flows, admin roles, SSO, MFA, service accounts, legacy dependencies, and joiner/mover/leaver processes before moving workloads.

Three common identity gotchas in cloud migration:

  • Legacy authentication protocols. NTLM, basic authentication, hardcoded passwords, on-premises certificates that do not exist in the cloud identity provider. Inventory these before the migration, not during cutover.
  • Service accounts as a category. They often have permissions nobody can fully justify, no documented owner, and a password that has not rotated in years. The cloud equivalent (managed identities, service principals, IAM roles) is structurally different. Map each service account to a target identity model before workloads move.
  • Privileged access. Local administrators on servers, domain administrators, application administrators with embedded credentials. The cloud’s privileged-access model is typically conditional and time-bound: Privileged Identity Management (PIM) in Microsoft Entra ID, temporary role credentials in AWS IAM, and conditional role bindings in Google Cloud IAM. Plan the change in operating model, not just the technical mapping.
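The service-account inventory can be triaged with a few mechanical checks before anyone debates individual accounts. The sketch below assumes a hypothetical inventory record; the field names, the one-year rotation limit, and the set of protocols treated as legacy are our illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Protocols treated as legacy in this sketch (assumption; extend per environment).
LEGACY_PROTOCOLS = {"ntlm", "basic"}


@dataclass
class ServiceAccount:
    name: str
    owner: Optional[str]             # None when nobody is documented as owner
    password_last_rotated: date
    auth_protocol: str               # e.g. "kerberos", "ntlm", "basic"
    target_identity: Optional[str]   # e.g. "managed identity", "IAM role"; None if unmapped


def migration_blockers(account: ServiceAccount, today: date, max_age_days: int = 365) -> list[str]:
    """Return the reasons this account is not ready to move."""
    issues = []
    if not account.owner:
        issues.append("no documented owner")
    if (today - account.password_last_rotated).days > max_age_days:
        issues.append(f"password older than {max_age_days} days")
    if account.auth_protocol.lower() in LEGACY_PROTOCOLS:
        issues.append(f"legacy protocol: {account.auth_protocol}")
    if account.target_identity is None:
        issues.append("no target identity mapped")
    return issues


# Hypothetical account that fails every check.
svc = ServiceAccount(
    name="svc-reporting",
    owner=None,
    password_last_rotated=date(2021, 6, 1),
    auth_protocol="NTLM",
    target_identity=None,
)
print(migration_blockers(svc, today=date(2025, 1, 1)))
```

An account with an empty result is not automatically safe to move, but an account with any entry in the list should not be in a migration wave yet.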

Identity is also where the security improvement opportunity is biggest. A migration that does not improve MFA coverage, reduce standing privilege, or improve identity logging is a missed opportunity.

5. Define recovery objectives

Every workload should have a recovery time objective (RTO), recovery point objective (RPO), owner, backup method, and restore test plan. Migration is a good time to improve resilience rather than replicate old risk on a new platform.

The pattern we use:

Workload tier              | RTO target | RPO target | Backup method                             | Restore test
---------------------------|------------|------------|-------------------------------------------|-----------------------------
Tier 1 (mission critical)  | 1 hour     | 15 minutes | Continuous replication + immutable backup | Quarterly full restore
Tier 2 (business critical) | 4 hours    | 1 hour     | Daily snapshots + incremental             | Half-yearly partial restore
Tier 3 (important)         | 24 hours   | 24 hours   | Daily backup, retained 30 days            | Annual sample restore
Tier 4 (low priority)      | 72 hours   | 24 hours   | Daily backup, retained 14 days            | Annual sample restore

The exact RTO/RPO values vary by business. The discipline of categorising every workload, naming an owner, and committing to a test cadence is what matters.

Backup that has not been restored is an assumption. Migration is the right moment to convert assumptions to evidence.
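Turning a restore test into evidence means comparing what was measured against the tier's targets. Here is a small sketch that encodes the RTO and RPO values from the tier pattern above; the class and function names are hypothetical, and the measured durations in the example are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


# Values taken from the tier pattern; adjust per business.
TIER_TARGETS = {
    1: RecoveryTarget(rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    2: RecoveryTarget(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    3: RecoveryTarget(rto=timedelta(hours=24), rpo=timedelta(hours=24)),
    4: RecoveryTarget(rto=timedelta(hours=72), rpo=timedelta(hours=24)),
}


def evaluate_restore_test(tier: int, restore_duration: timedelta,
                          data_loss_window: timedelta) -> dict[str, bool]:
    """Compare one measured restore test against the targets for its tier."""
    target = TIER_TARGETS[tier]
    return {
        "rto_met": restore_duration <= target.rto,
        "rpo_met": data_loss_window <= target.rpo,
    }


# Invented result: a Tier 2 restore took 3h30m and the newest
# recoverable data was 2 hours old.
result = evaluate_restore_test(
    tier=2,
    restore_duration=timedelta(hours=3, minutes=30),
    data_loss_window=timedelta(hours=2),
)
print(result)
```

In this invented case the restore was fast enough but the backup was too stale, which points at backup frequency rather than restore procedure.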

6. Create migration waves

Group workloads by dependency, complexity, risk, and business impact. Start with lower-risk systems, validate patterns, then move increasingly critical workloads using proven runbooks.

A typical wave structure:

  • Wave 0 — pilot. 5–10 low-risk workloads. The point is to validate the tooling, runbook, communications, and helpdesk process. Run for 2 weeks before the next wave.
  • Wave 1 — non-production. Dev and test environments. Surfaces application compatibility issues without business impact.
  • Wave 2 — low-tier production. Tier 3 and Tier 4 workloads. Real production but limited business risk.
  • Wave 3 — business-tier production. Tier 2 workloads. Real business impact if a wave fails.
  • Wave 4 — mission-critical. Tier 1 workloads. Most preparation, most communication, most rollback readiness.
  • Wave 5 — cleanup. Decommission, document, transfer ownership, close out.

Two principles that consistently improve outcomes: keep tightly coupled application groups in the same wave (do not migrate a database in one wave and its application in another), and avoid scheduling waves through finance month-end, year-end, or known business pressure points.
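The first principle can be checked mechanically once the dependency map exists: treat workloads as nodes, dependencies as edges, and keep each connected group together. The sketch below does that and places every group in the wave of its most critical member. That placement rule is our assumption for illustration, as are the function names and sample workloads; it covers production waves only and expects every dependency endpoint to appear in the tier mapping.

```python
from collections import defaultdict

# Production waves from the structure above: Tier 3/4 -> Wave 2, Tier 2 -> Wave 3, Tier 1 -> Wave 4.
TIER_TO_WAVE = {1: 4, 2: 3, 3: 2, 4: 2}


def coupled_groups(workloads, dependencies):
    """Connected components of the undirected dependency graph."""
    neighbours = defaultdict(set)
    for a, b in dependencies:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(workloads):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk from an unvisited workload
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def assign_waves(tiers: dict[str, int], dependencies: list[tuple[str, str]]) -> dict[str, int]:
    """Map each workload to a wave, keeping coupled workloads in the same wave."""
    waves = {}
    for group in coupled_groups(tiers.keys(), dependencies):
        most_critical = min(tiers[name] for name in group)  # Tier 1 is most critical
        for name in group:
            waves[name] = TIER_TO_WAVE[most_critical]
    return waves


# Hypothetical workloads: a Tier 3 app coupled to a Tier 1 database.
tiers = {"billing-app": 3, "billing-db": 1, "wiki": 4, "crm": 2}
dependencies = [("billing-app", "billing-db")]
print(assign_waves(tiers, dependencies))
```

The Tier 3 billing application moves with its Tier 1 database in Wave 4 instead of going early on its own, which is the point of the rule.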

7. Build governance into delivery

Tagging, budget alerts, policy, access reviews, logging, change control, and documentation should be part of the migration work, not an afterthought.

Governance built during migration is governance the business will keep. Governance bolted on six months after cutover is governance the business will fight. The patterns that stick:

  • Required tags enforced from day one. Every resource has owner, environment, cost centre, and data classification tags. Resources without them either cannot be created or are flagged in a daily report.
  • Budget alerts at the subscription, project, or account level. Alerts at 50%, 80%, and 100% of monthly budget, sent to named owners.
  • Quarterly access reviews for privileged roles, automated where possible.
  • Change control integrated with the cloud provider’s deployment tooling — pull request approvals, automated policy checks, rollback plans documented.
  • Documentation as code — runbooks, architecture diagrams, and dependency maps stored in version control alongside the infrastructure.
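The budget-alert pattern reduces to a small amount of arithmetic. As a rough sketch of the 50%, 80%, and 100% thresholds, independent of any provider's budgeting service: the function name and the idea of tracking already-sent alerts are our assumptions.

```python
ALERT_THRESHOLDS = (50, 80, 100)  # percent of monthly budget


def thresholds_to_alert(spend: float, budget: float,
                        already_sent: frozenset = frozenset()) -> list[int]:
    """Return the thresholds crossed so far this month that have not been alerted yet."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 with threshold * budget to avoid dividing first.
    return [
        threshold
        for threshold in ALERT_THRESHOLDS
        if spend * 100 >= threshold * budget and threshold not in already_sent
    ]


# Hypothetical month: 8,200 spent against a 10,000 budget.
print(thresholds_to_alert(8200, 10000))
print(thresholds_to_alert(8200, 10000, already_sent=frozenset({50})))
```

Tracking which alerts have already gone out keeps named owners from receiving the same 50% notice every day for the rest of the month.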

If governance is added after the fact, it always feels like overhead. If it is part of how workloads land in the platform, it feels like the platform.

What this looks like in practice

Cloud migration is not just moving workloads. It is an opportunity to improve security, resilience, cost visibility, automation, and operating discipline. The teams that treat it as a technology project deliver workloads in a new place. The teams that treat it as a business-improvement programme deliver workloads in a better place.

The seven-step checklist is the structure we use in our cloud and platform engineering work. If you are planning a migration and want a vendor-neutral assessment of where you are against this list, start the conversation — the first hour usually surfaces the biggest gaps without needing access to anything sensitive.