In this video, Leon Kogan, Director of Infrastructure at OpsGuru, discusses the meticulous planning required to ensure zero downtime during migrations of mission-critical systems to the cloud.
Collaboration Across Teams: The process begins by working closely with various departments, including customer success, business, marketing, sales, and technical teams. This collaboration helps identify which workloads are mission-critical and which can tolerate brief downtime.
Defining Expectations Early: Leon highlights the importance of defining Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) from the outset. This ensures clear expectations around acceptable downtime.
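As a rough illustration of how RTO and RPO targets can be made checkable, the sketch below records them per workload and compares them with what a dry run or cutover actually measured. The workload names, target values, and the `Objective` and `meets_objective` names are hypothetical and do not come from the video.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objective:
    """Hypothetical per-workload targets agreed with stakeholders."""
    workload: str
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of data loss


def meets_objective(obj: Objective, downtime: timedelta, data_loss: timedelta) -> bool:
    """Return True when a measured cutover stays within both targets."""
    return downtime <= obj.rto and data_loss <= obj.rpo


# Illustrative values only; not taken from the video.
checkout = Objective("checkout-api", rto=timedelta(minutes=15), rpo=timedelta(0))
reports = Objective("nightly-reports", rto=timedelta(hours=4), rpo=timedelta(hours=1))

# A 10-minute outage with no data loss fits the stricter workload.
print(meets_objective(checkout, timedelta(minutes=10), timedelta(0)))
# The same outage with 30 seconds of lost writes violates its zero RPO.
print(meets_objective(checkout, timedelta(minutes=10), timedelta(seconds=30)))
```

Writing the targets down in this form makes it easier to see which workloads are truly mission-critical (tight RTO and RPO) and which can tolerate a short outage.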
Creating Detailed Runbooks: With input from both OpsGuru and the customer, detailed runbooks are created. These runbooks outline the migration steps and include strategies such as:
Blue-Green Deployments
DNS Topologies
These strategies are designed to minimize downtime and streamline the migration process.
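One way to picture a blue-green cutover is as a staged traffic shift with a health check after each stage and an immediate rollback on failure. The following is a minimal sketch under that assumption, not the procedure described in the video; `blue_green_cutover` and the `green_is_healthy` probe are hypothetical names, and the step where a real runbook would update weighted DNS records or a load balancer is only marked by a comment.

```python
from typing import Callable, List, Tuple


def blue_green_cutover(
    steps: List[int],
    green_is_healthy: Callable[[int], bool],
) -> Tuple[int, List[int]]:
    """Shift traffic to green in stages; fall back to blue on a failed check.

    steps: increasing percentages of traffic to send to green, e.g. [10, 50, 100].
    green_is_healthy: hypothetical probe called with the current green percentage.
    Returns (final green percentage, history of percentages applied).
    """
    history: List[int] = []
    for pct in steps:
        history.append(pct)  # a real runbook would adjust DNS weights here
        if not green_is_healthy(pct):
            history.append(0)  # roll back: all traffic returns to blue
            return 0, history
    return (history[-1] if history else 0), history


# Every stage passes, so green ends up with all traffic.
print(blue_green_cutover([10, 50, 100], lambda pct: True))
# The final stage fails its check, so traffic is returned to blue.
print(blue_green_cutover([10, 50, 100], lambda pct: pct < 100))
```

Because the blue environment stays intact until the final stage succeeds, a failed check costs only the time needed to shift traffic back.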
Conducting Dry Runs: Before the actual migration, multiple dry runs are conducted. This allows the team to identify potential issues and ensure all stakeholders understand their roles during the cutover.
Leon shares an example where a migration involving 15 MSA scale databases and thousands of machines resulted in just 10 minutes of downtime, thanks to six dry runs and the coordination of 65 people during the cutover.