Disaster Recovery - SaaS

RunMyJobs supports cross-region disaster recovery (DR) by default.

To understand how cross-region DR works with RunMyJobs, you need a basic understanding of the RunMyJobs SaaS architecture. You also need to understand two terms: RPO (Recovery Point Objective), the maximum amount of data loss that can be tolerated, and RTO (Recovery Time Objective), the maximum acceptable amount of time it takes to recover from a disruptive event.
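As a rough illustration of how RPO and RTO are applied, the following Python sketch compares one disruptive event against a pair of objectives. All timestamps, objective values, and function names are hypothetical and chosen only for the example. They are not RunMyJobs figures; the actual values are in the Redwood Support Guide.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_sync, outage_start, service_restored):
    """Return (data_loss_window, downtime) for one disruptive event.

    data_loss_window: work done between the last synchronized state and
    the outage, which is what the RPO limits.
    downtime: time from the outage until service is restored, which is
    what the RTO limits.
    """
    data_loss_window = outage_start - last_sync
    downtime = service_restored - outage_start
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True if the event stayed within both objectives."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical event, for illustration only.
last_sync = datetime(2024, 1, 1, 12, 0, 0)
outage_start = datetime(2024, 1, 1, 12, 0, 30)
service_restored = datetime(2024, 1, 1, 12, 45, 0)

loss, downtime = recovery_metrics(last_sync, outage_start, service_restored)

# Hypothetical objectives: at most 1 minute of data loss, 1 hour of downtime.
within_targets = meets_objectives(
    loss, downtime, rpo=timedelta(minutes=1), rto=timedelta(hours=1)
)
```

In this sketch, an RPO of zero means that `data_loss_window` must be zero: the recovered state has to include everything up to the moment of the outage.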

RunMyJobs SaaS Architecture

A RunMyJobs instance is tied to a particular AWS region. Within that region, it runs in a containerized, clustered environment.

Failover Within an AWS Region

An AWS region can include multiple Availability Zones (AZs), and RunMyJobs typically uses three AZs per region to ensure high availability within that region. If the AZ in which RunMyJobs is running goes down, the instance is automatically switched over to a different AZ, with an RPO of zero and a minimal RTO, limited to the time it takes to start RunMyJobs in the new AZ.

Cross-Regional Failover

Every AWS region in which RunMyJobs runs has a designated secondary (failover) region. The secondary region is determined by Redwood and cannot be changed.

Hosting	Primary Region	Secondary Region
European	Dublin	Paris
USA/Americas	Oregon	Ohio
USA/Americas	Ohio	Oregon
Germany	Frankfurt	Zurich
Asia Pacific	Sydney	Melbourne
Asia Pacific	Singapore	Sydney
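
If you need the region pairs in a script (for example, in runbook tooling), the table above can be expressed as a simple lookup. This is a minimal Python sketch: the dictionary and function names are illustrative assumptions and are not part of any RunMyJobs API. Only the region pairs themselves come from the table.

```python
# Primary -> secondary region pairs, copied from the table above.
# The names SECONDARY_REGION, secondary_region_for, and is_mutual_pair
# are illustrative only; they are not a RunMyJobs API.
SECONDARY_REGION = {
    "Dublin": "Paris",
    "Oregon": "Ohio",
    "Ohio": "Oregon",
    "Frankfurt": "Zurich",
    "Sydney": "Melbourne",
    "Singapore": "Sydney",
}


def secondary_region_for(primary: str) -> str:
    """Return the designated secondary region for a primary region."""
    try:
        return SECONDARY_REGION[primary]
    except KeyError:
        raise ValueError(f"No secondary region listed for {primary!r}") from None


def is_mutual_pair(primary: str) -> bool:
    """True if the secondary region lists this primary as its own secondary."""
    secondary = secondary_region_for(primary)
    return SECONDARY_REGION.get(secondary) == primary


if __name__ == "__main__":
    print(secondary_region_for("Singapore"))  # prints "Sydney"
    print(is_mutual_pair("Oregon"))           # prints "True"
```

Note that only Oregon and Ohio fail over to each other. The other pairs are one-directional in the table.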

If a disaster occurs (a region-wide, sustained AWS outage with no ETA or a long ETA), all environments are brought up in the designated secondary AWS region. Because the secondary region is dedicated, the database and files are automatically synchronized to it, which minimizes data and job processing losses.

For details on RTO and RPO times, refer to the Redwood Support Guide.