Disaster Recovery: Defining RTO & RPO for Executives

The CEO wants "zero downtime," but is he willing to pay for it? How to translate technical backups into business insurance policies.

There is a dangerous conversation that happens in every company.

The CEO asks the CTO: "Are our backups working?"

The CTO says: "Yes."

They both leave the room happy. But they are talking about two completely different things.

  • The CTO means: "We have a cron job that dumps the database to S3 every night at midnight."
  • The CEO means: "If the server explodes at 4:00 PM, I assume we lose zero data and are back online instantly."

When the inevitable crash happens, this misalignment turns into a career-ending event.

As a technical leader, your job is not just to "do backups." It is to negotiate the price of downtime. You must define two acronyms that translate technical risk into dollars: RTO and RPO.

1. The Definitions: Time vs. Data

You must strip away the jargon. Explain it to the Board like this:

RPO (Recovery Point Objective) = "How much data can we afford to lose?"

If we crash at 4:00 PM and our last backup ran at 12:00 PM, we have lost 4 hours of data.

  • Question to CEO: "Are you willing to re-enter 4 hours of orders manually? Or do you need the data to be live up to the last second?"
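To make the RPO arithmetic concrete, here is a minimal Python sketch. The function names `data_lost` and `meets_rpo` are hypothetical. The sketch assumes backups run on a fixed schedule, like the nightly cron job mentioned earlier, so the worst-case loss is roughly one full backup interval.

```python
from datetime import datetime, timedelta


def data_lost(last_backup: datetime, crash: datetime) -> timedelta:
    """Everything written between the last good backup and the crash is gone."""
    if crash < last_backup:
        raise ValueError("crash time must not precede the last backup")
    return crash - last_backup


def meets_rpo(backup_interval: timedelta, rpo_target: timedelta) -> bool:
    """Worst case: the crash lands just before the next backup, so roughly one
    full interval of data is lost (ignoring how long the backup itself runs)."""
    return backup_interval <= rpo_target


# The example from the text: last backup at 12:00 PM, crash at 4:00 PM.
print(data_lost(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 16, 0)))  # 4:00:00

# A nightly cron job cannot satisfy a 1-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))  # False
```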

RTO (Recovery Time Objective) = "How long can we be dead?"

From the moment the server crashes, how many minutes/hours can pass before the "Buy" button works again?

  • Question to CEO: "Does a 4-hour outage kill the company, or is it just an annoyance?"
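The RTO question becomes easier to answer once the outage is expressed in dollars. This sketch assumes a hypothetical revenue figure of $2,500 per hour and a deliberately simple linear cost model. `downtime` and `outage_cost` are illustrative names, not part of any library.

```python
from datetime import datetime, timedelta


def downtime(crash: datetime, restored: datetime) -> timedelta:
    """RTO is measured from the crash until customers can buy again,
    not until the engineers finish typing commands."""
    if restored < crash:
        raise ValueError("restore time must not precede the crash")
    return restored - crash


def outage_cost(duration: timedelta, revenue_per_hour: float) -> float:
    """Simplified model: lost revenue grows linearly with outage length.
    It ignores churn, SLA penalties and reputational damage."""
    return duration.total_seconds() / 3600 * revenue_per_hour


outage = downtime(datetime(2024, 1, 1, 16, 0), datetime(2024, 1, 1, 20, 0))
rto = timedelta(hours=4)
print(outage <= rto)                                # True: exactly on the 4-hour RTO
print(outage_cost(outage, revenue_per_hour=2_500))  # 10000.0
```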

2. The Cost Curve: The Price of Zero

The CEO's natural instinct is to say: "I want zero data loss and zero downtime."

Your answer is: "We can do that. It costs $50,000 a month."

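This is the Asymptotic Cost of Availability: each additional "nine" costs far more than the one before it, and the price climbs without limit as you approach true zero.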

  • 99% Availability (RPO 24h / RTO 24h): Cheap. A nightly script. Cost: $100/mo.
  • 99.9% Availability (RPO 1h / RTO 4h): Moderate. Database replication. Cost: $1,000/mo.
  • 99.999% Availability (RPO ~0s / RTO ~0s): Expensive. Multi-region Active-Active clusters with real-time sync. Cost: $50,000/mo.
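Availability percentages are easier to negotiate once they are converted into a yearly downtime budget. This short Python sketch does the conversion and assumes a 365-day year. `downtime_budget_hours` is an illustrative helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_budget_hours(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.999):
    hours = downtime_budget_hours(pct)
    print(f"{pct}% availability -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

The output shows the budget shrinking fast. 99% allows about 87.6 hours of downtime per year, 99.9% about 8.8 hours, and 99.999% only about 5.3 minutes. At the last level there is no time for a human to notice an outage and restore a backup, which is why it requires automated failover. Availability and RTO are related but not identical: a single 4-hour outage uses up nearly half of the 99.9% yearly budget.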

You must present Disaster Recovery as a Menu, not a binary switch.

3. The Strategy: The Tiered Menu

Don't treat all data equally. A "One Size Fits All" DR strategy is either too risky (for payments) or too expensive (for logs).

Present this table to your Executive Team:

Table 1: The Disaster Recovery Service Levels

Tier | Workload | RPO (Data Loss) | RTO (Downtime) | Architecture | Cost
Platinum | Payments / Orders | ~0 seconds | < 5 mins | Multi-AZ, auto-failover RDS, hot standby | $$$$
Gold | User Profiles / Inventory | 15 mins | 1 hour | Read replicas, frequent snapshots | $$
Silver | Analytics / Reporting | 24 hours | 48 hours | Nightly S3 dumps, restore on demand | $
Bronze | Dev / Staging | Best effort | 1 week | Infrastructure as Code (rebuild from scratch) | $0
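The tiered menu can also be encoded as data. A stated business requirement then maps mechanically to the cheapest tier that satisfies it. This is a hedged sketch, and several parts are assumptions made for illustration: the `Tier` class, the 0-to-4 cost scale (the number of $ signs in Table 1), and modeling Bronze's "Best effort" RPO as unbounded.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta       # maximum data loss the tier guarantees
    rto: timedelta       # maximum downtime the tier guarantees
    relative_cost: int   # number of $ signs in Table 1 ($0 counts as 0)


# Values taken from Table 1; Bronze's "Best effort" RPO is treated as unbounded.
TIERS = [
    Tier("Bronze", timedelta.max, timedelta(weeks=1), 0),
    Tier("Silver", timedelta(hours=24), timedelta(hours=48), 1),
    Tier("Gold", timedelta(minutes=15), timedelta(hours=1), 2),
    Tier("Platinum", timedelta(seconds=0), timedelta(minutes=5), 4),
]


def cheapest_tier(max_data_loss: timedelta, max_downtime: timedelta) -> Tier:
    """Return the lowest-cost tier whose guarantees are at least as strict
    as what the business says it can tolerate."""
    candidates = [t for t in TIERS if t.rpo <= max_data_loss and t.rto <= max_downtime]
    if not candidates:
        raise ValueError("no tier meets these targets; this workload needs its own design")
    return min(candidates, key=lambda t: t.relative_cost)


print(cheapest_tier(timedelta(hours=24), timedelta(days=3)).name)     # Silver
print(cheapest_tier(timedelta(minutes=30), timedelta(hours=2)).name)  # Gold
```

The design choice is to let the business state its tolerance in plain terms (how much data, how much downtime). The table, not the engineer's instinct, then sets the price.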

When you frame it this way, the CEO will quickly decide that the "Marketing Blog" does not need Platinum-tier protection. You just saved the company money while clarifying the risk.

4. The Trap: The "Restore" Test

Having a backup is meaningless on its own. A successful restore is the only thing that counts.

Schrödinger’s Backup states: "The condition of any backup is unknown until a restore is attempted."

I have seen companies with "perfect" backups fail during a disaster because:

  1. The encryption key for the backup was on the server that crashed.
  2. The backup file was corrupted 6 months ago, and no one checked.
  3. The restore process took 18 hours to download the file (violating the 4-hour RTO).
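Failure modes 2 and 3 can be caught before a disaster with two cheap checks, sketched below in Python. The sketch rests on stated assumptions. The first check needs a SHA-256 checksum recorded when the backup is taken and stored separately from it. The second is a back-of-the-envelope transfer-time estimate. The helper names, the 2 TB dump size and the 250 Mbps link are all hypothetical. Failure mode 1 (the key stored on the crashed server) is a process problem that no checksum will catch.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large dumps never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Compare against the checksum recorded when the backup was written.
    Only useful if that checksum is stored separately from the backup."""
    return sha256_of(path) == expected_sha256


def estimated_download_hours(backup_size_gb: float, bandwidth_mbps: float) -> float:
    """Raw transfer time only; decryption, decompression and replay add more."""
    megabits = backup_size_gb * 8_000  # decimal units: 1 GB = 8,000 megabits
    return megabits / bandwidth_mbps / 3600


# Hypothetical numbers: a 2 TB dump pulled over a 250 Mbps link.
print(f"{estimated_download_hours(2_000, 250):.1f} hours")  # 17.8 hours

# Simulate silent corruption of a backup file.
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "db.dump"
    dump.write_bytes(b"pretend database dump")
    recorded = sha256_of(dump)                  # taken at backup time
    dump.write_bytes(b"pretend database dumq")  # one byte changes later
    print(backup_is_intact(dump, recorded))     # False
```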

The Executive Protocol:

  • Mandate a Quarterly Fire Drill.
  • Actually restore the production database to a staging environment.
  • Time it. If it takes 6 hours and your RTO is 4 hours, you are out of compliance with your own RTO. Report this to the Board immediately as a risk to be mitigated.
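A quarterly fire drill can be wrapped in a small timing harness so the result is a number the Board can read. In this minimal sketch, `run_fire_drill` is a hypothetical helper and `fake_restore` is a placeholder for your real restore-to-staging procedure. The 4-hour RTO matches the example in the protocol.

```python
import time
from collections.abc import Callable
from datetime import timedelta


def run_fire_drill(restore: Callable[[], None], rto: timedelta) -> tuple[timedelta, bool]:
    """Time a full restore into staging and report whether it fits inside the RTO."""
    start = time.monotonic()  # monotonic clock is immune to wall-clock adjustments
    restore()
    elapsed = timedelta(seconds=time.monotonic() - start)
    return elapsed, elapsed <= rto


def fake_restore() -> None:
    # Hypothetical stand-in: replace with the real restore-to-staging procedure,
    # ending only once the application is verified to work.
    time.sleep(0.1)


elapsed, passed = run_fire_drill(fake_restore, rto=timedelta(hours=4))
print(f"restore took {elapsed}; within RTO: {passed}")
if not passed:
    print("Out of compliance with the agreed RTO: report to the Board as a risk.")
```

A real outage also includes the time needed to detect the failure and decide to restore. The drill result is therefore a lower bound on your true recovery time.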

Summary

Disaster Recovery is not a technical problem; it is an Insurance Policy.

Your job is to act as the Insurance Broker.

  1. Define the terms (RTO/RPO).
  2. Quote the premiums (Cost of Architecture).
  3. Let the Business decide the coverage.

If the business chooses "Silver Tier" coverage and the site goes down for 24 hours, you are not incompetent; you are compliant with the agreed policy. That is the difference between being fired and being a strategic partner.
