Disaster Recovery Planning Basics


Starting an IT Disaster Recovery Plan can seem daunting, but the process is actually fairly straightforward.  First, generate a list of all the applications and data stores in use in your environment, along with the servers on which they reside.  Next, define the acceptable parameters for the data to be recovered:

How soon after a disaster is access to data required, how recent does that data need to be, and what’s the longest I can wait to get it?

These parameters are generally referred to as “RTO”, “RPO”, and “MTD” respectively:

Recovery Time Objective (RTO):  This is the time factor: How fast do users need their information after a disaster occurs? How much time can pass before the disruption becomes unacceptable to normal business operations?

Recovery Point Objective (RPO):  This is the age factor: How old can the recovered data be and still be useful? Data with high rates of change will require a shorter RPO, while archived data with low rates of change can accept a longer RPO.

Maximum Tolerable Downtime (MTD):  This is also a time factor: How long can we wait until we can’t wait anymore?  If you’re down any longer than your MTD, you’re out of business.
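To make the three parameters concrete, here is a minimal Python sketch, assuming each inventoried application is recorded with its RTO, RPO, and MTD as durations. The `AppRecord` class, its fields, and the sample entry are hypothetical. It checks two basic rules that follow from the definitions above: the RTO should not exceed the MTD, and the most recent backup must be no older than the RPO allows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AppRecord:
    """One inventoried application (hypothetical record layout)."""
    name: str
    server: str
    rto: timedelta  # how soon access must be restored
    rpo: timedelta  # maximum acceptable age of recovered data
    mtd: timedelta  # longest outage the business can survive


def plan_problems(app: AppRecord, last_backup: datetime, now: datetime) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # The recovery target must fall inside the tolerable downtime window.
    if app.rto > app.mtd:
        problems.append(f"{app.name}: RTO {app.rto} exceeds MTD {app.mtd}")
    # If disaster struck now, the newest recoverable data would be this old.
    data_age = now - last_backup
    if data_age > app.rpo:
        problems.append(f"{app.name}: last backup is {data_age} old, RPO is {app.rpo}")
    return problems


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0)
    erp = AppRecord("ERP", "srv-app01", rto=timedelta(hours=4),
                    rpo=timedelta(hours=1), mtd=timedelta(days=2))
    # A backup taken 3 hours ago violates the 1-hour RPO.
    for problem in plan_problems(erp, last_backup=now - timedelta(hours=3), now=now):
        print(problem)
```

In practice the RTO, RPO, and MTD values come out of the business impact analysis described next; the code only checks that the numbers agree with each other and with your backup schedule.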

Business Impact Analysis:

After inventorying applications, data, and servers, we’ve found that a best practice is to assign a single individual or a small committee to assess the time and data-age requirements within each department.  When department heads or other individuals are left to self-assess their own applications and data, they typically assign high priority to everything in their purview.

If everything is high priority, then nothing is high priority.

Therefore, mandate a discussion between IT and management to determine whether the RTO, RPO, and MTD priorities for each component within each division align with business priorities.

Another best practice we’ve found is to set up three “buckets” for categorizing key technologies and key applications.  Applications and technologies should be distributed fairly evenly across all three buckets:

  • Bucket I  “Must Have”: RTO = typically 8 hours or less – High priority.
  • Bucket II “Want to Have”: RTO = typically 3 days or less – Medium priority.
  • Bucket III “Nice to Have”: RTO = typically 1 week or more – Low priority.

In our experience, MTD varies by industry, and for “Bucket III” (low-priority) items, MTD can range from 15 to 30 days.
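The bucket scheme can be sketched as a simple lookup on each application’s agreed RTO. This is a minimal Python illustration, assuming the “typical” thresholds from the list above (8 hours and 3 days); the inventory entries are hypothetical. Because the list leaves a gap between 3 days and 1 week, the sketch assumes anything slower than 3 days belongs in Bucket III. The per-bucket count gives a quick check on whether the spread is roughly even.

```python
from collections import Counter
from datetime import timedelta

# Thresholds taken from the "typical" RTO ranges above; adjust per business.
BUCKET_I_MAX = timedelta(hours=8)   # "Must Have"
BUCKET_II_MAX = timedelta(days=3)   # "Want to Have"


def bucket_for(rto: timedelta) -> str:
    """Map an application's RTO onto one of the three priority buckets."""
    if rto <= BUCKET_I_MAX:
        return "I"
    if rto <= BUCKET_II_MAX:
        return "II"
    # Assumption: anything slower than 3 days falls into Bucket III,
    # even though the guideline quotes "1 week or more".
    return "III"


def bucket_counts(rtos: dict[str, timedelta]) -> Counter:
    """Count applications per bucket to see whether the spread is roughly even."""
    return Counter(bucket_for(rto) for rto in rtos.values())


if __name__ == "__main__":
    # Hypothetical inventory: application name -> agreed RTO.
    inventory = {
        "Email": timedelta(hours=4),
        "ERP": timedelta(hours=8),
        "CRM": timedelta(days=2),
        "File shares": timedelta(days=3),
        "Intranet wiki": timedelta(weeks=1),
        "Archive": timedelta(days=14),
    }
    for name, rto in inventory.items():
        print(f"{name}: Bucket {bucket_for(rto)}")
    print(bucket_counts(inventory))
```

If one bucket ends up holding most of the inventory, that is usually a sign the priorities need another round of discussion between IT and management.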

Test, test, test!:

Once a DR plan has been drafted, and all applications, data, and the servers on which they reside have been classified and confirmed, the next step is to plan an appropriate test of the DR plan.  In any test, the recovery environment must be set up independently of the main production network to simulate a DR scenario in which the main office is unavailable.  For example, the test should not rely on the network infrastructure or internet connection of an office that has hypothetically been destroyed.
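One small part of that isolation can be checked mechanically before a test. The Python sketch below assumes you can list the IP addresses the recovery environment is configured to use (gateway, DNS, servers) and the subnets of the office being treated as destroyed; it flags any recovery component that still points into production. The subnets, roles, and addresses are hypothetical documentation-range values. It only checks addressing: shared internet links, VPN tunnels, or services tied to the main office would not show up here.

```python
import ipaddress

# Hypothetical subnets belonging to the main office assumed to be destroyed.
PRODUCTION_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def production_dependencies(recovery_addresses: dict[str, str]) -> list[str]:
    """Return recovery roles whose configured address lies in a production subnet."""
    leaks = []
    for role, addr in recovery_addresses.items():
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in PRODUCTION_SUBNETS):
            leaks.append(f"{role} ({addr})")
    return leaks


if __name__ == "__main__":
    # Hypothetical addresses the recovery environment is configured to use.
    recovery = {
        "default gateway": "203.0.113.1",
        "DNS server": "192.0.2.53",  # still points at the main office
        "file server": "203.0.113.20",
    }
    for leak in production_dependencies(recovery):
        print("Depends on production network:", leak)
```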

For clients with Suite3 BRS or COBRA solutions, there are four standard DR testing scenarios, the most common of which is the “Table Top DR Test”.  Details of any test are tailored to each client’s parameters and needs, but remember: you get what you inspect, not what you expect.  Talk to us about defining and testing a customized recovery plan for your business.