Article · Jul 23, 2018

The importance of effective DR testing

Disaster recovery testing should be considered essential for every business. However, many organizations rarely conduct DR tests, if at all. The problem is even more common among small to midsize businesses (SMBs), for several reasons. First, many SMBs have small IT teams that are already stretched thin by day-to-day work, even before DR testing is added to the mix.

Second, DR testing can be a complex process that requires business downtime, especially if you rely on dated technologies. As a result, testing can be unappealing to business stakeholders as well as to IT.

Finally, many organizations simply take a “set it and forget it” approach to disaster recovery and assume that their plans will work. This is a dangerous assumption.

Why? Because your IT environment isn’t static. Business data changes by the second, applications are regularly updated, and hardware fails or becomes obsolete; the list goes on. Testing is the only way to be certain that you’ll be able to restore business operations within a reasonable timeframe following an outage.

Remember: the purpose of disaster recovery testing is to find flaws in your DR plan, so you must design tests that reflect real-world restore demands. This may sound obvious, but it is worth noting because many IT teams perform ineffective or incomplete tests. A common example is testing the ability to restore files while neglecting to test whether business-critical applications can be restored and run.
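One way to avoid that trap is to make an application check part of every restore test. The Python sketch below is a minimal illustration under stated assumptions, not a feature of any particular DR product: it compares SHA-256 checksums between a source directory and its restored copy, then calls an application health check. The `verify_restore` function and the `app_health_check` callable are hypothetical names; a real application check might start the restored service and run a known transaction against it.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List


def file_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source: Path, restored: Path,
                   app_health_check: Callable[[], bool]) -> List[str]:
    """Return a list of problems found; an empty list means the test passed.

    app_health_check is a hypothetical callable supplied by the tester that
    returns True if the restored application actually works.
    """
    problems = []
    expected = file_digests(source)
    actual = file_digests(restored)
    # File-level check: every source file must exist and match byte for byte.
    for name, digest in expected.items():
        if name not in actual:
            problems.append(f"missing file: {name}")
        elif actual[name] != digest:
            problems.append(f"checksum mismatch: {name}")
    # Application-level check: restored files alone don't prove the app runs.
    if not app_health_check():
        problems.append("application health check failed")
    return problems
```

Keeping the application check as a passed-in callable means the same file comparison can be reused while each business-critical application supplies its own test.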

Conduct frequent DR tests

There’s no standard recommendation for how often to conduct a full DR readiness test, but twice a year is a good place to start. It is also important to test after changes to your environment. The scope of these ad hoc tests depends on the changes made, and they may not require exercising every aspect of your disaster recovery plan. For example, a test after spinning up a single new virtual machine will look very different from a test after a rip-and-replace of legacy hardware. Tailor your testing schedule accordingly.
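Writing the schedule down as a small piece of configuration helps it get applied consistently. The sketch below assumes a roughly six-month interval for full tests and a hypothetical mapping from change types to ad hoc test scope; the change categories and test steps are illustrative placeholders, not an industry standard.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumed cadence: a full DR readiness test roughly every six months.
FULL_TEST_INTERVAL = timedelta(days=182)

# Hypothetical mapping from change type to ad hoc test scope; adapt to your DR plan.
AD_HOC_SCOPE: Dict[str, List[str]] = {
    "new_vm": ["restore the new VM", "confirm it is included in replication"],
    "app_update": ["restore the updated application", "run application smoke tests"],
    "hardware_replacement": ["full failover test", "full failback test"],
}


def next_full_test(last_full_test: date) -> date:
    """Date the next full DR readiness test is due."""
    return last_full_test + FULL_TEST_INTERVAL


def ad_hoc_scope(change_type: str) -> List[str]:
    """Test steps to run after a given change; unknown changes get a full test."""
    return list(AD_HOC_SCOPE.get(change_type, ["full DR readiness test"]))
```

Defaulting unrecognized changes to a full test errs on the side of caution; a team with limited time might instead route them to a manual review.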

Evaluate your existing technology

Your ability to conduct effective DR tests depends on the technology you have in place. For example, testing your ability to restore business operations from offsite tape is very different from testing the failover and failback capabilities of a high availability solution. If you avoid or struggle with testing because of technology limitations, consider evaluating modern DR tools.

Just as testing is important for success and confidence in your DR plan, so is understanding your entire IT environment. Detailed documentation of your network can ease testing and, more importantly, streamline recovery after a disaster. Depending on the complexity of your environment, this documentation might include:

A list of all the hardware and software in place
Where that hardware and software physically reside
Spare equipment in stock
A complete support and tech support contact list

Larger environments might also include a detailed network map. A variety of purpose-built IT documentation tools are available as well and, depending on your specific needs, may be worth considering.
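Even without a dedicated tool, a simple structured inventory can capture the items listed above. This Python sketch uses hypothetical `Asset` and `DRInventory` classes (assumed names, not taken from any documentation product) to record where each asset lives, how many spares are on hand, and whom to call, and to answer two questions that tend to come up mid-recovery: what was at a failed location, and which hardware has no spare.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    name: str
    kind: str              # "hardware" or "software"
    location: str          # where it physically lives, e.g. a site and rack
    spare_count: int = 0   # spares in stock (meaningful for hardware)


@dataclass
class DRInventory:
    assets: List[Asset] = field(default_factory=list)
    support_contacts: List[str] = field(default_factory=list)

    def at_location(self, location: str) -> List[Asset]:
        """Assets affected if a given location is lost."""
        return [a for a in self.assets if a.location == location]

    def without_spares(self, kind: str = "hardware") -> List[str]:
        """Names of assets of the given kind with no spares on hand."""
        return [a.name for a in self.assets if a.kind == kind and a.spare_count == 0]
```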

Failback matters!

High availability solutions continuously replicate data to a remote site or the cloud. If a primary system goes down, the remote secondary system can be spun up and users rerouted to it. This process is known as failover, and it can reduce downtime to seconds or minutes.

However, failover isn’t a permanent state. Once primary servers are up and running again, data and applications must be restored to them so normal operations can resume. This process is known as failback, and it is very important from a DR testing standpoint. Here’s why: not all replication technology is created equal when it comes to failback. In some cases, failing back to production servers can be painfully slow. Look for a solution that reverses replication (from the secondary back to the primary) before failback. This lets users keep working on the secondary during resynchronization, which reduces user downtime.
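A toy model shows why reverse replication shortens failback. In the sketch below, `ReplicatedStore` is a hypothetical class, not a real replication engine: writes are tracked as pending changes and copied to the passive side. After failover, replication runs in the reverse direction while users keep working on the secondary, so the failback cutover only has to copy the few changes made since the last sync. It assumes the recovered primary still holds the data from the last successful sync.

```python
from typing import Dict, Set


class ReplicatedStore:
    """Toy primary/secondary pair for illustrating failover and failback.

    Hypothetical model only: data is a dict, and "replication" copies changed
    keys from the active side to the passive side.
    """

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.secondary: Dict[str, str] = {}
        self.active = "primary"
        self.pending: Set[str] = set()  # keys changed on the active side, not yet copied

    def write(self, key: str, value: str) -> None:
        side = self.primary if self.active == "primary" else self.secondary
        side[key] = value
        self.pending.add(key)

    def replicate(self) -> int:
        """Copy pending changes from the active side to the passive side."""
        src, dst = ((self.primary, self.secondary) if self.active == "primary"
                    else (self.secondary, self.primary))
        for key in self.pending:
            dst[key] = src[key]
        copied = len(self.pending)
        self.pending.clear()
        return copied

    def failover(self) -> None:
        # Primary is lost: writes not yet replicated are dropped (the recovery
        # point), and users are rerouted to the secondary. From now on,
        # replicate() runs in reverse, from secondary back to primary.
        self.pending.clear()
        self.active = "secondary"

    def failback(self) -> int:
        """Copy the remaining delta, return users to the primary, and
        report how many keys had to be copied during the cutover."""
        copied = self.replicate()
        self.active = "primary"
        return copied
```

In this model, calling `replicate()` periodically during the outage is what keeps the final cutover small; without it, every change made on the secondary would have to be copied while users wait.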

DR testing: Worth the time

Failing to test your ability to restore operations can lead to extended periods of downtime. This can be particularly damaging for SMBs, because they are typically more dependent on generating new revenue than larger, established companies with diverse assets. How much potential revenue would your business stand to lose if business-critical applications were down for an hour? A day? A week? Use those numbers as motivation, and carve out the time and resources to perform rigorous DR tests.
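A first estimate is simple arithmetic: hourly revenue multiplied by hours of downtime. The sketch below assumes a hypothetical $2,000 per hour and treats revenue as accruing evenly around the clock, which overstates losses for a business that only earns during business hours; substitute your own revenue figure and operating hours.

```python
def downtime_cost(hourly_revenue: float, hours_down: float) -> float:
    """Revenue at risk, assuming revenue accrues evenly across the outage."""
    return hourly_revenue * hours_down


HOURLY_REVENUE = 2_000  # hypothetical figure; substitute your own

for label, hours in [("an hour", 1), ("a day", 24), ("a week", 24 * 7)]:
    print(f"Down for {label}: ${downtime_cost(HOURLY_REVENUE, hours):,.0f}")
```

With these assumed inputs, the script prints $2,000 for an hour, $48,000 for a day, and $336,000 for a week.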

Finally, if you are evaluating new DR tools, look for products and services that offer native testing capabilities. You want to be able to perform non-disruptive failover testing in an isolated environment so there’s no impact on production. This type of capability can dramatically ease the DR testing process.