Hardware Today: Natural Disaster Survival Guide

by Drew Robb

When catastrophe strikes, your data is more important than the servers on which it sits. Disaster recovery tools and techniques ensure data makes it through and business continues as usual.

It is doubtful many of the vehicles in the recent record-breaking traffic streaming out of New Orleans and Houston were evacuating servers from disaster. Savvy enterprises had already transferred data to a remote disaster recovery (DR) site to avoid losing it in power outages or flooding from the storms. At the very least, these enterprises shipped backup data offsite. The not-so-smart, however, simply headed north, hoping Hurricanes Katrina and Rita would be gentle on the hardware they abandoned.

"Disaster Recovery plans are built to restore full functionality in relation to infrastructure, including facility and IT, at time of crisis," says Michael Croy, director, business continuity solutions at Forsythe, a Chicago-based IT consultancy and infrastructure firm. "The term crisis can include fire, flood, a plague of locusts, or any other interruption that denies the ability to run in a normal operational mode."

One of the big problems with DR planning is the unpredictability of Mother Nature. Typically, an actual event triggers a flurry of activity. A company invests in technology, develops a DR plan, and tests it for a while. Over time, the plan gradually fades from memory. Then, when disaster strikes, the plan is outdated; personnel turnover has rendered it unworkable, and confusion reigns.

"We tend to be event-driven people," says Mike Karp, an analyst with Enterprise Management Associates. "We wait until we get bitten ... by an alligator, before we see the necessity of draining the swamp."

SunGard Availability Services Vice President of Marketing David Palermo stresses the importance of keeping DR plans current, come hail, rain, or shine. "You have to manage any alterations to the plan and procedures based on business changes that occur on a regular basis," he says.

To do that, Gartner analyst Roberta Witty suggests forming a dedicated team and developing metrics to measure and report on the status of the program. But that takes money, and that's part of the problem. Gartner reports DR spending has dropped in the past three years after a 9/11-driven spike in 2002.

"We see DR spending going down slightly," says Witty. "It is vital to maintain a separate budget for DR spending."

Money for DR

Assuming money is available, what should the IS organization spend it on? According to Palermo, state-of-the-art recovery means engineering a system and infrastructure that provides constant access to critical data and systems.

"The challenge companies face is keeping up with two times the costs and the level of redundancy that data centers require to achieve true information availability," says Palermo.

A high-availability strategy built on multiple data centers is ideal. If that is cost-prohibitive, costs can be reduced while still implementing a top-notch system. Karp suggests replicating from a high-priced storage device to a less expensive one, for example, from an EMC Symmetrix box to an inexpensive SATA array at a remote site.

Cisco Vice President of Storage Technology Jackie Ross believes money need not be thrown at the problem. Rather, funds should be spent wisely based on business priorities.

"Analyze your business applications into different tiers, then assign different DR policies for each class," says Ross. "The speed of data recovery varies widely per application and line of business."

For example, Tier 1 might be "must recover within 24 hours," Tier 2 might be "recover within 72 hours," and Tier 3 might be "recover within 10 business days." Alternatively, Tier 1 might require no downtime of any sort; that is, no matter what natural disaster occurs, not a single transaction can be lost.
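To make the tiering idea concrete, here is a minimal Python sketch of how such policies might be recorded and checked against the results of a DR test. The recovery windows come from the first illustration above; the application names, the tier assignments, and the choice to treat "10 business days" as 14 calendar days are assumptions made purely for illustration, not anything recommended by the people quoted here.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TierPolicy:
    """Recovery requirement shared by every application in a tier."""
    name: str
    max_recovery_time: timedelta  # longest outage the business will accept


# Windows taken from the article's example tiers.
# Assumption: "10 business days" is approximated as 14 calendar days.
POLICIES = {
    1: TierPolicy("Tier 1", timedelta(hours=24)),
    2: TierPolicy("Tier 2", timedelta(hours=72)),
    3: TierPolicy("Tier 3", timedelta(days=14)),
}

# Hypothetical application inventory; names and tiers are invented.
APP_TIERS = {
    "order-entry": 1,
    "payroll": 2,
    "intranet-wiki": 3,
}


def meets_policy(app: str, measured_recovery: timedelta) -> bool:
    """Return True if a measured recovery time satisfies the app's tier."""
    policy = POLICIES[APP_TIERS[app]]
    return measured_recovery <= policy.max_recovery_time
```

Feeding the recovery times measured in each DR test into a check like this is one simple way to see whether the plan still meets the requirement set for each class of application.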

Croy agrees with this approach. "What is best for the recovery of the business is what is critical," he says. "If it is zero loss of data, then newer technologies (probably at a higher cost) need to be utilized."

If the data is not mission or business critical, less sophisticated technologies may be used to save money. The determining factor, however, is not whether the facility is a hot site, cold site, vendor-run site, or co-location site; it is whether the recovery plan meets the business requirement.

The Non-IT Side of DR

Most experts concur on another point — IT is only one aspect of DR.

"It's a big mistake to think that the IT department is the only department needed to develop, test, and recover the business," says Gartner's Witty.

She explains that DR includes every department of the organization and even extends to federal, state, and local authorities to coordinate evacuation routes and emergency service procedures. Such bodies must be included in the DR planning process, although in some cases it can be hard to get them to test with you.

Witty also points out the vital role of employee health and welfare during an event. In a regional outage, for example, personnel cannot be expected to be onsite for business recovery if they are having problems at home related to the event. So DR must also encompass supporting them at home.

"The American Red Cross is used a lot for this part of the education and awareness training," says Witty.

Having people outside of the IS organization involved in DR planning is one way to prevent unfortunate publicity during natural disasters. Other common mistakes include having "redundant power" on the same utility grid, establishing DR sites too close together, and situating the DR site in too isolated a location so staff members can't easily reach it during an event.

"You have to have your DR site far enough away to be outside the immediate threat area but close enough to be practical enough for travel to the remote facility," says SunGard Vice President of Consulting, Product Development Jim Grogan. "Position a recovery site outside of the weather pattern or fault zone you are planning for, so you don't find yourself with your backup location damaged too."
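The "far enough but close enough" rule can be turned into a rough screening check. The Python sketch below computes the great-circle distance between two sites with the haversine formula and tests it against a minimum and maximum separation. The thresholds are supplied by the caller and are entirely hypothetical; nothing in this article prescribes specific distances, and straight-line distance is only a crude stand-in for weather patterns, fault zones, and actual travel time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def site_distance_ok(primary, recovery, min_km, max_km):
    """True if the recovery site is at least min_km and at most max_km away.

    primary and recovery are (latitude, longitude) pairs in degrees.
    min_km and max_km are planning assumptions chosen by the caller.
    """
    d = distance_km(*primary, *recovery)
    return min_km <= d <= max_km


# Approximate city-center coordinates, used only as sample input.
NEW_ORLEANS = (29.95, -90.07)
HOUSTON = (29.76, -95.37)
```

A call such as `site_distance_ok(NEW_ORLEANS, HOUSTON, min_km=100, max_km=800)` only screens candidate sites by distance. As Katrina and Rita showed, two cities several hundred kilometers apart can still sit in the path of the same season's storms, so a check like this supplements rather than replaces an assessment of the threat area.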

But there comes a point where planning for the worst-case scenario can be stretched well beyond the bounds of logic. Who would have predicted, for example, that a power outage would affect 14 states? Yet that's exactly what happened on the Eastern Seaboard in August 2003. Or how about two Category 5 hurricanes hitting New Orleans within one month (which almost happened with Katrina and Rita)? Those kinds of watershed events can completely change how the planning process is conducted. The reality is that most people think smaller than such events and can hardly conceive of catastrophes that are national in scope.

"You have to plan carefully for high probability events and then also understand how you might be affected by the less likely occurrences," says Grogan.

This article was originally published on Monday Oct 10th 2005