Plan, test and recover, regardless of the outage or the cause
The New England Patriots didn’t become six-time Super Bowl Champs by developing a core set of playbooks and placing them on a shelf, taken down only on game day. What if the weather’s bad that day? What happens when a player is injured? What if the opponent’s key player is injured and they change their play?
The Patriots operate successfully in a highly dynamic environment where anything can happen, much like IT environments. Even though they can’t predict and address every potential risk – like injuries or bad calls or changes in weather – they are always ready to shift their play to get to the end zone.
That’s because they have a plan. The team’s playbooks are built on an understanding of their own players’ skills, gaps, strengths and weaknesses, their environment, and what they know about the opposing team’s plays and its players’ strengths, weaknesses and performances.
Just about every IT organization has experienced at least one outage in the past few years, most of which have lasted at least one full work day. In fact, a recent study by Enlogic, a company that provides data center energy monitoring products, found that nearly 75% of unplanned data center downtime was caused by some kind of human error. Very few organizations recover rapidly – in under an hour – from an outage. And as IT environments become more complex, outages will likely happen more often.
And yet, many organizations are at risk with their DR plans – they develop good but static documents that become out of date almost immediately, and then they find out the hard way that those plans are ineffective when they’re needed most – during an outage.
DR Is about RECOVERY – Regardless of Root Cause
So let’s talk about the key steps organizations must take to create a DR plan that they can execute to restore service to critical apps while meeting service-level agreements and compliance requirements.
First, IT needs to understand the scope of recovery requirements for the business and assess its current capabilities to meet those requirements. IT needs to work together with business colleagues to identify mission-critical apps, understand how they work, what their dependencies are and how they are hosted across hybrid environments.
Next, the business needs to define a scoring method to assess each application’s business impact, or its criticality to the business, and to measure the impact of downtime for each app in concrete terms – such as cost of lost revenue or damage to reputation. IT can then map the right recovery processes to each app based on its score.
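As a rough illustration of what such a scoring method could look like, here is a minimal Python sketch. The weights, the 1-to-5 reputation scale, the tier thresholds and the recovery methods named in each tier are all assumptions made up for this example; a real scoring model would come from the business, not from IT alone.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds, chosen only for illustration.
REVENUE_WEIGHT = 0.6
REPUTATION_WEIGHT = 0.4


@dataclass
class App:
    name: str
    revenue_loss_per_hour: float  # estimated cost of lost revenue, in dollars
    reputation_impact: int        # 1 (negligible) to 5 (severe), set by the business


def impact_score(app: App, max_revenue_loss: float) -> float:
    """Combine both downtime measures into a single 0-100 score."""
    revenue_part = app.revenue_loss_per_hour / max_revenue_loss
    reputation_part = app.reputation_impact / 5
    score = REVENUE_WEIGHT * revenue_part + REPUTATION_WEIGHT * reputation_part
    return round(100 * score, 1)


def recovery_tier(score: float) -> str:
    """Map a score to a recovery process (tier names are assumptions)."""
    if score >= 70:
        return "tier 1: automated failover"
    if score >= 40:
        return "tier 2: restore from replica"
    return "tier 3: restore from backup"


apps = [
    App("payments", revenue_loss_per_hour=50_000, reputation_impact=5),
    App("intranet wiki", revenue_loss_per_hour=500, reputation_impact=1),
]
max_loss = max(app.revenue_loss_per_hour for app in apps)
tiers = {app.name: recovery_tier(impact_score(app, max_loss)) for app in apps}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

The point of the sketch is only that the score is computed from measurable inputs and that the recovery process follows from the score, so the mapping can be revisited whenever the inputs change.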
Then, it’s time for teams to work together across silos to test recovery plans – validate them, identify gaps and conflicts, prioritize tasks to close those gaps – and then retest.
“’Every battle is won before it is fought,’” New England Patriots Coach Belichick told CNBC contributor Suzy Welch in 2017. He says the quote from Sun Tzu’s military classic captures his philosophy on preparation: “You [have to] know what the opponents can do, what their strengths and weaknesses are … [and] what to do in every situation.”
When Belichick and his players aren’t on the field, they’re huddled together in practices and team meetings where they spend hours going over films and studying plays.
Testing your DR plan may sound obvious – but even though 95% of companies have DR plans, those plans are tested infrequently, and 23% of organizations never test their DR plans at all.
So how do IT and business teams work together to capture all the application data they need, understand how apps work, identify recovery strategies, map recovery methods to apps, then generate tasks in the correct order to ensure service is restored on time and without bringing additional systems down?
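One piece of that puzzle, generating tasks in the correct order, can be sketched in a few lines of Python once dependencies have been captured. The component names and the dependency map below are hypothetical; the sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) and is one possible approach, not a description of any particular DR product.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each component lists what must be
# running before it can be restored.
DEPENDENCIES = {
    "storage": set(),
    "database": {"storage"},
    "message queue": {"storage"},
    "order service": {"database", "message queue"},
    "web frontend": {"order service"},
}


def recovery_order(dependencies):
    """Return components in an order where every dependency comes first."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # A cycle means the captured dependency data is wrong or incomplete.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


order = recovery_order(DEPENDENCIES)
for step, component in enumerate(order, start=1):
    print(f"{step}. restore {component}")
```

Because the order is derived from the dependency data rather than written by hand, it can be regenerated whenever an application changes, which is exactly where static plans tend to fall behind.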
It’s important to have a solid but dynamic game plan. One that keeps pace and enables you to recover quickly, no matter what the cause. And as IT environments continue to become increasingly complex, business demands for new technology will certainly keep accelerating. In that uncertain environment, you’ve got to be ready because outages will occur.
To see step by step how to create a DR plan that protects critical applications — no matter what the cause — watch the webinar replay: