by Steve Gunderson.
One of the most common questions that we are asked is “How can I shorten my Data Center downtime window?”
After more than 17 years of move events and hundreds of lessons-learned sessions, we’ve developed procedures and methodologies designed for the shortest downtime window possible.
This sounds obvious, but believe it or not, some customers ask us to move their network on the same day as the move, and even to re-use cables. Neither is a good idea, and we strongly advise against both. To ensure success, which is the ultimate goal, installing a shiny new network in the target data center is a must.
Take this opportunity to upgrade your core switches, routers, etc. Make sure you test all your connectivity so that there are no surprises on move day. And, oh by the way… do not re-use existing cabling. Considering the overall cost of moving your data center, new cabling is far less expensive than unwanted downtime for your business.
The largest block of time in a data center move is spent un-racking and re-racking equipment, so it must be faster to move servers in the cabinet instead, right? In our experience, rarely, if ever. If you’re talking about a storage cabinet then yes, move it in the cabinet, but make sure you get the proper OEM packing materials. Moving servers in cabinets introduces new and different risk factors.
▪ Most cabinets are not designed to take lateral loads, and the more servers in a cabinet, the higher the chance of a collapsed cabinet, especially if it’s top heavy. If a cabinet tips over, you lose multiple servers and increase the potential for injuries to your crew. If you don’t believe us, ask the server cabinet manufacturers.
▪ If moving full cabinets is still determined to be the best approach, we highly recommend custom wooden crates to protect the servers, the cabinets and your move team. Crates introduce two trade-offs:
⁃ Significant time is lost loading the cabinets into crates
⁃ Crates often cost more than half the cost of a new cabinet
▪ The time saved by avoiding un-racking and re-racking servers is lost installing, mounting and grounding cabinets at the target site. That is before you account for a cabinet that is connected to ladder racking or has cables running through it that can’t be removed (yes, we’ve seen this).
▪ Pre-cabling an entire cabinet to an empty spot marked on the floor is also a very difficult task that introduces risk to the move.
Often it’s the little details that cause delays that you cannot afford. Consider buying additional rails, either for the entire move or simply for the highest priority equipment. Removing rails from the source cabinets will often slow down the move team. By reducing the un-racking and re-racking tasks to sliding each server out of the source rail system and into a pre-installed target rail, you will save several minutes per device. Very inexpensive generic server rails are available.
Clearly label the locations of active ports on all the devices. The goal is to do everything possible to make reconnecting cables an automatic and rapid exercise. Make sure to record the cable lengths as well. It might feel safer to leave extra length on each cable to allow for more flexibility or mistakes, but this will come back to haunt you. Extra length on each cable, compounded across a cabinet, only adds up to very messy wiring.
We recommend you leave only about a foot of slack for each cable. One thing you might consider is using rack blanks to help you determine how much cable is needed to reach each device. It is also important to know where on each device each cable will connect, and to incorporate this into your overall plan.
When performing a large local move (typically less than 30 miles), another approach to consider is using two trucks and drivers operating in a shuttle system. The sooner you get teams working at both sites, the more people you can deploy and the more activities you will accomplish.
It may sound simplistic, but assign extra resources to the job. If you estimate you’ll need eight techs, bring ten or twelve. The same holds true for system admins, DBAs, etc. You will find work for all of them. During a move the only thing that you can expect is the unexpected. It may be as trivial as a stuck rail or as severe as a corrupted database, but you will want backup resources on hand to keep the event moving while these unexpected items are triaged. (If you’re looking for productivity metrics to determine staffing needs, we can help you there as well.)
Also, when you’re trying to collapse your move window, don’t send any junior resources. “No plan survives contact with the enemy,” and you’ll need teams capable of reacting and executing based on the facts on the ground.
Don’t stop with un-rack and re-rack strategies and plans. What’s your load and unload plan? Can you immediately find any device on the truck or in the staging areas at any time during the move? You’ll get a sick feeling in your stomach whenever you watch someone rummage through the equipment carts looking for the next device to install at the target site. Those seconds (or minutes) compound on one another and consume any contingency time built into the plan.
Most move plans start with the logical process and priorities for shutdown and startup of systems. Be sure to map the physical layout of systems as well, so that your plan doesn’t have two teams working in the same cabinet simultaneously.
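One way to catch this kind of clash before move day is to check the plan for overlapping work in the same cabinet. The sketch below is only an illustration, not part of any TDS tooling: the Task fields, the cabinet_conflicts helper, the team and cabinet names, and the sample times are all hypothetical, and it assumes each planned task has a team, a cabinet and a start and end time in minutes from the start of the move window.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Task:
    # All fields are hypothetical; adapt them to your own run book.
    team: str
    cabinet: str
    start: int  # minutes from the start of the move window
    end: int    # minutes from the start of the move window


def cabinet_conflicts(tasks):
    """Return pairs of tasks where different teams overlap in one cabinet."""
    conflicts = []
    for a, b in combinations(tasks, 2):
        same_cabinet = a.cabinet == b.cabinet
        different_teams = a.team != b.team
        # Two intervals overlap when each starts before the other ends.
        overlap = a.start < b.end and b.start < a.end
        if same_cabinet and different_teams and overlap:
            conflicts.append((a, b))
    return conflicts


# Made-up sample plan for illustration only.
plan = [
    Task("Team A", "CAB-01", 0, 30),
    Task("Team B", "CAB-01", 20, 45),  # overlaps Team A in CAB-01
    Task("Team B", "CAB-02", 45, 60),
    Task("Team A", "CAB-01", 45, 60),  # starts as Team B finishes: no overlap
]

for a, b in cabinet_conflicts(plan):
    print(f"{a.cabinet}: {a.team} ({a.start}-{a.end}) overlaps "
          f"{b.team} ({b.start}-{b.end})")
```

A pairwise check like this is fine for a few hundred tasks. Back-to-back tasks are not flagged, so if teams need a buffer between hand-offs you would want to pad the end times.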
Pre-event tabletop exercises and data center walkthroughs will help the team execute the move run book (the step-by-step tasks) more effectively and will significantly reduce the time spent per device. Read more about this in our blog post The Value Of Table Top Exercises.
By far the number one time waster is communication lag. If not addressed correctly, communication lags consume any efficiencies you designed into the logistics of the move. In a traditional migration event, TDS has seen communication lags range from 30 seconds to five minutes per task. The compounding of time is “expensive.” For example, in a move event with 100 devices and 500 steps, delays of even 30 seconds per task can extend the move by more than four hours.
We have seen companies using cell phones, runners, walkie-talkies and speaker systems experience delays of more than 30 seconds per task. TDS has designed purpose-built migration software called TransitionManager that shrinks the lag to less than five seconds per task.
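The arithmetic behind those figures is easy to reproduce for your own event. The sketch below is a rough estimate only: it assumes tasks run one after another so that every lag adds directly to the move window, which overstates the impact when teams work in parallel. The function name added_hours is ours, chosen for illustration.

```python
def added_hours(tasks: int, lag_seconds: float) -> float:
    """Total time lost to communication lag, in hours.

    Assumes every task incurs the same lag and that lags add up serially.
    """
    return tasks * lag_seconds / 3600


TASKS = 500  # step count from the example above

# Lag values quoted above: 30 seconds, five minutes, five seconds per task.
for label, lag in [("30 second lag", 30), ("5 minute lag", 300), ("5 second lag", 5)]:
    print(f"{label}: {added_hours(TASKS, lag):.1f} hours added")
```

With 500 steps, a 30 second lag adds roughly 4.2 hours, while a five second lag adds about 0.7 hours. Substitute your own step count to see how much of your window is at stake.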
Evaluate your move and make sure you are using the right tool for the job. Instead of using spreadsheets, use data center migration software. A move event involves the orchestration of a very large number of tasks and activities by a diverse group of resources. Without the right tool for the job, you will lose valuable time and increase risk to the move event.
After nearly two decades of data center relocation, migration and consolidation experience, TDS has created TransitionManager, built specifically to implement a complete set of best practices for planning and managing all aspects of both physical and virtual data center migrations. TransitionManager improves communication and collaboration across the team and accelerates migrations while also reducing the risk of unplanned outages.