Taken from the article I submitted for CR80 News.
Let’s start with this tidbit: Within two years after Hurricane Andrew struck in 1992, 80% of the affected companies that lacked a business-continuity plan failed, according to FEMA.
The campus card system is an important and intricate part of campus life. It facilitates campus security, access control and payments. Thus, lost transactions or downtime have a direct impact on the entire campus, whether it’s a student being locked out of a residence or lost sales at retail locations. These impacts are financial, and they also erode overall trust in the system.
Disaster recovery is the misunderstood, undervalued ugly duckling of the card service industry. It’s a dirty little secret that IT folks keep wrapped up to shelter clients from the realization that something bad can, and likely will, happen.
Though the subject is seldom discussed, everyone in the IT field has stories of lost email, documents or even entire systems. The key is not the loss, but rather the recovery.
Devise a plan
There are three basic parts to any disaster recovery plan: 1) build a business impact analysis; 2) define the scope of your new disaster recovery plan, including recovery times and individual responsibilities; and 3) create a communication strategy that includes who should be informed in each different failure state and who is responsible for sending out communication.
Though it seems simple, it is surprising how few people, departments and campuses actually implement the process.
The business impact analysis often moves beyond the scope of IT and should be considered a required part of any successful operation. It doesn’t take long to realize that if a POS system is offline across campus, transactions are not being processed and sales are being lost. Not only are you losing the transactions, you are also paying idle staff. These costs add up quickly and have a direct impact on your bottom line.
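To make that arithmetic concrete, here is a minimal sketch of a downtime cost estimate. The function name and every figure in it are hypothetical placeholders, not numbers from any real campus; it only counts lost sales and idle wages, and ignores harder-to-measure costs such as lost trust.

```python
# Hypothetical figures for illustration only; substitute your own numbers.
def outage_cost(hours_down, sales_per_hour, idle_staff, hourly_wage):
    """Estimate the direct cost of a POS outage: lost sales plus idle wages."""
    lost_sales = hours_down * sales_per_hour
    idle_wages = hours_down * idle_staff * hourly_wage
    return lost_sales + idle_wages


# Assumed scenario: a 3-hour outage, $1,200 in sales per hour,
# and 10 idle staff members paid $15 per hour.
cost = outage_cost(hours_down=3, sales_per_hour=1200, idle_staff=10, hourly_wage=15)
print(f"Estimated direct cost: ${cost:,.2f}")  # Estimated direct cost: $4,050.00
```

Even a rough estimate like this gives the planning team a shared number to weigh against the cost of preparing for the outage.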
As the person responsible for the financial side of the card system, you should be able to see your disaster recovery plan at any time. If the person who maintains your card system infrastructure does not reside in your department, you should have an ongoing dialog that includes how the last disaster recovery test went.
Business impact is the easy part of any new disaster recovery plan. The meat of the plan is in the definition of scope. It is also the most difficult part of the plan, but time well spent here pays off rapidly in the event of an outage.
The scope section should clearly define the situations addressed by the disaster recovery plan. This should be a comprehensive list, including everything from natural disasters to server-level crashes, lost hardware and anything in between.
In simple terms, the scope includes everything you are going to include in your disaster recovery plan.
What about me?
Perhaps as important as knowing what’s in scope is knowing what lies outside of it. This tends to be the touchier subject.
The unfortunate truth for some users is that their immediate needs might not align with what the institution considers an emergency.
The best way to get a full list of what you would like covered is to conduct a business impact analysis as part of an overall risk assessment of the system. This is ultimately a simple and clear-cut way to ensure that everyone is on the same page when it comes to what services will be covered.
Impact analysis has three defined steps. First, a planning team needs to sit down and actually decide what is considered high, medium and low priority for your institution. Second, the analysis must be filled in to determine exactly which aspects of the system carry which level of importance. This enables campus officials to make more informed decisions about which parts of a downed system should be restored first. The final step is to bring all stakeholders into the same room and ensure that everyone’s expectations are the same. Only then can rules be created and, more importantly, followed.
It is a hard process because every department thinks its needs and systems are top priority, but this is not always the case. Defining a business impact analysis before a crisis hits makes people far more agreeable when it comes to prioritizing the return of service.
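Once the priorities are agreed on, the order in which services come back is mechanical. The sketch below shows one way to record it; the service names and their ratings are invented for illustration, and a real list would come out of your own planning team’s decisions.

```python
# Hypothetical services and ratings; a real list comes from your planning team.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

services = [
    ("Vending readers", "low"),
    ("Residence hall door access", "high"),
    ("Dining hall POS", "medium"),
    ("Laundry readers", "low"),
    ("Card office production", "medium"),
]


def restoration_order(services):
    """Return service names sorted high, then medium, then low.

    sorted() is stable, so services with the same rating keep the
    order in which the planning team listed them.
    """
    ranked = sorted(services, key=lambda item: PRIORITY_RANK[item[1]])
    return [name for name, _priority in ranked]


for position, name in enumerate(restoration_order(services), start=1):
    print(position, name)
```

Writing the list down ahead of time means the argument over who goes first happens in a meeting room, not in the middle of an outage.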
Talk to me
A communication plan is another vital part of a disaster recovery plan.
It should clearly outline how people will be contacted if a situation defined in the scope section occurs.
Every contact should not only have a name, phone number and email, but also have their responsibilities and secondary non-institutional contact details. If there is a localized problem that has disabled campus systems, sending internal email with vital instructions isn’t really going to help.
Your institution should ensure that all staff members maintain a work email account with a third-party provider that is different from the regular provider. This way, emergency communications can be distributed to both primary and secondary accounts.
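A contact list like this is easy to check automatically. The following is a rough sketch, assuming a made-up campus mail domain and fictional staff; it flags anyone whose secondary email is missing or still depends on campus systems.

```python
from dataclasses import dataclass

CAMPUS_DOMAIN = "campus.example.edu"  # assumed institutional mail domain


@dataclass
class Contact:
    name: str
    phone: str
    email: str
    responsibility: str
    secondary_phone: str
    secondary_email: str  # should live with an outside provider


def on_campus(address):
    """True if the address is empty or hosted on the campus domain."""
    domain = address.rpartition("@")[2].lower()
    return (not address
            or domain == CAMPUS_DOMAIN
            or domain.endswith("." + CAMPUS_DOMAIN))


def needs_outside_email(contacts):
    """Names of contacts who cannot be emailed if campus mail is down."""
    return [c.name for c in contacts if on_campus(c.secondary_email)]


# Fictional entries for illustration only.
contacts = [
    Contact("A. Rivera", "555-0100", "arivera@campus.example.edu",
            "Restores the card system servers",
            "555-0101", "arivera@mail.example.com"),
    Contact("B. Chen", "555-0110", "bchen@campus.example.edu",
            "Sends outage notices to stakeholders",
            "555-0111", "bchen@campus.example.edu"),
]

print(needs_outside_email(contacts))  # ['B. Chen']
```

Running a check like this during the yearly review catches entries that look complete but would fail in a campus-wide outage.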
Each year this communication list should be reviewed and updated as part of a regular process by a specific department. The responsibility should fall squarely on one department so that there is ownership of the process.
Testing, testing
Finally, the most important and least-accomplished part of disaster recovery planning is setting up a routine to test the plan.
A disaster recovery plan is only as good as its last test. Sure, you can have everything well documented, lots of information on service times, recovery procedures and which servers should be checked and when, but all that quickly goes out the window if something goes wrong.
Each disaster situation listed in the scope should be tested periodically. We don’t all have the staff or the time to do a full disaster preparedness drill on our campus, but we do all have time to make sure that our recovery plan is valid.
Without routine testing, the time spent designing and developing a disaster recovery plan is for naught.
Time should be taken during plan creation to define what should be tested and how often. This leads to the entire institution being more comfortable with the plan, and ultimately it shortens response times when an issue inevitably occurs.
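Keeping track of what is due for a test does not require special software. Here is a small sketch, with invented scenarios, dates and intervals, that lists which scenarios have gone longer than their agreed interval without a test.

```python
from datetime import date, timedelta

# Hypothetical schedule: scenario -> (date last tested, test interval in days).
schedule = {
    "Server crash": (date(2024, 1, 15), 90),
    "Lost hardware": (date(2023, 6, 1), 180),
    "Natural disaster": (date(2023, 9, 1), 365),
}


def overdue_tests(schedule, today):
    """Return, in alphabetical order, scenarios not tested within their interval."""
    return sorted(
        name
        for name, (last_tested, interval_days) in schedule.items()
        if today - last_tested > timedelta(days=interval_days)
    )


overdue = overdue_tests(schedule, today=date(2024, 3, 1))
print(overdue)  # ['Lost hardware']
```

The intervals themselves are a judgment call for each campus; the point is that they are written down and checked on a routine basis.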
Everyone can, and will, suffer a data loss or disaster situation at some point. The trick is being confident that your campus will be minimally impacted through proper preparation. How well you prepare now will determine how effective and efficient you are when disaster strikes.
Oh yeah, now that you have a new disaster recovery plan, make sure to give a copy of it to as many people as necessary.