The previous sections of this disaster recovery plan help in assessing risks and deciding where to focus on the most critical ones. This section determines and lists the most probable effects of each disaster, and Loop Inc.’s disaster recovery process will cover these specific effects. Multiple causes can produce the same effects, and these effects can, in turn, lead to further effects. This recovery plan focuses on earthquakes and power supply cuts as examples of the main natural and human-caused risks (Reason, 2016). An earthquake can cause the failure of several entities, such as the company’s office facilities, operations staff, power system, telephone system, and data systems. Below is a sample mapping of the causes, effects, and affected entities for earthquakes and power supply cuts.
Table 2 Disaster Affected Entities
| Risk (Disaster) | Effects | Affected Entity |
|---|---|---|
| Earthquake | Telecom failure | Telephone instruments and network |
| Earthquake | Desktops destroyed | Desktops and workstations |
| Earthquake | Office space destroyed | Office space |
| Earthquake | Operators cannot report to work | Office staff |
| Earthquake | Data systems destroyed | Data systems |
| Power supply cut | Data systems powered off | Data systems |
| Power supply cut | Desktops powered off | Desktops and workstations |
| Power supply cut | Telecom failure | Telephone instruments and network |
| Power supply cut | Data network down | Network devices and links |
As the above table shows, several disasters can affect the same entities, which helps identify the entities that are most exposed: data systems, desktops and workstations, and telephone instruments and network all appear under both disasters. Data systems and power are the entities most likely to be affected because most of the company’s other entities depend on them.
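The exposure count described above can be derived mechanically from Table 2. The following is a minimal sketch, assuming the table is transcribed as (disaster, effect, affected entity) rows; the names `TABLE_2` and `exposure_by_entity` are hypothetical, and "Desktops/workstations" is normalized to "Desktops and workstations" so both disasters map to the same entity.

```python
from collections import Counter

# Table 2 transcribed as (disaster, effect, affected entity) rows.
TABLE_2 = [
    ("Earthquake", "Telecom failure", "Telephone instruments and network"),
    ("Earthquake", "Desktops destroyed", "Desktops and workstations"),
    ("Earthquake", "Office space destroyed", "Office space"),
    ("Earthquake", "Operators cannot report to work", "Office staff"),
    ("Earthquake", "Data systems destroyed", "Data systems"),
    ("Power supply cut", "Data systems powered off", "Data systems"),
    ("Power supply cut", "Desktops powered off", "Desktops and workstations"),
    ("Power supply cut", "Telecom failure", "Telephone instruments and network"),
    ("Power supply cut", "Data network down", "Network devices and links"),
]

def exposure_by_entity(rows):
    """Count how many distinct disasters affect each entity."""
    # De-duplicate so one disaster with several effects on an entity counts once.
    pairs = {(disaster, entity) for disaster, _, entity in rows}
    return Counter(entity for _, entity in pairs)

for entity, count in exposure_by_entity(TABLE_2).most_common():
    print(f"{count}  {entity}")
```

Entities with the highest count are the ones to examine first; adding rows for further disasters updates the ranking without changing the logic.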
Determining the effects of disasters also requires the company, through the recovery plan, to set downtime tolerance limits. These limits are based on the “Affected Entity” list, with a downtime limit set for each entity (Torabi & Sahebjamnia, 2015). The entities are then sorted by tolerance limit in ascending order, so that those with the lowest tolerance receive the highest recovery priority. The cost of downtime is one of the metrics used to evaluate the tolerance limits.
Table 3 Risk Tolerance Limits
| Risk (Disaster) | Affected Entity | Cost of Downtime (0–5) | Tolerance Limit (0–10) |
|---|---|---|---|
| Earthquake | Telephone instruments and network | 4 | 2 |
| Earthquake | Desktops and workstations | 3 | 3 |
| Power supply cut | Data systems | 5 | 1 |
| Power supply cut | Telephone instruments and network | 4 | 2 |
| Power supply cut | Network devices and links | 4 | 1 |
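The ascending sort by tolerance limit can be sketched directly over Table 3. This is an illustration under stated assumptions: the `Row` type and `recovery_priority` function are hypothetical, and breaking ties in favor of the higher cost of downtime is an assumption, since the plan does not say how equal tolerance limits are ordered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    disaster: str
    entity: str
    cost: int       # cost of downtime, 0-5 scale
    tolerance: int  # tolerance limit, 0-10 scale

TABLE_3 = [
    Row("Earthquake", "Telephone instruments and network", 4, 2),
    Row("Earthquake", "Desktops and workstations", 3, 3),
    Row("Power supply cut", "Data systems", 5, 1),
    Row("Power supply cut", "Telephone instruments and network", 4, 2),
    Row("Power supply cut", "Network devices and links", 4, 1),
]

def recovery_priority(rows):
    """Lowest tolerance limit first; ties go to the higher cost of downtime (assumption)."""
    # sorted() is stable, so rows that tie on both keys keep their table order.
    return sorted(rows, key=lambda r: (r.tolerance, -r.cost))

for rank, row in enumerate(recovery_priority(TABLE_3), start=1):
    print(rank, row.entity, f"(tolerance {row.tolerance}, cost {row.cost})", row.disaster)
```

With the values above, data systems come first and desktops and workstations last, matching the prioritization rule in the text.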
The investment required for any recovery plan is based on the cost of downtime, which can be tangible or intangible. Tangible costs arise from business interruption, lost productivity, and reduced revenue. Intangible costs include lost opportunities, such as damage to the company’s reputation and customers turning to competitors. The recovery plan also identifies several interdependencies among the affected entities, so some entities need a defined recovery sequence; for example, data systems cannot be restored until power has been restored.
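A recovery sequence that respects such interdependencies is a topological ordering of a dependency graph. The sketch below uses Python’s standard `graphlib.TopologicalSorter` (Python 3.9+). Only the dependency of data systems on power comes from the plan; the other edges in `DEPENDS_ON` are hypothetical placeholders showing how further dependencies would be recorded.

```python
from graphlib import TopologicalSorter

# Each entity maps to the entities that must be restored before it.
# Only "Data systems" -> "Power" is stated in the plan; the other edges
# are hypothetical examples.
DEPENDS_ON = {
    "Data systems": {"Power", "Network devices and links"},
    "Network devices and links": {"Power"},
    "Telephone instruments and network": {"Power"},
    "Desktops and workstations": {"Power"},
}

def recovery_sequence(depends_on):
    """Return a restoration order in which every entity follows its prerequisites.

    Raises graphlib.CycleError if the dependencies are circular, which would
    indicate an error in the plan itself.
    """
    return list(TopologicalSorter(depends_on).static_order())

print(recovery_sequence(DEPENDS_ON))
```

Entities with no ordering constraint between them may appear in any relative order, which also signals where restoration can proceed in parallel.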
With the list of affected entities prepared and their susceptibility to failure assessed, the plan can now analyze the recovery methods available for each entity. This analysis helps identify the most suitable recovery method for each entity.
i. Data systems
Disaster recovery facilities are key to supporting effective data redundancy for the company’s on-site data center. These facilities will act as off-site data storage and will also provide recovery for failures of other entities, such as power cuts, network outages, and failures of storage, connectivity paths, and devices. To increase redundancy and reduce the need for disaster recovery, technologies such as redundant array of independent disks (RAID) and mirroring will be used in the software layer (Chang, 2015). On-site data center redundancy also provides fast recovery from hardware or software errors, because such errors can then be handled locally without invoking disaster recovery.
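The plan does not specify which RAID level Loop Inc. will use. As a sketch of why RAID adds redundancy, the example below contrasts mirroring (RAID 1 style, identical copies) with XOR parity (as used by parity-based levels such as RAID 5), where any single lost block can be rebuilt from the survivors. The block contents and function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def mirror(block, copies=2):
    """Mirroring (RAID 1 style): store identical copies on separate disks."""
    return [bytes(block) for _ in range(copies)]

def rebuild_missing(surviving_blocks):
    """Parity (RAID 5 style): XOR of all surviving blocks, parity included, restores the lost one."""
    return xor_blocks(surviving_blocks)

data = [b"ORDERS01", b"INVOICE2", b"PAYROLL3"]  # hypothetical stripe of three data blocks
parity = xor_blocks(data)                       # stored on a fourth disk

# Simulate losing the disk that held data[1] and rebuild it from the others.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered)
```

Mirroring costs a full extra copy but needs no computation to recover; parity costs one extra block per stripe but survives only one disk failure per stripe. Neither replaces off-site copies, since a site-wide disaster destroys all disks together.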
The company’s business needs will determine the nature of each disaster recovery mechanism. Loop Inc. will maintain several replicas of its data center so that the loss of any one site does not disrupt its business processes. The company can also build its own data center specifically for disaster recovery, equipped with the minimum hardware required to keep the business running. Alternatively, Loop Inc. can opt for a colocation facility, where it can access data center services on a rental basis.
ii. Major incident process
The major incident process aims to resolve efficiently the incidents that have a significant impact on the company’s critical business processes. The process ensures that communication during major incidents is sufficient in both quality and quantity, and that sufficient resources are available to resolve any major incident (Torabi & Sahebjamnia, 2015). It will also include a systematic incident review to prevent similar incidents from recurring. To realize these objectives, Loop Inc. will use its customized “Major Incident Handling Plan Model,” which is anchored on communication.
Figure 2 Major Incident Workflow
The major incident handling model will need a suitable major incident team for it to work successfully. The major incident team will consist of the problem manager, major incident manager, incident manager, and the service desk manager among other members. Loop Inc. will need a team that can accurately and swiftly tackle any incident in question while maintaining good customer relations. The team will also be responsible for the root cause analysis after resolving a given incident.
Loop Inc.’s disaster recovery process will proceed through three sequential phases: activation, execution, and reconstitution. The activation phase involves assessing and announcing the effects of the disaster (Kerzner & Kerzner, 2017). The execution phase involves carrying out the actual procedures for recovering from each disaster, during which the company’s business operations are restored on the recovery facilities or systems. In the final reconstitution phase, the original system is restored and the execution-phase procedures are stopped.
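Because the three phases are strictly sequential, they can be modeled as a small state machine that only advances once the current phase’s exit criterion is met. This is a minimal sketch; the `CLOSED` state and the boolean exit criterion are hypothetical simplifications (for example, activation might end once the damage assessment is complete and the plan is activated).

```python
from enum import Enum

class Phase(Enum):
    ACTIVATION = 1
    EXECUTION = 2
    RECONSTITUTION = 3
    CLOSED = 4  # hypothetical terminal state once operations are back on the original system

# Phases may only advance in the plan's order; there is no skipping or going back.
NEXT = {
    Phase.ACTIVATION: Phase.EXECUTION,
    Phase.EXECUTION: Phase.RECONSTITUTION,
    Phase.RECONSTITUTION: Phase.CLOSED,
}

class RecoveryProcess:
    def __init__(self):
        self.phase = Phase.ACTIVATION
        self.history = [self.phase]

    def advance(self, exit_criterion_met: bool) -> Phase:
        """Move to the next phase only once the current phase's exit criterion holds."""
        if self.phase is Phase.CLOSED:
            raise RuntimeError("recovery process is already closed")
        if exit_criterion_met:
            self.phase = NEXT[self.phase]
            self.history.append(self.phase)
        return self.phase

process = RecoveryProcess()
process.advance(True)   # damage assessed, plan activated -> EXECUTION
print(process.phase)
```

Keeping a `history` list gives the post-incident review a record of when each phase transition happened.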
i. Activation phase
The activation phase will involve notification procedures, damage assessment, and disaster recovery activation planning. Notification procedures will depend heavily on effective communication, because they are the first measures taken as soon as an emergency or disruption has been predicted or detected. They will describe how to notify the recovery personnel both during and outside working hours (Torabi & Sahebjamnia, 2015). Once a disaster is detected, a notification will be sent to the damage assessment team so that it can assess the actual damage and initiate subsequent actions.
Notifications from one team to another can be sent by pager, telephone, cell phone, or e-mail. Loop Inc. has a notification policy that describes the procedures to follow when required personnel cannot be contacted, and these policies are clearly documented in the contingency plan. To document primary and alternate contact methods, Loop Inc. will use a call tree as shown below. The call tree specifies the procedures to follow when a specific individual cannot be contacted.
Figure 3 Call Tree Chart
The contact list in the plan will clearly identify the staff to be alerted, listing each person by name, role, and contact information. Where disrupted systems are interconnected with external organizations, the plan will provide a point of contact in each of those organizations (Richie & Kliem, 2015).
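A call tree with primary and alternate contact methods can be sketched as a tree of contacts, each with an ordered list of methods and an optional stand-in. The structure below is an assumption for illustration: `Contact`, `run_call_tree`, and the `reach` callback (which stands in for an actual call, page, or e-mail) are hypothetical, and alternates are assumed not to form a cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Contact:
    name: str
    role: str
    methods: List[str]                     # primary method first, then alternates
    alternate: Optional["Contact"] = None  # stand-in if this person cannot be reached
    calls: List["Contact"] = field(default_factory=list)  # people to notify next

def run_call_tree(root: Contact, reach: Callable[[Contact, str], bool]) -> List[str]:
    """Walk the call tree depth-first, falling back to alternates when nobody answers.

    reach(contact, method) returns True if the contact acknowledged via that method.
    """
    log: List[str] = []
    stack = [root]
    while stack:
        person = stack.pop()
        target: Optional[Contact] = person
        # Try every method for the person, then for each alternate in turn.
        while target is not None and not any(reach(target, m) for m in target.methods):
            log.append(f"unreachable: {target.name}")
            target = target.alternate
        if target is None:
            log.append(f"escalate: nobody reached for role {person.role}")
        else:
            log.append(f"notified: {target.name} ({target.role})")
        # The branch continues either way, so one unreachable person does not stall it.
        stack.extend(reversed(person.calls))
    return log

# Hypothetical example: the damage assessment lead is unavailable.
backup = Contact("B. Backup", "Damage assessment (alternate)", ["phone"])
assessor = Contact("C. Assessor", "Damage assessment lead", ["pager", "cell"], alternate=backup)
coordinator = Contact("D. Coordinator", "DR coordinator", ["cell", "e-mail"], calls=[assessor])

for line in run_call_tree(coordinator, lambda c, m: c.name != "C. Assessor"):
    print(line)
```

The log doubles as a record of who was actually notified, which the contingency plan’s notification policy can require for review.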
Damage assessment helps establish how the contingency plan will be executed when the business’s services are disrupted. The nature and extent of the damage to the system are assessed as quickly as conditions permit. Personal safety is the highest priority during the evaluation. The damage assessment team should be the first to be notified of the incident, and it will use the damage assessment guidelines for investigating different types of disasters (Richie & Kliem, 2015). For a power outage in the data center facility, the assessment should determine whether power can be restored before the facility’s UPS system runs out of stored power. If it cannot, the disaster recovery plan can be activated immediately.
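The UPS check described above amounts to comparing the estimated restoration time with the remaining UPS runtime. The sketch below is a rough linear approximation under stated assumptions: the 90% efficiency, the 10-minute margin for an orderly shutdown, and all function names are hypothetical, and real batteries discharge non-linearly, so the manufacturer’s runtime tables should be preferred in practice.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.9) -> float:
    """Rough runtime: usable stored energy (Wh) divided by the load it carries (W), in minutes."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh * efficiency / load_w * 60

def should_activate_dr(restore_estimate_min: float, battery_wh: float, load_w: float,
                       shutdown_margin_min: float = 10.0) -> bool:
    """True if power will not return before the UPS must start an orderly shutdown."""
    usable_min = ups_runtime_minutes(battery_wh, load_w) - shutdown_margin_min
    return restore_estimate_min > usable_min

# Hypothetical figures: 10 kWh of battery carrying a 20 kW load gives about 27 minutes.
print(round(ups_runtime_minutes(10_000, 20_000), 1))
print(should_activate_dr(45, 10_000, 20_000))  # utility expects 45 minutes to restore power
```

Reserving a shutdown margin reflects that systems should be brought down cleanly before the batteries are fully drained, rather than at the moment of depletion.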
Damage assessment processes will vary with each emergency, but Loop Inc. can use the following general checklist of items to assess:
· Origin of the disruption or disaster
· Potential for additional emergencies or damage
· Area affected by the disaster
· Status of the physical infrastructure
· Inventory of the key equipment
· Functionality status of the important equipment
· Type of damage to equipment
· Items to be replaced
· Estimated restoration time for normal services
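The checklist above can be captured as a structured record, so that an assessment is only treated as complete once every item has been filled in. This is a minimal sketch; the `DamageAssessment` class and its field names are hypothetical mappings of the checklist items, and `None` is used to mean "not yet assessed" so that an explicitly empty list (for example, nothing to replace) still counts as answered.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional

@dataclass
class DamageAssessment:
    origin: Optional[str] = None                         # origin of the disruption or disaster
    further_risk: Optional[str] = None                   # potential for additional emergencies or damage
    affected_area: Optional[str] = None                  # area affected by the disaster
    infrastructure_status: Optional[str] = None          # status of the physical infrastructure
    key_equipment: Optional[List[str]] = None            # inventory of the key equipment
    equipment_status: Optional[Dict[str, str]] = None    # item -> functionality status
    damage_types: Optional[Dict[str, str]] = None        # item -> type of damage
    items_to_replace: Optional[List[str]] = None         # items to be replaced
    restoration_estimate_hours: Optional[float] = None   # estimated time to restore normal services

    def missing_items(self) -> List[str]:
        """Checklist items that have not been assessed yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self) -> bool:
        return not self.missing_items()

report = DamageAssessment(origin="Earthquake", affected_area="Data center, floor 2")
print(report.missing_items())
```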
The disaster recovery plan should only be activated after a thorough damage assessment has been conducted, to avoid stalling normal business operations because of false alarms. The Disaster Recovery Committee will carry out disaster activation planning according to the extent of the damage (Cook, 2015). The committee’s plan should:
· Plan for communication between teams
· Catalog systems and services that need to be restored
· Catalog instructions for reporting failures to the team
· Show time estimates for each restoration
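The activation rule above (no activation without a completed damage assessment) and the catalog of systems with time estimates can be combined in one decision step. This is a sketch under stated assumptions: `SystemOutage` and `plan_activation` are hypothetical, tolerance limits are assumed to be expressed in hours, and the plan is assumed to be activated only for systems whose estimated in-place restoration time exceeds their tolerance limit.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SystemOutage:
    name: str
    tolerance_h: float          # maximum tolerable downtime (assumed unit: hours)
    restore_estimate_h: float   # estimated time to restore normal service in place

def plan_activation(assessment_complete: bool, outages: List[SystemOutage]) -> Optional[List[str]]:
    """Return the restoration catalog if the DR plan should be activated, else None."""
    if not assessment_complete:
        return None  # never activate on an unverified, possibly false, alarm
    critical = [o for o in outages if o.restore_estimate_h > o.tolerance_h]
    if not critical:
        return None  # every system can be restored in place within its tolerance
    critical.sort(key=lambda o: o.tolerance_h)  # least tolerant systems first
    return [f"{o.name}: estimated {o.restore_estimate_h} h (tolerance {o.tolerance_h} h)"
            for o in critical]

outages = [
    SystemOutage("Data systems", 1, 6),
    SystemOutage("Desktops and workstations", 3, 2),
    SystemOutage("Network devices and links", 1, 4),
]
print(plan_activation(True, outages))
```

Returning `None` in both "hold" cases keeps the caller’s decision simple: only a non-empty catalog triggers activation.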
ii. Execution phase
The execution phase involves bringing up the disaster recovery system, for example, through temporary manual processing, operation, and recovery on an alternate system. Sequenced recovery activities should include instructions for coordinating with other teams in specific situations, for example, when items need to be procured, when a key step is completed, and when an action is not completed within the estimated time frame (Lan & Mojtahedi, 2017). The recovery procedures will detail the process of restoring the system and its components. Loop Inc.’s procedures for IT service damage will address actions such as:
· Acquire access authorization to damaged premises
· Notify users linked with the system
· Procure needed office supplies and a working space
· Restore critical application software and operating system
· Restore system data
· Test system functionality and security controls
· Reconnect the system to the network and other external systems
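The coordination points described above (completing a key step, or overrunning the estimated time frame) can be sketched as a sequential runner that notifies other teams as it goes. This is an illustration only: the step names paraphrase the list above, the minute estimates are hypothetical, and `perform` and `notify` are stand-ins for the real work and the real communication channel. For simplicity, an overrun is reported when the step finishes; a production version would alert as soon as the estimate passes.

```python
from typing import Callable, List, Tuple

# Steps paraphrased from the plan; the estimates (minutes) are hypothetical.
STEPS: List[Tuple[str, int]] = [
    ("Acquire access authorization to damaged premises", 60),
    ("Notify users linked with the system", 30),
    ("Procure needed office supplies and a working space", 240),
    ("Restore critical application software and operating system", 180),
    ("Restore system data", 240),
    ("Test system functionality and security controls", 120),
    ("Reconnect the system to the network and other external systems", 60),
]

def execute(steps: List[Tuple[str, int]],
            perform: Callable[[str], Tuple[bool, float]],
            notify: Callable[[str], None]) -> bool:
    """Run steps strictly in order; notify other teams on completion, overrun and failure."""
    for name, estimate in steps:
        ok, minutes = perform(name)  # returns (succeeded, minutes taken)
        if not ok:
            notify(f"FAILED: {name}; halting sequence")
            return False  # later steps depend on this one, so stop here
        if minutes > estimate:
            notify(f"OVERRUN: {name} took {minutes} min (estimate {estimate} min)")
        notify(f"DONE: {name}")
    return True

# Simulated run in which data restoration takes longer than estimated.
messages: List[str] = []
execute(STEPS, lambda name: (True, 300 if name == "Restore system data" else 10), messages.append)
print("\n".join(messages))
```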
iii. Reconstitution phase
In this phase, the business’s operations are transferred back to the original facility. If the original facility cannot be recovered, it may be rebuilt instead (Lan & Mojtahedi, 2017). This phase may last several days, depending on the nature and severity of the destruction. The Disaster Recovery Committee will be responsible for:
· Constantly monitoring the site or facility’s suitability for reoccupation
· Verifying the site or facility is fr