Business continuity is the activity performed by an organization to ensure that critical business functions will be available to customers, suppliers, regulators, and other entities that must have access to those functions. These activities include many daily chores such as project management, system backups, change control, and help desk. Business continuity is not something implemented at the time of a disaster; it refers to those activities performed daily to maintain service, consistency, and recoverability.
The foundation of business continuity is the set of standards, program development, and supporting policies, guidelines, and procedures needed to ensure that a firm can continue without stoppage, irrespective of adverse circumstances or events. All system design, implementation, support, and maintenance must be based on this foundation in order to have any hope of achieving business continuity, disaster recovery, or in some cases, system support. Business continuity is sometimes confused with disaster recovery, but they are separate entities: disaster recovery is a small subset of business continuity. It is also sometimes confused with Work Area Recovery (recovery from the loss of the physical building in which the business is conducted), which is likewise only one part of business continuity.
The term Business Continuity describes a mentality or methodology of conducting day-to-day business, whereas business continuity planning is an activity of determining what that methodology should be. The business continuity plan may be thought of as the incarnation of a methodology that is followed by everyone in an organization on a daily basis to ensure normal operations.
Dynamic Vault provides Business Continuity Planning
The analysis phase in the development of a BCP manual consists of an impact analysis, a threat analysis, and impact scenarios, together with the resulting plan requirement documentation.
Business Impact Analysis (BIA)
An impact analysis results in the differentiation between critical (urgent) and non-critical (non-urgent) organization functions and activities. A function may be considered critical if the loss of that function or activity would cause significant or unacceptable damage to the organization or to other stakeholders, including customers or vendors. Perceptions of the acceptability of disruption may be modified by the cost of establishing and maintaining appropriate business or technical recovery solutions. A function may also be considered critical if dictated by law. For each critical (in scope) function, two values are then assigned:
- Recovery Point Objective (RPO) – the point in time (measured in days, hours or minutes pre-event) from which data can be recovered
- Recovery Time Objective (RTO) – the acceptable amount of time it will take to restore the function
The Recovery Point Objective must ensure that the Maximum Tolerable Data Loss for each activity is not exceeded. The Recovery Time Objective must ensure that the Maximum Tolerable Period of Disruption (MTPD) for each activity is not exceeded.
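The two constraints above can be checked mechanically once the values are recorded. The following is a minimal sketch in Python, not part of any standard BCP tooling; the `RecoveryObjectives` class, the `check_objectives` function, and the sample figures for a "payroll" function are all hypothetical names and numbers chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Recovery targets for one critical function (hypothetical structure)."""
    function: str
    rpo: timedelta   # Recovery Point Objective
    rto: timedelta   # Recovery Time Objective
    mtdl: timedelta  # Maximum Tolerable Data Loss
    mtpd: timedelta  # Maximum Tolerable Period of Disruption


def check_objectives(obj: RecoveryObjectives) -> list[str]:
    """Return a list of violations; an empty list means the targets are consistent."""
    problems = []
    if obj.rpo > obj.mtdl:
        problems.append(f"{obj.function}: RPO {obj.rpo} exceeds MTDL {obj.mtdl}")
    if obj.rto > obj.mtpd:
        problems.append(f"{obj.function}: RTO {obj.rto} exceeds MTPD {obj.mtpd}")
    return problems


# Illustrative figures only: the RPO is within tolerance, the RTO is not.
payroll = RecoveryObjectives(
    function="payroll",
    rpo=timedelta(hours=4),
    rto=timedelta(hours=24),
    mtdl=timedelta(hours=8),
    mtpd=timedelta(hours=12),
)
for problem in check_objectives(payroll):
    print(problem)
```

In this example the RPO of four hours is within the eight-hour Maximum Tolerable Data Loss, but the 24-hour RTO exceeds the 12-hour MTPD, so one violation is reported and the RTO would need to be tightened.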
Next, the impact analysis results in the recovery requirements for each critical function. Recovery requirements consist of the following information:
- The business requirements for recovery of the critical function, and/or
- The technical requirements for recovery of the critical function
After defining recovery requirements, documenting potential threats is recommended to detail a specific disaster’s unique recovery steps. Some common threats include the following:
- Cyber attack
- Sabotage (insider or external threat)
- Hurricane or other major storm
- Utility outage
- Theft (insider or external threat, vital information or material)
- Random failure of mission-critical systems
Most threats, including those in the examples above, share a common impact: potential damage to organizational infrastructure. Disease is an exception. Its impact can be regarded as purely human, and may be alleviated with technical and business solutions. However, if the people behind these recovery plans are also affected by the disease, the process can break down. During the 2002-2003 SARS outbreak, some organizations grouped staff into separate teams and rotated the teams between the primary and secondary work sites, with a rotation frequency equal to the incubation period of the disease. These organizations also banned face-to-face contact between members of different teams during business and non-business hours. With such a split, organizations increased their resiliency against the threat of government-ordered quarantine measures if one person in a team contracted or was exposed to the disease. Damage from flooding also has a unique characteristic: if an office is flooded with fresh, uncontaminated water (e.g., from a burst pipe), equipment can be thoroughly dried and may still be functional.
After defining potential threats, documenting the impact scenarios that form the basis of the business recovery plan is recommended. In general, planning for the most wide-reaching disaster or disturbance is preferable to planning for a smaller scale problem, as almost all smaller scale problems are partial elements of larger disasters. A typical impact scenario like ‘Building Loss’ will most likely encompass all critical business functions, and the worst potential outcome from any potential threat. A business continuity plan may also document additional impact scenarios if an organization has more than one building. Other more specific impact scenarios – for example a scenario for the temporary or permanent loss of a specific floor in a building – may also be documented. Organizations sometimes underestimate the space necessary to make a move from one venue to another. It is imperative that organizations consider this in the planning phase so they do not have a problem when making the move.
After the completion of the analysis phase, the business and technical plan requirements are documented in order to commence the implementation phase. A good asset management program can be of great assistance here and allow for quick identification of available and re-allocatable resources. For an office-based, IT intensive business, the plan requirements may cover the following elements which may be classed as ICE (In Case of Emergency) Data:
- The numbers and types of desks, whether dedicated or shared, required outside of the primary business location in the secondary location
- The individuals involved in the recovery effort along with their contact and technical details
- The applications and application data required from the secondary location desks for critical business functions
- The manual workaround solutions
- The maximum outage allowed for the applications
- The peripheral requirements like servers, desktops, laptops, printers, phones, copier, fax machine, calculators, paper, pens etc.
Other business environments, such as production, distribution, warehousing etc. will need to cover these elements, but are likely to have additional issues to manage following a disruptive event.
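For an office-based business, the plan requirement elements listed above could be captured as a structured record so that gaps are visible before the implementation phase begins. This is only a sketch under stated assumptions: the `PlanRequirements` and `RecoveryContact` classes, their field names, and the sample values are hypothetical and do not come from any particular BCP standard or product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class RecoveryContact:
    """One individual involved in the recovery effort (hypothetical fields)."""
    name: str
    phone: str
    role: str


@dataclass
class PlanRequirements:
    """Plan requirements for one critical function at the secondary location."""
    function: str
    dedicated_desks: int = 0
    shared_desks: int = 0
    contacts: list[RecoveryContact] = field(default_factory=list)
    # Application name -> maximum outage allowed for that application
    max_outage_by_app: dict[str, timedelta] = field(default_factory=dict)
    manual_workarounds: list[str] = field(default_factory=list)
    # Peripheral item -> quantity required
    peripherals: dict[str, int] = field(default_factory=dict)

    def missing_elements(self) -> list[str]:
        """Name each requirement element that has not been documented yet."""
        checks = {
            "desks": self.dedicated_desks + self.shared_desks > 0,
            "contacts": bool(self.contacts),
            "applications": bool(self.max_outage_by_app),
            "manual workarounds": bool(self.manual_workarounds),
            "peripherals": bool(self.peripherals),
        }
        return [name for name, present in checks.items() if not present]


# Illustrative record with two elements still undocumented.
claims = PlanRequirements(
    function="claims processing",
    shared_desks=6,
    contacts=[RecoveryContact("A. Example", "555-0100", "recovery lead")],
    max_outage_by_app={"claims-db": timedelta(hours=4)},
)
print(claims.missing_elements())  # ['manual workarounds', 'peripherals']
```

Other environments, such as production or warehousing, would likely need additional fields beyond these.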
The implementation phase, quite simply, is the execution of the design elements identified in the solution design phase. Work package testing may take place during the implementation of the solution; however, work package testing does not take the place of organizational testing.
Exercising the Plan
The purpose of testing is to achieve organizational acceptance that the business continuity solution satisfies the organization’s recovery requirements. Plans may fail to meet expectations due to insufficient or inaccurate recovery requirements, solution design flaws, or solution implementation errors. Testing may include:
- Crisis command team call-out testing
- Technical swing test from primary to secondary work locations
- Technical swing test from secondary to primary work locations
- Application test
- Business process test
At minimum, testing is generally conducted on a biannual or annual schedule. Problems identified in the initial testing phase may be rolled up into the maintenance phase and retested during the next test cycle.
Simple exercises: A Simple exercise is often called a ‘Desktop’ or ‘Workshop’ exercise. It typically involves a small number of people, perhaps 5-20, and concentrates on a specific aspect of a Business Continuity Plan or a specific subject area (for example, Human Resources, Information Technology or Media). However, a Simple exercise can also easily accommodate complete teams from various areas of a business; the numbers and the logistics may increase, but the objectives remain the same. Alternatively, it could involve a single representative from each of several teams rather than requiring whole teams to attend. It will seldom involve a Virtual World environment or need anything other than everyday resources. Typically, participants are given a simple scenario and then invited to discuss specific aspects of the company’s BCP. For example, a fire is discovered out of working hours: What are the current call-out procedures? How is the incident management team activated? Where does it meet? Do the current documented procedures cover all eventualities? The exercise will probably last no more than three hours and is often split into two or three sessions, each concentrating on a different theme. In this case, either two or three different scenarios can be used, or one scenario can be progressively developed to introduce the themes that need to be addressed. Real-time pressure is not usually an element of Simple exercises. Questions need to be crafted ahead of time so that facilitators can keep discussions productive and germane to the objectives of the event.
Medium exercises: A Medium exercise will invariably be conducted within a Virtual World and will usually bring together several departments, teams or disciplines. It will typically concentrate on more than one aspect of the BCP, prompting interaction between teams. The scope of a Medium exercise can range from a small number of teams from one organisation co-located in one building to multiple teams operating from dispersed locations. Attempts should be made to create as realistic an environment as practicable, and the number of participants should reflect a realistic situation. Depending on the degree of realism required, it may be necessary to produce simulated news broadcasts, together with simulated websites. A Medium exercise will normally last between two and three hours, though some take place over several days. It typically involves a Scenario Cell that feeds in pre-scripted injects throughout the exercise to give information and prompt actions.
Complex exercises: A Complex exercise is perhaps the hardest to define, as it aims to have as few boundaries as possible. It will probably incorporate all the aspects of a Medium exercise and many more. Elements of the exercise will inevitably have to remain within a Virtual World, but every attempt should be made to achieve realism. This might include a no-notice activation, actual evacuation and actual invocation of a Disaster Recovery (DR) site. While a start and cut-off time will have to be agreed, the actual duration of the exercise might be unknown if events are allowed to run their course in real time. If it takes two hours to get to the DR site instead of the expected forty-five minutes, the exercise must be flexible enough to cater for this. If a key player is unavailable, a deputy must be prepared to step in.
For more information on how Dynamic Vault can help your organization with a comprehensive Business Continuity Plan, fill out the BCP Pre-Qualification Form.