This document provides an overview of business continuity planning (BCP). It discusses the key components of a BCP, including conducting a business impact analysis to understand critical business processes and their maximum tolerable downtimes. The document also covers developing resumption strategies, communicating and training on the BCP, and reviewing and updating the plan on an ongoing basis. The ultimate goal of a BCP is to minimize disruption to an organization and allow for the timely recovery of critical business functions in the event of a disaster or business interruption.
3. • Introducing Business Continuity (BC) and
Disaster Recovery (DR)
• Commencing Business Continuity
Lifecycle and Activities
• Defining Business Continuity Universe
• Conducting Business Impact Analysis
December 2014 BCP 3
4. • Defining Resumption Planning
• Communicating and Socializing BCP
• Training and Testing BCP
• Implementing and Monitoring BCP
• Reviewing and Updating BCP
• Post Test
• Wrapping-Up and Closing
6. How to continue doing business until recovery
is accomplished
How to restore core businesses operations
when disasters occur
Continuation of critical business processes
when a disaster destroys data processing
capabilities
Preparation, testing and maintenance of
specific actions to operate like normal
processing
7. Used to be just a data center
These days, it includes:
• Operational activities
• Personnel, networks, infrastructures
• All aspects of IT environment: policies,
processes, procedures, hardware,
software
8. Create, test, monitor, review and update a
plan that will:
• Allow timely resumption of critical business
operations
• Indirectly allow timely recovery of critical
business operations and furthermore non-
critical business operations (DR domain)
• Minimize loss (human safety and assets)
• Meet legal and regulatory requirements
9. According to The Institute of Internal
Auditors (IIA) www.theiia.org:
Availability as the main focus (critical
business processes)
Confidentiality of the company (tangible
and intangible assets)
Integrity of data and information
10. General Business
First responder:
Evacuation, fire, health…
Damage Assessment
Emergency Mgmt
Legal Affairs
Transportation/
Relocation/Coordination
(people, equipment)
Supplies
Salvage
Training
IT-Specific Functions
Software
Application
Emergency operations
Network recovery
Hardware
Database/Data Entry
Information Security
Contact information is
important!
11. It’s an on-going process, not a project with a
beginning and an end
• Creating, socializing, training, testing,
monitoring, controlling, reviewing and
updating
• “Critical” business functions may evolve
BCP team must include both business and IT
personnel
Requires support from top management and
executives
12. Focus | IT | Business
Event Resumption | Resumption Plan: Procedures to resume at secondary/temporary site | Resumption Plan: Procedures to resume business operations at secondary/temporary site
Event Resumption | IT Contingency Plan: Recovers major application or system | Emergency Response Plan: Protect life and assets during physical threat
Event Resumption | Cyber Incident Response Plan: Malicious cyber incident | Crisis Communication Plan: Provide status reports to public and personnel
Business Continuity | | Business Continuity Plan; Continuity of Operations Plan: Longer duration outages
13. Imagine an organization:
Bank with 50 million accounts, social security
numbers, credit cards, loans…
Airline serving 60,000 people on 300 flights
daily…
Pharmacy system filling 15 million
prescriptions per year, some of the
prescriptions are life-saving…
Factory with 2000 employees producing
500,000 products per day using robots…
14. Imagine a failure like
Production server failure
Transaction Disk System
failure
Hacker break-in
Extended power failure
Tsunami
Spyware
Malevolent virus or worm
Earthquake, tornado
Employee error or revenge
How will this affect each
business?
15. • Should be oriented
towards recovering
AFTER the
DISASTER.
• Focus more on how
organizations can fully
restore all of their
business processes to
their normal level.
16. • Pre-incident readiness
• Evacuation procedures
• Identifying persons in charge, contact
information (SW and HW vendors, insurance,
recovery facilities, suppliers, offsite media,
human relations, law enforcement)
• Step-by-step procedures
• Required resources for recovery operations
22. Interruption Window: Length of time the organization can wait
between point of failure and service resumption
Service Delivery Objective (SDO): Level of service in Alternate
Mode
Maximum Tolerable Outage (MTO): Maximum time in Alternate Mode,
during which the BCP is in effect
[Figure: timeline of service level. Labels: Regular Service,
Interruption, Interruption Window, Alternate Mode (at SDO),
Maximum Tolerable Outage (MTO), DRP is Implemented, DRP succeeds,
Regular Service. Axis: Time]
24. Work Area
Business Units
Suppliers Customers
Processes
Control Centre
Recovery
Teams
Objectives
Computer Centre
INFORMATION
TECHNOLOGY
• Computer Equipment
• Communications
• Operating Systems
• Applications
DATA STORAGE
• Back Up
• Mirroring
25. Evacuation plan: People’s LIVES always take
FIRST priority
Disaster declaration: Who, how, for what?
Responsibility: Who covers necessary disaster
recovery functions
Procedures for Business Continuity
Procedures for Alternate Mode operation
Resource Allocation: During recovery &
continued operation
Copies of the plan should be off-site
26. • Processes that establish a secure and
resilient business environment capable of
mounting an immediate and effective
response to major incidents.
• It safeguards the interests of key
stakeholders and the reputation/credibility
and brand of the organization.
27. • According to Business Continuity Institute
(BCI) and PAS 56 [1]
holistic management processes
identifies potential impacts
framework for resilience and response
capability
safeguard interests of key stakeholders
[1] PAS 56, Guide to Business Continuity Management, is a Publicly Available Specification developed through the British Standards Institution.
28. • It’s more than just a
document and a paper
plan.
• It requires planning,
assessment, analysis,
communication,
socialization, training,
rehearsal and more.
30. Identify overall strategic
objectives, goals, and
activities; identify
stakeholders, business
processes, products and
services
Analyse financial and
non-financial business
impacts resulting from
disruption of business
processes (BIA); identify
business-critical
processes; identify gaps
in recovery capability;
develop prioritised
recovery timeline.
Design recovery strategies providing practical, cost-effective
solutions to close the gaps; design organisational structure to
implement strategic objectives to respond to major incidents.
Develop BCP in line
with agreed strategies;
embed BCM within
culture of the
organisation.
Measure results through
auditing, exercising,
maintenance and
training. Support
continuous improvement
through constructive
feedback.
BCM program management – driven top-down by
executive management ensuring ownership and
establishing policy. Managed at corporate/operational and
operational/facility levels.
31. Disaster
Recovery
Emergency
Response
Crisis
Management
Business
Recovery
• Initial control of
emergency
situation
• Blue light services
– safeguarding
human life
• Stabilizing, security,
damage
assessment
• Crisis
communications –
internal and
external
• Co-ordination of
service recovery
efforts
• Phased recovery of business-critical
processes
• Recovery of infrastructure and services
• Returning to “business as normal”
33. • Aimed at establishing
a capability to protect
people and business
• More than an
organization chart or
paper plan
• Requires planning,
training, communicating
and more
34. Why?
• Safeguard employees, visitors, and public
• Protect physical assets (buildings and
equipment)
• Minimise damage and business impact
• Avoid environmental contamination
• Protect reputation and image
• Ensure regulatory compliance
• Good corporate or enterprise governance
35. Crisis management reduces negative impact and speeds
recovery from all kinds of corporate crisis
Without crisis management: damage to reputation, financial
results, and key relationships; lost time/productivity
[Figure: negative impact over time following a crisis event,
with and without crisis management]
39. ① Initiate Project Management
② Conduct Business Impact Analysis (BIA)
③ Define Resumption (and Recovery)
Strategies
④ Plan, Communicate and Socialize
⑤ Train and Test
⑥ Implement and Monitor
⑦ Review and Update
40. Establish need (through business case)
Get management support
Establish team (functional, technical, and
Business Continuity Coordinator)
Create work plan (scope, goals, objectives,
methods, timeline)
Initial report to management
Obtain management approval to proceed
41. If the need isn’t established, there will be no
management support
Be aware that a BCP costs money to develop and
maintain, and offers no ROI
Functional leads are necessary because IT doesn’t
understand the businesses comprehensively
The Business Continuity Coordinator (BCC) is the Project
Manager for initiating the BCP
The work plan resembles the phases of a traditionally
managed project
42. Business Processes and Analysis
Identification of business processes and
their interrelationships
Prioritization of business processes
based on downtime tolerance
Resource needs (must be shifted during
crisis)
43. Which business processes are of strategic
importance?
What disasters could occur?
What impact would they have on the
organization financially? Legally? On human
life? On credibility?
What is the required recovery time period?
Methods: questionnaire, observation,
interviews, or meeting with key users
44. Goal
Obtain formal agreement with senior
management on the MTD for each time-critical
business resource
MTD is maximum tolerable downtime,
a.k.a. Maximum Allowable Outage (MAO)
A lot of BCP development is driven by the
MTDs assigned to various business
functions
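Because the MTDs drive so much of the plan, one simple use is to order recovery by MTD. The following is a minimal Python sketch, assuming hypothetical process names and MTD values in hours; neither comes from the slides.

```python
def recovery_order(mtd_hours: dict[str, float]) -> list[str]:
    """Order processes so that the shortest MTD is recovered first.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(mtd_hours, key=lambda name: (mtd_hours[name], name))


# Hypothetical MTDs agreed with senior management, in hours.
example_mtd = {"payroll": 72, "order entry": 4, "web service": 1}
print(recovery_order(example_mtd))
```

Sorting on the agreed MTD keeps the recovery priority tied to what management signed off on rather than to ad hoc judgment during a crisis.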
45. Quantifies losses due to business outage
Opportunity cost
Recovery cost
Customer satisfaction
Legal charge
Does not estimate probability of kinds of
incidents, only quantifies the consequences
46. The question is not:
“How likely is it we’ll suffer a total loss of our
data center from a fire?”
The question should be:
“What would be the loss to the business if
we suffered the total loss of our data
center?”
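The consequence-only view above can be sketched as a small calculation. This is an illustrative Python sketch, not a standard BIA formula: the class name, the four cost fields (taken from the loss categories on the previous slide) and all figures are assumptions.

```python
from dataclasses import dataclass


@dataclass
class OutageImpact:
    """Consequences of an outage; all figures are hypothetical."""
    opportunity_cost_per_hour: float   # lost business per hour of outage
    recovery_cost: float               # one-off cost to restore service
    customer_satisfaction_cost: float  # estimated cost of unhappy customers
    legal_charge: float                # penalties or legal fees

    def total_loss(self, outage_hours: float) -> float:
        """Loss for an outage of the given length; no probability is involved."""
        return (self.opportunity_cost_per_hour * outage_hours
                + self.recovery_cost
                + self.customer_satisfaction_cost
                + self.legal_charge)


# Hypothetical figures for the total loss of a data center.
data_center_loss = OutageImpact(1000.0, 5000.0, 2000.0, 3000.0)
print(data_center_loss.total_loss(outage_hours=8))
```

Note that nothing in the calculation asks how likely the fire is; it only answers what the outage would cost.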
47. When a disaster occurs, the highest
priority is:
1. Ensuring everyone is safe
2. Minimizing data loss by saving important
data
3. Recovery of backup tapes
4. Calling a manager
51. Common perspectives
Fire, flood, hurricane, tornado, earthquake,
volcanoes.
Plane crashes, vandalism, terrorism, riots,
sabotage, loss of key personnel.
Anything that destroys or diminishes normal
data processing activities.
52. Business perspectives
If it harms critical business processes, it
may be a disaster.
Time-based definition – how long can the
business stand the pain?
Probability of occurrence.
Business impacts of the events.
53. Besides financial contribution, classify processes by criticality:
Critical: Cannot be performed manually.
Tolerance to interruption is very low
Vital: Can be performed manually for very short
time
Sensitive: Can be performed manually for a
period of time, but may cost more in staff
Non-sensitive: Can be performed manually for
an extended period of time with little additional
cost and minimal recovery effort
54. Corporate
Sales (1) Shipping (2) Engineering (3)
Web Service (1) Sales Calls (2)
Product A (1)
Product B (2)
Product C (3)
Product A (1)
Orders (1)
Inventory (2)
Product B (2)
55. Negligible: No significant cost or damage
Minor: A non-negligible event with no material or
financial impact on the business
Major: Impacts one or more departments and may
impact outside clients
Crisis: Has a major material or financial impact on
the business
Minor, major and crisis events should be
documented and tracked through recovery
56. Problematic Event or Incident | Affected Business Process(es) (assuming a university) | Impact Classification & Effect on finances, legal liability, human life, reputation
Fire | Classrooms, business departments | Crisis, at times Major; human life
Hacking attack | Registration, advising | Major; legal liability
Network unavailable | Registration, advising, classes, homework, education | Crisis
Social engineering, fraud | Registration | Major; legal liability
Server failure (disk/server) | Registration, advising, classes, homework, education | Major, at times Crisis
58. Interruption Window: Length of time the organization can wait
between point of failure and service resumption
Service Delivery Objective (SDO): Level of service in Alternate
Mode
Maximum Tolerable Outage (MTO): Maximum time in Alternate Mode,
during which the BCP is in effect
[Figure: timeline of service level. Labels: Regular Service,
Point of Failure, Interruption Window, Alternate Mode (at SDO),
Maximum Tolerable Outage (MTO), DRP is Implemented, DRP succeeds,
Regular Service. Axis: Time]
59. Case Study
A documented process where one
determines the most crucial IT operations
from the business perspective
1. Business Continuity Plan
2. Disaster Recovery Plan
3. Resumption Plan
4. Business Impact Analysis
61. ① Choose information gathering methods
(questionnaire, interviews, observation,
focus group discussion (FGD))
② Identify personnel as SMEs, policies and
procedures to gather necessary
information
③ Assess and analyze data and information
62. ④ Define and agree on BC Universe
(Disaster, Criticality and Impact)
⑤ Define and agree on criticality of
business processes
⑥ Agree and assign IW, SDO and MTO
⑦ Obtain management approval
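Once IW, SDO and MTO have been agreed for a service, they can be checked against what actually happened in a test or incident. The Python sketch below is only an illustration: expressing the SDO as a percentage of normal service, the class and function names, and the sample figures are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServiceObjectives:
    interruption_window_hours: float  # IW: longest wait before service resumes
    sdo_percent: float                # SDO: minimum service level in Alternate Mode
    mto_hours: float                  # MTO: longest time allowed in Alternate Mode


def find_breaches(objectives: ServiceObjectives,
                  hours_to_resume: float,
                  alternate_level_percent: float,
                  hours_in_alternate: float) -> list[str]:
    """Return the objectives ("IW", "SDO", "MTO") that were not met."""
    breaches = []
    if hours_to_resume > objectives.interruption_window_hours:
        breaches.append("IW")
    if alternate_level_percent < objectives.sdo_percent:
        breaches.append("SDO")
    if hours_in_alternate > objectives.mto_hours:
        breaches.append("MTO")
    return breaches


# Hypothetical objectives for a registration service.
registration = ServiceObjectives(interruption_window_hours=4,
                                 sdo_percent=60,
                                 mto_hours=72)
print(find_breaches(registration, hours_to_resume=6,
                    alternate_level_percent=70, hours_in_alternate=48))
```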
63. Some notes to ponder:
• Selection of interviewees is very important.
They should be Subject Matter Experts from
business units who have known the business for
quite some time.
• Customize questionnaire: there is no standard
set of questions as it varies with each business
• Time-criticality: some processes are more
critical than others. Example: printing payroll
is important, but usually not time-critical
66. Recovery strategies are based on IW,
SDO and MTO
Management-approved (resources to
implement).
Predefined: we don’t have to make it up
as we go along.
We shall have a documented, tested
plan in place.
67. Different technical strategies
Different costs and benefits
How to choose?
Do Cost-Benefit Analysis (CBA or
Benefit Cost Ratio) carefully
Driven by business requirements (BIA)
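One way to let the BIA drive the choice is to keep only the strategies that meet the required recovery time and then pick the cheapest. This Python sketch is a simplified selection rule, not a full cost-benefit analysis; the strategy names, recovery times and annual costs are hypothetical.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    recovery_hours: float  # time needed to resume service with this strategy
    annual_cost: float     # hypothetical yearly cost of keeping it ready


def cheapest_meeting(strategies: list[Strategy],
                     required_hours: float) -> Optional[Strategy]:
    """Return the cheapest strategy that recovers within required_hours."""
    feasible = [s for s in strategies if s.recovery_hours <= required_hours]
    if not feasible:
        return None  # no option satisfies the business requirement
    return min(feasible, key=lambda s: s.annual_cost)


# Hypothetical options and costs.
options = [
    Strategy("hot site", recovery_hours=4, annual_cost=500_000),
    Strategy("warm site", recovery_hours=72, annual_cost=150_000),
    Strategy("cold site", recovery_hours=336, annual_cost=40_000),
]
choice = cheapest_meeting(options, required_hours=96)
print(choice.name if choice else "no feasible strategy")
```

A `None` result is itself useful: it signals that either the budgeted options or the agreed recovery time has to change.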
68. Should address the resumption of:
• Business operations
• Facilities and supplies
• Users (employees, customers and
other stakeholders)
• Network and Data Center (technical)
• Data (off-site backups of data and
applications)
69. Scope of Work
• Data Center (obviously)
• Network connectivity
• Telecommunications (such as PABX,
Fax machine)
• Electrical instruments (if any)
71. Be mindful of:
• What IT and other requirements are
necessary to support business operations?
• Facilities and supplies: Where do we sit at
the secondary/temporary site? Where is our
working space?
• Users: Can manual processes be used as
part of DR? If so, how does the manual
processing get integrated back into the
electronic processing later? Do we need
housing, or transportation?
72. Be mindful of:
• Recovery of data centers and networks is
an obvious necessity requiring careful
planning.
• You mean we’ve got all these computers in
our main office/site and no data? Who
forgot about the data?
74. Subscription Service Sites:
• Hot
Fully equipped, expensive, testable
• Warm
Missing key components, not testable
• Cold
Empty data center, slower to recover,
cheaper
77. Mutual aid agreements example:
Company A and its sister company, Company B,
agree to help each other in a disaster.
But they probably don’t have identical hardware
and software at both corporations.
Most likely neither institution has excess capacity
to make available in a DR scenario.
Does this sound like it’s going to work? Probably
better than nothing, though.
78. Redundant processing centers
• Expensive
• Maybe not enough spare capacity for
critical operations
• In detail:
Think of load balanced redundant sites, for
instance. Operations are going OK, but are
both sites running at less than 50% capacity?
Can site A handle the load if site B goes
down?
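The capacity question can be checked with simple arithmetic: after losing one site, the remaining capacity must cover the total load. A minimal Python sketch follows, assuming load and capacity are measured in the same hypothetical unit (for example, transactions per second); the function name and figures are illustrative.

```python
def can_absorb_failure(load: dict[str, float],
                       capacity: dict[str, float]) -> dict[str, bool]:
    """For each site, report whether the other sites can carry the
    total load if that site goes down."""
    total_load = sum(load.values())
    result = {}
    for failed_site in capacity:
        surviving_capacity = sum(c for site, c in capacity.items()
                                 if site != failed_site)
        result[failed_site] = total_load <= surviving_capacity
    return result


# Hypothetical two-site setup, each site rated for 100 units.
capacity = {"A": 100, "B": 100}
print(can_absorb_failure({"A": 40, "B": 45}, capacity))  # both under 50%
print(can_absorb_failure({"A": 60, "B": 55}, capacity))  # both over 50%
```

With two equal sites this reduces to the rule of thumb in the slide: both must run below 50% for either one to take over alone.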
79. Service Providers
• Many clients share facilities
• Almost as expensive as a hot site
• Must negotiate agreements with other
clients
• Usually run at 100% capacity
• If a client transfers operations to the SB as
part of DR, the other clients take a hit
in diminished processing capacity
80. Data
• Backups of data and applications
• Off-site vs. On-site storage of media
• How fast can data be recovered?
• How much data can you lose?
• Security of off-site backup media
• Types of backups (full, incremental,
differential, etc.)
81. Redundancy
Includes:
Routing protocols
Fail-over
Multiple paths
Alternative Routing
>1 Medium or
> 1 network provider
Diverse Routing
Multiple paths,
1 medium type
Last-mile circuit protection
E.g., Local: microwave & cable
Long-haul network diversity
Redundant network providers
Voice Recovery
Voice communication backup
82. • Full: exactly what it sounds like: everything.
• Incremental: Files changed since
*previous* backup, which might be a full
backup or an incremental. Long recovery.
• Differential: All files changed since
previous full backup – quicker recovery.
• Continuous: Like a journaling file system.
Geographically separated systems are
kept up to date in real time.
83. Daily Events | Full | Differential | Incremental
Monday: Full Backup | Monday | Monday | Monday
Tuesday: A Changes | Tuesday | Saves A | Saves A
Wednesday: B Changes | Wednesday | Saves A + B | Saves B
Thursday: C Changes | Thursday | Saves A + B + C | Saves C
Friday: Full Backup | Friday | Friday | Friday
If a failure occurs on Thursday, what needs to be
reloaded for Full, Differential, Incremental?
Which methods take longer to backup? To
reload?
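The reload question can be worked through in code. This Python sketch assumes the failure happens after that day's backup has completed and that the week starts with Monday's full backup; the function name and labels are illustrative.

```python
# Days covered by the schedule above, starting with Monday's full backup.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def restore_set(method: str, failure_day: str) -> list[str]:
    """Return the backups to reload, oldest first, after a failure.

    Assumes the backup for failure_day finished before the failure.
    """
    day_index = DAYS.index(failure_day)
    if method == "full":
        # A full backup every day: only the latest one is needed.
        return [f"{failure_day} full"]
    if method not in ("differential", "incremental"):
        raise ValueError(f"unknown backup method: {method}")
    if day_index == 0:
        return ["Monday full"]
    if method == "differential":
        # The last full backup plus the most recent differential.
        return ["Monday full", f"{failure_day} differential"]
    # Incremental: the last full backup plus every incremental since.
    return ["Monday full"] + [f"{day} incremental"
                              for day in DAYS[1:day_index + 1]]


for method in ("full", "differential", "incremental"):
    print(method, restore_set(method, "Thursday"))
```

The length of each list mirrors the trade-off: incremental backups are the quickest to take but need the most media to reload, while full backups are the reverse.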
84. Grandfather: Dec ‘13, Jan ‘14, Feb ‘14, Mar ‘14, Apr ‘14
Father: April 30, May 6, May 13, May 20
Son: May 21, May 22, May 23, May 24, May 25, May 26, May 27
[Figure: each backup graduates to the next generation]
Frequency of backup = daily, 3 generations
85. Backups are kept off-site (one or more)
Off-site is sufficiently far away (disaster-
redundant)
Library is as secure as the main site, and is unlabelled
Library has constant environmental control
(humidity, temperature-controlled, UPS, smoke/
water detectors, fire extinguishers)
Detailed inventory of storage media and files is
maintained
86. Data Set Name = Master Inventory
Volume Serial # = 14.1.24.10
Date Created = Jan 24, 2014
Accounting Period = 3W-1Q-2014
Offsite Storage Bin # = Jan 2014
Backup could be disk…
87. • Hot sites
• Ready to run (power, HVAC, computers):
Just add data.
• Considerations: Rapid readiness vs high
cost.
• Cold sites
• Building facilities, power, HVAC,
communication to outside world only.
• No computer equipment.
• Might require too long to get operating.
88. • Site sharing
• Site sharing with a firm’s sites: problem of
equipment compatibility and data synchronization.
• Site sharing across firms: potential problem of
prioritization, sensitive actions.
• Hosting
• Provider runs production server at their site.
• Will continue production server operation if user
firm’s site fails.
• If hosting site goes down, there have to be
contingencies.
89. • RAID: Local disk redundancy
• Fault-Tolerant Server: When primary server
fails, backup server resumes service.
– Distributed Processing: Distributes load over
multiple servers. If server fails, remaining server(s)
attempt to carry the full load.
• Storage Area Network (SAN): disk network
supports remote backups, data sharing and data
migration between different geographical
locations
90. Hot Site: Fully configured, ready to operate within hours
Warm Site: Ready to operate within days: no or low power
main computer. Does contain disks, network, peripherals.
Cold Site: Ready to operate within weeks. Contains
electrical wiring, air conditioning, flooring
Duplicate or Redundant Info. Processing Facility:
Standby hot site within the organization
Reciprocal Agreement with another organization or
division
Mobile Site: Fully- or partially-configured trailer comes to
your site, with microwave or satellite communications
91. Costs include basic subscription, monthly fee,
testing charges, activation costs, and hourly/
daily use charges
Issues include other subscriber access,
speed of access, configurations, staff
assistance, audit & test
For emergency use, not long term
May offer warm or cold site for extended
durations
92. Advantage: Low cost
Problems may include:
Quick access
Compatibility (computer, software, …)
Resource availability: computer, network, staff
Priority of visitor
Security (less a problem if same organization)
Testing required
Susceptibility to same disasters
Length of welcomed stay
94. • Set up a policy, procedure and system to
communicate and socialize it through
multiple channels.
• Get buy-in from stakeholders at all levels.
• Monitor and assess the progress.
• Drive stakeholder commitment to the
change.
95. • Communicated to the appropriate
members of staff, and kept up to date.
• Equipment, positions, and processes
change; the documentation needs to
reflect this.
• People change positions, and new people
join companies; they need to know what to
do in the event of a disaster.
97. Must be on an ongoing basis
Needs to be part of the standard on-
boarding (at least the orientation)
Needs to be part of corporate culture
(employee handbook, public area,
visitor’s guide for instance)
98. Few points to ponder:
• How often do disasters occur?
• How good are people at executing
procedures that they don’t use very often?
• How do you ensure something is part of
the corporate culture when it’s designed to
deal with an event we hope never
happens?
99. Verify that existing plans will result in successful
recovery of infrastructure & business
processes
Identify gaps or errors
Verify assumptions
Test time lines
Train and coordinate staff
100. Pre-Test: Set the Stage
• Set-up equipment
• Prepare staff
Test: Actual test
Post Test: Cleanup
• Returning resources
• Calculate metrics: Time required,
% success rate in processing,
ratio of successful transactions in
alternate mode vs. normal mode
• Delete test data
• Evaluate plan
• Implement improvements
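The post-test metrics listed above are simple ratios. The Python sketch below shows one way to compute them; the function names and the transaction counts are hypothetical.

```python
def success_rate_percent(successful: int, attempted: int) -> float:
    """Percentage of attempted transactions that were processed successfully."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * successful / attempted


def alternate_to_normal_ratio(alternate_successful: int,
                              normal_successful: int) -> float:
    """Successful transactions in alternate mode relative to normal mode,
    measured over periods of equal length."""
    if normal_successful <= 0:
        raise ValueError("normal_successful must be positive")
    return alternate_successful / normal_successful


# Hypothetical counts from a one-hour test window.
print(success_rate_percent(successful=450, attempted=500))
print(alternate_to_normal_ratio(alternate_successful=450, normal_successful=600))
```

Recording these figures after every test makes it possible to see whether the plan is improving from one exercise to the next.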
101. Tests start simple and
become more challenging
with progress
Include an independent
third party (e.g., an auditor) to
observe the test
Retain documentation for
audit reviews
Test cycle: Develop test objectives → Execute test →
Evaluate test → Develop recommendations to improve test
effectiveness → Follow up to ensure recommendations are
implemented
102. Until it’s tested, we actually don’t have a
plan yet
Types of testing
• Structured walk-through
• Checklist
• Simulation
• Parallel
• Full interruption
103. Structured Walk-Through: step-by-step
review of the BCP by functional reps who
meet together. No one is actually walking
anywhere.
Checklist: similar to SWT but checklists are
distributed to business units, who review the
checklists individually.
Simulation: kind of like “war games” in which
simulation stops at point where equipment
would be relocated.
104. Parallel: DR site is put into full operation
without taking down the primary. Results
compared between the two
Full interruption: Full-scale test of BCP by
a planned fail-over to the secondary site
and fail-back to the primary. Risky.
Side Note: more than one kind of test may
be useful. For instance, a simulation and a
parallel test complement each other.
105. The first and most important BCP test is the:
1. Fully operational test
2. Preparedness test
3. Security test
4. Desk-based paper test
106. The PRIMARY goal of the Post-Test is:
1. Write a report for audit purposes
2. Return to normal processing
3. Evaluate test effectiveness and update
the response plan
4. Report on test to management
111. • Comparing Current Level with Desired
Level
• Which processes need to be improved?
• Where is staff or equipment lacking?
• Where does additional coordination need
to occur?
112. Fix problems found in testing
Implement change management
Audit and address audit findings (internal
or external auditors)
Annual review of plan
Build plan into organization
113. Things to assess and review:
Is BIA complete with IW, SDO, MTO
defined for all services?
Is the BCP in-line with business goals,
effective, and current?
Is it clear who does what in the BCP and
DRP?
Is everyone trained, competent, and happy
with their jobs?
114. Is the DRP detailed, maintained, and
tested?
Are the BCP and DRP consistent in their
recovery coverage?
Are people listed in the BCP/phone tree
current and do they have a copy of BC
manual?
Are the backup/recovery procedures being
followed?
115. Does the hot site have correct copies of all
software?
Is the backup site maintained to
expectations, and are the expectations
effective?
Were the BCP and DRP tests documented
well, and were the BCP and DRP updated?
116. During an audit of the business continuity
plan, the finding of MOST concern is:
1. The phone tree has not been double-
checked in 6 months.
2. BIA has not been updated regularly.
3. A test of the backup-recovery system is
not performed regularly.
4. The backup library site lacks a UPS.
117. • RAID
• Backups: Incremental, differential backup
• Networks: Diverse routing, alternative routing
• Alternative Site: Hot site, warm site, cold site,
reciprocal agreement, mobile site
• Testing: checklist, structured walkthrough,
simulation, parallel, full interruption
• Insurance
118. Coverage areas: IPF & Equipment; Data & Media; Employee Damage
Business Interruption: Loss of profit due to IS interruption
Valuable Papers & Records: Covers cash value of lost/damaged paper & records
Fidelity Coverage: Loss from dishonest employees
Extra Expense: Extra cost of operation following IPF damage
Media Reconstruction: Cost of reproduction of media
Errors & Omissions: Liability for error resulting in loss to client
IS Equipment & Facilities: Loss of IPF & equipment due to damage
Media Transportation: Loss of data during transport
IPF = Information Processing Facility