Cyber resilience: planning to bounce back

by Andrew Lenaghan

  1. Cyber resilience: planning to bounce back. Dr Andrew Lenaghan, OxCERT, 28/11/2017
  2. A Lenaghan, OxCERT. Thursday 9, 9:45-10:15. Cyber resilience: planning to bounce back. JISC Security Conference 2017, Manchester UK
  3. Q: What could hurt us?
  4. Possible scenarios (ref and description of example scenario):
     1. Permanent loss of 2+ staff members (death/dismissal/leaving) within 1 month.
     2. Contagious illness causes a shortage of staff, e.g. a new flu strain causes ½ of staff to be absent for >1 week.
     3. Training and holiday commitments cause a shortage of staff, e.g. 2 staff unavailable for >3 days.
     4. Fire guts the office; all equipment (laptops/computers/screens/phones/printers) is lost.
     5. Break-in leads to the theft of laptop and desktop computers from the OxCERT office.
     6. Loss of service: plumbing, heating, telephony, internet access (VoIP).
     7. Evacuation due to a gas leak: unexpected loss of access to offices (>4h). Offices undamaged.
     8. Unexpected short-term loss of mains power to a data centre (<2h). No damage to equipment.
     9. DDoS on JANET causes loss of internet connectivity for a prolonged period (>4h).
     10. Loss of fibre connectivity between data centres.
     11. Incident causing irrecoverable loss of equipment at a data centre, e.g. fire.
     12. Loss of mains power to OxCERT offices in Wellington Square (<2h).
     13. [Within Uni] Loss of VM in hosting.
     14. [With vendor] Disruption to AV signature distribution; mail and desktop AV cannot be updated.
     15. Component failure on the server acting as XEN (VM) host causes a crash and failure to restart.
     16. Cryptolocker-style compromise on NAS leads to data becoming irretrievable due to encryption.
     17. Rootkit infection of bastion host requires it to be isolated for investigation and rebuild.
     18. Police seizure of server for criminal investigation.
  5. Possible scenarios, grouped by resource impacted:
     • Lack of people: scenarios 1 to 3
     • Lack of access: scenarios 4 to 7
     • Lack of infrastructure: scenarios 8 to 12
     • 3rd party service: scenarios 13 and 14
     • Miscellaneous: scenarios 15 to 18
  6. Our outlook: guarded optimism. Hope for the best, plan for the worst.
  7. Artefacts & Audiences (diagram). Artefacts: Business Impact Assessment (BIA); Business Continuity Plan (BCP); Disaster Recovery Procedures; Backup arrangements; Potential Scenarios; Exercises. Labels: "Parameters", "Keeping running…", "Restarting from scratch". Audiences: Management, Operations, Engineering. Stages numbered 1 to 4.
  8. Principles (& dog food)
     ❖ Eating your own dog food (credibility): get our own house in order before we start laying down the law to others.
     ❖ Being open (& setting users' expectations): be transparent about the service levels we set, and be held to account by our users if we fall short.
     ❖ Building a predictable response: do the engineering, planning and testing to have confidence we can achieve the targets.
  9. CERT requirement: OxCERT must continue to operate even where there is significant damage to, or sustained hostile activity against, ourselves or the network infrastructure of the University we defend. Be resilient.
  10. Cyber resilience: is this new?
     • Traditional information security: assumes a stable environment and evolutionary change. Aim: deal effectively with known risks/threats.
       ❖ Best practice ❖ Lessons learned ❖ Risk averse
     • Cyber resilience (culture): assumes a turbulent environment and disruptive technologies, with step changes that are unknown/unpredictable. Aim: anticipate and adapt.
       ❖ Agility: ability to change ❖ Anticipating/forward looking ❖ Innovation/creativity to meet threats
  11. Cyber resilience: is this new? Traditional information security is about getting better; cyber resilience is about getting different.
  12. Business/Organisation Impact Assessment: it's not about how or why, or the likelihood of a failure; just focus on ‘if’.
  13. Artefacts & Audiences
  14. What did we need to think about?
     • The geographic locations OxCERT operates from
     • The services we offer and the relative priorities for recovering them
     • Dependencies:
       ❖ Stakeholders who depend on OxCERT
       ❖ External systems, services and vendors OxCERT depends on
     • Single points of failure in our infrastructure
     • Key person risks in the team
  15. The shape of a disaster (figure: service level, up to 100% of business as usual (BAU), plotted against time. Labels: last good backup; Recovery Point Objective (RPO); disaster strikes; response; downtime; Minimum Acceptable Service Level; Recovery Time Objective (RTO); Maximum Acceptable Outage (MAO); recovery achieved; full service restored; recovery failed.)
  16. The shape of a disaster (figure highlighting the Recovery Time Objective: disaster strikes, then downtime and response until recovery is achieved at the Minimum Acceptable Service Level.)
  17. The shape of a disaster (figure highlighting the Maximum Acceptable Outage: after disaster strikes and the response, recovery either succeeds, with full service restored, or fails.)
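The disaster-shape figures relate three targets to points on a timeline. As a minimal sketch, assuming the common readings (RPO bounds data lost since the last good backup, RTO bounds the time to reach the minimum acceptable service level, MAO bounds the time until full service is restored), an exercise or real incident can be checked like this. The class and function names, the 24 h RPO and all timestamps are hypothetical; only the 3-day RTO and 1-week MAO come from the BIA slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Targets:
    rpo: timedelta  # Recovery Point Objective: tolerable data loss before the disaster
    rto: timedelta  # Recovery Time Objective: time to reach the minimum acceptable service level
    mao: timedelta  # Maximum Acceptable Outage: time by which full service must be back


@dataclass
class Timeline:
    last_good_backup: datetime
    disaster_strikes: datetime
    minimum_service_restored: datetime
    full_service_restored: datetime


def assess(t: Timeline, targets: Targets) -> dict[str, bool]:
    """Check one (real or exercise) incident timeline against the recovery targets."""
    data_loss = t.disaster_strikes - t.last_good_backup
    time_to_minimum = t.minimum_service_restored - t.disaster_strikes
    time_to_full = t.full_service_restored - t.disaster_strikes
    return {
        "rpo_met": data_loss <= targets.rpo,
        "rto_met": time_to_minimum <= targets.rto,
        "mao_met": time_to_full <= targets.mao,
    }


# RTO and MAO for security incident response come from the one-page BIA;
# the 24 h RPO and every timestamp below are hypothetical.
targets = Targets(rpo=timedelta(hours=24), rto=timedelta(days=3), mao=timedelta(weeks=1))
strike = datetime(2017, 11, 28, 9, 0)
exercise = Timeline(
    last_good_backup=strike - timedelta(hours=30),
    disaster_strikes=strike,
    minimum_service_restored=strike + timedelta(days=2),
    full_service_restored=strike + timedelta(days=9),
)
print(assess(exercise, targets))
# {'rpo_met': False, 'rto_met': True, 'mao_met': False}
```

Here the RTO is met but the backup was too old and full service came back after the MAO, which is the "recovery failed" branch of the figure.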
  18. OxCERT BIA on one page (a Business Impact Assessment on a page):
     | Service | Relative priority | Recovery time objective (RTO) | Maximum acceptable outage (MAO) |
     |---|---|---|---|
     | Security incident response | 1 | 3 days | 1 week |
     | Network monitoring | 2 | 1 week | 2 weeks |
     | Advising and alerting (vulnerabilities) | 3 | 2 weeks | 2 months |
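Because the BIA fits on one page, it is also small enough to keep as data and sanity-check. The sketch below is illustrative, not OxCERT's tooling: it derives the recovery order from the priorities and applies two consistency checks that are assumptions rather than rules stated on the slide (each RTO should be shorter than its MAO, and a higher-priority service should not have a longer RTO than a lower-priority one). "2 months" is approximated as 60 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceTarget:
    name: str
    priority: int  # 1 = recover first
    rto: timedelta  # Recovery Time Objective
    mao: timedelta  # Maximum Acceptable Outage


# Transcribed from the one-page BIA; "2 months" is approximated as 60 days.
BIA = [
    ServiceTarget("Advising and alerting (vulnerabilities)", 3, timedelta(weeks=2), timedelta(days=60)),
    ServiceTarget("Security incident response", 1, timedelta(days=3), timedelta(weeks=1)),
    ServiceTarget("Network monitoring", 2, timedelta(weeks=1), timedelta(weeks=2)),
]


def validate(targets: list[ServiceTarget]) -> list[str]:
    """Return consistency problems; an empty list means none were found."""
    problems = []
    for t in targets:
        if t.rto >= t.mao:
            problems.append(f"{t.name}: RTO is not shorter than MAO")
    ordered = sorted(targets, key=lambda t: t.priority)
    for higher, lower in zip(ordered, ordered[1:]):
        if higher.rto > lower.rto:
            problems.append(f"{higher.name}: longer RTO than lower-priority {lower.name}")
    return problems


def recovery_order(targets: list[ServiceTarget]) -> list[str]:
    """Service names in the order they should be restored."""
    return [t.name for t in sorted(targets, key=lambda t: t.priority)]


print(validate(BIA))  # []
print(recovery_order(BIA))
```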
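  19. How service impact grows over time, e.g. the security incident response service (chart: impact level, from Marginal through Acceptable and High to Catastrophic, against outage duration at 2h, 4h, 8h, 24h, 48h, 1 week, 2 weeks and 1 month, with the Maximum Acceptable Outage (MAO) marked).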
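The chart's idea, that impact steps up as an outage lengthens, can be expressed as a lookup from outage duration to impact level. This is a hedged sketch: the level names come from the chart, the 1-week boundary is the MAO for security incident response from the BIA, and the 24 h and 48 h boundaries are hypothetical placeholders, since the plotted points did not survive.

```python
from bisect import bisect_left
from datetime import timedelta

# Impact levels from the chart, lowest first.
LEVELS = ["Marginal", "Acceptable", "High", "Catastrophic"]

# Upper bound (inclusive) of every level except the last. Only the final
# boundary, the 1-week MAO for security incident response, comes from the
# BIA; the 24 h and 48 h boundaries are hypothetical placeholders.
BOUNDARIES = [timedelta(hours=24), timedelta(hours=48), timedelta(weeks=1)]


def impact(outage: timedelta) -> str:
    """Map an outage duration to an impact level; anything past the MAO is Catastrophic."""
    return LEVELS[bisect_left(BOUNDARIES, outage)]


for hours in (2, 24, 36, 24 * 7, 24 * 14):
    print(f"{hours:>4} h -> {impact(timedelta(hours=hours))}")
```

Using `bisect_left` keeps an outage that lands exactly on a boundary in the lower level, so an outage of exactly one week is still within the MAO.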
  20. BIA reflections. Conducted in Q3/Q4 2016.
     ❖ Planned 9.5 days' effort: an underestimate.
     ❖ Biggest issue: capturing what we did in a structured way.
     Keep it simple: focus on identifying a few high-level services (divided these down into internal activities).
     Quick wins! The analysis helped us identify:
       • Single points of failure: firewall, Office VPN server
       • Key person risks: sysadmin skills
     Buy-in: targets were
       • reviewed by the team and management
       • signed off by the CISO
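Spotting single points of failure, one of the BIA's quick wins, amounts to asking which components have no alternative. A minimal sketch, assuming each service is described as a list of requirements where each requirement is a set of interchangeable components: any requirement with exactly one component marks that component as a single point of failure. Only the firewall and Office VPN server are named in the reflections; the other components and the whole dependency wiring are hypothetical.

```python
# Each service lists its requirements; each requirement is a set of
# interchangeable components, any one of which is enough to satisfy it.
# Hypothetical model: only "firewall" and "office VPN server" are named
# on the slide; the other components and all the wiring are illustrative.
DEPENDENCIES: dict[str, list[set[str]]] = {
    "Security incident response": [
        {"firewall"},
        {"office VPN server"},
        {"analysis VM 1", "analysis VM 2"},
    ],
    "Network monitoring": [
        {"firewall"},
        {"sensor A", "sensor B"},
    ],
}


def single_points_of_failure(deps: dict[str, list[set[str]]]) -> dict[str, set[str]]:
    """Map each component with no alternative to the services that depend on it."""
    spofs: dict[str, set[str]] = {}
    for service, requirements in deps.items():
        for alternatives in requirements:
            if len(alternatives) == 1:
                (component,) = alternatives
                spofs.setdefault(component, set()).add(service)
    return spofs


for component, services in sorted(single_points_of_failure(DEPENDENCIES).items()):
    print(f"{component}: {', '.join(sorted(services))}")
```

The same structure can cover key person risks by listing people with a given skill (for example, sysadmin skills) as the interchangeable members of a requirement.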
  21. Business Continuity Planning
  22. Artefacts & Audiences
  23. Recognise phase (flowchart: 1. Disaster occurs → 2. Perform an initial damage assessment → 3. Activate the plan? No: stop. Yes: continue.)
     | Phase | Objective |
     |---|---|
     | 1. Disaster occurrence | Safety of staff and visitors |
     | 2. Initial damage assessment | Develop an initial overview of the situation |
     | 3. Activating the plan | Decide whether to activate the plan based on the initial damage assessment of locations and systems |
  24. React phase (flowchart: 4. Form recovery team & designate coordinator → (5.) Relocate recovery team to alternate site & establish operations?)
     | Phase | Objective |
     |---|---|
     | 4. Form recovery team | Form the recovery team, designate a recovery coordinator |
     | (5.) Relocate to alternate site | Establish a working environment from which to conduct the recovery and resume services |
  25. Recover phase (flowchart: 6. Open an incident log & communicate to key staff & teams → 7. Incident coordination: execute specific recovery procedures → 8. Stand down the recovery team & transition back to normal operations)
     | Phase | Objective |
     |---|---|
     | 6. Open an incident log | Maintain a record of key milestones and decisions taken during the recovery process |
     | 6. External communication actions | Inform key staff and teams that recovery is underway |
     | 7. Incident coordination | Limit damage, prioritise performing recovery procedures, estimate recovery time |
     | 8. Standing down | Establish business as usual, inform key staff and teams |
  26. A Business Continuity Plan on a page (flowchart combining steps 1 to 8 across the Recognise, React and Recover phases)
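The eight-step plan is essentially a short control flow with two decisions: whether to activate the plan (step 3) and whether to relocate (the optional step 5). The sketch below walks that flow and returns the steps taken, tagged by phase, which could serve as a skeleton for the incident log. It is an illustration under assumptions, not OxCERT's procedure: the function name and the boolean inputs standing in for the two decisions are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    RECOGNISE = "Recognise"
    REACT = "React"
    RECOVER = "Recover"


def run_plan(activate: bool, relocate: bool) -> list[tuple[Phase, str]]:
    """Return the (phase, step) sequence for the given decisions at steps 3 and 5."""
    steps = [
        (Phase.RECOGNISE, "1. Disaster occurs: ensure safety of staff and visitors"),
        (Phase.RECOGNISE, "2. Initial damage assessment"),
    ]
    if not activate:
        steps.append((Phase.RECOGNISE, "3. Plan not activated: stop"))
        return steps
    steps.append((Phase.RECOGNISE, "3. Plan activated"))
    steps.append((Phase.REACT, "4. Form recovery team, designate coordinator"))
    if relocate:  # step 5 is optional (bracketed on the slide)
        steps.append((Phase.REACT, "5. Relocate to alternate site, establish operations"))
    steps += [
        (Phase.RECOVER, "6. Open incident log, inform key staff and teams"),
        (Phase.RECOVER, "7. Incident coordination: execute recovery procedures"),
        (Phase.RECOVER, "8. Stand down recovery team, return to business as usual"),
    ]
    return steps


for phase, step in run_plan(activate=True, relocate=False):
    print(f"{phase.value:<9} {step}")
```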
  27. How are we getting on?
  28. Climbing the BCP/DR maturity ladder
     | Level | Approach | Characteristics |
     |---|---|---|
     | 5 | Resilient | BCP/DR thinking integrated into processes; metrics & continuous improvement; audited/reported on to senior management |
     | 4 | Proactive | Documented and maintained recovery plan; exercises validate plan; importance recognised & resourced |
     | 3 | Prepared | Clear recovery procedures; established recovery targets (RPO/RTO); need recognised, coordinated action |
     | 2 | Reactive | Partial backups/fragmented approach; informal/undocumented plan/key person risk; need recognised but inconsistently enacted |
     | 1 | Ad hoc | No recovery plan; minimal or no backups; no buy-in |
  29. Climbing the BCP/DR maturity ladder: likelihood of OxCERT recovery at each level (the slide also marks OxCERT's "Start" and "End" positions on the ladder)
     | Level | OxCERT recovery |
     |---|---|
     | 5 Resilient | Confident/consistent |
     | 4 Proactive | Likely to meet targets |
     | 3 Prepared | Probable but vulnerable to surprises |
     | 2 Reactive | Possible |
     | 1 Ad hoc | Partial/unlucky |
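The ladder can double as a quick self-assessment. A minimal sketch, assuming the ladder is cumulative (a level counts only if every level below it is also fully met, an assumption not stated on the slides): it returns the highest level reached and the matching likelihood of recovery. The short criterion keys are paraphrases of the slide's characteristics, and the function name is hypothetical.

```python
# Levels and characteristics follow the maturity-ladder slides; the short
# criterion keys are paraphrases, and treating the ladder as cumulative
# (a level counts only if every level below it is also met) is an assumption.
LADDER = [
    (1, "Ad hoc", set(), "Partial / unlucky"),
    (2, "Reactive", {"some backups", "informal plan", "need recognised"}, "Possible"),
    (3, "Prepared", {"clear recovery procedures", "RPO/RTO targets", "coordinated action"},
     "Probable but vulnerable to surprises"),
    (4, "Proactive", {"documented, maintained plan", "exercises validate plan", "resourced"},
     "Likely to meet targets"),
    (5, "Resilient", {"integrated into processes", "metrics & improvement", "reported to senior management"},
     "Confident / consistent"),
]


def assess_maturity(achieved: set[str]) -> tuple[int, str, str]:
    """Return (level, approach, likely recovery) for a set of achieved criteria."""
    result = LADDER[0]
    for entry in LADDER[1:]:
        if not entry[2] <= achieved:  # stop at the first level not fully met
            break
        result = entry
    level, approach, _, recovery = result
    return level, approach, recovery


print(assess_maturity({"some backups", "informal plan", "need recognised"}))
```

Stopping at the first unmet level means a team cannot claim Level 5 practices (say, metrics) while skipping Level 4 exercises, which matches the ladder metaphor.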
  30. On to BCP exercises… "Everybody has a plan until they get punched in the mouth."
  31. Cyber resilience: planning to bounce back. Dr Andrew Lenaghan, OxCERT. JISC Security Conference 2017, Manchester UK, V.04
  32. Thank you. Dr Andrew Lenaghan (OxCERT), 28/11/2017. Cyber resilience: planning to bounce back. jisc.ac.uk
