Houston, we have a problem. Too many companies are fixated on the definitions of ITIL and problem management instead of actually getting value out of them. When you don't have a bona fide problem management policy, plus the training and skills to back it up, your team is likely to spend too much energy on ineffective activities. But there's hope. IT veteran John Custy will introduce the concepts you need to understand, dispel a few misconceptions, and explain the different problem management methodologies. He'll also cover the pros and cons of each methodology, and when to use each of them.
Making Problem Management Work for Your Organization
1. JOHN CUSTY • ITSM CONSULTANT • JPC GROUP • @ITSMNINJA
Problem Management:
Making It Work For Your Organization
2. John Custy
Service Management Practitioner,
Consultant and Educator
jpcgroup@outlook.com
•Ron Muns Lifetime Achievement Award
•IT Industry Legend – Cherwell Software
•Distinguished Professional in IT Service Management
•ITIL Expert and ITIL Accredited Trainer
•ISFS, ISMAS based on ISO/IEC 27002
•ISO/IEC 20000 Consultant
•DevOps Certified Instructor
•KCS Verified Consultant
•HDI Faculty & Certified Instructor
Twitter: @ITSMNinja
Facebook: John Custy
LinkedIn: johncusty
Copyright 2015 JPCGroup ‘Making Problem Management Work in Your Organization’
3. Why problem management?
Recover faster: Decrease time-to-resolution (MTRS, MTTR).
Meet availability: Ensure services meet the availability needs of the business.
Get insights: Reports are more than just outages. How much time is lost to recurring issues?
Increase value: Improve the availability of your IT services, reduce downtime and cut costs.
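The recovery and availability goals on this slide can be made concrete with two simple calculations. The sketch below is only an illustration: the function names, the outage figures and the 720 agreed service hours are hypothetical, and it assumes MTRS is computed as total downtime divided by the number of outages and availability as the share of agreed service time without downtime.

```python
def mtrs_hours(outage_hours):
    """Mean time to restore service: total downtime / number of outages."""
    if not outage_hours:
        raise ValueError("need at least one outage to compute MTRS")
    return sum(outage_hours) / len(outage_hours)


def availability_pct(agreed_service_hours, outage_hours):
    """Availability as a percentage of the agreed service time."""
    downtime = sum(outage_hours)
    return 100.0 * (agreed_service_hours - downtime) / agreed_service_hours


# Hypothetical month: 720 agreed service hours and three outages (in hours).
outages = [1.0, 2.0, 3.0]
print(f"MTRS: {mtrs_hours(outages):.2f} hours")
print(f"Availability: {availability_pct(720, outages):.2f}%")
```

Tracking both numbers per service over time shows whether problem management is actually shortening outages or only reducing how many are reported.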
4. The value of problem management
Decreased costs: Lower costs due to reduction of recurring incidents.
Reduced downtime: Customers experience less downtime due to increased IT service availability.
Improved productivity: Customers can be more productive due to improved service availability.
5. Problem Management:
Current state and challenges
9. Agenda
WHY DO PROBLEM MANAGEMENT
PROBLEM MANAGEMENT TECHNIQUES
NEXT STEPS
10. Problem Management:
1. Prevent problems and related incidents from happening
2. Eliminate recurring incidents
3. Minimize the impact of incidents that can’t be prevented
11. Key concepts:
[Diagram: incidents are linked to a problem, a problem to a known error, and a known error to a request for change]
Your service management tooling must support these relationships
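To make the relationships between incidents, problems, known errors and requests for change more tangible, here is a minimal data-model sketch. It is not taken from any particular service management tool: every class, field and identifier below is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestForChange:
    rfc_id: str
    summary: str


@dataclass
class KnownError:
    error_id: str
    root_cause: str
    workaround: str
    rfc: Optional[RequestForChange] = None  # change raised to remove the error


@dataclass
class Problem:
    problem_id: str
    incident_ids: List[str] = field(default_factory=list)  # many incidents, one problem
    known_error: Optional[KnownError] = None

    def link_incident(self, incident_id: str) -> None:
        """Attach an incident to this problem, ignoring duplicates."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)


# Hypothetical example: two incidents traced to one problem.
problem = Problem("PRB-1")
problem.link_incident("INC-101")
problem.link_incident("INC-102")
problem.known_error = KnownError(
    "KE-7", root_cause="Faulty network controller", workaround="Fail over to standby"
)
problem.known_error.rfc = RequestForChange("RFC-42", "Replace network controller")
print(problem)
```

Whatever the tool, the point is that each link can be followed in both directions: from an incident to the change that will remove its cause, and from a change back to the incidents that justified it.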
12. Problem Management Process Metrics
• Total # of problems recorded in the period
• Percentage of problems resolved within their targets
• # of problems that exceed their target resolution times
• % of problems that exceed their target resolution times
• # of major problems
• Backlog of outstanding problems
• Avg cost of handling a problem
• # of known errors added to the KEDB
• % of major problem reviews completed successfully & on time
• % accuracy of the KEDB
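Several of these process metrics can be derived directly from problem records. The sketch below is a rough illustration only: the `ProblemRecord` fields are hypothetical, resolution times are assumed to be in days, and the percentages are assumed to be taken over resolved problems (a different denominator may suit your reporting better).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemRecord:
    problem_id: str
    target_days: float                 # target resolution time (assumed unit: days)
    resolution_days: Optional[float]   # None while the problem is still open
    is_major: bool = False


def process_metrics(problems: List[ProblemRecord]) -> dict:
    resolved = [p for p in problems if p.resolution_days is not None]
    within = [p for p in resolved if p.resolution_days <= p.target_days]
    exceeded = [p for p in resolved if p.resolution_days > p.target_days]

    def pct(part):
        # Percentage of resolved problems, rounded to one decimal place.
        return round(100 * len(part) / len(resolved), 1) if resolved else 0.0

    return {
        "total_recorded": len(problems),
        "backlog": len(problems) - len(resolved),
        "pct_resolved_within_target": pct(within),
        "num_exceeding_target": len(exceeded),
        "pct_exceeding_target": pct(exceeded),
        "num_major": sum(p.is_major for p in problems),
    }


# Hypothetical records for one reporting period.
records = [
    ProblemRecord("PRB-1", target_days=10, resolution_days=8),
    ProblemRecord("PRB-2", target_days=10, resolution_days=15),
    ProblemRecord("PRB-3", target_days=5, resolution_days=None),
    ProblemRecord("PRB-4", target_days=5, resolution_days=5, is_major=True),
]
print(process_metrics(records))
```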
13. Problem Management Value Metrics
• Downtime eliminated (productivity improvements for the business and IT)
• Cost of problem management
• Confidence/Image/Perception
• Reduce stories about downtime
22. Problem Management Challenges:
1. Clear goals and objectives
2. Clear policies
3. Resources allocated to problem management
4. Roles clearly defined
5. Process relationships (incident, knowledge, change, release, deployment, financial and service levels)
6. Value communicated and understood
23. Problem Management
Techniques
24. Proven problem analysis techniques that have been shown to deliver positive results:
5 WHYS
BRAINSTORMING
PAIN VALUE ANALYSIS
CHRONOLOGICAL ANALYSIS
ISHIKAWA DIAGRAMS
PARETO ANALYSIS
KEPNER-TREGOE
25. Brainstorming:
•The most common type of problem analysis
•Relevant experts meet (physically or virtually)
•Participants share their ideas on the potential cause of the problem
•Sessions can be very constructive, but can also be time-consuming
•Sessions should be structured, with a moderator who:
• Documents the session
• Identifies actions
•Follow-up items are listed and assigned
26. Pain Value Analysis:
•This analysis is used when attempting to understand the impact of incidents/problems on the business.
•It is possible to design a formula to measure the level of pain using variables:
• # of users affected
• Length of downtime
• Timing of the downtime
• Cost to the business (user time, lost sales, penalties, etc.)
•The investigation may also bring up info to help diagnose, assess and ultimately correct the problem.
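The slide does not give a specific pain formula, so the following is just one possible sketch built from the variables listed. The function name, the `hourly_rate` default, the `timing_weight` multiplier and the example figures are all assumptions made for illustration; each organization would need to agree on its own weighting.

```python
def pain_value(users_affected, downtime_hours, timing_weight,
               other_costs=0.0, hourly_rate=50.0):
    """Rough pain score expressed in currency units (assumed formula).

    users_affected  number of users hit by the incident/problem
    downtime_hours  length of the downtime
    timing_weight   multiplier for when it happened (e.g. 1.0 off-peak,
                    higher during business-critical periods)
    other_costs     lost sales, penalties and similar direct costs
    hourly_rate     assumed cost of one hour of lost user time
    """
    lost_user_time = users_affected * downtime_hours * hourly_rate
    return timing_weight * (lost_user_time + other_costs)


# Hypothetical comparison of two problems.
month_end_outage = pain_value(200, 2.0, timing_weight=1.5, other_costs=10_000)
weekend_outage = pain_value(20, 4.0, timing_weight=1.0)
print(month_end_outage, weekend_outage)
```

Ranking open problems by a score like this gives a defensible order in which to spend investigation effort, even if the absolute numbers are only estimates.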
27. Chronological analysis:
Builds a timeline of what happened when (from event and/or incident data). The timeline can be used to identify cause-and-effect events and to check assumptions against the recorded events.
1.Develop a timeline
2.Document all events in chronological order
3.Determine which events triggered other events
4.Discount claims that are not supported by evidence
5.Correlate and identify root cause
6.Attempt recreation, if practical, to confirm root cause
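The first steps of a chronological analysis (collect events, drop unsupported claims, order what remains) can be sketched in a few lines. The `Event` fields and the sample events below are hypothetical; in practice the data would come from monitoring tools and incident records.

```python
from datetime import datetime
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str          # e.g. "monitoring", "service desk", "user report"
    description: str
    has_evidence: bool   # False for claims nobody could back up with data


def build_timeline(events: List[Event]) -> List[Event]:
    """Keep only evidence-backed events and sort them chronologically."""
    supported = [e for e in events if e.has_evidence]
    return sorted(supported, key=lambda e: e.timestamp)


# Hypothetical events gathered out of order from several sources.
events = [
    Event(datetime(2015, 3, 2, 9, 40), "service desk", "First user calls", True),
    Event(datetime(2015, 3, 2, 9, 5), "change log", "Config change deployed", True),
    Event(datetime(2015, 3, 2, 8, 0), "user report", "It was slow all morning", False),
    Event(datetime(2015, 3, 2, 9, 20), "monitoring", "Error rate alert", True),
]

for e in build_timeline(events):
    print(e.timestamp.strftime("%H:%M"), e.source, "-", e.description)
```

Determining which event triggered which, and confirming the root cause, still requires human judgment; the ordered, evidence-backed timeline is only the starting point.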
28. Chronological analysis:
[Figure: incident timeline along a time axis, with the stages Incident, Detection, Diagnosis, Repair, Recovery, Restore and Solved incident]
29. Ishikawa
Developed by Kaoru Ishikawa, this is a graphical technique that helps
identify all possible causes of an effect, such as a problem. It’s
sometimes called a “fishbone” diagram.
30. Ishikawa Diagram
[Figure: fishbone diagram. Four main causes, each branching into level 1 and level 2 causes, lead to the problem to be resolved (effect).]
31. Root Cause Analysis Flow Chart:
32. Pareto Analysis:
A statistical approach to problem-solving that focuses on the potential issues causing the greatest effect.
1.Build a table showing potential causes and their frequency (as a % of the total)
2.Sort the rows by importance (descending)
3.Plot causes (x-axis) against cumulative % (y-axis) and draw a line connecting the points (curve)
4.Plot a bar graph with causes on the x-axis
5.Draw a line at 80% of the y-axis (parallel to the x-axis)
6.Where the line and curve intersect, drop a line to the x-axis
7.You’ll see important causes to the left, trivial to the right
33. Pareto Analysis:
Cause | % of errors | Cumulative % of errors
Network controller | 35% | 35%
File corruption | 26% | 61%
Addressing conflicts | 19% | 80%
Server OS | 6% | 86%
Scripting error | 5% | 91%
Untested change | 3% | 94%
Operator error | 2% | 96%
Backup failure | 2% | 98%
Intrusion attempts | 1% | 99%
Disk failure | 1% | 100%
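The sorting, cumulative percentage and 80% cutoff described for Pareto analysis can be computed directly, as in this minimal sketch. It reuses the percentages from the table above as counts; the `pareto` function and its `cutoff` parameter are hypothetical names, and plotting is left out.

```python
def pareto(counts, cutoff=80.0):
    """Return (rows, important): rows of (cause, %, cumulative %) sorted by
    descending count, and the causes needed to reach the cutoff."""
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((cause, 100.0 * count / total, 100.0 * running / total))

    important = []
    for cause, _, cumulative in rows:
        important.append(cause)
        if cumulative >= cutoff:
            break
    return rows, important


# Error counts taken from the table (they already sum to 100).
errors = {
    "Network controller": 35, "File corruption": 26, "Addressing conflicts": 19,
    "Server OS": 6, "Scripting error": 5, "Untested change": 3,
    "Operator error": 2, "Backup failure": 2, "Intrusion attempts": 1,
    "Disk failure": 1,
}

rows, important = pareto(errors)
for cause, share, cumulative in rows:
    print(f"{cause:<22}{share:>5.0f}%{cumulative:>6.0f}%")
print("Important causes:", important)
```

With this data the first three causes reach the 80% line, which matches the split between important and trivial causes that the chart is meant to reveal.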
34. Pareto Analysis:
[Figure: Pareto chart of the causes in the preceding table. Bars show % of errors (axis 0% to 40%) and the line shows cumulative % (axis 0% to 120%). Causes to the left of the cutoff are labeled Important, those to the right Trivial.]
35. Kepner-Tregoe (KT) Analysis:
•A rational model that is well respected in business management circles. An important aspect of KT decision-making is the assessment and prioritization of risk.
•KT is not about finding a perfect solution, but rather the best possible choice, based on actually achieving the outcome with minimal negative consequences.
36. Four steps in Kepner-Tregoe Decision Making
Situational appraisal: Clarify the situation, outline concerns and choose a direction.
Problem analysis: Define the problem and determine root cause.
Decision analysis: Identify alternatives and analyze risk for each.
Potential problem analysis: Further scrutinize alternatives against potential problems and negative consequences to find the best.
37. Problem Management Techniques:

Chronological Analysis
Purpose: Useful for complex problems with conflicting reports about what happened
Pros:
• Can provide a timeline to help discover causes
• Does a good job of documenting what/when and where the event occurred
Cons:
• Oftentimes, little or no analysis occurs; events are just recorded
• Creative thinking is limited to a pre-defined set of questions
• Can produce data that requires investigation and doesn’t lead to clarity

Brainstorming
Purpose: Useful for generating ideas
Pros:
• Easy
• Reduces domination
• Prioritizes ideas
Cons:
• Process may appear too mechanical or rigid

Kepner-Tregoe
Purpose: Useful when there are many potential causes
Pros:
• Mature technique
• Detailed
• Well-documented
Cons:
• Can be time-consuming as you consider many possible causes

Ishikawa Diagrams
Purpose: Useful for identifying all probable causes
Pros:
• Act as a checklist of possible causes
• Work well with cross-functional teams
Cons:
• Difficult to create a list of causes that can account for all possible causes
• Identifying true root cause can be challenging

Pareto Analysis
Purpose: Useful for identifying the most important potential causes
Pros:
• Statistical approach to problem-solving, leaves a positive perception with stakeholders
• Intended to direct resources to most common causes
Cons:
• Limited by accuracy of the data used to create the histogram
• Best used as a tool to identify where to start your analysis

Five Whys
Purpose: Useful for identifying the root cause on minor problems
Pros:
• Simplest technique to use
• Identifies causal relationships
Cons:
• Limited to the knowledge and experience of the problem owner in determining root cause
• Not as useful for problems that require investigation by cross-functional teams

Fault Tree Analysis
Purpose: Useful for identifying links between possible causes
Pros:
• Works well to identify possible system or design failures
• Works well to identify causal relationships
• Helps to determine if certain causal relationships are probable
Cons:
• Limited to known failure rates of components
• Best used to support root cause analysis
38. Next Steps:
What do you need to do to improve your problem management process?
39. Next steps:
Assess current state: What’s your maturity state? Assess culture, people and tools in addition to process.
Clarify goals: Is it more available services? Improved productivity of staff and customers?
Sponsorship: Who is the champion for this initiative?
Clarify roles: What is your strategy? What are the process relationships and roles involved?
40. Successful problem-solving approach:
Quality:
• Problem-Solving Skill Transfer + Coaching
Adoption:
Alignment of:
• Processes and triggers
• Expectations > Consequences > Feedback Measurement
• Documentation and Knowledge Creation (software)
• Role modeling (leadership)
Results:
• Resolution time
• Cost per incident/problem
• CSAT