Delivering New Visibility and Analytics for IT Operations
1.
2. Please give me your feedback
– Use the mobile app to complete a session survey:
1. Access “My schedule”
2. Click on the session detail page
3. Scroll down to “Rate & Review”
– If the session is not on your schedule, find it via the Discover app’s “Session Schedule” menu, click on this session, and scroll down to “Rate & Review”
– If you have not downloaded our event app, please go to your phone’s app store and search for “Discover 2016 Las Vegas”
– Thank you for providing your feedback, which helps us enhance content for future events.
Session ID: TB8568 Speakers: Gary Brandt and Ronnie Falgout
#HPEDiscover
3. Fast-track the value of central IT with HPE Operations Analytics
Ronnie Falgout, IT Delivery Manager
Gary Brandt, OpsA Product Manager
@HPE_Discover
June 2016
4. Speaker biographies
Gary Brandt
Hewlett Packard Enterprise Software
gary.brandt@hpe.com
– Years in IT: 17
– Previous industry experience: 4
– Domain knowledge
– Operational Analytics
– IT Operations Management
– Enterprise Architecture
Ronnie Falgout
GIT Global Delivery
ronnie.falgout@hpe.com
– Years in IT: 16
– Previous industry experience: 26
– Domain knowledge
– IT Operations Management
– IT Automation and Service Management
– Operations Analytics
5. Outside forces are disrupting businesses and government
The Idea Economy
– Internet of Things: explosion of devices
– New, disruptive business models
– Cloud is redefining how applications and devices are written and delivered
– No business, industry or government is safe
– Turning ideas into new products or services has never been easier
6. Inside forces are pushing you to evolve IT
– Shadow IT is everywhere
– Technology is business strategy
– Developers are the new kingmakers
– DevOps is driving culture shifts
7. Technology architectures are rapidly shifting
– Distributed compute: distributed systems and containers
– Distributed data: data locality and latency
– Management and governance: multi-cloud brokerage
– DevOps speed: continuous delivery
– Analytics and visualization: real-time and predictive
Layers: Backend | Frontend | Devices | Humans | App-to-app
8. IT must bridge the traditional and new
Traditional IT | Digital enterprise
– Provide hardened systems and networks
– Manage and mitigate risk
– Efficiently host workloads and services
– Continuously create and deliver new services
– Store and manage data
– Software automates business systems
– Software differentiates products and services
– Provide real-time insight and understanding
The right balance between traditional and digital
9. The customer transformation journey
Automate, orchestrate and transform
From traditional IT to digital enterprise
– Automate tasks
– Orchestrate processes
– Transform delivery
– Focus on user experience
– Gain customer engagement and loyalty
– Leverage Big Data
– Realize continuous improvement
Once implemented, these three steps deliver efficiency, agility and experience.
10. IT Operations Management solutions
Simplified ITOM solution set
– Automation solutions: intelligently drive efficiency across the virtualized datacenter
– Transform solutions: modernize customer experience for cloud-native and traditional applications
– Orchestration solutions: increase speed of delivery in a heterogeneous, hybrid cloud environment
Datacenter automation | Operations bridge | Service management automation | Cloud orchestration | Service Broker | User experience management
Solutions are delivered in a simple, consistent way to drive time to value (TTV).
Delivery models: SaaS | software appliance | remote managed service
System | Network | Storage
12. HPE Operations Analytics: market trends and growth drivers
ITOA (IT operations analytics) growth drivers
– Outdated systems
– Point-tool limitations
– Complex, diverse environments
Customer expectations
– Flexible, scalable architecture
– Transform data into intelligence
– Problem detection and prediction
13. Analyzing performance problems is hard
Servers | Network | Storage
All green doesn’t always mean all-clear
– A single problem can trigger multiple events
– Limited view into resource utilization
– Hidden performance issues and trends
– Low visibility across OneView domains
14. The answer lies in your data
But how do you make sense of it?
[Figure: siloed data sources; types of data; device types; different operating systems; data per server/day]
Data types: Mobile app | Network | Cloud | System | LOB data | Storage
15. Introducing HPE Operations Analytics
Capabilities
– Predictive analytics
– Machine learning
– Relationship score
– Automated log and event analysis
– Anomaly detection and alerting
– Visual analytics and RCA
Platform: HPE Operations Analytics, a standalone, scalable platform built on HPE Vertica
Data types: Mobile app | Network | Cloud | System | LOB data | Storage
Results
– Reduced outages
– Faster resolution
– Optimized resources
– Increased productivity
16. Machine learning powers HPE Operations Analytics
Developed in collaboration with HPE Labs
– Behavioral learning
– Clustering
– Predicting future behavior
– Event analytics
– Anomaly detection
– Unstructured text indexing, search and inference
18. HPE Operations Analytics: key features
Log and event analytics
– Automated analysis of logs and events
– Focus on relevant items for quicker resolution
19. HPE Operations Analytics: key features
Root cause analysis (RCA)
– Identify when problems start
Visual analytics
– Clear, intuitive dashboards
[Screenshots: performance heat map; performance overview]
20. HPE Operations Analytics: key features
Advanced log search
– Deep-dive into messages
Relationship score
– Connection between metrics
[Screenshots: smart filter; relationship score]
21. HPE Operations Analytics: key features
Predictive analytics
– Forecast future performance with one click
Anomaly alerting
– Real-time problem warnings
[Screenshots: Predict button; dynamic baselines]
22. HPE Operations Analytics
Standard use cases
– Anomaly detection and troubleshooting
– Historical and predictive analytics
– Business insights
– Big Data store and analysis
23. HPE IT Operations Analytics
How HPE IT uses Big Data in IT operations
24. HPE IT key operational data
– Number of incident tickets per month
– Average help desk calls per month
– Average number of major incidents/meetings per month
– Proactive monitoring of planned changes per month
– Configuration items in uCMDB
– Servers
– Network devices
– Applications
– Scheduled jobs executed per month
– Event notifications sent per month
Figures: 68,770 | 48,000 | 5,000,000 | 57,000 | 66,000 | 2,000 | 900 | 300/800 | 19,000,000 | 1,500
4 private cloud datacenters | 4 traditional datacenters | 365 global locations | >8,000 simulated transactions
25. Troubleshooting without Operations Analytics
– Many subject matter experts (SMEs) involved in major incidents
– Manual analysis in isolation
– Manual correlation of data
– Long time to identify root cause
[Diagram: the operation support team plus application, network, security, server, storage and database SMEs, each working on part of the application ecosystem (business application, physical or virtual server, network, storage)]
26. Troubleshooting with Operations Analytics
– All relevant data in a single dashboard
– Data is timely and correlated
– Data is easily viewed in visual analytics
– Historical view of data is instantly available
– Faster time to identify root cause, with fewer people involved
[Diagram: OpsA and the operation support team covering the application ecosystem (business application, physical or virtual server, network, storage)]
27. HPE Operations Analytics in HPE IT
Trend analysis | Predictive insights | Anomaly detection | Unknown root cause resolution
Data sources
– Application (SiteScope)
– Network (Network Node Mgr + iSPIs)
– Cloud VMs (Operations Agents)
– System performance (Operations Agents)
– Third-party tools (SCOM, Lync, Exchange)
– Event data (OMi)
– Log data (ArcSight)
Data types: metrics | events | topology | logs
Operations Analytics is an analytics platform that lets IT proactively manage its operational performance and reduce mean time to repair (MTTR). It can ingest data from all of these sources and use different data types: not only performance metrics and events, but also topology data and logs.
Big Data store: Vertica
HPE Operations Analytics: visual analytics | automated log analytics | predictive analytics | content framework | intelligent search | guided troubleshooting
Environment: 8 datacenters | 2600 apps | 25K databases | 57K servers | 5M objects | 66K network devices
Custom data (CSV, XML)
28. HPE IT use cases and scenarios
Widespread production network outage in HPE IT
29. Widespread production network outage in HPE IT
– Transient network outage affecting multiple HPE work sites
– “All hands on deck” production issue
– Network monitoring causing an event storm
– Saturated support teams
– Stream of incidents causing noise
1000s of critical network events detected in BSM
30. Analytics-based abnormality detection
Solution
– OpsA is a Big Data analytics platform
– Correlate multiple sources to narrow down the problem
– Identify patterns in huge amounts of network syslog data
– Patterns reveal the leading cause
[Screenshot callouts: pattern detected in log data that reveals the problem; correlate metrics and logs with the problem time (network events)]
Benefit
– Huge time savings (less than 30 minutes to find the cause)
– Faster restoration of service
– Fewer SMEs required to troubleshoot
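The pattern-detection step can be illustrated with a minimal Python sketch. It assumes a simple approach that is not necessarily what OpsA does: variable tokens (IP addresses, hex IDs, numbers) are masked so that similar syslog lines collapse into one pattern, and each pattern's share of messages inside the problem window is compared with its share outside it ("lift"). The function names, the masking rules and the `min_count` cutoff are hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable fields so similar lines share a pattern.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_pattern(message):
    """Collapse a raw syslog message into a pattern by masking variable tokens."""
    for regex, token in _MASKS:
        message = regex.sub(token, message)
    return message


def window_enrichment(logs, start, end, min_count=2):
    """Rank patterns by how over-represented they are inside [start, end).

    logs: iterable of (timestamp, message) tuples with numeric timestamps.
    Returns a list of (pattern, count_in_window, lift), highest lift first.
    """
    inside, outside = Counter(), Counter()
    for ts, msg in logs:
        target = inside if start <= ts < end else outside
        target[to_pattern(msg)] += 1
    n_in = sum(inside.values()) or 1
    n_out = sum(outside.values()) or 1
    ranked = []
    for pattern, count in inside.items():
        if count < min_count:
            continue
        share_in = count / n_in
        # Add one to the outside count (and total) so patterns never seen
        # outside the window still get a finite lift.
        share_out = (outside[pattern] + 1) / (n_out + 1)
        ranked.append((pattern, count, share_in / share_out))
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

For example, if three "BGP neighbor … down" lines with different neighbor addresses land inside the outage window while routine "link up on port N" lines are spread across the whole day, the BGP pattern ranks first with a lift of 3.0 in a small sample like the one in the test below. The sample messages are made up.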
31. HPE IT use cases and scenarios
Troubleshooting application and database production issues
32. Troubleshooting application and database production issues
Challenge
– Solving problems before performance is affected
– Siloed teams mean no big picture
33. Analyze millions of messages to reveal root cause with automated log analytics
Automatically reveals the time and count of the most significant log events.
34. Drill down to actual root cause log messages
[Screenshot callouts: number of occurrences of significant log data over time; view the log message content to identify root cause]
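The "time and count of the most significant log events" can be approximated with a minimal Python sketch. It assumes, hypothetically, that messages differing only in digits belong to the same event type and that "most significant" means the largest count within one time bucket; the slides do not say how OpsA ranks significance. The function name, bucket size and sample messages are illustrative.

```python
import re
from collections import Counter, defaultdict

_DIGITS = re.compile(r"\d+")


def significant_log_events(logs, bucket_seconds=60, top_n=3):
    """Return (pattern, peak_bucket_start, peak_count, total_count) for the top_n patterns.

    logs: iterable of (unix_timestamp, message). Digits are masked with '#' so that
    messages differing only in IDs or counters group together. Patterns are ranked
    by their largest count within a single time bucket (a simple burstiness measure).
    """
    per_pattern = defaultdict(Counter)  # pattern -> Counter(bucket_start -> count)
    for ts, msg in logs:
        bucket = int(ts // bucket_seconds) * bucket_seconds
        per_pattern[_DIGITS.sub("#", msg)][bucket] += 1
    summary = []
    for pattern, buckets in per_pattern.items():
        # Highest count wins; on ties, the earliest bucket is reported.
        bucket, peak = max(buckets.items(), key=lambda kv: (kv[1], -kv[0]))
        summary.append((pattern, bucket, peak, sum(buckets.values())))
    summary.sort(key=lambda row: (row[2], row[3]), reverse=True)
    return summary[:top_n]
```

Given four "connection pool exhausted after N ms" messages, three of which fall between seconds 60 and 119, the function reports that pattern first, with its peak bucket starting at 60 and a peak count of 3; the drill-down step would then open those raw messages.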
35. Troubleshooting application and database production issues
Solution
– Real-time dashboards
– Automated log analytics
Benefit
– Fast root cause identification (¼ the time)
– Fewer experts involved (from 5 SMEs to 1)
– Order backlog cut by 50%
36. HPE IT 3PAR use case
OpsA increasing value to LOB
37. Hewlett Packard Enterprise 3PAR storage line of business
– HPE’s premier storage business
– Proactive “phone home” monitoring service available to 3PAR customers
– The service gives 3PAR customers the latest capabilities and proactive protection against potential problems
– HPE IT systems enable and support the “phone home” services
38. Optimizing HPE 3PAR operations using Big Data analytics
Business challenge
– Isolate problems
– Reduce late file transfers
– Reduce overdue file transfers
– Difficult to isolate problems
– Behavior interpreted manually
Solution
– Near real-time metric collection using OpsA
– Define and measure a big-picture view of the 3PAR ecosystem
– OpsA baselines define “normal” behavior
– OpsA guided troubleshooting
Benefits
– Quickly identify what is not “normal”
– Faster problem diagnosis
– Eliminated the manual effort of collecting and correlating data
– Decreased mean time to recover (MTTR)
39. Use baselines to define “what’s normal”
– Baselining metrics helps IT define normal behavior
– Baselines provide a starting point for troubleshooting
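A dynamic baseline can be sketched in Python as a band of mean ± k standard deviations learned separately for each hour of the day, so the same value can be normal at 09:00 and abnormal at 02:00. This is an assumed, simplified model: the slides do not describe how OpsA computes its baselines, and the `k = 3` default and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(samples, k=3.0):
    """Build a per-hour-of-day band (low, high) = mean +/- k * standard deviation.

    samples: iterable of (hour_of_day, value) pairs taken from history.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    band = {}
    for hour, values in by_hour.items():
        mu, sigma = mean(values), pstdev(values)
        band[hour] = (mu - k * sigma, mu + k * sigma)
    return band


def is_abnormal(band, hour, value):
    """True if value falls outside the learned band for that hour.

    Hours with no history are not flagged, because there is no baseline yet.
    """
    if hour not in band:
        return False
    low, high = band[hour]
    return value < low or value > high
```

With an hour-9 history of 40, 44, 42, 42 and an hour-2 history of 5, 7, 6, 6, a reading of 43 is inside the 09:00 band but far outside the 02:00 band, which is the "what's normal depends on when" behavior a fixed threshold cannot express.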
40. Analyze the ecosystem
– Define services that describe the ecosystem
– Quickly analyze the ecosystem
– Correlate metrics from disparate areas
– Identify areas of impact
[Screenshot: network metrics (NFS call rate), application metrics (processing rate), database metrics (active session count), application metrics (file queues), file system metrics (disk queues)]
41. Identify trends and take action before a problem occurs
[Screenshot callout: dangerous rate of file system growth]
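Spotting a dangerous growth rate amounts to estimating the current trend and projecting when usage will reach capacity. The Python sketch below uses Holt's linear (double) exponential smoothing for that estimate; this is an illustrative choice, not a documented OpsA algorithm, and the smoothing constants `alpha` and `beta` and the function names are hypothetical.

```python
def holt_trend(values, alpha=0.5, beta=0.3):
    """Holt's linear exponential smoothing; returns the final (level, trend per step)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    level, trend = values[0], values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        # Level blends the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Trend blends the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def steps_until_full(values, capacity, alpha=0.5, beta=0.3):
    """Estimate how many more steps until usage reaches capacity; None if not growing."""
    level, trend = holt_trend(values, alpha, beta)
    if trend <= 0:
        return None
    return max(0.0, (capacity - level) / trend)
```

For a file system that reads 100, 110, 120, 130 and 140 GB on five consecutive days with 200 GB of capacity, `steps_until_full` returns about 6, i.e. roughly six days of headroom, which is the kind of signal that lets support act before the volume fills.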
42. HPE IT use cases and scenarios
Analytics for predictive anomaly detection
43. Remote Microsoft® Lync user experience
Challenge
– Microsoft® Lync is relied on daily by thousands of HPE employees
– Complex infrastructure makes issues difficult to diagnose
[Diagram: the Lync application serving users @HPE office and @Remote: PSTN (public switched telephone network), mobile users, remote road warrior users, coffee houses, airports]
Total PSTN conferences per week: 14,909
44. OpsA advanced troubleshooting
Challenge
– Dozens of seemingly unrelated metrics from many disparate sources
Solution
– Real-time analytic scoring determines how closely multiple disparate metrics are related to the problem
[Screenshot: metric A, metric B, and the calculated correlation score of metrics A and B]
The correlation values range from -1 to 1. The higher the absolute value of the correlation, the closer the relationship.
Outcome: 90% statistical correlation between “Bad Requests Received” and Lync Edge Server authentication failures.
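The relationship score described above behaves like a correlation coefficient. A minimal Python sketch, assuming the score is the Pearson correlation over time-aligned samples (the slides state only the -1 to 1 range), computes it for each candidate metric and ranks candidates by absolute value, so strong negative relationships surface as well as positive ones. The function names and metric names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length metric series (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def rank_related_metrics(problem, candidates):
    """Rank candidate metrics by |r| against the problem metric, closest relationship first.

    candidates: {metric_name: series}, each series aligned in time with `problem`.
    """
    scores = {name: pearson(problem, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

With a problem series of 1, 2, 3, 4, 5, a candidate that rises in step scores 1, one that falls in step scores -1, and one that merely fluctuates scores 0; both of the first two rank ahead of the third because ranking uses the absolute value.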
45. OpsA advanced troubleshooting, another example
Use analytics to determine how closely multiple disparate metrics are related to the problem.
[Screenshot: metric A, metric B, and the calculated correlation score of metrics A and B]
The correlation values range from -1 to 1. The higher the absolute value of the correlation, the closer the relationship.
Outcome: “Sends Outstanding” performance metrics correlate 100% with server network errors.
46. Analytics-based abnormality detection
Take action before a problem occurs
[Chart: performance metrics, dynamic baseline, zone of prevention, fixed threshold]
Near real-time alerting of anomaly trends
Solution
– Analytics to narrow the focus of troubleshooting
– Abnormality detection
– Alerts triggered by anomaly trends
Benefit
– Faster diagnosis
– Less reaction, more prevention
– Automated correction
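The "zone of prevention" is the gap between where a metric leaves its dynamic baseline and where a fixed threshold would fire. A minimal Python sketch, assuming the upper baseline bound for each sample is already known and that an anomaly trend means several consecutive abnormal samples (the slides do not specify OpsA's alerting rule), shows how such an alert can fire before the fixed threshold is crossed. The function names, state labels and the `run_length` default are hypothetical.

```python
def classify(value, baseline_high, fixed_threshold):
    """Place one sample relative to its dynamic baseline and the static threshold."""
    if value >= fixed_threshold:
        return "threshold_breach"
    if value > baseline_high:
        return "prevention_zone"  # abnormal for this time, but the static alarm has not fired
    return "normal"


def anomaly_trend_alerts(values, baseline_highs, fixed_threshold, run_length=3):
    """Yield indices where run_length consecutive samples sit above the dynamic baseline.

    An alert fires once per run, at the sample that completes the run, so a
    sustained abnormal trend raises an early warning while a single spike does not.
    """
    run = 0
    for i, (value, high) in enumerate(zip(values, baseline_highs)):
        state = classify(value, high, fixed_threshold)
        run = run + 1 if state != "normal" else 0
        if run == run_length:
            yield i
```

With a baseline upper bound of 60, a fixed threshold of 90 and samples 50, 52, 70, 51, 72, 75, 80, 95, the isolated spike at 70 is ignored, the alert fires at the sample reading 80, and the fixed threshold is only crossed one sample later.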
47. HPE IT use cases and scenarios
Miscellaneous examples
49. HPIT uses analytics to detect dangerous event patterns
– Identified an HPIT application generating events at an increasing rate over a 5-day period
– Detected a specific HPIT application generating events in the 90th percentile (i.e., the “noisiest” application)
– Identified the risk of a dangerous pattern: an increasing trend of events spiking at midnight
– Support proactively took action before a major problem occurred
[Chart: dynamic normal baseline with two breaches of the normal baseline]
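The detection on this slide combines two checks: an application's event volume ranks in the 90th percentile, and its daily event count keeps climbing. A minimal Python sketch, assuming daily event totals per application are available and using the nearest-rank percentile definition (the slides do not say which definition OpsA applies), could look like this. The function names and the data layout are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def noisy_rising_apps(daily_counts, pct=90):
    """Return apps whose total events reach the pct-th percentile and rise every day.

    daily_counts: {app_name: [events_day1, events_day2, ...]} over the same days.
    """
    totals = {app: sum(counts) for app, counts in daily_counts.items()}
    cutoff = percentile(list(totals.values()), pct)
    flagged = []
    for app, counts in daily_counts.items():
        rising = all(later > earlier for earlier, later in zip(counts, counts[1:]))
        if totals[app] >= cutoff and rising:
            flagged.append(app)
    return flagged
```

In a set of ten applications where two are much noisier than the rest, both clear the 90th-percentile cutoff, but only the one whose daily counts climb on every day of the five-day window is flagged; a noisy but flat application is left alone.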
50. HPIT applies predictive analytics to prevent problems
Predictive views of server performance behavior under specific workloads
(1) Dynamic baselines are automatically created for all metrics collected for a server.
(2) Server memory utilization metrics show an increase over time.
(3) Applying predictive analytics to the server’s memory metrics predicts a 35% increase in memory utilization under the current workload over the next two weeks.
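The prediction in step (3) is a trend extrapolation. As a minimal Python sketch, assuming a straight-line (ordinary least squares) trend on equally spaced daily samples (OpsA's actual forecasting models may differ), the percent increase over a horizon can be computed as follows. The function names are hypothetical.

```python
def linear_fit(ys):
    """Ordinary least-squares line y = a + b*t through equally spaced samples t = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope  # (intercept, slope)


def forecast_increase_pct(ys, horizon):
    """Percent change from the fitted value at the last sample to `horizon` steps later."""
    intercept, slope = linear_fit(ys)
    today = intercept + slope * (len(ys) - 1)
    if today == 0:
        raise ValueError("fitted current value is zero; percent change is undefined")
    future = today + slope * horizon
    return 100.0 * (future - today) / today
```

For ten daily readings rising steadily from 31% to 40%, `forecast_increase_pct(readings, 14)` returns 35.0: the fitted line gains one point per day, so 14 days add 14 points on top of 40. These numbers are made up for round arithmetic, not taken from HPE IT's data.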
51. Applying predictive analytics to key applications
Predictive views of HPIT applications performance behavior
Predict future performance patterns based on historical baselines.
52. HPE IT’s OpsA journey
Continuous progress
Timeline: Early 2014 | Mid 2014 | Late 2014 | 2015 | 2016 (Next)
– New data sources: expand to key applications; SiteScope integration (application metrics)
– Introduce predictive capabilities into support
– New data sources: integrated OMi event data; integrated network metrics (~60K devices); database log analytics (~20K DBs)
– Analytics on key applications and business (e.g., 3PAR)
– OpsA PoCs: Microsoft® Exchange troubleshooting proof of concept
– Expanded IT private cloud: 18K virtual servers
– Introduced OpsA: server metrics for the IT private cloud (~10K virtual servers); Cloud Infrastructure Support team
– Expand coverage: traditional server metrics (40K servers) and virtual cloud (+20K VMs)
– Network outage opportunity: apply analytics to a large-scale production network outage; Global Telecom Support team
– Continue expansion: predictive alerting; event analytics; integration into HPE Helion Cloud
– OpsA PoCs: Microsoft® Lync troubleshooting and anomaly detection proof of concept
53. HPE IT Operations Analytics solution
Layers: Collection | Vertica | Analytics | Visualization | Forensics | Environment | Users
Collection: OpsA’s highly scalable collection framework; integrates with ArcSight, OMi, SiteScope, BPM, Logstash, JDBC, TCP/UDP and REST WS
Vertica: big data analytics platform; highly scalable, cluster-based, column-oriented
Visualization and forensics: visual analytics; play back dashboard results; phrased search; guided troubleshooting; user-defined topologies
Analytics: predictive analytics; industry analytics (R packages); pattern detection via correlation coefficient; abnormal behavior detection; automated machine learning drives log and event analytics
Environment: 8 datacenters | 2600 apps | 25K databases | 66K network devices | 56K servers | 5M objects
55. Get more information
Attend these sessions:
– HOL9100 Go hands-on with HPE Operations Analytics; reveal what’s hidden in your data
– BB8013 HPE Operations Analytics; providing validity to Safeguard Properties’ monitoring footprint
– RT9084 Breaking Bad processes; align central IT and the business using HPE Operations Analytics
– RT9083 Increase the efficiency of support teams with automated Analytics-as-a-Service
Visit these demos:
– DEMO8816 HPE Operations Analytics; automated machine learning and predictive analysis at the speed of business
– TPS9206 The Future of Operations Analytics
Follow us on social media:
– Twitter @HPE_ITOps
– LinkedIn linkedin.com/company/hpe-software
– Facebook facebook.com/HPESoftware
– Blog http://hpsw.co/BSMblog