"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning", Marco Gessner, Corporate Systems Engineer at HPE
1. From Big Data To Big Value
with HPE Vertica’s Predictive Analytics &
Machine Learning
Marco Gessner, Corporate Systems Engineer – marco.gessner@hpe.com
Zurich, 17th Nov 2016
2. The world is changing and accelerating
Big Data is no longer just a Buzzword – It’s EVERYWHERE and growing … and this is just
what’s structured …
Data volume by year:
2005: 0.1 ZB
2010: 1.2 ZB
2012: 2.8 ZB
2015: 8.5 ZB
2020*: 40 ZB
Volume
Variety
Value??
IDC Estimates that by 2020, business transactions on the
internet - business-to-business and business-to-consumer
- will reach 450 billion per day.
*Source : IDC Digital Universe in 2020
Mobility
Big Data
Cloud
Mobile
Transactional Data (CRM, SCM, ERP; $ € ¥)
Social Media
IT Ops
Velocity
Log Files
Sensors, Counters
3. Enterprises realize only 10-15% of the value they expect
from their big data investments
Barriers: Translating Data to Value; Technology Gap; Silos and Lack of Alignment
55%
“determining how to get
value from big data” - top 3
challenge with big data***
41%
don't know if big data ROI
will be positive or negative****
57%
“obtaining the necessary
skills and capabilities
needed” - top 3 challenge
for Hadoop**
41%
systems cannot process
large volumes of data
from different sources*
* PWC – Capitalizing on the promise of Big Data – 1/2013
** Gartner – Survey Analysis: Hadoop Adoption Drivers and Challenges – 5/2015
*** Gartner – Survey Analysis: Practical Challenges Mount as Big Data Moves to Mainstream – 9/2015
**** Lisa Kart, Gartner – Big Data Industry Insights – presentation
4. Business Value
Level of Intelligence
Prescriptive Analytics: How can we make it happen?
Predictive Analytics: What will happen?
Diagnostic Analytics: Why did it happen?
Descriptive Analytics: What happened?
Predictive / Machine Learning
Business Intelligence
Big Data Value Model
5. A Real-World Example:
Smartmeter data for 38 million households in a country for 13 months
counter_id |reading_ts |notification_ts |tc|reading |power
100012700004100|2008-11-30 23:00:05|2008-11-30 23:01:29|FT|271006206|273
100012700014100|2008-12-01 05:30:15|2008-12-01 05:33:53|HT|203859364|294
100012700014100|2008-12-01 21:30:17|2008-12-01 21:34:27|LT|922915648|1472
• 73 bytes per line, one line every 10 minutes
• × 144 lines per day (6 per hour × 24)
• × 38,000,000 households
• × 395 days (13 months' data retention)
= 157,785,120,000,000 bytes, or ~158 terabytes
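The sizing above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are illustrative, and "terabyte" is assumed to mean 10^12 bytes.

```python
# Back-of-the-envelope sizing for the smart-meter example on this slide.
BYTES_PER_LINE = 73
READINGS_PER_DAY = 6 * 24      # one reading every 10 minutes
HOUSEHOLDS = 38_000_000
RETENTION_DAYS = 395           # roughly 13 months


def raw_volume_bytes(bytes_per_line, readings_per_day, households, retention_days):
    """Uncompressed size of all readings kept during the retention period."""
    return bytes_per_line * readings_per_day * households * retention_days


total = raw_volume_bytes(BYTES_PER_LINE, READINGS_PER_DAY, HOUSEHOLDS, RETENTION_DAYS)
total_tb = total / 10**12      # assumption: 1 TB = 10^12 bytes

print(f"{total:,} bytes = about {total_tb:.0f} TB")
# prints: 157,785,120,000,000 bytes = about 158 TB
```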
April 10, 2015 HP Confidential
7. The Vertica Real-Time Analytics Engine
Leverages BI, ETL,
Hadoop/MapReduce and
OLTP investments
No disk I/O bottleneck
simultaneously load &
query
Native DB-aware
clustering on low-cost
x86 Linux nodes
Built-in redundancy that
also speeds up queries
Automatic setup,
optimization, and DB
management
Up to 90% space
reduction using 10+
algorithms
50x – 1000x faster
than traditional
RDBMS
Scales from TB to PB
with industry-standard hardware
Simple integration
with existing ETL and
BI solutions
SQL-99+ compliant
Ultimate deployment
flexibility
Extended advanced
analytics
24/7 Load & Query
8. #SeizeTheData
Building Machine Learning (ML) into the Core of Vertica
- Runs in parallel across hundreds of nodes in a Vertica cluster
- Eliminates the data duplication typically required by alternative vendor offerings
- No need for down-sampling, which can lead to less accurate predictions
- A single system for SQL analytics and Machine Learning
Node 1 Node 2…. Node n
New capabilities deliver predictive analytics at speed and scale
9. A few ML & PA functions for answering various business
questions
– Classification / Scoring
– Who will churn, commit fraud or buy next week or next month?
– Regression
– How many products will a customer buy next month or next quarter?
– Segmentation / Clustering
– What are the groups of customers with similar behavior or profile?
– Forecasting
– What will the monthly revenue or the number of churners be next year?
– Recommendations
– What is the best offer or action for a customer or internet user?
10. Some ML & PA Use Cases by Industry
Telecommunications: Customer churn, network optimization (forecast system load), cross/up selling, customer retention, network fraud detection.
Banking/Finance: Credit risk management, anti-money laundering, fraudulent card usage detection; identify key behaviors of customers likely to leave the bank.
Retail/CPG: Forecasting, inventory planning, cross/up selling, customer segmentation, market basket analysis; intelligent selection of store locations based on demographics.
Public Sector: Logistics optimization, fraud prevention; predict community movement and trends that affect taxing districts; anticipate revenue.
Healthcare: Health management, fraud prevention; medical: predict the causes, likelihood and spread of disease; genome analysis, research.
Insurance: Customer profitability, fraudulent claims detection and prevention. Possibly even the creation of insurance tariffs in the first place: the ML algorithms were done by hand when insurance began in Venice and London in the 17th century.
Automotive: Market demand forecasting, launch analysis (predict best-selling configuration); service parts optimization; customer satisfaction; production predictive maintenance; react to customer tastes.
Manufacturing/Wholesale: Price optimization, assortment planning, forecasting; predictive maintenance.
Utilities: Predictive asset maintenance, market and credit risks; forecast demand and usage for seasonal operations; provide anticipated resources. Forecast electricity consumption by geography for powering up or throttling nuclear power stations 20 hours in advance.
11. Use Cases by LoB….
Sales, Service, Finance & Marketing
Market basket analysis
Customer loyalty
programs
Cross-sell and up-sell
opportunities
Marketing campaign
response rates
Better pipeline and
revenue forecasting
Service and maintenance
staffing and planning
Logistics and inventory
management
Predictive asset
maintenance
Human Resources
Accurate prediction of churn/retirement
for staffing and planning; enable proactive
churn prevention by targeting likely churn
candidates = greater retention of top
employees
Recruitment (headcount) and retention
planning
Employee performance and productivity:
identify factors influencing high
performance
Workforce training enablement and
effectiveness
Succession planning
Operations
More accurate orders
based on customer
demand and ensure
inventory adequately
positioned to satisfy
demand
Increase turn, with more
efficient balance of
stocks based on demand
More accurate forecasting to enable
efficient recruiting
Enable better operational planning for
new hires (e.g. facilities requisitions,
space, equipment)
Significant improvements
to the earlier (beginning of
the quarter) Pipeline
Forecasting accuracy
Improved Revenue Quality
through increased
effectiveness of Sales and
Marketing
Better Service profitability
through improved renewal
and pricing
Increase in revenue from
segmented customers and
targeted campaigns
IT
Optimize staffing to handle
trend/seasonality in support
demand. (avoid over-staffing,
reduce cost; avoid under-
staffing, improve customer
satisfaction)
Proactive predictive
maintenance prevents costly
unscheduled downtime due to
equipment failure; enable
scheduling of planned
maintenance windows
Support/call center analysis
Asset utilization demand
planning
Procurement
planning/forecasting
Project optimization and
assessment
Anticipate peak
performance issues and root
cause
Read how Google is using advanced
analytics to improve their HR:
http://online.wsj.com/article/SB124269038041932531.html?hat_input=google+staffing#articleTabs=articles
12. #SeizeTheData
Machine Learning Pack – 8.0
Algorithm Model Training Prediction Evaluation
Linear Regression
Logistic Regression
K-means
Model Management: Summarize models, Rename models, Delete models
Data Preparation: Normalization, Imbalanced data processing, Sampling
R integration
13. Linear Regression Use Cases
Real Estate
Model residential home prices
(response) as a function of the
home’s living area, number of
bedrooms, number of bathrooms
and so on (predictors)
Demand Forecasting
Model the demand for a service or
good (response) based on its
features (predictors); for example,
demand for different models of
laptops based on monitor size,
weight, price, operating system,
etc.
Manufacturing
Determine linear relationship
between the compressive
strength of concrete (response)
and varying amounts of its
components (predictors) like
cement, slag, fly ash, water, superplasticizer,
coarse aggregate, etc.
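To make the response/predictor idea concrete, here is a minimal sketch of ordinary least squares with a single predictor, in plain Python. It is not the Vertica implementation. The function names and the living-area and price figures are invented for illustration, and the data is deliberately constructed to lie exactly on a line.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted response for a new predictor value."""
    return intercept + slope * x


# Hypothetical training data: living area in square metres (predictor)
# and sale price (response). Values are made up and exactly linear.
living_area = [50, 70, 90, 120, 150]
price = [150_000, 190_000, 230_000, 290_000, 350_000]

intercept, slope = fit_simple_linear_regression(living_area, price)
print(f"price = {intercept:.0f} + {slope:.0f} * area")
print(f"predicted price for 100 m2: {predict(intercept, slope, 100):.0f}")
```

Real use cases such as the ones on this slide involve several predictors and noisy data, so the fit is solved as a multivariate problem rather than with this closed form.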
14. Logistic Regression Use Cases
Finance
Use a loan-applicant’s credit
history, income, and loan
conditions (predictors) to
determine probability that
applicant will default on loan
(response). The result can be
used for approving, denying, or
changing loan terms
Engineering
Predicting the likelihood that a
particular mechanical part of a
system will malfunction or require
maintenance (response) based on
operating conditions and
diagnostic measurements
(predictors)
Medicine
Determine the likelihood of a
patient’s successful response to a
particular medicine or treatment
(response) based on factors like
age, blood pressure, smoking and
drinking habits (predictors)
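The use cases above all estimate the probability of a yes/no outcome from a set of predictors. The following is a minimal sketch of logistic regression trained by batch gradient descent, in plain Python. It is not the Vertica implementation. The function names, the two predictors (a debt-to-income ratio and a scaled count of missed payments) and all data values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(rows, labels, learning_rate=0.5, epochs=5000):
    """Batch gradient descent on the average log-loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return bias, weights


def predict_probability(bias, weights, x):
    """Estimated probability that the response is 1 (here: loan default)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical applicants: [debt-to-income ratio, scaled missed payments].
applicants = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.2], [0.4, 0.2],
              [0.6, 0.6], [0.7, 0.8], [0.8, 0.6], [0.9, 1.0]]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = defaulted on the loan

bias, weights = fit_logistic_regression(applicants, defaulted)
p_low = predict_probability(bias, weights, [0.15, 0.0])
p_high = predict_probability(bias, weights, [0.85, 0.9])
print(f"low-risk applicant: {p_low:.2f}, high-risk applicant: {p_high:.2f}")
```

The predicted probability can then be compared against a threshold chosen by the business, for example to approve, deny, or re-price a loan.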
15. K-means Clustering Use Cases
Customer Segmentation
Segment customers and buyers
into distinct groups (cluster)
based on similar attributes like
age, income, product preferences,
etc. in order to target promotions,
provide support and explore
cross-sell opportunities
Fraud Detection
Identify individual observations
that don’t align to a distinct group
(cluster) and identify types of
clusters that are more likely to be
at risk of fraudulent behavior
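Both use cases rest on the same mechanics: assign each observation to its nearest cluster centre, then recompute the centres. The following is a minimal sketch of that loop (Lloyd's algorithm) in plain Python, not the Vertica implementation. The function names, the naive choice of the first k points as starting centres, and the (age, income) values are assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with a naive start: the first k points are the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def distance_to_nearest_centroid(point, centroids):
    """Large values mark observations that fit no cluster well."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical customers: (age, yearly income in thousands).
customers = [(22, 28), (58, 95), (27, 33), (52, 85), (25, 30), (55, 90)]
centroids, labels = kmeans(customers, k=2)
print(labels)      # prints: [0, 1, 0, 1, 0, 1]

# A made-up observation far from both segments, as in the fraud use case.
outlier_distance = distance_to_nearest_centroid((40, 200), centroids)
print(f"outlier distance: {outlier_distance:.1f}")
```

In practice the predictors would be normalized first, since an unscaled income column would otherwise dominate the distance.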
16. Architected to embrace an ecosystem of innovation
Advanced
analytics
Cloud
BI/visualization
Platform
Data
transformation
HPE Vertica
17. HPE Big Data Software
Performance and scale
Consume and deploy anywhere
Open source without compromise
Machine Learning accessible to all
Advanced analytics from the core
Analytics is everything we do
19. HPE Vertica empowers Philips to transform from reactive to proactive customer service
Big Data & Analytics
Predictive Maintenance Platform
Philips Predictive Maintenance Platform
HPE Vertica technology empowers Philips to transform
Customer Services from reactive to proactive using Big
Data and Analytics.
• 24 different data sources (events, errors, sensor,
business data) integrated into a single Vertica
database
• 10K+ connected MRI systems
• 38 Predictive and proactive maintenance models,
25 additional models in development
• 140 billion rows and growing fast
• 60+ TB historic data
• 1M weekly system files processed
Business Outcomes
- Decreased unplanned downtime
- Increased scheduled maintenance
- Improved customer satisfaction
- Input for R&D to optimize product quality
- Foundation for many added-value services
Reactive: System problem → Customer call → Dispatch FSE → On-site diagnosis → Parts delivery → Repair/replace → System functional
Proactive: Failure prediction → Scheduled service → Problem avoided
20. Community Edition
- Free download: up to 1 TB, 3 nodes
- my.vertica.com/
Learn More About – and Try! - HPE Vertica