Machine Learning in Big Data
- Look forward or be left behind
V. William Porto
Hadoop Summit Dublin 2016
Overview of RedPoint Global
2  RedPoint Global Inc. 2016 Confidential
Launched in 2006
Founded and staffed by industry veterans
Headquarters: Wellesley, Massachusetts
Offices in US, UK, Australia, Philippines
Global customer base
Serves most major industries
Overview of RedPoint Global
MAGIC QUADRANT: Data Quality
MAGIC QUADRANT: Integrated Marketing Management
MAGIC QUADRANT: Multichannel Campaign Management
MAGIC QUADRANT: Digital Marketing Hubs
FORRESTER WAVE™: Cross-channel Campaign Management
FORRESTER WAVE™: Data Quality Solutions
With apologies to Gary Larson
Hadoop
Machine Learning – why bother?
“If you have always done it that way, it is probably wrong” - Charles Kettering
Machine Learning – keeping ahead of the curve
• Three basic tenets for success in today’s world
• Prediction - you need to learn and use what you’ve learned
• Optimization - the world is a dynamic place
• Automation - because people don’t scale well
Machine Learning – what really is it all about?
• Learning vs. instruction
• Humans learn instinctively – computers not so much
• Intelligent Systems
• Memory
• Prediction (modeling)
• Assessment
• Feedback
• Adaptation
Data Modeling – what, why, how
• Regression – what happened in the past
• Prediction – what will happen in the future
“Prediction is very difficult – especially if it’s about the future”
- Niels Bohr
Data Modeling – what, why, how
The wide world of data modeling
• Supervised models
• you have historical data and known correlated outputs (truth)
• Unsupervised models
• historical data, but may not have (or trust) associated outputs
Decision Trees
Major Assumption: the world is discrete
• fast, easy to understand, no linearity assumptions
• ‘human time’ required, unbalanced and/or large trees
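A decision tree treats the world as discrete by cutting each feature at a threshold. The following is a minimal sketch of a single split (a "stump"), assuming one numeric feature and 0/1 labels; the function name `fit_stump` and the toy data are hypothetical and not taken from the talk.

```python
def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label, errors) for the best single split.

    Rule: predict left_label if x <= threshold, otherwise right_label.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            # Count how many samples this candidate rule misclassifies.
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[3]:
                best = (threshold, left_label, right_label, errors)
    return best


# Hypothetical toy data: days since last purchase vs. retained (1) or lost (0).
days = [5, 12, 18, 26, 40, 55, 63, 80]
retained = [1, 1, 1, 1, 0, 0, 0, 0]
stump = fit_stump(days, retained)
print(stump)
```

A full tree repeats this search inside each resulting branch, which is where large or unbalanced trees can come from.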
Standard Linear Models
Assumption: the world is linear
• the real world really isn’t linear
• not all errors are equal
• easy to get misleading results
Which line is best?
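One way a linear fit can mislead: ordinary least squares squares every residual, so a single extreme point can pull the whole line. A minimal sketch, assuming a one-variable fit; `fit_line` and the data points are hypothetical illustrations.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [0, 1, 2, 3, 4, 5]
clean_ys = [0, 1, 2, 3, 4, 5]      # points exactly on y = x
outlier_ys = [0, 1, 2, 3, 4, 25]   # same points with one extreme value

clean_slope, clean_intercept = fit_line(xs, clean_ys)
outlier_slope, outlier_intercept = fit_line(xs, outlier_ys)
print(clean_slope, outlier_slope)
```

Five of the six points still lie on y = x in the second data set, yet the fitted slope more than triples.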
Generalized ‘Non-Linear’ Models
Assumptions
• underlying functional mapping is known
• all errors are equal
• data is ‘well-conditioned’
• ‘standard’ error distribution
• Polynomials
• Exponentials (e.g., Gaussian, Poisson)
• Piece-wise linear
Non-Linear Models
Assumption: data is representative
• ‘universal’ modeling tools
• fast execution
• no linearity assumptions
• lots of parameters, many techniques
• difficult to explain
Artificial Neural Network
User Story: Predict Retention / Attrition
Historical Behavioral Data
Columns: Customer Rating | Retention | Customer Name | Loyalty Member | Days Since Last Purchase | Immediate Relatives | Household Children | Customer ID | Latest Purchase Price | Latest Purchase Item ID | Region Code | Customer Capture Method | Customer Contact Code | Domicile
1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA
User Story: Predict Customer Retention / Attrition
Machine Learning Processing Chain - Training
User Story: Predict Retention / Attrition
Machine Learning Processing Chain - Prediction
Reward predicted ‘retainees’ with targeted product offerings
Give potential attrition customers special incentives to stay with the business
User Story: Accurate vs. Useful Prediction
Sparse data + Least-Squares (Linear) Classifier
• Task: predict chance of purchasing a sundry item
• Result: ‘best’ model always predicts “none”
• Analysis: LS algorithm assumes all errors are equal
Columns: Bread | Cake & Pie | Chocolate | Coffee | Cookie | Diesel | Juice & Smoothies | Lubricants | Milk | Other Bakery | Premium | Sandwich | Snack | Tea | Total Transaction | Total Revenue
0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
0 0 0 0 0 3 0 0 0 0 0 0 0 0 3 2000
0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1800
0 0 0 0 0 5 0 0 0 0 0 0 0 0 6 4800
0 0 0 2 0 0 0 0 0 0 0 0 0 0 2 100
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1828
0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 16460
0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1000
0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1500
0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 4600
0 0 0 0 0 11 0 0 0 0 0 0 0 0 11 19381.5
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1860
0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 9838.82
0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 11000
0 0 0 0 0 5 0 0 0 0 0 0 0 0 19 18225
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 500
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 800
0 0 0 0 0 0 0 0 0 0 0 1 0 0 7 7990
0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 3820
0 0 0 0 0 1 0 0 0 0 0 0 0 0 55 43230
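The gap between an accurate model and a useful one can be shown with a small calculation. A minimal sketch, assuming a hypothetical data set in which 5 of 100 customers buy the item; the counts are illustrative and are not derived from the table above.

```python
def accuracy(predicted, actual):
    """Fraction of cases where the prediction matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(predicted, actual):
    """Fraction of actual purchases (label 1) that the model catches."""
    hits = [p for p, a in zip(predicted, actual) if a == 1]
    return sum(hits) / len(hits)


# Hypothetical labels: 1 = bought the item, 0 = did not.
actual = [1] * 5 + [0] * 95
always_none = [0] * 100   # a model that always predicts "none"

acc = accuracy(always_none, actual)
rec = recall(always_none, actual)
print(acc, rec)
```

When every error counts the same, always answering "none" scores well on accuracy while identifying no buyers at all, which is the behavior described for the least-squares classifier.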
Clustering/Segmentation – group think
Collaborative Filtering
Relationship Matrix
Personalization – not really
!=
Clustering/Segmentation
Similarity?
Columns: Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
George IE9 M 0 A N 0 1 N
Carol Chrome F 1 B Y 1 0 Y
Mary IE9 F 0 A N 1 0 Y
Dist(George,Carol) = 8
Dist(George,Mary) = 4
Dist(Carol,Mary) = 4
Can you afford to target (George,Mary) the same way as (Carol,Mary)?
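The three distances are consistent with simply counting the attributes on which two customers differ (a Hamming distance). That reading is an assumption, since the slide does not name the metric; the sketch below reproduces the numbers under it.

```python
# Attribute values copied from the table; the customer name is the key.
customers = {
    "George": ["IE9", "M", 0, "A", "N", 0, 1, "N"],
    "Carol": ["Chrome", "F", 1, "B", "Y", 1, 0, "Y"],
    "Mary": ["IE9", "F", 0, "A", "N", 1, 0, "Y"],
}


def dist(a, b):
    """Count the attributes on which customers a and b differ."""
    return sum(x != y for x, y in zip(customers[a], customers[b]))


for pair in [("George", "Carol"), ("George", "Mary"), ("Carol", "Mary")]:
    print(pair, dist(*pair))
```

Mary is equally far from George and from Carol, but the attributes that differ are not the same ones: gender, children, homeownership and the baby-clothes purchase in one case; browser, age sector, income sector and marital status in the other. A single distance number hides that.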
Clustering/Segmentation
Basic Question – which one describes the data the best?
Raw data
How many clusters are there?
Two Clusters
Four Clusters
Six Clusters
Clustering/Segmentation with Statistics
• relatively simple
• data distribution assumptions
• initialization dependencies
[Figure: three scatter plots, each on 0–100 axes: Raw Data, Ellipsoidal Clustering, K-Means Clustering]
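Initialization dependence is easy to reproduce. A minimal sketch of plain k-means (Lloyd's algorithm) on four hypothetical points, run from two different starting positions; the `kmeans` helper and the data are illustrative assumptions, not the tool used in the talk.

```python
def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; returns (centers, sum of squared distances)."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    sse = sum(
        min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers) for p in points
    )
    return centers, sse


# Two tight pairs of points, far apart horizontally.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
_, sse_good = kmeans(points, [(0, 1), (10, 1)])   # starts left/right
_, sse_bad = kmeans(points, [(5, 0), (5, 2)])     # starts bottom/top
print(sse_good, sse_bad)
```

Both runs converge, but the second one settles on a bottom/top grouping with a far larger squared error, purely because of where it started.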
Clustering/Segmentation – data driven
• let the data speak for itself
• multiple data projection ‘views’
• important boundary relationships (“swing voters”)
Customer Demographics
User Story: Clustering / Segmentation
ML Clustering - Training
ML Clustering – Processing New Data
Model Selection – how to choose?
• Basic Model Type (prediction or segmentation)
• inputs + correlated outputs
• inputs only?
• Basic Questions:
• what to use for my problem?
• parameters?
• is this the best choice?
• could I do better, and how?
Optimization – Evolving better solutions
• Simulated Evolution
• fast, efficient search
• always have a solution
• arbitrary ‘evaluation’ functions
• can start with existing solution(s)
• Variation – alter model type, parameters
• Assessment – how well does the model work?
• Selection – survival of the fittest
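The variation, assessment and selection steps can be sketched as a small loop. This is a minimal illustration, assuming real-valued parameters, Gaussian mutation and a "keep the best" rule; the function `evolve`, its defaults and the toy evaluation function are hypothetical and not the implementation behind the talk.

```python
import random


def evolve(evaluate, parent, generations=200, offspring=10, step=0.5, seed=0):
    """Simple elitist evolutionary search that maximizes `evaluate`."""
    rng = random.Random(seed)
    # Start from an existing solution, so there is always an answer to return.
    best, best_score = list(parent), evaluate(parent)
    for _ in range(generations):
        # Variation: perturb every parameter of the current best with noise.
        children = [
            [value + rng.gauss(0, step) for value in best] for _ in range(offspring)
        ]
        # Assessment: score each candidate with the evaluation function.
        top_child = max(children, key=evaluate)
        top_score = evaluate(top_child)
        # Selection: the best child replaces the parent only if it scores higher.
        if top_score > best_score:
            best, best_score = top_child, top_score
    return best, best_score


# Hypothetical evaluation function with its peak at (3, -2); no gradient is used.
def evaluate(params):
    x, y = params
    return -abs(x - 3) - abs(y + 2)


start = [0.0, 0.0]
solution, score = evolve(evaluate, start)
print(solution, score)
```

Because the parent is only ever replaced by something better, stopping the loop at any generation still yields a usable solution.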
Evolutionary Optimization – Evaluation Function
• can use any measurable data
• no continuity assumptions
• no differentiability assumptions
• no symmetry assumptions
Evaluation matrix (rows: Prediction, columns: Reality (Truth))
Prediction \ Reality | Sunshine | Hurricane
Sunshine | 20 | -1000
Hurricane | 5 | 50
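An evaluation function of this kind needs no symmetry: it is just a lookup of what each outcome is worth. A minimal sketch, assuming the four values are read as payoff[prediction][reality] (an assumed orientation, chosen because predicting sunshine ahead of a hurricane is the costly mistake); the ten-day weather sequence is hypothetical.

```python
# Payoffs from the slide, read as PAYOFF[prediction][reality] (assumed orientation).
PAYOFF = {
    "sunshine": {"sunshine": 20, "hurricane": -1000},
    "hurricane": {"sunshine": 5, "hurricane": 50},
}


def total_payoff(predictions, reality):
    """Score a forecaster by summing the payoff of each (prediction, reality) pair."""
    return sum(PAYOFF[p][r] for p, r in zip(predictions, reality))


def accuracy(predictions, reality):
    return sum(p == r for p, r in zip(predictions, reality)) / len(reality)


# Hypothetical ten days of weather: nine sunny days, then one hurricane.
reality = ["sunshine"] * 9 + ["hurricane"]
optimist = ["sunshine"] * 10                      # never predicts a hurricane
cautious = ["sunshine"] * 7 + ["hurricane"] * 3   # two false alarms, one hit

for name, forecast in [("optimist", optimist), ("cautious", cautious)]:
    print(name, accuracy(forecast, reality), total_payoff(forecast, reality))
```

Under these assumptions the optimist is right more often, yet the cautious forecaster earns the higher total payoff, so an optimizer driven by this function would prefer the cautious one.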
User Story: Optimizing Classification Models
Task: Predict Retention/Attrition
[Chart: Model Performance Optimization – Performance vs. Generation]
Classification Accuracy: 62.00, 70.2, 72.3, 73.4, 75.2
Test Set Error (RMS): 34.8, 28.8, 24.5, 22.1, 20.9
17 Potential input features (customer demographics)
2 outputs (retention/attrition)
1300 Training Samples (50 – 50, A / B Split)
1300 Test Samples (naïve test data)
Use Case – Fully Adaptive Feedback (Next Best Offer)
[Diagram labels]
DB
Historical User Behavior (stimulus/response)
Train / Update Model
Non-Adaptive (Fixed) Mode: Randomized A/B/C Offer Selection
Adaptive ML Mode: ML Prediction Offer Selection
Operation (Trigger)
Ad / Offer (stimulus)
Feedback Cycle
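The two modes and the feedback cycle can be sketched in a few lines. This is a hedged illustration only: the class `OfferSelector` is hypothetical, and a simple "highest observed response rate" rule stands in for the ML prediction step, which the slide does not specify.

```python
import random


class OfferSelector:
    """Chooses among offers and learns from stimulus/response feedback."""

    def __init__(self, offers, adaptive, seed=0):
        self.offers = list(offers)
        self.adaptive = adaptive
        self.rng = random.Random(seed)
        # Historical behavior: offer -> [times shown, times accepted].
        self.history = {offer: [0, 0] for offer in self.offers}

    def response_rate(self, offer):
        shown, accepted = self.history[offer]
        return accepted / shown if shown else 0.0

    def select(self):
        if not self.adaptive:
            # Non-adaptive (fixed) mode: randomized A/B/C selection.
            return self.rng.choice(self.offers)
        # Adaptive mode (stand-in for ML prediction): best observed response rate.
        return max(self.offers, key=self.response_rate)

    def feedback(self, offer, accepted):
        # Feedback cycle: update the history that the next selection relies on.
        self.history[offer][0] += 1
        self.history[offer][1] += int(accepted)


selector = OfferSelector(["A", "B", "C"], adaptive=True)
for offer, accepted in [("A", False), ("B", True), ("B", True), ("C", False)]:
    selector.feedback(offer, accepted)
choice = selector.select()
print(choice)
```

In practice the adaptive branch would call a trained model and would still need some randomized selection so that new or rarely shown offers keep getting feedback.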
Five Keys to Successful Machine Learning
• Let the data speak for itself – don’t force fit your models
• Remember, not all errors are equal – use this to your advantage
• True learning requires continual adaptation!
• Automate the process with feedback – remove the “man-in-the-loop”
• Trust the optimization process – it really works!
Q&A
Contact Info
Visit : www.redpoint.net
Bill Porto
Sr. Engineering Analyst
RedPoint Global Inc.
vwporto@redpoint.net
Want More Information about this topic?
Fill out your card or go to redpoint.net/hadoopeurope

More Related Content

What's hot

FACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKFACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKcodebangla
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Machine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsMachine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsSlideTeam
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Responsible AI
Responsible AIResponsible AI
Responsible AINeo4j
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
 
Issues in knowledge representation
Issues in knowledge representationIssues in knowledge representation
Issues in knowledge representationSravanthi Emani
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)Fellowship at Vodafone FutureLab
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science Ansh Budania
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Machine learning Algorithm
Machine learning AlgorithmMachine learning Algorithm
Machine learning AlgorithmMd. Farhan Nasir
 
One R (1R) Algorithm
One R (1R) AlgorithmOne R (1R) Algorithm
One R (1R) AlgorithmMLCollab
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2DigiGurukul
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AIFlorian Wilhelm
 

What's hot (20)

FACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKFACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORK
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Machine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsMachine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And Applications
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Responsible AI
Responsible AIResponsible AI
Responsible AI
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Issues in knowledge representation
Issues in knowledge representationIssues in knowledge representation
Issues in knowledge representation
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Machine learning Algorithm
Machine learning AlgorithmMachine learning Algorithm
Machine learning Algorithm
 
One R (1R) Algorithm
One R (1R) AlgorithmOne R (1R) Algorithm
One R (1R) Algorithm
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 

Viewers also liked

Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyDataWorks Summit
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016alanfgates
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016alanfgates
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015alanfgates
 
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016alanfgates
 
Hortonworks apache training
Hortonworks apache trainingHortonworks apache training
Hortonworks apache trainingalanfgates
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015alanfgates
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-productionTuri, Inc.
 

Viewers also liked (17)

Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Fast Distributed Online Classification
Fast Distributed Online Classification Fast Distributed Online Classification
Fast Distributed Online Classification
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
 
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Hortonworks apache training
Hortonworks apache trainingHortonworks apache training
Hortonworks apache training
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-production
 

Similar to Machine Learning in Big Data

Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for BanksProfinit
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesDATAVERSITY
 
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Lora Cecere
 
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceSupply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceLora Cecere
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckTamrMarketing
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupKen Tucker
 
20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributedJefferson Lynch
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreThe New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreAggregage
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?Ganes Kesari
 
Future of Supply Chain Technologies
Future of Supply Chain TechnologiesFuture of Supply Chain Technologies
Future of Supply Chain TechnologiesLora Cecere
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninInside Analysis
 
Entering the Data Analytics industry
Entering the Data Analytics industryEntering the Data Analytics industry
Entering the Data Analytics industryGramener
 
Clarity First - Problem Solving
Clarity First - Problem Solving Clarity First - Problem Solving
Clarity First - Problem Solving TKMG, Inc.
 
Powering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics InnovationPowering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics Innovationloracecere1
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesBarry Magee
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesBarry Magee
 
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)Chief Analytics Officer Forum
 

Similar to Machine Learning in Big Data (20)

Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for Banks
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use Cases
 
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
 
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceSupply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deck
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April Meetup
 
20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreThe New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
 
Big Data and E-Commerce
Big Data and E-CommerceBig Data and E-Commerce
Big Data and E-Commerce
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
Future of Supply Chain Technologies
Future of Supply Chain TechnologiesFuture of Supply Chain Technologies
Future of Supply Chain Technologies
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine Learnin
 
Entering the Data Analytics industry
Entering the Data Analytics industryEntering the Data Analytics industry
Entering the Data Analytics industry
 
Clarity First - Problem Solving
Clarity First - Problem Solving Clarity First - Problem Solving
Clarity First - Problem Solving
 
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
 
Powering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics InnovationPowering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics Innovation
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven Practices
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven Practices
 
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
 


Machine Learning in Big Data

  • 1. Machine Learning in Big Data - Look forward or be left behind. V. William Porto, Hadoop Summit Dublin 2016
  • 2. Overview of RedPoint Global: Launched in 2006 • Founded and staffed by industry veterans • Headquarters: Wellesley, Massachusetts • Offices in US, UK, Australia, Philippines • Global customer base • Serves most major industries
  • 3. Overview of RedPoint Global: MAGIC QUADRANT (Data Quality; Integrated Marketing Management; Multichannel Campaign Management; Digital Marketing Hubs) • FORRESTER WAVE™ (Cross-channel Campaign Management; Data Quality Solutions)
  • 4. With apologies to Gary Larson: Hadoop
  • 5. Machine Learning – why bother? “If you have always done it that way, it is probably wrong” - Charles Kettering
  • 6. Machine Learning – keeping ahead of the curve • Three basic tenets for success in today’s world • Prediction - you need to learn and use what you’ve learned • Optimization - the world is a dynamic place • Automation - because people don’t scale well
  • 7. Machine Learning – what really is it all about? • Learning vs. instruction • Humans learn instinctively – computers not so much • Intelligent Systems • Memory • Prediction (modeling) • Assessment • Feedback • Adaptation
  • 8. Data Modeling – what, why, how • Regression – what happened in the past • Prediction – what will happen in the future “Prediction is very difficult – especially if it’s about the future” - Niels Bohr
  • 9. Data Modeling – what, why, how. The wide world of data modeling • Supervised models • you have historical data and known correlated outputs (truth) • Unsupervised models • historical data, but may not have (or trust) associated outputs
  • 10. Decision Trees. Major assumption: the world is discrete • fast, easy to understand, no linearity assumptions • ‘human time’ required, unbalanced and/or large trees
  • 11. Standard Linear Models. Assumption: the world is linear • the real world really isn’t linear • not all errors are equal • easy to get misleading results • Which line is best?
  • 12. Generalized ‘Non-Linear’ Models. Assumptions • underlying functional mapping is known • all errors are equal • data is ‘well-conditioned’ • ‘standard’ error distribution • Polynomials • Exponentials (e.g., Gaussian, Poisson) • Piece-wise linear
  • 13. Non-Linear Models. Assumption: data is representative • ‘universal’ modeling tools • fast execution • no linearity assumptions • lots of parameters, many techniques • difficult to explain (figure: Artificial Neural Network)
  • 14. User Story: Predict Retention / Attrition. Historical Behavioral Data. Columns: Customer Rating | Retention | Customer Name | Loyalty Member | Days Since Last Purchase | Immediate Relatives | Household Children | Customer ID | Latest Purchase Price | Latest Purchase Item ID | Region Code | Customer Capture Method | Customer Contact Code | Domicile. Rows (some cells are empty):
      1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
      1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
      1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
      1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
      1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
      1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
      1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
      1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
      1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
      1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
      1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
      1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
      1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
      1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
      1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
      1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
      1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA
  • 15. User Story: Predict Customer Retention / Attrition. Machine Learning Processing Chain - Training
  • 16. User Story: Predict Retention / Attrition. Machine Learning Processing Chain - Prediction • Reward predicted ‘retainees’ with targeted product offerings • Give potential attrition customers special incentives to stay with the business
  • 17. User Story: Accurate vs. Useful Prediction. Sparse data + Least-Squares (Linear) Classifier • Task: predict chance of purchasing a sundry item • Result: ‘best’ model always predicts “none” • Analysis: LS algorithm assumes all errors are equal. Columns: Bread | Cake & Pie | Chocolate | Coffee | Cookie | Diesel | Juice & Smoothies | Lubricants | Milk | Other Bakery | Premium | Sandwich | Snack | Tea | Total Transaction | Total Revenue. Rows:
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
      0 0 0 0 0 3 0 0 0 0 0 0 0 0 3 2000
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1800
      0 0 0 0 0 5 0 0 0 0 0 0 0 0 6 4800
      0 0 0 2 0 0 0 0 0 0 0 0 0 0 2 100
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1828
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 16460
      0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1000
      0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1500
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 4600
      0 0 0 0 0 11 0 0 0 0 0 0 0 0 11 19381.5
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1860
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 9838.82
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 11000
      0 0 0 0 0 5 0 0 0 0 0 0 0 0 19 18225
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 500
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 800
      0 0 0 0 0 0 0 0 0 0 0 1 0 0 7 7990
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 3820
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 55 43230
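The effect described on slide 17 can be reproduced with a few lines of Python. This is a minimal sketch, not the presenter's model: the 5-in-100 purchase rate, the 0.5 decision threshold, and all names are assumptions made for illustration. It fits the simplest possible least-squares model (a single constant) to a sparse 0/1 target and shows that the result scores well on accuracy while finding no buyers at all.

```python
# Hypothetical sparse target: 5 purchases of a sundry item in 100 transactions.
purchased = [1] * 5 + [0] * 95


def mean_squared_error(predictions, targets):
    """Average squared error, with every error weighted equally."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# The constant that minimizes squared error is the mean of the targets.
best_constant = sum(purchased) / len(purchased)

# Turn the fitted value into a yes/no prediction with an assumed 0.5 threshold.
predicted_labels = [1 if best_constant >= 0.5 else 0 for _ in purchased]

accuracy = sum(p == t for p, t in zip(predicted_labels, purchased)) / len(purchased)
buyers_found = sum(1 for p, t in zip(predicted_labels, purchased) if p == 1 and t == 1)

print(f"fitted constant: {best_constant:.2f}")
print(f"accuracy: {accuracy:.0%}, buyers found: {buyers_found}")
```

The model is "accurate" only because predicting "none" is almost always right; treating a missed buyer the same as a false alarm is what makes it useless for the task.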
  • 18. Clustering/Segmentation – group think. Collaborative Filtering Relationship Matrix
  • 19. Personalization – not really (!=)
  • 20. Clustering/Segmentation: Similarity?
      Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
      George | IE9 | M | 0 | A | N | 0 | 1 | N
      Carol | Chrome | F | 1 | B | Y | 1 | 0 | Y
      Mary | IE9 | F | 0 | A | N | 1 | 0 | Y
      Dist(George,Carol) = 8; Dist(George,Mary) = 4; Dist(Carol,Mary) = 4
      Can you afford to target (George,Mary) the same way as (Carol,Mary)?
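The three distances on slide 20 are consistent with simply counting the attributes on which two customers differ. The sketch below assumes that reading (the slide does not name the metric) and uses hypothetical names throughout. It also lists which attributes differ, which is the point of the slide's closing question: two pairs can be equally far apart for entirely different reasons.

```python
# Attribute names and rows as they appear in the slide's table.
ATTRIBUTES = (
    "Browser", "Gender", "Age Sector", "Income Sector",
    "Married", "Children", "Homeowner", "Recent Baby Clothes Purchase",
)

customers = {
    "George": ("IE9", "M", 0, "A", "N", 0, 1, "N"),
    "Carol": ("Chrome", "F", 1, "B", "Y", 1, 0, "Y"),
    "Mary": ("IE9", "F", 0, "A", "N", 1, 0, "Y"),
}


def differing_attributes(a, b):
    """Names of the attributes on which two customer records disagree."""
    return [name for name, x, y in zip(ATTRIBUTES, a, b) if x != y]


def mismatch_distance(a, b):
    """Assumed metric: the number of attributes that differ."""
    return len(differing_attributes(a, b))


for first, second in (("George", "Carol"), ("George", "Mary"), ("Carol", "Mary")):
    a, b = customers[first], customers[second]
    print(first, second, mismatch_distance(a, b), differing_attributes(a, b))
```

George and Mary differ on gender, children, homeownership and the baby-clothes purchase, while Carol and Mary differ on browser, age sector, income sector and marital status. The distance is 4 in both cases, yet the two pairs share no differing attribute.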
  • 21. Clustering/Segmentation. Basic question – which one describes the data the best? How many clusters are there? (figure panels: Raw data, Two Clusters, Four Clusters, Six Clusters)
  • 22. Clustering/Segmentation with Statistics • relatively simple • data distribution assumptions • initialization dependencies (figure panels: Raw Data, Ellipsoidal Clustering, K-Means Clustering)
  • 23. Clustering/Segmentation – data driven • let the data speak for itself • multiple data projection ‘views’ • important boundary relationships (“swing voters”) (figure: Customer Demographics)
  • 24. User Story: Clustering / Segmentation. ML Clustering - Training • ML Clustering – Processing New Data
  • 25. Model Selection – how to choose? • Basic Model Type (prediction or segmentation) • inputs + correlated outputs • inputs only? • Basic Questions: • what to use for my problem? • parameters? • is this the best choice? • could I do better, and how?
  • 26. Optimization – Evolving better solutions • Simulated Evolution • fast, efficient search • always have a solution • arbitrary ‘evaluation’ functions • can start with existing solution(s) • Variation – alter model type, parameters • Assessment – how well does the model work? • Selection – survival of the fittest
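The variation / assessment / selection cycle on slide 26 can be sketched as a short loop. This is an illustrative toy under stated assumptions, not RedPoint's optimizer: it varies only numeric parameters (the slide also mentions altering the model type), uses Gaussian mutation, and keeps parents in the selection pool. The function names, the target point and the population size are all hypothetical.

```python
import random


def evolve(evaluate, initial_population, generations=30, mutation_scale=0.5, seed=0):
    """Toy simulated evolution: higher evaluate() scores are better."""
    rng = random.Random(seed)
    population = [list(individual) for individual in initial_population]
    size = len(population)
    best_scores = []
    for _ in range(generations):
        # Variation: each parent produces one randomly perturbed offspring.
        offspring = [
            [gene + rng.gauss(0.0, mutation_scale) for gene in parent]
            for parent in population
        ]
        # Assessment: score parents and offspring with the evaluation function.
        ranked = sorted(population + offspring, key=evaluate, reverse=True)
        # Selection: only the fittest survive into the next generation.
        population = ranked[:size]
        best_scores.append(evaluate(population[0]))
    return population[0], best_scores


def evaluate(params):
    """Arbitrary, non-differentiable score: closeness to a hypothetical target."""
    return -(abs(params[0] - 3.0) + abs(params[1] + 2.0))


# Start from an existing solution and let the loop improve on it.
start = [[0.0, 0.0] for _ in range(8)]
best, best_scores = evolve(evaluate, start)
print(best, best_scores[-1])
```

Because parents compete alongside their offspring, the best score never gets worse from one generation to the next, so a usable solution is available whenever the loop is stopped.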
  • 27. Evolutionary Optimization – Evaluation Function • can use any measurable data • no continuity assumptions • no differentiability assumptions • no symmetry assumptions. Payoff matrix, Prediction vs. Reality (Truth), each Sunshine / Hurricane: 20, -1000, 5, 50
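An asymmetric evaluation function like the one on slide 27 is easy to express as a lookup table. The sketch below assumes one particular reading of the flattened matrix: 20 for a correct sunshine forecast, -1000 for forecasting sunshine when a hurricane arrives, 5 for forecasting a hurricane that turns out to be sunshine, and 50 for a correct hurricane forecast. The orientation, the ten-day scenario and all names are assumptions.

```python
# Assumed layout of the slide's payoff matrix: PAYOFF[(prediction, reality)].
PAYOFF = {
    ("sunshine", "sunshine"): 20,
    ("sunshine", "hurricane"): -1000,
    ("hurricane", "sunshine"): 5,
    ("hurricane", "hurricane"): 50,
}


def total_payoff(predictions, reality):
    """Evaluation function: sum the payoff of every (prediction, truth) pair."""
    return sum(PAYOFF[(p, r)] for p, r in zip(predictions, reality))


def accuracy(predictions, reality):
    """Fraction of correct forecasts, with every error counted equally."""
    return sum(p == r for p, r in zip(predictions, reality)) / len(reality)


# Hypothetical ten days: nine sunny, one hurricane.
reality = ["sunshine"] * 9 + ["hurricane"]
optimist = ["sunshine"] * 10                       # never forecasts a hurricane
cautious = ["sunshine"] * 7 + ["hurricane"] * 3    # two false alarms, one hit

for name, forecast in (("optimist", optimist), ("cautious", cautious)):
    print(name, accuracy(forecast, reality), total_payoff(forecast, reality))
```

Under these assumptions the optimist is more accurate (0.9 against 0.8) but scores -820 against the cautious forecaster's 200, so an optimizer driven by this evaluation function prefers the less accurate model.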
  • 28. User Story: Optimizing Classification Models. Task: Predict Retention/Attrition • 17 potential input features (customer demographics) • 2 outputs (retention/attrition) • 1300 training samples (50 – 50, A / B split) • 1300 test samples (naïve test data). Chart: Model Performance Optimization (Performance vs. Generation; series: Classification Accuracy, Test Set Error (RMS)); data labels: 62.00, 70.2, 72.3, 73.4, 75.2 and 34.8, 28.8, 24.5, 22.1, 20.9
  • 29. Use Case – Fully Adaptive Feedback (Next Best Offer). Diagram labels: DB: Historical User Behavior (stimulus/response) • Train / Update Model • Non-Adaptive (Fixed) Mode: Randomized A/B/C Offer Selection • Adaptive ML Mode: ML Prediction Offer Selection • Operation (Trigger) • Ad / Offer (stimulus) • Feedback Cycle
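The feedback cycle on slide 29 can be sketched with a deliberately simple stand-in for the ML model. Everything here is an assumption for illustration: the "model" is only a smoothed per-offer response rate, the historical counts are invented, and the class and function names are hypothetical. The slide does not say which learning method sits behind "ML Prediction Offer Selection".

```python
import random


class ResponseRateModel:
    """Stand-in for the ML model: tracks the observed response rate per offer."""

    def __init__(self, offers):
        self.shown = {offer: 0 for offer in offers}
        self.accepted = {offer: 0 for offer in offers}

    def update(self, offer, responded):
        """Feedback step: record one stimulus and the user's response."""
        self.shown[offer] += 1
        self.accepted[offer] += int(responded)

    def predict(self, offer):
        """Smoothed response-rate estimate (add-one smoothing)."""
        return (self.accepted[offer] + 1) / (self.shown[offer] + 2)


def select_offer(model, offers, adaptive, rng):
    """Fixed mode picks at random; adaptive mode follows the model's prediction."""
    if not adaptive:
        return rng.choice(offers)
    return max(offers, key=model.predict)


offers = ["A", "B", "C"]
model = ResponseRateModel(offers)

# Train on hypothetical historical behaviour: (offer, times shown, times accepted).
for offer, shown, accepted in (("A", 10, 1), ("B", 10, 6), ("C", 10, 3)):
    for i in range(shown):
        model.update(offer, responded=i < accepted)

rng = random.Random(0)
first_choice = select_offer(model, offers, adaptive=True, rng=rng)

# Feedback cycle: offer B stops working, and the model is updated after each trigger.
for _ in range(20):
    model.update("B", responded=False)
later_choice = select_offer(model, offers, adaptive=True, rng=rng)

print(first_choice, later_choice)
```

With these invented counts the adaptive mode first selects offer B and, after twenty unanswered showings of B, switches to offer C. A purely greedy selector like this one can lock onto an early favourite, which is one reason a production system would likely mix in some randomized selection.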
  • 30. Five Keys to Successful Machine Learning • Let the data speak for itself – don’t force fit your models • Remember, not all errors are equal – use this to your advantage • True learning requires continual adaptation! • Automate the process with feedback – remove the “man-in-the-loop” • Trust the optimization process – it really works!
  • 31. Q&A. Contact Info: Bill Porto, Sr. Engineering Analyst, RedPoint Global Inc., vwporto@redpoint.net • Visit: www.redpoint.net • Want more information about this topic? Fill out your card or go to redpoint.net/hadoopeurope