Spark machine learning predicting customer churn
Carol McDonald
Using Spark Machine learning to predict customer churn
1.
Spark Machine Learning
Carol McDonald @caroljmcdonald
© 2017 MapR Technologies
2.
Agenda
• Introduction to Machine Learning Techniques
 – Classification
 – Clustering
• Use Decision Tree to Predict Customer Churn
3.
What is Machine Learning?
• Data (contains patterns) → Train Algorithm (finds patterns) → Build Model
• New Data → Use Model (prediction function, recognizes patterns) → Predictions
4.
Examples of ML Algorithms
Supervised
• Classification
 – Naïve Bayes
 – SVM
 – Random Decision Forests
• Regression
 – Linear
 – Logistic
Unsupervised
• Clustering
 – K-means
• Dimensionality reduction
 – Principal Component Analysis
 – SVD
5.
Supervised Algorithms Use Labeled Data
• Data → features → Build Model
• New Data → features → Use Model → Predict
6.
Supervised Machine Learning: Classification & Regression
• Classification: identifies the category for an item
7.
Classification: Definition
Form of ML that:
• Identifies which category an item belongs to
• Uses supervised learning algorithms
 – Data is labeled
[Figure label: Sentiment]
8.
If It Walks/Swims/Quacks Like a Duck … Then It Must Be a Duck
• Features: walks, swims, quacks
9.
Car Insurance Fraud Example
• What are we trying to predict?
 – This is the Label or Target outcome:
 – The amount of fraud
• What are the "if questions" or properties we can use to predict?
 – These are the Features:
 – The claim amount
10.
Car Insurance Fraud Regression Example
• Y axis (Label): amount of fraud
• X axis (Feature): claimed amount
• Data point: fraud amount, claimed amount
• AmntFraud = intercept + coeff * claimedAmnt
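The line AmntFraud = intercept + coeff * claimedAmnt can be fitted with ordinary least squares. The following is a minimal plain-Scala sketch of that idea, not the deck's code: the object name `FraudRegressionSketch` and its methods are hypothetical, and any input data would be supplied by the caller.

```scala
// Minimal sketch: one-feature least squares in plain Scala (no Spark).
// FraudRegressionSketch and its method names are illustrative assumptions.
object FraudRegressionSketch {
  /** Returns (intercept, coeff) minimizing the squared error of y ~ intercept + coeff * x. */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.nonEmpty, "need equally sized, non-empty inputs")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val covXY = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX  = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(varX != 0.0, "feature values must not all be equal")
    val coeff = covXY / varX
    (meanY - coeff * meanX, coeff)
  }

  /** AmntFraud = intercept + coeff * claimedAmnt */
  def predict(model: (Double, Double), claimedAmnt: Double): Double =
    model._1 + model._2 * claimedAmnt
}
```

Calling `fit` on claimed amounts and fraud amounts returns the intercept and coefficient; `predict` then applies the formula from the slide to a new claim.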
11.
Credit Card Fraud Example
• What are we trying to predict?
 – This is the Label:
 – The probability of fraud
• What are the "if questions" or properties we can use to predict?
 – These are the Features:
 – Transaction amount, type of merchant, distance from and time since last transaction
12.
Credit Card Fraud Logistic Regression Example
• Label: probability of fraud (1 = fraud, 0 = not fraud, with .5 marked between them)
• Features (X): transaction amount, type of store, time and location difference since last transaction
13.
Supervised Learning: Classification & Regression
• Classification:
 – identifies which category (e.g. fraud or not fraud)
• Linear Regression:
 – predicts a value (e.g. amount of fraud)
• Logistic Regression:
 – predicts a probability (e.g. probability of fraud)
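To make the difference between predicting a probability and identifying a category concrete, here is a small plain-Scala sketch of the logistic function. The names `LogisticSketch`, `probability`, and `isFraud` are hypothetical, and the 0.5 cut-off is an assumption taken from the midpoint shown on the credit card fraud slide.

```scala
// Minimal sketch of logistic regression scoring in plain Scala (no Spark).
// All names here are illustrative assumptions, not the deck's code.
object LogisticSketch {
  /** Logistic (sigmoid) function: maps any real score into the interval (0, 1). */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Probability of the positive class for a weighted sum of features plus an intercept. */
  def probability(intercept: Double, coeffs: Seq[Double], features: Seq[Double]): Double = {
    require(coeffs.size == features.size, "one coefficient per feature")
    sigmoid(intercept + coeffs.zip(features).map { case (w, x) => w * x }.sum)
  }

  /** Turns the probability into a category by comparing it with a cut-off. */
  def isFraud(p: Double, threshold: Double = 0.5): Boolean = p >= threshold
}
```

The coefficients would come from training; the sketch only shows how a trained model turns features into a probability and then into a fraud / not fraud decision.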
14.
Examples of ML Algorithms
Unsupervised
• Clustering
 – K-means
• Dimensionality reduction
 – Principal Component Analysis
 – SVD
Supervised
• Classification
 – Naïve Bayes
 – SVM
 – Random Decision Forests
• Regression
 – Linear
 – Logistic
15.
Unsupervised Algorithms Use Unlabeled Data
• Customer purchase data (contains patterns) → Train Algorithm (finds patterns) → Build Model → Customer Groups
• New Customer Purchase Data → Use Model (prediction function, recognizes patterns) → Predict Group
16.
Unsupervised Machine Learning: Clustering
• Clustering: group news articles into different categories
18.
Clustering: Definition
• Unsupervised learning task
• Groups objects into clusters of high similarity
 – Search results grouping
 – Grouping of customers
 – Anomaly detection
 – Text categorization
23.
Clustering: Example
• Group similar objects
• Use MLlib K-means algorithm
 1. Initialize coordinates to center of clusters (centroid)
 2. Assign all points to nearest centroid
 3. Update centroids to center of points
 4. Repeat until conditions met
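The four K-means steps above can be sketched in plain Scala for 2-D points. This is an illustration of the loop only, not MLlib's implementation: the object `KMeansSketch`, the `maxIter` parameter, and the choice to pass the initial centroids in by hand are all assumptions of the sketch.

```scala
// Minimal K-means sketch in plain Scala (no Spark); names are illustrative.
object KMeansSketch {
  type Point = (Double, Double)

  private def dist2(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  /** Step 2: index of the centroid nearest to p. */
  def nearest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(i => dist2(p, centroids(i)))

  /** Step 3: center (mean) of a non-empty group of points. */
  private def mean(ps: Seq[Point]): Point =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  /** Steps 2 to 4, starting from the caller-supplied initial centroids (step 1). */
  def run(points: Seq[Point], initial: Seq[Point], maxIter: Int = 20): Seq[Point] = {
    var centroids = initial
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      val groups = points.groupBy(p => nearest(p, centroids))
      // A centroid with no assigned points keeps its previous position.
      val updated = centroids.indices.map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
      changed = updated != centroids // step 4: stop once nothing moves
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

Here "conditions met" is taken to mean that the centroids stop moving or an iteration limit is reached; other stopping rules are possible.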
24.
Predict Churn
25.
Churn Modelling
• Data sources: Customer Data, Call Center Records, Web Clickstream, Server Logs
• Data Discovery, Model Creation: Historical Data → Feature Extraction → Training Set → Model Training/Building; Test Set → Test Model Predictions → Evaluate Results
• Production: New Data → Feature Extraction → Deployed Model → Predictions
26.
Telecom Customer Churn Data
• State: string
• Account length: integer
• Area code: integer
• International plan: string
• Voice mail plan: string
• Number vmail messages: integer
• Total day minutes: double
• Total day calls: integer
• Total day charge: double
• Total eve minutes: double
• Total eve calls: integer
• Total eve charge: double
• Total night minutes: double
• Total night calls: integer
• Total night charge: double
• Total intl minutes: double
• Total intl calls: integer
• Total intl charge: double
• Customer service calls: integer
27.
Customer Churn Example
• What are we trying to predict?
 – This is the Label:
 – Did the customer churn? True or False
• What are the "if questions" or properties we can use to predict?
 – These are the Features:
 – Number of customer service calls, total day minutes …
28.
Decision Trees
• Decision Tree for Classification prediction
• Represents tree with nodes
• IF THEN ELSE questions using features at each node
• Answers branch to child nodes
[Diagram: the root node asks "If the number of customer service calls < 3"; its child nodes ask "If the total day minutes > 200" and "If the total day minutes < 200"; the T/F branches end in leaves labeled "Churned: T" or "Churned: F"]
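A decision tree of this kind reads naturally as nested IF THEN ELSE code. The sketch below uses the three questions from the slide; which leaf each branch reaches is an assumption made for illustration, since the diagram's branch layout is not recoverable from the transcript. The object and method names are hypothetical.

```scala
// Minimal sketch: a hand-written decision tree as nested if/else.
// The thresholds come from the slide; the leaf on each branch is assumed.
object ChurnTreeSketch {
  def predictChurn(customerServiceCalls: Int, totalDayMinutes: Double): Boolean =
    if (customerServiceCalls < 3) {
      // Root answered true: the next question is about heavy daytime usage.
      if (totalDayMinutes > 200) true else false
    } else {
      // Root answered false: the next question is about light daytime usage.
      if (totalDayMinutes < 200) true else false
    }
}
```

A trained tree has the same shape, except that the features, thresholds, and leaf labels are chosen by the learning algorithm from the training data.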
29.
Example Decision Tree
30.
Spark ML workflow
31.
Spark ML workflow with a Pipeline
• Train: Load Data → Data frame → Pipeline (Transformer: Extract Features; Estimator: Train Model) → fit, Cross Validate with Evaluator → Pipeline Model
• Test: Load Data → Test Data frame → Pipeline Model (Transformer: Extract Features, Predict With model) → transform → Evaluator: Evaluate
32.
Zeppelin Notebook with Spark
[Figure labels: Data Engineer, Data Scientist]
33.
Load the data into a DataFrame: Define the Schema
    case class Account(state: String, len: Integer, acode: String,
      intlplan: String, vplan: String, numvmail: Double,
      tdmins: Double, tdcalls: Double, tdcharge: Double,
      temins: Double, tecalls: Double, techarge: Double,
      tnmins: Double, tncalls: Double, tncharge: Double,
      timins: Double, ticalls: Double, ticharge: Double,
      numcs: Double, churn: String)
Input CSV File sample:
    KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
    OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
34.
Load the data into a Dataset
    val train: Dataset[Account] = spark.read
      .option("inferSchema", "false")
      .schema(schema)
      .csv("/user/user01/data/churn-bigml-80.csv")
      .as[Account]
35.
Dataset merged with DataFrame
• In Spark 2.0, the DataFrame APIs merged with the Dataset APIs
36.
Extract the Features
• Training Data → Featurization → Feature Vectors and Label → Training → Model → Model Evaluation → Best Model
• Label: Churned = T or Churned = F
• Features: number of customer service calls, number of day minutes
Image reference: O'Reilly Learning Spark
37.
Use StringIndexer to map Strings to Numbers (adds a column to the DataFrame)
    import org.apache.spark.ml.feature.StringIndexer

    val ipindexer = new StringIndexer()
      .setInputCol("intlplan")
      .setOutputCol("iplanIndex")
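As a rough picture of what mapping Strings to Numbers means, the plain-Scala sketch below assigns index 0.0 to the most frequent label, 1.0 to the next, and so on, which is how StringIndexer orders labels by default. It is not Spark code: `StringIndexerSketch` and its methods are hypothetical, and breaking ties alphabetically is a choice made for this sketch.

```scala
// Minimal sketch of frequency-based label indexing in plain Scala (no Spark).
// Names are illustrative; the alphabetical tie-break is this sketch's choice.
object StringIndexerSketch {
  /** Builds a label -> index map: the most frequent label gets 0.0, the next 1.0, ... */
  def fit(values: Seq[String]): Map[String, Double] =
    values.groupBy(identity).toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), index) => label -> index.toDouble }
      .toMap

  /** Replaces each string with its index, like filling the output column. */
  def transform(model: Map[String, Double], values: Seq[String]): Seq[Double] =
    values.map(model)
}
```

For an `intlplan` column containing mostly "No" values, "No" would map to 0.0 and "Yes" to 1.0.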
38.
Use StringIndexer to map churn True/False to Numbers (adds a column to the DataFrame)
    val labelindexer = new StringIndexer()
      .setInputCol("churn")
      .setOutputCol("label")
39.
Use VectorAssembler to put features in vector column (DataFrame + Features)
    import org.apache.spark.ml.feature.VectorAssembler

    val featureCols = Array("temins", "iplanIndex", "tdmins", "tdcalls"…)
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
40.
Create DecisionTree Estimator, Set Label and Features
    import org.apache.spark.ml.classification.DecisionTreeClassifier

    val dTree = new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
41.
Put Feature Transformers and Estimator in Pipeline
    import org.apache.spark.ml.Pipeline

    val pipeline = new Pipeline()
      .setStages(Array(ipindexer, labelindexer, assembler, dTree))
• Pipeline stages: ipindexer (feature transform) → labelindexer (feature transform) → assembler (assemble features) → dTree (estimator, produces model)
42.
Spark ML workflow with a Pipeline
• Train: Load Data → Data frame → Pipeline (Transformers: Extract Features; Estimator: Train model) → fit, with Evaluator → Pipeline Model
• Test: Load Data → Test Data frame → Pipeline Model (use fitted model) → transform → Evaluator
43.
K-fold Cross-Validation Process
• Data is randomly split into K partitions, forming K training and test dataset pairs
[Diagram: Data → Training Set → Model Training/Building; Test Set → Test Model Predictions]
44.
K-fold Cross-Validation Process
• Train algorithm with training dataset
45.
ML Cross-Validation Process
• Evaluate the model with the Test Set
46.
K-fold Cross-Validation Process
• Train/Test loop: repeat K times
• Select the Model produced by the best-performing set of parameters
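The K-fold process described on these slides can be sketched in plain Scala as a split into K training/test pairs followed by a search over candidate parameters. This is a simplified illustration, not Spark's CrossValidator: the object `KFoldSketch`, the fixed `seed`, and the caller-supplied `score` function are assumptions of the sketch.

```scala
import scala.util.Random

// Minimal K-fold cross-validation sketch in plain Scala (no Spark).
// Names, the seed, and the scoring callback are illustrative assumptions.
object KFoldSketch {
  /** Splits data into k (training, test) pairs; each element lands in exactly one test set. */
  def split[A](data: Seq[A], k: Int, seed: Long = 42L): Seq[(Seq[A], Seq[A])] = {
    require(k >= 2 && k <= data.size, "need 2 <= k <= data.size")
    val shuffled = new Random(seed).shuffle(data)
    // Element i of the shuffled data goes to fold i % k.
    val folds = shuffled.zipWithIndex.groupBy(_._2 % k)
    (0 until k).map { i =>
      val test  = folds(i).map(_._1)
      val train = (0 until k).filter(_ != i).flatMap(j => folds(j).map(_._1))
      (train, test)
    }
  }

  /** Averages a score over the k folds for each candidate and returns the best candidate. */
  def best[A, P](data: Seq[A], k: Int, candidates: Seq[P])(
      score: (P, Seq[A], Seq[A]) => Double): P = {
    val pairs = split(data, k)
    candidates.maxBy(p => pairs.map { case (train, test) => score(p, train, test) }.sum / k)
  }
}
```

In the churn example the candidates would be decision tree depths and `score` would train on the training set and evaluate on the test set; here that step is left to the caller.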
47.
Cross Validation
Set up a CrossValidator with:
• Parameter grid
• Estimator (pipeline)
• Evaluator
Perform grid search based model selection
[Diagram: the transformation/estimation pipeline, the parameter grid, and the evaluator feed the CrossValidator, which calls fit]
48.
Parameter Tuning with CrossValidator with a Paramgrid
CrossValidator
• Given:
 – Estimator
 – Parameter grid
 – Evaluator
• Find best parameters and model
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val paramGrid = new ParamGridBuilder()
      .addGrid(dTree.maxDepth, Array(2, 3, 4, 5, 6, 7))
      .build()
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setRawPredictionCol("prediction")
    val crossval = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(evaluator)
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)
49.
Cross Validator: fit a model to the data
    val cvModel = crossval.fit(ntrain)
• fit: fits a model to the data with the provided parameter grid and returns a Pipeline Model
50.
Evaluate the fitted model
• Train: Load Data → Data frame → Pipeline (Transformers: Extract Features; Estimator: Train model) → fit → Pipeline Model
• Test: Load Data → Test Data frame → Pipeline Model (Extract Features, Predict With model) → transform → Evaluator
51.
Evaluate the Predictions from DecisionTree
• Test features → fitted model (transform) → Evaluator (evaluate prediction accuracy)
    val predictions = cvModel.transform(test)
    val accuracy = evaluator.evaluate(predictions)
52.
Area under the ROC curve
Accuracy is measured by the area under the ROC curve. The area measures correct classifications:
• An area of 1 represents a perfect test
• An area of .5 represents a worthless test
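One way to see why 1 is perfect and .5 is worthless: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half. The plain-Scala sketch below computes it that way. It is an illustration only, not how Spark's evaluator is implemented, and the name `AucSketch` is hypothetical.

```scala
// Minimal sketch: area under the ROC curve by pairwise comparison (no Spark).
// Quadratic in the number of examples, so suitable only for small inputs.
object AucSketch {
  /** Takes (score, isPositive) pairs and returns the area under the ROC curve. */
  def areaUnderROC(scored: Seq[(Double, Boolean)]): Double = {
    val positives = scored.collect { case (s, true)  => s }
    val negatives = scored.collect { case (s, false) => s }
    require(positives.nonEmpty && negatives.nonEmpty, "need both classes")
    val wins = for {
      p <- positives
      n <- negatives
    } yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (positives.size.toLong * negatives.size)
  }
}
```

A model that scores every churner above every non-churner gets 1.0, while a model that gives everyone the same score gets 0.5.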
53.
To Learn More:
• Read about and download example code
• https://mapr.com/blog/churn-prediction-sparkml/
54.
To Learn More:
• End to End Application for Monitoring Uber Data using Spark ML
• https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
55.
To Learn More:
• MapR Free ODT: http://learn.mapr.com/
56.
For Q&A:
• https://community.mapr.com/
• https://community.mapr.com/community/answers/pages/qa
57.
MapR Converged Data Platform
• Data Processing: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services, Search and Others
• APIs: HDFS API, POSIX, NFS, HBase API, OJAI API, Kafka API
• Web-Scale Storage (MapR-XD), Database (MapR-DB), Event Streaming (MapR Streams)
• Enterprise-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
• Unified Management and Monitoring
58.
Q&A
ENGAGE WITH US