[Chart: number of combinations vs. number of items in set. 8 items: 256; 20 items: 1,048,576; 140,000 items: ???]
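The two known values on the chart equal 2^n, the number of subsets of an n-item set (2^8 = 256, 2^20 = 1,048,576). A minimal Scala sketch of that growth; `Combinations` and `subsetCount` are illustrative names, not taken from the talk:

```scala
object Combinations {
  // Number of subsets of a set with n items (empty set included): 2^n.
  def subsetCount(n: Int): BigInt = BigInt(2).pow(n)

  def main(args: Array[String]): Unit = {
    println(subsetCount(8))  // 256
    println(subsetCount(20)) // 1048576
    // For 140,000 items the value is too long to read, so print its digit count.
    println(subsetCount(140000).toString.length)
  }
}
```

The point of the last line is only that the count for 140,000 items is far beyond anything that can be enumerated.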
Theory Meets Reality
Large Scale Frequent Pattern Mining with Apache Spark in the Real World
Kexin Xie, Architect of Marketing Cloud Einstein
kexin.xie@salesforce.com, @realstraw
Wanderley Liu, Senior Data Science Engineer
wanderley.liu@salesforce.com
Marketing Cloud Einstein Journey Insights
Track the entire consumer journey
Gather online and offline interactions to stitch together a complete view of the consumer
Discover the optimal path to conversion
Use AI to analyze all journey permutations and automatically recommend the best channels, offers and sequences that lead to conversion
Learn how customers are actually interacting with your brand
GA
What is Frequent Pattern Mining
Mine Shaft Mural Painting by Frank Wilson
a b c d e
User Items
u-1 a, b
u-2 b, c, d
u-3 a, c, d, e
u-4 a, d, e
u-5 a, b, c
u-6 a, b, c, d
u-7 a
u-8 a, b, c
u-9 a, b, d
u-10 b, c, e
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b 5
a, c 4
a, d 4
a, e 2
... ...
Min Support = 4
L1 Patterns
L2 Patterns
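The support tables can be reproduced by counting, for each itemset, the rows that contain it. The sketch below does this with plain Scala collections for the ten example rows; `SupportCount`, `supports`, and `frequent` are hypothetical names chosen for illustration.

```scala
object SupportCount {
  // The ten example rows (u-1 .. u-10) from the slides.
  val rows: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
    Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
    Set("a", "b", "d"), Set("b", "c", "e"))

  // Support of every itemset of the given size: the number of rows containing it.
  def supports(size: Int): Map[Set[String], Int] =
    rows
      .flatMap(items => items.subsets(size))
      .groupBy(identity)
      .map { case (itemset, hits) => itemset -> hits.size }

  // Keep only the itemsets whose support reaches the minimum support.
  def frequent(size: Int, minSupport: Int): Map[Set[String], Int] =
    supports(size).filter { case (_, support) => support >= minSupport }
}
```

`SupportCount.supports(1)` gives the single-item (L1) counts and `SupportCount.frequent(2, 4)` the pairs (L2) that survive a minimum support of 4.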
A-priori Principle
A Priori in Berkeley, CA
“All sub-patterns of a frequent pattern are frequent”
Min Support = 4
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b ?
a, c ?
a, d ?
a, e ?
... ...
Min Support = 6
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b ?
a, c ?
a, d ?
a, e ?
... ...
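The principle lets the pair table be pruned before anything is counted: a pair is worth counting only if both of its items are frequent on their own. A small sketch under that reading, with `AprioriCandidates` and `candidatePairs` as illustrative names:

```scala
object AprioriCandidates {
  // Single-item supports from the slides.
  val itemSupport: Map[String, Int] =
    Map("a" -> 8, "b" -> 7, "c" -> 6, "d" -> 5, "e" -> 3)

  // Candidate pairs: both items must reach the minimum support,
  // otherwise the pair cannot be frequent and is never counted.
  def candidatePairs(minSupport: Int): Set[Set[String]] = {
    val frequentItems: Set[String] = itemSupport.collect {
      case (item, support) if support >= minSupport => item
    }.toSet
    frequentItems.subsets(2).toSet
  }
}
```

With a minimum support of 4, item e drops out, leaving six candidate pairs; with 6, only a, b, and c remain, leaving three.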
FP-Growth
Header Table
item support
a 8
b 7
c 6
d 5
e 3
FP-Tree
root
  a: 8
    b: 5
      c: 3
        d: 1
      d: 1
    c: 1
      d: 1
        e: 1
    d: 1
      e: 1
  b: 2
    c: 2
      d: 1
      e: 1
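An FP-tree like this one is built by inserting each row with its items in header-table order and incrementing a counter on every node along the path. The following is a minimal single-machine sketch of that construction, not the speakers' implementation; `FpTreeSketch`, `Node`, `build`, and `render` are hypothetical names.

```scala
import scala.collection.mutable

object FpTreeSketch {
  // One tree node: an item, the number of rows passing through it, and its children.
  final class Node(val item: String) {
    var count: Int = 0
    val children: mutable.LinkedHashMap[String, Node] = mutable.LinkedHashMap.empty
  }

  // The ten example rows (u-1 .. u-10) and the header-table order.
  val order: Seq[String] = Seq("a", "b", "c", "d", "e")
  val rows: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
    Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
    Set("a", "b", "d"), Set("b", "c", "e"))

  // Insert every row, items in header-table order, bumping counts along its path.
  def build(rows: Seq[Set[String]], order: Seq[String]): Node = {
    val root = new Node("root")
    for (row <- rows) {
      var node = root
      for (item <- order if row.contains(item)) {
        node = node.children.getOrElseUpdate(item, new Node(item))
        node.count += 1
      }
    }
    root
  }

  // Render the tree below `node` as indented "item: count" lines.
  def render(node: Node, depth: Int = 0): Seq[String] =
    node.children.values.toSeq.flatMap { child =>
      (("  " * depth) + s"${child.item}: ${child.count}") +: render(child, depth + 1)
    }
}
```

Calling `FpTreeSketch.render(FpTreeSketch.build(FpTreeSketch.rows, FpTreeSketch.order))` lists the 14 nodes of the example tree, starting with `a: 8`.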
Header Table
item support
a 8
b 7
c 6
root
  a: 8
    b: 5
      c: 3
    c: 1
  b: 2
    c: 2
Paths ending in c
a, b, c 3
a, c 1
b, c 2
FP Results
c 6
FP-Tree | c
a, b 3
a 1
b 2
Header Table
item support
b 5
a 4
root
  b: 5
    a: 3
  a: 1
Patterns in FP-Tree | c
a 4
b 5
a, b 3
FP Results
c 6
a, c 4
b, c 5
a, b, c 3
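The conditional tree for c starts from the prefixes of every row that contains c. The sketch below derives those prefixes and the conditional item supports from the example rows with plain Scala collections; `ConditionalBase`, `patternBase`, and `conditionalSupport` are illustrative names, not the speakers' code.

```scala
object ConditionalBase {
  // The ten example rows (u-1 .. u-10) and the header-table order.
  val order: Seq[String] = Seq("a", "b", "c", "d", "e")
  val rows: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
    Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
    Set("a", "b", "d"), Set("b", "c", "e"))

  // For every row containing `item`, keep the items ranked before it in the
  // header table, then count how often each prefix occurs.
  def patternBase(item: String): Map[Seq[String], Int] = {
    val before = order.takeWhile(_ != item)
    rows
      .filter(_.contains(item))
      .map(row => before.filter(row.contains))
      .groupBy(identity)
      .map { case (prefix, hits) => prefix -> hits.size }
  }

  // Item supports inside the pattern base: the conditional header table.
  def conditionalSupport(item: String): Map[String, Int] =
    patternBase(item).toSeq
      .flatMap { case (prefix, n) => prefix.map(_ -> n) }
      .groupBy(_._1)
      .map { case (i, hits) => i -> hits.map(_._2).sum }
}
```

For c this yields the prefixes (a, b) three times, (a) once, and (b) twice, and the conditional supports b 5 and a 4.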
Scaling Up
https://www.firestock.ru/strela-na-grafike-arrow-on-the-chart/
[Chart: number of items vs. number of rows]
Header Table
item support
a 8
b 7
c 6
d 5
e 3
Items needed per group (prefix in brackets)
a
[a], b
[a, b], c
[a, b, c], d
[a, b, c, d], e
Rows for b
u-1 a, b
u-2 b, (c, d)
u-5 a, b, (c)
u-6 a, b, (c, d)
u-8 a, b, (c)
u-9 a, b, (d)
u-10 b, (c, e)
Rows for c
u-2 b, c, (d)
u-3 a, c, (d, e)
u-5 a, b, c
u-6 a, b, c, (d)
u-8 a, b, c
u-10 b, c, (e)
Rows for d
u-2 b, c, d
u-3 a, c, d, (e)
u-4 a, d, (e)
u-6 a, b, c, d
u-9 a, b, d
Rows for e
u-3 a, c, d, e
u-4 a, d, e
u-10 b, c, e
Items in parentheses rank after the group's item and are not needed by that group.
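The per-item row lists above amount to a projection: each row is sent to the group of every item it contains, trimmed to that item and the items ranked before it. A minimal sketch with plain Scala collections standing in for Spark; `RowProjection`, `project`, and `groups` are hypothetical names.

```scala
object RowProjection {
  // The ten example rows (u-1 .. u-10) and the header-table order.
  val order: Seq[String] = Seq("a", "b", "c", "d", "e")
  val rows: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
    Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
    Set("a", "b", "d"), Set("b", "c", "e"))

  // For each item in a row, emit (item, prefix + item), where the prefix holds
  // the row's items ranked before that item in the header table.
  def project(items: Set[String]): Seq[(String, Seq[String])] = {
    val sorted = order.filter(items.contains)
    sorted.indices.map(i => sorted(i) -> sorted.take(i + 1))
  }

  // Group the projected rows by item, as a groupByKey would on a cluster.
  val groups: Map[String, Seq[Seq[String]]] =
    rows
      .flatMap(project)
      .groupBy(_._1)
      .map { case (item, hits) => item -> hits.map(_._2) }
}
```

`RowProjection.groups("e")` holds the three full rows containing e, while `RowProjection.groups("c")` holds six rows with d and e already dropped.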
Build FP-tree header table
Distribute rows to executors
Build FP-Trees on each node and mine for patterns
Collect patterns
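// Build FP-tree header table: count every item, keep the frequent ones,
// and order them by descending support.
val headerTable = data
  .flatMap(_.items.map(_ -> 1L))
  .reduceByKey(_ + _)
  .filter(isFrequent)
  .collect()
  .sortBy { case (_, support) => -support }

// Distribute rows to executors (one group per item), then build an FP-tree
// for each group and mine it for patterns.
val patterns = data
  .flatMap(filterDataBasedHeaderTable(headerTable))
  .groupByKey()
  .flatMap { case (k, rows) =>
    mineForPatternsFor(k, rows)
  }
  .collect() // If necessary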
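The body of `mineForPatternsFor` is not shown in the slides. As a rough stand-in, the sketch below mines one group by brute force instead of building an FP-tree: every subset of a row's prefix, plus the group's item, is one pattern occurrence. `GroupMining` and `minePatternsEndingIn` are hypothetical names, and the approach is exponential in row length, so it only suits toy data.

```scala
object GroupMining {
  // Rows already projected for item "c": prefix items followed by "c".
  val rowsForC: Seq[Seq[String]] = Seq(
    Seq("b", "c"), Seq("a", "c"), Seq("a", "b", "c"),
    Seq("a", "b", "c"), Seq("a", "b", "c"), Seq("b", "c"))

  // Count every pattern ending in `item` and keep those reaching minSupport.
  // Brute-force stand-in for FP-tree mining, not the speakers' implementation.
  def minePatternsEndingIn(
      item: String,
      rows: Seq[Seq[String]],
      minSupport: Int): Map[Set[String], Int] =
    rows
      .flatMap { row =>
        val prefix: Set[String] = row.filter(_ != item).toSet
        prefix.subsets().map(_ + item)
      }
      .groupBy(identity)
      .map { case (pattern, hits) => pattern -> hits.size }
      .filter { case (_, support) => support >= minSupport }
}
```

Because each group only produces patterns that end in its own item, the groups can be mined independently and their results concatenated.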
Minimum support
https://www.maxpixel.net/static/photo/1x/Cogs-Gears-Technical-Wheel-Cogwheel-Gearwheel-2279289.jpg
Differential Minimum Support (DMS)
Classify Items Into Categories
Compute Min Support Per Category
Run FP with Multiple Min Supports
COMMON ITEMS
RARE ITEMS
Pattern Frequency Test
CONDITION 1: Pattern Support ≥ Pattern Min Support
Pattern min support is defined as the lowest category minsup, given all items in the pattern
CONDITION 2 - Apriori Principle (Recursive)
If a pattern is frequent, all sub-patterns must be frequent
Condition 1: Pattern Support ≥ Pattern Minimum Support (the lowest minsup given all items in the pattern)
Pattern Frequency Test
Item Cat Minsup
A Common 100k
B Common 100k
C Rare 1k
Pattern Support Minsup Condition 1
A B 80k 100k fail
A C 4k 1k pass
B C 3k 1k pass
A B C 2k 1k pass
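Condition 1 can be checked per pattern once every item has a category and every category a minimum support. A small sketch using the values from the table; `ConditionOne`, `patternMinsup`, and `passesCondition1` are illustrative names, and the lookup assumes a non-empty pattern.

```scala
object ConditionOne {
  // Category minimum supports and item categories from the example table.
  val categoryMinsup: Map[String, Long] = Map("Common" -> 100000L, "Rare" -> 1000L)
  val itemCategory: Map[String, String] =
    Map("A" -> "Common", "B" -> "Common", "C" -> "Rare")

  // Pattern min support: the lowest category minsup among the pattern's items.
  // Assumes the pattern is non-empty (min of an empty set would throw).
  def patternMinsup(pattern: Set[String]): Long =
    pattern.map(item => categoryMinsup(itemCategory(item))).min

  // Condition 1: pattern support >= pattern min support.
  def passesCondition1(pattern: Set[String], support: Long): Boolean =
    support >= patternMinsup(pattern)
}
```

Any pattern containing the rare item C inherits its 1k threshold, which is why A C, B C, and A B C pass while A B, held to 100k, does not.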
Condition 2 - A priori principle
Pattern Frequency Test
Pattern Support Minsup
A B 80k 100k
A C 4k 1k
B C 3k 1k
A B C 2k 1k
A B C meets Condition 1 (2k ≥ 1k) but fails Condition 2, because its sub-pattern A B is not frequent (80k < 100k).
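Condition 2 can be applied as a second pass over the patterns that met Condition 1, smallest patterns first, so that a pattern is kept only when all of its sub-patterns were kept. In this sketch the single items A, B, and C are assumed to be frequent, since their supports are not given on the slide; `ConditionTwo` and `applyCondition2` are hypothetical names.

```scala
object ConditionTwo {
  // Patterns that meet Condition 1 in the example (A B does not: 80k < 100k).
  // ASSUMPTION: the single items are frequent; the slide does not list them.
  val passedCondition1: Set[Set[String]] = Set(
    Set("A"), Set("B"), Set("C"),
    Set("A", "C"), Set("B", "C"), Set("A", "B", "C"))

  // Keep a pattern only if every non-empty proper sub-pattern was kept.
  // Processing by increasing size makes the check effectively recursive.
  def applyCondition2(candidates: Set[Set[String]]): Set[Set[String]] =
    candidates.toSeq.sortBy(_.size).foldLeft(Set.empty[Set[String]]) { (kept, pattern) =>
      val subPatterns = pattern.subsets().filter(s => s.nonEmpty && s != pattern)
      if (subPatterns.forall(kept.contains)) kept + pattern else kept
    }
}
```

Under that assumption A C and B C survive, and A B C is dropped because A B is missing from the kept set.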
// CONDITION 1: mine each group against the per-category minimum supports.
val catMinsupMap = sc.broadcast(computeCatMinSup(data))
val fpTreeResults = data
  .flatMap(filterDataBasedHeaderTable(headerTable))
  .groupByKey()
  .flatMap { case (k, rows) =>
    mineForPatternsFor(k, rows, catMinsupMap.value)
  }

// CONDITION 2: keep a pattern only if all of its non-empty sub-patterns
// were also found.
val patternsMap = sc.broadcast(fpTreeResults.keys.collect().toSet)
fpTreeResults
  .filter { case (pattern, support) =>
    pattern.subsets().filter(_.nonEmpty).forall(patternsMap.value.contains)
  }
Not the end of the story ...
https://w-dog.net/wallpaper/nature-night-star-tree-trees-stars-background-wallpaper-widescreen-full-screen-hd-wallpapers-fullscreen/id/308950/
Low Level Optimization
• Handled case where array length > Integer.MAX_VALUE
Result Set Compaction
• Remove redundant and noisy result sets
• Very efficient compaction - 95% without loss of information
Result Set Ranking
• Score patterns with multiple criteria
Items with Feature Set
• Not only which combinations work best, but what makes them work best
• Well received feature, direct feedback on strategy
Theory Meets Reality—Large Scale Frequent Pattern Mining with Apache Spark in the Real World with Kexin Xie and Wanderley Liu
Theory Meets Reality—Large Scale Frequent Pattern Mining with Apache Spark in the Real World with Kexin Xie and Wanderley Liu

More Related Content

More from Databricks

Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesDatabricks
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowDatabricks
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 

More from Databricks (20)

Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLow
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 

Recently uploaded

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Theory Meets Reality—Large Scale Frequent Pattern Mining with Apache Spark in the Real World with Kexin Xie and Wanderley Liu

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 10. Numberofcombinations Number of items in set 8 20 256 1,048,576
  • 11. Numberofcombinations Number of items in set 8 20 140,000 256 1,048,576 ???
  • 12.
  • 13. Theory Meets Reality Large Scale Frequent Pattern Mining with Apache Spark in the Real World Kexin Xie, Architect of Marketing Cloud Einstein kexin.xie@salesforce.com, @realstraw Wanderley Liu, Senior Data Science Engineer wanderley.liu@salesforce.com
  • 14. Marketing Cloud Einstein Journey Insights Track the entire consumer journey Gather online and offline interactions to stitch together a complete view of the consumer Discover the optimal path to conversion Use AI to analyze all journey permutations and automatically recommend the best channels, offers and sequences that lead to conversion Learn how customers are actually interacting with your brand GA
  • 15. What is Frequent Pattern Mining Mine Shaft Mural Painting by Frank Wilson
  • 16.
  • 17. a b c d e
  • 18. Transactions (user: items): u-1: a, b | u-2: b, c, d | u-3: a, c, d, e | u-4: a, d, e | u-5: a, b, c | u-6: a, b, c, d | u-7: a | u-8: a, b, c | u-9: a, b, d | u-10: b, c, e
  • 19. Item support (item: support): a: 8 | b: 7 | c: 6 | d: 5 | e: 3, shown next to the transactions from slide 18
  • 20. Item support: a: 8 | b: 7 | c: 6 | d: 5 | e: 3. Pair support: a, b: 5 | a, c: 4 | a, d: 4 | a, e: 2 | ... Shown next to the transactions from slide 18
  • 21. Same tables as slide 20, with Min Support = 4
  • 22. Same as slide 21, with the item support and pair support tables shown twice
  • 23. Same as slide 22, with the item support table labeled L1 Patterns and the pair support table labeled L2 Patterns
  • 24. A-priori Principle: “All sub-patterns of a frequent pattern are frequent” (Image: A Priori in Berkeley, CA)
  • 25. Min Support = 4. Item support: a: 8 | b: 7 | c: 6 | d: 5 | e: 3. Pair support: a, b: ? | a, c: ? | a, d: ? | a, e: ? | ...
  • 26. The tables from slide 25 shown twice, once with Min Support = 4 and once with Min Support = 6
  • 28. Header Table (item: support): a: 8 | b: 7 | c: 6 | d: 5 | e: 3. FP-tree:
        root
          a: 8
            b: 5
              c: 3
                d: 1
              d: 1
            c: 1
              d: 1
                e: 1
            d: 1
              e: 1
          b: 2
            c: 2
              d: 1
              e: 1
  • 37. Header Table (item: support): a: 8 | b: 7 | c: 6. Paths ending in c (items: count): a, b, c: 3 | a, c: 1 | b, c: 2. FP Results: c: 6. FP-tree:
        root
          a: 8
            b: 5
              c: 3
            c: 1
          b: 2
            c: 2
  • 41. FP-Tree | c. Paths leading to c (items: count): a, b: 3 | a: 1 | b: 2. Header Table (item: support): b: 5 | a: 4. FP-tree: root, b: 5, a: 4. FP Results: c: 6
  • 44. FP-Tree | c. Header Table (item: support): b: 5 | a: 4. FP-tree: root, b: 5, a: 4. Patterns in FP-Tree | c: a: 4 | b: 5 | a, b: 4. FP Results: c: 6
  • 45. Same as slide 44, with FP Results extended to: c: 6 | a, c: 4 | b, c: 5 | a, b, c: 4
  • 48. Header Table and FP-tree as on slide 28, shown next to the User / Items transactions from slide 18
  • 51. Header Table and FP-tree as on slide 28
  • 59. Header Table (item: support): a: 8 | b: 7 | c: 6 | d: 5 | e: 3, shown next to the user / items transactions from slide 18
  • 60. Each header-table item is paired with the items that precede it: a | [a], b | [a, b], c | [a, b, c], d | [a, b, c, d], e
  • 61. Rows for c (items after c in the header table are in parentheses): u-2: b, c, (d) | u-3: a, c, (d, e) | u-5: a, b, c | u-6: a, b, c, (d) | u-8: a, b, c | u-10: b, c, (e)
  • 62. Adds the rows for e: u-3: a, c, d, e | u-4: a, d, e | u-10: b, c, e
  • 63. Adds the rows for d: u-2: b, c, d | u-3: a, c, d, (e) | u-4: a, d, (e) | u-6: a, b, c, d | u-9: a, b, d
  • 64. Adds the rows for b: u-1: a, b | u-2: b, (c, d) | u-5: a, b, (c) | u-6: a, b, (c, d) | u-8: a, b, (c) | u-9: a, b, (d) | u-10: b, (c, e)
  • 65. [Chart with axes "Number of rows" and "Number of items", shown with the row groups for b, c, d and e from slides 61 to 64]
  • 66. Steps: Build FP-tree header table; Distribute rows to executors; Build FP-Trees on each node and mine for patterns; Collect patterns
  • 67. The same steps, with code.
        Build FP-tree header table:
          val headerTable = data
            .flatMap(_.items.map(_ -> 1L))
            .reduceByKey(_ + _)
            .filter(isFrequent)
            .collect
            .sorted
        Distribute rows to executors; build FP-Trees on each node and mine for patterns; collect patterns:
          data
            .flatMap(filterDataBasedHeaderTable(headerTable))
            .groupByKey
            .flatMap { case (k, rows) => mineForPatternsFor(k, rows) }
            .collect // If necessary
  • 74. Differential Minimum Support (DMS): (1) Classify items into categories; (2) Compute min support per category; (3) Run FP with multiple min supports
  • 80. Pattern Frequency Test. CONDITION 1: Pattern Support ≥ Pattern Min Support. Pattern min support is defined as the lowest category minsup, given all items in the pattern. CONDITION 2 - Apriori Principle (Recursive): If a pattern is frequent, all sub-patterns must be frequent
  • 81. Pattern Frequency Test. Condition 1: Pattern Support ≥ Pattern Minimum Support. Items (item, category, minsup): A, Common, 100k | B, Common, 100k | C, Rare, 1k. Patterns (pattern, support): A B, 80k | A C, 4k | B C, 3k | A B C, 2k
  • 82. Same as slide 81, with the pattern minsup filled in (pattern, support, minsup): A B, 80k, 100k | A C, 4k, 1k | B C, 3k, 1k | A B C, 2k, 1k
  • 83. Same as slide 82, with Condition 1 restated: Pattern support ≥ lowest minsup given all items in the pattern
  • 84. Pattern Frequency Test. Condition 2 - A priori principle, applied to the same item and pattern tables as slide 82
  • 86. FP-tree mining as before:
        val fpTreeResults = data
          .flatMap(filterDataBasedHeaderTable(headerTable))
          .groupByKey
          .flatMap { case (k, rows) => mineForPatternsFor(k, rows) }
  • 87. CONDITION 1: the category min supports are broadcast and passed to the mining step:
        val catMinsupMap = sc.broadcast(computeCatMinSup(data))
        val fpTreeResults = data
          .flatMap(filterDataBasedHeaderTable(headerTable))
          .groupByKey
          .flatMap { case (k, rows) => mineForPatternsFor(k, rows, catMinsupMap.value) }
  • 88. CONDITION 2, applied to the fpTreeResults from slide 87:
        val patternsMap = sc.broadcast(fpTreeResults.keys.collect)
        fpTreeResults
          .filter { case (pattern, support) => pattern.subsets.subsetOf(patternsMap.value) }
  • 89. Not the end of the story ... (Image: https://w-dog.net/wallpaper/nature-night-star-tree-trees-stars-background-wallpaper-widescreen-full-screen-hd-wallpapers-fullscreen/id/308950/)
  • 90. Low Level Optimization
        • Handled case where array length > Integer.MAX_VALUE
      Result Set Compaction
        • Remove redundant and noisy result sets
        • Very efficient compaction - 95% without loss of information
      Result Set Ranking
        • Score patterns with multiple criteria
      Items with Feature Set
        • Not only which combinations work best, but what makes them work best
        • Well received feature, direct feedback on strategy
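
Slides 18 to 26 count pattern support over ten transactions and then apply a minimum support together with the A-priori principle. The following is a minimal sketch of that idea in plain Scala collections (no Spark), assuming the transactions from slide 18. The object name `SupportCounting` and the functions `support` and `frequentPatterns` are hypothetical names chosen for illustration and do not come from the talk.

```scala
object SupportCounting {
  // Transactions from slide 18: one item set per user.
  val transactions: Seq[Set[String]] = Seq(
    Set("a", "b"),           // u-1
    Set("b", "c", "d"),      // u-2
    Set("a", "c", "d", "e"), // u-3
    Set("a", "d", "e"),      // u-4
    Set("a", "b", "c"),      // u-5
    Set("a", "b", "c", "d"), // u-6
    Set("a"),                // u-7
    Set("a", "b", "c"),      // u-8
    Set("a", "b", "d"),      // u-9
    Set("b", "c", "e")       // u-10
  )

  // Support of a pattern: number of transactions containing all of its items.
  def support(pattern: Set[String]): Int =
    transactions.count(t => pattern.subsetOf(t))

  // Level-wise mining (assumed illustration, not the talk's implementation):
  // a candidate of size k+1 is only counted if all of its size-k
  // sub-patterns are already known to be frequent (A-priori principle).
  def frequentPatterns(minSupport: Int): Map[Set[String], Int] = {
    val items = transactions.flatten.distinct
    var level: Set[Set[String]] =
      items.map(Set(_)).filter(p => support(p) >= minSupport).toSet
    var result = Map.empty[Set[String], Int]
    while (level.nonEmpty) {
      result ++= level.map(p => p -> support(p))
      val candidates = for {
        p <- level
        i <- items if !p.contains(i)
        c = p + i
        if c.subsets(p.size).forall(level.contains)
      } yield c
      level = candidates.filter(c => support(c) >= minSupport)
    }
    result
  }
}
```

With `SupportCounting.frequentPatterns(4)`, item e (support 3) is dropped at the first level, so no candidate containing e is ever counted; that skipped work is what the A-priori principle buys.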
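
Slides 59 to 64 split the transactions into one group of rows per header-table item, keeping only the items up to that item in header order. The sketch below reproduces those groups with plain Scala collections as a stand-in for the `flatMap(...)` and `groupByKey` steps on slide 67. It is an assumption about what `filterDataBasedHeaderTable` emits, not the talk's code; `RowPartitioning`, `conditionalRows` and `groupByItem` are hypothetical names.

```scala
object RowPartitioning {
  // Header table order from slide 59: items sorted by descending support.
  val headerOrder: Seq[String] = Seq("a", "b", "c", "d", "e")

  // Transactions from slide 18, in order u-1 to u-10.
  val transactions: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"),
    Set("a", "d", "e"), Set("a", "b", "c"), Set("a", "b", "c", "d"),
    Set("a"), Set("a", "b", "c"), Set("a", "b", "d"), Set("b", "c", "e")
  )

  // For each item k in a row, emit (k, the row's items up to and including k
  // in header order). Items after k are dropped, matching the parenthesized
  // items on slides 61 to 64.
  def conditionalRows(items: Set[String]): Seq[(String, Seq[String])] = {
    val sorted = headerOrder.filter(items.contains)
    sorted.indices.map(i => sorted(i) -> sorted.take(i + 1))
  }

  // Local equivalent of flatMap followed by groupByKey.
  def groupByItem(rows: Seq[Set[String]]): Map[String, Seq[Seq[String]]] =
    rows.flatMap(conditionalRows)
      .groupBy(_._1)
      .map { case (k, v) => k -> v.map(_._2) }
}
```

Each group can then be mined independently, which is what lets the rows be distributed by key: the group for c, for example, holds the six rows listed on slide 61.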
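
The pattern frequency test on slides 80 to 84 can be checked by hand against the example table. The sketch below does so in plain Scala, using the item minsups and pattern supports from slides 81 and 82. Because the slides give no supports for single items, it assumes Condition 2 only needs to recurse over sub-patterns with at least two items; that restriction, and the names `DmsFrequencyTest`, `patternMinsup`, `condition1` and `isFrequent`, are illustrative assumptions rather than part of the talk.

```scala
object DmsFrequencyTest {
  // Category minsup per item, from slide 81 (100k for Common, 1k for Rare).
  val itemMinsup: Map[String, Long] =
    Map("A" -> 100000L, "B" -> 100000L, "C" -> 1000L)

  // Pattern supports from slide 81.
  val patternSupport: Map[Set[String], Long] = Map(
    Set("A", "B") -> 80000L,
    Set("A", "C") -> 4000L,
    Set("B", "C") -> 3000L,
    Set("A", "B", "C") -> 2000L
  )

  // Pattern min support: the lowest category minsup over the pattern's items.
  def patternMinsup(pattern: Set[String]): Long =
    pattern.map(itemMinsup).min

  // Condition 1: pattern support >= pattern min support.
  def condition1(pattern: Set[String]): Boolean =
    patternSupport.get(pattern).exists(_ >= patternMinsup(pattern))

  // Condition 2 (recursive): every sub-pattern must itself be frequent.
  // Assumption: single items are skipped because their supports are not
  // listed on the slides.
  def isFrequent(pattern: Set[String]): Boolean =
    condition1(pattern) &&
      pattern.subsets(pattern.size - 1)
        .filter(_.size >= 2)
        .forall(isFrequent)
}
```

Under these assumptions A C and B C pass both conditions, A B fails Condition 1 (80k is below 100k), and A B C passes Condition 1 against its 1k minsup but fails Condition 2 because its sub-pattern A B is not frequent.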