Matt Gershoff
CEO Conductrics
www.conductrics.com Twitter: @mgershoff
Entropy: An End to the Data Love Affair
Who am I?
Co-founder of Conductrics, software that helps companies solve the last-mile problem for analytics.
Why me?
Conductrics blends ideas from A/B testing, reinforcement learning, statistics, and information theory to generate human-interpretable machine learning models that target customers with the experiences they care about. www.conductrics.com
We All Love Data!
But should we love data?
Losing Ticket
Not Surprising
Not Interesting
Not Informative
Winning Ticket
Surprising
Interesting
Informative
If an event is Predictable, it is Not Informative
Key Idea
If an event is Not Predictable, it is Informative
Key Idea
Information = ?
But What is Information Really?
Data IS NOT Information
Key Idea
Data Is Only Potential Information
Key Idea
Source: https://www.thedailybeast.com/claude-shannon-the-juggling-poet-who-gave-us-the-information-age  Photo: Alfred Eisenstaedt/The LIFE Picture Collection/Getty Images
Claude Shannon
Studied at MIT. In 1948, while at Bell Labs, he published "A Mathematical Theory of Communication."
What is Information if not Data?
Information in data is equal to the smallest* possible lossless encoding
*smallest on average
The More Predictable
The More Compression
The Less Information
Shannon Entropy is a Measure of Unpredictability in Data
Entropy is the Minimum number* of Bits to fully encode data
*average number
…or equivalently
Entropy is the Minimum number* of QUESTIONS to identify all possible values in the data
*average number
I see you are confused. To help with intuition, let's play a game of 20 Questions. What letter am I thinking of: A, B, C, or D?
Questions to Bits to Information

First Approach:
1st Question: Is it ‘C’ or ‘D’?
  No → 2nd Question: More Than 1?  (No → 1, Yes → 2)
  Yes → 2nd Question: More Than 3?  (No → 3, Yes → 4)
1st Question: Is it ‘C’ or ‘D’?
  No → 2nd Question: Is it ‘B’?  (No → A, Yes → B)
  Yes → 2nd Question: More Than 3?  (No → 3, Yes → 4)
1st Question: Is it ‘C’ or ‘D’?
  No → 2nd Question: Is it ‘B’?  (No → A, Yes → B)
  Yes → 2nd Question: Is it ‘D’?  (No → C, Yes → D)
We can ALWAYS pick the Letter after 2 Questions
What if we swap 0|1 for No|Yes?
Each Y|N Question = 1 Bit
A 00
B 01
C 10
D 11
We can identify each Letter with 2 bits
Letter  Bit1  Bit2
A       0     0
B       0     1
C       1     0
D       1     1
So the number of bits in this case is equal to the number of Y|N Questions.
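To make the questions-to-bits mapping concrete, here is a minimal Python sketch (my own illustration; the message and variable names are made up, not from the slides). It encodes a short message with the fixed 2-bit code above, so every letter costs exactly two Yes/No answers, i.e. two bits.

```python
# A fixed-length code: each letter is answered by exactly 2 Yes/No questions = 2 bits.
fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}

message = "ABAACD"                      # hypothetical example message
encoded = "".join(fixed_code[ch] for ch in message)
print(encoded)                          # 000100001011 -> 12 bits for 6 letters
print(len(encoded) / len(message))      # 2.0 bits per letter, no matter the message
```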
Mind Blown
It gets better. What if I think of the Letters based on this prior distribution?
A = 50%
B = 25%
C = 12.5%
D = 12.5%
*From http://www.inference.org.uk/mackay/itprnn/
Can we now do better, on average, than 2 Questions, or 2 Bits?
I don't understand. Maybe you can show me how this works.
The tree is built one question at a time:

1st Question: Is it A?  (Yes → A, No → 2nd Question)
2nd Question: Is it B?  (Yes → B, No → 3rd Question)
3rd Question: Is it C?  (Yes → C, No → D)

Data Encoding
A=1
B=01
C=001
D=000
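As a sanity check on that encoding, here is a small Python sketch (my own; the sample message is made up to match the 50/25/12.5/12.5 prior). Because no codeword is a prefix of another, decoding is lossless, and an A-heavy message costs well under 2 bits per letter.

```python
# The variable-length code read off the question tree: a 1 means "Yes" at that level.
var_code = {"A": "1", "B": "01", "C": "001", "D": "000"}
decode_map = {bits: letter for letter, bits in var_code.items()}

def decode(bitstring):
    letters, buffer = [], ""
    for bit in bitstring:
        buffer += bit
        if buffer in decode_map:        # prefix-free: no code is a prefix of another
            letters.append(decode_map[buffer])
            buffer = ""
    return "".join(letters)

msg = "AABACDAB"                        # 4 A, 2 B, 1 C, 1 D -- matches the prior
bits = "".join(var_code[ch] for ch in msg)
print(len(bits) / len(msg))             # 1.75 bits per letter for this sample
print(decode(bits) == msg)              # True: the encoding is lossless
```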
In general, the average number of questions needed is:

$\sum_{i=1}^{k} P(\text{Letter}_i) \cdot \text{NumQuestions}_i$
Average Number of Questions Needed?

$\sum_{i=1}^{4} P(\text{Letter}_i) \cdot \text{NumQuestions}_i$

0.5*1 [A] + 0.25*2 [B] + 0.125*3 [C] + 0.125*3 [D] = 1.75
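The same 1.75 drops out of a couple of lines of Python; a minimal sketch (my own) of the weighted average above:

```python
# Expected number of Yes/No questions: sum over letters of P(letter) * questions(letter).
probs     = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
questions = {"A": 1,   "B": 2,    "C": 3,     "D": 3}

avg_questions = sum(p * questions[letter] for letter, p in probs.items())
print(avg_questions)                    # 1.75 -- better than the 2 of the fixed code
```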
Can we on average do better than 2 Questions or bits?
YES!!!
How much better depends on the prior probabilities.
The more predictable the data, the fewer questions or bits needed. Maximum = 2; Our Case = 1.75; Minimum = 0.
Math Time
Information Entropy
From Data to Information
Information Content of an individual event, or value (like a letter):

Information Content of an Event ($x_i$): $-\log_2 P(x_i)$, where $P(x_i)$ is the probability of the event.

Information is measured in Bits!
Entropy: H(x)

$H(x) = \sum_{i=1}^{k} -P(x_i) \cdot \log_2 P(x_i)$

1) Calculate the Information in each event
2) Take the Weighted Average of the Information
Hmm… looks suspiciously similar to the average number of questions calculation.
Calculate the Entropy of our Letters
A = 50%
B = 25%
C = 12.5%
D = 12.5%
First calculate $\log_2 P(x_i)$:
A = log2(0.5) = -1
B = log2(0.25) = -2
C = log2(0.125) = -3
D = log2(0.125) = -3
Then take the weighted average:

-1 * (0.5*-1 [A] + 0.25*-2 [B] + 0.125*-3 [C] + 0.125*-3 [D]) = 1.75
But that's exactly the same result!
Entropy = Number of Bits = Number of Questions
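A minimal Python sketch (my own) of the entropy formula, reproducing the 1.75 bits for the letter distribution:

```python
from math import log2

def entropy(distribution):
    """Shannon entropy H(x) = sum_i -P(x_i) * log2 P(x_i), in bits."""
    return sum(-p * log2(p) for p in distribution.values() if p > 0)

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
print(entropy(probs))                   # 1.75 bits = average number of Yes/No questions
```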
What can I do with it?
Information Gain
Turn Prediction Targeting into a Problem of Reducing Entropy
Key Idea Behind Decision Trees
Example: Conductrics Predictive Segments
$I(x; y) = H(x) - H(x \mid y)$

where $H(x)$ is the Entropy of the Target Variable and $H(x \mid y)$ is the Conditional Entropy of the Target given Feature $y$.
Example Data
Convert Windows Mobile
N N N
N N N
N N N
N N N
N N Y
N Y Y
Y N Y
Y Y N
Y Y N
Y Y N
Y Y Y
Y Y Y
Summary Statistics
Variable      Coverage   Conversion Rate   Entropy
Windows = Y   50%        83%               0.65
Windows = N   50%        17%               0.65
Mobile = Y    42%        60%               0.44
Mobile = N    58%        43%               0.99
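The per-segment entropies in the table are just the binary entropy of each segment's conversion rate. A minimal Python sketch (my own) spot-checks a few of the rows; the exact fractions come from the 12-row example data above.

```python
from math import log2

def binary_entropy(p):
    """Entropy of a Yes/No outcome with probability p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(round(binary_entropy(5 / 6), 2))  # Windows = Y: 83% conversion -> 0.65
print(round(binary_entropy(1 / 6), 2))  # Windows = N: 17% conversion -> 0.65
print(round(binary_entropy(3 / 7), 2))  # Mobile  = N: 43% conversion -> 0.99
```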
Gain for Windows*: 1 - (0.5*0.65 + 0.5*0.65) = 0.35
Gain for Mobile: 1 - (0.42*0.44 + 0.58*0.99) = 0.24

*Note: In this case Windows Yes and No are symmetrical, so the results for each are the same, but this need not be true. You can see this in Mobile: the two are not the same.
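Here is a minimal Python sketch (my own; the row encoding and helper names are mine) that recomputes the Windows gain from the example data as I(x;y) = H(x) - H(x|y). It reproduces the 0.35 shown above.

```python
from math import log2

def entropy(values):
    total = len(values)
    return sum(-(values.count(v) / total) * log2(values.count(v) / total)
               for v in set(values))

# (convert, windows) pairs taken from the example-data table
rows = [("N", "N"), ("N", "N"), ("N", "N"), ("N", "N"), ("N", "N"), ("N", "Y"),
        ("Y", "N"), ("Y", "Y"), ("Y", "Y"), ("Y", "Y"), ("Y", "Y"), ("Y", "Y")]

convert = [c for c, _ in rows]
h_convert = entropy(convert)                      # H(x) = 1.0 bit (conversions are 50/50)

h_conditional = 0.0                               # H(x|y): weighted entropy within each split
for w in ("Y", "N"):
    subset = [c for c, win in rows if win == w]
    h_conditional += (len(subset) / len(rows)) * entropy(subset)

print(round(h_convert - h_conditional, 2))        # 0.35 -- the information gain for Windows
```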
Build Tree: [diagram of a single root node showing the Overall Conversion Rate]
Split By Windows: [diagram of the root node (Overall Conversion Rate) splitting on Windows into No and Yes branches]
Decision Trees
Repeat for each Leaf Node.
Stop when:
• Node Size is below threshold
• Information Gain is below threshold
• Tree Depth reaches a defined limit
These are all parameters YOU set to control overfitting.
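A minimal, hypothetical ID3-style sketch in Python of the "repeat for each leaf node" loop with these stopping rules. It assumes categorical features and majority-vote leaves, and the function and parameter names are mine; it is an illustration, not Conductrics' implementation.

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    gain = entropy(labels)
    for value in set(r[feature] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[feature] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def build_tree(rows, labels, features, depth=0,
               min_rows=4, min_gain=0.01, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]      # leaf prediction
    if len(rows) < min_rows or depth >= max_depth or not features:
        return majority                                  # stop: node size / depth limit
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) < min_gain:
        return majority                                  # stop: gain below threshold
    tree = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[(best, value)] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx],
                                         [f for f in features if f != best],
                                         depth + 1, min_rows, min_gain, max_depth)
    return tree

# Example (hypothetical usage on the 12-row data from the slides):
# tree = build_tree(rows, convert_labels, ["windows", "mobile"])
```

Raising min_gain, raising min_rows, or lowering max_depth all prune the tree earlier; these are the overfitting controls the slide refers to.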
Don't Love Data!
Love Information
I ❤ $\sum_{i=1}^{k} -P(x_i) \cdot \log_2 P(x_i)$
Thank You!!!
Psst… much more than Decision Trees: Information Gain is also Kullback-Leibler Divergence.
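For the curious, here is a minimal Python sketch (my own, reusing the 12-row example) of that identity: the information gain for the Windows split equals the KL divergence between the joint distribution P(convert, windows) and the product of its marginals.

```python
from math import log2

def kl_divergence(p, q):
    """D_KL(p || q) = sum_k p(k) * log2( p(k) / q(k) ), in bits."""
    return sum(p[k] * log2(p[k] / q[k]) for k in p if p[k] > 0)

# Joint distribution of (convert, windows) from the 12-row example data
joint = {("N", "N"): 5/12, ("N", "Y"): 1/12, ("Y", "N"): 1/12, ("Y", "Y"): 5/12}
p_convert = {"N": 0.5, "Y": 0.5}
p_windows = {"N": 0.5, "Y": 0.5}
independent = {(c, w): p_convert[c] * p_windows[w] for (c, w) in joint}

print(round(kl_divergence(joint, independent), 2))   # 0.35, the same as the Windows gain
```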
Resources
Information Theory
David MacKay's Course: Information Theory, Pattern Recognition and Neural Networks
http://www.inference.org.uk/mackay/itprnn/
https://www.youtube.com/channel/UCfoScwn69ekXXWNTN0CLGXA
MacKay's Free Online Text Book
http://www.inference.org.uk/mackay/itila/
Decision Tree ID3 Algorithm
https://en.wikipedia.org/wiki/ID3_algorithm