HyperLogLog Intuition
(without the hard math)
Sim Simeonov, Founder & CTO, Swoop
@simeons / sim at swoop dot com
The following presentation takes poetic
license in order to provide intuition
Q: How do we quickly count the number of
distinct things in some collection?
A: Since “things” is fuzzy, hash them to
simplify the problem to…
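A minimal sketch of the hashing step, assuming SHA-256 as the hash function and string items (the slides name neither; `hash_to_unit` is a hypothetical helper). It turns arbitrary "things" into pseudo-uniform numbers in [0, 1), so that duplicates collapse to the same number:

```python
import hashlib

def hash_to_unit(item: str) -> float:
    """Map an arbitrary item to a pseudo-uniform number in [0, 1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Take the first 8 bytes as a 64-bit integer and keep its top 53 bits,
    # which a float can represent exactly, then scale into [0, 1).
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / 2**53

items = ["apple", "banana", "apple", "cherry"]
hashes = {hash_to_unit(x) for x in items}

# Equal items always hash to the same number, so the duplicate "apple"
# contributes nothing new to the set of hashes.
print(len(items), "items,", len(hashes), "distinct hashes")
```

After this step the problem no longer depends on what the things are, only on how many distinct numbers there are.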
Q: How do we quickly determine the
cardinality (size) of a set of n numbers?
A: Quickly means using fewer resources.
Assume we only have k buckets…
Distribute n items randomly in k buckets
$E(\text{distance}) \cong \frac{k}{n}$
$E(\min) \cong \frac{k}{n} \;\Rightarrow\; n \cong \frac{k}{E(\min)}$
more buckets == greater precision
We can estimate n from k and
the position of the first occupied bucket…
without keeping any of the n numbers
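The relationship E(min) ≅ k/n can be checked with a small simulation. This is only an illustrative sketch: positions are drawn as continuous values in [0, k) rather than discrete buckets, and the values of k, n, the seed, and the number of trials are arbitrary assumptions.

```python
import random

def estimate_cardinality(min_position: float, k: float) -> float:
    """Invert E(min) ≅ k/n to get n ≅ k / E(min)."""
    return k / min_position

def average_min(n: int, k: float, trials: int, rng: random.Random) -> float:
    """Average the smallest of n uniform positions in [0, k) over many trials."""
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() * k for _ in range(n))
    return total / trials

rng = random.Random(42)
k, n = 1_000_000.0, 500
mean_min = average_min(n, k, trials=2000, rng=rng)
estimate = estimate_cardinality(mean_min, k)
print(f"k/n = {k / n:.0f}, average min = {mean_min:.0f}, estimated n = {estimate:.0f}")
```

Only the running minimum is kept in each trial; none of the n positions is stored. A single trial on its own gives a very noisy estimate, which is why the simulation averages many of them.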
Q: How do we improve the precision
of our estimate?
A: Use a collection of buckets and use the
mean of the estimates created from each.
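One way to sketch this, under stated assumptions: random floats stand in for hashes, values are split into m groups by their leading part, and each group keeps only its minimum. Each group sees roughly n/m values, so its min is roughly m/n, which gives n ≅ m / mean(min). Note the choice made here: the mins are averaged before inverting, which is the harmonic mean of the per-group estimates, because a plain arithmetic mean of 1/min is dominated by outliers. The slide says only "mean"; `partitioned_mins` and `estimate` are hypothetical names.

```python
import random

def partitioned_mins(hashes, m):
    """Split hashes in [0, 1) into m groups and keep only each group's min."""
    mins = [1.0] * m  # an empty group keeps 1.0, which biases small sets
    for h in hashes:
        scaled = h * m
        group = min(int(scaled), m - 1)   # leading part picks the group
        position = scaled - group         # remainder is the position in it
        mins[group] = min(mins[group], position)
    return mins

def estimate(mins):
    """Each group sees about n/m values, so mean(min) ≅ m/n and n ≅ m/mean(min)."""
    m = len(mins)
    mean_min = sum(mins) / m
    return m / mean_min

rng = random.Random(7)
n = 200_000
hashes = [rng.random() for _ in range(n)]
mins = partitioned_mins(hashes, m=1024)
print(f"true n = {n}, estimated n = {estimate(mins):.0f}")
```

The state kept is m numbers regardless of n, and more groups give a tighter estimate.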
HLL sketch ≅ a distribution of mins
[Figure: distribution of mins, with the true mean marked]
HyperLogLog sketches are reaggregatable
because min reaggregates with min
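A minimal sketch of why this works, assuming a min-per-group sketch like the one the earlier slides describe (SHA-256 hashing, m = 64 groups, and the names `sketch` and `merge` are all illustrative choices, not the HyperLogLog wire format). Merging two sketches is an element-wise min, and the result equals the sketch of the combined data:

```python
import hashlib

def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform number in [0, 1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def sketch(items, m=64):
    """Keep only the minimum hashed position seen in each of m groups."""
    mins = [1.0] * m
    for item in items:
        scaled = hash_to_unit(item) * m
        group = min(int(scaled), m - 1)
        mins[group] = min(mins[group], scaled - group)
    return mins

def merge(a, b):
    """Re-aggregate two sketches: the min of mins is the min of the union."""
    return [min(x, y) for x, y in zip(a, b)]

monday = [f"user-{i}" for i in range(0, 600)]
tuesday = [f"user-{i}" for i in range(400, 1000)]   # overlaps with monday

merged = merge(sketch(monday), sketch(tuesday))
print(merged == sketch(monday + tuesday))  # True
```

Users seen on both days are not double counted, because a repeated item can never lower a minimum it has already contributed to.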
Making it work in the real world
• Data is not uniformly distributed…
  • Hash it!
• How do we get many “samples” from one set of hashes?
  • Partition them!
• Can we get a good estimate for the mean?
  • Yes, with some fancy math & empirical corrections.
• Do we actually have to keep the minimums?
  • No, just keep the number of 0s before the first 1 in binary form.
https://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
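The four points above can be put together in a simplified sketch. Assumptions, none of which come from the slides: SHA-256 truncated to 64 bits as the hash, 2^10 registers, and the standard raw HyperLogLog estimator with its usual bias constant, without the small-range and large-range corrections that production implementations add. A small hashed value has many leading 0s, so keeping the largest zero count per register is a coarse way of keeping the minimum.

```python
import hashlib

P = 10          # assumed precision: the first P bits of the hash pick a register
M = 1 << P      # number of registers (partitions)

def hash64(item: str) -> int:
    """64-bit hash of an item (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")

def rank(w: int, width: int) -> int:
    """Number of 0s before the first 1 in a width-bit word, plus one."""
    return width - w.bit_length() + 1

def add(registers, item):
    h = hash64(item)
    index = h >> (64 - P)                   # partition by the first P bits
    rest = h & ((1 << (64 - P)) - 1)        # remaining 64 - P bits
    registers[index] = max(registers[index], rank(rest, 64 - P))

def estimate(registers):
    """Raw estimate: bias constant times a harmonic mean over the registers."""
    alpha = 0.7213 / (1 + 1.079 / M)
    return alpha * M * M / sum(2.0 ** -r for r in registers)

registers = [0] * M
n = 50_000
for i in range(n):
    add(registers, f"item-{i}")
print(f"true n = {n}, estimated n = {estimate(registers):.0f}")
```

Each register needs only a few bits, since it stores a zero count rather than a full hash value. With 1,024 registers the typical relative error is a few percent.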
http://bit.ly/spark-alchemy
http://bit.ly/spark-records
Improving patient outcomes
LEADING HEALTH DATA LEADING CONSUMER DATA
Lifestyle
Magazine subscriptions
Catalog purchases
Psychographics
Animal lover
Fisherman
Demographics
Property records
Internet transactions
• 280M unique US patients
• 7 years longitudinal data
• De-identified, HIPAA-safe
1st Party Data
Proprietary tech to
integrate data
NPI Data
Attributed to the
patient
Claims
ICD 9 or 10, CPT,
Rx and J codes
• 300M US Consumers
• 3,500+ consumer attributes
• De-identified, privacy-safe
Petabyte scale privacy-preserving ML/AI
Calls to Action
• Experiment with the HLL functions in spark-alchemy.
• Keep big data in Spark only and interop with HLL sketches.
Do you want to make Spark great while improving millions of lives?
Let’s talk.
sim at swoop dot com
