SlideShare a Scribd company logo
1 of 19
A Big Data Case:
Bloom Filters
BERKAY DINÇER
A Big Data Case
•The task:
• Crawl as many web pages as you can.
• Google – Page Rank.
Web Crawler Architecture
Requirements
• 50 billion URLs to crawl.
• Problem of revisiting a URL.
• We have to keep track of the URLs we have
visited before.
• Avoid disk access for performance.
• Tradeoff between memory – speed.
Requirements
•Average size of a URL:
• IE supports 2083 characters at maximum.
• Plenty of URL shorteners around.
• We will assume average size of 80 characters (80 bytes).
◦ For 50 billion URLs: 80 × 50 × 109 = 4 × 1012 = 4 TB
◦ A lookup table of size 4 TB indexed with URLs.
◦ Too big to fit into the main memory.
◦ Disk is okay. But disk IO is very slow.
Bloom Filters
•Place a bloom filter on the stream of URLs
•Bloom Filter will decide if a URL is visited before.
• It will use much less memory.
• Queries will be in O(1).
• It might occasionally lie.
How?
• Start with a bit array initalized with 0s.
• N = 18.
• Predefined length.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Inserting a URL
Querying
• 2 possible answers
• Possibly in set
• Definitely not in set
Rate of False Positives
•m: Total # of bits.
•k: # of hash functions.
•n: # of insertions.
•Probability that a bit is not set to 1 by a certain hash function: 1 −
1
𝑚
•For k hash functions: (1 −
1
𝑚
) 𝑘
•For n elements: (1 −
1
𝑚
) 𝑘𝑛
Rate of False Positives
•m: Total number of bits.
•k: # of hash functions.
•n: # of insertions.
•Probability that it is 1: 1 − (1 −
1
𝑚
) 𝑘𝑛
•For an item that is not in the set all of the k bits should be 1.
Crawler with Bloom Filter
LU TABLE / HASH TABLE
•4 TB required space.
•Mandatory Disk Access.
BLOOM FILTER
•139.48GB (0.00005 fpp) k=17
•111.58GB (0.0001 fpp) k=13
•No disk access required.
•O(k) insert time.
•O(k) query time.
Big Data Analytics for Time Critical
Mobility Applications
IoT Applications
Small problems will turn into big
problems very fast.
Bloom filters

More Related Content

What's hot

What's hot (20)

Hashing
HashingHashing
Hashing
 
Hash table in java
Hash table in javaHash table in java
Hash table in java
 
Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure
 
Hashing
HashingHashing
Hashing
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Heaps
HeapsHeaps
Heaps
 
Heap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmsHeap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithms
 
Heap Sort Algorithm
Heap Sort Algorithm Heap Sort Algorithm
Heap Sort Algorithm
 
Heap sort
Heap sort Heap sort
Heap sort
 
Consistent hashing
Consistent hashingConsistent hashing
Consistent hashing
 
Linear and Binary search
Linear and Binary searchLinear and Binary search
Linear and Binary search
 
Rehashing
RehashingRehashing
Rehashing
 
Heap sort
Heap sortHeap sort
Heap sort
 
Heap sort
Heap sortHeap sort
Heap sort
 
Poster-SetCoverAlgorithm
Poster-SetCoverAlgorithmPoster-SetCoverAlgorithm
Poster-SetCoverAlgorithm
 
Quick and Heap Sort with examples
Quick and Heap Sort with examplesQuick and Heap Sort with examples
Quick and Heap Sort with examples
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
 
Hashing
HashingHashing
Hashing
 
Chunked, dplyr for large text files
Chunked, dplyr for large text filesChunked, dplyr for large text files
Chunked, dplyr for large text files
 
Heap sort
Heap sortHeap sort
Heap sort
 

Similar to Bloom filters

Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"NUS-ISS
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTWsunnygleason
 
Go Big Quick with Elasticsearch
Go Big Quick with ElasticsearchGo Big Quick with Elasticsearch
Go Big Quick with ElasticsearchJason Scheller
 
Blockchain Technology Introduction and Basics
Blockchain Technology  Introduction and BasicsBlockchain Technology  Introduction and Basics
Blockchain Technology Introduction and Basicsjayasris2023
 
Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)Angel Boy
 
London devops logging
London devops loggingLondon devops logging
London devops loggingTomas Doran
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Web前端性能优化 2014
Web前端性能优化 2014Web前端性能优化 2014
Web前端性能优化 2014Yubei Li
 
AltaVista Search Engine Architecture
AltaVista Search Engine ArchitectureAltaVista Search Engine Architecture
AltaVista Search Engine ArchitectureChangshu Liu
 
Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016Vsevolod Polyakov
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...huguk
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 

Similar to Bloom filters (20)

Lecture_3.pptx
Lecture_3.pptxLecture_3.pptx
Lecture_3.pptx
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Go Big Quick with Elasticsearch
Go Big Quick with ElasticsearchGo Big Quick with Elasticsearch
Go Big Quick with Elasticsearch
 
Blockchain Technology Introduction and Basics
Blockchain Technology  Introduction and BasicsBlockchain Technology  Introduction and Basics
Blockchain Technology Introduction and Basics
 
Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)
 
London devops logging
London devops loggingLondon devops logging
London devops logging
 
Cache is King!
Cache is King!Cache is King!
Cache is King!
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Web前端性能优化 2014
Web前端性能优化 2014Web前端性能优化 2014
Web前端性能优化 2014
 
AltaVista Search Engine Architecture
AltaVista Search Engine ArchitectureAltaVista Search Engine Architecture
AltaVista Search Engine Architecture
 
Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016
 
CAO-Unit-III.pptx
CAO-Unit-III.pptxCAO-Unit-III.pptx
CAO-Unit-III.pptx
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero Dawn
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 

Recently uploaded

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Recently uploaded (20)

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

Bloom filters

  • 1. A Big Data Case: Bloom Filters BERKAY DINÇER
  • 2. A Big Data Case •The task: • Crawl as many web pages as you can. • Google – Page Rank.
  • 4.
  • 5. Requirements • 50 billion URLs to crawl. • Problem of revisiting a URL. • We have to keep track of the URLs we have visited before. • Avoid disk access for performance. • Tradeoff between memory – speed.
  • 6. Requirements •Average size of a URL: • IE supports 2083 characters at maximum. • Plenty of URL shorteners around. • We will assume average size of 80 characters (80 bytes). ◦ For 50 billion URLs: 80 × 50 × 109 = 4 × 1012 = 4 TB ◦ A lookup table of size 4 TB indexed with URLs. ◦ Too big to fit into the main memory. ◦ Disk is okay. But disk IO is very slow.
  • 7. Bloom Filters •Place a bloom filter on the stream of URLs •Bloom Filter will decide if a URL is visited before. • It will use much less memory. • Queries will be in O(1). • It might occasionally lie.
  • 8. How? • Start with a bit array initalized with 0s. • N = 18. • Predefined length. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • 10. Querying • 2 possible answers • Possibly in set • Definitely not in set
  • 11. Rate of False Positives •m: Total # of bits. •k: # of hash functions. •n: # of insertions. •Probability that a bit is not set to 1 by a certain hash function: 1 − 1 𝑚 •For k hash functions: (1 − 1 𝑚 ) 𝑘 •For n elements: (1 − 1 𝑚 ) 𝑘𝑛
  • 12. Rate of False Positives •m: Total number of bits. •k: # of hash functions. •n: # of insertions. •Probability that it is 1: 1 − (1 − 1 𝑚 ) 𝑘𝑛 •For an item that is not in the set all of the k bits should be 1.
  • 13.
  • 14. Crawler with Bloom Filter LU TABLE / HASH TABLE •4 TB required space. •Mandatory Disk Access. BLOOM FILTER •139.48GB (0.00005 fpp) k=17 •111.58GB (0.0001 fpp) k=13 •No disk access required. •O(k) insert time. •O(k) query time.
  • 15.
  • 16.
  • 17.
  • 18. Big Data Analytics for Time Critical Mobility Applications IoT Applications Small problems will turn into big problems very fast.

Editor's Notes

  1. SEO – Robots.txt
  2. Sadece false positive mümkün, false negative mümkün değil.  "possibly in set" ya da "definitely not in set« Hash collisions – dont use crypto hashes
  3. Too many hash functions -> Too many 1s Too less -> Just 1 collision is enough for a fpp
  4. One Hit Wonders
  5. Weather forecasts – Interpreting sensor data – Privacy preserving mobility mining – 3V of big data
  6. Big Table – Hbase -> Check if a row exists before doing disk access Medium -> Recommendation Chrome -> Malicious URLs