SlideShare a Scribd company logo
1 of 95
SQL or NoSQL, that is the question! October 2011 Andraž Tori, CTO at Zemanta @andraz andraz@zemanta.com
Answering ,[object Object],[object Object],[object Object]
SQL is awesome! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
No, really, it's awesome! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
 
So this is the end, right?
Why the heck would someone not want SQL?
Why not to use SQL? ,[object Object],[object Object],[object Object],[object Object],[object Object]
Some other perspectives...
Let's say ...
You are a big tech company, located on west coast of USA
 
You are... ,[object Object],[object Object],[object Object],[object Object],[object Object]
You want to ,[object Object],[object Object],[object Object],[object Object]
You are...
Some interesting constraints ,[object Object]
So... ,[object Object],[object Object],[object Object],[object Object]
Numbers everybody should know! Jeff Dean at famous Stanford talk L1 cache reference 0.5 ns Branch mispredict  5 ns L2 cache reference  7 ns Mutex lock/unlock  25 ns Main memory reference  100 ns Compress 1K bytes w/ cheap algorithm  3,000 ns Send 2K bytes over 1 Gbps network  20,000 ns Read 1 MB sequentially from memory  250,000 ns Round trip within same datacenter  500,000 ns Disk seek  10,000,000 ns Read 1 MB sequentially from disk  20,000,000 ns Send packet CA->Netherlands->CA 150,000,000 ns
Facebook circa 2009 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Big Data ,[object Object],[object Object],[object Object],[object Object]
So smart guys create solutions to these  internal challenges
And then? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Four papers to rule them all ,[object Object],[object Object],[object Object],[object Object]
Solving problems of big guys?
Total Sites Across All Domains August 1995 - October 2011, NetCraft
Yesterday's problem of biggest guys Is today's problem of garden variety startup
 
And so we end up with Cambrian explosion
 
 
These solutions don't have much in common, Except...
They definitely aren't SQL
Not Only SQL
So what are these beasts?
That's a hard question... ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example use-cases ,[object Object],[object Object],[object Object],[object Object],[object Object]
At the core, it's a different set of trade-offs and operational constraints
Trade-offs and operational constraints ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More possibilities ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
A consistent topic: CAP Theorem
CAP theorem (Eric Brewer, 2000, Symposium on Principles of Distributed Computing) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
"There is no free lunch with distributed data” –  HP
Eventual Consistency ,[object Object],[object Object]
But this is horrible!
…  you already are eventually consistent! :) If your database stores how many vases you have in your shop...
Eventual consistency ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
There are different kinds of consistencies ,[object Object],[object Object],[object Object],[object Object]
There's not even a proper taxonomy of features different NoSQL solutions offer
And this presentation is too short to present whole breadth of possibilities
Usual taxonomy of NoSQL ,[object Object],[object Object],[object Object],[object Object],[object Object]
Other attributes ,[object Object],[object Object],[object Object]
Key/Value stores ,[object Object],[object Object],[object Object]
Document databases ,[object Object],[object Object],[object Object],[object Object],[object Object]
Column stores ,[object Object],[object Object],[object Object]
Graph Databases ,[object Object],[object Object]
To make the situation even more confusing... ,[object Object],[object Object],[object Object]
Two examples ,[object Object],[object Object],[object Object],[object Object]
 
Cassandra ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cassandra – writes ,[object Object],[object Object],[object Object],[object Object],[object Object]
Cassandra ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Not enough time to go into data model...
Cassandra ,[object Object],[object Object],[object Object],[object Object],[object Object]
Experiences? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cassandra - queries ,[object Object],[object Object],[object Object]
 
Hadoop ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop: Why? (Owen O’Malley, Yahoo Inc!, omalley@apache.org) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
But also at smaller startups ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop Architecture - HDFS ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
 
 
Hadoop Architecture - MapReduce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Infamous word counting example ,[object Object],[object Object],[object Object],Otuput Mapper1:  One 1 And 1 One 1 Is 1 Output Mapper2:  Two 1 And 1 One 1 Is 1 Three 1 And 1 And 1 Is 1 Is 1 One 1 One 1 One 1 Two 1 Three 1 And 2 Is 2 One 3 Two 1 Three 1 Sorter Reducer
Important to know ,[object Object],[object Object],[object Object],[object Object]
Hadoop ,[object Object],[object Object],[object Object],[object Object]
Now, this is a bit wonky, right? ,[object Object],[object Object],[object Object]
Benchmarks, 2009 ,[object Object],Bytes Nodes Maps Reduces Replication Time 500000000000 1406 8000 2600 1 59 seconds 1000000000000 1460 8000 2700 1 62 seconds 100000000000000 3452 190000 10000 2 173 minutes 1000000000000000 3658 80000 20000 2 975 minutes
Hive
Hive ,[object Object],[object Object],[object Object],[object Object],[object Object]
Hive – Why we love it at Zemanta ,[object Object],[object Object],[object Object],[object Object],[object Object]
 
Mahout ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
General notes
Some observations ,[object Object],[object Object],[object Object],[object Object],[object Object]
Some internals, “fun” tricks ,[object Object],[object Object],[object Object],[object Object],[object Object]
Consistent hashing ,[object Object],[object Object],[object Object]
And if you don't take anything else from this presentation...
 
 
 
But there's more to it
This is the edge today ,[object Object],[object Object],[object Object],[object Object],[object Object]
Images ,[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Introduction to NuoDB
Introduction to NuoDBIntroduction to NuoDB
Introduction to NuoDB
Sandun Perera
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
Gianmario Spacagna
 

What's hot (20)

Introduction to NuoDB
Introduction to NuoDBIntroduction to NuoDB
Introduction to NuoDB
 
Databases in the Cloud
Databases in the CloudDatabases in the Cloud
Databases in the Cloud
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
 
Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
The Future of Distributed Databases
The Future of Distributed DatabasesThe Future of Distributed Databases
The Future of Distributed Databases
 
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
 
No sql
No sqlNo sql
No sql
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
Polyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherPolyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great Together
 
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 

Similar to SQL or NoSQL, that is the question!

Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
elliando dias
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
Bhupesh Bansal
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Cal Henderson
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
supertom
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
Heriyadi Janwar
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
Igor Moochnick
 

Similar to SQL or NoSQL, that is the question! (20)

Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed Database
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 
Os Krug
Os KrugOs Krug
Os Krug
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
NoSQL
NoSQLNoSQL
NoSQL
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 

More from Andraz Tori

More from Andraz Tori (10)

Ljubljana je Zakon 2013
Ljubljana je Zakon 2013Ljubljana je Zakon 2013
Ljubljana je Zakon 2013
 
Triple your blog post frequency
Triple your blog post frequencyTriple your blog post frequency
Triple your blog post frequency
 
Future of content cration
Future of content crationFuture of content cration
Future of content cration
 
Augmenting Content
Augmenting ContentAugmenting Content
Augmenting Content
 
Zemanta Tech Talk at Audible
Zemanta Tech Talk at AudibleZemanta Tech Talk at Audible
Zemanta Tech Talk at Audible
 
Quality, quantity, web and semantics
Quality, quantity, web and semanticsQuality, quantity, web and semantics
Quality, quantity, web and semantics
 
#LjubljanaJeZakon
#LjubljanaJeZakon#LjubljanaJeZakon
#LjubljanaJeZakon
 
Semantic web user interfaces - Do they have to be ugly?
Semantic web user interfaces - Do they have to be ugly?Semantic web user interfaces - Do they have to be ugly?
Semantic web user interfaces - Do they have to be ugly?
 
SemWeb install-fest presentation
SemWeb install-fest presentationSemWeb install-fest presentation
SemWeb install-fest presentation
 
Beyond who else bought what
Beyond who else bought whatBeyond who else bought what
Beyond who else bought what
 

Recently uploaded

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

SQL or NoSQL, that is the question!

Editor's Notes

  1. Kaj sploh je Silicijeva Dolina? Zakaj se to sploh sprašujemo? Mislijo politiki dobesedno? Povedal bom o pozitivnih straneh. V bistvu sem hotel povedati drugo zgodbo