Bigdata vs. Data Warehousing
     Synergy or Conflict?



          Thomas Kejser
        thomas@kejser.org
       http://blog.kejser.org
          @thomaskejser
Who is this Guy?


Thomas Kejser
http://blog.kejser.org
@thomaskejser

• Formerly: Lead SQLCAT EMEA
• Now:      CTO Fusion-io EMEA

• 15 years of database experience
• Performance Tuner
Human Consciousness Doesn’t Scale
[Chart: world population in billions of humans (axis 5 to 10) by year (2000 to 2250). Source: United Nations Projections]
Text Messages in a Table

CREATE TABLE AllTexts (
    Sender BIGINT                 -- 8 B
    , Receiver BIGINT             -- 8 B
    , SenderLocation BIGINT       -- 8 B
    , ReceiverLocation BIGINT     -- 8 B
    , Time DATETIME               -- 8 B
    , SMS VARCHAR(140)            -- up to 140 B
)
-- Row size: 5 x 8 B + 140 B = 180 bytes
How much do we text?

• World Average
    •   6.1 Trillion Text Messages / year
    •   About 80% cell phone coverage
    •   7 billion people
    •   3 messages/day/person
• But:
    • Teenagers: 50 messages/day




Source: Pew Internet Research 2010 & ITU
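The world average of 3 messages/day/person follows from the other three figures. A minimal Python sketch of that arithmetic, assuming the 80% coverage applies to the full 7 billion population and a 365-day year (the constant and function names are illustrative, not from the slides):

```python
# Back-of-envelope check of the world-average texting rate quoted on the slide.
TEXTS_PER_YEAR = 6.1e12      # 6.1 trillion text messages per year
WORLD_POPULATION = 7e9       # 7 billion people
COVERAGE = 0.80              # about 80% cell phone coverage
DAYS_PER_YEAR = 365          # assumption: non-leap year


def texts_per_person_per_day(texts_per_year, population, coverage):
    """Average messages per day for each person who has a phone."""
    phone_users = population * coverage
    return texts_per_year / phone_users / DAYS_PER_YEAR


rate = texts_per_person_per_day(TEXTS_PER_YEAR, WORLD_POPULATION, COVERAGE)
print(f"{rate:.1f} messages/day/person")  # prints "3.0 messages/day/person"
```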
How much will we EVER text?

• 9B people acting like teenagers (in 2050)
  • 50 texts/day
• That’s 450 billion texts/day
  • 164 Trillion texts/year (about 27x today)
  • 180 bytes each
  • Assume x3 compression
• Approximation: 10 Petabytes/year in
  2050
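The 10 Petabytes/year approximation can be reproduced from the bullets above. This is a sketch under stated assumptions: decimal petabytes (10^15 bytes), a 365-day year, and the 180-byte row from the AllTexts table; the variable names are illustrative.

```python
# Sketch of the 2050 estimate; inputs are the slide's figures.
PEOPLE_2050 = 9e9
TEXTS_PER_PERSON_PER_DAY = 50
BYTES_PER_TEXT = 5 * 8 + 140   # five 8-byte columns plus VARCHAR(140) = 180
COMPRESSION_RATIO = 3          # the slide's assumed x3 compression
PETABYTE = 1e15                # assumption: decimal petabytes

texts_per_day = PEOPLE_2050 * TEXTS_PER_PERSON_PER_DAY    # 450 billion
texts_per_year = texts_per_day * 365                       # about 164 trillion
raw_bytes_per_year = texts_per_year * BYTES_PER_TEXT
compressed_pb_per_year = raw_bytes_per_year / COMPRESSION_RATIO / PETABYTE

print(f"{texts_per_day:.3g} texts/day")
print(f"{texts_per_year:.3g} texts/year")
print(f"{compressed_pb_per_year:.2f} PB/year after compression")
```

Uncompressed, the same inputs give roughly 30 PB/year, so the x3 compression assumption is what brings the figure down to about 10 PB.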
Moore’s Hard Drives


[Chart: hard drive capacity in GB (log scale) by year]

Can it be done?
How Large is this/year?



Hard Disk (4TB): 2.5”
Wine Bottle (75cl): 4.0”



            About 1500 Wine Bottles
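The slide does not spell out how the bottle count is derived. One reading that lands near 1500 is to lay a year's worth of 4TB disks side by side and express that length in 4.0” bottles. The sketch below follows that interpretation, which is an assumption rather than something the deck states, and it takes the yearly volume to be the 10 PB estimated earlier:

```python
# One possible reading of the wine-bottle comparison (an assumption, see lead-in).
YEARLY_VOLUME_TB = 10_000     # 10 PB/year in decimal terabytes
DISK_CAPACITY_TB = 4
DISK_SIZE_INCHES = 2.5
BOTTLE_SIZE_INCHES = 4.0

disks = YEARLY_VOLUME_TB / DISK_CAPACITY_TB        # number of 4TB disks needed
row_length_inches = disks * DISK_SIZE_INCHES       # disks laid side by side
bottles = row_length_inches / BOTTLE_SIZE_INCHES   # same length in bottles

print(f"{disks:.0f} disks, about {bottles:.1f} bottle widths")
```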
In the Data Center

• Calculating:
  • 2U Storage = 24 Disks (includes compute)
  • 4TB per Disk
  • 100TB in 2U (a bit less)
  • 10PB = 200U storage
• About six racks
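The rack count can be sketched the same way. The slide rounds 96TB per 2U enclosure up to 100TB and does not say how many rack units fit in a rack, so `USABLE_U_PER_RACK` below is a hypothetical value (a 42U rack with some space left for switches and power), chosen only to illustrate the calculation:

```python
import math

# Data-center sizing sketch; only USABLE_U_PER_RACK is not from the slide.
TARGET_TB = 10_000            # 10 PB in decimal terabytes
DISKS_PER_ENCLOSURE = 24      # one 2U storage unit
TB_PER_DISK = 4
U_PER_ENCLOSURE = 2
USABLE_U_PER_RACK = 36        # hypothetical assumption, not stated in the deck

tb_per_enclosure = DISKS_PER_ENCLOSURE * TB_PER_DISK        # 96, "a bit less" than 100
enclosures = math.ceil(TARGET_TB / tb_per_enclosure)         # whole enclosures needed
rack_units = enclosures * U_PER_ENCLOSURE
racks = math.ceil(rack_units / USABLE_U_PER_RACK)

print(f"{tb_per_enclosure} TB per 2U, {rack_units}U total, {racks} racks")
```

With the unrounded 96TB the total is 210U rather than 200U. Filling all 42U of each rack would need five racks instead of six, so the result is sensitive to the rack assumption.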
Warehouses Serve us Well..
… And it is Becoming a Commodity

• Good Management Interfaces
• Standard SQL
  • with a few extensions
• Appliances
• Support system
• Homogeneous HW
  • In chunks
vs.
PDW vs. Hive – Scan/seek
Query 1
SELECT count(*)
FROM lineitem

Query 2
SELECT max(l_quantity)
FROM lineitem
WHERE l_orderkey > 1000
  and l_orderkey < 100000
GROUP BY l_linestatus

[Bar chart: run time in seconds (axis 0 to 1500) for Query 1 and Query 2, Hive vs. PDW]
PDW vs. Hive - Joins
SELECT max(l_orderkey)
FROM orders
JOIN lineitem
ON l_orderkey = o_orderkey

PDW-U:
• orders partitioned on o_custkey
• lineitem partitioned on l_partkey
PDW-P:
• orders partitioned on o_orderkey
• lineitem partitioned on l_orderkey

[Bar chart: run time in seconds (axis 0 to 4000) for Hive, PDW-U and PDW-P]
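The difference between the two PDW layouts comes down to whether matching rows already sit on the same node. The toy Python model below illustrates that idea only; it is not how PDW or Hive is implemented, and the node count, the modulo "hash" and the sample rows are all hypothetical.

```python
# Toy model of co-located vs. non-co-located joins (illustrative assumptions only).
NODES = 4


def node_for(key, nodes=NODES):
    """Assign a row to a node from its distribution key (modulo as a stand-in hash)."""
    return key % nodes


def rows_to_move(rows, dist_col, join_col, nodes=NODES):
    """Count rows stored on a different node than the one their join key maps to."""
    return sum(
        1 for row in rows
        if node_for(row[dist_col], nodes) != node_for(row[join_col], nodes)
    )


# Hypothetical sample rows, loosely shaped like TPC-H orders and lineitem.
orders = [{"o_orderkey": k, "o_custkey": k * 7 + 3} for k in range(1, 101)]
lineitem = [{"l_orderkey": k, "l_partkey": k * 5 + 1} for k in range(1, 101)]

# PDW-P style: both tables are distributed on the join key, so nothing moves.
moved_p = (rows_to_move(orders, "o_orderkey", "o_orderkey")
           + rows_to_move(lineitem, "l_orderkey", "l_orderkey"))

# PDW-U style: tables are distributed on other columns, so rows must be
# redistributed before matching order keys can meet on one node.
moved_u = (rows_to_move(orders, "o_custkey", "o_orderkey")
           + rows_to_move(lineitem, "l_partkey", "l_orderkey"))

print(f"rows moved, co-located layout: {moved_p}")
print(f"rows moved, other layout:      {moved_u}")
```

With uniformly random keys, roughly (NODES - 1) / NODES of the rows would need to move in the second layout; the contrived sample data here moves more than that.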
What does Big Data need to Catch up?

• Thread startup times
• Co-location awareness
• Files vs. optimized DB memory
  structures
• Column stores and other DB tech

Generic is good…
… but when there is structure, make use of it!
What is Bigdata?
Very Unstructured Data
How many Pictures of Cats?

• Flickr Today:
  • 300MB/month
  • 2GB/year
  • 51M users (too small?)


• Estimate: 102 PB / year

• 10 x text messages


Source: Wikipedia
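The 102 PB estimate is the per-user yearly volume multiplied by the user count. A short sketch, assuming decimal units (1 PB = 10^6 GB) and reusing the 10 PB/year text-message estimate from the earlier slide for the comparison:

```python
# Flickr volume estimate from the slide's figures (decimal units assumed).
GB_PER_USER_PER_YEAR = 2
USERS = 51e6
GB_PER_PB = 1e6
TEXT_PB_PER_YEAR = 10         # the deck's 2050 text-message estimate

photo_pb_per_year = GB_PER_USER_PER_YEAR * USERS / GB_PER_PB
ratio_to_texts = photo_pb_per_year / TEXT_PB_PER_YEAR

print(f"{photo_pb_per_year:.0f} PB/year, {ratio_to_texts:.1f}x text messages")
```

Note that 300MB/month would add up to 3.6GB/year, so the 2GB/year figure appears to be a separate, lower assumption about actual usage.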
How big is this in wine bottles?
We have learned how to store it!
What is HDFS?

• Distributed File System
• Open Source
• No more SAN
• The Failure Unit is the Server
Fully unstructured data is
          boring


…Unless you get money for
        storing it
Acquiring Personal Information




Your Semi-structured Data, the Old Fashioned Way
The Social Angle

Who do you talk to and how often?
The Reasons

Why do you own a cell phone?
Saturday, 1:39am   - at The Pub




Your Semi-structured Data, For Free
Big Value

Extraction of meaning and insight
from semi-structured data
Extracting Meaning from Humans

Method                             Examples
Turn semi-structure to structure   Image recognition, network proximity and super nodes, social media
Needle in a haystack               Extract outliers, fraud
Herd behaviors                     Clustering, pattern recognition, “Customers who bought this also bought”
Text classification and search     Text indexes, syntactic counting, PageRank
Text to structure                  Semantic analysis, loose structure into structure
Find New Customers



[Diagram: social graph with nodes Tommy, Thomas and Michael]

“Michael, who is respected among his peers, often talks about his new, cool gadgets”
Cross Sell




 “Families who own an Aston Martin will often buy a
                 Mini Cooper too”
Free Information
Need: Lots of CPU Cores!
Need: Data Centers!
Provisioning has to be REALLY fast
Things to Learn for the Future

• Get good at
  • Statistics (again)
  • Distributed Algorithms
  • Tuning
• Understand Physical Constraints
• Acquire deep domain knowledge
Something is Changing


      Today                             Tomorrow




     CAPEX Hardware     OPEX Hardware       You
The Mother of All Stovepipes
Big Data / Staging (No Model)

[Diagram labels: "Data you are afraid to lose", "Data you actually need", "Delivery (Model)"]
Synergy




[Diagram labels: "Create Structure for me", "Here is a table", "Warehouse"]
Applying Social Media to Structure
Summary

Data Warehouse                     Big Data

• There is a model                 • Don’t bother modeling!
• Seek Co-location                 • Optional Co-Location
• Respond in seconds               • Respond in minutes
• Calculate first, query after     • Calculate while querying
• Expensive HW                     • Cheap HW
• Optimise for target HW           • Good enough on all HW
• Homogeneous HW                   • Heterogeneous HW
• Pay vendor, expect optimised     • Free license, optimise yourself
&

Editor's Notes

  1. We are at the end of the growth curve: 9B is our total population. This is an important observation because many data estimates are based on human activity and have so far assumed exponential growth. This is NOT the case anymore!
  2. This shows the development of hard drive capacity over time.
  3. The calculation is not meant to be read; it just lets people know we did the calc and what it PHYSICALLY means (see the animation). There is a real cost to storing a lot of data, and this is one of the reasons cloud makes a lot of sense. Wine bottles.
  4. This is Hyde Park, from one end to the other.