Performance of Fractal-Tree Databases
Michael A. Bender
The Problem
    Problem: maintain a dynamic dictionary on disk.
    Motivation: file systems, databases, etc.
    State of the art (algorithmic perspective):
     •   B-tree [Bayer, McCreight 72]
     •   cache-oblivious B-tree [Bender, Demaine, Farach-Colton 00]
     •   buffer tree [Arge 95]
     •   buffered-repository tree [Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook 00]
     •   B^ε-tree [Brodal, Fagerberg 03]
     •   log-structured merge tree [O'Neil, Cheng, Gawlick, O'Neil 96]
     •   string B-tree [Ferragina, Grossi 99]
     •   etc, etc!
    State of the practice:
     • B-trees + industrial-strength features/optimizations


B-trees are Fast at Sequential Inserts
    Sequential inserts in B-trees have near-optimal
    data locality




     [Figure: B-tree. Labels: "These B-tree nodes reside in memory"; "Insertions are into this leaf node".]


      • One disk I/O per leaf (which contains many inserts).
      • Sequential disk I/O.
      • Performance is disk-bandwidth limited.


B-Trees Are Slow at Ad Hoc Inserts
    High entropy inserts (e.g., random) in B-trees
    have poor data locality
     [Figure: B-tree. Label: "These B-tree nodes reside in memory".]




     • Most nodes are not in main memory.
     • Most insertions require a random disk I/O.
     • Performance is disk-seek limited.
     • ≤ 100 inserts/sec/disk (≤ 0.05% of disk bandwidth).
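
The last bullet can be sanity-checked with back-of-envelope arithmetic. The sketch below is only illustrative: the 10 ms seek time, the 128-byte row size, and the 100 MB/s sequential bandwidth are assumptions, not figures stated on this slide.

```python
# Back-of-envelope check of the seek-limited insert rate.
# All three constants are assumptions chosen for illustration.
SEEK_TIME_S = 0.010          # assumed: one random disk I/O takes about 10 ms
ROW_BYTES = 128              # assumed row size
DISK_BANDWIDTH_BPS = 100e6   # assumed sequential bandwidth: 100 MB/s


def seek_limited_inserts_per_sec(seek_time_s):
    """One random I/O per insert, so the rate is 1 / seek time."""
    return 1.0 / seek_time_s


def bandwidth_fraction(inserts_per_sec, row_bytes, bandwidth_bps):
    """Fraction of sequential bandwidth that carries newly inserted rows."""
    return inserts_per_sec * row_bytes / bandwidth_bps


rate = seek_limited_inserts_per_sec(SEEK_TIME_S)
fraction = bandwidth_fraction(rate, ROW_BYTES, DISK_BANDWIDTH_BPS)
print(f"{rate:.0f} inserts/sec, {fraction:.4%} of disk bandwidth")
```

Under these assumptions the rate is about 100 inserts/sec and the useful fraction of bandwidth is well under 0.05%, which matches the order of magnitude on the slide.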


B-trees Have a Similar Story for Range Queries



     [Figure: B-tree. Label: "Leaf nodes are scattered across disk in aged B-tree."]




    Range queries in newly built B-trees have good
    locality
    Range queries in aged B-trees have poor locality
     • Leaf blocks are scattered across disk.
     • For page-sized nodes, as low as 1% of disk bandwidth.
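
A rough model shows where a figure of this size can come from: if every leaf costs one seek plus its transfer time, only the transfer time is useful work. The seek time, bandwidth, and leaf sizes below are assumptions for illustration, not measurements from the talk.

```python
# Effective bandwidth of a range scan that pays one seek per leaf.
# Assumed for illustration: 10 ms seek, 100 MB/s sequential bandwidth.
def scan_bandwidth_fraction(leaf_bytes, seek_time_s=0.010, bandwidth_bps=100e6):
    """Fraction of the disk's sequential bandwidth achieved by the scan."""
    transfer_time_s = leaf_bytes / bandwidth_bps
    return transfer_time_s / (seek_time_s + transfer_time_s)


aged_16k = scan_bandwidth_fraction(16 * 1024)   # assumed 16 KiB leaves
aged_4k = scan_bandwidth_fraction(4 * 1024)     # assumed 4 KiB leaves
print(f"16 KiB leaves: {aged_16k:.1%}, 4 KiB leaves: {aged_4k:.1%}")
```

With these assumed numbers the scan reaches between roughly half a percent and two percent of the disk's bandwidth, the same order as the slide's 1%.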

Results
     Cache-Oblivious Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson 07]
       • Replacement for Traditional B-tree
       • High entropy inserts/deletes run up to 100x faster
       • No aging --> always fast range queries
       • Streaming B-tree is cache-oblivious
        ‣   Good data locality without memory-specific parameterization.




Results (cont)
     Fractal Tree™ database
      • TokuDB is a storage engine for MySQL
       ‣   A storage engine is a structure that stores on-disk data.
       ‣   Traditionally a storage engine is a B-tree.
      • MySQL is an open-source database
       ‣   Most installations of any database
      • Built in context of our startup Tokutek.
     Performance
      • 10x-100x faster index inserts
      • No aging
      • Faster queries in important cases
     [Figure: software stack, top to bottom: Application Layer; Database (SQL Processing, Query Optimization…); TokuDB for MySQL; File System.]


Creative Fundraising for Startup




Algorithmic Performance Model

Minimize # of block transfers per operation

Disk-Access Machine (DAM) [Aggarwal, Vitter 88]
  • Two levels of memory.
  • Two parameters: block-size B, memory-size M.

Cache-Oblivious Model (CO) [Frigo, Leiserson, Prokop, Ramachandran 99]
  • Parameters B and M are unknown to the algorithm or coder.
  • (Of course, used in proofs.)

[Figure: Memory and Disk, with blocks of size B transferred between them.]
Fractal Tree Inserts (and Deletes)

                 B-tree                                                   Streaming B-tree
     Insert      $O(\log_B N) = O\left(\frac{\log N}{\log B}\right)$      $O\left(\frac{\log N}{B}\right)$
     Example: N=1 billion, B=4096
      • 1 billion 128-byte rows (128 gigabytes)
        ‣ $\log_2$(1 billion) ≈ 30
      • Half-megabyte blocks that hold 4096 rows each
        ‣ $\log_2 4096 = 12$
      • B-trees require $\frac{\log N}{\log B} = 30/12 \approx 3$ disk seeks (modulo caching, insertion pattern)
      • Streaming B-trees require $\frac{\log N}{B} = 30/4096 \approx 0.007$ disk seeks
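
The arithmetic in this example can be reproduced directly. The sketch below simply evaluates the two cost expressions for N = 1 billion and B = 4096; the variable names are illustrative, and constant factors hidden by the O-notation are ignored.

```python
import math

N = 1_000_000_000   # rows, from the example
B = 4096            # rows per block, from the example

log_n = math.log2(N)             # about 29.9, rounded to 30 on the slide
log_b = math.log2(B)             # exactly 12

btree_seeks = log_n / log_b      # O(log N / log B) per insert
streaming_seeks = log_n / B      # O((log N) / B) per insert, amortized

print(f"B-tree: {btree_seeks:.2f} seeks per insert")
print(f"Streaming B-tree: {streaming_seeks:.4f} seeks per insert")
print(f"Ratio: {btree_seeks / streaming_seeks:.0f}x")
```

The ratio between the two expressions is B / log B, which is about 341 for B = 4096.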



Inserts into Prototype Fractal Tree
     Random Inserts into Fractal Tree (“streaming B-tree”) and B-tree (Berkeley DB)
     [Figure: plot comparing Fractal Tree and B-tree.]




Searches in Prototype Fractal Tree
     Point searches ~3.5x slower ($N = 2^{30}$)
      • Searches/sec improves as more of the data structure fits in cache.
     [Figure: searches/sec plot; legend: Fractal Tree, S B-tree, B-tree.]




Asymmetry Between Inserts and Key Searches

     Small specification changes affect complexity
     E.g., duplicate keys
      • Slow: Return an error when a duplicate key is inserted
       ‣   Hidden search

      • Fast: Overwrite duplicates or maintain all versions
       ‣   No hidden search


     E.g., deletes
      • Slow: Return number of elements deleted
       ‣   Hidden search

      • Fast: Delete without feedback
       ‣   No hidden search
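
The "hidden search" distinction can be made concrete with a toy model. The class below is a hypothetical sketch, not TokuDB's API: writes are appended as messages without looking at existing data, and a counter records every point query, including the ones hidden inside an operation whose specification needs feedback.

```python
class BufferedDict:
    """Toy write-optimized dictionary (illustrative only)."""

    def __init__(self):
        self.messages = []   # pending (op, key, value) messages, newest last
        self.searches = 0    # number of point queries performed so far

    def get(self, key):
        """Point query: scan messages from newest to oldest."""
        self.searches += 1
        for op, k, value in reversed(self.messages):
            if k == key:
                return value if op == "put" else None
        return None

    def upsert(self, key, value):
        """Fast: overwrite semantics, no need to read existing data."""
        self.messages.append(("put", key, value))

    def insert_unique(self, key, value):
        """Slow: reporting duplicates forces a hidden search."""
        if self.get(key) is not None:
            raise KeyError(f"duplicate key: {key!r}")
        self.messages.append(("put", key, value))

    def delete_blind(self, key):
        """Fast: delete without feedback."""
        self.messages.append(("del", key, None))

    def delete_counted(self, key):
        """Slow: returning the number of deleted elements forces a hidden search."""
        found = self.get(key) is not None
        self.messages.append(("del", key, None))
        return 1 if found else 0


d = BufferedDict()
d.upsert("a", 1)
d.upsert("a", 2)
d.delete_blind("b")
searches_after_fast_ops = d.searches      # no searches so far
d.insert_unique("c", 3)                   # one hidden search
deleted = d.delete_counted("a")           # another hidden search
print(searches_after_fast_ops, d.searches, deleted)
```

In this model the fast variants never touch existing data, while each slow variant pays for one point query per call.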

     Next slide: extra difficulty of key searches
Extra Difficulty of Key Searches
Asymmetry Between Inserts and Key Searches

     Inserts/point query asymmetry has impact on
      • System design. How to redesign standard mechanisms
        (e.g., concurrency-control mechanism).
      • System use. How to take advantage of faster inserts
        (e.g., to enable faster queries).




Overview

     External-memory dictionaries
     Performance limitations of B-trees
     Fractal-Tree data structure (Streaming B-tree)
     Insert/point-query asymmetry
     Impact of insert/point-query asymmetry on database use
     How to build a streaming B-tree
     Impact of insert/point-query asymmetry on system design
     Scaling into the future
Insert/point-query asymmetry affecting database use
How B-trees Are Used in Databases

   Select via Index:        select d where 270 ≤ a ≤ 538
   Select via Table Scan:   select d where 270 ≤ e ≤ 538

   [Figure: in both cases the table is a B-tree with key a and value (b, c, d, e).]

   Data maintained in rows and stored in B-trees.
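
The difference between the two selects can be sketched with a sorted list standing in for the B-tree. Everything here is hypothetical: the table contents are synthetic, and the "rows touched" counts are a stand-in for I/O cost.

```python
import bisect

# Hypothetical table: rows (a, b, c, d, e), stored in key order on a.
rows = [(a, (a * 7) % 1000, a % 10, 2 * a, (a * 37) % 1000) for a in range(1000)]
keys = [row[0] for row in rows]


def select_d_via_index(lo, hi):
    """select d where lo <= a <= hi: binary search on the key, then a short scan."""
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return [row[3] for row in rows[start:stop]], stop - start


def select_d_via_scan(lo, hi):
    """select d where lo <= e <= hi: e is not the key, so every row is examined."""
    return [row[3] for row in rows if lo <= row[4] <= hi], len(rows)


by_index, touched_index = select_d_via_index(270, 538)
by_scan, touched_scan = select_d_via_scan(270, 538)
print(len(by_index), touched_index, len(by_scan), touched_scan)
```

Both queries return the same number of rows in this synthetic table, but the table scan examines all 1000 rows while the index select examines only the matching range.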
How B-trees Are Used in Databases (Cont.)

     Selecting via an index can be slow, if it is
     coupled with point queries.

     select d where 270 ≤ b ≤ 538
     [Figure: main table with key a and value (b, c, d, e); index with key b and value a; index with key c and value a.]




How B-trees Are Used in Databases (Cont.)
     Covering index can speed up selects
       • Key contains all columns necessary to answer query.


     select d where 270 ≤ b ≤ 538
     [Figure: main table with key a and value (b, c, d, e); covering index with key (b, d) and value a; index with key c and value a.]
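
A small sketch of why the covering index helps, using sorted lists as stand-ins for B-trees. The data and helper names are hypothetical; the point is only to count how many point queries into the main table each plan needs.

```python
import bisect

# Hypothetical main table: key a -> value (b, c, d, e).
main_table = {a: ((a * 37) % 1000, a % 10, 2 * a, a % 3) for a in range(1000)}

# Secondary index on b stores (b, a); covering index stores (b, d, a).
index_b = sorted((v[0], a) for a, v in main_table.items())
covering_bd = sorted((v[0], v[2], a) for a, v in main_table.items())


def key_range(index, lo, hi):
    """Slice bounds for entries whose first component lies in [lo, hi]."""
    return bisect.bisect_left(index, (lo,)), bisect.bisect_left(index, (hi + 1,))


def select_d_with_lookups(lo, hi):
    """Plain index: each match needs a point query into the main table to fetch d."""
    start, stop = key_range(index_b, lo, hi)
    result, point_queries = [], 0
    for _b, a in index_b[start:stop]:
        result.append(main_table[a][2])
        point_queries += 1
    return result, point_queries


def select_d_covering(lo, hi):
    """Covering index: d is stored in the index key, so no point queries are needed."""
    start, stop = key_range(covering_bd, lo, hi)
    return [d for _b, d, _a in covering_bd[start:stop]], 0


with_lookups, lookups = select_d_with_lookups(270, 538)
covered, covered_lookups = select_d_covering(270, 538)
print(lookups, covered_lookups, with_lookups == covered)
```

The price of the covering index is one more index to keep up to date on every insert, which is where insert performance enters the picture.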




Insertion Pain Can Masquerade as Query Pain

     People often don’t use these indexes.
     They use simplistic schema.
      • Sequential inserts via autoincrement key
      • Few indexes, few covering indexes
     [Figure: table with autoincrement key t (effectively a timestamp) and value (a, b, c, d, e).]

     Then insertions are fast but queries are slow.
     Adding sophisticated indexes helps queries
      • B-trees cannot afford to maintain them. Fractal Trees can.
How to Build a Fractal Tree and How it Performs
Simplified (Cache-Oblivious) Fractal Tree



     [Figure: sorted arrays of sizes $2^0$, $2^1$, $2^2$, $2^3$, ...]

          $O((\log N)/B)$ insert cost & $O(\log^2 N)$ search cost
           • Sorted arrays of exponentially increasing size.
           • Arrays are completely full or completely empty
             (depends on the bit representation of # of elements).
           • Insert into the smallest array.
             Merge arrays to make room.
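
The bullets above describe a structure that behaves like a binary counter: inserting is "adding one", and a carry is a merge of two full arrays. Below is a minimal in-memory sketch of that simplified structure. The class and method names are made up for illustration, it stores bare keys, and it has none of the engineering listed later in the talk.

```python
import bisect
from heapq import merge


class SimplifiedFractalTree:
    """Sorted arrays of sizes 1, 2, 4, ...; each is completely full or completely empty."""

    def __init__(self):
        self.levels = []  # levels[k] is [] or a sorted list of exactly 2**k keys

    def insert(self, key):
        carry = [key]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry  # an empty array of the right size: done
                return
            # Level k is full: merge it with the carry and move one level down.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # One binary search per level, hence O(log^2 N) comparisons overall.
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False


tree = SimplifiedFractalTree()
for key in [5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 10]:   # 11 keys, 11 = 0b1011
    tree.insert(key)
sizes = [len(level) for level in tree.levels]
print(sizes)
```

After 11 inserts the occupied arrays are exactly those at the positions of the 1 bits of 11, which is the "bit representation" remark on the slide.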


Simplified (Cache-Oblivious) Fractal Tree (Cont.)




Analysis of Simplified Fractal Tree


     Insert Cost:
      • cost to flush buffer of size X = $O(X/B)$
      • cost per element to flush buffer = $O(1/B)$
      • max # of times each element is flushed = $\log N$
      • insert cost = $O((\log N)/B)$ amortized memory transfers

     Search Cost
      • Binary search at each level
      • $\log(N/B) + (\log(N/B) - 1) + (\log(N/B) - 2) + \dots + 2 + 1 = O(\log^2(N/B))$
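
Written out, the two bounds follow from the bullets in one step each. The shorthand $L = \log(N/B)$ is introduced here for convenience and is not notation from the slides.

```latex
% Insert: each element is flushed at most \log N times, at O(1/B) transfers per flush.
\[
  \text{insert cost} \;=\; \log N \cdot O\!\left(\frac{1}{B}\right)
  \;=\; O\!\left(\frac{\log N}{B}\right) \quad \text{(amortized)}
\]

% Search: one binary search per level; with L = \log(N/B) the costs form an arithmetic series.
\[
  \text{search cost} \;=\; \sum_{i=1}^{L} i \;=\; \frac{L(L+1)}{2}
  \;=\; O\!\left(\log^2 \frac{N}{B}\right), \qquad L = \log\frac{N}{B}
\]
```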

Idea of Faster Key Searches in Fractal Tree




     O(log (N/B)) search cost
      • Some redundancy of elements between levels
      • Arrays can be partially full
      • Horizontal and vertical pointers to redundant elements
      • (Fractional Cascading)

Why The Previous Data Structure is a Simplification


      • Need concurrency-control mechanisms
      • Need crash safety
      • Need transactions, logging+recovery
      • Need better search cost
      • Need to store variable-size elements
      • Need better amortization
      • Need to be good for random and sequential inserts
      • Need to support multithreading.
      • Need compression


iiBench Insertion Benchmark
     [Figure: iiBench - 1B Row Insert Test. Rows/Second (0 to 50,000) versus Rows Inserted (0 to 1,000,000,000) for InnoDB and TokuDB.]



     Fractal Trees scale with disk bandwidth not seek time.
       • In fact, now we are compute bound, so cannot yet take full advantage of more
         cores or disks. (This will change.)
iiBench Deletions

     [Figure: iiBench - 500M Row Insert/Delete Test. Rows/Second (0 to 40,000) versus Rows Inserted (0 to 500,000,000) for TokuDB and InnoDB. Region labels: "Insertions only here"; "Insertions + deletions here".]




Insert/point-query asymmetry when building Fractal-Tree Database




Building TokuDB Storage Engine for MySQL
     Engineering to do list
      • Need concurrency-control mechanisms
      • Need crash safety
      • Need transactions, logging+recovery
      • Need better search cost
      • Need to store variable-size elements
      • Need better amortization
      • Need to be good for random and sequential inserts
      • Need to support multithreading.
      • Need compression


Concurrency Control for Transactions
     [Figure: transactions A, B, C, D, and E.]

     Transactions
      • Sequence of durable operations.
      • Happen atomically.
     Atomicity in TokuDB via pessimistic locking
      • readers lock: A and B can both read row x of database.
      • writers lock: if A writes to row x, B cannot read x until A
        completes.
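
The readers/writers rule in the two bullets can be sketched as a tiny lock table. This is a hypothetical, single-threaded illustration and not TokuDB's lock manager: instead of blocking, a request that would have to wait simply returns False, and all locks are released when the transaction completes.

```python
class RowLockTable:
    """Toy pessimistic row locks: many readers or one writer per row."""

    def __init__(self):
        self.readers = {}  # row -> set of transactions holding a reader lock
        self.writer = {}   # row -> transaction holding the writer lock

    def try_read(self, txn, row):
        owner = self.writer.get(row)
        if owner is not None and owner != txn:
            return False   # another transaction is writing this row
        self.readers.setdefault(row, set()).add(txn)
        return True

    def try_write(self, txn, row):
        owner = self.writer.get(row)
        if owner is not None and owner != txn:
            return False   # another transaction is writing this row
        if self.readers.get(row, set()) - {txn}:
            return False   # another transaction is reading this row
        self.writer[row] = txn
        return True

    def release_all(self, txn):
        """Called when txn completes (commit or abort)."""
        for holders in self.readers.values():
            holders.discard(txn)
        for row in [r for r, owner in self.writer.items() if owner == txn]:
            del self.writer[row]


locks = RowLockTable()
both_read = locks.try_read("A", "x") and locks.try_read("B", "x")
a_writes_y = locks.try_write("A", "y")
b_blocked = not locks.try_read("B", "y")
locks.release_all("A")
b_reads_after = locks.try_read("B", "y")
print(both_read, a_writes_y, b_blocked, b_reads_after)
```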


Concurrency Control for Transactions (cont)

     B-tree implementation: maintain locks in leaves
      •   Insert row t
      •   Search for row u
      •   Search for row v and put a cursor
      •   Increment cursor. Now cursor points to row w.



     [Figure: leaf rows t, u, v, w, with a writer lock on t, a reader lock on u, and a reader range lock covering v and w.]


     Doesn’t work for Fractal Trees: maintaining locks
     involves implicit searches on writes.

Scaling Fractal Trees into the Future
iiBench on SSD
     [Figure: Insertion Rate (0 to 35,000) versus Cumulative Insertions (0 to 1.5e+08). TokuDB and InnoDB are each shown on FusionIO, X25-E, and RAID10.]


     B-trees are slow on SSDs, probably because they waste bandwidth.
      • When inserting one row, a whole block (much larger) is written.


B-tree Inserts Are Slow on SSDs
     Inserting an element of size x into a B-tree dirties a
     leaf block of size B.




     [Figure: an element of size x inside a leaf block of size B.]
     We can write keys of size x into a B-tree using at
     most an O(x/B) fraction of disk bandwidth.
     Fractal trees do efficient inserts on SSDs because
     they transform random I/O into sequential I/O.
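
The x/B bound is easy to see with numbers. The sketch below compares bytes sent to the device when every insert dirties its own leaf block against the idealized case where inserts are packed into full blocks. The 128-byte element and 16 KiB block are assumptions, and the packed case ignores the rewriting a real write-optimized structure does during merges.

```python
def useful_write_fraction(x_bytes, block_bytes):
    """Fraction of written bytes that are new data when each insert dirties one block."""
    return x_bytes / block_bytes


def bytes_written(num_inserts, x_bytes, block_bytes, packed):
    """Bytes sent to the device: one block per insert, or inserts packed into full blocks."""
    if packed:
        total = num_inserts * x_bytes
        blocks = (total + block_bytes - 1) // block_bytes  # ceiling division
        return blocks * block_bytes
    return num_inserts * block_bytes


X = 128            # assumed element size in bytes
BLOCK = 16 * 1024  # assumed leaf block size in bytes

fraction = useful_write_fraction(X, BLOCK)
random_bytes = bytes_written(1000, X, BLOCK, packed=False)
packed_bytes = bytes_written(1000, X, BLOCK, packed=True)
print(fraction, random_bytes, packed_bytes)
```

With these assumed sizes, random inserts write 125 times more bytes than the packed case for the same 1000 elements.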

Disk Hardware Trends
     Disk capacity will continue to grow quickly

                  Year            Capacity           Bandwidth
                  2008               2 TB             100MB/s
                  2012              4.5 TB            150MB/s
                  2017              67 TB             500MB/s

     but seek times will change slowly.

      • Bandwidth scales as square root of capacity.
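
The square-root rule can be checked against the table. The sketch below takes the 2008 row as the reference point and predicts the other two bandwidths from capacity alone; the function name and the choice of reference row are illustrative.

```python
import math

# (year, capacity in TB, bandwidth in MB/s), copied from the table.
TABLE = [(2008, 2.0, 100.0), (2012, 4.5, 150.0), (2017, 67.0, 500.0)]


def predicted_bandwidth(capacity_tb, base_capacity_tb=2.0, base_bandwidth=100.0):
    """Bandwidth if it grows as the square root of capacity from the 2008 row."""
    return base_bandwidth * math.sqrt(capacity_tb / base_capacity_tb)


for year, capacity, bandwidth in TABLE:
    print(year, capacity, bandwidth, round(predicted_bandwidth(capacity)))
```

The 2012 row matches the rule exactly, and the 2017 prediction lands somewhat above the table's 500 MB/s, so the rule is a rough trend rather than an exact fit.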


     Source: http://blocksandfiles.com/article/4501


Fractal Trees Enable Compact Systems
     B-trees require capacity, bandwidth, and
     random I/O
      • B-tree based systems achieve large random I/O rates by
        using more spindles and lower capacity disks.




     Fractal Trees require only capacity & bandwidth
      • Fractal Trees enable the use of high-capacity disks.




Fractal Trees Enable Big Disks
     B-trees require capacity, bandwidth, and seeks.
     Fractal trees require only capacity and bandwidth.
     Today, for a 50TB database,
      • Fractal tree with 25 2TB disks gives 500K ins/s.
      • B-tree with 25 2TB disks gives 2.5K ins/s.
      • B-tree with 500 100GB disks gives 50K ins/s but costs $, racks, and
        power.
     In 2017, for a 1500TB database:
      • Fractal tree with 25 67TB disks gives 2500K ins/s.
      • B-tree with 25 67TB disks gives 2.5K ins/s.
     B-trees need spindles, and spindle density increases
     slowly.
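
The numbers on this slide follow from two simple scaling rules, sketched below. The B-tree rate assumes about 100 random I/Os per second per disk, as on the earlier slide about ad hoc inserts. The Fractal Tree rate is scaled linearly with aggregate bandwidth from the slide's own 500K ins/s reference point, using the 100 MB/s and 500 MB/s per-disk figures from the hardware-trends table; that linear scaling is an assumption of this sketch.

```python
SEEKS_PER_DISK_PER_SEC = 100  # assumed: one random I/O per insert, ~100 seeks/s per disk


def btree_inserts_per_sec(num_disks):
    """Seek-bound: the rate depends only on the number of spindles."""
    return num_disks * SEEKS_PER_DISK_PER_SEC


def fractal_tree_inserts_per_sec(num_disks, bandwidth_mb_s,
                                 ref_rate=500_000, ref_disks=25, ref_bandwidth_mb_s=100):
    """Bandwidth-bound: scale the reference rate with aggregate bandwidth."""
    return ref_rate * (num_disks * bandwidth_mb_s) / (ref_disks * ref_bandwidth_mb_s)


print(btree_inserts_per_sec(25))              # 25 large disks
print(btree_inserts_per_sec(500))             # 500 small disks
print(fractal_tree_inserts_per_sec(25, 100))  # today
print(fractal_tree_inserts_per_sec(25, 500))  # 2017 projection
```

Because the B-tree rate ignores capacity and bandwidth entirely, swapping 2 TB disks for 67 TB disks leaves it at 2.5K ins/s.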


Using Big Disks Also Saves Energy
     Power consumption of disks
      • Enterprise 80 to 160 GB disk runs at 4W (idle power).
      • Enterprise 1-2 TB disk runs at 8W (idle power).
     Data centers/server farms use 80-160 GB disks
      • Use many small-capacity disks, not large ones.
     Using large disks may save a factor of >10 in storage costs
      • Other considerations modify this factor
       ‣   e.g., CPUs necessary to drive disks, scale-out infrastructure, cooling, etc.
       ‣   Metric: e.g., Watts/MB versus Inserts/Joule




                                                 Michael Bender -- Performance of Fractal-Tree Databases
2
53

More Related Content

What's hot

Advanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdfAdvanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdfFederico Razzoli
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQLMark Wong
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Firebird's Big Databases (in English)
Firebird's Big Databases (in English)Firebird's Big Databases (in English)
Firebird's Big Databases (in English)Alexey Kovyazin
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationOri Reshef
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Databricks
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?Mydbops
 

What's hot (20)

Advanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdfAdvanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdf
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Performance of fractal tree databases

  • 1. Performance of Fractal-Tree Databases Michael A. Bender
  • 4. The Problem
      Problem: maintain a dynamic dictionary on disk.
      Motivation: file systems, databases, etc.
      State of the art (algorithmic perspective):
        • B-tree [Bayer, McCreight 72]
        • cache-oblivious B-tree [Bender, Demaine, Farach-Colton 00]
        • buffer tree [Arge 95]
        • buffered-repository tree [Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook 00]
        • Bε tree [Brodal, Fagerberg 03]
        • log-structured merge tree [O'Neil, Cheng, Gawlick, O'Neil 96]
        • string B-tree [Ferragina, Grossi 99]
        • etc, etc!
      State of the practice:
        • B-trees + industrial-strength features/optimizations
      Michael Bender -- Performance of Fractal-Tree Databases
  • 6. B-trees Are Fast at Sequential Inserts
      Sequential inserts in B-trees have near-optimal data locality.
      [Figure labels: "These B-tree nodes reside in memory"; "Insertions are into this leaf node"]
        • One disk I/O per leaf (which contains many inserts).
        • Sequential disk I/O.
        • Performance is disk-bandwidth limited.
  • 7. B-trees Are Slow at Ad Hoc Inserts
      High-entropy inserts (e.g., random) in B-trees have poor data locality.
      [Figure label: "These B-tree nodes reside in memory"]
        • Most nodes are not in main memory.
        • Most insertions require a random disk I/O.
        • Performance is disk-seek limited.
        • ≤ 100 inserts/sec/disk (≤ 0.05% of disk bandwidth).
  • 8. B-trees Have a Similar Story for Range Queries
      Range queries in newly built B-trees have good locality.
      Range queries in aged B-trees have poor locality.
        • Leaf blocks are scattered across disk.
        • For page-sized nodes, as low as 1% disk bandwidth.
      [Figure caption: Leaf nodes are scattered across disk in aged B-tree.]
  • 10. Results
      Cache-Oblivious Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson 07]
        • Replacement for traditional B-tree
        • High-entropy inserts/deletes run up to 100x faster
        • No aging --> always fast range queries
        • Streaming B-tree is cache-oblivious
            ‣ Good data locality without memory-specific parameterization.
  • 11. Results (cont)
      Fractal Tree™ database
        • TokuDB is a storage engine for MySQL
            ‣ A storage engine is a structure that stores on-disk data.
            ‣ Traditionally a storage engine is a B-tree.
        • MySQL is an open-source database
            ‣ Most installations of any database
        • Built in context of our startup Tokutek.
      Performance
        • 10x-100x faster index inserts
        • No aging
        • Faster queries in important cases
      [Figure: software stack, top to bottom: Application Layer; Database (SQL Processing, Query Optimization…); TokuDB for MySQL; File System]
  • 12. Creative Fundraising for Startup
  • 14. Algorithmic Performance Model
      Minimize # of block transfers per operation.
      Disk-Access Machine (DAM) [Aggarwal, Vitter 88]
        • Two levels of memory.
        • Two parameters: block-size B, memory-size M.
      Cache-Oblivious Model (CO) [Frigo, Leiserson, Prokop, Ramachandran 99]
        • Parameters B and M are unknown to the algorithm or coder.
        • (Of course, used in proofs.)
      [Figure: Memory and Disk exchanging blocks of size B; in the cache-oblivious model, B = ?]
  • 16. Fractal Tree Inserts (and Deletes)
                  B-tree                             Streaming B-tree
        Insert    O(log_B N) = O((log N)/(log B))    O((log N)/B)
      Example: N = 1 billion, B = 4096
        • 1 billion 128-byte rows (128 gigabytes)
            ‣ log2 (1 billion) = 30
        • Half-megabyte blocks that hold 4096 rows each
            ‣ log2 (4096) = 12
        • B-trees require (log N)/(log B) = 30/12 ≈ 3 disk seeks (modulo caching, insertion pattern)
        • Streaming B-trees require (log N)/B = 30/4096 = 0.007 disk seeks
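The arithmetic in the example above can be reproduced with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope calculation: the variable names are illustrative, and the constants hidden inside the O() bounds are taken to be 1, which the slide does implicitly.

```python
import math

# Parameters taken from the slide's example.
N = 1_000_000_000      # number of rows
B = 4096               # rows per block
ROW_BYTES = 128        # bytes per row

data_bytes = N * ROW_BYTES        # total data size in bytes
block_bytes = B * ROW_BYTES       # size of one block in bytes

log_n = math.log2(N)              # about 30
log_b = math.log2(B)              # exactly 12

# Per-insert cost estimates, ignoring constant factors and caching.
btree_seeks = log_n / log_b       # O(log_B N): about 2.5, rounded up to 3 on the slide
streaming_seeks = log_n / B       # O((log N)/B): about 0.007

print(f"data size:        {data_bytes / 1e9:.0f} GB")
print(f"block size:       {block_bytes / 2**20:.1f} MiB")
print(f"B-tree:           {btree_seeks:.2f} seeks per insert")
print(f"streaming B-tree: {streaming_seeks:.4f} seeks per insert")
```

The ratio of the two estimates is B/log B, which is where the orders-of-magnitude gap between the two structures comes from in this model.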
  • 17. Inserts into Prototype Fractal Tree
      Random inserts into Fractal Tree (“streaming B-tree”) and B-tree (Berkeley DB)
      [Figure: insertion-performance plot; series: Fractal Tree, B-tree]
  • 18. Searches in Prototype Fractal Tree
      Point searches ~3.5x slower (N = 2^30)
        • Searches/sec improves as more of the data structure fits in cache.
      [Figure: search-performance plot; series: Fractal Tree, B-tree]
  • 21. Asymmetry Between Inserts and Key Searches
      Small specification changes affect complexity.
      E.g., duplicate keys
        • Slow: Return an error when a duplicate key is inserted
            ‣ Hidden search
        • Fast: Overwrite duplicates or maintain all versions
            ‣ No hidden search
      E.g., deletes
        • Slow: Return number of elements deleted
            ‣ Hidden search
        • Fast: Delete without feedback
            ‣ No hidden search
      Next slide: extra difficulty of key searches
  • 22. Extra Difficulty of Key Searches
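The "hidden search" distinction from the preceding slides can be illustrated with a toy model. The sketch below is not how TokuDB is implemented: `BufferedDictionary` and all of its method names are hypothetical, and a plain Python list stands in for the on-disk structure. It only shows which operations can be recorded blindly and which must first perform a lookup.

```python
_MISSING = object()  # sentinel meaning "key is not present"


class BufferedDictionary:
    """Toy write-optimized dictionary: updates are appended as messages."""

    def __init__(self):
        self.messages = []   # (op, key, value) tuples, newest last
        self.searches = 0    # number of point queries performed so far

    def _search(self, key):
        """Point query: the expensive operation in a write-optimized structure."""
        self.searches += 1
        for op, k, value in reversed(self.messages):
            if k == key:
                return value if op == "put" else _MISSING
        return _MISSING

    def upsert(self, key, value):
        """Fast: overwrite any existing value, no feedback, no hidden search."""
        self.messages.append(("put", key, value))

    def insert_unique(self, key, value):
        """Slow: reporting a duplicate requires a hidden search."""
        if self._search(key) is not _MISSING:
            raise KeyError(f"duplicate key: {key!r}")
        self.messages.append(("put", key, value))

    def delete_blind(self, key):
        """Fast: delete without feedback, no hidden search."""
        self.messages.append(("del", key, None))

    def delete_counted(self, key):
        """Slow: returning the number of deleted elements requires a hidden search."""
        existed = self._search(key) is not _MISSING
        self.messages.append(("del", key, None))
        return 1 if existed else 0


store = BufferedDictionary()
store.upsert("k", 1)
store.upsert("k", 2)
store.delete_blind("k")
print("searches after blind operations:", store.searches)
store.upsert("k", 3)
print("rows deleted:", store.delete_counted("k"))
print("searches after counted delete:", store.searches)
```

The two variants of each operation leave the dictionary in the same state; only the feedback returned to the caller differs, and that feedback is what forces the search.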
  • 23. Asymmetry Between Inserts and Key Searches
      Insert/point-query asymmetry has impact on
        • System design. How to redesign standard mechanisms (e.g., concurrency-control mechanism).
        • System use. How to take advantage of faster inserts (e.g., to enable faster queries).
  • 25. Overview
        • External-memory dictionaries
        • Performance limitations of B-trees
        • Fractal-Tree data structure (Streaming B-tree)
        • Insert/point-query asymmetry
        • Impact of insert/point-query asymmetry on database use
        • How to build a streaming B-tree
        • Impact of insert/point-query asymmetry on system design
        • Scaling into the future
  • 27. How B-trees Are Used in Databases
      Data maintained in rows and stored in B-trees.
                      Select via Index                 Select via Table Scan
        Query         select d where 270 ≤ a ≤ 538     select d where 270 ≤ e ≤ 538
        Table         key: a; value: b c d e           key: a; value: b c d e
  • 29. How B-trees Are Used in Databases (Cont.)
      Selecting via an index can be slow if it is coupled with point queries.
      select d where 270 ≤ b ≤ 538
      [Figure: main table (key: a; value: b c d e); index (key: b; value: a); index (key: c; value: a)]
  • 30. How B-trees Are Used in Databases (Cont.)
      Covering index can speed up selects
        • Key contains all columns necessary to answer query.
      select d where 270 ≤ b ≤ 538
      [Figure: main table (key: a; value: b c d e); covering index (key: b d; value: a); index (key: c; value: a)]
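The difference between a plain index and a covering index can be observed directly in a query planner. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; the slides discuss MySQL, and the table and index names (`main_table`, `idx_b`, `idx_b_d`) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_table ("
    "a INTEGER PRIMARY KEY, b INTEGER, c INTEGER, d INTEGER, e INTEGER)"
)
conn.executemany(
    "INSERT INTO main_table (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)",
    [(1, 300, 0, 7, 0), (2, 100, 0, 8, 0), (3, 538, 0, 9, 0)],
)

QUERY = "SELECT d FROM main_table WHERE b BETWEEN 270 AND 538"


def query_plan(sql):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Plain index on b: each match still needs a point query into the main table for d.
conn.execute("CREATE INDEX idx_b ON main_table (b)")
plan_plain = query_plan(QUERY)

# Covering index on (b, d): the index alone holds every column the query needs.
conn.execute("DROP INDEX idx_b")
conn.execute("CREATE INDEX idx_b_d ON main_table (b, d)")
plan_covering = query_plan(QUERY)

results = sorted(row[0] for row in conn.execute(QUERY))

print("plain index:   ", plan_plain)
print("covering index:", plan_covering)
print("matching d values:", results)
```

The trade-off is the one the slides go on to describe: every extra index, covering or not, must be updated on each insert, so the approach is only attractive when index inserts are cheap.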
  • 32. Insertion Pain Can Masquerade as Query Pain
      People often don’t use these indexes. They use simplistic schema.
        • Sequential inserts via autoincrement key
        • Few indexes, few covering indexes
      [Figure: table with autoincrement key t (effectively a timestamp) and value columns a b c d e]
      Then insertions are fast but queries are slow.
      Adding sophisticated indexes helps queries.
        • B-trees cannot afford to maintain them. Fractal Trees can.
  • 33. How to Build a Fractal Tree and How it Performs
  • 34. Simplified (Cache-Oblivious) Fractal Tree. [Figure: sorted arrays of sizes $2^0$, $2^1$, $2^2$, $2^3$.] $O((\log N)/B)$ insert cost and $O(\log^2 N)$ search cost • Sorted arrays of exponentially increasing size. • Arrays are completely full or completely empty (depends on the bit representation of the number of elements). • Insert into the smallest array. Merge arrays to make room.
  • 35. Simplified (Cache-Oblivious) Fractal Tree (Cont.)
  • 36. Analysis of Simplified Fractal Tree. Insert cost: • cost to flush buffer of size X = $O(X/B)$ • cost per element to flush buffer = $O(1/B)$ • max number of times each element is flushed = $\log N$ • insert cost = $O((\log N)/B)$ amortized memory transfers. Search cost: • Binary search at each level • $\log(N/B) + (\log(N/B) - 1) + (\log(N/B) - 2) + \cdots + 2 + 1 = O(\log^2(N/B))$
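The simplified structure described on the preceding slides can be sketched in a few lines. This is an in-memory illustration only: the class name `SimplifiedFractalTree` and its methods are hypothetical, it ignores disk blocks entirely, and it is not the TokuDB implementation. It shows the two rules from the slides: level k is either empty or holds exactly 2^k sorted keys, and an insert merges full levels until it reaches an empty one, much like carrying in binary addition.

```python
import bisect
from heapq import merge


class SimplifiedFractalTree:
    """Sorted arrays of exponentially increasing size (illustrative sketch)."""

    def __init__(self):
        # levels[k] is either [] or a sorted list of exactly 2**k keys.
        self.levels = []

    def insert(self, key):
        carry = [key]  # sorted run being pushed to larger levels
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                # Empty level: the carried run (length 2**k) fits exactly.
                self.levels[k] = carry
                return
            # Full level: merge it with the carry and move to the next level.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # One binary search per level, as in the search-cost analysis.
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False

    def occupancy(self):
        return [len(level) for level in self.levels]


tree = SimplifiedFractalTree()
for key in [(i * 5) % 13 for i in range(13)]:  # the keys 0..12 in scrambled order
    tree.insert(key)
occupancy = tree.occupancy()
print(occupancy)  # 13 = 0b1101, so levels 0, 2 and 3 are full: [1, 0, 4, 8]
```

Each key moves to a larger level at most once per level, which is where the log N factor in the insert cost comes from; the merges themselves are sequential scans, which is where the 1/B factor comes from on disk.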
  • 37. Idea of Faster Key Searches in Fractal Tree. $O(\log(N/B))$ search cost • Some redundancy of elements between levels • Arrays can be partially full • Horizontal and vertical pointers to redundant elements • (Fractional Cascading)
  • 38. Why the Previous Data Structure Is a Simplification • Need concurrency-control mechanisms • Need crash safety • Need transactions, logging + recovery • Need better search cost • Need to store variable-size elements • Need better amortization • Need to be good for random and sequential inserts • Need to support multithreading • Need compression
  • 39. iiBench Insertion Benchmark. [Figure: iiBench 1B Row Insert Test; rows/second (0 to 50,000) versus rows inserted (0 to 1,000,000,000) for InnoDB and TokuDB.] Fractal Trees scale with disk bandwidth, not seek time. • In fact, now we are compute bound, so cannot yet take full advantage of more cores or disks. (This will change.)
  • 40. iiBench Deletions. [Figure: iiBench 500M Row Insert/Delete Test; rows/second (0 to 40,000) versus rows inserted (0 to 500,000,000) for TokuDB and InnoDB, with an insertions-only phase followed by an insertions + deletions phase.]
  • 41. Search/Point-Query Asymmetry When Building a Fractal-Tree Database
  • 42. Building TokuDB Storage Engine for MySQL. Engineering to-do list: • Need concurrency-control mechanisms • Need crash safety • Need transactions, logging + recovery • Need better search cost • Need to store variable-size elements • Need better amortization • Need to be good for random and sequential inserts • Need to support multithreading • Need compression
  • 44. Concurrency Control for Transactions. [Figure: two orderings of the labels A, B, C, D, E.] Transactions • Sequence of durable operations. • Happen atomically. Atomicity in TokuDB via pessimistic locking • Reader lock: A and B can both read row x of the database. • Writer lock: if A writes to row x, B cannot read x until A completes.
  • 45. Concurrency Control for Transactions (Cont.) B-tree implementation: maintain locks in leaves • Insert row t • Search for row u • Search for row v and put a cursor • Increment cursor. Now cursor points to row w. [Figure: rows t, u, v, w with a writer lock, a reader lock, and a reader range lock.] Doesn’t work for Fractal Trees: maintaining locks involves implicit searches on writes.
  • 46. Scaling Fractal Trees into the Future
  • 47. iiBench on SSD. [Figure: insertion rate (0 to 35,000) versus cumulative insertions (0 to 1.5e+08) for TokuDB and InnoDB, each on FusionIO and on X25-E RAID10.] B-trees are slow on SSDs, probably because they waste bandwidth. • When inserting one row, a whole block (much larger) is written.
  • 49. B-tree Inserts Are Slow on SSDs. Inserting an element of size x into a B-tree dirties a leaf block of size B. [Figure: an element of size x inside a block of size B.] We can write keys of size x into a B-tree using at most an O(x/B) fraction of disk bandwidth. Fractal trees do efficient inserts on SSDs because they transform random I/O into sequential I/O.
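A small worked example of the x/B bound follows. The numbers are assumptions chosen for illustration and do not come from the talk: a 100-byte row, a 16 KiB leaf block, a device that sustains 200 MB/s of writes, and the worst case in which every insert forces one whole dirty block to be written.

```python
def useful_write_fraction(row_bytes, block_bytes):
    """Fraction of each written block that is newly inserted data (x/B)."""
    return row_bytes / block_bytes


def btree_payload_bytes_per_s(row_bytes, block_bytes, write_bandwidth_bytes_per_s):
    """New row data written per second when each insert writes a full block."""
    blocks_per_s = write_bandwidth_bytes_per_s / block_bytes
    return blocks_per_s * row_bytes


ROW_BYTES = 100                 # assumed row size x
BLOCK_BYTES = 16 * 1024         # assumed leaf block size B
WRITE_BANDWIDTH = 200 * 10**6   # assumed device write bandwidth, bytes/s

fraction = useful_write_fraction(ROW_BYTES, BLOCK_BYTES)
payload_rate = btree_payload_bytes_per_s(ROW_BYTES, BLOCK_BYTES, WRITE_BANDWIDTH)

print(f"useful fraction of bandwidth: {fraction:.4%}")
print(f"row data written per second: {payload_rate / 10**6:.2f} MB/s")
```

Under these assumptions well under 1% of the write bandwidth carries new data; the rest rewrites bytes that were already on the device.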
  • 50. Disk Hardware Trends. Disk capacity will continue to grow quickly, but seek times will change slowly. Year / Capacity / Bandwidth: 2008 / 2 TB / 100 MB/s; 2012 / 4.5 TB / 150 MB/s; 2017 / 67 TB / 500 MB/s. • Bandwidth scales as the square root of capacity. Source: http://blocksandfiles.com/article/4501
  • 51. Fractal Trees Enable Compact Systems. B-trees require capacity, bandwidth, and random I/O • B-tree based systems achieve large random I/O rates by using more spindles and lower-capacity disks. Fractal Trees require only capacity and bandwidth • Fractal Trees enable the use of high-capacity disks.
  • 52. Fractal Trees Enable Big Disks. B-trees require capacity, bandwidth, and seeks. Fractal trees require only capacity and bandwidth. Today, for a 50 TB database: • Fractal tree with 25 2 TB disks gives 500K ins/s. • B-tree with 25 2 TB disks gives 2.5K ins/s. • B-tree with 500 100 GB disks gives 50K ins/s but costs $, racks, and power. In 2017, for a 1500 TB database: • Fractal tree with 25 67 TB disks gives 2500K ins/s. • B-tree with 25 67 TB disks gives 2.5K ins/s. B-trees need spindles, and spindle density increases slowly.
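The insertion rates on this slide are consistent with a simple model in which B-tree throughput is proportional to the number of spindles and Fractal-Tree throughput is proportional to aggregate bandwidth. The sketch below back-solves the two constants from the slide's own numbers; both constants are inferred assumptions, not figures stated in the talk. The per-disk bandwidths (100 MB/s and 500 MB/s) are taken from the Disk Hardware Trends slide.

```python
# Assumed: 2.5K ins/s over 25 disks implies about 100 random inserts per disk per second.
SEEK_BOUND_INSERTS_PER_DISK = 100
# Assumed: 500K ins/s over 25 disks at 100 MB/s implies about 200 inserts per MB/s.
INSERTS_PER_MB_PER_S = 200


def btree_inserts_per_s(num_disks):
    """Seek-bound model: throughput grows only with the number of spindles."""
    return num_disks * SEEK_BOUND_INSERTS_PER_DISK


def fractal_inserts_per_s(num_disks, bandwidth_mb_per_s):
    """Bandwidth-bound model: throughput grows with aggregate bandwidth."""
    return num_disks * bandwidth_mb_per_s * INSERTS_PER_MB_PER_S


today_fractal = fractal_inserts_per_s(25, 100)     # 25 disks of 2 TB
today_btree = btree_inserts_per_s(25)              # 25 disks of 2 TB
today_btree_wide = btree_inserts_per_s(500)        # 500 disks of 100 GB
future_fractal = fractal_inserts_per_s(25, 500)    # 25 disks of 67 TB
future_btree = btree_inserts_per_s(25)             # 25 disks of 67 TB

print(today_fractal, today_btree, today_btree_wide, future_fractal, future_btree)
```

In this model the B-tree rate does not change between the two scenarios because the spindle count is the same, while the Fractal-Tree rate rises fivefold with per-disk bandwidth.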
  • 53. Using Big Disks Also Saves Energy. Power consumption of disks • Enterprise 80 to 160 GB disk runs at 4 W (idle power). • Enterprise 1 to 2 TB disk runs at 8 W (idle power). Data centers/server farms use 80 to 160 GB disks • Use many small-capacity disks, not large ones. Using large disks may save a factor of more than 10 in storage costs • Other considerations modify this factor ‣ e.g., CPUs necessary to drive disks, scale-out infrastructure, cooling, etc. ‣ Metric: e.g., Watts/MB versus Inserts/Joule