SlideShare a Scribd company logo
1 of 26
Efficient Pagination Using MySQL
                   Surat Singh Bhati (surat@yahoo-inc.com)
                    Rick James (rjames@yahoo-inc.com)


                               Yahoo Inc


Percona Performance Conference 2009
Outline

  1. Overview
     –    Common pagination UI pattern
     – Sample table and typical solution using OFFSET
     – Techniques to avoid large OFFSET
     – Performance comparison
     – Concerns




                                         -2-
Common Patterns




                  -3-
Basics


    First step toward having efficient pagination over large data set

     – Use index to filter rows (resolve WHERE)
     – Use same index to return rows in sorted order (resolve ORDER)




    Step zero
     – http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
     – http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
     – http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html




                                        -4-
Using Index


  KEY a_b_c (a, b, c)

  ORDER may get resolved using Index
       –   ORDER BY a
       –   ORDER BY a,b
       –   ORDER BY a, b, c
       –   ORDER BY a DESC, b DESC, c DESC


  WHERE and ORDER both resolved using index:
       –   WHERE a = const ORDER BY b, c
       –   WHERE a = const AND b = const ORDER BY c
       –   WHERE a = const     ORDER BY b, c
       –   WHERE a = const AND b > const ORDER BY b, c

  ORDER will not get resolved uisng index (file sort)
       –   ORDER BY a ASC, b DESC, c DESC /* mixed sort direction */
       –   WHERE g = const ORDER BY b, c        /* a prefix is missing */
       –   WHERE a = const ORDER BY c           /* b is missing */
       –   WHERE a = const ORDER BY a, d        /* d is not part of index */
                                               -5-
Sample Schema
  CREATE TABLE `message` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `user_id` int(11) NOT NULL,
    `content` text COLLATE utf8_unicode_ci NOT NULL,
    `create_time` int(11) NOT NULL,
    `thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */
    PRIMARY KEY (`id`),
    KEY `thumbs_up_key` (`thumbs_up`,`id`)
  ) ENGINE=InnoDB

   mysql> show table status like 'message' G
           Engine: InnoDB
          Version: 10
       Row_format: Compact
             Rows: 50000040    /* 50 Million */
   Avg_row_length: 565
      Data_length: 28273803264 /* 26 GB */
     Index_length: 789577728   /* 753 MB */
        Data_free: 6291456
      Create_time: 2009-04-20 13:30:45

  Two use case:
  • Paginate by time, recent message one page one
  • Paginate by thumps_up, largest value on page one
                                          -6-
Typical Query

  1. Get the total records
     SELECT count(*) FROM message


  2. Get current page
     SELECT * FROM message
     ORDER BY id DESC LIMIT 0, 20


  •   http://domain.com/message?page=1
              • ORDER BY id DESC LIMIT 0,                 20

  •   http://domain.com/message?page=2
              • ORDER BY id DESC LIMIT 20,                 20

  •   http://domain.com/message?page=3
              • ORDER BY id DESC LIMIT 40,                 20


  Note: id is auto_increment, same as create_time order, no need to create index on create_time, save space



        –                                           -7-
Explain


  mysql> explain SELECT * FROM message
         ORDER BY id DESC
         LIMIT 10000, 20G
  ***************** 1. row **************
             id: 1
    select_type: SIMPLE
          table: message
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 4
            ref: NULL
           rows: 10020
          Extra:
  1 row in set (0.00 sec)
     – it can read rows using index scan and execution will stop as soon as it finds
       required rows.
     – LIMIT 10000, 20 means it has to read 10020 and throw away 10000 rows, then
       return next 20 rows.


                                          -8-
Performance Implications

     – Larger OFFSET is going to increase active data set, MySQL has to bring data
       in memory that is never returned to caller.
     – Performance issue is more visible when your have database that can't fit in
       main memory.
     – Small percentage of request with large OFFSET would be able to hit disk I/O
       Disk I/O bottleneck
     – In order to display “21 to 40 of 1000,000” , some one has to count 1000,000
       rows.




                                         -9-
Simple Solution

     – Do not display total records, does user really care?



     – Do not let user go to deep pages, redirect him
       http://en.wikipedia.org/wiki/Internet_addiction_disorder after certain number of
       pages




                                          - 10 -
Avoid Count(*)


  1. Never display total messages, let user see more message by clicking
     'next'



  2. Do not count on every request, cache it, display stale count, user do not
     care about 324533 v/s 324633


  3. Display 41 to 80 of Thousands


  4. Use pre calculated count, increment/decrement value as insert/delete
     happens.




                                       - 11 -
Solution to avoid offset

  1. Change User Interface
      – No direct jumps to Nth page



  2. LIMIT N is fine, Do not use LIMIT M,N
      – Provide extra clue about from where to start given page
      – Find the desired records using more restricted WHERE using given clue and
        ORDER BY and LIMIT N without OFFSET)




                                         - 12 -
Find the clue

   150
   111
   102                       Page One
   101
   100   <a href=”/page=2;last_seen=100;dir=next>Next</a>

   98    <a href=”/page=1;last_seen=98;dir=prev>Prev</a>
   97
   96                        Page Two
   95
   94    <a href=”/page=3;last_seen=94;dir=next>Next</a>

   93    <a href=”/page=3;last_seen=93;dir=prev>Prev</a>
   92
   91                         Page Three
   90
   89    <a href=”/page=4;last_seen=89;dir=prev>Next</a>




                                   - 13 -
Solution using clue

  Next Page:
  http://domain.com/forum?page=2&last_seen=100&dir=next

           WHERE id < 100 /* last_seen *
           ORDER BY id DESC LIMIT $page_size /* No OFFSET*/


  Prev Page:
  http://domain.com/forum?page=1&last_seen=98&dir=prev

           WHERE id > 98 /* last_seen *
           ORDER BY id ASC LIMIT $page_size /* No OFFSET*/


           Reverse given 10 rows before sending to user


                                 - 14 -
Explain

  mysql> explain
         SELECT * FROM message
         WHERE id < '49999961'
         ORDER BY id DESC LIMIT 20 G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: message
           type: range
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 4
            ref: NULL
           Rows: 25000020 /* ignore this */
          Extra: Using where
  1 row in set (0.00 sec)




                                   - 15 -
What about order by non unique values?


                         99
                         99
                         98    Page One
                         98
                         98
                         98
                         98
                         97    Page Two
                         97
                         10
  We can't do:
            WHERE thumbs_up < 98
            ORDER BY thumbs_up DESC /* It will return few seen rows */

  Can we say this:
            WHERE thumbs_up <= 98
            AND <extra_con>
            ORDER BY thumbs_up DESC

                                      - 16 -
Add more condition

  •   Consider thumbs_up as major number
       – if we have additional minor number, we can use combination of major & minor
         as extra condition




  •   Find additional column (minor number)
       – we can use id primary key as minor number




                                          - 17 -
Solution
                                     First Page
  SELECT thumbs_up, id
  FROM message
  ORDER BY thumbs_up DESC, id DESC
  LIMIT $page_size

  +-----------+----+
  | thumbs_up | id |
  +-----------+----+
  |        99 | 14 |
  |        99 | 2 |
  |        98 | 18 |
  |        98 | 15 |
  |        98 | 13 |
  +-----------+----+
                                     Next Page
  SELECT thumbs_up, id
  FROM message
  WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98)
  ORDER BY thumbs_up DESC, id DESC
  LIMIT $page_size

  +-----------+----+
  | thumbs_up | id |
  +-----------+----+
  |        98 | 10 |
  |        98 | 6 |
  |        97 | 17 |
                                        - 18 -
Make it better..
   Query:


   SELECT * FROM message
   WHERE thumbs_up <= 98
            AND (id < 13 OR thumbs_up < 98)
   ORDER BY thumbs_up DESC, id DESC
   LIMIT 20


   Can be written as:


   SELECT m2.* FROM message m1, message m2
   WHERE m1.id = m2.id
            AND m1.thumbs_up <= 98
            AND (m1.id < 13 OR m1.thumbs_up < 98)
   ORDER BY m1.thumbs_up DESC, m1.id DESC
   LIMIT 20;

                                     - 19 -
Explain

  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: m1
           type: range
  possible_keys: PRIMARY,thumbs_up_key
            key: thumbs_up_key /* (thumbs_up,id) */
        key_len: 4
            ref: NULL
           Rows: 25000020 /*ignore this, we will read just 20 rows*/
          Extra: Using where; Using index /* Cover */
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: m2
           type: eq_ref
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 4
            ref: forum.m1.id
           rows: 1
          Extra:

                                   - 20 -
Performance Gain (Primary Key Order)




                       - 21 -
Performance Gain (Secondary Key Order)




                      - 22 -
Throughput Gain


 •   Throughput Gain while hitting first 30 pages:

      – Using LIMIT OFFSET, N
          • 600 query/sec




      – Using LIMIT N (no OFFSET)
          • 3.7k query/sec




                                        - 23 -
Bonus Point

  Product issue with LIMIT M, N


     User is reading a page, in the mean time some records may be added to
     previous page.


     Due to insert/delete pages records are going to move forward/backward
     as rolling window:
      – User is reading messages on 4th page
      – While he was reading, one new message posted (it would be there on page
        one), all pages are going to move one message to next page.
      – User Clicks on Page 5
      – One message from page got pushed forward on page 5, user has to read it
        again


  No such issue with news approach

                                        - 24 -
Drawback

  Search Engine Optimization Expert says:
  Let bot reach all you pages with fewer number of deep dive


  Two Solutions:
  •   Read extra rows
       – Read extra rows in advance and construct links for few previous & next pages


  •   Use small offset
       – Do not read extra rows in advance, just add links for few past & next pages
         with required offset & last_seen_id on current page
       – Do query using new approach with small offset to display desired page
       –
                             file:///Users/surat/Desktop/Picture%2043.png




  Additional concern: Dynamic urls, last_seen is not constant over time.
                                                                            - 25 -
Thanks




  - 26 -

More Related Content

What's hot

My SQL Idiosyncrasies That Bite OTN
My SQL Idiosyncrasies That Bite OTNMy SQL Idiosyncrasies That Bite OTN
My SQL Idiosyncrasies That Bite OTNRonald Bradford
 
12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible ColumnsConnor McDonald
 
Optimizer in oracle 11g by wwf from ebay COC
Optimizer in oracle 11g by wwf from ebay COCOptimizer in oracle 11g by wwf from ebay COC
Optimizer in oracle 11g by wwf from ebay COCLouis liu
 
Lecture2 mysql by okello erick
Lecture2 mysql by okello erickLecture2 mysql by okello erick
Lecture2 mysql by okello erickokelloerick
 
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...tdc-globalcode
 
How MySQL can boost (or kill) your application
How MySQL can boost (or kill) your applicationHow MySQL can boost (or kill) your application
How MySQL can boost (or kill) your applicationFederico Razzoli
 
MySQL Query Optimisation 101
MySQL Query Optimisation 101MySQL Query Optimisation 101
MySQL Query Optimisation 101Federico Razzoli
 
Basic operations by novi reandy sasmita
Basic operations by novi reandy sasmitaBasic operations by novi reandy sasmita
Basic operations by novi reandy sasmitabeasiswa
 
MySQL partitions tutorial
MySQL partitions tutorialMySQL partitions tutorial
MySQL partitions tutorialGiuseppe Maxia
 
Boost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsBoost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsGiuseppe Maxia
 
Python data structures
Python data structuresPython data structures
Python data structuresHarry Potter
 

What's hot (14)

My SQL Idiosyncrasies That Bite OTN
My SQL Idiosyncrasies That Bite OTNMy SQL Idiosyncrasies That Bite OTN
My SQL Idiosyncrasies That Bite OTN
 
Writeable CTEs: The Next Big Thing
Writeable CTEs: The Next Big ThingWriteable CTEs: The Next Big Thing
Writeable CTEs: The Next Big Thing
 
MySQL constraints
MySQL constraintsMySQL constraints
MySQL constraints
 
mysqlHiep.ppt
mysqlHiep.pptmysqlHiep.ppt
mysqlHiep.ppt
 
12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns
 
Optimizer in oracle 11g by wwf from ebay COC
Optimizer in oracle 11g by wwf from ebay COCOptimizer in oracle 11g by wwf from ebay COC
Optimizer in oracle 11g by wwf from ebay COC
 
Lecture2 mysql by okello erick
Lecture2 mysql by okello erickLecture2 mysql by okello erick
Lecture2 mysql by okello erick
 
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...
TDC2017 | São Paulo - Trilha Programação Funcional How we figured out we had ...
 
How MySQL can boost (or kill) your application
How MySQL can boost (or kill) your applicationHow MySQL can boost (or kill) your application
How MySQL can boost (or kill) your application
 
MySQL Query Optimisation 101
MySQL Query Optimisation 101MySQL Query Optimisation 101
MySQL Query Optimisation 101
 
Basic operations by novi reandy sasmita
Basic operations by novi reandy sasmitaBasic operations by novi reandy sasmita
Basic operations by novi reandy sasmita
 
MySQL partitions tutorial
MySQL partitions tutorialMySQL partitions tutorial
MySQL partitions tutorial
 
Boost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsBoost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitions
 
Python data structures
Python data structuresPython data structures
Python data structures
 

Viewers also liked

Rethinkdb and tokudb research
Rethinkdb and tokudb research Rethinkdb and tokudb research
Rethinkdb and tokudb research mysqlops
 
MongoDB使用技巧
MongoDB使用技巧MongoDB使用技巧
MongoDB使用技巧mysqlops
 
我对后端优化的一点想法
我对后端优化的一点想法我对后端优化的一点想法
我对后端优化的一点想法mysqlops
 
Event proxy introduction
Event proxy introductionEvent proxy introduction
Event proxy introductionmysqlops
 
Couchbase b jmeetup
Couchbase b jmeetupCouchbase b jmeetup
Couchbase b jmeetupmysqlops
 

Viewers also liked (6)

Rethinkdb and tokudb research
Rethinkdb and tokudb research Rethinkdb and tokudb research
Rethinkdb and tokudb research
 
MongoDB使用技巧
MongoDB使用技巧MongoDB使用技巧
MongoDB使用技巧
 
我对后端优化的一点想法
我对后端优化的一点想法我对后端优化的一点想法
我对后端优化的一点想法
 
Event proxy introduction
Event proxy introductionEvent proxy introduction
Event proxy introduction
 
LVS
LVSLVS
LVS
 
Couchbase b jmeetup
Couchbase b jmeetupCouchbase b jmeetup
Couchbase b jmeetup
 

Similar to PPC2009_yahoo_mysql_pagination

Efficient Pagination Using MySQL
Efficient Pagination Using MySQLEfficient Pagination Using MySQL
Efficient Pagination Using MySQLSurat Singh Bhati
 
Covering indexes
Covering indexesCovering indexes
Covering indexesMYXPLAIN
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query TuningSveta Smirnova
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query TuningAlexander Rubin
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceAltinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxAltinity Ltd
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuRedis Labs
 
What's New In MySQL 5.6
What's New In MySQL 5.6What's New In MySQL 5.6
What's New In MySQL 5.6Abdul Manaf
 
Database Oracle Basic
Database Oracle BasicDatabase Oracle Basic
Database Oracle BasicKamlesh Singh
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101Sveta Smirnova
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL PerformanceSveta Smirnova
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLanandology
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 

Similar to PPC2009_yahoo_mysql_pagination (20)

Efficient Pagination Using MySQL
Efficient Pagination Using MySQLEfficient Pagination Using MySQL
Efficient Pagination Using MySQL
 
Covering indexes
Covering indexesCovering indexes
Covering indexes
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
 
Optimizando MySQL
Optimizando MySQLOptimizando MySQL
Optimizando MySQL
 
Oracle
OracleOracle
Oracle
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query Tuning
 
Raj mysql
Raj mysqlRaj mysql
Raj mysql
 
PHP tips by a MYSQL DBA
PHP tips by a MYSQL DBAPHP tips by a MYSQL DBA
PHP tips by a MYSQL DBA
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
 
What's New In MySQL 5.6
What's New In MySQL 5.6What's New In MySQL 5.6
What's New In MySQL 5.6
 
SQL (1).pptx
SQL (1).pptxSQL (1).pptx
SQL (1).pptx
 
Database Oracle Basic
Database Oracle BasicDatabase Oracle Basic
Database Oracle Basic
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL Performance
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
 
MySQLinsanity
MySQLinsanityMySQLinsanity
MySQLinsanity
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 

More from mysqlops

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautifulmysqlops
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解mysqlops
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementmysqlops
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationmysqlops
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Clustermysqlops
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsmysqlops
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告mysqlops
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫mysqlops
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践mysqlops
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用mysqlops
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现mysqlops
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析mysqlops
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考mysqlops
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示mysqlops
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事mysqlops
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDLmysqlops
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护mysqlops
 

More from mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

PPC2009_yahoo_mysql_pagination

  • 1. Efficient Pagination Using MySQL Surat Singh Bhati (surat@yahoo-inc.com) Rick James (rjames@yahoo-inc.com) Yahoo Inc Percona Performance Conference 2009
  • 2. Outline 1. Overview – Common pagination UI pattern – Sample table and typical solution using OFFSET – Techniques to avoid large OFFSET – Performance comparison – Concerns -2-
  • 4. Basics First step toward having efficient pagination over large data set – Use index to filter rows (resolve WHERE) – Use same index to return rows in sorted order (resolve ORDER) Step zero – http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html – http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html – http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html -4-
  • 5. Using Index KEY a_b_c (a, b, c) ORDER may get resolved using Index – ORDER BY a – ORDER BY a,b – ORDER BY a, b, c – ORDER BY a DESC, b DESC, c DESC WHERE and ORDER both resolved using index: – WHERE a = const ORDER BY b, c – WHERE a = const AND b = const ORDER BY c – WHERE a = const ORDER BY b, c – WHERE a = const AND b > const ORDER BY b, c ORDER will not get resolved uisng index (file sort) – ORDER BY a ASC, b DESC, c DESC /* mixed sort direction */ – WHERE g = const ORDER BY b, c /* a prefix is missing */ – WHERE a = const ORDER BY c /* b is missing */ – WHERE a = const ORDER BY a, d /* d is not part of index */ -5-
  • 6. Sample Schema CREATE TABLE `message` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `user_id` int(11) NOT NULL, `content` text COLLATE utf8_unicode_ci NOT NULL, `create_time` int(11) NOT NULL, `thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */ PRIMARY KEY (`id`), KEY `thumbs_up_key` (`thumbs_up`,`id`) ) ENGINE=InnoDB mysql> show table status like 'message' G Engine: InnoDB Version: 10 Row_format: Compact Rows: 50000040 /* 50 Million */ Avg_row_length: 565 Data_length: 28273803264 /* 26 GB */ Index_length: 789577728 /* 753 MB */ Data_free: 6291456 Create_time: 2009-04-20 13:30:45 Two use case: • Paginate by time, recent message one page one • Paginate by thumps_up, largest value on page one -6-
  • 7. Typical Query 1. Get the total records SELECT count(*) FROM message 2. Get current page SELECT * FROM message ORDER BY id DESC LIMIT 0, 20 • http://domain.com/message?page=1 • ORDER BY id DESC LIMIT 0, 20 • http://domain.com/message?page=2 • ORDER BY id DESC LIMIT 20, 20 • http://domain.com/message?page=3 • ORDER BY id DESC LIMIT 40, 20 Note: id is auto_increment, same as create_time order, no need to create index on create_time, save space – -7-
  • 8. Explain mysql> explain SELECT * FROM message ORDER BY id DESC LIMIT 10000, 20G ***************** 1. row ************** id: 1 select_type: SIMPLE table: message type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 10020 Extra: 1 row in set (0.00 sec) – it can read rows using index scan and execution will stop as soon as it finds required rows. – LIMIT 10000, 20 means it has to read 10020 and throw away 10000 rows, then return next 20 rows. -8-
  • 9. Performance Implications – Larger OFFSET is going to increase active data set, MySQL has to bring data in memory that is never returned to caller. – Performance issue is more visible when your have database that can't fit in main memory. – Small percentage of request with large OFFSET would be able to hit disk I/O Disk I/O bottleneck – In order to display “21 to 40 of 1000,000” , some one has to count 1000,000 rows. -9-
  • 10. Simple Solution – Do not display total records, does user really care? – Do not let user go to deep pages, redirect him http://en.wikipedia.org/wiki/Internet_addiction_disorder after certain number of pages - 10 -
  • 11. Avoid Count(*) 1. Never display total messages, let user see more message by clicking 'next' 2. Do not count on every request, cache it, display stale count, user do not care about 324533 v/s 324633 3. Display 41 to 80 of Thousands 4. Use pre calculated count, increment/decrement value as insert/delete happens. - 11 -
  • 12. Solution to avoid offset 1. Change User Interface – No direct jumps to Nth page 2. LIMIT N is fine, Do not use LIMIT M,N – Provide extra clue about from where to start given page – Find the desired records using more restricted WHERE using given clue and ORDER BY and LIMIT N without OFFSET) - 12 -
  • 13. Find the clue 150 111 102 Page One 101 100 <a href=”/page=2;last_seen=100;dir=next>Next</a> 98 <a href=”/page=1;last_seen=98;dir=prev>Prev</a> 97 96 Page Two 95 94 <a href=”/page=3;last_seen=94;dir=next>Next</a> 93 <a href=”/page=3;last_seen=93;dir=prev>Prev</a> 92 91 Page Three 90 89 <a href=”/page=4;last_seen=89;dir=prev>Next</a> - 13 -
  • 14. Solution using clue Next Page: http://domain.com/forum?page=2&last_seen=100&dir=next WHERE id < 100 /* last_seen * ORDER BY id DESC LIMIT $page_size /* No OFFSET*/ Prev Page: http://domain.com/forum?page=1&last_seen=98&dir=prev WHERE id > 98 /* last_seen * ORDER BY id ASC LIMIT $page_size /* No OFFSET*/ Reverse given 10 rows before sending to user - 14 -
  • 15. Explain mysql> explain SELECT * FROM message WHERE id < '49999961' ORDER BY id DESC LIMIT 20 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: message type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL Rows: 25000020 /* ignore this */ Extra: Using where 1 row in set (0.00 sec) - 15 -
  • 16. What about order by non unique values? 99 99 98 Page One 98 98 98 98 97 Page Two 97 10 We can't do: WHERE thumbs_up < 98 ORDER BY thumbs_up DESC /* It will return few seen rows */ Can we say this: WHERE thumbs_up <= 98 AND <extra_con> ORDER BY thumbs_up DESC - 16 -
  • 17. Add more condition • Consider thumbs_up as major number – if we have additional minor number, we can use combination of major & minor as extra condition • Find additional column (minor number) – we can use id primary key as minor number - 17 -
  • 18. Solution First Page SELECT thumbs_up, id FROM message ORDER BY thumbs_up DESC, id DESC LIMIT $page_size +-----------+----+ | thumbs_up | id | +-----------+----+ | 99 | 14 | | 99 | 2 | | 98 | 18 | | 98 | 15 | | 98 | 13 | +-----------+----+ Next Page SELECT thumbs_up, id FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT $page_size +-----------+----+ | thumbs_up | id | +-----------+----+ | 98 | 10 | | 98 | 6 | | 97 | 17 | - 18 -
  • 19. Make it better.. Query: SELECT * FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT 20 Can be written as: SELECT m2.* FROM message m1, message m2 WHERE m1.id = m2.id AND m1.thumbs_up <= 98 AND (m1.id < 13 OR m1.thumbs_up < 98) ORDER BY m1.thumbs_up DESC, m1.id DESC LIMIT 20; - 19 -
  • 20. Explain *************************** 1. row *************************** id: 1 select_type: SIMPLE table: m1 type: range possible_keys: PRIMARY,thumbs_up_key key: thumbs_up_key /* (thumbs_up,id) */ key_len: 4 ref: NULL Rows: 25000020 /*ignore this, we will read just 20 rows*/ Extra: Using where; Using index /* Cover */ *************************** 2. row *************************** id: 1 select_type: SIMPLE table: m2 type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: forum.m1.id rows: 1 Extra: - 20 -
  • 21. Performance Gain (Primary Key Order) - 21 -
  • 22. Performance Gain (Secondary Key Order) - 22 -
  • 23. Throughput Gain • Throughput Gain while hitting first 30 pages: – Using LIMIT OFFSET, N • 600 query/sec – Using LIMIT N (no OFFSET) • 3.7k query/sec - 23 -
  • 24. Bonus Point Product issue with LIMIT M, N User is reading a page, in the mean time some records may be added to previous page. Due to insert/delete pages records are going to move forward/backward as rolling window: – User is reading messages on 4th page – While he was reading, one new message posted (it would be there on page one), all pages are going to move one message to next page. – User Clicks on Page 5 – One message from page got pushed forward on page 5, user has to read it again No such issue with news approach - 24 -
  • 25. Drawback Search Engine Optimization Expert says: Let bot reach all you pages with fewer number of deep dive Two Solutions: • Read extra rows – Read extra rows in advance and construct links for few previous & next pages • Use small offset – Do not read extra rows in advance, just add links for few past & next pages with required offset & last_seen_id on current page – Do query using new approach with small offset to display desired page – file:///Users/surat/Desktop/Picture%2043.png Additional concern: Dynamic urls, last_seen is not constant over time. - 25 -
  • 26. Thanks - 26 -