
20141206 4 q14_dataconference_i_am_your_db

4Q14 DataConference.IO

I'm Your DB !!

Published in: Software

  1. 1. I’m your DB (I need a database that scales) FB/hyeongchae.lee 4Q14 DataConference.IO 1
  2. 2. I’m your DB! May the oracle be with you
  3. 3. Agenda
      • About me
      • DBMS vs NoSQL
      • Local vs Global
      • So... which databases scale?
      • Amazon Aurora
  4. 4. ABOUT ME
  5. 5. INERVIT MobileLite, nhn CUBRID, TELCOWARE Telcobase, ALTIBASE Altibase, TIBERO Tibero
  6. 6.
  7. 7. Global Open Frontier, full-time
      • Project: MySQL Redis Plug-in (+MariaDB, +MaxScale)
        – https://github.com/sql2/MySQL_Redis_Plugin_Dev
  8. 8. MySQL Memcached Plug-in
      [Diagram: Application; SQL; Memcached protocol; mysqld; MySQL Server; Handler API; Memcached plugin; innodb_memcache; local cache (optional); InnoDB API; InnoDB Storage Engine]
  9. 9. MySQL Redis Plug-in
      [Diagram: Application; SQL; Redis protocol; mysqld; MySQL Server; Handler API; Redis plugin; innodb_redis; local cache (optional); InnoDB API; InnoDB Storage Engine]
  10. 10. 2015: MaxScale Redis Cluster Plug-in
      URL: https://mariadb.com/blog/maxscale-proxy-mysql-replication-relay
  11. 11. DBMS VS NoSQL
  12. 12. DB-Engines Ranking (2014.11.24) http://db-engines.com/en/ranking
      Rank  Last Month  DBMS                  Database Model     Score    Changes
      1     1           Oracle                Relational DBMS    1452.13  -19.77
      2     2           MySQL                 Relational DBMS    1279.08  +16.11
      3     3           Microsoft SQL Server  Relational DBMS    1220.20  +0.59
      4     4           PostgreSQL            Relational DBMS    257.36   -0.36
      5     5           MongoDB               Document store     244.73   +4.33
      6     6           DB2                   Relational DBMS    206.23   -1.44
      7     7           Microsoft Access      Relational DBMS    138.84   -2.80
      8     8           SQLite                Relational DBMS    95.28    +0.33
      9     10          Cassandra             Wide column store  91.99    +6.29
      10    9           Sybase ASE            Relational DBMS    84.62    -2.17
  13. 13. http://db-engines.com/en/ranking_categories
  14. 14. Winner !!
  15. 15. Magic Quadrant for Operational Database Management Systems
      [1] Oracle's Letter to the EU Concerning MySQL
      After an antitrust investigation, the European Commission approved Oracle's acquisition of Sun Microsystems, including MySQL, on 21 January 2010. Wikileaks subsequently published cables indicating that the Obama administration applied pressure to the EU to approve the deal. Concerns about the MySQL acquisition had been addressed in Oracle's 14 December 2009 pledges to customers, which were to extend for five years, thus expiring in early 2015. Oracle's pledges included commitments to maintain certain APIs, extensions of licenses to then-current licensees, continued use of GPL licensing, and others. The expiration of these commitments may change the nature of Oracle's relationships with a number of hardware and software vendors, as well as its posture regarding product investment, support for purchasing requirements, and other aspects of MySQL's business model.
  16. 16. LOCAL VS GLOBAL
  17. 17. Korea vs Japan: 50M vs 127M
  18. 18. Korea vs Japan
      [Diagram: Slave, Slave, Master, Slave, Slave, Slave, Master, Slave x3]
  19. 19. KakaoTalk vs LINE
  20. 20. KakaoTalk vs LINE
  21. 21. We Love FusionIO!!
      • facebook/flashcache
  22. 22. Dolphinics’ Dolphin Interconnect Solutions
  23. 23. MEMSCALE
  24. 24. SO... WHICH DATABASES SCALE?
  25. 25. Read Caching
      • Pros: Read-caching can take over a lot of read operations. If reads make up most of your workload, this will obviously help a lot. Even if you have a heavy write workload, read-caching might be enough to keep you from having to scale out to handle writes.
      • Cons: Read-caching, by nature, involves a memory store. If your data-access patterns are really random, or involve a large percentage of records, you might wind up with a pretty expensive memory footprint. Figuring out the right cache invalidation for your app can also be really tricky. Many memory stores are pretty basic in terms of functionality: lack of support for transactions & joins can mean that you’ll need multiple process or network round-trips between the app & the cache.
      http://spiegela.com/2014/04/28/but-i-need-a-database-that-scales-part-1
  26. 26. Write Coalescing
      • Pros: In short: you can achieve better throughput of incoming writes. With many caching systems, you can also query the data in the cache, creating a set of real-time use cases including event-processing, triggers & real-time analytics.
      • Cons: Coalescing writes will inherently mean that your persistence layer is behind your ingestion layer. To take advantage of this technique, you’ll need to consider a lot of questions:
        – Which data to query: cached, persisted, both?
        – Does this data need to be made durable (survive a reboot)? How quickly?
        – Are there consistency concerns? Unique indices? Atomic transactions?
  27. 27. Connection Scaling
      • Pros: Connection scaling increases the number of concurrent connections (obviously, I think?). Its biggest benefit, though, is in reliability, since any cluster node can fail and clients can simply reconnect.
      • Cons: Connection scaling requires shared storage. RAC, for example, typically uses OCFS, a clustered file-system, and SAN storage. The ability to handle more I/O transactions is dependent on scaling up that shared storage tier, which can be very expensive. Connection scaling also doesn’t help much with capacity or analysis scaling, since the data is shared, not spread out across nodes.
  28. 28. Master-Slave Replication
      • Pros: While there’s some setup involved, it’s pretty seamless to your application. There’s still only a single node that has control over the data, so there are no new concerns around consistency. For read-constrained applications, nodes can be added quickly and the architecture remains relatively simple.
      • Cons: MSR solves one problem: reader transactions. If you need to scale other aspects, you’re not doing it here. If you need more write throughput, MSR offloads the read transactions from the master, but writes are still limited to a single node. Also, slaves can lag in their updates from the master. If you need absolute consistency between the two, you’ll need to investigate options for synchronous replication, which can impact performance of the master node.
  29. 29. Vertical Partitioning (aka cluster)
      • Pros: Having smaller databases makes indices perform better, and allows you to improve just about any aspect of scaling.
      • Cons: If your model requires relationships between most or all of your tables for the basic operations, vertical partitioning may not be a fit. Even when your model fits well into partitions today, having these divisions can impact the flexibility of performing joins across models in the future.
  30. 30. Horizontal Partitioning (aka shard)
      • Pros: This type of partitioning provides scaling for all of the elements of scale, allowing for very large data-sets and very good performance.
      • Cons: Sharding can have a lot of drawbacks depending on the implementation. For one thing, the client must be aware of the partition key. When implementing sharding in MySQL, for example, an application will typically infer the partition key, and address the desired partition. Increasing the number of nodes, or changing the key, requires an update to the app each time. Other trade-offs like database features are up for grabs too:
        – Joins: if my data for two collections is distributed across multiple nodes, when I fetch the data back, I may need to join data across more than one, which is likely to be slower.
        – Transactions: if I have a transaction that involves two nodes of the cluster, how do I execute it atomically? Do I lock multiple nodes? All of them?
        – Bulk commits: if I update records in bulk across multiple nodes, this is really two transactions executed separately.
  31. 31. So... which databases scale?
      • Scale Out Reads
      • Capacity
      • Scale Out Analysis
      • Scale Out Writes
      • Bulk Commits
      • Joins
      • Transactions
      • Durability
      • Consistency
  32. 32.
  33. 33. Scaling Storytime
      • http://en.wikipedia.org/wiki/Brad_Fitzpatrick
  34. 34. One Server
      • Simple:
      [Diagram: Internet, Apache, MySQL]
  35. 35. Two Servers
      • Two SPOFs
      [Diagram: Internet, Apache, MySQL]
  36. 36. Five Servers
      • Replication!
      [Diagram: Internet; Apache x3; Master; Slave; arrows labelled read, write, replication]
  37. 37. More Servers
      • Chaos!
      [Diagram: Internet; Apache x6; Master; Slave x6]
  38. 38. Cluster vs Shard: Multi-Master, Cluster, Shard, Cluster + Shard
  39. 39. MySQL Recruit
      • Big Table ( X ) Small Table ( O )
      • Performance ( X ) Scale-up ( O ) Distributed ( O )
      • Query Tuning hard ...
      • Clustering & Sharding mission ...
  40. 40. AMAZON AURORA
  41. 41. http://www.theregister.co.uk/2014/11/26/inside_aurora_how_disruptive_is_amazons_mysql_clone/
  42. 42. OSSCON 4Q14 42
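
The read-caching trade-off on slide 25 (fast repeated reads, but tricky invalidation) can be illustrated with a small cache-aside sketch. This is not from the talk: the class name `CacheAside`, the TTL parameter, and the dictionary standing in for the database are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Hypothetical in-process read cache with TTL expiry and explicit invalidation."""

    def __init__(
        self,
        load: Callable[[str], Optional[str]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load  # assumed loader that reads from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # The write path has to call this, or readers see stale data until the TTL expires.
        self._entries.pop(key, None)


database = {"user:1": "alice"}  # stand-in for the real database
cache = CacheAside(load=database.get)

first = cache.get("user:1")   # miss: loaded from the database
second = cache.get("user:1")  # hit: served from memory
database["user:1"] = "bob"    # a write that bypasses the cache
stale = cache.get("user:1")   # hit: still the old value
cache.invalidate("user:1")
fresh = cache.get("user:1")   # miss: reloaded after invalidation
```

The `stale` read is the "cache invalidation can be really tricky" point in miniature: nothing fails, the reader simply gets old data until something invalidates the entry.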
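
Slide 26's write coalescing can be sketched as a buffer that absorbs writes in memory and persists them in batches. This is a minimal illustration under assumptions, not the speaker's implementation: `WriteCoalescer`, `max_pending`, and the `persist` callback are hypothetical names, and repeated writes to one key are merged so that only the latest value is persisted.

```python
from typing import Callable, Dict, List, Optional, Tuple


class WriteCoalescer:
    """Hypothetical ingestion buffer: merges writes per key and flushes them in batches."""

    def __init__(
        self,
        flush_batch: Callable[[List[Tuple[str, int]]], None],
        max_pending: int = 3,
    ) -> None:
        self._flush_batch = flush_batch  # assumed callback that persists one batch
        self._max_pending = max_pending
        self._pending: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        # A later write to the same key replaces the earlier one before it is persisted.
        self._pending[key] = value
        if len(self._pending) >= self._max_pending:
            self.flush()

    def read(self, key: str, persisted: Dict[str, int]) -> Optional[int]:
        # "Which data to query?": here, buffered data wins over persisted data.
        if key in self._pending:
            return self._pending[key]
        return persisted.get(key)

    def flush(self) -> None:
        if not self._pending:
            return
        batch = list(self._pending.items())
        self._flush_batch(batch)
        self._pending.clear()  # cleared only after the batch was handed over


store: Dict[str, int] = {}    # stand-in for the persistence layer
batch_sizes: List[int] = []


def persist(batch: List[Tuple[str, int]]) -> None:
    store.update(batch)
    batch_sizes.append(len(batch))


coalescer = WriteCoalescer(persist, max_pending=3)
for value in (1, 2, 3):
    coalescer.write("page:home", value)  # three writes, one pending entry
coalescer.write("page:about", 7)

lagging = store.get("page:home")               # persistence layer is behind
buffered = coalescer.read("page:home", store)  # ingestion layer has the latest value
coalescer.flush()
```

Four incoming writes become one batch of two rows, which is the throughput gain. The buffered values would be lost on a crash before `flush()`, which is the durability question the slide raises.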
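
The master-slave replication slides (28 and 36) describe writes going to one node and reads being offloaded to copies that may lag. The toy model below is a sketch under assumptions: `ReplicatedStore`, `require_fresh`, and the manual `replicate()` step are invented for illustration, with dictionaries standing in for database nodes (the slides' slaves are called replicas in the code).

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    """Toy single-master store with asynchronous replicas and round-robin reads."""

    def __init__(self, replica_count: int) -> None:
        self.master: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self._log: List[Tuple[str, str]] = []  # ordered change log kept by the master
        self._applied = 0                      # log position the replicas have reached
        self._next_replica = 0

    def write(self, key: str, value: str) -> None:
        # All writes go to the single master, so write throughput does not scale out.
        self.master[key] = value
        self._log.append((key, value))

    def read(self, key: str, require_fresh: bool = False) -> Optional[str]:
        if require_fresh or not self.replicas:
            return self.master.get(key)
        replica = self.replicas[self._next_replica % len(self.replicas)]
        self._next_replica += 1
        return replica.get(key)

    def replicate(self) -> None:
        # Stands in for asynchronous replication catching up with the master.
        for key, value in self._log[self._applied:]:
            for replica in self.replicas:
                replica[key] = value
        self._applied = len(self._log)


store = ReplicatedStore(replica_count=2)
store.write("order:42", "paid")

lagging_read = store.read("order:42")                    # replica has not caught up
fresh_read = store.read("order:42", require_fresh=True)  # routed to the master
store.replicate()
caught_up_read = store.read("order:42")                  # replica now has the row
```

Routing consistency-sensitive reads to the master, as `require_fresh` does here, is one possible answer to replica lag; the synchronous replication mentioned on the slide is another, at a cost to master performance.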
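
Slide 30 notes that with horizontal partitioning the application infers the partition from the key, and that changing the number of nodes affects the app. The sketch below assumes simple hash-modulo routing, which is only one of several possible schemes; `shard_for` and `moved_keys` are hypothetical helper names, not part of MySQL or any sharding library.

```python
import hashlib
from typing import Dict, List


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_keys(keys: List[str], old_count: int, new_count: int) -> List[str]:
    """Keys whose shard changes when the cluster grows from old_count to new_count."""
    return [k for k in keys if shard_for(k, old_count) != shard_for(k, new_count)]


user_ids = [f"user:{i}" for i in range(1000)]

# The application computes the target shard itself before issuing the query.
placement: Dict[str, int] = {uid: shard_for(uid, 4) for uid in user_ids}

# Adding a fifth shard changes the mapping for many existing keys.
relocated = moved_keys(user_ids, old_count=4, new_count=5)
```

With plain modulo routing, most keys map to a different shard after the resize, so the data has to be moved and the application's routing updated together. Schemes such as consistent hashing or a lookup directory reduce that movement, at the cost of more moving parts.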
