Magic Quadrant for Operational Database Management Systems
4Q14 DataConference.IO 15
Oracle's Letter to the EU Concerning MySQL
After an antitrust investigation, the European Commission approved Oracle's acquisition of Sun Microsystems, including MySQL, on 21 January 2010. WikiLeaks subsequently published cables indicating that the Obama administration applied pressure on the EU to approve the deal. Concerns about the MySQL acquisition had been addressed in Oracle's 14 December 2009 pledges to customers, which were to last for five years, thus expiring in early 2015. Oracle's pledges included commitments to maintain certain APIs, to extend licenses to then-current licensees, and to continue using GPL licensing, among others. The expiration of these commitments may change the nature of Oracle's relationships with a number of hardware and software vendors, as well as its posture regarding product investment, support for purchasing requirements, and other aspects of MySQL's business model.
SO... WHICH DATABASES SCALE?
Read Caching
•Pros: Read caching can take over a lot of read operations. If reads make up most of your workload, this will obviously help a lot. Even if you have a heavy write workload, offloading reads to a cache might be enough to keep you from having to scale out to handle writes.
•Cons: Read caching, by nature, involves a memory store. If your data-access patterns are really random, or touch a large percentage of records, you might wind up with a pretty expensive memory footprint. Figuring out the right cache-invalidation strategy for your app can also be really tricky. Many memory stores are pretty basic in terms of functionality: lack of support for transactions and joins can mean that you’ll need multiple process or network round-trips between the app and the cache.
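To make the read-caching trade-offs above concrete, here is a minimal cache-aside sketch in Python. It assumes a single-process, in-memory dictionary as the memory store and a fixed time-to-live (TTL) as the invalidation policy; `ReadCache`, its `load` callback and the `db` dict are hypothetical names for illustration, not any product's API. Repeated reads skip the database, but correctness depends on the application remembering to invalidate after writes.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadCache:
    """Cache-aside read cache: serve from memory, fall back to the database on a miss."""

    def __init__(self, load: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load                # hypothetical callback that reads from the database
        self._ttl = ttl_seconds          # how long an entry may be served before re-reading
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.db_reads = 0                # number of reads that reached the database

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # fresh hit: no database round-trip
        value = self._load(key)          # miss or expired entry: go to the database
        self.db_reads += 1
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # The application must call this after writing `key`; otherwise readers
        # may keep seeing the old value until the TTL expires.
        self._entries.pop(key, None)


# Usage: a dict stands in for the database table.
db = {"user:1": "alice"}
cache = ReadCache(load=db.__getitem__, ttl_seconds=60)
cache.get("user:1")
cache.get("user:1")
print(cache.db_reads)            # prints 1: the second read was served from memory
db["user:1"] = "alicia"          # a write to the database...
cache.invalidate("user:1")       # ...must be paired with an invalidation
print(cache.get("user:1"))       # prints alicia
```

A TTL bounds how stale an entry can get even when an invalidation is missed; choosing between TTLs, explicit invalidation, or both is exactly the "really tricky" part mentioned above.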
Write Coalescing
•Pros: In short, you can achieve better throughput for incoming writes. With many caching systems, you can also query the data in the cache, enabling a set of real-time use cases including event processing, triggers, and real-time analytics.
•Cons: Coalescing writes inherently means that your persistence layer lags behind your ingestion layer. To take advantage of this technique, you’ll need to consider a lot of questions:
–Which data to query: cached, persisted, or both?
–Does this data need to be made durable (survive a reboot)? How quickly?
–Are there consistency concerns? Unique indices? Atomic transactions?
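The write-coalescing questions above can be made tangible with a minimal write-behind sketch in Python. It assumes a plain dict stands in for the durable database and that flushing is triggered by buffer size or an explicit call; `WriteBehindBuffer` and its parameters are hypothetical. Repeated writes to one key collapse into a single persisted write, and until `flush` runs the persisted copy lags behind what was ingested.

```python
from typing import Any, Dict, Hashable


class WriteBehindBuffer:
    """Coalesces writes in memory and persists them to the store in batches."""

    def __init__(self, store: Dict[Hashable, Any], max_pending: int = 100) -> None:
        self._store = store                   # stands in for the durable database
        self._max_pending = max_pending       # flush once this many distinct keys are buffered
        self._pending: Dict[Hashable, Any] = {}
        self.flush_count = 0                  # number of batch writes sent to the store

    def write(self, key: Hashable, value: Any) -> None:
        # A later write to the same key replaces the earlier one in memory,
        # so only the latest value is ever persisted ("coalescing").
        self._pending[key] = value
        if len(self._pending) >= self._max_pending:
            self.flush()

    def read(self, key: Hashable) -> Any:
        # One possible answer to "which data to query?": the buffered value
        # wins, then the persisted copy.
        if key in self._pending:
            return self._pending[key]
        return self._store.get(key)

    def flush(self) -> None:
        # Until this runs, buffered writes are not durable: a crash loses them.
        if self._pending:
            self._store.update(self._pending)   # one bulk write instead of many
            self._pending.clear()
            self.flush_count += 1


# Usage: five writes to one hot key become a single persisted write.
store: Dict[Hashable, Any] = {}
buf = WriteBehindBuffer(store, max_pending=2)
for i in range(5):
    buf.write("counter", i)
print(store)                # prints {}: persistence lags behind ingestion
print(buf.read("counter"))  # prints 4
buf.flush()
print(store)                # prints {'counter': 4}
```

Note that nothing here enforces unique indices or atomicity across keys; those consistency questions remain the application's problem in this design.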
Connection Scaling
•Pros: Connection scaling increases the number of concurrent connections the database can accept, as the name suggests. Its biggest benefit, though, is reliability, since any cluster node can fail and clients can simply reconnect to another node.
•Cons: Connection scaling requires shared storage. Oracle RAC, for example, typically uses OCFS, a clustered file system, on SAN storage. The ability to handle more I/O transactions depends on scaling up that shared storage tier, which can be very expensive. Connection scaling also doesn’t help much with capacity or analysis scaling, since the data is shared, not spread out across nodes.
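The reliability argument above (any node can fail, clients reconnect elsewhere) comes down to a simple client-side loop, sketched here in Python. It assumes a hypothetical `execute(node, query)` driver call that raises `NodeDown` when a node is unreachable, and a dict standing in for the shared storage tier; it does not model any specific RAC client feature. Because every node sees the same storage, it does not matter which node answers.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class NodeDown(Exception):
    """Raised by the (hypothetical) driver call when a cluster node is unreachable."""


def run_with_failover(nodes: Sequence[str],
                      execute: Callable[[str, str], T],
                      query: str) -> T:
    """Run `query` on the first reachable node.

    Every node reads and writes the same shared storage, so any node can
    answer; a failed node just means reconnecting to the next one.
    """
    errors: List[str] = []
    for node in nodes:
        try:
            return execute(node, query)
        except NodeDown as exc:
            errors.append(f"{node}: {exc}")
    raise NodeDown("all nodes unreachable: " + "; ".join(errors))


# Usage: a dict stands in for the shared storage tier.
shared_storage = {"orders": 42}
down = {"node-a"}


def execute(node: str, query: str) -> int:
    if node in down:
        raise NodeDown("connection refused")
    return shared_storage[query]


print(run_with_failover(["node-a", "node-b", "node-c"], execute, "orders"))  # prints 42
```

The sketch also hints at the cost side: adding nodes adds connection capacity, but every query still lands on the single `shared_storage`, so I/O capacity is bounded by that tier.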
Master-Slave Replication (MSR)
•Pros: While there’s some setup involved, it’s pretty seamless to your application. There’s still only a single node that has control over the data, so there are no new concerns around consistency. For read-constrained applications, nodes can be added quickly and the architecture remains relatively simple.
•Cons: MSR solves one problem: read transactions. If you need to scale other aspects, this won’t do it. If you need more write throughput, MSR offloads read transactions from the master, but writes are still limited to a single node. Also, slaves can lag behind the master in applying updates; if you need absolute consistency between the two, you’ll need to investigate options for synchronous replication, which can impact the performance of the master node.
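A common way applications exploit master-slave replication is read/write splitting, sketched below in Python. It assumes round-robin over replicas and a deliberately naive rule that statements beginning with `SELECT` are reads; `ReplicationRouter`, `require_fresh` and the host names are hypothetical. Writes always go to the one master, which is why MSR does not raise write throughput, and reads that cannot tolerate replication lag are pinned to the master too.

```python
import itertools
from typing import List


class ReplicationRouter:
    """Routes writes to the single master and spreads reads across replicas."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Round-robin over replicas; with none, the master serves reads too.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str, require_fresh: bool = False) -> str:
        # Naive classification, assumed for illustration: statements starting
        # with SELECT are reads; everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or require_fresh:
            # Writes, and reads that cannot tolerate replication lag, hit the master.
            return self.master
        return next(self._replicas)


router = ReplicationRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route("INSERT INTO users VALUES (3, 'carol')"))  # prints db-master
print(router.route("SELECT * FROM users"))                      # prints db-replica-1
print(router.route("SELECT * FROM users"))                      # prints db-replica-2
print(router.route("SELECT * FROM users", require_fresh=True))  # prints db-master
```

The prefix check is only a stand-in: a statement such as `SELECT ... FOR UPDATE` takes locks and would need to go to the master, so real routers classify statements more carefully.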
Vertical Partitioning (aka cluster)
•Pros: Having smaller databases makes indices perform better, and allows you to improve just about any aspect of scaling.
•Cons: If your model requires relationships between most or all of your tables for basic operations, vertical partitioning may not be a fit. Even when your model fits well into partitions today, these divisions can limit your flexibility to perform joins across models in the future.
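The join limitation above is easiest to see with two tables placed in separate databases. This Python sketch uses two in-memory SQLite connections from the standard `sqlite3` module as stand-ins for two separate servers; the `users`/`orders` layout and `PARTITION_FOR_TABLE` map are hypothetical. Because no single SQL statement can span the two servers, the application fetches from each partition and joins the results itself.

```python
import sqlite3

# Hypothetical layout: each connection stands in for a separate database server.
users_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.25)])

# The application, not the database, knows which partition owns each table.
PARTITION_FOR_TABLE = {"users": users_db, "orders": orders_db}


def totals_by_user():
    # A SQL JOIN cannot span the two servers, so the application queries
    # each partition (one round-trip each) and joins the rows in memory.
    names = dict(PARTITION_FOR_TABLE["users"].execute("SELECT id, name FROM users"))
    totals = PARTITION_FOR_TABLE["orders"].execute(
        "SELECT user_id, SUM(total) FROM orders GROUP BY user_id ORDER BY user_id")
    return [(names[user_id], total) for user_id, total in totals]


print(totals_by_user())  # prints [('alice', 12.5), ('bob', 7.25)]
```

Each partition stays small and independently tunable, but every cross-model query like this one turns into application code and extra round-trips.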
Horizontal Partitioning (aka shard)
•Pros: This type of partitioning provides scaling for all of the elements of scale, allowing for very large data sets and very good performance.
•Cons: Sharding can have a lot of drawbacks, depending on the implementation. For one thing, the client must be aware of the partition key. When implementing sharding in MySQL, for example, an application will typically derive the partition key itself and address the desired partition directly. Increasing the number of nodes, or changing the key, requires an update to the app each time. Other trade-offs, such as database features, are up for grabs too:
–Joins: if my data for two collections is distributed across multiple nodes, then when I fetch it back I may need to join data from more than one node, which is likely to be slower.
–Transactions: if I have a transaction that involves two nodes of the cluster, how do I execute it atomically? Do I lock multiple nodes? All of them?
–Bulk commits: if I update records in bulk across multiple nodes, this is really several transactions executed separately.
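Why the client must know the partition key, and why adding nodes forces an application change, can be sketched with client-side shard routing in Python. It assumes simple hash-modulo placement, one common scheme but not necessarily what a given MySQL sharding setup uses; `shard_for`, `moved_fraction` and the node names are hypothetical. With modulo placement a key keeps its node when growing from 3 to 4 nodes only if `h % 3 == h % 4`, which holds for 3 of every 12 hash values, so about three quarters of keys must move.

```python
import hashlib
from typing import Iterable, List


def shard_for(key: str, nodes: List[str]) -> str:
    """Pick the node that owns `key` by hashing the partition key (hash-modulo placement)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def moved_fraction(keys: Iterable[str], old_nodes: List[str], new_nodes: List[str]) -> float:
    """Fraction of keys whose owning node changes when the node list changes."""
    key_list = list(keys)
    moved = sum(shard_for(k, old_nodes) != shard_for(k, new_nodes) for k in key_list)
    return moved / len(key_list)


old = ["db0", "db1", "db2"]
new = old + ["db3"]
user_keys = [f"user:{i}" for i in range(1000)]

# The client computes the target node itself from the partition key.
print(shard_for("user:42", old))
# Expected to be close to 0.75: most rows would have to migrate.
print(round(moved_fraction(user_keys, old, new), 2))
```

Schemes such as consistent hashing reduce how much data moves when nodes are added, but the client (or a routing layer) still has to know the key and the node layout, and cross-shard joins and transactions remain unsolved by routing alone.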
So... which databases scale?
•Scale Out Reads
•Scale Out Analysis
•Scale Out Writes