A brief overview of currently popular & available key/value, column-oriented & document-oriented databases, along with implementation suggestions for the CakePHP web application framework.
2. @JPERRAS - JOEL PERRAS
Canadian Geek
Blog: http://nerderati.com
GitHub: http://github.com/jperras
CakePHP core team since early 2009, PHP developer since 2001
McGill University, Montréal, Canada: Physics, Mathematics & Computer Science
Employer: Plank Design (http://plankdesign.com)
(Twitter: @plankdesign)
Saturday, October 31, 2009
3. RELATIONAL DATABASES
Many different vendors: MySQL, PostgreSQL, SQLite, Oracle, ...
Same basic implementation:
B(+)-Trees for pages
B(+)-Trees or hash tables for secondary indexes
Possibly R-Trees for spatial indexes
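Why both structures show up: a hash index answers exact-match lookups but cannot serve range queries, while an ordered index (what a B+-tree's leaves provide) handles both. The Python sketch below is a toy model under that assumption, with a dict standing in for the hash index and a sorted list plus `bisect` standing in for B+-tree leaves; it does not reflect how any particular RDBMS lays out pages.

```python
import bisect
from typing import Dict, List, Tuple

# Toy "table": primary key -> row
rows: Dict[int, dict] = {
    1: {"name": "alice", "age": 31},
    2: {"name": "bob", "age": 25},
    3: {"name": "carol", "age": 42},
    4: {"name": "dave", "age": 25},
}

# Hash-style secondary index: O(1) average exact-match lookups, no ordering.
name_index: Dict[str, int] = {row["name"]: pk for pk, row in rows.items()}

# Ordered secondary index (stand-in for B+-tree leaves): sorted (key, pk) pairs,
# so both equality and range scans cost O(log n + k).
age_index: List[Tuple[int, int]] = sorted((row["age"], pk) for pk, row in rows.items())

def lookup_by_name(name: str) -> dict:
    return rows[name_index[name]]

def range_by_age(lo: int, hi: int) -> List[int]:
    """Primary keys of rows with lo <= age <= hi, in age order."""
    start = bisect.bisect_left(age_index, (lo,))
    end = bisect.bisect_right(age_index, (hi, float("inf")))
    return [pk for _, pk in age_index[start:end]]

print(lookup_by_name("carol"))  # {'name': 'carol', 'age': 42}
print(range_by_age(25, 31))     # [2, 4, 1]
```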
5. Schemas (relational models)
Familiar BCNF structure
Strong consistency
Transactions
Very “mature” & well tested (mostly)
Easy adoption/integration
6. RDBMSES ARE NOT GOING ANYWHERE
FriendFeed
Wikipedia
Google AdWords
Facebook
7. Most small- to medium-sized applications will never need to go beyond a single database server.
8. Always try to follow the Golden Web Application Development Rule:
9. DON’T TRY TO SOLVE A PROBLEM YOU DON’T HAVE
10. The web has created new problem domains in data storage and querying.
11. MODERN WEB APPS
Often use variable schemas
Optional fields: contact lists, addresses, favourite movies/books, etc.
NULL-itis: NULL values sit awkwardly in a strictly normalized (BCNF) design, yet they are everywhere in web applications.
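A small Python sketch of the NULL-itis point, contrasting a fixed-column row (every optional field present, mostly empty) with schemaless documents that carry only the fields a user actually filled in. The column names and records are hypothetical illustrations.

```python
from typing import Any, Dict, List, Optional

# A fixed relational-style schema: every row carries every column.
COLUMNS = ["id", "name", "email", "phone", "address", "favourite_books"]

def as_row(data: Dict[str, Any]) -> List[Optional[Any]]:
    """Flatten a record into a fixed-width row, padding missing fields with None (NULL)."""
    return [data.get(col) for col in COLUMNS]

# Document-style records: each stores only the fields it has, and a field can
# hold a list directly instead of needing a separate join table.
documents: List[Dict[str, Any]] = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "phone": "555-0100", "favourite_books": ["Dune", "Neuromancer"]},
]

rows = [as_row(doc) for doc in documents]
null_count = sum(value is None for row in rows for value in row)
print(null_count)  # 5 NULLs across 2 rows of 6 columns

# Querying documents simply treats a missing field as "not set".
with_email = [doc["name"] for doc in documents if "email" in doc]
print(with_email)  # ['Alice']
```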
12. MODERN WEB APPS
‘Social’ apps => high write/read ratios
Complex Many-to-Many relationships
Joins become a problem in federated architectures
Eventual consistency is usually acceptable
Downtime unacceptable
14. RULES OF APP AGING
http://push.cx/2009/rules-of-database-app-aging
1. All fields become optional
2. All relationships become many-to-many
3. Chatter (comments explaining hacks) grows with time.
15. SOME GOOD PROBLEMS TO HAVE
Even if they are “hard” ones to solve.
16. Load Balancing
(you can only live with one machine for so long)
17. High Availability
(because disks fail, and replication fails)
18. What’s a web application developer to do?
20. Not a silver bullet.
These alternative data stores can solve some problems, but they cause others and have their own limitations.
It’s up to you to weigh the costs and benefits of your chosen solution.
21. THE LANDSCAPE
Key/value stores & distributed hash tables (DHTs)
Document-oriented databases
Column-oriented databases
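Distributed key/value stores such as Voldemort and Cassandra spread keys across machines with a form of consistent hashing, so adding a node moves only a fraction of the keys. A minimal Python sketch of a hash ring, assuming MD5 as the hash function and 100 virtual points per node; the node names are hypothetical, and real systems layer replication and failure handling on top.

```python
import bisect
import hashlib
from typing import Dict, List

class HashRing:
    """Consistent-hash ring: a key belongs to the first node point clockwise from its hash."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: Dict[int, str] = {}  # ring point -> node name
        self._sorted: List[int] = []     # ring points in ascending order
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Several virtual points per node even out the key distribution.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._ring[point] = node
            bisect.insort(self._sorted, point)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self._ring[point]
            self._sorted.remove(point)

    def node_for(self, key: str) -> str:
        # Past the largest point, wrap around to the start of the ring.
        idx = bisect.bisect_right(self._sorted, self._hash(key)) % len(self._sorted)
        return self._ring[self._sorted[idx]]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(moved)  # roughly a quarter of the keys move; the rest stay where they were
```

Every key that moves lands on the new node; keys on the other nodes are untouched, which is what makes growing the cluster cheap compared with `hash(key) % node_count`.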
22. KEY/VALUE STORES
Voldemort
Scalaris
Tokyo Cabinet
Redis
MemcacheDB
23. DOCUMENT-ORIENTED DATA STORES
CouchDB <- (my favourite!)
MongoDB
SimpleDB (Amazon)
24. COLUMN-ORIENTED STORES
BigTable (Google)
HBase (Hadoop Database)
Hypertable (open-source BigTable clone)
Cassandra (Facebook)
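Google's BigTable paper describes the data model as a sparse, sorted map indexed by row key, column key (`family:qualifier`) and timestamp, and HBase, Hypertable and Cassandra build on similar ideas. A single-process Python sketch of that map shape, ignoring distribution and persistence; the `ColumnTable` class and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class ColumnTable:
    """Toy BigTable-style map: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families: List[str]) -> None:
        self.families = set(families)  # column families are declared up front
        self.rows: Dict[str, Dict[str, Dict[int, str]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are free-form, so rows stay sparse: only written cells exist.
        self.rows[row][column][ts] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Newest version of a cell, or None if it was never written."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, end: str) -> List[Tuple[str, List[str]]]:
        """Rows with start <= key < end in sorted key order, with the columns each has."""
        return [(key, sorted(self.rows[key])) for key in sorted(self.rows) if start <= key < end]

t = ColumnTable(["contents", "anchor"])
t.put("com.cnn.www", "contents:", "<html>v1", ts=1)
t.put("com.cnn.www", "contents:", "<html>v2", ts=2)
t.put("com.cnn.www", "anchor:cnnsi.com", "CNN", ts=1)
t.put("com.example", "contents:", "<html>", ts=1)
print(t.get("com.cnn.www", "contents:"))  # <html>v2
print(t.scan("com.c", "com.d"))           # [('com.cnn.www', ['anchor:cnnsi.com', 'contents:'])]
```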
25. How do we use these technologies alongside CakePHP?
27. CASE STUDY - COUCHDB
http://github.com/jperras/divan
(I will make zip/tar archives available once it is more stable; stay tuned.)
28. CASE STUDY - TOKYO CABINET/TYRANT
http://github.com/jperras/tyrannical
(I will make zip/tar archives available once it is more stable; stay tuned.)
30. So don’t try to force the interface to be relational.
31. DESIGNING A NON-RELATIONAL DATASOURCE
Favour simplicity over transparency
Don’t try to implement everything that the MySQL driver implements
Use the strengths of the alternative store
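One reading of these guidelines: rather than translating arbitrary SQL-shaped conditions, expose the few operations the store does natively and refuse the rest loudly. A minimal Python sketch of that idea; `KeyValueDataSource`, `UnsupportedQuery` and their methods are hypothetical illustrations, not CakePHP's (PHP) DataSource API and not the Divan or Tyrannical code.

```python
from typing import Any, Dict, NoReturn, Optional

class UnsupportedQuery(Exception):
    """Raised instead of silently emulating relational features the store lacks."""

class KeyValueDataSource:
    """Hypothetical datasource exposing only what a key/value store does well."""

    def __init__(self) -> None:
        # Plain dict standing in for a real key/value backend.
        self._store: Dict[str, Any] = {}

    def read(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def write(self, key: str, value: Any) -> None:
        self._store[key] = value

    def delete(self, key: str) -> bool:
        if key in self._store:
            del self._store[key]
            return True
        return False

    def increment(self, key: str, by: int = 1) -> int:
        # Surface the store's native counter directly (a real backend would do this atomically).
        self._store[key] = self._store.get(key, 0) + by
        return self._store[key]

    def find(self, conditions: Dict[str, Any]) -> NoReturn:
        # WHERE-style queries would need a scan of every key; refuse rather than fake them.
        raise UnsupportedQuery(f"cannot query by {sorted(conditions)}; look records up by key")

ds = KeyValueDataSource()
ds.write("user:1", {"name": "Alice"})
print(ds.read("user:1"))              # {'name': 'Alice'}
print(ds.increment("user:1:logins"))  # 1
```

Keeping `find` unsupported favours simplicity over transparency: callers learn immediately that the store is not a relational database, instead of discovering slow emulated queries in production.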
33. KEY/VALUE STORES
Most have atomic increment/decrement operations
Great for API rate limiters (e.g. 300 API reqs/hour/account)
Counts & sums of normalized data
Most popular items, votes, ratings, some statistics
And more.
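The rate-limiter use rests on a single atomic operation: increment a counter keyed by account and time window, then reject the request once the count exceeds the limit. A Python sketch of a fixed-window limiter, assuming an in-process store whose `incr` is made atomic with a lock; against a real store you would use its native increment instead (Redis `INCR` plus an expiry on the key, for example). `CounterStore` and `allow_request` are hypothetical names.

```python
import threading
import time
from typing import Dict, Optional

class CounterStore:
    """In-memory stand-in for a key/value store with an atomic increment."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str, by: int = 1) -> int:
        # Read-modify-write under one lock, so concurrent callers never lose updates.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + by
            return self._data[key]

def allow_request(store: CounterStore, account: str, limit: int = 300,
                  window_seconds: int = 3600, now: Optional[float] = None) -> bool:
    """Fixed-window limiter: at most `limit` requests per account per window.

    Old window keys are never cleaned up here; a real store would expire them.
    """
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    count = store.incr(f"rate:{account}:{window}")
    return count <= limit

store = CounterStore()
results = [allow_request(store, "acct-42", now=1000.0) for _ in range(301)]
print(results.count(True), results.count(False))           # 300 1
print(allow_request(store, "acct-42", now=1000.0 + 3600))  # True: a new one-hour window
```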
34. DOCUMENT STORES
Filesystem objects (PDFs, images, Excel spreadsheets, etc.) stored as document attachments (size-limited)
Allows you to reduce reliance on shared filesystems (NFS)
Address book
Volatile schema situations
CouchDB has a very interesting feature set
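CouchDB can store a file inline with its document under the `_attachments` field, as base64 data plus a content type. The Python sketch below only builds such a document body for an address-book entry (no HTTP request is made); the `make_contact` helper, the field names and the vCard contents are hypothetical. Larger files are usually better sent through CouchDB's standalone attachment endpoint than inline.

```python
import base64
import json
from typing import Any, Dict, Optional

def make_contact(doc_id: str, name: str, fields: Dict[str, Any],
                 vcard: Optional[bytes] = None) -> Dict[str, Any]:
    """Build an address-book document; optional fields appear only when set."""
    doc: Dict[str, Any] = {"_id": doc_id, "type": "contact", "name": name}
    doc.update({k: v for k, v in fields.items() if v is not None})
    if vcard is not None:
        # Inline attachment: base64-encoded bytes plus a content type.
        doc["_attachments"] = {
            "card.vcf": {
                "content_type": "text/vcard",
                "data": base64.b64encode(vcard).decode("ascii"),
            }
        }
    return doc

doc = make_contact(
    "contact-alice",
    "Alice",
    {"email": "alice@example.com", "phone": None, "twitter": "@alice"},
    vcard=b"BEGIN:VCARD\nFN:Alice\nEND:VCARD\n",
)
print(json.dumps(doc, indent=2))  # JSON body you could PUT to /<db>/contact-alice
```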
36. Thanks to the DataSource adapter implementation in CakePHP, creating a model-based interface is simple.
37. Thank you!
@jperras
http://nerderati.com
http://github.com/jperras
38. CODE
Divan - CouchDB datasource
Yantra - State Machine component for application control flow
CakePHP TextMate Bundle
CakeMate - TextMate/Vim Plugin
Tyrannical - Tokyo Tyrant datasource
Originally by Martin Samson (pyrolian@gmail.com)
Working to improve the code; commits coming soon.
Currently working on a framework-agnostic, distributed, plugin/library server.