This is from a two-hour talk introducing in-memory databases. It starts with a look at traditional RDBMS architecture and some of its limitations, then surveys some in-memory products, and finally takes a closer look at OrigoDB, the open source in-memory database toolkit for .NET/Mono.
Explain the basic operations like insert, seek and scan
Explain the basics quickly. Talk about the boundaries of s. Ask: Is an RDBMS ACID? Answer on next slide.
Consistency and isolation are not binary.
Reporting. In-memory pushes the boundaries.
Explain each of the bullets relating to previous topics. Recall the slide "What is a database?"
Great performance comes for free but could be optimized.
Some other frameworks based on or supporting write-ahead command logging and snapshots with a user defined in-memory model.
Defining a custom data model is what makes OrigoDB unique.
ROBERT FRIBERG, DEVREX LABS
◦ Independent Developer and Trainer
◦ SQL Server DBA since 6.5
◦ Machine learning, AI
◦ Squash fanatic
◦ Revisiting Traditional RDBMS
◦ Defining IMDB
◦ A look at a few in-memory products
◦ OrigoDB in depth
◦ Learn technical stuff
◦ Thinking different
What is a database?
◦ An organized collection of information
◦ Allows reading and writing
◦ Provides authorization and authentication
◦ Provides some level of data safety
Demand drives change
◦ Data volume
• Big data
• Real time analytics
• In-memory computing
• Column stores
One size no longer fits all
B-trees and Transactions
DATA: 64 KB blocks with 8 x 8 KB pages
Logical B-tree of 8 KB data pages
In the buffer pool (cache)
Transactions append inserted, deleted, original and modified pages to the LOG
• Fill factor
• Page splits
• Clustered index
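To make fill factor and page splits concrete, here is a minimal sketch of the leaf level of a clustered index. It is a toy model, not how any real engine lays out pages: `LeafLevel`, `PAGE_CAPACITY` and the key-count based capacity are assumptions made for illustration (real pages are sized in bytes).

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical: max keys per page


class LeafLevel:
    """Toy leaf level of a clustered index: ordered pages of sorted keys."""

    def __init__(self):
        self.pages = [[]]
        self.splits = 0

    @classmethod
    def build(cls, sorted_keys, fill_factor=1.0):
        # Fill factor: how full each page is packed when the index is built.
        per_page = max(1, int(PAGE_CAPACITY * fill_factor))
        level = cls()
        level.pages = [sorted_keys[i:i + per_page]
                       for i in range(0, len(sorted_keys), per_page)] or [[]]
        return level

    def _find_page(self, key):
        # Pick the last page whose smallest key is <= key (default: first page).
        for i in range(len(self.pages) - 1, 0, -1):
            if self.pages[i][0] <= key:
                return i
        return 0

    def insert(self, key):
        i = self._find_page(key)
        page = self.pages[i]
        if len(page) == PAGE_CAPACITY:
            # Page split: move the upper half of the full page to a new page.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
            self.pages.insert(i + 1, new_page)
            self.splits += 1
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)

    def seek(self, key):
        # Seek: go straight to the one page that could hold the key.
        page = self.pages[self._find_page(key)]
        j = bisect.bisect_left(page, key)
        return j < len(page) and page[j] == key

    def scan(self):
        # Scan: walk every page in key order.
        for page in self.pages:
            yield from page


keys = list(range(10, 90, 10))
packed = LeafLevel.build(keys, fill_factor=1.0)
packed.insert(25)  # lands on a full page, forcing a split
roomy = LeafLevel.build(keys, fill_factor=0.5)
roomy.insert(25)   # free space on the page absorbs the insert
```

With pages packed full, the single insert costs a page split; with a fill factor of 0.5 the same insert fits in place, at the price of twice as many pages to scan.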
D: Durable
[Diagram labels: s0, s1, s2, t1, t2]
What is s?
“the B-tree is optimized for systems that read and write large blocks of data”
The Traditional RDBMS Architecture
“... is obsolete”
Reference: "OLTP Through the Looking Glass, and What We Found There", Harizopoulos, Abadi, Madden and Stonebraker
OLTP vs. OLAP mismatch
OLAP: Read intensive, touches a lot of data, benefits from indexes
OLTP:
- Small writes
- Small reads
- Hot spots
What is an in-memory database?
◦ PRIMARY representation is in-memory
◦ Memory optimized data structures
◦ ALL the data in memory (possibly distributed)
(in-memory is not necessarily in-process)
◦ Write Ahead Logging – write to disk before commit
◦ Effect logging – persist the affected data pages
◦ Command logging – persist the cause
◦ Real time applications with no durability requirements
◦ Embedded systems, routers, online gaming
◦ Real time applications with durability requirements, low latency, high throughput
◦ Traditional applications during test and development (and production)
◦ Whenever data fits in RAM or can be distributed
◦ General OLTP replacement when DB < 2TB
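The difference between effect logging and command logging listed earlier can be sketched as follows. This is an illustrative toy, assuming a dictionary as the in-memory state and a Python list standing in for the on-disk log; `raise_all`, `COMMANDS` and the helper names are hypothetical.

```python
import json


def raise_all(state, percent):
    """Hypothetical command: raise every balance by a percentage."""
    for key in state:
        state[key] = state[key] * (100 + percent) // 100


COMMANDS = {"raise_all": raise_all}


def execute_with_command_log(state, log, name, *args):
    # Command logging: persist the cause (name + arguments), then apply it.
    log.append(json.dumps({"cmd": name, "args": args}))
    COMMANDS[name](state, *args)


def execute_with_effect_log(state, log, name, *args):
    # Effect logging: work out the result, persist the changed records, then apply.
    after = dict(state)
    COMMANDS[name](after, *args)
    changed = {k: v for k, v in after.items() if state.get(k) != v}
    log.append(json.dumps(changed))
    state.update(changed)


def replay_commands(initial, log):
    # Recovery re-executes each command, so commands must be deterministic.
    state = dict(initial)
    for entry in log:
        record = json.loads(entry)
        COMMANDS[record["cmd"]](state, *record["args"])
    return state


def replay_effects(initial, log):
    # Recovery just overwrites records with the logged values.
    state = dict(initial)
    for entry in log:
        state.update(json.loads(entry))
    return state


initial = {"alice": 100, "bob": 250}

by_command, command_log = dict(initial), []
execute_with_command_log(by_command, command_log, "raise_all", 10)

by_effect, effect_log = dict(initial), []
execute_with_effect_log(by_effect, effect_log, "raise_all", 10)
```

Both logs recover the same state. The command log entry stays the same size no matter how many records the command touches, while the effect log entry grows with the number of changed records.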
Some In-memory Products
SQL Server Hekaton
◦ Memory optimized tree structure
◦ Almost lock-free MVCC concurrency control
◦ Command logging
◦ Seamlessly integrated into the traditional model
◦ Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
◦ Extremely popular and widespread
(Twitter, Flickr, GitHub, Digg, Disqus, Instagram, Stack Overflow)
◦ Written in C, great performance
Product    | License | Data model   | Interface | ACID     | Distributed | Concurrency
VoltDB     | OSS     | Relational   | Java/SQL  | Yes      | Yes (2PC)   | Serialized
MemSQL     | $$      | Relational   | SQL       | Almost   | Yes         | MVCC
Aerospike  | $$      | Key/value    | Many      | Yes      | Yes (2PC)   | CAS
SQL Server | $$      | Relational + | T-SQL     | Yes (no) | No          | Locking,
NuoDB      | $$      | Relational   | SQL       | Yes      |             |
Hazelcast  | OSS     | Key/value +  | Java      | Almost   | Yes (2PC)   |
GridGain   | OSS     | Key/value    | Java, SQL | Yes      | Yes (2PC)   | MVCC
OrigoDB    | OSS +   | User defined | .NET/REST | Yes      | No          |
Redis      | OSS     | Key/value +  | Many/Lua  | Yes      | No          |
◦ Is it a database? (first name was LiveDomain)
◦ Database Toolkit - Define your own datamodel
◦ Write ahead command logging + snapshots
◦ Single writer + multiple reader concurrency (serialized)
◦ Open source embedded engine
◦ 100% ACID
◦ Commercial server with master/slave replication
◦ Simplicity and correctness before performance
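The combination of a user-defined model, write-ahead command logging, snapshots and serialized execution can be sketched in a few lines. This is not OrigoDB's API (OrigoDB is a .NET library); `TaskModel`, `Engine`, the file names and the JSON journal format are all hypothetical, and a single lock stands in for a real single-writer/multiple-reader lock. Command validation and crash-safe snapshotting are left out.

```python
import json
import os
import tempfile
import threading


class TaskModel:
    """Hypothetical user-defined model: plain in-memory objects."""

    def __init__(self, tasks=None):
        self.tasks = tasks or {}

    # Commands mutate the model and must be deterministic so replay is safe.
    def add_task(self, task_id, title):
        self.tasks[task_id] = {"title": title, "done": False}

    def complete_task(self, task_id):
        self.tasks[task_id]["done"] = True


class Engine:
    """Toy engine: command journal + snapshots around an in-memory model."""

    def __init__(self, directory):
        self._journal = os.path.join(directory, "journal.log")
        self._snapshot = os.path.join(directory, "snapshot.json")
        self._lock = threading.Lock()  # simplification: readers are serialized too
        self._model = self._recover()

    def _recover(self):
        # Load the latest snapshot (if any), then replay journaled commands.
        tasks = {}
        if os.path.exists(self._snapshot):
            with open(self._snapshot) as f:
                tasks = json.load(f)
        model = TaskModel(tasks)
        if os.path.exists(self._journal):
            with open(self._journal) as f:
                for line in f:
                    name, args = json.loads(line)
                    getattr(model, name)(*args)
        return model

    def execute(self, name, *args):
        with self._lock:
            # Write ahead: the command reaches disk before it touches the model.
            with open(self._journal, "a") as f:
                f.write(json.dumps([name, args]) + "\n")
                f.flush()
                os.fsync(f.fileno())
            return getattr(self._model, name)(*args)

    def query(self, fn):
        with self._lock:
            return fn(self._model)

    def snapshot(self):
        with self._lock:
            with open(self._snapshot, "w") as f:
                json.dump(self._model.tasks, f)
            # The snapshot now covers everything journaled so far.
            open(self._journal, "w").close()


# Usage: run some commands, then simulate a restart with a fresh engine.
workdir = tempfile.mkdtemp()
engine = Engine(workdir)
engine.execute("add_task", "t1", "write slides")
engine.snapshot()
engine.execute("complete_task", "t1")
engine.execute("add_task", "t2", "demo")

restarted = Engine(workdir)
recovered = restarted.query(lambda model: dict(model.tasks))
```

After the simulated restart, the model is rebuilt from the snapshot plus the two commands journaled after it, with no schema or mapping layer involved.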
Bring your own data model
◦ Generic models = extra schema + mapping, which is complex, so why bother?
◦ Key/Value (value is a blob)
◦ Document (document is structured and queryable)
◦ Graph, nodes and edges
◦ Domain specific models
◦ OO Domain model (DDD) (typed graph)
◦ Machine learning models (Accord.NET)
◦ Lucene.NET indexes
◦ TODO example – Anemic model, transaction script pattern (fat commands)
◦ Twitter clone – rich model with proxy, no commands
◦ Geekstream http://geekstream.devrexlabs.com/
◦ OrigoDB Server http://origodb.com/
◦ Times are changing! Embrace!
◦ One size does not fit all – go polyglot persistence!
◦ Choose the most appropriate data model
◦ If data fits in RAM go in-memory!