Nitrite - Choosing the "Rite" Embedded Database

Idan Sheinberg, Kotlin Everywhere - TLV Edition, 27/10/2019

WHOAMI
● Enthusiast for all things JVM
● Tech Lead @ Correl8
● Best Jafar Purim costume award winner for 2019

WHAT DO I WANT FROM YOU
● Talk about embedded databases and their proper use-cases
● Demonstrate why we chose Nitrite and how it helps us accomplish our goals
● Survey various aspects and features Nitrite has to offer

WHY SHOULD YOU CARE
● Nitrite might prove useful in solving similar problems you might come across in your backend/mobile application
● In the long run, it can also help you maintain a cleaner code-base and evolve your data-model more easily
● Besides, you’re already here … ;-)

Embedded Databases
When are they called for?

[Diagram: Remote Database vs. Local Storage. Labels: Performance; Scalability / Simplicity; Flexible Data Model / Queries; Technological Diversity; Offline data; Embedded Databases!]

TO SUM THINGS UP
● We’d all rather manage our data somewhere outside of the application
● For backend applications requiring sub-millisecond latency for data operations, that’s usually not an option
● For mobile applications, it can make the difference between a sluggish app and a responsive one
○ Users’ data plans are also a consideration
🐈 And don’t forget that offline-data cat...

OUR USE CASE
We’re building an algorithmic trading artificial intelligence meant to operate in the Israeli stock exchange (in Kotlin, no less).
● Stock market data is, by nature, both stateful and mutable
○ Stock order-books and issued buy/sell orders are both good examples. Algorithmic trading applications are required to accumulate stock market data and analyze all of it in real time.
● Application latency measurements are taken in microseconds
○ There are many companies out there listening for the same set of events, competing amongst themselves (and with us) on who reacts and seizes opportunities fastest.
● Data access variations are numerous, even for homogeneous data
○ Each issued order has its own life-cycle: it transitions between various states, receiving only a partial subset of the class fields on each transition.
○ Business-wise, we need to be able to query orders using a varying set of parameters, such as stock id, order state, broker, and so on

OUR USE CASE
Order data transitions
Initialization
{
  "orderId":"order-1",
  "stockId":"TEVA-IL",
  "specification":{
    "price":100.0,
    "quantity":100
  },
  "orderState":"REQUESTED_ADDITION",
  "transmissionState":"INITIAL"
  ...
}
SDK Confirmation
{
  ???
  ???
  ???
  ???
  ???
  "brokerOrderState":"Accepted",
  "transmissionState":"LOCAL"
  ...
}
Broker Confirmation
{
  "specification":{
    "price":100.0,
    "quantity":100
  },
  "brokerOrderId":"b-o-1",
  "brokerOrderState":"Registered",
  "transmissionState":"BROKER"
  ...
}
Exchange Confirmation
{
  "specification":{
    "price":100.0,
    "quantity":100
  },
  "exchangeOrderId":"e-o-1",
  "brokerOrderId":"b-o-1",
  "brokerOrderState":"Approved",
  "orderState":"ADDED",
  "transmissionState":"EXCHANGE"
  ...
}
⇨ ⇨ ⇨
{
"specification":{
"price":100.0,
"quantity":100
},
"transmissionState":"INITIAL"
...
}

OUR USE CASE
Order data repository excerpt
interface OrderDataRepository {
    fun get(orderId: String): OrderData?
    fun get(orderIds: Set<String>): Set<OrderData>
    fun getStockOrders(stockId: String, orderStates: Array<OrderState> = ORDER_STATES): Set<OrderData>
    fun update(orderId: String, fields: Map<KProperty<*>, Any>)
    fun insert(orderData: OrderData)
}

OUR USE CASE
Our greatest concern - Evolving the data layer
● Surely, we could just implement that repository naively using several mutableMapOf<K,OrderData>() where K varies based on the filter (string, number, etc.)
😨 But what happens when we need to query on additional fields?
○ Maintaining a growing number of maps can be cumbersome and error-prone
○ Equality is OK, but comparison (greater than, less than) is harder
○ Composite filters (on multiple fields) do not always fit easily into map keys
○ No updates for partial data
😿 We’ll wind up making a poor man’s database, and a bad one at that
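To make the maintenance concern concrete, here is a minimal sketch of the naive map-based approach. `NaiveOrder` and `NaiveOrderRepository` are hypothetical stand-ins invented for this illustration, not our production code. Note how every extra query dimension means another map that must be kept in sync on every write, and how range queries fall back to a full scan.

```kotlin
// Hypothetical, simplified order type used only for this illustration.
data class NaiveOrder(val orderId: String, val stockId: String, val state: String, val price: Double)

class NaiveOrderRepository {
    // One map per access path: each must be updated on every write.
    private val byId = mutableMapOf<String, NaiveOrder>()
    private val byStock = mutableMapOf<String, MutableSet<String>>()
    private val byState = mutableMapOf<String, MutableSet<String>>()

    fun upsert(order: NaiveOrder) {
        // Remove stale secondary entries before writing the new version.
        byId[order.orderId]?.let { old ->
            byStock[old.stockId]?.remove(old.orderId)
            byState[old.state]?.remove(old.orderId)
        }
        byId[order.orderId] = order
        byStock.getOrPut(order.stockId) { mutableSetOf() }.add(order.orderId)
        byState.getOrPut(order.state) { mutableSetOf() }.add(order.orderId)
    }

    fun get(orderId: String): NaiveOrder? = byId[orderId]

    // Composite filter: intersect the id sets of two hand-made "indices".
    fun byStockAndState(stockId: String, state: String): Set<NaiveOrder> {
        val ids = byStock[stockId].orEmpty() intersect byState[state].orEmpty()
        return ids.mapNotNull { byId[it] }.toSet()
    }

    // Range queries get no help from hash maps: this is a full scan.
    fun pricedAbove(limit: Double): List<NaiveOrder> = byId.values.filter { it.price > limit }
}
```

Even this toy version needs careful bookkeeping in `upsert`, and it still has no notion of partial updates.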

Technical Considerations
What we care about

IT NEEDS TO BE FAST AND FLEXIBLE
● Each library incorporated in our application is measured in microseconds/operation
● Expected data-set size is 10K-100K objects
● Support “rich” queries - matching on different fields, multiple fields, array fields, nested fields and so forth without compromising on speed (this implies secondary indexing)
● Support partial updates (to avoid the overhead of read-modify-writes)
● Memory-only (volatile) storage

IT NEEDS TO BE DEAD SIMPLE
● Short setup times and minimal code to have a fully working repository/DAO implementation
● No additional query language to learn
● Directly working with objects/classes is always preferable (including support for nested classes)
● Minimal changes to existing codebase
● All of the above imply minimal vendor lock-in, which is also a crucial consideration

IT NEEDS TO BE FREE
● Don’t forget about proper software licensing
● We’re a 6-person (3-employee) startup

Nitrite - Technical Overview
A class is worth a thousand words (or something)

DOES NITRITE MAKE A GOOD FIT?
● Well, it’s FREE, that’s for sure :)
○ Apache 2.0 is a very permissive license
● As for SIMPLICITY: in the next couple of slides, we’ll see how few lines of code are required to set up a running Nitrite instance
● We’ll also observe how well Kotlin’s data classes play with Nitrite, almost as if it were part of the standard library
● And of course - FLEXIBLE queries and indexing
● And some other neat features...

UNDER THE HOOD
● Nitrite is built on top of MVStore (known for powering the H2 embedded Java database)
● Objects are stored as Documents
○ Can be written/read directly to a collection, or de/serialized as pure Kotlin class instances
● Both indices and documents are stored inside MVMap maps, which are B+/-Tree, multi-versioned maps
○ This makes them optimal for all CRUD operations/queries, while providing good filesystem I/O performance

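MINIMAL SETUP (code sample)
// Imports assume Nitrite 3.x with the potassium-nitrite Kotlin extension
import org.dizitart.kno2.filters.eq
import org.dizitart.no2.Nitrite
import org.dizitart.no2.objects.Id
import org.dizitart.no2.objects.filters.ObjectFilters
data class MyData(@Id val id: String, val number: Int, val flag: Boolean)
object MyDataRepository {
    // No file path is configured, so the store is kept in memory
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)
    fun all(): List<MyData> = repository
        .find(ObjectFilters.ALL)
        .toList()
    fun insert(data: MyData) {
        repository.insert(data)
    }
    // first() throws on an empty cursor, hence runCatching/getOrNull
    fun get(id: String): MyData? = repository
        .find(MyData::id eq id)
        .let { cursor -> runCatching { cursor.first() } }
        .getOrNull()
}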

INDICES
● Indices can be defined to speed up queries filtering on fields other than the primary key (annotated with @Id)
● A field can be indexed as long as:
○ It’s a scalar (doesn’t implement Collection<*>)
○ It’s sortable (implements Comparable<*>)
○ It doesn’t already have an index defined on it
● Composite indices are not supported

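SIMPLE INDICES (code sample)
object MyDataRepository {
    // nitrite/repository setup as in the minimal setup sample
    init {
        // Non-unique secondary index on the "number" field
        repository.createIndex("number", IndexOptions.indexOptions(IndexType.NonUnique))
    }
    fun insert(data: MyData) { /* body collapsed on the slide */ }
    fun greaterThan(number: Int): List<MyData> = repository
        .find(MyData::number gt number)
        .let { runCatching { it.toList() } }
        .getOrElse { emptyList() }
}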

MULTIPLE INDICES (code sample)
data class MyData(@Id val id: String, val flag: Boolean, val nested: MyNestedData)
data class MyNestedData(val criteria: Int)
object MyDataRepository {
    // nitrite/repository setup as in the minimal setup sample
    init {
        repository.createIndex("flag", indexOptions(IndexType.NonUnique))
        repository.createIndex("nested.criteria", indexOptions(IndexType.NonUnique))
    }
    fun insert(vararg data: MyData) { /* body collapsed on the slide */ }
    // Matches criteria == 0, or criteria > number combined with the given flag
    fun on(number: Int, flag: Boolean): List<MyData> {
        val filter = ObjectFilters.eq("nested.criteria", 0) or
            (ObjectFilters.gt("nested.criteria", number) and (MyData::flag eq flag))
        return repository.find(filter)
            .let { runCatching { it.toList() } }
            .getOrElse { emptyList() }
    }
}

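FULL UPDATES
object MyDataRepository {
    // nitrite/repository setup as in the minimal setup sample
    // The second argument enables upsert: insert when no matching element exists
    fun upsert(data: MyData) {
        repository.update(data, true)
    }
}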

PARTIAL UPDATES
● Full updates effectively replace elements in the database with new ones
● Sometimes we don’t have all of the element data fields required to generate a full update
● If your Kotlin data class contains non-null val fields you don’t have values for, you’re stuck
● One solution would be doing a read-modify-write: read the old instance from the database, modify it, and write it again
● But that might prove too costly, performance-wise...
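The read-modify-write workaround can be sketched in plain Kotlin. `Item` and the in-memory `store` map are hypothetical stand-ins for a real repository, assumed only for this illustration. The point is that changing a single field still costs a full read, a full copy and a full write.

```kotlin
// Hypothetical record with non-null fields, so a complete instance is always required.
data class Item(val id: String, val number: Int, val flag: Boolean)

// Stand-in for a database collection keyed by id (an assumption for this sketch).
val store = mutableMapOf<String, Item>()

// Read-modify-write: fetch the whole object, copy it with one changed field, write it all back.
fun setFlag(id: String, flag: Boolean): Boolean {
    val current = store[id] ?: return false   // read
    val updated = current.copy(flag = flag)   // modify
    store[id] = updated                       // write
    return true
}
```

With a real database each of those three steps also pays for de/serialization, which is what a partial update avoids.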

PARTIAL UPDATES (code sample)
object MyDataRepository {
    // nitrite/repository setup as in the minimal setup sample
    private val collection = repository.documentCollection
    fun update(id: String, fields: Map<KProperty1<MyData, *>, Any>) {
        // Build a document holding only the changed fields plus the id
        val document = emptyDocument()
        fields.forEach { (field, value) -> document[field.name] = value }
        document[MyData::id.name] = id
        val options = UpdateOptions.updateOptions(true) // upsert enabled
        collection.update(MyData::id.name eq id, document, options)
    }
}
…
MyDataRepository.update("id-1", mapOf(MyData::flag to true))

PROJECTED TYPES
● While stored documents are of type A, sometimes we need to represent queried data as a sub-type B where B out A
● Or a completely different type C
● We can always write code to do this manually outside the scope of the query (copy fields from queried object A to object B)
● Luckily for us, Nitrite provides a solution OOTB (without garbage/performance overhead)
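For comparison, the manual alternative mentioned above might look like the sketch below. `FullRecord`, `RecordSummary` and `toSummary` are hypothetical names chosen for this illustration. Every query result is first materialized as a full object and then copied into the smaller type, which is where the extra garbage comes from.

```kotlin
// Hypothetical source and target types for this illustration.
data class FullRecord(val id: String, val number: Int?, val flag: Boolean)
data class RecordSummary(val id: String, val number: Int)

// Manual projection: copy only the fields we need. Returns null when the
// target's non-null 'number' cannot be satisfied by the source object.
fun FullRecord.toSummary(): RecordSummary? = number?.let { RecordSummary(id, it) }
```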

PROJECTED TYPES (code sample)
data class MyData(@Id val id: String, val number: Int?, val flag: Boolean)
data class ProjectedData(val id: String, val number: Int)
object MyDataRepository {
    // nitrite/repository setup as in the minimal setup sample
    // Returns null when nothing matches or the projection fails
    fun <P> get(id: String, target: Class<P>): P? = repository
        .find(MyData::id eq id)
        .project(target)
        .let { runCatching { it.firstOrDefault() } }
        .getOrNull()
}
...
MyDataRepository.get("id-1", ProjectedData::class.java)

HOW FAST IS IT?
[Chart: Indexed field query, 10000 items, index dispersion of 100, us per op. Series: SQLite (File-Storage), SQLite (In-Memory), Nitrite (File Storage), Nitrite (In-Memory), Nitrite (In-Memory, Optimized Mapper)]
(Benchmarks are implemented using JMH, and yes - they are credible)

HOW FAST IS IT?
Benchmark notes
● The same class (a simple POJO) is used for all of the benchmarks
● SQLite is accessed via JDBC; queries are done using prepared statements
● Index dispersion indicates the value range of the indexed field. The higher that number, the fewer entities share the same field value
● Each measured operation includes only the query and result deserialization. Data generation and insertion take place during the benchmark setup
● The winner is the In-Memory Nitrite with manual deserialization (matching the JDBC serialization setup). The other Nitrite candidates simply deserialize documents using Jackson (shorter setup times)

HOW FAST IS IT?
[Chart: Indexed field query, 50000 items, index dispersion of 2000, us per op. Series: SQLite (File-Storage), SQLite (In-Memory), Nitrite (File Storage), Nitrite (In-Memory), Nitrite (In-Memory, Optimized Mapper)]

HOW FAST IS IT?
Additional benchmark notes
● While the tested data-set is larger for the 2nd benchmark (50000 vs 10000), we can see that query durations are much shorter (~60us vs ~200us)
● The reason is that the dispersion of the indexed field value is also greater (2000 vs 100), which means each query returns fewer results
● That gives us a strong indication that the heaviest part of working with these embedded databases (at least for simple indexed queries) is actually object de/serialization
● Benchmark code is available here
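A quick back-of-the-envelope check of the dispersion argument, assuming indexed values are spread uniformly over the dispersion range (an assumption; the actual benchmark data generator may differ). Under that assumption, the expected number of results per query is roughly the item count divided by the dispersion.

```kotlin
// Expected matches per equality query on the indexed field, assuming values are
// uniformly distributed over `dispersion` distinct values (an assumption).
fun expectedResultsPerQuery(items: Int, dispersion: Int): Double =
    items.toDouble() / dispersion

fun main() {
    println(expectedResultsPerQuery(10_000, 100))    // 100.0
    println(expectedResultsPerQuery(50_000, 2_000))  // 25.0
}
```

So the second benchmark would deserialize about a quarter as many objects per query, which is roughly in line with the shorter durations reported.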

Nitrite - Advanced Features
Let’s squeeze that lemon to the end

EVENT LISTENERS
● Nitrite lets us register callbacks on all document-related events taking place in a collection/repository
● What is it good for?
● One good example could be rolling your own document expiration/TTL:
○ Add a listener that, on document insertion, schedules a task to remove that same document from the repository at a later time
○ Fully supporting TTLs is, in fact, a much more complicated task, so tread carefully
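A minimal sketch of the TTL idea, deliberately not using Nitrite's listener API: `ExpiringStore` is a hypothetical in-memory stand-in whose `insert` plays the role of the insertion listener and schedules the removal on a standard `ScheduledExecutorService`. It ignores the hard parts (updates that should reset the timer, restarts, persistence), which is exactly why full TTL support is more involved.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.TimeUnit

// Hypothetical in-memory store; in a real setup the scheduling would live in an
// insertion listener registered on the collection/repository.
class ExpiringStore<V : Any>(
    private val ttlMillis: Long,
    private val scheduler: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor()
) {
    private val data = ConcurrentHashMap<String, V>()

    fun insert(id: String, value: V) {
        data[id] = value
        // "On insert" hook: schedule removal of this exact value once the TTL elapses.
        // remove(key, value) leaves the entry alone if it was replaced in the meantime.
        scheduler.schedule(Runnable { data.remove(id, value) }, ttlMillis, TimeUnit.MILLISECONDS)
    }

    fun get(id: String): V? = data[id]

    // Stops the scheduler thread so the JVM can exit.
    fun shutdown() {
        scheduler.shutdownNow()
    }
}
```

A caller would create the store with the desired TTL, insert as usual, and call `shutdown()` when done.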

ADVANCED SERIALIZATION
● Nitrite uses Jackson to convert objects into documents and vice versa
● Jackson gives us decent performance. However, we can achieve even better de/serialization latencies by writing manual serialization code
● While this approach extends the setup time, it can really speed up database access times for both read and write operations
● You can find additional information here and here
● I eventually came up with my own technique
REMOTE REPLICATION
● Nitrite also provides us with a solution for remote data replication called DataGate
● Built on top of MongoDB, this add-on allows us to synchronize events across multiple client applications (that have Nitrite instances, of course)
○ Replication is selective (on a collection basis)
○ There’s also a neat administrative portal
● Ideal for mobile application remote sync?
○ As of late 2019, not really sure about scalability/maturity, so test prudently

That’s all folks
Questions?
