Lecture slides from the Kotlin Everywhere - TLV Edition meetup, which took place on 27/10/2019 at Soluto's office space
All code samples are publicly available at https://github.com/sheinbergon/kotlintlv-nitrite-demo
2. WHOAMI
● Enthusiast for all things JVM
● Tech Lead @ Correl8
● Best Jafar Purim costume award winner for 2019
Idan Sheinberg, Kotlin Everywhere - TLV Edition, 27/10/2019
3. WHAT DO I
WANT
FROM YOU
● Talk about embedded databases and their proper
use-cases
● Demonstrate why we chose Nitrite and how it
helped us accomplish our goals
● Survey various aspects and features Nitrite has to
offer
4. WHY
SHOULD
YOU CARE
● Nitrite might prove useful in solving similar
problems you might come across in your
backend/mobile application
● In the long run, it can also help you maintain a
cleaner code-base and evolve your data-model
more easily
● Besides, you’re already here … ;-)
7. TO SUM
THINGS UP
● We’d all rather manage our data somewhere
outside of the application.
● For backend applications requiring
sub-millisecond latency for data operations,
that’s usually not an option
● For mobile applications, it can make the difference
between a sluggish app and a responsive one
○ User-data plans are also a consideration
🐈 And don’t forget that offline-data cat...
8. Stock market data, by nature, is
both stateful and mutable
Stock order-books and issued
buy/sell orders are both good
examples. Algorithmic trading
applications are required to
accumulate stock market data
and analyze all of it in realtime.
We’re building an algorithmic trading artificial intelligence meant to
operate on the Israeli stock exchange (in Kotlin, no less).
OUR USE CASE
Application latency measurements
are taken in microseconds
Data access variations are numerous,
even for homogeneous datums
There are many companies out
there listening for the same set of
events, competing amongst
themselves (and with us) on who
reacts and seizes opportunities
fastest.
Each issued order has its own life-cycle:
it transitions between various states,
receiving only a partial subset of the
class fields on each transition.
Business-wise, we need to be able to query
orders using a varying set of
parameters, such as stock id, order
state, broker, and so on
10. Order data repository excerpt
OUR USE CASE
interface OrderDataRepository {
    fun get(orderId: String): OrderData?
    fun get(orderIds: Set<String>): Set<OrderData>
    fun getStockOrders(stockId: String, orderStates: Array<OrderState> = ORDER_STATES): Set<OrderData>
    // Partial update: only the supplied fields are changed
    fun update(orderId: String, fields: Map<KProperty<*>, Any>)
    fun insert(orderData: OrderData)
}
11. Our greatest concern - Evolving the data layer
OUR USE CASE
● Surely, we could just implement that repository naively using several
mutableMapOf<K, OrderData>() where K varies based on the filter (string, number, etc.)
😨 But what happens when we’ll need to query on additional fields?
○ Maintaining a growing number of maps can be cumbersome and error prone
○ Equality is OK, but comparison (greater than, less than) is harder
○ Composite filters (on multiple fields) do not always fit easily into map keys
○ No updates for partial data
😿 We’ll wind up making a poor-man’s database, and a bad one at best
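To make the maintenance cost concrete, here is a minimal sketch of the naive approach with only two access paths. `Order`, `NaiveOrderRepository` and the stock ids are hypothetical names made up for illustration; they are not the talk's code.

```kotlin
// Hypothetical, simplified order type for illustration only.
data class Order(val id: String, val stockId: String, val state: String)

// Naive repository: one map per access path, all kept in sync by hand.
class NaiveOrderRepository {
    private val byId = mutableMapOf<String, Order>()
    private val byStock = mutableMapOf<String, MutableSet<String>>()

    fun upsert(order: Order) {
        // Drop the stale entry from the secondary map before re-adding.
        byId[order.id]?.let { old -> byStock[old.stockId]?.remove(old.id) }
        byId[order.id] = order
        byStock.getOrPut(order.stockId) { mutableSetOf() }.add(order.id)
    }

    fun get(id: String): Order? = byId[id]

    fun getStockOrders(stockId: String): Set<Order> =
        byStock[stockId].orEmpty().mapNotNull { byId[it] }.toSet()
}

fun main() {
    val repository = NaiveOrderRepository()
    repository.upsert(Order("o-1", "TEVA", "NEW"))
    repository.upsert(Order("o-2", "TEVA", "FILLED"))
    repository.upsert(Order("o-1", "NICE", "NEW")) // o-1 moves to another stock
    println(repository.getStockOrders("TEVA").map { it.id }) // prints [o-2]
}
```

Every additional query field means another map plus another clean-up step inside `upsert`, which is exactly the bookkeeping the bullets above warn about.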
13. IT NEEDS
TO BE
FAST AND
FLEXIBLE
● Each library incorporated in our application is
measured in microseconds/operation
● Expected data-set size is 10K-100K objects
● Support “rich” queries - matching on different
fields, multiple fields, array fields, nested fields
and so forth without compromising on speed
(This implies secondary indexing)
● Support partial-updates (to avoid the overhead
of read-modify-writes)
● Memory only (volatile) storage
14. IT NEEDS
TO BE DEAD
SIMPLE
● Short setup times and minimal code to have a
fully working repository/DAO implementation
● No additional query language to learn
● Directly working with objects/classes is always
preferable (including support for nested classes)
● Minimal changes to existing codebase
● All of the above imply minimal vendor lock-in
which is also a crucial consideration
15. IT NEEDS
TO BE
FREE
● Don’t forget about proper software licensing
● We’re a 6-person (3-employee) startup
16. Nitrite - Technical Overview
A class is worth a thousand words (or something)
17. DOES NITRITE
MAKE A
GOOD FIT?
● Well, it’s FREE, that’s for sure :)
○ Apache 2.0 is a very permissive license
● As for SIMPLICITY: in the next couple of slides,
we’ll see how very few lines of code are required
to set up a running Nitrite instance.
● We’ll also observe how well Kotlin’s data class plays
with Nitrite, almost as if it was a part of the
standard library
● And of course - FLEXIBLE queries and indexing
● And some other neat features...
18. UNDER
THE HOOD
● Nitrite is built on top of MVStore (known for
powering the H2 embedded Java database)
● Objects are stored as Documents
○ Can be written/read directly to a collection,
or de/serialized as pure Kotlin class
instances.
● Both indices and documents are stored inside
MVMap maps, which are B+/-Tree,
multi-versioned maps
○ This makes them optimal for all CRUD
operations/queries, while providing good
filesystem I/O performance
19. MINIMAL SETUP
data class MyData(@Id val id: String, val number: Int, val flag: Boolean)

object MyDataRepository {
    // No file path is configured on the builder, so the store is in-memory
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)

    fun all(): List<MyData> = repository
        .find(ObjectFilters.ALL)
        .toList()

    fun insert(data: MyData) {
        repository.insert(data)
    }

    // Returns null when no element matches the id
    fun get(id: String): MyData? = repository
        .find(MyData::id eq id)
        .let { cursor -> runCatching { cursor.first() } }
        .getOrNull()
}
code sample
20. INDICES
● Indices can be defined to speed up queries
filtering on fields other than the primary key
(annotated with @Id)
● A field can be indexed as long as:
○ It’s a scalar (doesn’t implement Collection<*>)
○ It’s sortable (implements Comparable<*>)
○ It doesn’t already have an index defined on it
● Composite indices are not supported
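Why must an indexed field be sortable? A minimal sketch of a non-unique secondary index, built here on `java.util.TreeMap`, shows the idea: range queries only work because the keys are ordered. This is not Nitrite's implementation; `Item` and `NumberIndex` are hypothetical names, and re-inserting an existing id is deliberately not handled.

```kotlin
import java.util.TreeMap

// Hypothetical entity mirroring the slides' MyData (without Nitrite annotations).
data class Item(val id: String, val number: Int)

// A non-unique secondary index: sorted field value -> ids of the matching items.
class NumberIndex {
    private val primary = HashMap<String, Item>()
    private val index = TreeMap<Int, MutableSet<String>>()

    fun insert(item: Item) {
        primary[item.id] = item
        index.getOrPut(item.number) { mutableSetOf() }.add(item.id)
    }

    // Range query: only keys strictly greater than `number` are visited.
    fun greaterThan(number: Int): List<Item> =
        index.tailMap(number, false).values.flatten().mapNotNull { primary[it] }
}

fun main() {
    val items = NumberIndex()
    items.insert(Item("a", 1))
    items.insert(Item("b", 5))
    items.insert(Item("c", 9))
    println(items.greaterThan(4).map { it.id }) // prints [b, c]
}
```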
21. SIMPLE INDICES
data class MyData(@Id val id: String, val number: Int, val flag: Boolean)

object MyDataRepository {
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)

    init {
        // Non-unique secondary index on the "number" field
        repository.createIndex("number", IndexOptions.indexOptions(IndexType.NonUnique))
    }

    fun insert(data: MyData) {
        repository.insert(data)
    }

    // Falls back to an empty list if the query fails
    fun greaterThan(number: Int): List<MyData> = repository
        .find(MyData::number gt number)
        .let { kotlin.runCatching { it.toList() } }
        .getOrElse { emptyList() }
}
code sample
22. MULTIPLE INDICES
data class MyData(@Id val id: String, val flag: Boolean, val nested: MyNestedData)
data class MyNestedData(val criteria: Int)

object MyDataRepository {
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)

    init {
        repository.createIndex("flag", indexOptions(IndexType.NonUnique))
        // Nested fields are addressed with a dotted path
        repository.createIndex("nested.criteria", indexOptions(IndexType.NonUnique))
    }

    fun insert(vararg data: MyData) {
        repository.insert(data)
    }

    fun on(number: Int, flag: Boolean): List<MyData> {
        val filter = ObjectFilters.eq("nested.criteria", 0) or
            (ObjectFilters.gt("nested.criteria", number) and (MyData::flag eq flag))
        return repository.find(filter)
            .let { kotlin.runCatching { it.toList() } }
            .getOrElse { emptyList() }
    }
}
code sample
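For readers new to Kotlin's infix functions, the following sketch shows how an `eq` / `gt` / `and` / `or` filter DSL of this shape can be modelled with plain predicates over map-based documents. It is an assumption-laden illustration, not Nitrite's filter implementation: `Doc`, `DocFilter` and the flat field names are hypothetical.

```kotlin
// A filter is modelled here as a plain predicate over a map-based document.
typealias Doc = Map<String, Any?>
typealias DocFilter = (Doc) -> Boolean

infix fun String.eq(value: Any?): DocFilter = { doc -> doc[this] == value }

infix fun String.gt(value: Int): DocFilter =
    { doc -> (doc[this] as? Int)?.let { it > value } ?: false }

infix fun DocFilter.and(other: DocFilter): DocFilter =
    { doc -> this.invoke(doc) && other(doc) }

infix fun DocFilter.or(other: DocFilter): DocFilter =
    { doc -> this.invoke(doc) || other(doc) }

fun main() {
    val docs: List<Doc> = listOf(
        mapOf("id" to "a", "criteria" to 0, "flag" to false),
        mapOf("id" to "b", "criteria" to 7, "flag" to true),
        mapOf("id" to "c", "criteria" to 7, "flag" to false)
    )
    // Same shape as the slide's filter: criteria == 0 OR (criteria > 5 AND flag)
    val filter = ("criteria" eq 0) or (("criteria" gt 5) and ("flag" eq true))
    println(docs.filter(filter).map { it["id"] }) // prints [a, b]
}
```

The parentheses matter: named infix functions all share one precedence level and associate left to right, so composite filters should be grouped explicitly.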
23. FULL UPDATES
data class MyData(@Id val id: String, val number: Int, val flag: Boolean)

object MyDataRepository {
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)

    fun upsert(data: MyData) {
        // The second argument enables upsert: insert when no matching element exists
        repository.update(data, true)
    }
}
24. PARTIAL
UPDATES
● Full updates effectively replace elements in the
database with new ones
● Sometimes we don’t have all of the element data
fields required to generate a full update
● If your Kotlin data class contains non-null val fields
whose values you don’t have, you’re stuck
● One solution would be doing a
read-modify-write: read the old instance from the
database, modify it, and write it again
● But that might prove to be too costly,
performance wise...
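The difference between the two approaches can be sketched with plain Kotlin maps standing in for the database. `Entity`, `readModifyWrite` and `partialUpdate` are hypothetical names for illustration; no Nitrite API is used here.

```kotlin
// Hypothetical entity with non-null fields, as in the slides.
data class Entity(val id: String, val number: Int, val flag: Boolean)

// Read-modify-write on typed objects: the full old instance must be read first.
fun readModifyWrite(store: MutableMap<String, Entity>, id: String, flag: Boolean) {
    val old = store[id] ?: return
    store[id] = old.copy(flag = flag)
}

// Partial update on a document representation: only the supplied fields are touched.
fun partialUpdate(
    store: MutableMap<String, MutableMap<String, Any?>>,
    id: String,
    fields: Map<String, Any?>
) {
    store.getOrPut(id) { mutableMapOf<String, Any?>("id" to id) }.putAll(fields)
}

fun main() {
    val documents = mutableMapOf<String, MutableMap<String, Any?>>()
    partialUpdate(documents, "id-1", mapOf("number" to 3))
    partialUpdate(documents, "id-1", mapOf("flag" to true))
    println(documents["id-1"]) // prints {id=id-1, number=3, flag=true}
}
```

The document form never needs a complete `Entity` in hand, which is what makes partial updates possible when a state transition only delivers a subset of the fields.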
25. PARTIAL UPDATES
data class MyData(@Id val id: String, val number: Int, val flag: Boolean)

object MyDataRepository {
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)
    // Underlying document collection of the repository
    private val collection = repository.documentCollection

    fun update(id: String, fields: Map<KProperty1<MyData, *>, Any>) {
        // Build a document holding only the supplied fields (plus the id)
        val document = emptyDocument()
        fields.forEach { (field, value) -> document[field.name] = value }
        document[MyData::id.name] = id
        val options = UpdateOptions.updateOptions(true)
        collection.update(MyData::id.name eq id, document, options)
    }
}
…
MyDataRepository.update("id-1", mapOf(MyData::flag to true))
code sample
26. PROJECTED
TYPES
● While stored documents are of type A, sometimes
we need to represent queried data as a sub-type B
where B out A
● Or a completely different type C
● We can always write code to do this manually
outside the scope of the query (copy fields from
queried object A to object B)
● Luckily for us, Nitrite provides a solution OOTB
(without garbage/performance overhead)
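For comparison, the manual route mentioned above might look like the sketch below. `Stored` and `Projected` are hypothetical stand-ins for types A and B; the point is only that each result is materialised in full and then copied.

```kotlin
// Hypothetical stored type (A) and projected type (B).
data class Stored(val id: String, val number: Int?, val flag: Boolean)
data class Projected(val id: String, val number: Int?)

// Manual projection: every queried Stored instance is fully materialised first,
// then copied field by field into a second, short-lived object.
fun Stored.project(): Projected = Projected(id = id, number = number)

fun main() {
    val queried = listOf(Stored("id-1", 42, true), Stored("id-2", null, false))
    println(queried.map { it.project() })
    // prints [Projected(id=id-1, number=42), Projected(id=id-2, number=null)]
}
```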
27. PROJECTED TYPES
data class MyData(@Id val id: String, val number: Int?, val flag: Boolean)
data class ProjectedData(val id: String, val number: Int)

object MyDataRepository {
    private val nitrite = Nitrite.builder().openOrCreate()
    private val repository = nitrite.getRepository(MyData::class.java)

    // Returns null when nothing matches or the projection fails
    fun <P> get(id: String, target: Class<P>): P? = repository
        .find(MyData::id eq id)
        .project(target)
        .let { kotlin.runCatching { it.firstOrDefault() } }
        .getOrNull()
}
...
MyDataRepository.get("id-1", ProjectedData::class.java)
code sample
28. HOW FAST IS IT?
[Chart: Indexed field query, 10000 items, index dispersion of 100, us per op. Series: SQLite (File-Storage), SQLite (In-Memory), Nitrite (File Storage), Nitrite (In-Memory), Nitrite (In-Memory, Optimized Mapper)]
(Benchmarks are implemented using JMH, and yes - they are credible)
29. Benchmark notes
HOW FAST IS IT?
● The same class (a simple POJO) is used for all of the benchmarks
● SQLite is accessed via JDBC, queries are done using prepared statements.
● Index dispersion indicates the value range of the indexed field. The higher that number,
the fewer entities share the same field value
● Each measured operation includes only query and result deserialization. Data
generation and insertion take place during the benchmark setup.
● The winner is the In-Memory Nitrite with manual deserialization (matching JDBC
serialization setup). The other Nitrite candidates simply deserialize documents using
Jackson (shorter setup times).
30. HOW FAST IS IT?
[Chart: Indexed field query, 50000 items, index dispersion of 2000, us per op. Series: SQLite (File-Storage), SQLite (In-Memory), Nitrite (File Storage), Nitrite (In-Memory), Nitrite (In-Memory, Optimized Mapper)]
31. Additional Benchmark notes
HOW FAST IS IT?
● While the tested data-set is larger for the 2nd benchmark (50000 vs 10000), we can see
that query operation durations are much shorter (~60us vs ~200us)
● The reason for that is the fact that the dispersion for the indexed field value is also
greater (2000 vs 100). That means each query returns fewer results.
● That gives us a strong indication that the heaviest part in working with these
embedded databases (at least for simple indexed queries) is actually object
de/serialization
● Benchmark code is available here
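A quick back-of-the-envelope check of the result-set sizes, assuming the indexed values are spread uniformly over the dispersion range (an assumption of this sketch; the slides do not state the distribution):

```kotlin
// Expected number of entities sharing one indexed value, assuming values are
// spread uniformly over `dispersion` distinct values.
fun expectedResultsPerQuery(items: Int, dispersion: Int): Double =
    items.toDouble() / dispersion

fun main() {
    println(expectedResultsPerQuery(10_000, 100))   // prints 100.0
    println(expectedResultsPerQuery(50_000, 2_000)) // prints 25.0
}
```

Under that assumption the second benchmark deserializes roughly a quarter as many objects per query, which is in line with the drop from ~200us to ~60us.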
32. Nitrite - Advanced Features
Let’s squeeze that lemon to the end
33. EVENT
LISTENERS
● Nitrite provides us with the ability to register
callbacks on all document related events taking
place in a collection/repository
● What is it good for?
● One good example could be rolling your own
document expiration/TTL:
○ Add a listener that on document insertion,
schedules a task to remove that same
document from the repository at a later time
○ Fully supporting TTLs is, in fact, a much more
complicated task, so tread lightly
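The listener-driven TTL idea can be sketched without any database at all. `ListenableStore` is a hypothetical in-memory stand-in for a collection with insert callbacks (Nitrite's actual listener API is not reproduced here), and the scheduler is injected as a plain function so the sketch stays deterministic.

```kotlin
// Hypothetical in-memory store that notifies listeners on insertion.
class ListenableStore<T : Any> {
    private val items = LinkedHashMap<String, T>()
    private val insertListeners = mutableListOf<(String, T) -> Unit>()

    fun onInsert(listener: (String, T) -> Unit) {
        insertListeners += listener
    }

    fun insert(id: String, item: T) {
        items[id] = item
        insertListeners.forEach { it(id, item) }
    }

    fun remove(id: String) {
        items.remove(id)
    }

    fun ids(): Set<String> = items.keys.toSet()
}

// Registers a listener that schedules removal of every inserted item after `ttlMillis`.
// `schedule` is injected so a real executor or a test stand-in can be used.
fun <T : Any> ListenableStore<T>.expireAfter(
    ttlMillis: Long,
    schedule: (Long, () -> Unit) -> Unit
) {
    onInsert { id, _ -> schedule(ttlMillis) { remove(id) } }
}

fun main() {
    val pending = mutableListOf<() -> Unit>() // stand-in scheduler: tasks run on demand
    val store = ListenableStore<String>()
    store.expireAfter(1_000) { _, task -> pending += task }
    store.insert("id-1", "payload")
    println(store.ids()) // prints [id-1]
    pending.forEach { it() } // simulate the TTL elapsing
    println(store.ids()) // prints []
}
```

Note what the sketch does not handle: a document re-inserted or updated before its timer fires would still be removed by the earlier task, and pending timers are lost on restart. That is a glimpse of why full TTL support is harder than it looks.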
34. ADVANCED
SERIALIZATION
● Nitrite uses Jackson to serialize documents into
objects and vice versa
● Jackson gives us decent performance. However,
we can achieve even better de/serialization
latencies by creating manual serialization code
● While this approach extends the setup time, it
can really speed up database access times for
both read and write operations.
● You can find additional information here and
here
● I eventually came up with my own technique
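As a rough illustration of what "manual serialization code" means, the sketch below maps an object to and from a map-based document field by field, with no reflection involved. `ManualMapper`, `Sample` and `SampleMapper` are hypothetical names; this is neither Nitrite's mapping interface nor the speaker's own technique.

```kotlin
// Hypothetical mapper contract; Nitrite's own mapping interfaces are not reproduced here.
interface ManualMapper<T> {
    fun write(value: T): Map<String, Any?>
    fun read(document: Map<String, Any?>): T
}

data class Sample(val id: String, val number: Int, val flag: Boolean)

// Hand-written field-by-field mapping: no reflection, no annotation scanning.
object SampleMapper : ManualMapper<Sample> {
    override fun write(value: Sample): Map<String, Any?> =
        mapOf("id" to value.id, "number" to value.number, "flag" to value.flag)

    override fun read(document: Map<String, Any?>): Sample = Sample(
        id = document["id"] as String,
        number = document["number"] as Int,
        flag = document["flag"] as Boolean
    )
}

fun main() {
    val original = Sample("id-1", 7, true)
    val roundTripped = SampleMapper.read(SampleMapper.write(original))
    println(roundTripped == original) // prints true
}
```

The trade-off matches the bullets above: more code to write and maintain per class, in exchange for skipping generic object mapping on every read and write.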
35. REMOTE
REPLICATION
● Nitrite also provides us with a solution for remote
data replication called DataGate
● Built on top of MongoDB, this add-on allows us to
synchronize events across multiple client
applications (that have Nitrite instances, of course)
○ Replication is selective (on a collection basis)
○ There’s also a neat administrative portal
● Ideal for mobile application remote sync?
○ As of late 2019, not really sure about
scalability/maturity, so test prudently