Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
November 2017
Who am I
User and contributor to Spark since 0.9,
Cassandra since 0.6
Created Spark Job Server and FiloDB
Talks at Spark Summit, Cassandra Summit, Strata,
Scala Days, etc.
http://velvia.github.io/
Why Build a New Streaming Database?
Needs
• Ingest HUGE streams of events (IoT etc.)
• Real-time, low latency, and somewhat flexible queries
  • Dashboards, quick answers on new data
• Flexible schemas and query patterns
• Keep your streaming pipeline super simple
  • Streaming = hardest to debug. Simplicity rules!
[Diagram labels: Events; Message Queue; Stream Processing Layer; State / Database; Happy Users]
Spark + HDFS Streaming
[Diagram: Kafka -> Spark Streaming -> many small files (microbatches) -> dedup/consolidate job -> larger efficient files]
• High latency
• Big impedance mismatch between streaming systems and a file system designed for big blobs of data
Cassandra?
• Ingest HUGE streams of events (IoT etc.)
  • C* is not efficient for writing raw events
• Real-time, low latency, and somewhat flexible queries
  • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency
• Flexible schemas and query patterns
  • C* only handles simple lookups
Introducing FiloDB
A distributed, columnar time-series/event database.
Built for streaming.
http://www.github.com/filodb/FiloDB
[Diagram labels: Events; Message Queue; Spark Streaming; Short term storage, K-V; Ad hoc, SQL, ML; Cassandra; FiloDB: Events, ad-hoc, batch; Spark; Dashboards, maps]
100% Reactive
• Scala
• Akka Cluster
• Spark
• Monix / Reactive Streams
• Typesafe Config for all configuration
• Scodec, Ficus, Enumeratum, Scalactic, etc.
• Even most of the performance critical parts are written in Scala :)
Scala, Akka, and Spark for Database
Why use Scala and Akka?
• Akka Cluster!
• Just the right abstractions - streams, futures, Akka, type safety...
• Failure handling and supervision are critical for databases
• All the pattern matching and immutable goodness :)
Scala Big Data Projects
• Spark
• GeoMesa
• Khronus - Akka time-series DB
• Sirius - Akka distributed KV Store
• FiloDB!
Actors vs Futures vs Observables
One FiloDB Node
[Diagram labels: NodeCoordinatorActor (NCA); two DatasetCoordinatorActors (DsCA); Active MemTable; Flushing MemTable; Reprojector; ColumnStore; arrows labeled "Data, commands"]
Akka vs Futures
[Diagram labels: NodeCoordinatorActor (NCA); two DatasetCoordinatorActors (DsCA); Active MemTable; Flushing MemTable; Reprojector; ColumnStore; arrows labeled "Data, commands". Annotations: Akka - control flow; Core I/O - Futures/Observables]
Akka vs Futures
• Akka Actors:
  • External FiloDB node API (remote + cluster)
  • Async messaging with clients
  • Cluster/distributed state management
• Futures and Observables:
  • Core I/O
  • Columnar data processing / ingestion
  • Type-safe processing stages
Futures for Single Actions
/**
 * Clears all data from the column store for that given projection, for all versions.
 * More like a truncation, not a drop.
 * NOTE: please make sure there are no reprojections or writes going on before calling this
 */
def clearProjectionData(projection: Projection): Future[Response]

/**
 * Completely and permanently drops the dataset from the column store.
 * @param dataset the DatasetRef for the dataset to drop.
 */
def dropDataset(dataset: DatasetRef): Future[Response]

/**
 * Appends the ChunkSets and incremental indices in the segment to the column store.
 * @param segment the ChunkSetSegment to write / merge to the columnar store
 * @param version the version # to write the segment to
 * @return Success. Future.failed(exception) otherwise.
 */
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response]
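A minimal sketch of how callers might compose single-action, Future-returning methods like the ones above, using only the Scala standard library. `InMemoryStore`, the `Done`/`NotFound` responses, and the string dataset names are hypothetical stand-ins for illustration, not FiloDB's actual types.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical response type standing in for the Response on the slide.
sealed trait Response
case object Done extends Response
case object NotFound extends Response

// Hypothetical in-memory store: every operation is one action returning one Future.
class InMemoryStore(implicit ec: ExecutionContext) {
  private val data = TrieMap.empty[String, Vector[Int]]

  def append(dataset: String, values: Seq[Int]): Future[Response] = Future {
    // Assumes one writer per dataset; this read-then-put is not atomic.
    data.put(dataset, data.getOrElse(dataset, Vector.empty) ++ values)
    Done
  }

  def dropDataset(dataset: String): Future[Response] = Future {
    if (data.remove(dataset).isDefined) Done else NotFound
  }
}

object FutureActionsDemo {
  import ExecutionContext.Implicits.global

  // Sequence two single actions; the drop starts only after the append completes.
  def appendThenDrop(store: InMemoryStore, name: String): Future[(Response, Response)] =
    for {
      appended <- store.append(name, Seq(1, 2, 3))
      dropped  <- store.dropDataset(name)
    } yield (appended, dropped)

  def run(): (Response, Response) = {
    val store = new InMemoryStore
    Await.result(appendThenDrop(store, "events"), 5.seconds)
  }
}
```

Because each method returns exactly one result, a for-comprehension is enough to express ordering and error propagation; a failed Future in any step skips the remaining steps.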
Monix / Reactive Streams
• http://monix.io
• "observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala's standard library, compatible out-of-the-box with the Reactive Streams protocol"
• Much better than Future[Iterator[_]]
Monix / Reactive Streams
def readChunks(projection: RichProjection,
               columns: Seq[Column],
               version: Int,
               partMethod: PartitionScanMethod,
               chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
  scanPartitions(projection, version, partMethod)
    // Partitions to pipeline of single chunks
    .flatMap { partIndex =>
      stats.incrReadPartitions(1)
      readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
    // Collate single chunks to ChunkSetReaders
    }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
    .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
}
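The scan-then-collect step in readChunks can be illustrated with plain Scala collections. This is a sketch under stated assumptions: it uses `Iterator.scanLeft` in place of Monix's `Observable.scan`, and `Chunk`, `Aggregator`, and the rule "emit once every requested column has been seen" are hypothetical, not FiloDB's real ChunkSetReaderAggregator.

```scala
// Hypothetical single-column chunk: the chunk set it belongs to, and its column.
final case class Chunk(setId: Int, column: String, bytes: Int)

// Hypothetical immutable aggregator: gathers the chunks of one chunk set at a
// time and can emit once every requested column has been seen.
final case class Aggregator(columns: Set[String],
                            setId: Int = -1,
                            seen: Map[String, Chunk] = Map.empty) {
  def add(c: Chunk): Aggregator =
    if (c.setId == setId) copy(seen = seen + (c.column -> c))
    else copy(setId = c.setId, seen = Map(c.column -> c)) // a new chunk set starts

  def canEmit: Boolean = columns.nonEmpty && seen.keySet == columns

  def emit: (Int, Seq[Chunk]) = (setId, columns.toSeq.sorted.map(seen))
}

object ScanCollectDemo {
  // Assumes each column appears at most once per chunk set.
  def collate(chunks: Iterator[Chunk], columns: Set[String]): List[(Int, Seq[Chunk])] =
    chunks
      .scanLeft(Aggregator(columns))((agg, c) => agg.add(c)) // running state per chunk
      .collect { case agg if agg.canEmit => agg.emit }        // keep complete sets only
      .toList
}
```

Incomplete chunk sets simply never satisfy `canEmit`, so they are dropped without any special-case code.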
Functional Reactive Stream Processing
• Ingest stream merged with flush commands
• Built-in async/parallel tasks via mapAsync
• Notify on end of stream, errors
val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
combinedStream.map {
    case SomeData(records) =>
      shard.ingest(records)
      None
    case FlushCommand(group) =>
      shard.switchGroupBuffers(group)
      Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
  }.collect { case Some(flushGroup) => flushGroup }
  .mapAsync(numParallelFlushes)(shard.createFlushTask _)
  .foreach { x => }
  .recover { case ex: Exception => errHandler(ex) }
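The idea of interleaving flush commands with data in one stream can be sketched with a plain `Iterator`. Assumptions: the `Shard` below is a hypothetical buffer that flushes everything it holds (FiloDB's real shards and flush groups are more involved), and the sketch is synchronous, so it has no counterpart to `mapAsync`.

```scala
import scala.collection.mutable.ArrayBuffer

sealed trait Event
final case class SomeData(records: Seq[Int]) extends Event
final case class FlushCommand(group: Int) extends Event

// What a flush hands downstream: the group and the records buffered so far.
final case class FlushGroup(group: Int, records: Vector[Int])

// Hypothetical shard that buffers ingested records until a flush is requested.
final class Shard {
  private val buffer = ArrayBuffer.empty[Int]

  def ingest(records: Seq[Int]): Unit = buffer ++= records

  def switchBuffers(group: Int): FlushGroup = {
    val out = FlushGroup(group, buffer.toVector)
    buffer.clear()
    out
  }
}

object IngestFlushDemo {
  // Data events only mutate the shard (None); flush commands yield work (Some).
  def flushes(events: Iterator[Event], shard: Shard): List[FlushGroup] =
    events.map {
      case SomeData(records)   => shard.ingest(records); None
      case FlushCommand(group) => Some(shard.switchBuffers(group))
    }.collect { case Some(fg) => fg }.toList
}
```

Because data and commands travel in the same ordered stream, a flush always sees exactly the records that arrived before it, with no locking between ingestion and flushing.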
Akka Cluster and Spark
Spark/Akka Cluster Setup
[Diagram labels: Driver; NodeClusterActor; Client; two Executors, each with an NCA, DsCA1 and DsCA2]
Adding one executor
[Diagram labels: Driver; NodeClusterActor; Client; executor1 with an NCA, DsCA1 and DsCA2; messages: MemberUp, ActorSelection, ActorRef. State: Executors -> (executor1)]
Adding second executor
[Diagram labels: Driver; NodeClusterActor; Client; executor1 and executor2, each with an NCA, DsCA1 and DsCA2; messages: MemberUp, ActorSelection, ActorRef. State: Executors -> (executor1, executor2)]
Sending a command
[Diagram labels: Driver; NodeClusterActor; Client; two Executors, each with an NCA, DsCA1 and DsCA2. Command shown: Flush()]
Yes, Akka in Spark
• Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.
• Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors
• Spark only gives data flow primitives, not async messaging
• We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
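Routing records to the correct ingestion node without a global sort can be sketched as a hash-based lookup in a partition map. Everything here is an assumption for illustration: `PartitionMap`, the shard count, the node names, and hashing the key with `hashCode` are hypothetical and not FiloDB's actual scheme.

```scala
// Hypothetical partition map: shard number -> node name.
final case class PartitionMap(numShards: Int, shardToNode: Map[Int, String]) {
  // Stable shard choice from the partition key; floorMod avoids negative shards.
  def shardFor(partitionKey: String): Int =
    java.lang.Math.floorMod(partitionKey.hashCode, numShards)

  def nodeFor(partitionKey: String): Option[String] =
    shardToNode.get(shardFor(partitionKey))
}

object RoutingDemo {
  // Groups keys by destination node in one pass; no sorting of the input.
  // Keys whose shard has no assigned node are dropped in this sketch.
  def route(keys: Seq[String], pm: PartitionMap): Map[String, Seq[String]] =
    keys
      .flatMap(k => pm.nodeFor(k).map(node => node -> k))
      .groupBy(_._1)
      .map { case (node, pairs) => node -> pairs.map(_._2) }
}
```

Each record's destination depends only on its own key, so every sender can route independently as records arrive instead of waiting for a cluster-wide sort.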
Data Ingestion Setup
[Diagram labels: two Executors, each marked as a FiloDB Node, with an NCA, DsCA1 and DsCA2, plus task0 and task1, each task with a Row Source Actor; Node Cluster Actor; Partition Map]
FiloDB separate nodes
[Diagram labels: two Executors with task0 and task1, each task with a Row Source Actor; NCA with DsCA1 and DsCA2 (two sets); Node Cluster Actor; Partition Map]
Testing Akka Cluster
• MultiNodeSpec / sbt-multi-jvm
• NodeClusterSpec
  • Tests joining of different cluster nodes and partition map updates
  • Is partition map updated properly if a cluster node goes down? Inject network failures
• Lessons
Kamon Tracing
• http://kamon.io
• One trace can encapsulate multiple Future steps all executing on different threads
• Tunable tracing levels
• Summary stats and histograms for segments
• Super useful for production debugging of reactive stack
Kamon Tracing
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
  val ctx = Tracer.currentContext
  stats.segmentAppend()
  if (segment.chunkSets.isEmpty) {
    stats.segmentEmpty()
    return(Future.successful(NotApplied))
  }
  for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
        writeIndexResp  <- writeIndices(projection, version, segment, ctx)
                           if writeChunksResp == Success
  } yield {
    ctx.finish()
    writeIndexResp
  }
}

private def writeChunks(dataset: DatasetRef,
                        version: Int,
                        segment: ChunkSetSegment,
                        ctx: TraceContext): Future[Response] = {
  asyncSubtrace(ctx, "write-chunks", "ingestion") {
    val binPartition = segment.binaryPartition
    val segmentId = segment.segmentId
    val chunkTable = getOrCreateChunkTable(dataset)
    Future.traverse(segment.chunkSets) { chunkSet =>
      chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
    }.map { responses => responses.head }
  }
}
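The pattern of passing a trace context explicitly through Future steps, as `ctx` is passed above, can be sketched without Kamon. This is a standard-library illustration only: `SimpleTrace` and `timed` are hypothetical helpers, not Kamon APIs, and they record a segment only when its step succeeds.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical trace context: collects (segment name, elapsed nanos) from any thread.
final class SimpleTrace(val name: String) {
  private val segments = new ConcurrentLinkedQueue[(String, Long)]()

  def record(segment: String, nanos: Long): Unit = { segments.add((segment, nanos)); () }

  def segmentNames: List[String] = {
    val it = segments.iterator()
    var out = List.empty[String]
    while (it.hasNext) out = it.next()._1 :: out
    out.reverse
  }
}

object TraceDemo {
  // Times an asynchronous step and records it when the Future completes successfully.
  def timed[T](ctx: SimpleTrace, segment: String)(step: => Future[T])
              (implicit ec: ExecutionContext): Future[T] = {
    val start = System.nanoTime()
    step.map { result => ctx.record(segment, System.nanoTime() - start); result }
  }

  def run(): List[String] = {
    import ExecutionContext.Implicits.global
    val ctx = new SimpleTrace("append-segment")
    val done = for {
      _ <- timed(ctx, "write-chunks")(Future(1))
      _ <- timed(ctx, "write-index")(Future(2))
    } yield ()
    Await.result(done, 5.seconds)
    ctx.segmentNames
  }
}
```

The context is an ordinary value, so it follows the work across thread hops; thread-local state would be lost each time a Future step is scheduled on a different thread.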
Kamon Metrics
• Uses HDRHistogram for much finer and more accurate buckets
• Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.
KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
Validation: Scalactic
private def getColumnsFromNames(allColumns: Seq[Column],
                                columnNames: Seq[String]): Seq[Column] Or BadSchema = {
  if (columnNames.isEmpty) {
    Good(allColumns)
  } else {
    val columnMap = allColumns.map { c => c.name -> c }.toMap
    val missing = columnNames.toSet -- columnMap.keySet
    if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
    else { Good(columnNames.map(columnMap)) }
  }
}

for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
      dataColumns <- getColumnsFromNames(columns, normProjection.columns)
      richColumns = dataColumns ++ computedColumns
      // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
      segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
      keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
      partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
yield {
• Notice how multiple validations compose!
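The way validations compose in a for-comprehension can be shown with the standard library's `Either` as a stand-in for Scalactic's `Or` (note that `Either` puts the error type on the left, whereas `Or` puts it on the right). The `Column` shape, the `WrongType` error, and `validate` are hypothetical names made up for this sketch.

```scala
final case class Column(name: String, colType: String)

sealed trait BadSchema
final case class MissingColumnNames(missing: Seq[String], kind: String) extends BadSchema
final case class WrongType(column: String, expected: String) extends BadSchema

object ValidationDemo {
  def columnsFromNames(all: Seq[Column], names: Seq[String],
                       kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind))
    else Right(names.map(byName))
  }

  def requireType(col: Column, expected: String): Either[BadSchema, Column] =
    if (col.colType == expected) Right(col) else Left(WrongType(col.name, expected))

  // Each step runs only if the previous one succeeded; the first Left short-circuits.
  def validate(all: Seq[Column], keyNames: Seq[String],
               tsName: String): Either[BadSchema, (Seq[Column], Column)] =
    for {
      keys   <- columnsFromNames(all, keyNames, "row")
      tsCols <- columnsFromNames(all, Seq(tsName), "segment")
      ts     <- requireType(tsCols.head, "long")
    } yield (keys, ts)
}
```

The caller gets either the fully validated result or the first error, and no step needs to know about the others.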
Machine-Speed Scala
How do you go REALLY fast?
• Don't serialize
• Don't allocate
• Don't copy
Filo fast
• Filo binary vectors - 2 billion records/sec
• Spark InMemoryColumnStore - 125 million records/sec
• Spark CassandraColumnStore - 25 million records/sec
Filo: High Performance Binary Vectors
• Designed for NoSQL, not a file format
• random or linear access
• on or off heap
• missing value support
• Scala only, but cross-platform support possible
http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
Billions of Ops / Sec
• JMH benchmark: 0.5ns per FiloVector element access / add
• 2 billion adds per second, single-threaded
• Who said Scala cannot be fast?
• Spark API (row-based) limits performance significantly
val randomInts = (0 until numValues).map(i => util.Random.nextInt)
val randomIntsAray = randomInts.toArray
val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
val sc = FiloVector[Int](filoBuffer)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
def sumAllIntsFiloApply(): Int = {
  var total = 0
  for { i <- 0 until numValues optimized } {
    total += sc(i)
  }
  total
}
JVM Inlining
• Very small methods can be inlined by the JVM
• final def avoids virtual method dispatch.
• Thus methods in traits and abstract classes are not inlinable
0.5ns/read is achieved through a stack of very small methods:
val base = baseReader.readInt(0)
final def apply(i: Int): Int = base + dataReader.read(i)

case (32, _) => new TypedBufferReader[Int] {
  final def read(i: Int): Int = reader.readInt(i)
}

final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
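A rough sketch of the same layering, with small final methods stacked on top of a raw buffer read. Assumptions: it uses `java.nio.ByteBuffer` rather than `Unsafe`, the class names are hypothetical, and the base-plus-delta layout is only a guess at what `base + dataReader.read(i)` implies. It makes no performance claim; only a JMH benchmark could show whether the JVM actually inlines these calls.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Reads 32-bit ints straight out of a byte buffer: no objects created per read.
class IntBufferReader(buf: ByteBuffer) {
  final def readInt(i: Int): Int = buf.getInt(i * 4)
}

// Delta-style vector: every element is stored as an offset from a shared base.
class DeltaIntVector(base: Int, deltas: IntBufferReader, val length: Int) {
  final def apply(i: Int): Int = base + deltas.readInt(i)
}

object InliningDemo {
  def build(values: Array[Int]): DeltaIntVector = {
    val base = if (values.isEmpty) 0 else values.min
    val buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN)
    values.foreach(v => buf.putInt(v - base))
    new DeltaIntVector(base, new IntBufferReader(buf), values.length)
  }

  // A while loop avoids allocating a Range or closures in the hot path.
  def sum(v: DeltaIntVector): Long = {
    var total = 0L
    var i = 0
    while (i < v.length) { total += v(i); i += 1 }
    total
  }
}
```

Each method body is a single expression, which keeps it small enough to be a plausible inlining candidate, and the concrete non-overridden call targets give the JIT a single implementation to choose from.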
BinaryRecord
• Tough problem: FiloDB must handle many different datasets, each with different schemas
• Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc.
• Serialization very costly, especially strings
• Solution: BinaryRecord
BinaryRecord II
• BinaryRecord is a binary (i.e. transport-ready) record class that supports any schema or mix of column types
• Values can be extracted or written with no serialization cost
• UTF8-encoded string class
• String compare as fast as native Java strings
• Immutable API once built
Use Case: Sorting
• Regular sorting: deserialize record, create sort key, compare sort key
• BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations
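Comparing fields directly in their binary form might look like the following sketch. The fixed 12-byte layout (`age: Int` followed by `lastTimestamp: Long`, big-endian) and all names are assumptions for illustration; the real BinaryRecord format is schema-driven and not shown in these slides.

```scala
import java.nio.ByteBuffer

// Hypothetical fixed layout: [age: Int (4 bytes)][lastTimestamp: Long (8 bytes)].
object BinarySortDemo {
  val AgeOffset = 0
  val TimestampOffset = 4
  val RecordSize = 12

  // ByteBuffer writes big-endian by default, which the readers below rely on.
  def encode(age: Int, lastTimestamp: Long): Array[Byte] =
    ByteBuffer.allocate(RecordSize).putInt(age).putLong(lastTimestamp).array()

  // Reads a big-endian Long in place: no wrapper object, no deserialized record.
  def readLong(bytes: Array[Byte], offset: Int): Long = {
    var result = 0L
    var i = 0
    while (i < 8) { result = (result << 8) | (bytes(offset + i) & 0xffL); i += 1 }
    result
  }

  def readInt(bytes: Array[Byte], offset: Int): Int = {
    var result = 0
    var i = 0
    while (i < 4) { result = (result << 8) | (bytes(offset + i) & 0xff); i += 1 }
    result
  }

  // Orders records by the timestamp field, read straight from the bytes.
  val byTimestamp: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int =
      java.lang.Long.compare(readLong(a, TimestampOffset), readLong(b, TimestampOffset))
  }
}
```

For example, `records.sorted(BinarySortDemo.byTimestamp)` orders the encoded byte arrays without ever building a record object or a separate sort key.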
Regular Sorting
[Diagram: two Protobuf/Avro etc. records, each -> deserialized instance -> sort key; the two sort keys are compared (Cmp)]
BinaryRecord Sorting
• BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations
[Diagram: two records, each with fields name: Str, age: Int, lastTimestamp: Long, group: Str]
SBT-JMH
• Super useful tool to leverage JMH, the best micro benchmarking harness
• JMH is written by the JDK folks
In Summary
• Scala, Akka, reactive can give you both awesome abstractions AND performance
• Use Akka for distribution, state, protocols
• Use reactive/Monix for functional, concurrent stream processing
• Build (or use FiloDB's) fast low-level abstractions with good APIs
Thank you Scala OSS!

Ā 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
Ā 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
Ā 
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
Ā 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
Ā 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
Ā 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
Ā 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
Ā 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
Ā 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
Ā 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
Ā 
šŸ”9953056974šŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
šŸ”9953056974šŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...šŸ”9953056974šŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
šŸ”9953056974šŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
Ā 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
Ā 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Ā 

2017 High Performance Database with Scala, Akka, Spark

  • 1. Building a High-Performance Database with Scala, Akka, and Spark. Evan Chan, November 2017
  • 2. Who am I User and contributor to Spark since 0.9, Cassandra since 0.6 Created Spark Job Server and FiloDB Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc. http://velvia.github.io/
  • 3. Why Build a New Streaming Database?
  • 4. Needs
      • Ingest HUGE streams of events - IoT etc.
      • Real-time, low latency, and somewhat flexible queries
      • Dashboards, quick answers on new data
      • Flexible schemas and query patterns
      • Keep your streaming pipeline super simple
      • Streaming = hardest to debug. Simplicity rules!
  • 6. Spark + HDFS Streaming
      [Diagram labels: Kafka; Spark Streaming; Many small files (microbatches); Dedup, consolidate job; Larger efficient files]
      • High latency
      • Big impedance mismatch between streaming systems and a file system designed for big blobs of data
  • 7. Cassandra?
      • Ingest HUGE streams of events - IoT etc.
          • C* is not efficient for writing raw events
      • Real-time, low latency, and somewhat flexible queries
          • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency
      • Flexible schemas and query patterns
          • C* only handles simple lookups
  • 8. Introducing FiloDB
      A distributed, columnar time-series/event database. Built for streaming.
      http://www.github.com/filodb/FiloDB
  • 9. [Diagram labels: Message Queue; Events; Spark Streaming; Short term storage, K-V; Adhoc, SQL, ML; Cassandra; FiloDB: Events, ad-hoc, batch; Spark; Dashboards, maps]
  • 10. 100% Reactive
      • Scala
      • Akka Cluster
      • Spark
      • Monix / Reactive Streams
      • Typesafe Config for all configuration
      • Scodec, Ficus, Enumeratum, Scalactic, etc.
      • Even most of the performance critical parts are written in Scala :)
  • 11. Scala, Akka, and Spark for Database
  • 12. Why use Scala and Akka?
      • Akka Cluster!
      • Just the right abstractions - streams, futures, Akka, type safety...
      • Failure handling and supervision are critical for databases
      • All the pattern matching and immutable goodness :)
  • 13. Scala Big Data Projects
      • Spark
      • GeoMesa
      • Khronus - Akka time-series DB
      • Sirius - Akka distributed KV Store
      • FiloDB!
  • 14. Actors vs Futures vs Observables
  • 16. Akka vs Futures
      [Diagram labels: NodeCoordinatorActor (NCA); DatasetCoordinatorActor (DsCA), shown twice; Active MemTable; Flushing MemTable; Reprojector; ColumnStore; Data, commands. Legend: Akka - control flow; Core I/O - Futures/Observables]
  • 17. Akka vs Futures
      • Akka Actors:
          • External FiloDB node API (remote + cluster)
          • Async messaging with clients
          • Cluster/distributed state management
      • Futures and Observables:
          • Core I/O
          • Columnar data processing / ingestion
          • Type-safe processing stages
  • 18. Futures for Single Actions

        /**
         * Clears all data from the column store for that given projection, for all versions.
         * More like a truncation, not a drop.
         * NOTE: please make sure there are no reprojections or writes going on before calling this
         */
        def clearProjectionData(projection: Projection): Future[Response]

        /**
         * Completely and permanently drops the dataset from the column store.
         * @param dataset the DatasetRef for the dataset to drop.
         */
        def dropDataset(dataset: DatasetRef): Future[Response]

        /**
         * Appends the ChunkSets and incremental indices in the segment to the column store.
         * @param segment the ChunkSetSegment to write / merge to the columnar store
         * @param version the version # to write the segment to
         * @return Success. Future.failure(exception) otherwise.
         */
        def appendSegment(projection: RichProjection,
                          segment: ChunkSetSegment,
                          version: Int): Future[Response]
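To illustrate how single-action methods that return a `Future[Response]` compose, here is a minimal sketch using only the Scala standard library. It is not FiloDB code: the `Response` cases, the in-memory `store`, and the simplified `appendSegment` / `dropDataset` signatures are hypothetical stand-ins chosen for the example.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureActionsSketch {
  // Hypothetical stand-ins for a Response hierarchy.
  sealed trait Response
  case object Success extends Response
  case object NotApplied extends Response

  // Hypothetical in-memory "column store": dataset name -> appended segments.
  private val store = TrieMap.empty[String, Vector[String]]

  // Appends a segment; an empty segment is reported as NotApplied.
  def appendSegment(dataset: String, segment: String): Future[Response] = Future {
    if (segment.isEmpty) NotApplied
    else {
      store.put(dataset, store.getOrElse(dataset, Vector.empty) :+ segment)
      Success
    }
  }

  // Drops the dataset; NotApplied if it did not exist.
  def dropDataset(dataset: String): Future[Response] = Future {
    if (store.remove(dataset).isDefined) Success else NotApplied
  }

  // Each step starts only after the previous Future has completed.
  def appendThenDrop(dataset: String): Future[Seq[Response]] =
    for {
      a <- appendSegment(dataset, "chunk-1")
      b <- appendSegment(dataset, "")
      c <- dropDataset(dataset)
    } yield Seq(a, b, c)

  // Blocking here is for demonstration only.
  def run(): Seq[Response] = Await.result(appendThenDrop("events"), 5.seconds)
}
```

Calling `FutureActionsSketch.run()` sequences the three actions; a failure in any step would fail the combined `Future` without running the later steps.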
  • 19. Monix / Reactive Streams
      • http://monix.io
      • “observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala's standard library, compatible out-of-the-box with the Reactive Streams protocol”
      • Much better than Future[Iterator[_]]
  • 20. Monix / Reactive Streams

        def readChunks(projection: RichProjection,
                       columns: Seq[Column],
                       version: Int,
                       partMethod: PartitionScanMethod,
                       chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
          scanPartitions(projection, version, partMethod)
            // Partitions to pipeline of single chunks
            .flatMap { partIndex =>
              stats.incrReadPartitions(1)
              readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
            // Collate single chunks to ChunkSetReaders
            }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
            .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
        }
  • 21. Functional Reactive Stream Processing
      • Ingest stream merged with flush commands
      • Built in async/parallel tasks via mapAsync
      • Notify on end of stream, errors

        val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
        combinedStream.map {
          case SomeData(records) =>
            shard.ingest(records)
            None
          case FlushCommand(group) =>
            shard.switchGroupBuffers(group)
            Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
        }.collect { case Some(flushGroup) => flushGroup }
         .mapAsync(numParallelFlushes)(shard.createFlushTask _)
         .foreach { x => }
         .recover { case ex: Exception => errHandler(ex) }
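The map-then-collect step on slide 21 can be tried without Monix. The sketch below is a synchronous approximation that uses a plain `Seq` in place of an `Observable`, so it shows only the pattern matching over a merged stream of data and flush commands, not back-pressure or parallel flushing. The `Shard` class, its record-count "offset", and the simplified `FlushGroup` are assumptions made for the example.

```scala
object IngestFlushSketch {
  // A merged stream carries either data batches or flush commands.
  sealed trait Event
  final case class SomeData(records: Seq[Int]) extends Event
  final case class FlushCommand(group: Int) extends Event

  // Simplified, hypothetical flush descriptor.
  final case class FlushGroup(group: Int, latestOffset: Int)

  // Hypothetical shard: its "offset" is just the number of records ingested so far.
  final class Shard {
    private var offset = 0
    def ingest(records: Seq[Int]): Unit = offset += records.size
    def latestOffset: Int = offset
  }

  // Data events update shard state and emit nothing;
  // flush commands emit a FlushGroup capturing the offset at that point.
  def process(events: Seq[Event], shard: Shard): Seq[FlushGroup] =
    events.map {
      case SomeData(records) =>
        shard.ingest(records)
        None
      case FlushCommand(group) =>
        Some(FlushGroup(group, shard.latestOffset))
    }.collect { case Some(flushGroup) => flushGroup }
}
```

Because flush commands travel in the same ordered stream as the data, each `FlushGroup` records exactly the offset reached by the records that preceded it.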
  • 24. Adding one executor
      [Diagram labels: Driver; NodeClusterActor; Client; executor1: NCA, DsCA1, DsCA2; State: Executors -> (executor1); MemberUp; ActorSelection; ActorRef]
  • 25. Adding second executor
      [Diagram labels: Driver; NodeClusterActor; Client; executor1: NCA, DsCA1, DsCA2; executor2: NCA, DsCA1, DsCA2; State: Executors -> (executor1, executor2); MemberUp; ActorSelection; ActorRef]
  • 27. Yes, Akka in Spark
      • Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.
      • Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors
      • Spark only gives data flow primitives, not async messaging
      • We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
  • 28. Data Ingestion Setup
      [Diagram labels: two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 29. FiloDB separate nodes
      [Diagram labels: two FiloDB Nodes; two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 30. Testing Akka Cluster
      • MultiNodeSpec / sbt-multi-jvm
      • NodeClusterSpec
      • Tests joining of different cluster nodes and partition map updates
      • Is partition map updated properly if a cluster node goes down - inject network failures
      • Lessons
  • 31. Kamon Tracing
      • http://kamon.io
      • One trace can encapsulate multiple Future steps all executing on different threads
      • Tunable tracing levels
      • Summary stats and histograms for segments
      • Super useful for production debugging of reactive stack
  • 32. Kamon Tracing

        def appendSegment(projection: RichProjection,
                          segment: ChunkSetSegment,
                          version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
          val ctx = Tracer.currentContext
          stats.segmentAppend()
          if (segment.chunkSets.isEmpty) {
            stats.segmentEmpty()
            return(Future.successful(NotApplied))
          }
          for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
                writeIndexResp  <- writeIndices(projection, version, segment, ctx) if writeChunksResp == Success }
          yield {
            ctx.finish()
            writeIndexResp
          }
        }

        private def writeChunks(dataset: DatasetRef,
                                version: Int,
                                segment: ChunkSetSegment,
                                ctx: TraceContext): Future[Response] = {
          asyncSubtrace(ctx, "write-chunks", "ingestion") {
            val binPartition = segment.binaryPartition
            val segmentId = segment.segmentId
            val chunkTable = getOrCreateChunkTable(dataset)
            Future.traverse(segment.chunkSets) { chunkSet =>
              chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
            }.map { responses => responses.head }
          }
        }
  • 33. Kamon Metrics
      • Uses HDRHistogram for much finer and more accurate buckets
      • Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.

        KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
        KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
        KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
  • 34. Validation: Scalactic

        private def getColumnsFromNames(allColumns: Seq[Column],
                                        columnNames: Seq[String]): Seq[Column] Or BadSchema = {
          if (columnNames.isEmpty) {
            Good(allColumns)
          } else {
            val columnMap = allColumns.map { c => c.name -> c }.toMap
            val missing = columnNames.toSet -- columnMap.keySet
            if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
            else                  { Good(columnNames.map(columnMap)) }
          }
        }

        for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
              dataColumns <- getColumnsFromNames(columns, normProjection.columns)
              richColumns = dataColumns ++ computedColumns
              // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
              segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
              keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
              partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
        yield {

      • Notice how multiple validations compose!
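The composition on slide 34 can be approximated with the standard library alone. This sketch swaps Scalactic's `Or` / `Good` / `Bad` for Scala's built-in `Either` / `Right` / `Left` (right-biased since Scala 2.12), so it is an analogue of the technique rather than the Scalactic API. `Column`, `BadSchema`, and `MissingColumnNames` are simplified, hypothetical versions of the types named on the slide.

```scala
object ValidationSketch {
  // Simplified, hypothetical schema types.
  final case class Column(name: String)
  sealed trait BadSchema
  final case class MissingColumnNames(missing: Seq[String], kind: String) extends BadSchema

  // Resolves names to columns, or reports every name that is missing.
  def columnsFromNames(all: Seq[Column],
                       names: Seq[String],
                       kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind))
    else Right(names.map(byName))
  }

  // Validations compose in a for-comprehension; the first Left short-circuits the rest.
  def validate(all: Seq[Column],
               keyNames: Seq[String],
               partitionNames: Seq[String]): Either[BadSchema, (Seq[Column], Seq[Column])] =
    for {
      keys  <- columnsFromNames(all, keyNames, "row")
      parts <- columnsFromNames(all, partitionNames, "partition")
    } yield (keys, parts)
}
```

Each step sees only successfully validated values from the steps before it, which is what lets the happy path read as straight-line code.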
  • 36. How do you go REALLY fast?
      • Don't serialize
      • Don't allocate
      • Don't copy
  • 37. Filo fast
      • Filo binary vectors - 2 billion records/sec
      • Spark InMemoryColumnStore - 125 million records/sec
      • Spark CassandraColumnStore - 25 million records/sec
  • 38. Filo: High Performance Binary Vectors
      • Designed for NoSQL, not a file format
      • random or linear access
      • on or off heap
      • missing value support
      • Scala only, but cross-platform support possible
      http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
  • 39. Billions of Ops / Sec
      • JMH benchmark: 0.5ns per FiloVector element access / add
      • 2 Billion adds per second - single threaded
      • Who said Scala cannot be fast?
      • Spark API (row-based) limits performance significantly

        val randomInts = (0 until numValues).map(i => util.Random.nextInt)
        val randomIntsAray = randomInts.toArray
        val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
        val sc = FiloVector[Int](filoBuffer)

        @Benchmark
        @BenchmarkMode(Array(Mode.AverageTime))
        @OutputTimeUnit(TimeUnit.MICROSECONDS)
        def sumAllIntsFiloApply(): Int = {
          var total = 0
          for { i <- 0 until numValues optimized } {
            total += sc(i)
          }
          total
        }
  • 40. JVM Inlining
      • Very small methods can be inlined by the JVM
      • final def avoids virtual method dispatch.
      • Thus methods in traits, abstract classes not inlinable
      0.5ns/read is achieved through a stack of very small methods:

        val base = baseReader.readInt(0)
        final def apply(i: Int): Int = base + dataReader.read(i)

        case (32, _) => new TypedBufferReader[Int] {
          final def read(i: Int): Int = reader.readInt(i)
        }

        final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
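As a rough illustration of the "stack of very small methods" shape, the sketch below layers a base-plus-offset vector over a tiny reader, with every accessor declared `final`. The class names are hypothetical and an `Array[Int]` stands in for the `unsafe`-backed buffer on the slide. Whether the JIT actually inlines these calls depends on the JVM and its settings; the code shows the structure only and measures nothing.

```scala
object SmallMethodsSketch {
  // Minimal reader abstraction; concrete readers are final classes.
  abstract class IntReader {
    def read(i: Int): Int
  }

  // Hypothetical reader backed by a plain array.
  final class ArrayIntReader(data: Array[Int]) extends IntReader {
    final def read(i: Int): Int = data(i)
  }

  // Values are stored as offsets from a common base, as in `base + dataReader.read(i)`.
  final class BaseOffsetVector(base: Int, reader: IntReader) {
    final def apply(i: Int): Int = base + reader.read(i)
  }

  // A while loop avoids allocating a Range or closure in the hot path.
  def sum(v: BaseOffsetVector, n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      total += v(i)
      i += 1
    }
    total
  }
}
```

Each method body is a single expression, which keeps it well within the size range where a JIT compiler may choose to inline it.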
  • 41. BinaryRecord
      • Tough problem: FiloDB must handle many different datasets, each with different schemas
      • Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc.
      • Serialization very costly, especially strings
      • Solution: BinaryRecord
  • 42. BinaryRecord II
      • BinaryRecord is a binary (i.e. transport ready) record class that supports any schema or mix of column types
      • Values can be extracted or written with no serialization cost
      • UTF8-encoded string class
      • String compare as fast as native Java strings
      • Immutable API once built
  • 43. Use Case: Sorting
      • Regular sorting: deserialize record, create sort key, compare sort key
      • BinaryRecord sorting: binary compare fields directly - no deserialization, no object allocations
  • 44. Regular Sorting Protobuf/Avro etc record Deserialized instance Sort Key Protobuf/Avro etc record Deserialized instance Sort Key Cmp
  • 45. BinaryRecord Sorting
      • BinaryRecord sorting: binary compare fields directly - no deserialization, no object allocations
      [Diagram labels: two records compared field by field, each with name: Str, age: Int, lastTimestamp: Long, group: Str]
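The idea of comparing fields directly in their binary form can be sketched as follows. This is not FiloDB's BinaryRecord: the fixed 12-byte layout (a big-endian `Int` age followed by a big-endian `Long` timestamp) and all names are assumptions for the example, and variable-length string fields are left out. The comparator reads primitives at fixed offsets of the raw byte arrays and never builds a deserialized record object.

```scala
import java.nio.ByteBuffer

object BinaryCompareSketch {
  // Hypothetical fixed layout: [age: Int, 4 bytes][lastTimestamp: Long, 8 bytes], big-endian.
  val AgeOffset = 0
  val TimestampOffset = 4
  val RecordSize = 12

  // ByteBuffer writes big-endian by default, matching the readers below.
  def encode(age: Int, lastTimestamp: Long): Array[Byte] =
    ByteBuffer.allocate(RecordSize).putInt(age).putLong(lastTimestamp).array()

  // Reads a big-endian Int straight from the array, with no intermediate objects.
  def readInt(a: Array[Byte], off: Int): Int =
    ((a(off) & 0xff) << 24) | ((a(off + 1) & 0xff) << 16) |
      ((a(off + 2) & 0xff) << 8) | (a(off + 3) & 0xff)

  // Reads a big-endian Long as two Ints.
  def readLong(a: Array[Byte], off: Int): Long =
    (readInt(a, off).toLong << 32) | (readInt(a, off + 4) & 0xffffffffL)

  // Orders by age, then by timestamp, comparing fields in place.
  val byAgeThenTimestamp: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(x: Array[Byte], y: Array[Byte]): Int = {
      val c = Integer.compare(readInt(x, AgeOffset), readInt(y, AgeOffset))
      if (c != 0) c
      else java.lang.Long.compare(readLong(x, TimestampOffset), readLong(y, TimestampOffset))
    }
  }

  // Decoding is only needed to inspect results, not to sort.
  def decode(r: Array[Byte]): (Int, Long) = (readInt(r, AgeOffset), readLong(r, TimestampOffset))
}
```

Sorting with `records.sorted(BinaryCompareSketch.byAgeThenTimestamp)` then touches only the bytes of the compared fields, in contrast to the deserialize-then-build-sort-key path on slide 44.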
  • 46. SBT-JMH
      • Super useful tool to leverage JMH, the best micro benchmarking harness
      • JMH is written by the JDK folks
  • 47. In Summary
      • Scala, Akka, reactive can give you both awesome abstractions AND performance
      • Use Akka for distribution, state, protocols
      • Use reactive/Monix for functional, concurrent stream processing
      • Build (or use FiloDB's) fast low-level abstractions with good APIs