Lessons: Porting a Streaming Pipeline from Scala to Rust
2023 Scale by the Bay
Evan Chan
Principal Engineer - Conviva
http://velvia.github.io/presentations/2023-conviva-scala-to-rust
1 / 38
Conviva
2 / 38
Massive Real-time Streaming Analytics
5 trillion events processed per day
800-2000GB/hour (not peak!!)
Started with custom Java code
went through Spark Streaming and Flink iterations
Most backend data components in production are written in Scala
Today: 420 pods running custom Akka Streams processors
3 / 38
Data World is Going Native and Rust
Going native: Python, end of Moore's Law, cloud compute
Safe, fast, and high-level abstractions
Functional data patterns - map, fold, pattern matching, etc.
Static dispatch and no allocations by default
PyO3 - Rust is the best way to write native Python extensions
JVM                    | Rust projects
Spark, Hive            | DataFusion, Ballista, Amadeus
Flink                  | Arroyo, RisingWave, Materialize
Kafka/KSQL             | Fluvio
ElasticSearch / Lucene | Toshi, MeiliDB
Cassandra, HBase       | Skytable, Sled, Sanakirja...
Neo4J                  | TerminusDB, IndraDB
4 / 38
About our Architecture
graph LR;
  SAE(Streaming Data Pipeline)
  Sensors --> Gateways
  Gateways --> Kafka
  Kafka --> SAE
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards
5 / 38
What We Are Porting to Rust
graph LR;
  classDef highlighted fill:#99f,stroke:#333,stroke-width:4px
  SAE(Streaming Data Pipeline)
  Sensors:::highlighted --> Gateways:::highlighted
  Gateways --> Kafka
  Kafka --> SAE:::highlighted
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards
graph LR;
  Notes1(Sensors: consolidate fragmented code base)
  Notes2(Gateway: Improve on JVM and Go)
  Notes3(Pipeline: Improve efficiency, New operator architecture)
  Notes1 ~~~ Notes2
  Notes2 ~~~ Notes3
6 / 38
Our Journey to Rust
gantt
  title From Hackathon to Multiple Teams
  dateFormat YYYY-MM
  axisFormat %y-%b
  section Data Pipeline
  Hackathon :Small Kafka ingestion project, 2022-11, 30d
  Scala prototype :2023-02, 6w
  Initial Rust Port : small team, 2023-04, 45d
  Bring on more people :2023-07, 8w
  20-25 people 4 teams :2023-11, 1w
  section Gateway
  Go port :2023-07, 6w
  Rust port :2023-09, 4w
“I like that if it compiles, I know it will work, so it gives confidence.”
7 / 38
Promising Rust Hackathon
graph LR;
  Kafka --> RustDeser(Rust Deserializer)
  RustDeser --> RA(Rust Actors - Lightweight Processing)
Measurement      | Improvement over Scala/Akka
Throughput (CPU) | 2.6x more
Memory used      | 12x less
Mostly I/O-bound lightweight deserialization and processing workload
Found out Actix does not work well with Tokio
8 / 38
Performance Results - Gateway
9 / 38
Key Lessons or Questions
What matters for a Rust port?
The 4 P's?
People      | How do we bring developers onboard?
Performance | How do I get performance? Data structures? Static dispatch?
Patterns    | What coding patterns port well from Scala? Async?
Project     | How do I build? Tooling, IDEs?
10 / 38
People
How do we bring developers onboard?
11 / 38
A Phased Rust Bringup
We ported our main data pipeline in two phases:
Phase  | Team                      | Rust Expertise            | Work
First  | 3-5, very senior          | 1-2 with significant Rust | Port core project components
Second | 10-15, mixed, distributed | Most with zero Rust       | Smaller, broken down tasks
Have organized list of learning resources
2-3 weeks to learn Rust and come up to speed
12 / 38
Overcoming Challenges
Difficulties:
Lifetimes
Compiler errors
Porting previous patterns
Ownership and async
etc.
How we helped:
Good docs
Start with tests
ChatGPT!
Rust Book
Office hours
Lots of detailed reviews
Split project into async and sync cores
13 / 38
Performance
Data structures, static dispatch, etc.
"I enjoy the fact that the default route is performant. It makes you write
performant code, and if you go out the way, it becomes explicit (e.g., with dyn,
Boxed, or clone etc). "
14 / 38
Porting from Scala: Huge Performance Win
graph LR;
  classDef highlighted fill:#99f,stroke:#333,stroke-width:4px
  SAE(Streaming Data Pipeline)
  Sensors --> Gateways
  Gateways --> Kafka
  Kafka --> SAE:::highlighted
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards
CPU-bound, programmable, heavy data processing
Neither the Rust nor the Scala version is productionized or optimized
Same architecture and same input/outputs
Scala version was not designed for speed, lots of objects
Rust: we chose static dispatch and minimizing allocations
Type of comparison                         | Improvement over Scala
Throughput, end to end                     | 22x
Throughput, single-threaded microbenchmark | >= 40x
15 / 38
Building a Flexible Data Pipeline
graph LR;
  RawEvents(Raw Events)
  RawEvents -->| List of numbers | Extract1
  RawEvents --> Extract2
  Extract1 --> DoSomeMath
  Extract2 --> TransformSomeFields
  DoSomeMath --> Filter1
  TransformSomeFields --> Filter1
  Filter1 --> MoreProcessing
An interpreter passes time-ordered data between the operators of a flexible DAG.
Span1
Start time: 1000
End time: 1100
Events: ["start", "click"]
Span2
Start time: 1100
End time: 1300
Events: ["ad_load"]
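The two spans listed above could be modeled roughly as follows. This is a minimal sketch, not the pipeline's actual types: `Span`, `example_timeline`, `is_time_ordered`, and `keep_with_event` are hypothetical names, and a single filter function stands in for one operator in the DAG.

```rust
/// Hypothetical span: a time interval plus the events observed in it.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start: u64,
    pub end: u64,
    pub events: Vec<String>,
}

/// Builds the two spans shown on the slide.
pub fn example_timeline() -> Vec<Span> {
    vec![
        Span { start: 1000, end: 1100, events: vec!["start".into(), "click".into()] },
        Span { start: 1100, end: 1300, events: vec!["ad_load".into()] },
    ]
}

/// True if every span is well formed and no span starts before the previous one ends.
pub fn is_time_ordered(spans: &[Span]) -> bool {
    spans.iter().all(|s| s.start <= s.end)
        && spans.windows(2).all(|w| w[0].end <= w[1].start)
}

/// A toy "operator": keeps only the spans that contain the given event.
pub fn keep_with_event(spans: &[Span], event: &str) -> Vec<Span> {
    spans
        .iter()
        .filter(|s| s.events.iter().any(|e| e.as_str() == event))
        .cloned()
        .collect()
}
```

Because filtering preserves the input order, the output of `keep_with_event` stays time-ordered and can be handed to the next operator.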
16 / 38
Data Structures: Scala vs Rust
Scala: Object Graph on Heap
graph TB;
  classDef default font-size:24px
  ArraySpan["`Array[Span]`"]
  TL(Timeline - Seq) --> ArraySpan
  ArraySpan --> Span1["`Span(start, end, Payload)`"]
  ArraySpan --> Span2["`Span(start, end, Payload)`"]
  Span1 --> EventsAtSpanEnd("`Events(Seq[A])`")
  EventsAtSpanEnd --> ArrayEvent["`Array[A]`"]
Rust: mostly stack based / 0 alloc:
flowchart TB;
  subgraph Timeline
    subgraph OutputSpans
      subgraph Span1
        subgraph Events
          EvA ~~~ EvB
        end
        TimeInterval ~~~ Events
      end
      subgraph Span2
        Time2 ~~~ Events2
      end
      Span1 ~~~ Span2
    end
    DataType ~~~ OutputSpans
  end
17 / 38
Rust: Using Enums and Avoiding Boxing
pub enum Timeline {
    EventNumber(OutputSpans<EventsAtEnd<f64>>),
    EventBoolean(OutputSpans<EventsAtEnd<bool>>),
    EventString(OutputSpans<EventsAtEnd<DataString>>),
}
// Up to 2 spans are stored inline before spilling to the heap.
type OutputSpans<V> = SmallVec<[Span<V>; 2]>;
pub struct Span<SV: SpanValue> {
    pub time: TimeInterval,
    pub value: SV,
}
// Up to 1 event per span is stored inline.
pub struct EventsAtEnd<V>(SmallVec<[V; 1]>);
In the above, the Timeline enum can fit entirely on the stack and avoid all
boxing and allocations, if:
The number of spans is very small, below the limit set in code
The number of events in each span is very small (1 in this case, which is
the common case)
The base type is a primitive, or a string below a certain length
18 / 38
Avoiding Allocations using SmallVec and SmallString
SmallVec is something like this:
pub enum SmallVec<T, const N: usize> {
    Stack([T; N]),
    Heap(Vec<T>),
}
The enum can hold up to N items inline in an array with no allocations, but
switches to the Heap variant if the number of items exceeds N.
There are various crates for small strings and other data structures.
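To make the inline-then-spill behavior concrete, here is a toy version using only the standard library. It is a sketch of the idea, not the `smallvec` crate's implementation: `TinyVec` and its methods are hypothetical names, and it requires `T: Copy + Default` so the inline array can be pre-filled without `unsafe`.

```rust
/// Toy small vector: up to N items live inline, more than N spill to a Vec.
/// Hypothetical illustration only; real crates avoid the Copy + Default bound.
pub enum TinyVec<T: Copy + Default, const N: usize> {
    Inline { len: usize, items: [T; N] },
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> TinyVec<T, N> {
    /// Starts empty, with inline storage and no heap allocation.
    pub fn new() -> Self {
        TinyVec::Inline { len: 0, items: [T::default(); N] }
    }

    pub fn push(&mut self, value: T) {
        match self {
            TinyVec::Inline { len, items } => {
                if *len < N {
                    items[*len] = value;
                    *len += 1;
                } else {
                    // Inline storage is full: copy everything into a Vec.
                    let mut spilled = items[..*len].to_vec();
                    spilled.push(value);
                    *self = TinyVec::Heap(spilled);
                }
            }
            TinyVec::Heap(v) => v.push(value),
        }
    }

    /// The stored items, wherever they currently live.
    pub fn as_slice(&self) -> &[T] {
        match self {
            TinyVec::Inline { len, items } => &items[..*len],
            TinyVec::Heap(v) => v.as_slice(),
        }
    }

    /// True once the contents have moved to the heap.
    pub fn spilled(&self) -> bool {
        matches!(self, TinyVec::Heap(_))
    }
}
```

With `TinyVec<f64, 2>`, the first two pushes stay inline and the third triggers the one-time move to the heap. Choosing N to cover the common case is what keeps the hot path free of allocations.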
19 / 38
Static vs Dynamic Dispatch
Often one will need to work with many different structs that implement a Trait
-- for us, different operator implementations supporting different types. Static
dispatch with inlined code is much faster than dynamic dispatch.
1. Monomorphisation using generics
   fn execute_op<O: Operator>(op: O) -> Result<...>
   Compiler creates a new instance of execute_op for every different O
   Only works when you know in advance what Operator to pass in
2. Use Enums and enum_dispatch
   fn execute_op(op: OperatorEnum) -> Result<...>
3. Dynamic dispatch
   fn execute_op(op: Box<dyn Operator>) -> Result<...>
   fn execute_op(op: &dyn Operator) -> Result<...> (avoids allocation)
4. Function wrapping
   Embedding functions in a generic struct
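Options 1 and 3 can be compared side by side in a small sketch. The talk does not show the real `Operator` trait, so the trait below, its `apply` method, and the `AddOne` and `Double` operators are assumptions made for illustration.

```rust
/// Hypothetical operator trait; the signature is an assumption.
pub trait Operator {
    fn apply(&self, input: f64) -> f64;
}

pub struct AddOne;
pub struct Double;

impl Operator for AddOne {
    fn apply(&self, input: f64) -> f64 {
        input + 1.0
    }
}

impl Operator for Double {
    fn apply(&self, input: f64) -> f64 {
        input * 2.0
    }
}

/// Static dispatch: the compiler generates one copy of this function per
/// concrete `O`, so the call to `apply` can be inlined.
pub fn execute_static<O: Operator>(op: &O, input: f64) -> f64 {
    op.apply(input)
}

/// Dynamic dispatch: a single copy, with `apply` called through a vtable.
/// Taking `&dyn Operator` avoids the allocation that `Box<dyn Operator>` needs.
pub fn execute_dyn(op: &dyn Operator, input: f64) -> f64 {
    op.apply(input)
}

/// A list of mixed operator types needs trait objects (or an enum),
/// because the elements do not share one concrete type.
pub fn run_pipeline(ops: &[Box<dyn Operator>], input: f64) -> f64 {
    ops.iter().fold(input, |acc, op| op.apply(acc))
}
```

`execute_static(&AddOne, 1.0)` is resolved at compile time, while `run_pipeline` decides which `apply` to call at run time for each element.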
20 / 38
enum_dispatch
Suppose you have
trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}
struct LinearKnob {
    position: f64,
}
struct LogarithmicKnob {
    position: f64,
}
impl KnobControl for LinearKnob...
enum_dispatch lets you do this:
#[enum_dispatch]
trait KnobControl {
    //...
}
21 / 38
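For the KnobControl example, the kind of code that enum_dispatch saves you from writing looks roughly like the hand-written version below. This sketch uses no macro; the `Knob` enum name and the logarithmic mapping in `get_value` are assumptions for illustration.

```rust
pub trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

pub struct LinearKnob {
    pub position: f64,
}

pub struct LogarithmicKnob {
    pub position: f64,
}

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }
    fn get_value(&self) -> f64 {
        self.position
    }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }
    // Assumed mapping for illustration: natural log of (position + 1).
    fn get_value(&self) -> f64 {
        (self.position + 1.0).ln()
    }
}

/// One variant per implementation. Each call is a `match`, not a vtable lookup.
pub enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

impl KnobControl for Knob {
    fn set_position(&mut self, value: f64) {
        match self {
            Knob::Linear(k) => k.set_position(value),
            Knob::Logarithmic(k) => k.set_position(value),
        }
    }
    fn get_value(&self) -> f64 {
        match self {
            Knob::Linear(k) => k.get_value(),
            Knob::Logarithmic(k) => k.get_value(),
        }
    }
}
```

A `Vec<Knob>` can then hold both kinds of knob without boxing, and the forwarding `match` arms are the boilerplate the macro generates for you.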
Function wrapping
Static function wrapping - no generics
pub struct OperatorWrapper {
    name: String,
    func: fn(input: &Data) -> Data,
}
Need a generic - but accepts closures
pub struct OperatorWrapper<F>
where
    F: Fn(&Data) -> Data, // Fn bounds list parameter types only, without names
{
    name: String,
    func: F,
}
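The difference between the two wrappers shows up once an operator needs captured state. In this sketch `Data` is a hypothetical stand-in for the pipeline's data type, and the wrappers are renamed `FnOperator` and `ClosureOperator` so both can live in one file.

```rust
/// Hypothetical stand-in for the talk's `Data` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Data(pub Vec<f64>);

/// Non-generic wrapper: a plain function pointer, which cannot capture state.
pub struct FnOperator {
    pub name: String,
    pub func: fn(&Data) -> Data,
}

/// Generic wrapper: `F` may be a closure that captures its environment.
pub struct ClosureOperator<F>
where
    F: Fn(&Data) -> Data,
{
    pub name: String,
    pub func: F,
}

/// A free function fits either wrapper.
pub fn double(input: &Data) -> Data {
    Data(input.0.iter().map(|x| x * 2.0).collect())
}

/// The returned closure captures `factor`, so it only fits the generic wrapper.
pub fn scale_by(factor: f64) -> ClosureOperator<impl Fn(&Data) -> Data> {
    ClosureOperator {
        name: format!("scale_by_{}", factor),
        func: move |input: &Data| Data(input.0.iter().map(|x| x * factor).collect()),
    }
}
```

Calling a stored function needs parentheses around the field, as in `(op.func)(&data)`. The generic version keeps static dispatch, at the cost of a distinct `ClosureOperator<F>` type per closure.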
22 / 38
Patterns
Async, Type Classes, etc.
23 / 38
Rust Async: Different Paradigms
"Async: It is well designed... Yes, it is still pretty complicated piece of code, but
the logic or the framework is easier to grasp compared to other languages."
Having to use Arc: Data Structures are not Thread-safe by default!
Scala                               | Rust
Futures                             | futures, async functions
??                                  | async-await
Actors (Akka)                       | Actix, Bastion, etc.
Async streams                       | Tokio streams
Reactive (Akka streams, Monix, ZIO) | reactive_rs, rxRust, etc.
24 / 38
Replacing Akka: Actors in Rust
Actix threading model doesn't mix well with Tokio
We moved to tiny-tokio-actor, then wrote our own
pub struct AnomalyActor {}
#[async_trait]
impl ChannelActor<Anomaly, AnomalyActorError> for AnomalyActor {
    async fn handle(
        &mut self,
        msg: Anomaly,
        ctx: &mut ActorContext<Anomaly>,
    ) -> Result<(), Report<AnomalyActorError>> {
        use Anomaly::*;
        match msg {
            // Anomaly payload fields are ignored in this excerpt.
            QuantityOverflowAnomaly {
                ctx: _, ts: _, qual: _,
                qty: _, cnt: _, data: _,
            } => {}
            // PoisonPill asks the actor to shut down.
            PoisonPill => {
                ctx.stop();
            }
        }
        Ok(())
    }
}
25 / 38
Other Patterns to Learn
Old Pattern                         | New Pattern
No inheritance                      | Use composition! Compose data structures, compose small Traits
No exceptions                       | Use Result and ?
Data structures are not thread safe | Learn to use Arc etc.
Returning Iterators                 | Don't return things that borrow other things. This makes life difficult.
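Two of these patterns, `Result` with `?` in place of exceptions and `Arc` for sharing across threads, fit in a short sketch. Everything here is illustrative: `ParseError`, `parse_metric`, and `parallel_sum` are hypothetical names, and plain threads are used to keep the example to the standard library.

```rust
use std::num::ParseFloatError;
use std::sync::{Arc, Mutex};
use std::thread;

/// Errors are ordinary values listed in the function signature.
#[derive(Debug)]
pub enum ParseError {
    Empty,
    BadNumber(ParseFloatError),
}

/// Lets `?` convert a ParseFloatError into our error type.
impl From<ParseFloatError> for ParseError {
    fn from(e: ParseFloatError) -> Self {
        ParseError::BadNumber(e)
    }
}

pub fn parse_metric(raw: &str) -> Result<f64, ParseError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    // `?` returns early with the converted error instead of throwing.
    let value: f64 = trimmed.parse()?;
    Ok(value)
}

/// Shared mutable state must be made thread safe explicitly:
/// Arc provides shared ownership, Mutex provides exclusive access.
pub fn parallel_sum(inputs: Vec<f64>) -> f64 {
    let total = Arc::new(Mutex::new(0.0_f64));
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|x| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                *total.lock().unwrap() += x;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

Without the `Arc`, the compiler rejects moving `total` into more than one thread, which is the "not thread safe by default" rule in practice.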
26 / 38
Type Classes
In Rust, type classes (Traits) are smaller and more compositional.
pub trait Inhale {
    fn sniff(&self);
}
You can implement new Traits for existing types, and have different impl's for
different types.
impl Inhale for String {
    fn sniff(&self) {
        println!("I sniffed {}", self);
    }
}
// Only implemented for specific N subtypes of MyStruct
impl<N: Numeric> Inhale for MyStruct<N> {
    fn sniff(&self) {
        ....
    }
}
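A compilable variation of the Inhale example is sketched below. It departs from the slide in labeled ways: `sniff` returns the message instead of printing it so the result can be checked, `Numeric` is a hypothetical marker trait, and `MyStruct` is assumed to be a one-field tuple struct.

```rust
use std::fmt::Display;

/// Variation of the slide's trait: returns the message rather than printing it.
pub trait Inhale {
    fn sniff(&self) -> String;
}

/// A new trait implemented for an existing standard-library type.
impl Inhale for String {
    fn sniff(&self) -> String {
        format!("I sniffed {}", self)
    }
}

/// Hypothetical marker trait standing in for the slide's `Numeric`.
pub trait Numeric: Copy + Display {}
impl Numeric for i64 {}
impl Numeric for f64 {}

/// Assumed shape of `MyStruct`: a wrapper around a single value.
pub struct MyStruct<N>(pub N);

/// Only available when the wrapped type is Numeric.
impl<N: Numeric> Inhale for MyStruct<N> {
    fn sniff(&self) -> String {
        format!("I sniffed the number {}", self.0)
    }
}
```

`MyStruct(42_i64).sniff()` compiles, while `MyStruct("text").sniff()` is rejected at compile time because `&str` does not implement `Numeric`.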
27 / 38
Project
Build, IDE, Tooling
28 / 38
"Cargo is the best build tool ever"
Almost no dependency conflicts due to multiple dep versioning
Configuration by convention - common directory/file layouts for example
Really simple .toml - no need for XML, functional Scala, etc.
Rarely need code to build anything, even for large projects
[package]
name = "telemetry-subscribers"
version = "0.3.0"
license = "Apache-2.0"
description = "Library for common telemetry and observability functionality"
[dependencies]
console-subscriber = { version = "0.1.6", optional = true }
crossterm = "0.25.0"
once_cell = "1.13.0"
opentelemetry = { version = "0.18.0", features = ["rt-tokio"], optional = true }
29 / 38
IDEs, CI, and Tooling
IDEs/Editors     | VSCode, RustRover (IntelliJ), vim/emacs/etc with Rust Analyzer
Code Coverage    | VSCode inline, grcov/lcov, Tarpaulin (Linux only)
Slow build times | Caching: cargo-chef, rust-cache
Slow test times  | cargo-nextest
Property Testing | proptest
Benchmarking     | Criterion
https://blog.logrocket.com/optimizing-ci-cd-pipelines-rust-projects/
VSCode's "LiveShare" feature for distributed pair programming is TOP NOTCH.
30 / 38
Rust Resources and Projects
https://github.com/velvia/links/blob/main/rust.md - this is my list of Rust
projects and learning resources
https://github.com/rust-unofficial/awesome-rust
https://www.arewelearningyet.com - ML focused
31 / 38
What do we miss from Scala?
More mature libraries - in some cases: HDFS, etc.
Good streaming libraries - like Monix, Akka Streams etc.
I guess all of Akka
"Less misleading compiler messages"
Rust error messages read better from the CLI, IMO (not an IDE)
32 / 38
Takeaways
It's a long journey but Rust is worth it.
Structuring a project for successful onramp is really important
Think about data structure design early on
Allow plenty of time to ramp up on Rust patterns, tools
We are hiring across multiple roles/levels!
33 / 38
https://velvia.github.io/about
https://github.com/velvia
@evanfchan
IG: @platypus.arts
Thank You Very Much!
34 / 38
Extra slides
35 / 38
Data World is Going Native (from JVM)
The rise of Python and Data Science
Led to AnyScale, Dask, and many other Python-oriented data
frameworks
Rise of newer, developer-friendly native languages (Go, Swift, Rust, etc.)
Migration from Hadoop/HDFS to more cloud-based data architectures
Apache Arrow and other data interchange formats
Hardware architecture trends - end of Moore's Law, rise of GPUs etc
36 / 38
Why We Went with our Own Actors
1. Initial Hackathon prototype used Actix
   Actix has its own event-loop / threading model, using Arbiters
   Difficult to co-exist with Tokio and configure both
2. Moved to tiny-tokio-actor
   Really thin layer on top of Tokio
   25% improvement over rdkafka + Tokio + Actix
3. Ultimately wrote our own, 100-line mini Actor framework
   tiny-tokio-actor required messages to be Clone so we could not, for example, send OneShot channels for other actors to reply
   Wanted ActorRef<MessageType> instead of ActorRef<ActorType, MessageType>
   Supports tell() and ask() semantics
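The shape of such a mini framework can be sketched with the standard library alone. This is not Conviva's code: it swaps Tokio tasks and oneshot channels for OS threads and `std::sync::mpsc`, and `CounterMsg`, `ActorRef`, and `spawn_counter` are hypothetical names. It does show a handle typed only by its message, plus `tell()` and `ask()` where the message itself carries the reply channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages for a hypothetical counter actor. `Get` carries a reply channel,
/// and the message type is not required to be Clone.
pub enum CounterMsg {
    Add(u64),
    Get(Sender<u64>),
    PoisonPill,
}

/// Handle typed only by the message, in the spirit of ActorRef<MessageType>.
pub struct ActorRef<M> {
    tx: Sender<M>,
}

impl<M> ActorRef<M> {
    /// Fire-and-forget send. Returns false if the actor has stopped.
    pub fn tell(&self, msg: M) -> bool {
        self.tx.send(msg).is_ok()
    }

    /// Request-reply: wraps a fresh reply channel in a message, then blocks
    /// until the actor answers. Returns None if the actor is gone.
    pub fn ask<R>(&self, make_msg: impl FnOnce(Sender<R>) -> M) -> Option<R> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(make_msg(reply_tx)).ok()?;
        reply_rx.recv().ok()
    }
}

/// Spawns the actor loop on its own thread. The thread returns the final total.
pub fn spawn_counter() -> (ActorRef<CounterMsg>, JoinHandle<u64>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0u64;
        // Messages are processed one at a time, in arrival order.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    let _ = reply.send(total);
                }
                CounterMsg::PoisonPill => break,
            }
        }
        total
    });
    (ActorRef { tx }, handle)
}
```

A caller would use `actor.tell(CounterMsg::Add(2))` for one-way messages and `actor.ask(CounterMsg::Get)` to read the total. An async version would follow the same structure with a Tokio task and Tokio channels in place of the thread and `mpsc`.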
37 / 38
Scala: Object Graphs and Any
class Timeline extends BufferedIterator[Span[Payload]]
final case class Span[+A](start: Timestamp, end: Timestamp, payload: A) {
  def mapPayload[B](f: A => B): Span[B] = copy(payload = f(payload))
}
type Event[+A] = Span[EventsAtSpanEnd[A]]
@newtype final case class EventsAtSpanEnd[+A](events: Iterable[A])
BufferedIterator must be on the heap
Each Span Payload is also boxed and on the heap, even for numbers
To be dynamically interpretable, we need BufferedIterator[Span[Any]] in many places :(
Yes, specialization is possible, at the cost of complexity
38 / 38

More Related Content

What's hot

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Versioned State Stores in Kafka Streams with Victoria Xia
Versioned State Stores in Kafka Streams with Victoria XiaVersioned State Stores in Kafka Streams with Victoria Xia
Versioned State Stores in Kafka Streams with Victoria XiaHostedbyConfluent
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabaseTung Nguyen Thanh
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
DDD - DuyLV - VINID - 17.07.2019
DDD - DuyLV - VINID - 17.07.2019DDD - DuyLV - VINID - 17.07.2019
DDD - DuyLV - VINID - 17.07.2019Lê Văn Duy
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Amazon Web Services
 
Rate limiters in big data systems
Rate limiters in big data systemsRate limiters in big data systems
Rate limiters in big data systemsSandeep Joshi
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
Spring batch introduction
Spring batch introductionSpring batch introduction
Spring batch introductionAlex Fernandez
 

What's hot (20)

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Versioned State Stores in Kafka Streams with Victoria Xia
Versioned State Stores in Kafka Streams with Victoria XiaVersioned State Stores in Kafka Streams with Victoria Xia
Versioned State Stores in Kafka Streams with Victoria Xia
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL Database
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
DDD - DuyLV - VINID - 17.07.2019
DDD - DuyLV - VINID - 17.07.2019DDD - DuyLV - VINID - 17.07.2019
DDD - DuyLV - VINID - 17.07.2019
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) -...
 
Rate limiters in big data systems
Rate limiters in big data systemsRate limiters in big data systems
Rate limiters in big data systems
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Spring batch
Spring batchSpring batch
Spring batch
 
Presto
PrestoPresto
Presto
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Spring batch introduction
Spring batch introductionSpring batch introduction
Spring batch introduction
 

Similar to Porting a Streaming Pipeline from Scala to Rust

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Actor model in .NET - Akka.NET
Actor model in .NET - Akka.NETActor model in .NET - Akka.NET
Actor model in .NET - Akka.NETKonrad Dusza
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the wayOleg Podsechin
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Thomas Weise
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache sparkRahul Kumar
 
H2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleH2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleSri Ambati
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayPhil Estes
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streamingphanleson
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopDatabricks
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time JavaDeniz Oguz
 

Similar to Porting a Streaming Pipeline from Scala to Rust (20)

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Actor model in .NET - Akka.NET
Actor model in .NET - Akka.NETActor model in .NET - Akka.NET
Actor model in .NET - Akka.NET
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
H2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleH2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt Dowle
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
 
An Optics Life
An Optics LifeAn Optics Life
An Optics Life
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time Java
 

More from Evan Chan

Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesEvan Chan
 
Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Evan Chan
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleEvan Chan
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and SparkEvan Chan
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server TalkEvan Chan
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkEvan Chan
 

Porting a Streaming Pipeline from Scala to Rust

  • 1. Lessons: Porting a Streaming Pipeline from Scala to Rust. 2023 Scale by the Bay. Evan Chan, Principal Engineer, Conviva. http://velvia.github.io/presentations/2023-conviva-scala-to-rust
  • 3. Massive Real-time Streaming Analytics
      - 5 trillion events processed per day
      - 800-2000 GB/hour (not peak!)
      - Started with custom Java code, then went through Spark Streaming and Flink iterations
      - Most backend data components in production are written in Scala
      - Today: 420 pods running custom Akka Streams processors
  • 4. Data World is Going Native and Rust
      - Going native: Python, end of Moore's Law, cloud compute
      - Safe, fast, and high-level abstractions
      - Functional data patterns: map, fold, pattern matching, etc.
      - Static dispatch and no allocations by default
      - PyO3: Rust is the best way to write native Python extensions
      JVM | Rust projects
      Spark, Hive | DataFusion, Ballista, Amadeus
      Flink | Arroyo, RisingWave, Materialize
      Kafka/KSQL | Fluvio
      ElasticSearch / Lucene | Toshi, MeiliDB
      Cassandra, HBase | Skytable, Sled, Sanakirja...
      Neo4J | TerminusDB, IndraDB
  • 5. About our Architecture
      Diagram: Sensors -> Gateways -> Kafka -> Streaming Data Pipeline -> Metrics Database -> Dashboards
  • 6. What We Are Porting to Rust
      Diagram: Sensors -> Gateways -> Kafka -> Streaming Data Pipeline -> Metrics Database -> Dashboards, with Sensors, Gateways, and the Streaming Data Pipeline highlighted
      - Sensors: consolidate fragmented code base
      - Gateway: improve on JVM and Go
      - Pipeline: improve efficiency, new operator architecture
  • 7. Our Journey to Rust
      Timeline: From Hackathon to Multiple Teams
      Data Pipeline:
      - 2022-11: Hackathon, small Kafka ingestion project (30 days)
      - 2023-02: Scala prototype (6 weeks)
      - 2023-04: Initial Rust port, small team (45 days)
      - 2023-07: Bring on more people (8 weeks)
      - 2023-11: 20-25 people, 4 teams
      Gateway:
      - 2023-07: Go port (6 weeks)
      - 2023-09: Rust port (4 weeks)
      “I like that if it compiles, I know it will work, so it gives confidence.”
  • 8. Promising Rust Hackathon
      Diagram: Kafka -> Rust Deserializer -> Rust Actors (Lightweight Processing)
      Measurement | Improvement over Scala/Akka
      Throughput (CPU) | 2.6x more
      Memory used | 12x less
      - Mostly I/O-bound lightweight deserialization and processing workload
      - Found out Actix does not work well with Tokio
  • 9. Performance Results - Gateway
  • 10. Key Lessons or Questions
      What matters for a Rust port? The 4 P's:
      - People: How do we bring developers onboard?
      - Performance: How do I get performance? Data structures? Static dispatch?
      - Patterns: What coding patterns port well from Scala? Async?
      - Project: How do I build? Tooling, IDEs?
  • 11. People: How do we bring developers onboard?
  • 12. A Phased Rust Bringup
      We ported our main data pipeline in two phases:
      Phase | Team | Rust Expertise | Work
      First | 3-5, very senior | 1-2 with significant Rust | Port core project components
      Second | 10-15, mixed, distributed | Most with zero Rust | Smaller, broken down tasks
      - Have an organized list of learning resources
      - 2-3 weeks to learn Rust and come up to speed
  • 13. Overcoming Challenges
      Difficulties:
      - Lifetimes
      - Compiler errors
      - Porting previous patterns
      - Ownership and async
      - etc.
      How we helped:
      - Good docs
      - Start with tests
      - ChatGPT!
      - Rust Book
      - Office hours
      - Lots of detailed reviews
      - Split project into async and sync cores
  • 14. Performance: Data structures, static dispatch, etc.
      "I enjoy the fact that the default route is performant. It makes you write performant code, and if you go out the way, it becomes explicit (e.g., with dyn, Boxed, or clone etc)."
  • 15. Porting from Scala: Huge Performance Win
      Diagram: Sensors -> Gateways -> Kafka -> Streaming Data Pipeline -> Metrics Database -> Dashboards, with the Streaming Data Pipeline highlighted
      - CPU-bound, programmable, heavy data processing
      - Neither the Rust nor the Scala version is productionized or optimized
      - Same architecture and same inputs/outputs
      - Scala version was not designed for speed, lots of objects
      - Rust: we chose static dispatch and minimizing allocations
      Type of comparison | Improvement over Scala
      Throughput, end to end | 22x
      Throughput, single-threaded microbenchmark | >= 40x
  • 16. Building a Flexible Data Pipeline
      Diagram: Raw Events -> Extract1 (edge labeled "List of numbers") -> DoSomeMath -> Filter1; Raw Events -> Extract2 -> TransformSomeFields -> Filter1; Filter1 -> MoreProcessing
      An interpreter passes time-ordered data between a flexible DAG of operators.
      Span1 | Start time: 1000 | End time: 1100 | Events: ["start", "click"]
      Span2 | Start time: 1100 | End time: 1300 | Events: ["ad_load"]
  • 17. Data Structures: Scala vs Rust
      Scala: Object Graph on Heap
      Diagram: Timeline (Seq) -> Array[Span] -> two Span(start, end, Payload) objects; Span -> Events(Seq[A]) -> Array[A]
      Rust: mostly stack based / 0 alloc
      Diagram: Timeline holds DataType and OutputSpans inline; OutputSpans holds Span1 (TimeInterval, Events: EvA, EvB) and Span2 (Time2, Events2), all nested in place
  • 18. Rust: Using Enums and Avoiding Boxing
          pub enum Timeline {
              EventNumber(OutputSpans<EventsAtEnd<f64>>),
              EventBoolean(OutputSpans<EventsAtEnd<bool>>),
              EventString(OutputSpans<EventsAtEnd<DataString>>),
          }

          type OutputSpans<V> = SmallVec<[Spans<V>; 2]>;

          pub struct Span<SV: SpanValue> {
              pub time: TimeInterval,
              pub value: SV,
          }

          pub struct EventsAtEnd<V>(SmallVec<[V; 1]>);
      In the above, the Timeline enum can fit entirely on the stack and avoid all boxing and allocations, if:
      - The number of spans is very small, below the limit set in code
      - The number of events in each span is very small (1 in this case, which is the common case)
      - The base type is a primitive, or a string which is below a certain length
  • 19. Avoiding Allocations using SmallVec and SmallString
      SmallVec is something like this:
          pub enum SmallVec<T, const N: usize> {
              Stack([T; N]),
              Heap(Vec<T>),
          }
      The enum can hold up to N items inline in an array with no allocations, but switches to the Heap variant if the number of items exceeds N.
      There are various crates for small strings and other data structures.
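The inline-then-spill behaviour described on this slide can be sketched with the standard library alone. This is a toy illustration, not the real `smallvec` crate: the name `TinyVec`, the `len` field, and the `Copy + Default` bounds are assumptions made to keep the example short.

```rust
/// Toy inline-or-heap vector. Hypothetical type, NOT the `smallvec` API.
#[derive(Debug)]
pub enum TinyVec<T: Copy + Default, const N: usize> {
    /// Up to N items stored inline; `len` counts the slots in use.
    Stack { items: [T; N], len: usize },
    /// Used once more than N items have been pushed.
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> TinyVec<T, N> {
    /// Starts inline with zero items and no heap allocation.
    pub fn new() -> Self {
        TinyVec::Stack { items: [T::default(); N], len: 0 }
    }

    /// Pushes inline while there is room, otherwise spills to a Vec.
    pub fn push(&mut self, value: T) {
        match self {
            TinyVec::Stack { items, len } if *len < N => {
                items[*len] = value;
                *len += 1;
            }
            TinyVec::Stack { items, len } => {
                // Inline storage is full: copy everything to the heap once.
                let mut spilled = Vec::with_capacity(*len + 1);
                spilled.extend_from_slice(&items[..*len]);
                spilled.push(value);
                *self = TinyVec::Heap(spilled);
            }
            TinyVec::Heap(v) => v.push(value),
        }
    }

    /// Views the stored items regardless of where they live.
    pub fn as_slice(&self) -> &[T] {
        match self {
            TinyVec::Stack { items, len } => &items[..*len],
            TinyVec::Heap(v) => v.as_slice(),
        }
    }

    /// True while no heap allocation has happened.
    pub fn is_inline(&self) -> bool {
        matches!(self, TinyVec::Stack { .. })
    }
}
```

With `N = 2`, the first two pushes stay inline and the third moves the data to the heap, which mirrors the "common case is tiny" assumption on the previous slide.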
  • 20. Static vs Dynamic Dispatch
      Often one will need to work with many different structs that implement a Trait -- for us, different operator implementations supporting different types. Static dispatch and inlined code is much faster.
      1. Monomorphisation using generics
          fn execute_op<O: Operator>(op: O) -> Result<...>
         - Compiler creates a new instance of execute_op for every different O
         - Only works when you know in advance what Operator to pass in
      2. Use Enums and enum_dispatch
          fn execute_op(op: OperatorEnum) -> Result<...>
      3. Dynamic dispatch
          fn execute_op(op: Box<dyn Operator>) -> Result<...>
          fn execute_op(op: &dyn Operator) -> Result<...>   (avoids allocation)
      4. Function wrapping
         - Embedding functions in a generic struct
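A minimal sketch of options 1 and 3 side by side may help. The `Operator` trait shape (`apply` on an `f64`) and the `Scale` and `Offset` operators are assumptions for illustration; the slide does not show the real trait.

```rust
/// Hypothetical operator trait; the real one is not shown in the talk.
pub trait Operator {
    fn apply(&self, input: f64) -> f64;
}

pub struct Scale(pub f64);
pub struct Offset(pub f64);

impl Operator for Scale {
    fn apply(&self, input: f64) -> f64 {
        input * self.0
    }
}

impl Operator for Offset {
    fn apply(&self, input: f64) -> f64 {
        input + self.0
    }
}

/// Option 1: generics. The compiler emits one copy per concrete `O`,
/// so the call to `apply` is resolved at compile time and can be inlined.
pub fn execute_static<O: Operator>(op: &O, input: f64) -> f64 {
    op.apply(input)
}

/// Option 3: a trait object. One copy of the function, with the call
/// to `apply` going through a vtable at run time.
pub fn execute_dyn(op: &dyn Operator, input: f64) -> f64 {
    op.apply(input)
}

/// Dynamic dispatch allows a mixed list chosen at run time,
/// at the cost of one Box allocation per operator.
pub fn run_chain(ops: &[Box<dyn Operator>], input: f64) -> f64 {
    ops.iter().fold(input, |acc, op| op.apply(acc))
}
```

The generic version needs the concrete operator type at the call site, which is the "only works when you know in advance" limitation from the slide; `run_chain` shows what the dynamic form buys in exchange.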
  • 21. enum_dispatch
      Suppose you have
          trait KnobControl {
              fn set_position(&mut self, value: f64);
              fn get_value(&self) -> f64;
          }

          struct LinearKnob {
              position: f64,
          }

          struct LogarithmicKnob {
              position: f64,
          }

          impl KnobControl for LinearKnob...
      enum_dispatch lets you do this:
          #[enum_dispatch]
          trait KnobControl {
              //...
          }
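To make the idea concrete, here is roughly the kind of code one would otherwise write by hand: an enum with one variant per implementor, a forwarding `impl`, and `From` conversions. This is a hand-written sketch using only the standard library, not the macro's literal output. The `Knob` enum name and the logarithmic formula in `get_value` are assumptions.

```rust
pub trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

pub struct LinearKnob {
    pub position: f64,
}

pub struct LogarithmicKnob {
    pub position: f64,
}

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }
    fn get_value(&self) -> f64 {
        self.position
    }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }
    // Assumed formula, chosen only so the two knobs behave differently.
    fn get_value(&self) -> f64 {
        (self.position + 1.0).ln()
    }
}

/// One variant per implementor; hypothetical name.
pub enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

/// Forwarding impl: a `match` replaces the vtable lookup of `dyn KnobControl`.
impl KnobControl for Knob {
    fn set_position(&mut self, value: f64) {
        match self {
            Knob::Linear(k) => k.set_position(value),
            Knob::Logarithmic(k) => k.set_position(value),
        }
    }
    fn get_value(&self) -> f64 {
        match self {
            Knob::Linear(k) => k.get_value(),
            Knob::Logarithmic(k) => k.get_value(),
        }
    }
}

impl From<LinearKnob> for Knob {
    fn from(k: LinearKnob) -> Self {
        Knob::Linear(k)
    }
}

impl From<LogarithmicKnob> for Knob {
    fn from(k: LogarithmicKnob) -> Self {
        Knob::Logarithmic(k)
    }
}
```

A `Vec<Knob>` can then hold mixed knobs without boxing, and every call is a plain `match` that the compiler is free to inline.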
  • 22. Function wrapping
      Static function wrapping - no generics
          pub struct OperatorWrapper {
              name: String,
              func: fn(input: &Data) -> Data,
          }
      Need a generic - but accepts closures
          pub struct OperatorWrapper<F>
          where
              F: Fn(&Data) -> Data,
          {
              name: String,
              func: F,
          }
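The difference between the two wrappers shows up when the wrapped function needs captured state. In the sketch below, `Data`, `FnOperator`, `ClosureOperator`, and `double` are hypothetical names; the two structs are named differently only so both fit in one file.

```rust
/// Hypothetical payload type standing in for the talk's `Data`.
#[derive(Debug, Clone, PartialEq)]
pub struct Data(pub f64);

/// Plain function pointer: no generics, but it cannot capture state.
pub struct FnOperator {
    name: String,
    func: fn(&Data) -> Data,
}

impl FnOperator {
    pub fn new(name: &str, func: fn(&Data) -> Data) -> Self {
        Self { name: name.to_string(), func }
    }
    pub fn name(&self) -> &str {
        &self.name
    }
    pub fn run(&self, input: &Data) -> Data {
        (self.func)(input)
    }
}

/// Generic over the callable: accepts closures that capture state,
/// and each distinct closure type gets its own monomorphised copy.
pub struct ClosureOperator<F>
where
    F: Fn(&Data) -> Data,
{
    name: String,
    func: F,
}

impl<F> ClosureOperator<F>
where
    F: Fn(&Data) -> Data,
{
    pub fn new(name: &str, func: F) -> Self {
        Self { name: name.to_string(), func }
    }
    pub fn name(&self) -> &str {
        &self.name
    }
    pub fn run(&self, input: &Data) -> Data {
        (self.func)(input)
    }
}

/// A free function usable with either wrapper.
pub fn double(input: &Data) -> Data {
    Data(input.0 * 2.0)
}
```

A closure such as `move |d: &Data| Data(d.0 + offset)` only fits `ClosureOperator`, because it captures `offset`; non-capturing closures and free functions work with both.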
  • 24. Rust Async: Different Paradigms
      "Async: It is well designed... Yes, it is still pretty complicated piece of code, but the logic or the framework is easier to grasp compared to other languages."
      Having to use Arc: data structures are not thread-safe by default!
      Scala | Rust
      Futures | futures, async functions
      ?? | async-await
      Actors (Akka) | Actix, Bastion, etc.
      Async streams | Tokio streams
      Reactive (Akka Streams, Monix, ZIO) | reactive_rs, rxRust, etc.
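The point about `Arc` can be illustrated without an async runtime. The sketch below uses plain `std::thread` as a stand-in for async tasks, which is an assumption for the sake of a dependency-free example; `count_events` is a hypothetical function.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Counts events from several workers into one shared total.
/// Sharing requires an explicit `Arc` (shared ownership) plus a
/// `Mutex` (exclusive access); a bare `u64` could not be mutated
/// from several threads at once.
pub fn count_events(workers: usize, events_per_worker: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own handle to the same counter.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..events_per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

The explicit `Arc::clone` and `lock()` calls are the visible cost of sharing, which fits the earlier remark that leaving the fast default path becomes explicit in the code.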
  • 25. Replacing Akka: Actors in Rust
      - Actix threading model doesn't mix well with Tokio
      - We moved to tiny-tokio-actor, then wrote our own
          pub struct AnomalyActor {}

          #[async_trait]
          impl ChannelActor<Anomaly, AnomalyActorError> for AnomalyActor {
              async fn handle(
                  &mut self,
                  msg: Anomaly,
                  ctx: &mut ActorContext<Anomaly>,
              ) -> Result<(), Report<AnomalyActorError>> {
                  use Anomaly::*;
                  match msg {
                      QuantityOverflowAnomaly {
                          ctx: _,
                          ts: _,
                          qual: _,
                          qty: _,
                          cnt: _,
                          data: _,
                      } => {}
                      PoisonPill => {
                          ctx.stop();
                      }
                  }
                  Ok(())
              }
          }
  • 26. Other Patterns to Learn
      Old Pattern | New Pattern
      No inheritance | Use composition! Compose data structures; compose small Traits
      No exceptions | Use Result and ?
      Data structures are not thread-safe | Learn to use Arc etc.
      Returning Iterators | Don't return things that borrow other things. This makes life difficult.
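The "Use Result and ?" row can be sketched as follows. The `SpanError` enum and `parse_interval` function are hypothetical, loosely modelled on the start/end times of the spans shown earlier.

```rust
use std::num::ParseIntError;

/// Hypothetical error type; each failure mode is a variant, not an exception class.
#[derive(Debug, PartialEq)]
pub enum SpanError {
    MissingSeparator,
    BadNumber(ParseIntError),
    EndBeforeStart,
}

/// Lets `?` convert a parse failure into `SpanError` automatically.
impl From<ParseIntError> for SpanError {
    fn from(err: ParseIntError) -> Self {
        SpanError::BadNumber(err)
    }
}

/// Parses "start-end" (for example "1000-1100") into a pair of timestamps.
/// Every `?` returns early with the error instead of throwing.
pub fn parse_interval(text: &str) -> Result<(u64, u64), SpanError> {
    let (start, end) = text.split_once('-').ok_or(SpanError::MissingSeparator)?;
    let start: u64 = start.trim().parse()?;
    let end: u64 = end.trim().parse()?;
    if end < start {
        return Err(SpanError::EndBeforeStart);
    }
    Ok((start, end))
}
```

The signature alone tells the caller which failures are possible, and the compiler forces each one to be handled or propagated.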
  • 27. Type Classes
      In Rust, type classes (Traits) are smaller and more compositional.
          pub trait Inhale {
              fn sniff(&self);
          }
      You can implement new Traits for existing types, and have different impl's for different types.
          impl Inhale for String {
              fn sniff(&self) {
                  println!("I sniffed {}", self);
              }
          }

          // Only implemented for specific N subtypes of MyStruct
          impl<N: Numeric> Inhale for MyStruct<N> {
              fn sniff(&self) {
                  ....
              }
          }
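A variant of the same idea that returns a value instead of printing may be easier to experiment with. The `Describe` trait and its output formats are hypothetical; the point is that a local trait can be implemented for standard-library types, and that generic impls compose.

```rust
/// Hypothetical small trait in the spirit of `Inhale`.
pub trait Describe {
    fn describe(&self) -> String;
}

/// Implemented for an existing standard-library type.
impl Describe for String {
    fn describe(&self) -> String {
        format!("text of {} bytes", self.len())
    }
}

/// A different impl for a different existing type.
impl Describe for f64 {
    fn describe(&self) -> String {
        format!("number {}", self)
    }
}

/// Only available when the element type is itself `Describe`,
/// similar to the `impl<N: Numeric>` bound on the slide.
impl<T: Describe> Describe for Vec<T> {
    fn describe(&self) -> String {
        let parts: Vec<String> = self.iter().map(|item| item.describe()).collect();
        format!("[{}]", parts.join(", "))
    }
}
```

Calling `describe()` on a `Vec<f64>` works because both impls exist, while a `Vec<bool>` is rejected at compile time since `bool` has no `Describe` impl.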
  • 29. "Cargo is the best build tool ever"
      - Almost no dependency conflicts due to multiple dep versioning
      - Configuration by convention: common directory/file layouts, for example
      - Really simple .toml: no need for XML, functional Scala, etc.
      - Rarely need code to build anything, even for large projects
          [package]
          name = "telemetry-subscribers"
          version = "0.3.0"
          license = "Apache-2.0"
          description = "Library for common telemetry and observability functionality"

          [dependencies]
          console-subscriber = { version = "0.1.6", optional = true }
          crossterm = "0.25.0"
          once_cell = "1.13.0"
          opentelemetry = { version = "0.18.0", features = ["rt-tokio"], optional = true }
  • 30. IDEs, CI, and Tooling
      IDEs/Editors | VSCode, RustRover (IntelliJ), vim/emacs/etc. with Rust Analyzer
      Code Coverage | VSCode inline, grcov/lcov, Tarpaulin (Linux only)
      Slow build times | Caching: cargo-chef, rust-cache
      Slow test times | cargo-nextest
      Property Testing | proptest
      Benchmarking | Criterion
      https://blog.logrocket.com/optimizing-ci-cd-pipelines-rust-projects/
      VSCode's "LiveShare" feature for distributed pair programming is TOP NOTCH.
  • 31. Rust Resources and Projects
      - https://github.com/velvia/links/blob/main/rust.md - this is my list of Rust projects and learning resources
      - https://github.com/rust-unofficial/awesome-rust
      - https://www.arewelearningyet.com - ML focused
  • 32. What do we miss from Scala?
      - More mature libraries, in some cases: HDFS, etc.
      - Good streaming libraries, like Monix, Akka Streams, etc.
      - I guess all of Akka
      - "Less misleading compiler messages"
        - Rust error messages read better from the CLI, IMO (not an IDE)
  • 33. Takeaways
      - It's a long journey but Rust is worth it.
      - Structuring a project for a successful onramp is really important
      - Think about data structure design early on
      - Allow plenty of time to ramp up on Rust patterns and tools
      We are hiring across multiple roles/levels!
  • 36. Data World is Going Native (from JVM)
      - The rise of Python and Data Science
        - Led to AnyScale, Dask, and many other Python-oriented data frameworks
      - Rise of newer, developer-friendly native languages (Go, Swift, Rust, etc.)
      - Migration from Hadoop/HDFS to more cloud-based data architectures
      - Apache Arrow and other data interchange formats
      - Hardware architecture trends: end of Moore's Law, rise of GPUs, etc.
  • 37. Why We Went with our Own Actors
      1. Initial Hackathon prototype used Actix
         - Actix has its own event-loop / threading model, using Arbiters
         - Difficult to co-exist with Tokio and configure both
      2. Moved to tiny-tokio-actor
         - Really thin layer on top of Tokio
         - 25% improvement over rdkafka + Tokio + Actix
      3. Ultimately wrote our own, 100-line mini Actor framework
         - tiny-tokio-actor required messages to be Clone, so we could not, for example, send OneShot channels for other actors to reply
         - Wanted ActorRef<MessageType> instead of ActorRef<ActorType, MessageType>
         - Supports tell() and ask() semantics
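The `tell()` / `ask()` shape described in point 3 can be sketched with standard-library channels and a thread. This is not the framework from the talk, which runs on Tokio; `std::sync::mpsc` and `std::thread` are stand-ins, and `CounterMsg`, `spawn_counter`, and `ask_total` are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages for a toy counter actor. The enum is not `Clone`,
/// and `Get` carries a reply channel inside the message.
pub enum CounterMsg {
    Add(u64),
    Get(Sender<u64>),
    Stop,
}

/// Handle typed only by the message type, as in `ActorRef<MessageType>`.
pub struct ActorRef<M> {
    tx: Sender<M>,
}

impl<M> ActorRef<M> {
    /// Fire-and-forget send; returns false if the actor has stopped.
    pub fn tell(&self, msg: M) -> bool {
        self.tx.send(msg).is_ok()
    }
}

impl ActorRef<CounterMsg> {
    /// Request/reply: sends a reply channel and blocks for the answer.
    pub fn ask_total(&self) -> Option<u64> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(CounterMsg::Get(reply_tx)).ok()?;
        reply_rx.recv().ok()
    }
}

/// Spawns the actor loop on its own thread and returns a handle to it.
pub fn spawn_counter() -> (ActorRef<CounterMsg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0u64;
        // Messages are processed one at a time, so `total` needs no lock.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    let _ = reply.send(total);
                }
                CounterMsg::Stop => break,
            }
        }
    });
    (ActorRef { tx }, handle)
}
```

Because the reply channel travels inside the message, the message type does not have to be `Clone`, which is the restriction the slide mentions running into.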
  • 38. Scala: Object Graphs and Any
          class Timeline extends BufferedIterator[Span[Payload]]

          final case class Span[+A](start: Timestamp, end: Timestamp, payload: A) {
            def mapPayload[B](f: A => B): Span[B] = copy(payload = f(payload))
          }

          type Event[+A] = Span[EventsAtSpanEnd[A]]

          @newtype final case class EventsAtSpanEnd[+A](events: Iterable[A])
      - BufferedIterator must be on the heap
      - Each Span Payload is also boxed and on the heap, even for numbers
      - To be dynamically interpretable, we need BufferedIterator[Span[Any]] in many places :(
      - Yes, specialization is possible, at the cost of complexity