Quark: A Purely-Functional
Scala DSL for Data
Processing & Analytics
John A. De Goes
@jdegoes -
Apache Spark
Apache Spark is a fast and general engine for big data
processing, with built-in modules for streaming, SQL,
machine learning and graph processing.
val textFile = sc.textFile("hdfs://...")
val counts =
  textFile.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
Spark Sucks
— Functional-ish
— Exceptions, typecasts
— SparkContext
— Serializable
— Unsafe type-safe programs
— Second-class support for databases
— Dependency hell (>100)
— Painful debugging
— Implementation-dependent performance
Why Does Spark Have to Suck?
val textFile = sc.textFile("hdfs://...")
val counts =
  textFile.flatMap(line => line.split(" ")) // <---- Where Spark goes wrong
    .map(word => (word, 1))                 // <---- Where Spark goes wrong
    .reduceByKey(_ + _)                     // <---- Where Spark goes wrong
— Purely functional
— No exceptions, no casts, no nulls
— No global variables
— No serialization
— Safe type-safe programs
— First-class support for databases
— Few dependencies
— Better debugging
— Implementation-independent performance
Rule #1 in Functional Programming
Don't solve the problem, describe the solution.
AKA the "Do Nothing" rule
=> Don't compute, embed a compiled language into
Quark is a Scala DSL built on Quasar Analytics, a general-
purpose compiler for translating data processing over
semi-structured data into efficient plans that execute
100% inside the target infrastructure.
val textFile = Dataset.load("...")
val counts =
  textFile.flatMap(line => line.typed[Str].split(" "))
    .map(word => (word, 1))
More Quark
val dataset = Dataset.load("/prod/profiles")
val averageAge = dataset.groupBy([Str]).map(_.age[Int]).reduceBy(_.average)
Quark Targets
One DSL to Rule Them All
— MongoDB
— Couchbase
— MarkLogic
— Hadoop / HDFS
— Add your connector here!
Both Quark and Quasar Analytics are purely-functional,
open source projects written in 100% Scala.
How To DSL
Adding Integers
sealed trait Expr
final case class Integer(v: Int) extends Expr
final case class Addition(l: Expr, r: Expr) extends Expr
def int(v: Int): Expr = Integer(v)
def add(l: Expr, r: Expr): Expr = Addition(l, r)
add(add(int(1), int(2)), int(3)) : Expr
def interpret(e: Expr): Int = e match {
  case Integer(v) => v
  case Addition(l, r) => interpret(l) + interpret(r)
}
def serialize(v: Expr): Json = ???
def deserialize(v: Json): Expr = ???
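A minimal, self-contained sketch of the point of this slide: because the program is only a description, more than one interpreter can run over the same value. The `render` interpreter and the wrapper object `IntExprDemo` are illustrative additions, not part of the original deck.

```scala
object IntExprDemo {
  sealed trait Expr
  final case class Integer(v: Int) extends Expr
  final case class Addition(l: Expr, r: Expr) extends Expr

  def int(v: Int): Expr = Integer(v)
  def add(l: Expr, r: Expr): Expr = Addition(l, r)

  // Interpreter 1: evaluate the description to an Int.
  def interpret(e: Expr): Int = e match {
    case Integer(v)     => v
    case Addition(l, r) => interpret(l) + interpret(r)
  }

  // Interpreter 2 (hypothetical): render the same description as text.
  def render(e: Expr): String = e match {
    case Integer(v)     => v.toString
    case Addition(l, r) => s"(${render(l)} + ${render(r)})"
  }

  // Building the program computes nothing; it only builds a tree.
  val program: Expr = add(add(int(1), int(2)), int(3))
}
```

Here `IntExprDemo.interpret(IntExprDemo.program)` evaluates the tree, while `IntExprDemo.render(IntExprDemo.program)` turns the same tree into a string that could, in principle, be shipped to another system.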
How To DSL
Adding Strings
sealed trait Expr
final case class Integer(v: Int) extends Expr
final case class Addition(l: Expr, r: Expr) extends Expr // Uh, oh!
final case class Str(v: String) extends Expr
final case class StringConcat(l: Expr, r: Expr) extends Expr // Uh, oh!
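A short sketch of why the two "Uh, oh!" comments matter, assuming the untyped encoding above. Since both operands are plain `Expr`, an ill-typed program compiles, and the interpreter has to detect the mistake at runtime. The `Value` type, the error messages, and the wrapper object are hypothetical names introduced only for this illustration.

```scala
object UntypedExprDemo {
  sealed trait Expr
  final case class Integer(v: Int) extends Expr
  final case class Addition(l: Expr, r: Expr) extends Expr
  final case class Str(v: String) extends Expr
  final case class StringConcat(l: Expr, r: Expr) extends Expr

  // Hypothetical result type: the interpreter can no longer promise an Int.
  sealed trait Value
  final case class IntValue(v: Int) extends Value
  final case class StrValue(v: String) extends Value

  // Type errors surface only when the expression is interpreted.
  def interpret(e: Expr): Either[String, Value] = e match {
    case Integer(v) => Right(IntValue(v))
    case Str(v)     => Right(StrValue(v))
    case Addition(l, r) =>
      (interpret(l), interpret(r)) match {
        case (Right(IntValue(a)), Right(IntValue(b))) => Right(IntValue(a + b))
        case _                                        => Left("Addition expects two integers")
      }
    case StringConcat(l, r) =>
      (interpret(l), interpret(r)) match {
        case (Right(StrValue(a)), Right(StrValue(b))) => Right(StrValue(a + b))
        case _                                        => Left("StringConcat expects two strings")
      }
  }

  // Compiles without complaint, although it adds a string to an integer.
  val illTyped: Expr = Addition(Str("a"), Integer(1))
}
```

Indexing `Expr` by a type parameter moves this check from the interpreter to the compiler.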
How To DSL
Phantom Type
sealed trait Expr[A]
final case class Integer(v: Int) extends Expr[Int]
final case class Addition(l: Expr[Int], r: Expr[Int]) extends Expr[Int]
final case class Str(v: String) extends Expr[String]
final case class StringConcat(l: Expr[String], r: Expr[String]) extends Expr[String]
def interpret[A](e: Expr[A]): A = e match {
  case Integer(v) => v
  case Addition(l, r) => interpret(l) + interpret(r)
  case Str(v) => v
  case StringConcat(l, r) => interpret(l) ++ interpret(r)
}
def serialize[A](v: Expr[A]): Json = ???
def deserialize(v: Json): Expr[A] forSome { type A } = ???
How To DSL
GADTs in Scala still have bugs
SI-8563, SI-9345, SI-6680
How To DSL
Finally Tagless
trait Expr[F[_]] {
  def int(v: Int): F[Int]
  def str(v: String): F[String]
  def add(l: F[Int], r: F[Int]): F[Int]
  def concat(l: F[String], r: F[String]): F[String]
}
trait Dsl[A] {
  def apply[F[_]](implicit F: Expr[F]): F[A]
}
def int(v: Int): Dsl[Int] = new Dsl[Int] {
  def apply[F[_]](implicit F: Expr[F]): F[Int] = F.int(v)
}
def add(l: Dsl[Int], r: Dsl[Int]): Dsl[Int] = new Dsl[Int] {
  def apply[F[_]](implicit F: Expr[F]): F[Int] = F.add(l.apply[F], r.apply[F])
}
// ...
How To DSL
Finally Tagless
type Id[A] = A
def interpret: Expr[Id] = new Expr[Id] {
  def int(v: Int): Id[Int] = v
  def str(v: String): Id[String] = v
  def add(l: Id[Int], r: Id[Int]): Id[Int] = l + r
  def concat(l: Id[String], r: Id[String]): Id[String] = l + r
}
add(int(1), int(2)).apply(interpret) // 3 (an Id[Int] is just an Int)
final case class Const[A, B](a: A)
def serialize: Expr[Const[Json, ?]] = ???
def deserialize[F[_]: Expr](json: Json): F[A] forSome { type A } = ???
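The finally tagless pieces from the last two slides can be assembled into one runnable sketch. The `Expr`, `Dsl`, `int`, `add`, and `Id` definitions follow the slides; the `Printed` type alias and the `render` interpreter are assumptions added here to show a second interpretation of the same program without `Json` or the kind-projector plugin.

```scala
object TaglessDemo {
  // The algebra: one method per DSL operation, abstract in the carrier F.
  trait Expr[F[_]] {
    def int(v: Int): F[Int]
    def str(v: String): F[String]
    def add(l: F[Int], r: F[Int]): F[Int]
    def concat(l: F[String], r: F[String]): F[String]
  }

  // A program is a value that can be run against any interpreter.
  trait Dsl[A] {
    def apply[F[_]](implicit F: Expr[F]): F[A]
  }

  def int(v: Int): Dsl[Int] = new Dsl[Int] {
    def apply[F[_]](implicit F: Expr[F]): F[Int] = F.int(v)
  }
  def add(l: Dsl[Int], r: Dsl[Int]): Dsl[Int] = new Dsl[Int] {
    def apply[F[_]](implicit F: Expr[F]): F[Int] = F.add(l.apply[F], r.apply[F])
  }

  // Interpreter 1: direct evaluation.
  type Id[A] = A
  val evaluate: Expr[Id] = new Expr[Id] {
    def int(v: Int): Id[Int] = v
    def str(v: String): Id[String] = v
    def add(l: Id[Int], r: Id[Int]): Id[Int] = l + r
    def concat(l: Id[String], r: Id[String]): Id[String] = l + r
  }

  // Interpreter 2 (hypothetical): ignore the type index and build a String.
  type Printed[A] = String
  val render: Expr[Printed] = new Expr[Printed] {
    def int(v: Int): Printed[Int] = v.toString
    def str(v: String): Printed[String] = "\"" + v + "\""
    def add(l: Printed[Int], r: Printed[Int]): Printed[Int] = s"($l + $r)"
    def concat(l: Printed[String], r: Printed[String]): Printed[String] = s"($l ++ $r)"
  }

  val program: Dsl[Int] = add(int(1), int(2))
}
```

Running `TaglessDemo.program.apply[TaglessDemo.Id](TaglessDemo.evaluate)` evaluates the program, and `TaglessDemo.program.apply[TaglessDemo.Printed](TaglessDemo.render)` prints it. The interpreters are passed explicitly here rather than declared `implicit`, only to keep the sketch free of ambiguity.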
Quark 101
The Building Blocks
— **Type**. Represents a reified type of an element in a dataset.
— **Dataset[A]**. Represents a dataset, produced by successive application of set-level operations (SetOps). Describes a directed-acyclic graph.
— **MappingFunc[A, B]**. Represents a function from A to B that is produced by successive application of mapping-level operations (MapOps) to the input.
— **ReduceFunc[A, B]**. Represents a reduction from A to B, produced by application of reduction-level operations (ReduceOps) to the input.
Let's Build Us a Mini-Quark!
Type System
sealed trait Type
object Type {
  final case class Unknown() extends Type
  final case class Timestamp() extends Type
  final case class Date() extends Type
  final case class Time() extends Type
  final case class Interval() extends Type
  final case class Int() extends Type
  final case class Dec() extends Type
  final case class Str() extends Type
  final case class Map[A <: Type, B <: Type](key: A, value: B) extends Type
  final case class Arr[A <: Type](element: A) extends Type
  final case class Tuple2[A <: Type, B <: Type](_1: A, _2: B) extends Type
  final case class Bool() extends Type
  final case class Null() extends Type
  type UnknownMap = Map[Unknown, Unknown]
  val UnknownMap : UnknownMap = Map(Unknown(), Unknown())
  type UnknownArr = Arr[Unknown]
  val UnknownArr : UnknownArr = Arr(Unknown())
  type Record[A <: Type] = Map[Str, A]
  type UnknownRecord = Record[Unknown]
}
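A small sketch of what "reified" means here, using a trimmed copy of the type system above: each `Type` exists both at the type level (as an index such as `Type.Map[Type.Str, Type.Arr[Type.Str]]`) and as a runtime value that can be inspected. The `show` function, the `tags` value, and the wrapper object are hypothetical and only serve the illustration.

```scala
object TypeDemo {
  sealed trait Type
  object Type {
    final case class Unknown() extends Type
    final case class Int() extends Type
    final case class Str() extends Type
    final case class Map[A <: Type, B <: Type](key: A, value: B) extends Type
    final case class Arr[A <: Type](element: A) extends Type
  }

  // Hypothetical helper: walk the runtime value of a reified type.
  def show(t: Type): String = t match {
    case Type.Unknown() => "Unknown"
    case Type.Int()     => "Int"
    case Type.Str()     => "Str"
    case Type.Map(k, v) => s"Map[${show(k)}, ${show(v)}]"
    case Type.Arr(e)    => s"Arr[${show(e)}]"
  }

  // The static type and the runtime value describe the same shape.
  val tags: Type.Map[Type.Str, Type.Arr[Type.Str]] =
    Type.Map(Type.Str(), Type.Arr(Type.Str()))
}
```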
Set-Level Operations
sealed trait SetOps[F[_]] {
  def read(path: String): F[Unknown]
}
sealed trait Dataset[A] {
  def apply[F[_]](implicit F: SetOps[F]): F[A]
}
object Dataset {
  def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
    def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
  }
}
sealed trait SetOps[F[_]] {
  def read(path: String): F[Unknown]
  def map[A, B](v: F[A], f: ???): F[B] // What goes here?
}
Mapping: Attempt #1
sealed trait SetOps[F[_]] {
  def read(path: String): F[Unknown]
  def map[A, B](v: F[A], f: F[A] => F[B]): F[B] // Doesn't really work...
}
Mapping: Attempt #2
sealed trait MappingFunc[A, B] {
  def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
}
trait MappingOps[F[_]] {
  def str(v: String): F[Type.Str]
  def project[K <: Type, V <: Type](v: F[Type.Map[K, V]], k: F[K]): F[V]
  def add(l: F[Type.Int], r: F[Type.Int]): F[Type.Int]
  def length[A <: Type](v: F[Type.Arr[A]]): F[Type.Int]
}
object MappingFunc {
  def id[A]: MappingFunc[A, A] = new MappingFunc[A, A] {
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[A] = v
  }
}
Mapping: Attempt #2
trait SetOps[F[_]] {
  def read(path: String): F[Unknown]
  def map[A, B](v: F[A], f: MappingFunc[A, B]): F[B] // Yay!!!
}
Dataset: Mapping
sealed trait Dataset[A] {
  def apply[F[_]](implicit F: SetOps[F]): F[A]
  def map[B](f: ???): Dataset[B] = ??? // What goes here???
}
object Dataset {
  def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
    def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
  }
}
Dataset: Mapping Attempt #1
sealed trait Dataset[A] { self =>
  def apply[F[_]](implicit F: SetOps[F]): F[A]
  def map[B](f: MappingFunc[A, B]): Dataset[B] = new Dataset[B] {
    def apply[F[_]](implicit F: SetOps[F]): F[B] = F.map(self.apply[F], f)
  }
}
object Dataset {
  def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
    def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
  }
}
// // Cannot ever work!
// dataset.map(v => v.profits[Dec] - v.losses[Dec]) // Cannot ever work!
Dataset: Mapping Attempt #2
sealed trait Dataset[A] { self =>
  def apply[F[_]](implicit F: SetOps[F]): F[A]
  def map[B](f: MappingFunc[A, A] => MappingFunc[A, B]): Dataset[B] = new Dataset[B] {
    def apply[F[_]](implicit F: SetOps[F]): F[B] = F.map(self.apply[F], f(MappingFunc.id[A]))
  }
}
object Dataset {
  def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
    def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
  }
}
// // Works with right methods on MappingFunc!
// dataset.map(v => v.profits[Dec] - v.losses[Dec]) // Works with right methods on MappingFunc!
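To see how the second attempt fits together end to end, here is a hedged, self-contained mini-Quark sketch. `SetOps`, `Dataset`, `MappingFunc`, and `MappingFunc.id` follow the slides. The `field` and `subtract` operations, the `Query` string interpreter, and the `/sales` path are assumptions made for this example; they are not Quark's actual API, which uses `Dynamic` field access and operators instead.

```scala
object MiniQuarkDemo {
  sealed trait Type
  object Type {
    final case class Unknown() extends Type
    final case class Dec() extends Type
  }

  // Mapping-level algebra (field and subtract are hypothetical stand-ins).
  trait MappingOps[F[_]] {
    def field[A, B](v: F[A], name: String): F[B]
    def subtract(l: F[Type.Dec], r: F[Type.Dec]): F[Type.Dec]
  }

  trait MappingFunc[A, B] { self =>
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
    // Hypothetical helper: project a named field from the current result.
    def field[C](name: String): MappingFunc[A, C] = new MappingFunc[A, C] {
      def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[C] =
        F.field[B, C](self.apply[F](v), name)
    }
  }
  object MappingFunc {
    def id[A]: MappingFunc[A, A] = new MappingFunc[A, A] {
      def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[A] = v
    }
    def subtract[A](l: MappingFunc[A, Type.Dec], r: MappingFunc[A, Type.Dec]): MappingFunc[A, Type.Dec] =
      new MappingFunc[A, Type.Dec] {
        def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[Type.Dec] =
          F.subtract(l.apply[F](v), r.apply[F](v))
      }
  }

  // Set-level algebra and the Dataset description built on top of it.
  trait SetOps[F[_]] {
    def read(path: String): F[Type.Unknown]
    def map[A, B](v: F[A], f: MappingFunc[A, B]): F[B]
  }
  trait Dataset[A] { self =>
    def apply[F[_]](implicit F: SetOps[F]): F[A]
    def map[B](f: MappingFunc[A, A] => MappingFunc[A, B]): Dataset[B] = new Dataset[B] {
      def apply[F[_]](implicit F: SetOps[F]): F[B] = F.map(self.apply[F], f(MappingFunc.id[A]))
    }
  }
  object Dataset {
    def read(path: String): Dataset[Type.Unknown] = new Dataset[Type.Unknown] {
      def apply[F[_]](implicit F: SetOps[F]): F[Type.Unknown] = F.read(path)
    }
  }

  // Hypothetical interpreter: compile the description to a query string.
  type Query[A] = String
  val mapQuery: MappingOps[Query] = new MappingOps[Query] {
    def field[A, B](v: Query[A], name: String): Query[B] = s"$v.$name"
    def subtract(l: Query[Type.Dec], r: Query[Type.Dec]): Query[Type.Dec] = s"($l - $r)"
  }
  val setQuery: SetOps[Query] = new SetOps[Query] {
    def read(path: String): Query[Type.Unknown] = s"read($path)"
    def map[A, B](v: Query[A], f: MappingFunc[A, B]): Query[B] =
      s"$v.map(row => ${f.apply[Query]("row")(mapQuery)})"
  }

  val netProfit: Dataset[Type.Dec] =
    Dataset.read("/sales").map(v =>
      MappingFunc.subtract(v.field[Type.Dec]("profits"), v.field[Type.Dec]("losses")))
}
```

Nothing is read or computed when `netProfit` is built. Only `MiniQuarkDemo.netProfit.apply[MiniQuarkDemo.Query](MiniQuarkDemo.setQuery)` produces output, and that output is a plan, which is the property that lets a real backend push the work into the target database.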
Dataset: Mapping Binary Operators
val netProfit = dataset.map(v => v.netRevenue[Dec] - v.netCosts[Dec])
MappingFuncs Are Arrows!
trait MappingFunc[A <: Type, B <: Type] extends Dynamic { self =>
  import MappingFunc.Case
  def apply[F[_]: MappingOps](v: F[A]): F[B]
  def >>> [C <: Type](that: MappingFunc[B, C]): MappingFunc[A, C] = new MappingFunc[A, C] {
    def apply[F[_]: MappingOps](v: F[A]): F[C] = that.apply[F](self.apply[F](v))
  }
  def + (that: MappingFunc[A, B])(implicit W: NumberLike[B]): MappingFunc[A, B] = new MappingFunc[A, B] {
    def apply[F[_]: MappingOps](v: F[A]): F[B] = MappingOps[F].add(self(v), that(v))
  }
  def - (that: MappingFunc[A, B])(implicit W: NumberLike[B]): MappingFunc[A, B] = new MappingFunc[A, B] {
    def apply[F[_]: MappingOps](v: F[A]): F[B] = MappingOps[F].subtract(self(v), that(v))
  }
}
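A reduced sketch of the arrow-style composition, assuming a cut-down `MappingOps` with only `add`, `length`, and a hypothetical `int` literal. The `length` and `plus` primitives and the `Rendered` string interpreter are illustrative names, not part of Quark. It shows that `>>>` builds a new description by feeding the output of one mapping function into the next.

```scala
object ArrowDemo {
  sealed trait Type
  object Type {
    final case class Int() extends Type
    final case class Str() extends Type
    final case class Arr[A <: Type](element: A) extends Type
  }

  trait MappingOps[F[_]] {
    def int(v: Int): F[Type.Int] // hypothetical literal operation
    def add(l: F[Type.Int], r: F[Type.Int]): F[Type.Int]
    def length[A <: Type](v: F[Type.Arr[A]]): F[Type.Int]
  }

  trait MappingFunc[A, B] { self =>
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
    // Sequential composition: run self, then feed its result to that.
    def >>>[C](that: MappingFunc[B, C]): MappingFunc[A, C] = new MappingFunc[A, C] {
      def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[C] =
        that.apply[F](self.apply[F](v))
    }
  }

  // Two primitive mapping functions (hypothetical helpers).
  def length[A <: Type]: MappingFunc[Type.Arr[A], Type.Int] =
    new MappingFunc[Type.Arr[A], Type.Int] {
      def apply[F[_]](v: F[Type.Arr[A]])(implicit F: MappingOps[F]): F[Type.Int] = F.length(v)
    }
  def plus(n: Int): MappingFunc[Type.Int, Type.Int] =
    new MappingFunc[Type.Int, Type.Int] {
      def apply[F[_]](v: F[Type.Int])(implicit F: MappingOps[F]): F[Type.Int] = F.add(v, F.int(n))
    }

  // Hypothetical interpreter that renders the composed function as text.
  type Rendered[A] = String
  val render: MappingOps[Rendered] = new MappingOps[Rendered] {
    def int(v: Int): Rendered[Type.Int] = v.toString
    def add(l: Rendered[Type.Int], r: Rendered[Type.Int]): Rendered[Type.Int] = s"($l + $r)"
    def length[A <: Type](v: Rendered[Type.Arr[A]]): Rendered[Type.Int] = s"length($v)"
  }

  val lengthPlusOne: MappingFunc[Type.Arr[Type.Str], Type.Int] =
    length[Type.Str] >>> plus(1)
}
```

The compiler rejects `plus(1) >>> length[Type.Str]`, because the output type of the first function must match the input type of the second.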
Applicative Composition
[Diagram: MappingFunc[A, B] and MappingFunc[A, C], both starting from the same input A, combine into a single MappingFunc[A, B ⊕ C].]
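One way to read the diagram is as a "fan-out": two mapping functions over the same input are merged into one that yields both results. The sketch below assumes that `B ⊕ C` can be modelled as a `Type.Tuple2[B, C]`; the slide does not say what `⊕` stands for, so this is an interpretation. The `zip`, `field`, and `tuple` names, the renamed `Tuple2` fields, and the `Rendered` interpreter are hypothetical.

```scala
object FanoutDemo {
  sealed trait Type
  object Type {
    final case class Unknown() extends Type
    final case class Int() extends Type
    final case class Str() extends Type
    final case class Tuple2[A <: Type, B <: Type](fst: A, snd: B) extends Type
  }

  trait MappingOps[F[_]] {
    def field[A <: Type, B <: Type](v: F[A], name: String): F[B]            // hypothetical
    def tuple[A <: Type, B <: Type](l: F[A], r: F[B]): F[Type.Tuple2[A, B]] // hypothetical
  }

  trait MappingFunc[A <: Type, B <: Type] { self =>
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
    // Applicative-style composition: both functions see the same input.
    def zip[C <: Type](that: MappingFunc[A, C]): MappingFunc[A, Type.Tuple2[B, C]] =
      new MappingFunc[A, Type.Tuple2[B, C]] {
        def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[Type.Tuple2[B, C]] =
          F.tuple(self.apply[F](v), that.apply[F](v))
      }
  }

  def field[A <: Type, B <: Type](name: String): MappingFunc[A, B] = new MappingFunc[A, B] {
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B] = F.field[A, B](v, name)
  }

  // Hypothetical interpreter that renders the combined function as text.
  type Rendered[A] = String
  val render: MappingOps[Rendered] = new MappingOps[Rendered] {
    def field[A <: Type, B <: Type](v: Rendered[A], name: String): Rendered[B] = s"$v.$name"
    def tuple[A <: Type, B <: Type](l: Rendered[A], r: Rendered[B]): Rendered[Type.Tuple2[A, B]] =
      s"($l, $r)"
  }

  val nameAndAge: MappingFunc[Type.Unknown, Type.Tuple2[Type.Str, Type.Int]] =
    field[Type.Unknown, Type.Str]("name").zip(field[Type.Unknown, Type.Int]("age"))
}
```

Unlike `>>>`, neither side of `zip` depends on the result of the other, so a backend is free to evaluate both projections in a single pass over each row.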
Learn More
— Finally Tagless:
— Quark:
— Quasar:
@jdegoes -

Generative Artificial Intelligence: How generative AI works.pdf
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .

Quark: A Purely-Functional Scala DSL for Data Processing & Analytics

  • 1. Quark: A Purely-Functional Scala DSL for Data Processing & Analytics John A. De Goes @jdegoes -
  • 2. Apache Spark Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
      val textFile = sc.textFile("hdfs://...")
      val counts =
        textFile.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
  • 3. Spark Sucks — Functional-ish — Exceptions, typecasts — SparkContext — Serializable — Unsafe type-safe programs — Second-class support for databases — Dependency hell (>100) — Painful debugging — Implementation-dependent performance
  • 4. Why Does Spark Have to Suck? Computation
      val textFile = sc.textFile("hdfs://...")
      val counts =
        textFile.flatMap(line => line.split(" "))  <---- Where Spark goes wrong
          .map(word => (word, 1))                  <---- Where Spark goes wrong
          .reduceByKey(_ + _)                      <---- Where Spark goes wrong
  • 5. WWFPD? — Purely functional — No exceptions, no casts, no nulls — No global variables — No serialization — Safe type-safe programs — First-class support for databases — Few dependencies — Better debugging — Implementation-independent performance
  • 6. Rule #1 in Functional Programming Don't solve the problem, describe the solution. AKA the "Do Nothing" rule => Don't compute, embed a compiled language into Scala
  • 7. Quark Compilation Quark is a Scala DSL built on Quasar Analytics, a general-purpose compiler for translating data processing over semi-structured data into efficient plans that execute 100% inside the target infrastructure.
      val textFile = Dataset.load("...")
      val counts =
        textFile.flatMap(line => line.typed[Str].split(" "))
          .map(word => (word, 1))
          .reduceByKey(_.sum)
  • 8. More Quark Compilation
      val dataset = Dataset.load("/prod/profiles")
      val averageAge = dataset.groupBy([Str]).map(_.age[Int]).reduceBy(_.average)
  • 9. Quark Targets One DSL to Rule Them All — MongoDB — Couchbase — MarkLogic — Hadoop / HDFS — Add your connector here!
  • 10. Both Quark and Quasar Analytics are purely-functional, open source projects written in 100% Scala.
  • 11. How To DSL Adding Integers
      sealed trait Expr
      final case class Integer(v: Int) extends Expr
      final case class Addition(l: Expr, r: Expr) extends Expr
      def int(v: Int): Expr = Integer(v)
      def add(l: Expr, r: Expr): Expr = Addition(l, r)
      add(add(int(1), int(2)), int(3)) : Expr
      def interpret(e: Expr): Int = e match {
        case Integer(v)     => v
        case Addition(l, r) => interpret(l) + interpret(r)
      }
      def serialize(v: Expr): Json = ???
      def deserialize(v: Json): Expr = ???
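The integer DSL from this slide can be exercised end to end. The following is a minimal sketch, not code from the talk: the `InitialEncoding` wrapper object and the `render` interpreter are hypothetical additions, included only to show that one description of a program can be handed to more than one interpreter.

```scala
object InitialEncoding {
  // The tree only describes a computation; nothing is computed while it is built.
  sealed trait Expr
  final case class Integer(v: Int) extends Expr
  final case class Addition(l: Expr, r: Expr) extends Expr

  def int(v: Int): Expr = Integer(v)
  def add(l: Expr, r: Expr): Expr = Addition(l, r)

  // Interpreter 1: evaluate the description to an Int.
  def interpret(e: Expr): Int = e match {
    case Integer(v)     => v
    case Addition(l, r) => interpret(l) + interpret(r)
  }

  // Interpreter 2 (hypothetical, not on the slide): render the same tree as text.
  def render(e: Expr): String = e match {
    case Integer(v)     => v.toString
    case Addition(l, r) => "(" + render(l) + " + " + render(r) + ")"
  }

  val program: Expr = add(add(int(1), int(2)), int(3))
}
```

Calling `InitialEncoding.interpret(InitialEncoding.program)` evaluates the tree, while `InitialEncoding.render` walks the same value and produces a string instead.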
  • 12. How To DSL Adding Strings
      sealed trait Expr
      final case class Integer(v: Int) extends Expr
      final case class Addition(l: Expr, r: Expr) extends Expr // Uh, oh!
      final case class Str(v: String) extends Expr
      final case class StringConcat(l: Expr, r: Expr) extends Expr // Uh, oh!
  • 13. How To DSL Phantom Type
      sealed trait Expr[A]
      final case class Integer(v: Int) extends Expr[Int]
      final case class Addition(l: Expr[Int], r: Expr[Int]) extends Expr[Int]
      final case class Str(v: String) extends Expr[String]
      final case class StringConcat(l: Expr[String], r: Expr[String]) extends Expr[String]
      def interpret[A](e: Expr[A]): A = e match {
        case Integer(v)         => v
        case Addition(l, r)     => interpret(l) + interpret(r)
        case Str(v)             => v
        case StringConcat(l, r) => interpret(l) ++ interpret(r)
      }
      def serialize[A](v: Expr[A]): Json = ???
      def deserialize[Z](v: Json): Expr[A] forSome { type A } = ???
  • 14. How To DSL GADTs in Scala still have bugs SI-8563, SI-9345, SI-6680 FRIENDS DON'T LET FRIENDS USE GADTS IN SCALA.
  • 15. How To DSL Finally Tagless
      trait Expr[F[_]] {
        def int(v: Int): F[Int]
        def str(v: String): F[String]
        def add(l: F[Int], r: F[Int]): F[Int]
        def concat(l: F[String], r: F[String]): F[String]
      }
      trait Dsl[A] {
        def apply[F[_]](implicit F: Expr[F]): F[A]
      }
      def int(v: Int): Dsl[Int] = new Dsl[Int] {
        def apply[F[_]](implicit F: Expr[F]): F[Int] = F.int(v)
      }
      def add(l: Dsl[Int], r: Dsl[Int]): Dsl[Int] = new Dsl[Int] {
        def apply[F[_]](implicit F: Expr[F]): F[Int] = F.add(l.apply[F], r.apply[F])
      }
      // ...
  • 16. How To DSL Finally Tagless
      type Id[A] = A
      def interpret: Expr[Id] = new Expr[Id] {
        def int(v: Int): Id[Int] = v
        def str(v: String): Id[String] = v
        def add(l: Id[Int], r: Id[Int]): Id[Int] = l + r
        def concat(l: Id[String], r: Id[String]): Id[String] = l + r
      }
      add(int(1), int(2)).apply(interpret) // Id(3)
      final case class Const[A, B](a: A)
      def serialize: Expr[Const[Json, ?]] = ???
      def deserialize[F[_]: Expr](json: Json): F[A] forSome { type A } = ???
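The finally tagless encoding from the two slides above can be assembled into one runnable piece. This is a sketch under assumptions: the `TaglessDemo` wrapper, the `Printed` type alias, and the `printer` interpreter are hypothetical names that do not appear in the talk. The pretty-printer stands in for the slide's `serialize`, since it needs no JSON library or compiler plugin.

```scala
object TaglessDemo {
  // The algebra: each DSL operation is a method, abstract in the result wrapper F.
  trait Expr[F[_]] {
    def int(v: Int): F[Int]
    def str(v: String): F[String]
    def add(l: F[Int], r: F[Int]): F[Int]
    def concat(l: F[String], r: F[String]): F[String]
  }

  // A program is a value that can be interpreted into any F that has an Expr[F].
  trait Dsl[A] {
    def apply[F[_]](implicit F: Expr[F]): F[A]
  }

  def int(v: Int): Dsl[Int] = new Dsl[Int] {
    def apply[F[_]](implicit F: Expr[F]): F[Int] = F.int(v)
  }

  def add(l: Dsl[Int], r: Dsl[Int]): Dsl[Int] = new Dsl[Int] {
    def apply[F[_]](implicit F: Expr[F]): F[Int] = F.add(l.apply[F], r.apply[F])
  }

  // Interpreter 1: evaluate directly, as on the slide.
  type Id[A] = A
  val evaluate: Expr[Id] = new Expr[Id] {
    def int(v: Int): Id[Int] = v
    def str(v: String): Id[String] = v
    def add(l: Id[Int], r: Id[Int]): Id[Int] = l + r
    def concat(l: Id[String], r: Id[String]): Id[String] = l + r
  }

  // Interpreter 2 (hypothetical): ignore the phantom type and build a String.
  type Printed[A] = String
  val printer: Expr[Printed] = new Expr[Printed] {
    def int(v: Int): Printed[Int] = v.toString
    def str(v: String): Printed[String] = "\"" + v + "\""
    def add(l: Printed[Int], r: Printed[Int]): Printed[Int] = s"($l + $r)"
    def concat(l: Printed[String], r: Printed[String]): Printed[String] = s"($l ++ $r)"
  }

  val program: Dsl[Int] = add(int(1), int(2))
  val result: Int = program.apply[Id](evaluate)
  val text: String = program.apply[Printed](printer)
}
```

The same `program` value yields a number under `evaluate` and a string under `printer`. An expression such as `add(int(1), str("a"))` would be rejected by the compiler, because `add` only accepts `Dsl[Int]` arguments.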
  • 17. Quark 101 The Building Blocks
      — Type. Represents a reified type of an element in a dataset.
      — Dataset[A]. Represents a dataset, produced by successive application of set-level operations (SetOps). Describes a directed acyclic graph.
      — MappingFunc[A, B]. Represents a function from A to B that is produced by successive application of mapping-level operations (MapOps) to the input.
      — ReduceFunc[A, B]. Represents a reduction from A to B, produced by application of reduction-level operations (ReduceOps) to the input.
  • 18. Let's Build Us a Mini-Quark!
  • 19. Mini-Quark Type System
      sealed trait Type
      object Type {
        final case class Unknown() extends Type
        final case class Timestamp() extends Type
        final case class Date() extends Type
        final case class Time() extends Type
        final case class Interval() extends Type
        final case class Int() extends Type
        final case class Dec() extends Type
        final case class Str() extends Type
        final case class Map[A <: Type, B <: Type](key: A, value: B) extends Type
        final case class Arr[A <: Type](element: A) extends Type
        final case class Tuple2[A <: Type, B <: Type](_1: A, _2: B) extends Type
        final case class Bool() extends Type
        final case class Null() extends Type
        type UnknownMap = Map[Unknown, Unknown]
        val UnknownMap : UnknownMap = Map(Unknown(), Unknown())
        type UnknownArr = Arr[Unknown]
        val UnknownArr : UnknownArr = Arr(Unknown())
        type Record[A <: Type] = Map[Str, A]
        type UnknownRecord = Record[Unknown]
      }
  • 20. Mini-Quark Set-Level Operations
      sealed trait SetOps[F[_]] {
        def read(path: String): F[Unknown]
      }
  • 21. Mini-Quark Dataset
      sealed trait Dataset[A] {
        def apply[F[_]](implicit F: SetOps[F]): F[A]
      }
      object Dataset {
        def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
          def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
        }
      }
  • 22. Mini-Quark Mapping
      sealed trait SetOps[F[_]] {
        def read(path: String): F[Unknown]
        def map[A, B](v: F[A], f: ???) // What goes here?
      }
  • 23. Mini-Quark Mapping: Attempt #1
      sealed trait SetOps[F[_]] {
        def read(path: String): F[Unknown]
        def map[A, B](v: F[A], f: F[A] => F[B]) // Doesn't really work...
      }
  • 24. Mini-Quark Mapping: Attempt #2
      sealed trait MappingFunc[A, B] {
        def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
      }
      trait MappingOps[F[_]] {
        def str(v: String): F[Type.Str]
        def project[K <: Type, V <: Type](v: F[Type.Map[K, V]], k: F[K]): F[V]
        def add(l: F[Type.Int], r: F[Type.Int]): F[Type.Int]
        def length[A <: Type](v: F[Type.Arr[A]]): F[Type.Int]
        ...
      }
      object MappingOps {
        def id[A]: MappingFunc[A, A] = new MappingFunc[A, A] {
          def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[A] = v
        }
      }
  • 25. Mini-Quark Mapping: Attempt #2
      trait SetOps[F[_]] {
        def read(path: String): F[Unknown]
        def map[A, B](v: F[A], f: MappingFunc[A, B]): F[B] // Yay!!!
      }
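To make the point of Attempt #2 concrete, here is a small sketch in which a `SetOps` interpreter inspects a `MappingFunc` instead of running it. Several names are assumptions for illustration and are not from the talk: the `Plan` alias, the `asInt` operation (a stand-in for a typed cast), the `increment` function, and the textual plan format. Because a `MappingFunc` is polymorphic in `F`, the set-level interpreter can apply it to a symbolic variable and obtain a description of the function body, which a plain Scala function `F[A] => F[B]` would not offer in the same way.

```scala
object MiniQuarkMapping {
  // Phantom element types, a tiny subset of the Type ADT on the slides.
  sealed trait Type
  object Type {
    final case class Unknown() extends Type
    final case class Int() extends Type
  }

  trait MappingOps[F[_]] {
    def int(v: scala.Int): F[Type.Int]
    def asInt(v: F[Type.Unknown]): F[Type.Int] // hypothetical cast operation
    def add(l: F[Type.Int], r: F[Type.Int]): F[Type.Int]
  }

  // A mapping function is a program over MappingOps, abstract in F.
  trait MappingFunc[A, B] {
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]
  }

  trait SetOps[F[_]] {
    def read(path: String): F[Type.Unknown]
    def map[A, B](v: F[A], f: MappingFunc[A, B]): F[B]
  }

  // Hypothetical target: a textual "plan" rather than an executed result.
  type Plan[A] = String

  val exprPlan: MappingOps[Plan] = new MappingOps[Plan] {
    def int(v: scala.Int): Plan[Type.Int] = v.toString
    def asInt(v: Plan[Type.Unknown]): Plan[Type.Int] = s"int($v)"
    def add(l: Plan[Type.Int], r: Plan[Type.Int]): Plan[Type.Int] = s"($l + $r)"
  }

  val setPlan: SetOps[Plan] = new SetOps[Plan] {
    def read(path: String): Plan[Type.Unknown] = s"read($path)"
    def map[A, B](v: Plan[A], f: MappingFunc[A, B]): Plan[B] = {
      // Apply f to the symbolic variable "x" to recover a description of its body.
      val body: Plan[B] = f.apply[Plan]("x")(exprPlan)
      s"map($v, x => $body)"
    }
  }

  // Hypothetical mapping function: cast each element to Int and add 1.
  val increment: MappingFunc[Type.Unknown, Type.Int] =
    new MappingFunc[Type.Unknown, Type.Int] {
      def apply[F[_]](v: F[Type.Unknown])(implicit F: MappingOps[F]): F[Type.Int] =
        F.add(F.asInt(v), F.int(1))
    }

  val plan: String =
    setPlan.map[Type.Unknown, Type.Int](setPlan.read("/data"), increment)
}
```

Here `plan` is a string describing the whole pipeline, including the body of the mapping function. A real backend would presumably emit a query for the target database at this point rather than text.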
  • 26. Mini-Quark Dataset: Mapping
      sealed trait Dataset[A] {
        def apply[F[_]](implicit F: SetOps[F]): F[A]
        def map[B](f: ???): Dataset[B] = ??? // What goes here???
      }
      object Dataset {
        def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
          def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
        }
      }
  • 27. Mini-Quark Dataset: Mapping Attempt #1
      sealed trait Dataset[A] { self =>
        def apply[F[_]](implicit F: SetOps[F]): F[A]
        def map[B](f: MappingFunc[A, B]): Dataset[B] = new Dataset[B] {
          def apply[F[_]](implicit F: SetOps[F]): F[B] = F.map(self.apply[F], f)
        }
      }
      object Dataset {
        def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
          def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
        }
      }
      // // Cannot ever work!
      // => v.profits[Dec] - v.losses[Dec]) // Cannot ever work!
  • 28. Mini-Quark Dataset: Mapping Attempt #2
      sealed trait Dataset[A] { self =>
        def apply[F[_]](implicit F: SetOps[F]): F[A]
        def map[B](f: MappingFunc[A, A] => MappingFunc[A, B]): Dataset[B] = new Dataset[B] {
          def apply[F[_]](implicit F: SetOps[F]): F[B] = F.map(self.apply[F], f(MappingOps.id[A]))
        }
      }
      object Dataset {
        def read(path: String): Dataset[Unknown] = new Dataset[Unknown] {
          def apply[F[_]](implicit F: SetOps[F]): F[Unknown] = F.read(path)
        }
      }
      // // Works with right methods on MappingFunc!
      // => v.profits[Dec] - v.losses[Dec]) // Works with right methods on MappingFunc!
  • 29. Mini-Quark Dataset: Mapping Binary Operators val netProfit = => v.netRevenue[Dec] - v.netCosts[Dec])
  • 30. Mini-Quark MappingFuncs Are Arrows!
      trait MappingFunc[A <: Type, B <: Type] extends Dynamic { self =>
        import MappingFunc.Case
        def apply[F[_]: MappingOps](v: F[A]): F[B]
        def >>> [C <: Type](that: MappingFunc[B, C]): MappingFunc[A, C] = new MappingFunc[A, C] {
          def apply[F[_]: MappingOps](v: F[A]): F[C] = that.apply[F](self.apply[F](v))
        }
        def + (that: MappingFunc[A, B])(implicit W: NumberLike[B]): MappingFunc[A, B] = new MappingFunc[A, B] {
          def apply[F[_]: MappingOps](v: F[A]): F[B] = MappingOps[F].add(self(v), that(v))
        }
        def - (that: MappingFunc[A, B])(implicit W: NumberLike[B]): MappingFunc[A, B] = new MappingFunc[A, B] {
          def apply[F[_]: MappingOps](v: F[A]): F[B] = MappingOps[F].subtract(self(v), that(v))
        }
        ...
      }
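The operator idea on this slide can be tried out in a reduced form. The sketch below is not the Quark implementation and makes several simplifying assumptions: field access uses a hypothetical `dec("name")` method instead of `Dynamic`, the `-` operator is supplied by an implicit class instead of a `NumberLike` constraint, and `describe` is a hypothetical stand-in for what a `Dataset.map` taking `MappingFunc[A, A] => MappingFunc[A, B]` would do with the user's lambda.

```scala
object MiniQuarkArrows {
  sealed trait Type
  object Type {
    final case class Unknown() extends Type
    final case class Dec() extends Type
  }

  trait MappingOps[F[_]] {
    def field[B <: Type](v: F[Type.Unknown], name: String): F[B] // hypothetical projection
    def subtract(l: F[Type.Dec], r: F[Type.Dec]): F[Type.Dec]
  }

  trait MappingFunc[A <: Type, B <: Type] { self =>
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[B]

    // Arrow-style composition: run this function, then `that`.
    def >>>[C <: Type](that: MappingFunc[B, C]): MappingFunc[A, C] =
      new MappingFunc[A, C] {
        def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[C] =
          that.apply[F](self.apply[F](v))
      }
  }

  def id[A <: Type]: MappingFunc[A, A] = new MappingFunc[A, A] {
    def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[A] = v
  }

  // Hypothetical typed field access on untyped records.
  implicit class RecordOps[A <: Type](rec: MappingFunc[A, Type.Unknown]) {
    def dec(name: String): MappingFunc[A, Type.Dec] = new MappingFunc[A, Type.Dec] {
      def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[Type.Dec] =
        F.field[Type.Dec](rec.apply[F](v), name)
    }
  }

  // Binary operator that builds a new MappingFunc from two existing ones.
  implicit class DecOps[A <: Type](lhs: MappingFunc[A, Type.Dec]) {
    def -(rhs: MappingFunc[A, Type.Dec]): MappingFunc[A, Type.Dec] =
      new MappingFunc[A, Type.Dec] {
        def apply[F[_]](v: F[A])(implicit F: MappingOps[F]): F[Type.Dec] =
          F.subtract(lhs.apply[F](v), rhs.apply[F](v))
      }
  }

  // Hypothetical interpreter that renders a mapping function as text.
  type Rendered[A] = String
  val render: MappingOps[Rendered] = new MappingOps[Rendered] {
    def field[B <: Type](v: Rendered[Type.Unknown], name: String): Rendered[B] = s"$v.$name"
    def subtract(l: Rendered[Type.Dec], r: Rendered[Type.Dec]): Rendered[Type.Dec] = s"($l - $r)"
  }

  // Hand the user's lambda the identity function and render what it builds.
  def describe[B <: Type](
      f: MappingFunc[Type.Unknown, Type.Unknown] => MappingFunc[Type.Unknown, B]): String =
    f(id[Type.Unknown]).apply[Rendered]("row")(render)

  val netProfit: String = describe(v => v.dec("netRevenue") - v.dec("netCosts"))
  val piped: String =
    (id[Type.Unknown].dec("a") >>> id[Type.Dec]).apply[Rendered]("row")(render)
}
```

The lambda passed to `describe` looks like ordinary Scala, but every operator in it returns another `MappingFunc`, so the result is a description that an interpreter can translate rather than a value computed in the JVM.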
  • 31. Mini-Quark Applicative Composition [Diagram: MappingFunc[A, B] (from A to B) and MappingFunc[A, C] (from A to C) combine into MappingFunc[A, B ⊕ C]]
  • 32. Learn More — Finally Tagless: — Quark: — Quasar: THANK YOU @jdegoes -