10–17. The stack:
Summingbird is “on top of” Scalding or Storm,
which is “on top of” Cascading,
which is “on top of” Hadoop.
Spark is currently a bit separate: it shares HDFS (yes), but not MapReduce (no; possibly soon?!),
and apart from its streaming component it has little to do with this stack.
This talk focuses on Scalding.
https://github.com/twitter/scalding
http://www.cascading.org/
https://github.com/twitter/summingbird
http://storm.incubator.apache.org/
http://spark.apache.org/
19–24. Stuff > Memory
Scala collections... fun, but memory bound!

val text = "so many words... waaah! ..."

text
  .split(" ")                             // in memory
  .map(a => (a, 1))                       // in memory
  .groupBy(_._1)                          // in memory
  .map(a => (a._1, a._2.map(_._2).sum))   // in memory

Every step materializes a whole new collection in memory.
25–26. Why Scalding?
Word Count in Hadoop MR:

package org.myorg;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
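For contrast, the same word count in Scalding fits in a few lines. A minimal sketch, close to the canonical example from the Scalding README (the field names 'line and 'word are illustrative):

import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                                      // one tuple per line of text
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .groupBy('word) { _.size }                                 // count occurrences per word
    .write(Tsv(args("output")))
}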
37–42. map
Scala:

val data = 1 :: 2 :: 3 :: Nil

val doubled = data map { _ * 2 }

// Int => Int

Scalding:

IterableSource(data)
  .map('number -> 'doubled) { n: Int => n * 2 }

// Int => Int

'number is still available in the Pipe, and 'doubled stays in the Pipe too.
Note: you must choose the argument type explicitly (n: Int)!
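These snippets need a Job around them to actually run. A minimal harness sketch under that assumption (DoublerJob and the output argument are illustrative names):

import com.twitter.scalding._

class DoublerJob(args: Args) extends Job(args) {
  val data = List(1, 2, 3)

  // 'number is bound when the source is created; 'doubled is added by map
  IterableSource(data, 'number)
    .map('number -> 'doubled) { n: Int => n * 2 }
    .write(Tsv(args("output")))
}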
44–49. mapTo
Scala:

var data = 1 :: 2 :: 3 :: Nil

val doubled = data map { _ * 2 }

data = null  // “release reference”

Scalding:

IterableSource(data)
  .mapTo('number -> 'doubled) { n: Int => n * 2 }

// 'doubled stays in the Pipe; 'number is removed.
58–62. groupBy
Scala:

val data = 1 :: 2 :: 30 :: 42 :: Nil  // List[Int]

val groups = data groupBy { _ < 10 }

groups  // Map[Boolean, List[Int]], here Map(true -> List(1, 2), false -> List(30, 42))

Scalding:

IterableSource(List(1, 2, 30, 42), 'num)
  .map('num -> 'lessThanTen) { i: Int => i < 10 }
  .groupBy('lessThanTen) { _.size }

// groups all tuples with an == 'lessThanTen value;
// _.size yields the count per group ('lessThanTenCounts)
107–109. Reduce, these Monoids
interface:

trait Monoid[T] {
  def zero: T
  def +(a: T, b: T): T
}

+ 3 laws:

Closure: (T, T) => T
  ∀ a, b ∈ T: a · b ∈ T

Associativity:
  ∀ a, b, c ∈ T: (a · b) · c = a · (b · c)
  (a + b) + c == a + (b + c)

Identity element:
  ∃ z ∈ T: ∀ a ∈ T: z · a = a · z = a
  z + a == a + z == a
110. Reduce, these Monoids
Summing:

object IntSum extends Monoid[Int] {
  def zero = 0
  def +(a: Int, b: Int) = a + b
}
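A quick sanity check of the three laws against IntSum; a throwaway sketch, not part of the deck’s code:

// Closure: + returns an Int for any two Ints (guaranteed by the signature).
// Associativity and identity, spot-checked:
assert(IntSum.+(IntSum.+(1, 2), 3) == IntSum.+(1, IntSum.+(2, 3)))  // (1 + 2) + 3 == 1 + (2 + 3)
assert(IntSum.+(IntSum.zero, 42) == 42)                             // z + a == a
assert(IntSum.+(42, IntSum.zero) == 42)                             // a + z == a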
111–112. Monoid ops can start “Map-side”
Monoid ops can already start being computed map-side!

bear, 2
car, 3
deer, 2
river, 2

Examples: average(), sum(), sortWithTake(), histogram()
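Associativity is exactly what makes this legal: each mapper can fold its own shard, and the reducer only merges the partial results. A toy sketch using the Monoid trait from above (combineShards is an illustrative name, not Scalding or Algebird API):

def combineShards[T](shards: Seq[Seq[T]], m: Monoid[T]): T = {
  // "map-side": each shard is pre-aggregated independently
  val partials = shards.map(_.foldLeft(m.zero)((a, b) => m.+(a, b)))
  // "reduce-side": only the partial results are merged
  partials.foldLeft(m.zero)((a, b) => m.+(a, b))
}

// combineShards(Seq(Seq(1, 2), Seq(30, 42)), IntSum) == 75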
113. Obligatory: “Go check out Algebird, NOW!” slide
https://github.com/twitter/algebird
ALGE-birds
114–116. BloomFilterMonoid
https://github.com/twitter/algebird/wiki/Algebird-Examples-with-REPL

val NUM_HASHES = 6
val WIDTH = 32
val SEED = 1
val bfMonoid = new BloomFilterMonoid(NUM_HASHES, WIDTH, SEED)

val bf1 = bfMonoid.create("1", "2", "3", "4", "100")
val bf2 = bfMonoid.create("12", "45")
val bf = bf1 ++ bf2
// bf: com.twitter.algebird.BF =

val approxBool = bf.contains("1")
// approxBool: com.twitter.algebird.ApproximateBoolean = ApproximateBoolean(true,0.9290349745708529)

val res = approxBool.isTrue
// res: Boolean = true
123–124. Joins: the “usual”

that.joinWithLarger('id1 -> 'id2, other)
that.joinWithSmaller('id1 -> 'id2, other)

that.joinWithTiny('id1 -> 'id2, other)

joinWithTiny is appropriate when you know that the number of rows in the bigger pipe exceeds mappers × the number of rows in the smaller pipe, where mappers is the number of mappers in the job.
127. Joins

val people = IterableSource(
  (1, "hans") ::
  (2, "bob") ::
  (3, "hermut") ::
  (4, "heinz") ::
  (5, "klemens") :: … :: Nil,
  ('id, 'name))

val cars = IterableSource(
  (99, 1, "bmw") ::
  (123, 2, "mercedes") ::
  (240, 11, "other") :: Nil,
  ('carId, 'ownerId, 'carName))

import com.twitter.scalding.FunctionImplicits._

people.joinWithLarger('id -> 'ownerId, cars)
  .map(('name, 'carName) -> 'sentence) {
    (name: String, car: String) =>
      s"Hello $name, your $car is really nice"
  }
  .project('sentence)
  .write(output)

// Hello hans, your bmw is really nice
// Hello bob, your mercedes is really nice
128. “Map-side” join

that.joinWithTiny('id1 -> 'id2, tinyPipe)

Choose this when:

Left > max(mappers, reducers) × Right

or when the Left side is about 3 orders of magnitude larger.
For example, with 100 mappers and a 10,000-row right side, joinWithTiny pays off once the left side exceeds roughly a million rows.
129–132. Skew Joins

val sampleRate = 0.001
val reducers = 10
val replicationFactor = 1
val replicator = SkewReplicationA(replicationFactor)

val genders: RichPipe = …
val followers: RichPipe = …

followers
  .skewJoinWithSmaller('y1 -> 'y2, genders, sampleRate, reducers, replicator)
  .project('x1, 'y1, 's1, 'x2, 'y2, 's2)
  .write(Tsv("output"))

How it works:
1. Sample from the left and right pipes with some small probability, to estimate how often each join key appears in each pipe.
2. Use these estimated counts to replicate the join keys, according to the given replication strategy.
3. Join the replicated pipes together.
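The core trick in step 2, as a toy sketch (replicateSkewed and replicateOther are hypothetical names; this is not Scalding’s internal implementation): each row on the skewed side is scattered to one of r sub-keys, while the other side broadcasts its rows across all r sub-keys, so every matching pair still meets on some reducer.

import scala.util.Random

// replication(k) returns how many copies (r) key k needs, based on sampled counts.
def replicateSkewed[K, V](rows: Seq[(K, V)], replication: K => Int, rnd: Random): Seq[((K, Int), V)] =
  rows.map { case (k, v) => ((k, rnd.nextInt(replication(k))), v) }  // scatter: one sub-key per row

def replicateOther[K, V](rows: Seq[(K, V)], replication: K => Int): Seq[((K, Int), V)] =
  rows.flatMap { case (k, v) => (0 until replication(k)).map(i => ((k, i), v)) }  // broadcast: all sub-keys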
134–136. Where did my type-safety go?!

Tsv(in, ('userId1, 'userId2, 'rel))
  .filter('userId1) { uid1: Long => uid1 == 1337 }
  .write(Tsv(out))

The Fields API only notices a schema change at runtime, deep inside a coercion:

Caused by: cascading.flow.FlowException: local step failed
  at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:219)
  at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
  at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
  ...
Caused by: cascading.pipe.OperatorException: [com.twitter.scalding.C...][com.twitter.scalding.RichPipe.filter(RichPipe.scala:325)] operator Each failed executing operation
  at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:81)
  at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
  ...
Caused by: java.lang.NumberFormatException: For input string: "bob"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Long.parseLong(Long.java:589)
  at cascading.tuple.coerce.LongCoerce.coerce(LongCoerce.java:50)
  at cascading.tuple.coerce.LongCoerce.coerce(LongCoerce.java:29)

“oh, right… We changed that file to be user names, not ids…”
148–149. Typed APIs

Fields API:

Tsv(in, ('userId1, 'userId2, 'rel))
  .filter('userId1) { uid1: Long => uid1 == 1337 }
  .write(Tsv(out))

Typed API:

// … with Relationships {
import TDsl._

userRelationships(date)
  .filter { _._1 == "bob" }
  .write(TypedTsv(out))

}

Easier to reuse schemas now.
Not coupled by Field names, but still too magic for reuse… “_1”?
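The “_1” critique points at bare tuple accessors; pattern matching gives the fields their names back. A sketch under assumed types, written as it would appear inside a Job (the (String, String, Long) schema and the rels value are illustrative; exact imports vary by Scalding version):

import com.twitter.scalding._

val rels = TypedPipe.from(TypedTsv[(String, String, Long)](in))

rels
  .filter { case (userId1, _, _) => userId1 == "bob" }  // checked at compile time
  .write(TypedTsv[(String, String, Long)](out))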
174–181. “Parallelize all the batches!”
Feels much like Scala collections
Local Mode thanks to Cascading
Easy to add custom Taps
Type Safe, when you want to
Pure Scala
Testing friendly
Matrix API
Efficient columnar storage (Parquet)
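On “Testing friendly”: Scalding ships a JobTest harness that runs a job in local mode against in-memory sources. A sketch, assuming the WordCountJob from earlier (file names and expected counts are illustrative):

import com.twitter.scalding._

JobTest(new WordCountJob(_))
  .arg("input", "in.txt")
  .arg("output", "out.tsv")
  .source(TextLine("in.txt"), List((0, "hello hello world")))  // (offset, line) pairs
  .sink[(String, Int)](Tsv("out.tsv")) { buffer =>
    assert(buffer.toSet == Set(("hello", 2), ("world", 1)))
  }
  .run
  .finish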