16. Nested parallelism: going recursive
recursive algorithms
20. Nested parallelism: going recursive
def vowel(c: Char): Boolean = ...
def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s
gen(5, Array(""))
Sequential running time: 1545 ms
24. Nested parallelism: going recursive
def vowel(c: Char): Boolean = ...
def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s
gen(5, ParArray(""))
cores: 1 2 4
time: 1575 ms 809 ms 530 ms
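The slides elide the body of vowel, so the snippet cannot be run as shown. Below is a minimal runnable sketch of the sequential variant. The definition of vowel is an assumption made for illustration only, and the object name AlternatingStrings is hypothetical.

```scala
object AlternatingStrings {
  // Assumption: the slides do not show this body; a plain ASCII vowel test.
  private val vowels = Set('a', 'e', 'i', 'o', 'u')
  def vowel(c: Char): Boolean = vowels(c)

  // Same recursion as on the slide: each level extends every string from the
  // previous level with each letter, but only when vowels and consonants
  // alternate; otherwise the string is passed through unchanged.
  def gen(n: Int, acc: Seq[String]): Seq[String] =
    if (n == 0) acc
    else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
      if (s.length == 0) s + c
      else if (vowel(s.last) && !vowel(c)) s + c
      else if (!vowel(s.last) && vowel(c)) s + c
      else s

  def main(args: Array[String]): Unit = {
    val result = gen(2, Seq(""))
    println(result.size)                  // 26 * 26 entries
    println(result.count(_.length == 2))  // entries that were extended twice
  }
}
```

For the parallel variant the slides only change the types to ParSeq and pass ParArray(""). On Scala 2.13 and later, parallel collections live in the separate scala-parallel-collections module, so that variant needs an extra dependency.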
27. Character count: use case for foldLeft
val txt: String = ...
txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
28. Character count: use case for foldLeft
txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
going left to right: not parallelizable!
[Figure: the characters A B C D E F are folded left to right with _ + 1; the accumulator goes 0, 1, 2, 3, 4, 5, 6]
29. Character count: use case for foldLeft
txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
going left to right: not really necessary
[Figure: A B C and D E F are each folded with _ + 1 from 0 to 3; the two partial counts are then combined with _ + _, giving 6]
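The idea in the figure can be sketched sequentially: fold each half on its own, then add the partial counts. The names SplitCount, count and countInHalves are hypothetical, and the two folds run one after the other here; the point is only that neither depends on the other.

```scala
object SplitCount {
  // Counts the non-space characters of s, going left to right.
  def count(s: String): Int = s.foldLeft(0) {
    case (a, ' ') => a
    case (a, _)   => a + 1
  }

  // Folds each half separately, then combines the partial counts with +.
  // The two calls to count are independent of each other.
  def countInHalves(s: String): Int = {
    val (left, right) = s.splitAt(s.length / 2)
    count(left) + count(right)
  }

  def main(args: Array[String]): Unit =
    println(countInHalves("ABCDEF")) // prints 6
}
```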
31. Character count: in parallel
txt.fold(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
[Figure: two partitions A B C, each folded with _ + 1, counting 1, 2, 3]
the operator has type (Int, Char) => Int
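32. Character count: fold not applicable
txt.fold(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
[Figure: two partitions A B C, each counted to 3; the partial results 3 and 3 have to be merged with _ + _]
merging the partial results needs an operator of type (Int, Int) => Int, but the one given is (Int, Char) => Int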
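One way to see the type constraint is to make the element type match the result type first. The sketch below maps every character to 0 or 1 and then folds with a plain (Int, Int) => Int operator. This is an illustration of the constraint, not the approach the slides take, and the names FoldCount and countNonSpaces are hypothetical.

```scala
object FoldCount {
  // After the map, elements and accumulator are both Int,
  // so fold accepts the operator _ + _.
  def countNonSpaces(txt: String): Int =
    txt.toSeq.map(c => if (c == ' ') 0 else 1).fold(0)(_ + _)

  def main(args: Array[String]): Unit =
    println(countNonSpaces("a b c")) // prints 3
}
```

The extra map builds an intermediate collection, which is one reason to prefer an operation that takes both operators at once.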
33. Character count: use case for aggregate
txt.aggregate(0)({
  case (a, ' ') => a
  case (a, c) => a + 1
}, _ + _)
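36. Character count: use case for aggregate
txt.aggregate(0)({
  case (a, ' ') => a
  case (a, c) => a + 1
}, _ + _)
first operator (_ + 1): combines an aggregation with an element
second operator (_ + _): combines an aggregation with an aggregation
[Figure: two partitions A B C are each counted to 3 with _ + 1; the partial results 3 and 3 are merged with _ + _]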
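To make the roles of the two operators concrete, here is a sequential sketch of what aggregate computes. aggregateChunked is a hypothetical helper, not a library method: it cuts the input into fixed-size chunks, folds each chunk with the first operator, and merges the per-chunk results with the second. It assumes z is a neutral element for the second operator.

```scala
object AggregateSketch {
  // Hypothetical helper: fold every chunk with seqop starting from z,
  // then merge the chunk results left to right with combop.
  def aggregateChunked[A, B](xs: Seq[A], chunkSize: Int)(z: B)(
      seqop: (B, A) => B, combop: (B, B) => B): B =
    xs.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(z)(seqop))
      .foldLeft(z)(combop)

  def main(args: Array[String]): Unit = {
    val txt = "A B C D E F"
    val n = aggregateChunked(txt.toSeq, 3)(0)(
      (a, c) => if (c == ' ') a else a + 1, // aggregation with element
      _ + _                                 // aggregation with aggregation
    )
    println(n) // prints 6
  }
}
```

A real parallel collection chooses the partitions itself, so the result has to be the same for every chunk size.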
37. Word count: another use case for foldLeft
txt.foldLeft((0, true)) {
  case ((wc, _), ' ') => (wc, true)
  case ((wc, true), x) => (wc + 1, false)
  case ((wc, false), x) => (wc, false)
}
38. Word count: initial accumulation
(0, true): 0 words so far, last character was a space
example text: "Folding me softly."
39. Word count: a space
case ((wc, _), ' ') => (wc, true)
last seen character is a space
40. Word count: a non-space
case ((wc, true), x) => (wc + 1, false)
last seen character was a space: a new word
41. Word count: a non-space
case ((wc, false), x) => (wc, false)
last seen character wasn't a space: no new word
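Put together as a runnable function, the fold looks as follows. The wrapper names SeqWordCount and wordCount are hypothetical; the fold returns a pair, so the sketch takes its first component to get the count.

```scala
object SeqWordCount {
  // Accumulator: (words so far, was the last character a space?)
  def wordCount(txt: String): Int =
    txt.foldLeft((0, true)) {
      case ((wc, _), ' ')   => (wc, true)      // a space
      case ((wc, true), _)  => (wc + 1, false) // a new word starts
      case ((wc, false), _) => (wc, false)     // still inside a word
    }._1

  def main(args: Array[String]): Unit =
    println(wordCount("Folding me softly.")) // prints 3
}
```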
44. Word count: in parallel
P1: "Folding me " gives wc = 2; rs = 1
P2: "softly." gives wc = 1; ls = 0
combined: wc = 3
46. Word count: must assume arbitrary partitions
P1: "Foldin" gives wc = 1; rs = 0
P2: "g me softly." gives wc = 3; ls = 0
combined: wc = 3 (no space on either side of the cut, so the word split across it is counted once: 1 + 3 - 1)
49. Word count: initial aggregation
txt.par.aggregate((0, 0, 0))
(0, 0, 0) = (# spaces on the left, # words, # spaces on the right)
this is the aggregation of the empty string ""
52. Word count: aggregation aggregation
...
}, {
  // one side is empty, e.g. "" + "Folding me" or "softly." + "": keep the other side
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  // no spaces at the cut, e.g. "Folding m" + "e softly.": the split word is counted once
  case ((lls, lwc, 0), (0, rwc, rrs)) =>
    (lls, lwc + rwc - 1, rrs)
  // spaces at the cut, e.g. "Folding me" + " softly.": add the word counts
  case ((lls, lwc, _), (_, rwc, rrs)) =>
    (lls, lwc + rwc, rrs)
57. Word count: aggregation element
(in the examples, _ marks a space)
txt.par.aggregate((0, 0, 0))({
  // "_": 0 words and a space: add one more space on each side
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  // " m": 0 words and a non-space: one word, no spaces on the right side
  case ((ls, 0, _), c) => (ls, 1, 0)
  // " me_": nonzero words and a space: one more space on the right side
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  // " me sof": nonzero words, last non-space and current non-space: no change
  case ((ls, wc, 0), c) => (ls, wc, 0)
  // " me s": nonzero words, last space and current non-space: one more word
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
58. Word count: in parallel
txt.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) =>
    (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) =>
    (lls, lwc + rwc, rrs)
})
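The two operators can be checked without any parallel collection by simulating a two-way partition: fold each side with the element operator, then merge with the aggregation operator. In the sketch below, the names ParWordCountSketch, Agg, seqop, combop and wordCountSplitAt are hypothetical; the case clauses are the ones from the slide.

```scala
object ParWordCountSketch {
  // (spaces on the left, words, spaces on the right)
  type Agg = (Int, Int, Int)

  // aggregation with element
  val seqop: (Agg, Char) => Agg = {
    case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
    case ((ls, 0, _), _)     => (ls, 1, 0)
    case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
    case ((ls, wc, 0), _)    => (ls, wc, 0)
    case ((ls, wc, _), _)    => (ls, wc + 1, 0)
  }

  // aggregation with aggregation
  val combop: (Agg, Agg) => Agg = {
    case ((0, 0, 0), res) => res
    case (res, (0, 0, 0)) => res
    case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
    case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
  }

  // Simulates two processors: cut txt at index i, fold each side, merge.
  def wordCountSplitAt(txt: String, i: Int): Int = {
    val z: Agg = (0, 0, 0)
    val (left, right) = txt.splitAt(i)
    combop(left.foldLeft(z)(seqop), right.foldLeft(z)(seqop))._2
  }

  def main(args: Array[String]): Unit = {
    val txt = "Folding me softly."
    // The word count should not depend on where the cut falls.
    println((0 to txt.length).map(i => wordCountSplitAt(txt, i)).distinct)
  }
}
```

Trying every cut position is a cheap way to exercise the "arbitrary partitions" requirement before running on a real parallel collection.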
59. Word count: using parallel strings?
txt.par.aggregate((0, 0, 0))({ ... }, { ... })
64. Word count: string not really parallelizable
scala> (txt: String).par
collection.parallel.ParSeq[Char] = ParArray(…)
different internal representation: a ParArray!
par has to copy the string contents into an array
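77. Custom collection: splitters are iterators
class ParStringSplitter(var i: Int, len: Int)  // i is advanced by next and psplit, so it must be a var
extends Splitter[Char] {
  def hasNext = i < len
  def next = {
    val r = str.charAt(i)  // str is not defined on this slide; presumably the string held by the enclosing ParString
    i += 1
    r
  }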
79. Custom collection: splitters know how many elements remain
  ...
  def dup = new ParStringSplitter(i, len)
  def remaining = len - i
80. Custom collection: splitters can be split
  ...
  def psplit(sizes: Int*): Seq[ParStringSplitter] = {
    val splitted = new ArrayBuffer[ParStringSplitter]
    for (sz <- sizes) {
      val next = (i + sz) min len  // len is the end index from the constructor
      splitted += new ParStringSplitter(i, next)
      i = next
    }
    splitted
  }
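The splitter methods from the last three slides can be tried out in isolation. The class below is a hypothetical stand-alone version: it extends plain Iterator rather than the library's Splitter trait and takes the string as an explicit constructor parameter, so it illustrates the contract only and cannot be plugged into a parallel collection.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-alone splitter over the index range [i, len) of str.
class StringSplitterSketch(str: String, private var i: Int, len: Int)
    extends Iterator[Char] {

  def hasNext: Boolean = i < len

  def next(): Char = {
    val r = str.charAt(i)
    i += 1
    r
  }

  // How many elements are left to traverse.
  def remaining: Int = len - i

  // An independent copy positioned at the same index.
  def dup: StringSplitterSketch = new StringSplitterSketch(str, i, len)

  // Hands out consecutive sub-ranges of the requested sizes and
  // advances this splitter past them.
  def psplit(sizes: Int*): Seq[StringSplitterSketch] = {
    val parts = new ArrayBuffer[StringSplitterSketch]
    for (sz <- sizes) {
      val end = (i + sz) min len
      parts += new StringSplitterSketch(str, i, end)
      i = end
    }
    parts.toSeq
  }
}

object StringSplitterDemo {
  def main(args: Array[String]): Unit = {
    val s = new StringSplitterSketch("Folding me", 0, 10)
    println(s.psplit(4, 6).map(_.mkString)) // the parts "Fold" and "ing me"
  }
}
```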
81. Word count: now with parallel strings
new ParString(txt).aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) =>
    (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) =>
    (lls, lwc + rwc, rrs)
})
82. Word count: performance
txt.foldLeft((0, true)) { ... }
100 ms
new ParString(txt).aggregate((0, 0, 0))({ ... }, { ... })
cores: 1 2 4
time: 137 ms 70 ms 35 ms
84. Hierarchy
def nonEmpty(sq: Seq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}
87. Hierarchy
def nonEmpty(sq: ParSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}
side-effects!
ArrayBuffer is not synchronized!
[Figure: hierarchy diagram with ParSeq and Seq]
88. Hierarchy
def nonEmpty(sq: GenSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res.synchronized {
      res += s
    }
  }
  res
}
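Synchronizing on the buffer makes the loop safe, but every insertion then contends for one lock. A side-effect-free alternative, not shown on the slides, is to let the collection build the result itself with filter. The sketch below uses plain Seq so that it runs on any Scala version; the object name NonEmptySketch is hypothetical.

```scala
object NonEmptySketch {
  // No shared mutable state: filter returns a new collection.
  def nonEmpty(sq: Seq[String]): Seq[String] = sq.filter(_.nonEmpty)

  def main(args: Array[String]): Unit =
    println(nonEmpty(Seq("a", "", "b"))) // keeps "a" and "b"
}
```

Unlike the synchronized loop, where a parallel traversal may append elements in any order, filter on a sequence keeps the original element order.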
92. Accessors vs. transformers: some methods need more than just splitters
accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, …
transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …
transformers return collections!
sequential collections: builders
parallel collections: combiners
93. Builders: building a sequential collection
[Figure: a ListBuilder receives the elements 1 to 7 one at a time through +=; result returns the finished list, which ends in Nil]
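The builder protocol in the figure is available directly in the standard library: elements are added with += and the collection is produced by result. A minimal sketch follows; the object and method names BuilderSketch and build are hypothetical.

```scala
object BuilderSketch {
  def build(): List[Int] = {
    val b = List.newBuilder[Int] // a builder that produces a List[Int]
    b += 1
    b += 2
    b += 3
    b.result()
  }

  def main(args: Array[String]): Unit =
    println(build()) // prints List(1, 2, 3)
}
```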
128. Custom combiners: for methods returning custom collections
...
def result = {
  val rsb = new StringBuilder
  for (sb <- chunks) rsb.append(sb)
  new ParString(rsb.toString)
}
...
129. Custom combiners: for methods returning custom collections
...
def result = ...
[Figure: chunks, a sequence of StringBuilders, and lastc]
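The names chunks and lastc suggest a combiner that appends to its last chunk and merges by concatenating chunk lists. The class below is a hypothetical stand-alone sketch of that idea. It does not extend the library's Combiner trait, it returns a plain String instead of a ParString, and the bodies of += and combine are assumptions, since the slides only show result.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-alone combiner for characters.
class StringCombinerSketch {
  val chunks = ArrayBuffer(new StringBuilder)
  private var lastc = chunks.last

  // Adding an element appends to the last chunk.
  def +=(c: Char): this.type = {
    lastc += c
    this
  }

  // Merging only moves chunk references; no characters are copied here.
  def combine(that: StringCombinerSketch): StringCombinerSketch = {
    if (that ne this) {
      chunks ++= that.chunks
      lastc = chunks.last
    }
    this
  }

  // The copying happens once, when the final collection is requested.
  def result: String = {
    val rsb = new StringBuilder
    for (sb <- chunks) rsb.append(sb.toString)
    rsb.toString
  }
}

object StringCombinerDemo {
  def main(args: Array[String]): Unit = {
    val a = new StringCombinerSketch
    "ab".foreach(a += _)
    val b = new StringCombinerSketch
    "cd".foreach(b += _)
    println(a.combine(b).result) // prints abcd
  }
}
```

Keeping combine cheap pushes the real work into result, which runs once on a single thread in this sketch.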
130. Custom combiners: for methods expecting implicit builder factories
// only for big boys
...
with GenericParTemplate[T, ParColl]
...
object ParColl extends ParFactory[ParColl] {
  implicit def canCombineFrom[T] =
    new GenericCanCombineFrom[T]
  ...
137. Custom combiners: performance measurement
txt.filter(_ != ' ')
106 ms
new ParString(txt).filter(_ != ' ')
cores: 1 2 4
time: 125 ms 81 ms 56 ms
[Figure: running time t/ms against number of processors (1, 2, 4): 125 ms, 81 ms, 56 ms]
def result (not parallelized)
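The timings are easier to compare as speedups relative to the 1-core parallel run. A small sketch of that arithmetic, using only the numbers from the slide; the names SpeedupSketch and speedup are hypothetical.

```scala
object SpeedupSketch {
  // Speedup of a run taking tn ms over the 1-core run taking t1 ms.
  def speedup(t1: Double, tn: Double): Double = t1 / tn

  def main(args: Array[String]): Unit = {
    val t1 = 125.0
    for ((cores, t) <- Seq(1 -> 125.0, 2 -> 81.0, 4 -> 56.0)) {
      val s = speedup(t1, t)
      println(f"$cores cores: speedup $s%.2f, efficiency ${s / cores}%.2f")
    }
  }
}
```

This works out to roughly 1.5x on 2 cores and 2.2x on 4 cores, well short of linear, which fits the slide's note that result is not parallelized.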
138. Custom combiners: tricky!
• two-step evaluation
  – parallelize the result method in combiners
• efficient merge operation
  – binomial heaps, ropes, etc.
• concurrent data structures
  – non-blocking scalable insertion operation
  – we’re working on this
139. Future work: coming up
• concurrent data structures
• more efficient vectors
• custom task pools
• user defined scheduling
• parallel bulk in-place modifications