12. Workload function
Could be… runtime-value dependent
for {
  x <- 0 until width
  y <- 0 until height
} img(x, y) = compute(x, y)
workload(n) – the work spent on element n, as observed once the data-parallel operation has completed
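As an illustration of a runtime-value-dependent workload, here is a minimal sketch in which `compute` is assumed to be a Mandelbrot-style escape-time iteration. The slide does not say what `compute` does; this choice, the `WorkloadDemo` object, and `maxIter` are hypothetical. The number of loop iterations, and therefore the workload, depends on the coordinates of each element.

```scala
object WorkloadDemo {
  val maxIter = 1000 // hypothetical iteration cap

  // Assumed stand-in for `compute`: counts escape-time iterations for the
  // point (cx, cy). The returned count doubles as the element's workload.
  def compute(cx: Double, cy: Double): Int = {
    var zx = 0.0
    var zy = 0.0
    var i = 0
    while (i < maxIter && zx * zx + zy * zy <= 4.0) {
      val t = zx * zx - zy * zy + cx
      zy = 2 * zx * zy + cy
      zx = t
      i += 1
    }
    i
  }
}
```

Under these assumptions a point that never escapes costs `maxIter` iterations, while a point far from the origin costs a single iteration, so neighbouring elements can differ in cost by orders of magnitude.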
13. Workload function
Could be… execution-schedule dependent
for (n <- nodes)
  n.neighbours += new Node
14. Workload function
Could be… totally random
for ((x, y) <- img.indices)
  img(x, y) = sample(
    x + random(),
    y + random()
  )
17. Data-parallel scheduler
1. Linear speedup for the baseline workload
2. Optimal speedup for irregular workloads
Assign loop elements to workers without knowledge about the workload function.
19. Static batching
Decides on the worker-element assignment before the data-parallel operation begins.
No knowledge → divide uniformly.
Not optimal for even mildly irregular workloads.
[Figure: static batching diagram, labelled "N cycles".]
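The uniform division described above can be sketched as follows. This is an illustration only: the object name `StaticBatching`, the helper names, and the example workload function are assumptions, not part of the talk.

```scala
object StaticBatching {
  // Split the index range [0, n) into `workers` contiguous batches of
  // near-equal size, decided before any element is processed.
  def batches(n: Int, workers: Int): Seq[Range] =
    (0 until workers).map { w =>
      val from = (w.toLong * n / workers).toInt
      val end = ((w + 1).toLong * n / workers).toInt
      from until end
    }

  // Total work each worker ends up with for a given workload function.
  def workPerWorker(n: Int, workers: Int, workload: Int => Long): Seq[Long] =
    batches(n, workers).map(_.map(workload).sum)
}
```

With a uniform workload, `StaticBatching.workPerWorker(100, 4, _ => 1L)` gives every worker 25 units. If the last quarter of the elements costs 100 units each, the last worker receives 2500 units while the others still receive 25, so the whole operation waits on one worker.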
39. Work-stealing tree
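[Figure: node states under the owner T0. Each node is drawn with four values, for example (0, 0, T0, N). T0 advances the second value with CAS: (0, 0, T0, N) owned → (0, 50, T0, N) owned → … → (0, N, T0, N) completed.]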
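The owner's CAS-based advance can be sketched as below. The `Node` layout (start, progress, owner, end, read off the four values in the figure), the batch size `step`, and the convention that a negative progress marks a stolen node are assumptions made for this sketch.

```scala
import java.util.concurrent.atomic.AtomicInteger

object AdvanceDemo {
  // Hypothetical node layout mirroring the four values in the figure.
  final class Node(val start: Int, val end: Int, val owner: String) {
    val progress = new AtomicInteger(start)
  }

  // The owner reserves the next batch of up to `step` elements with one CAS.
  // Returns the reserved range, or None if the node is completed or stolen.
  @annotation.tailrec
  def advance(node: Node, step: Int): Option[Range] = {
    val p = node.progress.get()
    if (p < 0 || p >= node.end) None // assumed: negative progress means stolen
    else {
      val next = math.min(p + step, node.end)
      if (node.progress.compareAndSet(p, next)) Some(p until next)
      else advance(node, step) // the CAS failed; re-read and retry
    }
  }
}
```

Only the elements in a successfully reserved range are processed by the owner, which is what lets another worker take over the rest of the node at any time.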
What about stealing?
40. Work-stealing tree
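[Figure: the owner T0 advances (0, 0, T0, N) → (0, 50, T0, N) → … → (0, N, T0, N) with CAS, as before. Another worker T1 can instead CAS an owned node such as (0, 50, T0, N) into the stolen state (0, -51, T0, N).]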
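A steal can be sketched as a single CAS on the progress value. The encoding used here, replacing progress p with -p - 1, is an assumption inferred from the figure, where a node at progress 50 becomes -51 once stolen. The names `StealDemo`, `trySteal`, and `stolenAt` are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

object StealDemo {
  // Hypothetical node layout: start, progress, owner, end.
  final class Node(val start: Int, val end: Int, val owner: String) {
    val progress = new AtomicInteger(start)
  }

  // Try to mark `node` as stolen by replacing progress p with -p - 1.
  // Returns true if the node ends up stolen, false if it completed first.
  @annotation.tailrec
  def trySteal(node: Node): Boolean = {
    val p = node.progress.get()
    if (p < 0) true                                   // already stolen
    else if (p >= node.end) false                     // already completed
    else if (node.progress.compareAndSet(p, -p - 1)) true
    else trySteal(node)                               // owner advanced; retry
  }

  // Recover the progress value at the moment of the steal.
  def stolenAt(node: Node): Int = -node.progress.get() - 1
}
```

Because the owner and the stealer both use CAS on the same field, exactly one of them wins each race: either the owner reserves another batch or the node becomes stolen.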
44. Work-stealing tree
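[Figure: full node life cycle. The owner T0 advances (0, 0, T0, N) owned → (0, 50, T0, N) owned → … → (0, N, T0, N) completed, each step with a CAS. A steal turns (0, 50, T0, N) into the stolen node (0, -51, T0, N). A CAS by T0 or T1 then turns the stolen node into an expanded node with two children, (50, 50, T0, M) and (M, M, T1, N).]
M = (50 + N) / 2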
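The split performed on expansion can be sketched as a pure function. The `Leaf` case class and the `expand` helper are hypothetical, and the -p - 1 encoding of a stolen progress value is inferred from the figure. Installing the children with a CAS, which the figure attributes to T0 or T1, is left out of this sketch.

```scala
object ExpandDemo {
  // Hypothetical description of a node: start, progress, owner, end.
  final case class Leaf(start: Int, progress: Int, owner: String, end: Int)

  // Split the unprocessed part of a stolen node between the original owner
  // (left child) and the stealer (right child).
  def expand(stolen: Leaf, stealer: String): (Leaf, Leaf) = {
    require(stolen.progress < 0, "node must be stolen")
    val p = -stolen.progress - 1 // progress at the time of the steal
    val m = (p + stolen.end) / 2 // M = (50 + N) / 2 in the figure
    (Leaf(p, p, stolen.owner, m), Leaf(m, m, stealer, stolen.end))
  }
}
```

For N = 200, `ExpandDemo.expand(ExpandDemo.Leaf(0, -51, "T0", 200), "T1")` yields the children (50, 50, T0, 125) and (125, 125, T1, 200).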
46. Work-stealing tree scheduling
1) Find a node that is neither expanded nor completed.
2) If no such node is found, terminate.
3) If the node is not owned, steal and/or expand it, and descend.
4) Advance until the node is completed or stolen.
5) Go to 1).
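The loop above can be sketched as a small runnable scheduler. This is a simplified illustration under several assumptions, not the talk's implementation: every node has a fixed owner set at creation, the root is owned by worker 0, a steal replaces progress p with -p - 1, the search is a left-to-right depth-first traversal that prefers the worker's own nodes, and an owner whose node was stolen simply retries until the stealer has expanded it. All names are hypothetical.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object TreeScheduler {
  final class Node(val start: Int, val end: Int, val owner: Int) {
    val progress = new AtomicInteger(start)                // negative once stolen
    val children = new AtomicReference[(Node, Node)](null) // set once expanded
    def completed: Boolean = progress.get() >= end
  }

  // Step 1: depth-first search for a non-expanded node accepted by `ok`.
  def find(n: Node, ok: Node => Boolean): Option[Node] = n.children.get() match {
    case null   => if (ok(n)) Some(n) else None
    case (l, r) => find(l, ok).orElse(find(r, ok))
  }

  // Step 4 helper: the owner reserves up to `step` elements with one CAS.
  @annotation.tailrec
  def advance(n: Node, step: Int): Option[Range] = {
    val p = n.progress.get()
    if (p < 0 || p >= n.end) None
    else {
      val next = math.min(p + step, n.end)
      if (n.progress.compareAndSet(p, next)) Some(p until next) else advance(n, step)
    }
  }

  // Step 3: mark the node stolen, then split its remaining range in two.
  def stealAndExpand(n: Node, w: Int): Unit = {
    var p = n.progress.get()
    while (p >= 0 && p < n.end && !n.progress.compareAndSet(p, -p - 1))
      p = n.progress.get()
    p = n.progress.get()
    if (p < 0) { // stolen (by this worker or another); otherwise it completed
      val at = -p - 1
      val m = (at + n.end) / 2
      n.children.compareAndSet(null, (new Node(at, m, n.owner), new Node(m, n.end, w)))
    }
  }

  def worker(root: Node, w: Int, step: Int, body: Int => Unit): Unit = {
    var running = true
    while (running) {
      val mine = find(root, n => n.owner == w && !n.completed && n.progress.get() >= 0)
      mine.orElse(find(root, n => !n.completed)) match {   // 1)
        case None => running = false                       // 2)
        case Some(n) =>
          if (n.owner != w) stealAndExpand(n, w)           // 3)
          else {                                           // 4)
            var batch = advance(n, step)
            while (batch.nonEmpty) {
              batch.get.foreach(body)
              batch = advance(n, step)
            }
          }
      }                                                    // 5) loop again
    }
  }

  def parallelForeach(n: Int, workers: Int, step: Int)(body: Int => Unit): Unit = {
    val root = new Node(0, n, 0) // assumption: worker 0 owns the root
    val threads = (0 until workers).map(w => new Thread(() => worker(root, w, step, body)))
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```

A call such as `TreeScheduler.parallelForeach(10000, 4, 16)(i => process(i))`, where `process` is a placeholder for the loop body, should apply the body to every index exactly once, since each element belongs to exactly one reserved range.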
52. Choosing the node to steal
Find first, in-order traversal: catastrophic, with a lot of stealing and huge trees.
Find first, random-order traversal: works reasonably well.
Find most elements: generates the fewest nodes and seems to be the best.
[Figure: one example tree per strategy, with nodes labelled 2, 9, 5 and 3.]
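The three strategies can be compared on a toy tree. In this sketch the `Tree` type, the tree shape, and the reading of the figure's labels 2, 9, 5 and 3 as remaining element counts are all assumptions made for illustration.

```scala
import scala.util.Random

object StealChoice {
  sealed trait Tree
  final case class Leaf(remaining: Int) extends Tree
  final case class Inner(left: Tree, right: Tree) extends Tree

  def leaves(t: Tree): List[Leaf] = t match {
    case l: Leaf     => List(l)
    case Inner(l, r) => leaves(l) ++ leaves(r)
  }

  // Find first, in-order: the leftmost leaf that still has elements.
  def findFirst(t: Tree): Option[Leaf] = leaves(t).find(_.remaining > 0)

  // Find first, random order: visit the two children in a random order.
  def findFirstRandom(t: Tree, rnd: Random): Option[Leaf] = t match {
    case l: Leaf => if (l.remaining > 0) Some(l) else None
    case Inner(l, r) =>
      val (a, b) = if (rnd.nextBoolean()) (l, r) else (r, l)
      findFirstRandom(a, rnd).orElse(findFirstRandom(b, rnd))
  }

  // Find most elements: the leaf with the largest remaining count.
  def findMost(t: Tree): Option[Leaf] =
    leaves(t).filter(_.remaining > 0)
      .reduceOption((a, b) => if (b.remaining > a.remaining) b else a)
}
```

On the hypothetical tree `Inner(Inner(Leaf(2), Leaf(9)), Inner(Leaf(5), Leaf(3)))`, the in-order search picks the leaf with 2 elements, while the most-elements search picks the leaf with 9. Stealing the larger leaf leaves more work per steal, which fits the slide's observation that this strategy generates the fewest nodes.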