4. The first priority is “performance”
q Parallel computing is everywhere:
§ Supercomputers
§ Personal Computers
§ Smartphones
§ Embedded devices
Pictures borrowed from: commons.wikimedia.org, www.hirt-japan.info
5. Parallel programming is hard…
[Figure: a multi-core CPU (DRAM, shared L3 cache, per-pair L2 caches, per-core L1 caches and SIMD units) next to a many-core GPU (DRAM, L2 cache, many simple cores); DRAM (slowest) – Register (fastest)]
q Challenges:
§ Exploiting SIMD
§ Scheduling tasks on CPUs
§ Optimizing data locality (DRAM is the slowest level, registers the fastest)
§ Utilizing accelerators (multi-core CPUs, many-core GPUs)
6. A gap between domain experts and hardware
q Application domain (domain experts): want to get significant performance improvement easily (performance portability)
q Hardware (concurrency experts): hard to exploit the full capability of hardware
q Bridging the two: programming languages, compilers, runtime
q We believe languages and compilers are very important!
7. A review of literature
q Automatic parallelizing compilers
§ IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
q Parallel languages
§ Language-based:
ü Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
§ Directive-based:
ü OpenMP, OpenACC, OmpSs, …
§ Library-based:
ü Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …
8. From the perspective of compilers…
q Compilers are among the most complicated pieces of software:
§ Pointer analysis
§ Scalar optimizations
§ Loop transformations
§ Vectorization/SIMDization
§ Scheduling
§ Exploiting accelerators
§ …
Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/
9. What are compilers doing?
Program:
x = a + b;
y = a + b;
z = x + y;
→ Parsing → Intermediate Representation (e.g., an AST with two identical “a + b” subtrees)
→ Optimizations → “Optimized” code:
x = a + b;
y = x;
z = x + y;
10. What are compilers doing?
q A compiler may modify a program (e.g., change the execution order of statements) as long as it maintains the program’s semantics
Program:
x = a + b;
y = a + b;
z = x + y;
→ Parsing → Intermediate Representation (e.g., AST) → Optimizations → “Optimized” code:
x = a + b;
y = x;
z = x + y;
(A reordering such as moving “z = x + y;” before “x = a + b;” would change the semantics and is therefore not allowed.)
11. Examples of optimizations: scalar optimizations
q Common Subexpression Elimination (CSE):
x = a + b;  y = a + b;  z = x + y;
→  x = a + b;  y = x;  z = x + y;
q Constant Propagation:
a = 0;  if (a) { … }
→  a = 0;  if (0) { … }
q Dead Code Elimination:
a = 0;  if (0) { … }
→  a = 0;
12. Examples of optimizations: loop permutation (interchange)
Original (stride access; slower on CPUs):
for (j = 0; j < N; j++) {
  for (i = 0; i < M; i++) {
    b[i][j] = a[i][j];
  }
}
Interchanged (unit-stride access; faster on CPUs):
for (i = 0; i < M; i++) {
  for (j = 0; j < N; j++) {
    b[i][j] = a[i][j];
  }
}
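Because both nests execute exactly the same statement instances and the copy has no loop-carried dependence, the interchange must not change the result. A minimal C sketch of that check (the array sizes `M`/`N` and the helper names are ours, not from the slides):

```c
#include <string.h>

#define M 4
#define N 5

/* j outer, i inner: stride-N accesses to a and b. */
static void copy_strided(int b[M][N], const int a[M][N]) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            b[i][j] = a[i][j];
}

/* Interchanged: i outer, j inner: unit-stride accesses. */
static void copy_unit_stride(int b[M][N], const int a[M][N]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            b[i][j] = a[i][j];
}

/* Interchange is legal here (no loop-carried dependence), so both
   orders must produce identical results; returns 1 if they do. */
int interchange_check(void) {
    int a[M][N], b1[M][N], b2[M][N];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * 10 + j;
    copy_strided(b1, a);
    copy_unit_stride(b2, a);
    return memcmp(b1, b2, sizeof b1) == 0;
}
```

Only the memory access order differs, which is why the two versions perform differently on cache-based CPUs.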
13. Examples of optimizations: loop fusion/distribution
Fused (better temporal locality on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
  d[i] = a[i] + e[i];
}
Distributed (good for vectorization on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
}
for (i = 0; i < N; i++) {
  d[i] = a[i] + e[i];
}
Which version wins depends on the loop size “N”.
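Distribution is legal here because the only dependence (a[i] is written and then read in the same iteration) is still satisfied when the two statements run in separate loops. A small C sketch of that equivalence (the size `FN` and helper names are ours):

```c
#include <string.h>

#define FN 8

/* Fused: a[i] is computed and immediately reused (temporal locality). */
static void run_fused(int a[FN], int d[FN],
                      const int b[FN], const int c[FN], const int e[FN]) {
    for (int i = 0; i < FN; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] + e[i];
    }
}

/* Distributed: two simple loops, each easy to vectorize. */
static void run_distributed(int a[FN], int d[FN],
                            const int b[FN], const int c[FN], const int e[FN]) {
    for (int i = 0; i < FN; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < FN; i++)
        d[i] = a[i] + e[i];
}

/* The write of a[i] still precedes its read, so results must match;
   returns 1 if both versions agree. */
int fusion_check(void) {
    int b[FN], c[FN], e[FN], a1[FN], d1[FN], a2[FN], d2[FN];
    for (int i = 0; i < FN; i++) { b[i] = i; c[i] = 2 * i; e[i] = 7 - i; }
    run_fused(a1, d1, b, c, e);
    run_distributed(a2, d2, b, c, e);
    return memcmp(a1, a2, sizeof a1) == 0 && memcmp(d1, d2, sizeof d1) == 0;
}
```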
14. The phase-ordering problem
q Which order is better?
Dead Code Elimination, then Constant Propagation:
a = 0; if (a) { … }  → DCE →  a = 0; if (a) { … }  → CP →  a = 0; if (0) { … }
Constant Propagation, then Dead Code Elimination:
a = 0; if (a) { … }  → CP →  a = 0; if (0) { … }  → DCE →  a = 0;
15. AST vs. The Polyhedral Model
AST path:
x = a + b;  y = a + b;  z = x + y;
→ AST → “Optimized” code:  x = a + b;  y = x;  z = x + y;
Polyhedral path (TODAY):
Program → Polyhedron (affine inequalities, e.g., i >= 0; i < N; …) → “Synthesized” code
16. Why the Polyhedral Model?
q One solution for tackling the phase-ordering problem
q Good for performing a set of loop transformations
§ Loop permutation
§ Loop fusion/distribution
§ Loop tiling
§ …
“The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility”
- OpenScop Specification and Library
18. The polyhedral model in a nutshell
q The polyhedral transformation = “scheduling” (determine the execution order of statements)
q 3 important things:
§ Domain: the set of instances of a statement
§ Scattering (scheduling): maps an instance to a time stamp
§ Access: maps an instance to array element(s)
q Limitation: only applicable to Static Control Parts (SCoPs) in general
§ Loop bounds and conditionals are affine functions of the surrounding loop iterators
Program:
for (i=1; …) {
  S1;
  for (j=1; …)
    S2;
}
Inequalities:
1 ≤ i_S1 ≤ 2;  1 ≤ i_S2 ≤ 2;  1 ≤ j_S2 ≤ 3;  i_S1 = i_S2;
Constraints / cost function, solved as an ILP:
δ_e(s, t) = φ_Sj(t) − φ_Si(s);  C_i − C_j ≥ 0; …
“Synthesized” code:
for (i=1; …) { S1; }
for (i=1; …) { …; }
19. Representation of “Domain”
q Observations:
§ S1 is executed 30 times (30 instances)
§ Each instance is associated with (i,j)
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
“The key aspect of the polyhedral model is to consider statement instances.”
- OpenScop Specification and Library
20. Iteration Domain
q A set of constraints that represents the instances of a statement
§ Using iteration vectors (i,j)
§ If those constraints are affine -> Polyhedron
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
1 ≤ i ≤ 5,  1 ≤ j ≤ 6;
D_S1 · (i, j, 1)^T ≥ 0, where
D_S1 = (  1   0  −1
         −1   0   5
          0   1  −1
          0  −1   6 )
Credits: Clint (https://www.ozinenko.com/clint)
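This representation is directly executable: each row of D_S1 is one affine inequality, and (i,j) is a statement instance iff D_S1 · (i, j, 1)^T is componentwise nonnegative. A small C sketch (the helper names and the bounding box are ours):

```c
/* One row per affine inequality of the domain of S1:
   i - 1 >= 0, -i + 5 >= 0, j - 1 >= 0, -j + 6 >= 0,
   i.e. 1 <= i <= 5 and 1 <= j <= 6. */
static const int D_S1[4][3] = {
    { 1,  0, -1},
    {-1,  0,  5},
    { 0,  1, -1},
    { 0, -1,  6},
};

/* (i, j) is an instance of S1 iff every inequality holds. */
int in_domain(int i, int j) {
    for (int r = 0; r < 4; r++)
        if (D_S1[r][0] * i + D_S1[r][1] * j + D_S1[r][2] < 0)
            return 0;
    return 1;
}

/* Enumerate a bounding box and count the integer points of the
   polyhedron: 5 * 6 = 30 statement instances, as on slide 19. */
int count_instances(void) {
    int n = 0;
    for (int i = -10; i <= 10; i++)
        for (int j = -10; j <= 10; j++)
            n += in_domain(i, j);
    return n;
}
```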
21. Representation of “Scheduling”: 1-dimensional schedules
q Function T returns the logical date of each statement
x = a + b; // S1   (T=0)
y = a + b; // S2   (T=1)
z = x + y; // S3   (T=2)
T_S1 = 0;
T_S2 = 1;
T_S3 = 2;
22. Representation of “Scheduling”: multi-dimensional schedules
q Function T returns the logical date of each statement
q Logical dates may be multi-dimensional (cf. clocks: days, hours, minutes, seconds)
§ Lexicographical order:  T_S1 ≺ T_S2 ≺ T_S3  ⇔  (0) ≺ (1,i) ≺ (2)
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
z = x + y; // S3
T_S1 = (0);        // T=0
T_S2(0) = (1, 0);  // T=1, i=0
T_S2(1) = (1, 1);  // T=1, i=1
T_S3 = (2);        // T=2
23. Representation of “Scheduling”: multi-dimensional schedules
q Function T returns the logical date of each statement
q Logical dates may be multi-dimensional (cf. clocks: days, hours, minutes, seconds)
§ Lexicographical order:  T_S1 ≺ T_S2 ≺ T_S3  ⇔  (0) ≺ (1,i) ≺ (2)
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
z = x + y; // S3
T_S1 = (0);
T_S2(i) = (1, i);   // parameterized: recall the iteration domain, 0 ≤ i < 2
T_S3 = (2);
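The lexicographic order on logical dates can be written down directly; dates of different lengths compare as if padded with trailing zeros. A small C sketch (function names are ours) that reproduces the order S1 ≺ S2(0) ≺ S2(1) ≺ S3 from this slide:

```c
/* Lexicographic comparison of logical dates; shorter dates are padded
   with trailing zeros, so (1) compares like (1, 0). Returns 1 iff a
   strictly precedes b. */
int lex_before(const int *a, int na, const int *b, int nb) {
    int n = na > nb ? na : nb;
    for (int k = 0; k < n; k++) {
        int ak = k < na ? a[k] : 0;
        int bk = k < nb ? b[k] : 0;
        if (ak != bk)
            return ak < bk;
    }
    return 0; /* equal dates: not strictly before */
}

/* The example's execution order: T_S1 = (0) precedes both instances
   of T_S2(i) = (1, i), which precede T_S3 = (2). */
int schedule_order_check(void) {
    const int s1[] = {0}, s2_0[] = {1, 0}, s2_1[] = {1, 1}, s3[] = {2};
    return lex_before(s1, 1, s2_0, 2)
        && lex_before(s2_0, 2, s2_1, 2)
        && lex_before(s2_1, 2, s3, 1);
}
```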
24. Representation of “Scheduling”: multi-dimensional schedules
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] += a[i]; // S3
  }
}
T_S1 = (0);
T_S2(i) = (1, i);
T_S3(i,j) = (2, i, j);
25. Loop transformations with schedules
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule:      T_S1(i, j) = (i, j);
New (unchanged, since the transformation is the identity):
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
New schedule = transformation × iteration vector:
T_S1(i, j) = ( 1 0 ; 0 1 ) (i, j)^T = (i, j)^T
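Evaluating a schedule is just a matrix-vector product. A small C sketch (the matrix names are ours) that computes one dimension of the new logical date for the transformations used on these slides:

```c
/* Dimension k of the new logical date for iteration (i, j) under a
   2x2 schedule transformation C: C[k][0]*i + C[k][1]*j. */
int schedule_dim(const int C[2][2], int i, int j, int k) {
    return C[k][0] * i + C[k][1] * j;
}

/* Transformation matrices for the loop transformations discussed here. */
static const int IDENTITY[2][2] = {{ 1, 0}, {0, 1}};  /* (i, j) -> (i, j)   */
static const int REVERSE[2][2]  = {{-1, 0}, {0, 1}};  /* (i, j) -> (-i, j)  */
static const int PERMUTE[2][2]  = {{ 0, 1}, {1, 0}};  /* (i, j) -> (j, i)   */
static const int SKEW[2][2]     = {{ 1, 0}, {1, 1}};  /* (i, j) -> (i, i+j) */
```

Feeding iteration (2, 5) through PERMUTE, for instance, yields the logical date (5, 2).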
26. Loop transformations with schedules: Loop Reversal
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule:      T_S1(i, j) = (-i, j);
New (i_new = −i_old, i.e., i_old → −i_new):
for (i = -1; i <= 0; i++) {
  for (j = 0; j < 3; j++) {
    b[-i][j] = ...; // S1
  }
}
New schedule = transformation × iteration vector:
T_S1(i, j) = ( −1 0 ; 0 1 ) (i, j)^T = (−i, j)^T
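Reversal only relabels the i dimension, so with no loop-carried dependence both versions must fill b identically. A C sketch of that check, with a concrete right-hand side substituted for the slide's “...” (sizes and helper names are ours):

```c
#include <string.h>

#define RN 2
#define RM 3

/* Original: i runs 0..RN-1 ascending. */
static void fill_original(int b[RN][RM]) {
    for (int i = 0; i < RN; i++)
        for (int j = 0; j < RM; j++)
            b[i][j] = 10 * i + j;
}

/* Reversed: i_new = -i_old runs -(RN-1)..0, and every use of i_old
   in the body is rewritten as -i_new. */
static void fill_reversed(int b[RN][RM]) {
    for (int i = -(RN - 1); i <= 0; i++)
        for (int j = 0; j < RM; j++)
            b[-i][j] = 10 * (-i) + j;
}

/* No loop-carried dependence, so reversal is legal and the results
   must coincide; returns 1 if they do. */
int reversal_check(void) {
    int b1[RN][RM], b2[RN][RM];
    fill_original(b1);
    fill_reversed(b2);
    return memcmp(b1, b2, sizeof b1) == 0;
}
```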
27. Loop transformations with schedules: Loop Permutation
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule:      T_S1(i, j) = (j, i);
New:
for (j = 0; j < 3; j++) {
  for (i = 0; i < 2; i++) {
    b[i][j] = ...; // S1
  }
}
New schedule = transformation × iteration vector:
T_S1(i, j) = ( 0 1 ; 1 0 ) (i, j)^T = (j, i)^T
28. Loop transformations with schedules: Loop Skewing
Original:
for (i = 1; i <= 5; i++) {
  for (j = 1; j <= 5; j++) {
    a[i][j] = a[i-1][j+1]; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule:      T_S1(i, j) = (i, i+j);
New (j_new = i + j_old, i.e., j_old → j_new − i):
for (i = 1; i <= 5; i++) {
  for (j = i+1; j <= i+5; j++) {
    a[i][j-i] = a[i-1][j-i+1]; // S1
  }
}
New schedule = transformation × iteration vector:
T_S1(i, j) = ( 1 0 ; 1 1 ) (i, j)^T = (i, i+j)^T
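Skewing relabels iterations without changing which instances run or, in this example, their order, so the transformed stencil must leave the array in the same state. A C sketch of that equivalence (the array size `SN` and helper names are ours):

```c
#include <string.h>

#define SN 7  /* a is SN x SN; i,j stay in 1..5 and reads reach column j+1 <= 6 */

static void skew_init(int a[SN][SN]) {
    for (int i = 0; i < SN; i++)
        for (int j = 0; j < SN; j++)
            a[i][j] = i * SN + j;
}

/* Original stencil from the slide. */
static void stencil_original(int a[SN][SN]) {
    for (int i = 1; i <= 5; i++)
        for (int j = 1; j <= 5; j++)
            a[i][j] = a[i-1][j+1];
}

/* Skewed: j_new = i + j_old, so j_old becomes j_new - i and the inner
   bounds become i+1 <= j_new <= i+5. */
static void stencil_skewed(int a[SN][SN]) {
    for (int i = 1; i <= 5; i++)
        for (int j = i + 1; j <= i + 5; j++)
            a[i][j-i] = a[i-1][j-i+1];
}

/* Same instances in the same order under a new labeling, so the final
   array contents must match; returns 1 if they do. */
int skewing_check(void) {
    int a1[SN][SN], a2[SN][SN];
    skew_init(a1);
    skew_init(a2);
    stencil_original(a1);
    stencil_skewed(a2);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```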
30. Scalar Dimensions in schedules
q 2d+1 format (d loop dimensions interleaved with d+1 scalar dimensions)
q Can represent/transform imperfectly nested loops
§ e.g., loop fusion/distribution
for (i = 0; i < 2; i++) {
  s[i] = ...;            // S1
  for (j = 0; j < 3; j++)
    a[i][j] = ...;       // S2
}
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...;          // S3
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
T_S3(i,j) = (1, i, 0, j, 0);
31. Loop transformations with schedules: loop fusion w/ scalar dimensions
Original:
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
Original schedule:
T_S1(i,j) = (0, i, 0, j);
T_S2(i,j) = (1, i, 0, j);
New (outer loops fused):
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
}
New schedule:
T_S1(i,j) = (0, i, 0, j);
T_S2(i,j) = (0, i, 1, j);
New schedule = transformation × iteration vector + scalar dimensions:
T_S2(i,j) = ( 0 0 ; 1 0 ; 0 0 ; 0 1 ) (i, j)^T + (0, 0, 1, 0)^T = (0, i, 1, j)^T
32. Schedules in general
T_S(i) = (φ_S^1(i), φ_S^2(i), …, φ_S^d(i))^T = C_S · i + c_S
§ i: the iteration vector; m_S = its size
§ C_S: a d × m_S matrix of coefficients (a transformation for an iteration vector), with entries C_kl^S
§ c_S = (C_10^S, …, C_d0^S)^T: the offsets (scalar dimensions)
§ d = 2m_S + 1
§ e.g., (0, i, 0, j)
33. Schedules in general
(Same schedule form as the previous slide: T_S(i) = C_S · i + c_S, with d = 2m_S + 1.)
Goal: compute the coefficients and offsets for each statement
34. Legality of transformations
q Are all transformations valid? NO!
Original:
for (i = 1; i <= 10; i++) {
  s[i] = ...;          // S1
  for (j = 0; j < 3; j++)
    a[i][j] = s[i];    // S2
}
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
Transformation → New (invalid: S2 now reads s[i] before S1 writes it):
for (i = 1; i <= 10; i++) {
  for (j = 0; j < 3; j++)
    a[i][j] = s[i];    // S2
  s[i] = ...;          // S1
}
T_S2(i,j) = (0, i, 0, j, 0);
T_S1(i) = (0, i, 1);
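The broken dependence is observable in plain C: scheduling S2 before S1 makes it read a stale value. A sketch with a concrete right-hand side substituted for the slide's “...” (helper names are ours):

```c
#include <string.h>

/* Original: S1 writes s[i], then S2 reads it (a RAW dependence). */
static void order_original(int a[11][3]) {
    int s[11] = {0};
    for (int i = 1; i <= 10; i++) {
        s[i] = i * i;               /* S1 (illustrative right-hand side) */
        for (int j = 0; j < 3; j++)
            a[i][j] = s[i];         /* S2 */
    }
}

/* "Transformed": S2 is scheduled before S1 inside each iteration,
   violating the RAW dependence: S2 reads s[i] before it is written. */
static void order_transformed(int a[11][3]) {
    int s[11] = {0};
    for (int i = 1; i <= 10; i++) {
        for (int j = 0; j < 3; j++)
            a[i][j] = s[i];         /* reads the stale value 0 */
        s[i] = i * i;               /* S1 */
    }
}

/* Returns 1 iff the invalid schedule changes the observable result. */
int illegal_transform_detected(void) {
    int a1[11][3], a2[11][3];
    memset(a1, 0, sizeof a1);
    memset(a2, 0, sizeof a2);
    order_original(a1);
    order_transformed(a2);
    return memcmp(a1, a2, sizeof a1) != 0;
}
```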
35. Dependences
q Three types of dependence:
§ Read-After-Write: (a=1; then b=a;)
§ Write-After-Read: (b=a; then a=1;)
§ Write-After-Write: (a=1; then a=2;)
q Dependences are computed from domains, accesses, and schedules
§ Transformation = finding a new schedule that satisfies all dependences
37. Legality of transformations
q Dependence polyhedron P_e: the set of pairs of dependent instances
q Legality:
§ ∀ (s, t) ∈ P_e (s ∈ D_Si, t ∈ D_Sj):  T_Si(s) ≺ T_Sj(t)
§ If a “source” instance must happen before a “target” instance in the original program, the transformed program must preserve this property (must satisfy the dependence)
39. Linearizing the legality condition (The Pluto Algorithm)
q The legality condition (for iteration vectors):
δ(s, t) = (c_1^Sj, c_2^Sj, …, c_mSj^Sj) · t − (c_1^Si, c_2^Si, …, c_mSi^Si) · s ≥ 0,  (s, t) ∈ P_e
q Uniform dependences: the distance between two dependent iterations is a constant
§ e.g., i → i + 1  ⇒  δ(s, t) is a constant
q Non-uniform dependences: the distance between two dependent iterations varies
§ e.g., i → i + j  ⇒  δ(s, t) is a function of j
§ Apply the Farkas lemma:
δ(s, t) ≥ 0 over P_e  ⇔  δ(s, t) ≡ λ_e0 + Σ_{k=1..m_e} λ_ek P_e^k,  λ_ek ≥ 0
(P_e^k = each inequality of the dependence polyhedron)
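For reference, the affine form of the Farkas lemma that justifies this step can be stated as follows (notation ours, matching the slide's λ multipliers):

```latex
\textbf{Affine form of the Farkas lemma.}
Let $\mathcal{P} = \{\, \vec{x} \mid a_k^{T}\vec{x} + b_k \ge 0,\ k = 1,\dots,m \,\}$
be a nonempty polyhedron. An affine function $\psi(\vec{x})$ is nonnegative
everywhere in $\mathcal{P}$ if and only if it is a nonnegative affine
combination of the faces of $\mathcal{P}$:
\[
  \psi(\vec{x}) \equiv \lambda_0 + \sum_{k=1}^{m} \lambda_k \,(a_k^{T}\vec{x} + b_k),
  \qquad \lambda_0 \ge 0,\ \lambda_k \ge 0 .
\]
```

Equating the coefficients of each iterator (and the constant term) on both sides eliminates s and t, leaving linear constraints on the schedule coefficients and the multipliers λ, which an ILP solver can handle.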
40. Cost Function & Objective Function (The Pluto Algorithm)
q Compute all coefficients and offsets under the legality condition: solve an ILP problem
q Cost function = transformation policy
§ Pluto’s cost function = the dependence distance
δ(s, t) = (c_1^Sj, c_2^Sj, …, c_mSj^Sj) · t − (c_1^Si, c_2^Si, …, c_mSi^Si) · s,  (s, t) ∈ P_e
ü Fuse loops as much as possible
ü Push loops carrying dependences to inner levels
§ Also used in ISL (Polly, PPCG, …)
q Objective function:
minimize_≺ (u_1, w, c_1^Sj, c_2^Sj, …)
§ Iteratively find linearly independent solutions
42. Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
q Dependence 1: RAW (flow dependence (i_s, j_s) → (i_t, j_t))
Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
...
Dependence polyhedron P_e1:
i_s = i_t,  j_s = j_t − 1,  0 ≤ i_t ≤ N − 1,  2 ≤ j_t ≤ N
Legality constraints:
(c_1^S1, c_2^S1) (i_t, j_t)^T − (c_1^S1, c_2^S1) (i_s, j_s)^T ≥ 0,  (i_s, j_s, i_t, j_t) ∈ P_e1
⇒ c_1^S1 i_t + c_2^S1 j_t − (c_1^S1 i_s + c_2^S1 j_s) = c_1^S1 i_t + c_2^S1 j_t − (c_1^S1 i_t + c_2^S1 (j_t − 1)) ≥ 0
⇒ c_2^S1 ≥ 0
43. Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
q Dependence 2: RAW (flow dependence (i_s, j_s) → (i_t, j_t))
Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
...
Dependence polyhedron P_e2:
i_s = j_t,  j_s = i_t,  1 ≤ i_t ≤ N,  2 ≤ j_t ≤ N,  i_t − j_t ≥ 1
Legality constraints:
(c_1^S1, c_2^S1) (i_t, j_t)^T − (c_1^S1, c_2^S1) (i_s, j_s)^T ≥ 0,  (i_s, j_s, i_t, j_t) ∈ P_e2
⇒ c_1^S1 i_t + c_2^S1 j_t − (c_1^S1 i_s + c_2^S1 j_s) = c_1^S1 i_t + c_2^S1 j_t − (c_1^S1 j_t + c_2^S1 i_t) ≥ 0
⇒ (c_1^S1 − c_2^S1) i_t + (c_2^S1 − c_1^S1) j_t ≥ 0,  1 ≤ i_t ≤ N,  2 ≤ j_t ≤ N,  i_t − j_t ≥ 1
Farkas lemma + Fourier–Motzkin elimination  ⇒  c_1^S1 − c_2^S1 ≥ 0
44. Step-by-step example: Putting it all together (The Pluto Algorithm)
q Dependence 1:  c_2^S1 ≥ 0,  w ≥ c_2^S1
q Dependences 2 & 3:  c_1^S1 − c_2^S1 ≥ 0,  u_1 ≥ 0,  u_1 ≥ c_1^S1 − c_2^S1,  3u_1 + w ≥ c_1^S1 − c_2^S1
(constraints using the parameter N that bound the dependence distances)
q Avoiding the zero vector:  c_1^S1 + c_2^S1 ≥ 1
q Objective function:  minimize_≺ (u_1, w, c_1^S1, c_2^S1) → (0, 1, 1, 1)
q Find a linearly independent answer:
T_S1(i, j) = ( 1 1 ; 1 0 ) (i, j)^T = (i + j, i)^T
45. Summary
q The polyhedral transformation = “scheduling” (determine the execution order of statements)
q 3 important things:
§ Domain: the set of instances of a statement
§ Scattering (scheduling): maps an instance to a time stamp
§ Access: maps an instance to array element(s)
q Limitation: only applicable to Static Control Parts (SCoPs) in general
§ Loop bounds and conditionals are affine functions of the surrounding loop iterators
Program:
for (i=1; …) {
  S1;
  for (j=1; …)
    S2;
}
Inequalities:
1 ≤ i_S1 ≤ 2;  1 ≤ i_S2 ≤ 2;  1 ≤ j_S2 ≤ 3;  i_S1 = i_S2;
Constraints / cost function, solved as an ILP:
δ_e(s, t) = φ_Sj(t) − φ_Si(s);  C_i − C_j ≥ 0; …
“Synthesized” code:
for (i=1; …) { S1; }
for (i=1; …) { …; }
47. Polyhedral Compilers & Tools
q PoCC (The Polyhedral Compiler Collection)
§ http://web.cs.ucla.edu/~pouchet/software/pocc/
§ Clan: extracts a polyhedral IR from the source code
§ Candl: a dependence analyzer
§ LetSee: a legal-transformation-space explorer
§ PLuTo: an automatic parallelizer and locality optimizer
§ CLooG: code generation from the polyhedral IR
48. Polyhedral Compilers & Tools
q Polly
§ http://polly.llvm.org/
§ ISL: the Integer Set Library (including a code generator)
q Clay/Chlore/Clint
§ https://www.ozinenko.com/projects
§ Clay: “Chunky Loop Alteration wizardrY”
§ Chlore: recovers a high-level syntactic description of the automatically computed polyhedral optimization
§ Clint: an interactive graphical interface to manual and compiler-assisted program restructuring in the polyhedral model
50. Further readings
q Fundamentals
§ OpenScop Specification
ü http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
§ ISL
ü https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
q Pluto algorithm
§ U. Bondhugula, “Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model” (PhD Dissertation, 2010)
§ U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer” [PLDI’08]
q Polly
§ T. Grosser, S. Verdoolaege, A. Cohen, “Polyhedral AST Generation Is More Than Scanning Polyhedra” [ACM TOPLAS 2015]
q Polyhedral model + AST-based cost function
§ J. Shirako, L.-N. Pouchet, V. Sarkar, “Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations” [SC’14]
q GPU code generation
§ S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, “Polyhedral Parallel Code Generation for CUDA” [ACM TACO 2013]
§ J. Shirako, A. Hayashi, V. Sarkar, “Optimized Two-level Parallelization for GPU Accelerators Using the Polyhedral Model” [CC’17]