Introduction to Polyhedral Compilation
Akihiro Hayashi, Jun Shirako
Rice University

Outline
• High-level Summary
• Theory
• Compilers and Tools
HIGH LEVEL SUMMARY
• The first priority is “performance”
Supercomputers
Personal Computers
Smartphones
Embedded
Pictures Borrowed From : commons.wikimedia.org, www.hirt-japan.info
Parallel Computing
Parallel programming is hard…
5	
[Figure: a multi-core CPU (cores with SIMD units and L1 caches, shared L2 caches, an L3 cache, DRAM) next to a many-core GPU (many cores sharing an L2 cache and DRAM). Memory speed ranges from DRAM (slowest) to registers (fastest).]
Challenges labelled in the figure:
• Exploiting SIMD
• Scheduling tasks on CPUs
• Optimizing data locality
• Utilizing accelerators
A gap between domain experts and hardware
Software stack, top to bottom: Application Domain (Domain Experts), Programming Languages, Compilers, Runtime, Hardware (Concurrency Experts)
• Goal: get significant performance improvement easily (performance portability)
• Problem: it is hard to exploit the full capability of the hardware
• We believe languages and compilers are very important!
A review of literature
• Automatic parallelizing compilers
  ◦ IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
• Parallel languages
  ◦ Language-based:
    ▪ Cilk, CUDA, OpenCL, C++ AMP, Java, Habanero C/Java, PGAS, …
  ◦ Directive-based:
    ▪ OpenMP, OpenACC, OmpSs, …
  ◦ Library-based:
    ▪ Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …
From the perspective of compilers…
• Compilers are among the most complicated pieces of software
  ◦ Pointer analysis
  ◦ Scalar optimizations
  ◦ Loop transformations
  ◦ Vectorization/SIMDization
  ◦ Scheduling
  ◦ Exploiting accelerators
  ◦ …
Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/
What are compilers doing?
Programs:
x = a + b;
y = a + b;
z = x + y;
→ (parsing) → Intermediate Representation (e.g. AST): two “+” nodes over (a, b), feeding a third “+” node
→ (optimizations) → “Optimized” Code:
x = a + b;
y = x;
z = x + y;
• A compiler may modify a program (e.g. change the execution order of statements) as long as it preserves the semantics of the program
[The slide repeats the Programs → Intermediate Representation → “Optimized” Code pipeline and also shows the reordered fragment:]
z = x + y;
x = a + b;
Examples of optimizations: scalar optimizations
CSE (common subexpression elimination):
x = a + b;
y = a + b;
z = x + y;
→
x = a + b;
y = x;
z = x + y;
Constant propagation:
a = 0;
if (a) {
…
}
→
a = 0;
if (0) {
…
}
Dead code elimination:
a = 0;
if (0) {
…
}
→
a = 0;
Examples of optimizations: loop permutation (interchange)
Offset (contiguous) access, faster on CPUs:
for (i = 0; i < M; i++) {
  for (j = 0; j < N; j++) {
    b[i][j] = a[i][j];
  }
}
Stride access, slower on CPUs:
for (j = 0; j < N; j++) {
  for (i = 0; i < M; i++) {
    b[i][j] = a[i][j];
  }
}
Loop interchange turns one form into the other.
Examples of optimizations: loop fusion/distribution
Fused (better temporal locality on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
  d[i] = a[i] + e[i];
}
Distributed (good for vectorization on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
}
for (i = 0; i < N; i++) {
  d[i] = a[i] + e[i];
}
Which form is better depends on the loop size “N”.
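The phase-ordering problem
• Which order is better?
Order 1: dead code elimination, then constant propagation
a = 0;
if (a) {
…
}
→ (dead code elimination: nothing can be removed yet) →
a = 0;
if (a) {
…
}
→ (constant propagation) →
a = 0;
if (0) {
…
}
Order 2: constant propagation, then dead code elimination
a = 0;
if (a) {
…
}
→ (constant propagation) →
a = 0;
if (0) {
…
}
→ (dead code elimination) →
a = 0;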
AST vs. The Polyhedral Model
AST-based compilation: Programs → AST → “Optimized” Code
x = a + b;
y = a + b;
z = x + y;
→
x = a + b;
y = x;
z = x + y;
Polyhedral compilation (TODAY): Programs → AST → Polyhedron (affine inequalities) → “Synthesized” Code
i >= 0;
i < N;
…
Why Polyhedral Model?
• One solution for tackling the phase-ordering problem
• Good for performing a set of loop transformations
  ◦ Loop permutation
  ◦ Loop fusion/distribution
  ◦ Loop tiling
  ◦ …
“The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility”
- OpenScop Specification and Library
THEORY
The polyhedral model in a nutshell
• The polyhedral transformation = “scheduling” (determining the execution order of statements)
• 3 important things:
  ◦ Domain: the set of instances of a statement
  ◦ Scattering (scheduling): an instance -> time stamp
  ◦ Access: an instance -> array element(s)
• Limitation: in general, only applicable to a Static Control Part (SCoP)
  ◦ Loop bounds and conditionals are affine functions of the surrounding loop iterators
Workflow: Program → Inequalities → ILP (constraints + cost function) → “Synthesized” Code
Program:
for (i=1; …){
S1;
for (j=1; …)
S2;
Inequalities:
$$1 \le i_{S1} \le 2; \quad 1 \le i_{S2} \le 2; \quad 1 \le j_{S2} \le 3; \quad i_{S1} = i_{S2};$$
ILP:
Constraints: $C_i - C_j \ge 0, \dots$
Cost function: $\delta_e(\vec{s}, \vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})$
“Synthesized” Code:
for (i=1; …){
S1;
}
for (i=1; …) {
…;
Representation of “Domain”
• Observations:
  ◦ S1 is executed 30 times (30 instances)
  ◦ Each instance is associated with (i,j)
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
“The key aspect of the polyhedral model is to consider statement instances.”
- OpenScop Specification and Library
Iteration Domain
• A set of constraints to represent instances of a statement
  ◦ Using iteration vectors (i,j)
  ◦ If those constraints are affine -> polyhedron
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
$$1 \le i \le 5, \quad 1 \le j \le 6$$
$$D_{S1} = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & -1 & 6 \end{pmatrix} \begin{pmatrix} i \\ j \\ 1 \end{pmatrix} \ge 0$$
Credits: Clint (https://www.ozinenko.com/clint)
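The matrix form of the iteration domain can be checked mechanically. The sketch below is only an illustration: the names DOMAIN_S1, in_domain_s1 and count_instances_s1 are hypothetical and not taken from any polyhedral library. Each matrix row (a, b, c) is read as the inequality a*i + b*j + c >= 0, and the integer points that satisfy all four rows are counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Rows of the constraint matrix for D_S1 (hypothetical name).
   Each row (a, b, c) encodes a*i + b*j + c >= 0. */
static const int DOMAIN_S1[4][3] = {
    { 1,  0, -1},  /*  i - 1 >= 0 */
    {-1,  0,  5},  /* -i + 5 >= 0 */
    { 0,  1, -1},  /*  j - 1 >= 0 */
    { 0, -1,  6},  /* -j + 6 >= 0 */
};

/* True when the iteration vector (i, j) satisfies every row. */
bool in_domain_s1(int i, int j) {
    for (int r = 0; r < 4; r++) {
        int value = DOMAIN_S1[r][0] * i + DOMAIN_S1[r][1] * j + DOMAIN_S1[r][2];
        if (value < 0)
            return false;
    }
    return true;
}

/* Count the integer points of the domain inside the box [lo, hi] x [lo, hi]. */
int count_instances_s1(int lo, int hi) {
    assert(lo <= hi);
    int count = 0;
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            if (in_domain_s1(i, j))
                count++;
    return count;
}
```

Scanning any box that covers the domain, for example count_instances_s1(-10, 10), yields the 30 instances of S1 observed on the previous slide.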
Representation of “Scheduling”: 1-dimensional schedules
• Function T: returns the logical date of each statement
x = a + b; // S1
y = a + b; // S2
z = x + y; // S3
T_S1 = 0;
T_S2 = 1;
T_S3 = 2;
Logical time: T=0 (S1), T=1 (S2), T=2 (S3)
Representation of “Scheduling”: multi-dimensional schedules
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
z = x + y; // S3
T_S1 = (0);
T_S2(0) = (1, 0);
T_S2(1) = (1, 1);
T_S3 = (2);
Logical time: T=0 (S1), T=1 (S2 with i=0, then i=1), T=2 (S3)
• Function T: returns the logical date of each statement
• Logical dates may be multi-dimensional
  ◦ Lexicographical order: $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1,i) \prec (2)$
  ◦ C.f. clocks (days, hours, minutes, seconds)
Representation of “Scheduling”: multi-dimensional schedules
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
z = x + y; // S3
T_S1 = (0);
T_S2(i) = (1, i);
T_S3 = (2);
Parameterized: recall the “iteration domain” 0 ≤ i < 2
Logical time: T=0 (S1), T=1 (S2 with i=0, then i=1), T=2 (S3)
• Function T: returns the logical date of each statement
• Logical dates may be multi-dimensional
  ◦ Lexicographical order: $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1,i) \prec (2)$
  ◦ C.f. clocks (days, hours, minutes, seconds)
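Lexicographic order on logical dates works like comparing clock readings: the first differing component decides. A minimal sketch follows; the function name lex_compare is hypothetical, and padding the shorter date with trailing zeros is an assumption made here so that dates of different lengths such as (0) and (1, i) can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two logical dates lexicographically.
   Returns -1 if a precedes b, 1 if b precedes a, and 0 if they are equal.
   Assumption: the shorter date is padded with trailing zeros. */
int lex_compare(const int *a, size_t na, const int *b, size_t nb) {
    assert(a != NULL && b != NULL);
    size_t n = na > nb ? na : nb;
    for (size_t k = 0; k < n; k++) {
        int x = k < na ? a[k] : 0;
        int y = k < nb ? b[k] : 0;
        if (x < y) return -1;  /* first differing component decides */
        if (x > y) return 1;
    }
    return 0;
}
```

With the dates of the example, (0) precedes (1, 0), which precedes (1, 1), which precedes (2), matching the original execution order S1, S2(0), S2(1), S3.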
Representation of “Scheduling”: multi-dimensional schedules
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] += a[i]; // S3
  }
}
T_S1 = (0);
T_S2(i) = (1, i);
T_S3(i,j) = (2, i, j);
Logical time: T=0 (S1), T=1 (S2 with i=0, then i=1), T=2 (S3 with i=0, then i=1; within each i, j runs from 0 to 2)
Loop transformations with schedules
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New (identity transformation):
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
New schedule: T_S1(i, j) = (i, j);
New schedule = transformation × iteration vector:
$$T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}$$
Loop transformations with schedules: Loop Reversal
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New:
for (i = -1; i <= 0; i++) {
  for (j = 0; j < 3; j++) {
    b[-i][j] = ...; // S1
  }
}
New schedule: T_S1(i, j) = (-i, j);
New schedule = transformation × iteration vector:
$$T_{S1}(i,j) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix}$$
$$i_{new} = -i_{old}; \qquad i_{old} \to -i_{new};$$
Loop transformations with schedules: Loop Permutation
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New:
for (j = 0; j < 3; j++) {
  for (i = 0; i < 2; i++) {
    b[i][j] = ...; // S1
  }
}
New schedule: T_S1(i, j) = (j, i);
New schedule = transformation × iteration vector:
$$T_{S1}(i,j) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}$$
Loop transformations with schedules: Loop Skewing
Original:
for (i = 1; i <= 5; i++) {
  for (j = 1; j <= 5; j++) {
    a[i][j] = a[i-1][j+1]; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New:
for (i = 1; i <= 5; i++) {
  for (j = i+1; j <= i+5; j++) {
    a[i][j-i] = a[i-1][j-i+1]; // S1
  }
}
New schedule: T_S1(i, j) = (i, i+j);
New schedule = transformation × iteration vector:
$$T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ i + j \end{pmatrix}$$
$$j_{new} = i + j_{old}; \qquad j_{old} \to j_{new} - i;$$
Loop transformations with schedules: Loop Skewing (Cont’d)
$$T_{S1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$$
Instances in execution order, before and after skewing:
(i,j) = (1,1); (1,2); (1,3); (1,4); (1,5); (2,1); (2,2); (2,3); (2,4); (2,5); (3,1); (3,2); (3,3); (3,4); (3,5); …
(i,i+j) = (1,2); (1,3); (1,4); (1,5); (1,6); (2,3); (2,4); (2,5); (2,6); (2,7); (3,4); (3,5); (3,6); (3,7); (3,8); (4,5); …
[Figure: iteration space before and after skewing, with arrows for the dependence and for the execution order.]
Credits: Clint (https://www.ozinenko.com/clint)
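The effect of skewing on the dependence can be computed directly. In the loop a[i][j] = a[i-1][j+1], instance (i, j) reads the value written by instance (i-1, j+1). The sketch below applies a 2x2 schedule matrix to both instances and subtracts the resulting dates. The names IDENTITY, SKEW, apply_schedule and dependence_distance are hypothetical, and the sketch ignores the domain boundary (i = 1 has no source inside the loop nest).

```c
#include <assert.h>
#include <stddef.h>

/* Schedule matrices used in the example (hypothetical names). */
static const int IDENTITY[2][2] = {{1, 0}, {0, 1}};  /* (i, j) -> (i, j)     */
static const int SKEW[2][2]     = {{1, 0}, {1, 1}};  /* (i, j) -> (i, i + j) */

/* out = T * v for a 2x2 schedule matrix T and an iteration vector v = (i, j). */
void apply_schedule(const int T[2][2], const int v[2], int out[2]) {
    assert(T != NULL && v != NULL && out != NULL);
    out[0] = T[0][0] * v[0] + T[0][1] * v[1];
    out[1] = T[1][0] * v[0] + T[1][1] * v[1];
}

/* Dependence distance in schedule space for S1: a[i][j] = a[i-1][j+1].
   The target (i, j) reads what the source (i-1, j+1) wrote. */
void dependence_distance(const int T[2][2], int i, int j, int dist[2]) {
    const int source[2] = {i - 1, j + 1};
    const int target[2] = {i, j};
    int date_source[2], date_target[2];
    apply_schedule(T, source, date_source);
    apply_schedule(T, target, date_target);
    dist[0] = date_target[0] - date_source[0];
    dist[1] = date_target[1] - date_source[1];
}
```

Under the original schedule the distance is (1, -1) for every instance; under the skewed schedule it becomes (1, 0), so the second schedule dimension no longer has a negative component.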
Scalar Dimensions in schedules
• 2d+1 format (d+d+1)
• Can represent/transform imperfectly nested loops
  ◦ e.g., loop fusion/distribution
for (i = 0; i < 2; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = ...; // S2
}
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...; // S3
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
T_S3(i,j) = (1, i, 0, j, 0);
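One way to see that these 2d+1 schedules encode the loop structure is to generate all statement instances in an arbitrary order and sort them by logical date. The sketch below does that with qsort; Instance, by_date and build_execution_order are hypothetical names, and padding T_S1 with trailing zeros to five entries is an assumption. It reads the example as S2's j loop being nested inside the first i loop, which is what T_S2(i,j) = (0, i, 1, j, 0) describes.

```c
#include <assert.h>
#include <stdlib.h>

#define DIMS 5  /* 2d+1 entries for d = 2 loop levels */

typedef struct {
    int stmt;        /* 1, 2 or 3 for S1, S2, S3 */
    int date[DIMS];  /* logical date, padded with zeros */
} Instance;

/* qsort comparator: lexicographic order on the logical dates. */
static int by_date(const void *pa, const void *pb) {
    const Instance *a = pa;
    const Instance *b = pb;
    for (int k = 0; k < DIMS; k++) {
        if (a->date[k] != b->date[k])
            return a->date[k] < b->date[k] ? -1 : 1;
    }
    return 0;
}

/* Fill `out` (capacity >= 14) with every instance, deliberately emitted
   statement by statement, then sort by logical date.
   Returns the number of instances. */
int build_execution_order(Instance *out) {
    assert(out != NULL);
    int n = 0;
    for (int i = 0; i < 2; i++)              /* T_S1(i) = (0, i, 0) */
        out[n++] = (Instance){1, {0, i, 0, 0, 0}};
    for (int i = 0; i < 2; i++)              /* T_S3(i,j) = (1, i, 0, j, 0) */
        for (int j = 0; j < 3; j++)
            out[n++] = (Instance){3, {1, i, 0, j, 0}};
    for (int i = 0; i < 2; i++)              /* T_S2(i,j) = (0, i, 1, j, 0) */
        for (int j = 0; j < 3; j++)
            out[n++] = (Instance){2, {0, i, 1, j, 0}};
    qsort(out, (size_t)n, sizeof out[0], by_date);
    return n;
}
```

After sorting, the statements appear as S1, S2, S2, S2, S1, S2, S2, S2, followed by six instances of S3, which is the order in which the loop nests execute them.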
Loop transformations to schedules: loop fusion w/ scalar dimensions
Original:
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
Original schedule:
T_S1(i,j) = (0, i, 0, j);
T_S2(i,j) = (1, i, 0, j);
New (outer i loops fused):
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
}
New schedule:
T_S1(i,j) = (0, i, 0, j);
T_S2(i,j) = (0, i, 1, j);
New schedule = transformation × iteration vector + scalar dimensions:
$$T_{S2}(i,j) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}$$
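Schedules in general
$$T_S(\vec{i}) = \begin{pmatrix} \phi_S^1(\vec{i}) \\ \phi_S^2(\vec{i}) \\ \phi_S^3(\vec{i}) \\ \phi_S^4(\vec{i}) \\ \vdots \\ \phi_S^d(\vec{i}) \end{pmatrix} = \begin{pmatrix} C_{11}^S & C_{12}^S & C_{13}^S & C_{14}^S & \cdots & C_{1m_S}^S \\ C_{21}^S & C_{22}^S & C_{23}^S & C_{24}^S & \cdots & C_{2m_S}^S \\ C_{31}^S & C_{32}^S & C_{33}^S & C_{34}^S & \cdots & C_{3m_S}^S \\ C_{41}^S & C_{42}^S & C_{43}^S & C_{44}^S & \cdots & C_{4m_S}^S \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{d1}^S & C_{d2}^S & C_{d3}^S & C_{d4}^S & \cdots & C_{dm_S}^S \end{pmatrix} \begin{pmatrix} \vec{i} \end{pmatrix} + \begin{pmatrix} C_{10}^S \\ C_{20}^S \\ C_{30}^S \\ C_{40}^S \\ \vdots \\ C_{d0}^S \end{pmatrix}$$
• The $d \times m_S$ matrix is a transformation for an iteration vector
• The $d \times 1$ offset vector holds the scalar dimensions
• Schedules: e.g., (0, i, 0, j)
• $d = 2m_S + 1$, where $m_S$ = the size of the iteration vector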
Goal: compute the coefficients and offsets for each statement
Legality of transformations
• Are all transformations valid? No!
Original:
for (i = 1; i <= 10; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
}
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
New (after the transformation), which is invalid because S2 now reads s[i] before S1 writes it:
for (i = 1; i <= 10; i++) {
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
  s[i] = ...; // S1
}
T_S2(i,j) = (0, i, 0, j, 0);
T_S1(i) = (0, i, 1);
Dependences
• Three types of dependence:
  ◦ Read-After-Write: (a=1; then b=a;)
  ◦ Write-After-Read: (b=a; then a=1;)
  ◦ Write-After-Write: (a=1; then a=2;)
• Dependences are computed from the domain, access, and schedule
  ◦ Transformation = finding a new schedule that satisfies all dependences
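The three dependence types differ only in which of the two accesses to the same location is a write. A small sketch of that classification follows; the enum DepKind and the function classify_dependence are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW } DepKind;

/* Classify the dependence between two accesses to the same memory location,
   where the first access executes before the second. */
DepKind classify_dependence(bool first_is_write, bool second_is_write) {
    if (first_is_write && !second_is_write) return DEP_RAW;  /* a=1; then b=a; */
    if (!first_is_write && second_is_write) return DEP_WAR;  /* b=a; then a=1; */
    if (first_is_write && second_is_write)  return DEP_WAW;  /* a=1; then a=2; */
    return DEP_NONE;  /* two reads do not constrain the order */
}
```

Two reads of the same location are the only combination that leaves the execution order unconstrained.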
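Dependence polyhedron
• Dependence polyhedron: a set of inequalities ($P_e$)
  ◦ A general and accurate representation of instance-wise dependences
for (i = 1; i <= 10; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
}
Constraints for the dependence from S1 to S2 (the domains $D_{S1}$ and $D_{S2}$, plus the equality of the accessed element):
$$i_{S1} = i_{S2}, \quad 1 \le i_{S1} \le 10, \quad 1 \le i_{S2} \le 10, \quad 0 \le j_{S2} < 3$$
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} \; 0$$
The equality can itself be written as two inequalities:
$$i_{S1} = i_{S2} \Rightarrow i_{S1} - i_{S2} \ge 0 \wedge i_{S2} - i_{S1} \ge 0$$
Credits: Clint (https://www.ozinenko.com/clint)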
Legality of transformations
• Dependence polyhedron: $P_e$
• Legality:
  ◦ $\forall (s,t) \in P_e,\ (s \in D_{S_i},\ t \in D_{S_j}):\ T_{S_i}(s) \prec T_{S_j}(t)$
  ◦ If the “source” instance must happen before the “target” instance in the original program, the transformed program must preserve this property (must satisfy the dependence)
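For a dependence polyhedron this small, the legality condition can be checked by enumeration. The sketch below walks over every (s, t) pair of the S1 to S2 dependence from the earlier example (i_S1 = i_S2, 1 <= i_S1 <= 10, 0 <= j_S2 < 3) and tests T_S1(s) ≺ T_S2(t). The names precedes and schedule_is_legal are hypothetical, dates are padded with zeros to five entries, and the transformed schedule is assumed to be T_S2(i,j) = (0, i, 0, j, 0) with T_S1(i) = (0, i, 1), i.e. S1 moved after the j loop. Real tools reason symbolically rather than by enumeration.

```c
#include <assert.h>
#include <stdbool.h>

#define DIMS 5  /* dates are padded with zeros to 2d+1 = 5 entries */

/* True when date a strictly precedes date b in lexicographic order. */
static bool precedes(const int a[DIMS], const int b[DIMS]) {
    for (int k = 0; k < DIMS; k++) {
        if (a[k] != b[k])
            return a[k] < b[k];
    }
    return false;  /* equal dates are not strictly ordered */
}

/* Check T_S1(s) < T_S2(t) for every (s, t) in the dependence polyhedron.
   use_new_schedule selects between the original and transformed schedules. */
bool schedule_is_legal(bool use_new_schedule) {
    for (int i1 = 1; i1 <= 10; i1++) {
        int i2 = i1;  /* equality constraint i_S1 = i_S2 */
        for (int j2 = 0; j2 < 3; j2++) {
            /* Original: T_S1 = (0, i, 0),  T_S2 = (0, i, 1, j, 0).
               New:      T_S1 = (0, i, 1),  T_S2 = (0, i, 0, j, 0). */
            int date_s1[DIMS] = {0, i1, use_new_schedule ? 1 : 0, 0, 0};
            int date_s2[DIMS] = {0, i2, use_new_schedule ? 0 : 1, j2, 0};
            if (!precedes(date_s1, date_s2))
                return false;
        }
    }
    return true;
}
```

The original schedule passes the check, while the transformed one fails on the first pair because the write to s[i] is scheduled after the reads.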
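Putting it all together
• Goal: compute all coefficients and offsets of the schedules such that the legality condition holds on the dependence polyhedron
Schedules:
$$T_{S1}(i) = \begin{pmatrix} C_{11}^{S1} \\ C_{21}^{S1} \\ C_{31}^{S1} \end{pmatrix} \begin{pmatrix} i_{S1} \end{pmatrix} + \begin{pmatrix} C_{10}^{S1} \\ C_{20}^{S1} \\ C_{30}^{S1} \end{pmatrix}$$
$$T_{S2}(i,j) = \begin{pmatrix} C_{11}^{S2} & C_{12}^{S2} \\ C_{21}^{S2} & C_{22}^{S2} \\ C_{31}^{S2} & C_{32}^{S2} \\ C_{41}^{S2} & C_{42}^{S2} \\ C_{51}^{S2} & C_{52}^{S2} \end{pmatrix} \begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix} + \begin{pmatrix} C_{10}^{S2} \\ C_{20}^{S2} \\ C_{30}^{S2} \\ C_{40}^{S2} \\ C_{50}^{S2} \end{pmatrix}$$
Dependence polyhedron $P_e$:
$$i_{S1} = i_{S2}, \quad 1 \le i_{S1} \le 10, \quad 1 \le i_{S2} \le 10, \quad 0 \le j_{S2} < 3$$
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} \; 0$$
Legality condition:
$$\forall (s,t) \in P_e,\ (s \in D_{S1},\ t \in D_{S2}):\ T_{S1}(s) \prec T_{S2}(t)$$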
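Linearizing the legality condition (The Pluto Algorithm)
• The legality condition (for iteration vectors):
$$\delta(s,t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad (s,t) \in P_e$$
• Uniform dependences: the distance between two dependent iterations is a constant (e.g. $i \to i + 1 \Rightarrow \delta(s,t)$ is a constant)
• Non-uniform dependences: the distance between two dependent iterations varies (e.g. $i \to i + j \Rightarrow \delta(s,t)$ is a function of $j$)
  ◦ Apply the Farkas lemma:
$$(c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad (s,t) \in P_e$$
$$\Leftrightarrow$$
$$(c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s} \equiv \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k, \quad \lambda_{ek} \ge 0$$
where each $P_e^k$ is one inequality in the dependence polyhedron.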
Cost Function & Objective Function (The Pluto Algorithm)
• Compute all coefficients and offsets under the legality condition: solve an ILP problem
• Cost function = transformation policy
  ◦ Pluto’s cost function = dependence distance:
$$\delta(s,t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s}, \quad (s,t) \in P_e$$
    ▪ Fuse loops as much as possible
    ▪ Push loops that carry dependences to inner levels
  ◦ Also used in ISL (Polly, PPCG, …)
• Objective function:
$$\text{minimize}_{\prec}\ (u_1, w, c_1^{S_j}, c_2^{S_j})$$
  ◦ Iteratively find linearly independent solutions
Step-by-step example
for (i = 0; i < N; i++) {
  for (j = 1; j < N; j++) {
    a[i][j] = a[j][i] + a[i][j-1]; // S1
  }
}
Statement instances in execution order:
a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
...
a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
...
a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
...
a[3][1] = a[1][3] + a[3][0]; // S1(3,1)
Dependences marked on the slide: Dependence 1 (RAW), Dependence 2 (RAW), Dependence 3 (WAR)
Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
• Dependence 1: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
...
Dependence polyhedron $P_{e1}$:
$$i_s = i_t, \quad j_s = j_t - 1, \quad 0 \le i_t \le N - 1, \quad 2 \le j_t \le N$$
Legality constraints:
$$\delta(s,t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad (s,t) \in P_e$$
Applied to S1:
$$(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e1}$$
$$\Rightarrow c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_t + c_2^{S1} (j_t - 1)) \ge 0$$
$$\Rightarrow c_2^{S1} \ge 0$$
Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
• Dependence 2: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
...
Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
Dependence polyhedron $P_{e2}$:
$$i_s = j_t, \quad j_s = i_t, \quad 1 \le i_t \le N, \quad 2 \le j_t \le N, \quad i_t - j_t \ge 1$$
Legality constraints:
$$\delta(s,t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \quad (s,t) \in P_e$$
Applied to S1:
$$(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e2}$$
$$\Rightarrow c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} j_t + c_2^{S1} i_t) \ge 0$$
$$\Rightarrow (c_1^{S1} - c_2^{S1})\, i_t + (c_2^{S1} - c_1^{S1})\, j_t \ge 0, \quad 1 \le i_t \le N, \quad 2 \le j_t \le N, \quad i_t - j_t \ge 1$$
Farkas lemma + Fourier-Motzkin elimination:
$$c_1^{S1} - c_2^{S1} \ge 0$$
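For this particular dependence the result can also be seen without writing out the Farkas multipliers, by factoring the left-hand side. This is only a sketch of the reasoning for this example, not the elimination procedure a tool would run:

```latex
\begin{aligned}
(c_1^{S1} - c_2^{S1})\, i_t + (c_2^{S1} - c_1^{S1})\, j_t
  &= (c_1^{S1} - c_2^{S1})\,(i_t - j_t) \;\ge\; 0, \\
i_t - j_t \ge 1 \ \text{on the dependence polyhedron}
  &\;\Longrightarrow\; c_1^{S1} - c_2^{S1} \ge 0 .
\end{aligned}
```

Because the factor $i_t - j_t$ is strictly positive wherever the dependence exists, the product is non-negative exactly when $c_1^{S1} - c_2^{S1}$ is non-negative.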
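Step-by-step example: Putting it all together (The Pluto Algorithm)
• Dependence 1:
$$c_2^{S1} \ge 0, \quad w \ge c_2^{S1}$$
• Dependence 2 & 3:
$$c_1^{S1} - c_2^{S1} \ge 0, \quad u_1 \ge 0, \quad u_1 \ge c_1^{S1} - c_2^{S1}, \quad 3u_1 + w \ge c_1^{S1} - c_2^{S1}$$
  ◦ The constraints involving $u_1$ and $w$ use the parameter N to bound the dependence distances
• Avoiding the zero vector:
$$c_1^{S1} + c_2^{S1} \ge 1$$
• Objective function:
$$\text{minimize}_{\prec}\ (u_1, w, c_1^{S1}, c_2^{S1}) \to (0, 1, 1, 1)$$
• Find a linearly independent answer for the next dimension:
$$T_{S1}(i,j) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$$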
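The lexicographic minimum can be reproduced by brute force for this tiny system. The sketch below is not how Pluto solves the problem (a real tool hands the constraints to an ILP solver); it simply scans candidates in lexicographic order and stops at the first feasible one. The names Solution, feasible and lexmin are hypothetical, and restricting all four variables to non-negative integers in a small box is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int u1, w, c1, c2; } Solution;

/* Constraints collected on the slide for the single statement S1. */
static bool feasible(int u1, int w, int c1, int c2) {
    return c2 >= 0 && w >= c2               /* dependence 1 */
        && c1 - c2 >= 0 && u1 >= 0          /* dependences 2 and 3 */
        && u1 >= c1 - c2
        && 3 * u1 + w >= c1 - c2
        && c1 + c2 >= 1;                    /* avoid the zero vector */
}

/* Lexicographic minimum of (u1, w, c1, c2) over integers in [0, bound].
   The loops visit candidates in lexicographic order (u1 outermost,
   c2 innermost), so the first feasible candidate is the minimum.
   Returns false when the box contains no feasible point. */
bool lexmin(int bound, Solution *best) {
    assert(best != NULL && bound >= 0);
    for (int u1 = 0; u1 <= bound; u1++)
        for (int w = 0; w <= bound; w++)
            for (int c1 = 0; c1 <= bound; c1++)
                for (int c2 = 0; c2 <= bound; c2++)
                    if (feasible(u1, w, c1, c2)) {
                        *best = (Solution){u1, w, c1, c2};
                        return true;
                    }
    return false;
}
```

With a bound of 3 the search returns (u1, w, c1, c2) = (0, 1, 1, 1), i.e. the hyperplane (1, 1) that forms the first row of the schedule matrix. The second row (1, 0) also satisfies the two legality constraints and is linearly independent of the first.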
Summary
• The polyhedral transformation = “scheduling” (determining the execution order of statements)
• 3 important things:
  ◦ Domain: the set of instances of a statement
  ◦ Scattering (scheduling): an instance -> time stamp
  ◦ Access: an instance -> array element(s)
• Limitation: in general, only applicable to a Static Control Part (SCoP)
  ◦ Loop bounds and conditionals are affine functions of the surrounding loop iterators
• Workflow: Program → Inequalities → ILP (constraints + cost function) → “Synthesized” Code
COMPILERS AND TOOLS
Polyhedral Compilers & Tools
• PoCC (The Polyhedral Compiler Collection)
  ◦ http://web.cs.ucla.edu/~pouchet/software/pocc/
  ◦ Clan: extracts a polyhedral IR from the source code
  ◦ Candl: a dependence analyzer
  ◦ LetSee: legal transformation space explorer
  ◦ PLuTo: an automatic parallelizer and locality optimizer
  ◦ CLooG: code generation from the polyhedral IR
• Polly
  ◦ http://polly.llvm.org/
  ◦ ISL: Integer Set Library (including code generator)
• Clay/Chlore/Clint
  ◦ https://www.ozinenko.com/projects
  ◦ Clay: “Chunky Loop Alteration wizardrY”
  ◦ Chlore: “Recovering high-level syntactic description of the automatically computed polyhedral optimization”
  ◦ Clint: “Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model”
Clint
49
Further readings
• Fundamentals
  ◦ OpenScop Specification
    ▪ http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
  ◦ ISL
    ▪ https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
• Pluto algorithm
  ◦ U. Bondhugula, “Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model” (PhD Dissertation, 2010)
  ◦ U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer” [PLDI’08]
• Polly
  ◦ T. Grosser, S. Verdoolaege, A. Cohen, “Polyhedral AST generation is more than scanning polyhedra” [ACM TOPLAS 2015]
• Polyhedral model + AST-based cost function
  ◦ J. Shirako, L.N. Pouchet, V. Sarkar, “Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations” [SC’14]
• GPU code generation
  ◦ S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, “Polyhedral parallel code generation for CUDA” [ACM TACO 2013]
  ◦ J. Shirako, A. Hayashi, V. Sarkar, “Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model” [CC’17]
 50

More Related Content

What's hot

Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6Ono Shigeru
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介Takeo Imai
 
Rate-Distortion Function for Gamma Sources under Absolute-Log Distortion
Rate-Distortion Function for Gamma Sources under Absolute-Log DistortionRate-Distortion Function for Gamma Sources under Absolute-Log Distortion
Rate-Distortion Function for Gamma Sources under Absolute-Log Distortion奈良先端大 情報科学研究科
 
The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldPhilip Schwarz
 
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example DetectionDeep Learning JP
 
One Monad to Rule Them All
One Monad to Rule Them AllOne Monad to Rule Them All
One Monad to Rule Them AllJohn De Goes
 
Tensor flow usergroup 2016 (公開版)
Tensor flow usergroup 2016 (公開版)Tensor flow usergroup 2016 (公開版)
Tensor flow usergroup 2016 (公開版)Hiroki Nakahara
 
(公開版)Reconf研2017GUINNESS
(公開版)Reconf研2017GUINNESS(公開版)Reconf研2017GUINNESS
(公開版)Reconf研2017GUINNESSHiroki Nakahara
 
Boost.Preprocessorでプログラミングしましょう
Boost.PreprocessorでプログラミングしましょうBoost.Preprocessorでプログラミングしましょう
Boost.Preprocessorでプログラミングしましょうdigitalghost
 
Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsAhmed Gad
 
Demand-Driven Context-Sensitive Alias Analysis for Java
Demand-Driven Context-Sensitive Alias Analysis for JavaDemand-Driven Context-Sensitive Alias Analysis for Java
Demand-Driven Context-Sensitive Alias Analysis for JavaDacong (Tony) Yan
 
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術についてAIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術についてFixstars Corporation
 
PRML輪読#6
PRML輪読#6PRML輪読#6
PRML輪読#6matsuolab
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes FamilyKota Matsui
 
Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)marsee101
 
CNNの誤差逆伝播/Deconvolutionの計算過程
CNNの誤差逆伝播/Deconvolutionの計算過程CNNの誤差逆伝播/Deconvolutionの計算過程
CNNの誤差逆伝播/Deconvolutionの計算過程ssuser87f46e
 

What's hot (20)

Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介
 
Rate-Distortion Function for Gamma Sources under Absolute-Log Distortion
Rate-Distortion Function for Gamma Sources under Absolute-Log DistortionRate-Distortion Function for Gamma Sources under Absolute-Log Distortion
Rate-Distortion Function for Gamma Sources under Absolute-Log Distortion
 
最急降下法
最急降下法最急降下法
最急降下法
 
直交領域探索
直交領域探索直交領域探索
直交領域探索
 
The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and Fold
 
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection
[DL輪読会]Understanding Measures of Uncertainty for Adversarial Example Detection
 
One Monad to Rule Them All
One Monad to Rule Them AllOne Monad to Rule Them All
One Monad to Rule Them All
 
楕円曲線と暗号
楕円曲線と暗号楕円曲線と暗号
楕円曲線と暗号
 
Tensor flow usergroup 2016 (公開版)
Tensor flow usergroup 2016 (公開版)Tensor flow usergroup 2016 (公開版)
Tensor flow usergroup 2016 (公開版)
 
(公開版)Reconf研2017GUINNESS
(公開版)Reconf研2017GUINNESS(公開版)Reconf研2017GUINNESS
(公開版)Reconf研2017GUINNESS
 
TVM の紹介
TVM の紹介TVM の紹介
TVM の紹介
 
Boost.Preprocessorでプログラミングしましょう
Boost.PreprocessorでプログラミングしましょうBoost.Preprocessorでプログラミングしましょう
Boost.Preprocessorでプログラミングしましょう
 
Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNs
 
Demand-Driven Context-Sensitive Alias Analysis for Java
Demand-Driven Context-Sensitive Alias Analysis for JavaDemand-Driven Context-Sensitive Alias Analysis for Java
Demand-Driven Context-Sensitive Alias Analysis for Java
 
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術についてAIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
 
PRML輪読#6
PRML輪読#6PRML輪読#6
PRML輪読#6
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
 
Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)
 
CNNの誤差逆伝播/Deconvolutionの計算過程
CNNの誤差逆伝播/Deconvolutionの計算過程CNNの誤差逆伝播/Deconvolutionの計算過程
CNNの誤差逆伝播/Deconvolutionの計算過程
 

Viewers also liked

Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15 mixARConference
 
Epoxy composites thermal conductivity enhancement
Epoxy composites thermal conductivity enhancementEpoxy composites thermal conductivity enhancement
Epoxy composites thermal conductivity enhancementrajesh kumar
 
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li..."The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...Edge AI and Vision Alliance
 
"Industrial applications of Composite Polymers" by Oan
"Industrial applications of Composite Polymers" by Oan"Industrial applications of Composite Polymers" by Oan
"Industrial applications of Composite Polymers" by OanOan Sahito
 
Advanced Composite Materials & Technologies for Defence
Advanced Composite  Materials & Technologies for  DefenceAdvanced Composite  Materials & Technologies for  Defence
Advanced Composite Materials & Technologies for DefenceDigitech Rathod
 
Epoxy/CNT nanocomposites
Epoxy/CNT nanocompositesEpoxy/CNT nanocomposites
Epoxy/CNT nanocompositeszenziyan
 
Svnit composite materials
Svnit composite materialsSvnit composite materials
Svnit composite materialsNEERAJ PARMAR
 
Epoxy resin presented by biswajit maity
Epoxy resin  presented by biswajit maityEpoxy resin  presented by biswajit maity
Epoxy resin presented by biswajit maityBiswajit Maity
 
composite materials in aerospace application seminar
 composite materials in aerospace application seminar composite materials in aerospace application seminar
composite materials in aerospace application seminarChuchu Beera
 
Composite materials
Composite materialsComposite materials
Composite materialsStudent
 

Viewers also liked (16)

Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15
 
Epoxy composites thermal conductivity enhancement
Epoxy composites thermal conductivity enhancementEpoxy composites thermal conductivity enhancement
Epoxy composites thermal conductivity enhancement
 
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li..."The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
 
Nano Technology Pipe Coating
Nano Technology Pipe CoatingNano Technology Pipe Coating
Nano Technology Pipe Coating
 
Composite Materials, Advanced Composite Bicycle Frame IDM12
Composite Materials, Advanced Composite Bicycle Frame IDM12Composite Materials, Advanced Composite Bicycle Frame IDM12
Composite Materials, Advanced Composite Bicycle Frame IDM12
 
Increasing temperature resistance - Highlight
Increasing temperature resistance - HighlightIncreasing temperature resistance - Highlight
Increasing temperature resistance - Highlight
 
"Industrial applications of Composite Polymers" by Oan
"Industrial applications of Composite Polymers" by Oan"Industrial applications of Composite Polymers" by Oan
"Industrial applications of Composite Polymers" by Oan
 
Epoxy resin
Epoxy resinEpoxy resin
Epoxy resin
 
Advanced Composite Materials & Technologies for Defence
Advanced Composite  Materials & Technologies for  DefenceAdvanced Composite  Materials & Technologies for  Defence
Advanced Composite Materials & Technologies for Defence
 
Epoxy/CNT nanocomposites
Epoxy/CNT nanocompositesEpoxy/CNT nanocomposites
Epoxy/CNT nanocomposites
 
Application of Composite Material in Aerospace Industry
Application of Composite Material in Aerospace IndustryApplication of Composite Material in Aerospace Industry
Application of Composite Material in Aerospace Industry
 
Svnit composite materials
Svnit composite materialsSvnit composite materials
Svnit composite materials
 
Epoxy resin presented by biswajit maity
Epoxy resin  presented by biswajit maityEpoxy resin  presented by biswajit maity
Epoxy resin presented by biswajit maity
 
composite materials in aerospace application seminar
 composite materials in aerospace application seminar composite materials in aerospace application seminar
composite materials in aerospace application seminar
 
Composite materials
Composite materialsComposite materials
Composite materials
 
Composite resin
Composite resinComposite resin
Composite resin
 

Introduction to Polyhedral Compilation

  • 1. Introduction to Polyhedral Compilation. Akihiro Hayashi, Jun Shirako, Rice University.
  • 3. High-Level Summary: Introduction to Polyhedral Compilation
  • 4. Parallel Computing. The first priority is “performance”, on supercomputers, personal computers, smartphones, and embedded systems. (Pictures borrowed from commons.wikimedia.org and www.hirt-japan.info.)
  • 5. Parallel programming is hard… Programmers must exploit SIMD, schedule tasks on CPUs, optimize data locality across the memory hierarchy from DRAM (slowest) to registers (fastest), and utilize accelerators. [Figure: a multi-core CPU (cores with SIMD units and L1 caches, L2 caches, an L3 cache, DRAM) next to a many-core GPU (many cores, an L2 cache, DRAM).]
  • 6. A gap between domain experts and hardware. Domain experts (the application domain) want to get significant performance improvement easily (performance portability), but it is hard to exploit the full capability of hardware, which is the territory of concurrency experts. Programming languages, compilers, and runtimes sit between the two. We believe languages and compilers are very important!
  • 7. A review of literature. Automatic parallelizing compilers: IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, … Parallel languages: language-based (Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …); directive-based (OpenMP, OpenACC, OmpSs, …); library-based (Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …).
  • 8. From the perspective of compilers… Compilers are among the most complicated pieces of software. They perform pointer analysis, scalar optimizations, loop transformations, vectorization/SIMDization, scheduling, exploiting accelerators, … Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/
  • 9. What are compilers doing? Parsing turns the program (x = a + b; y = a + b; z = x + y;) into an intermediate representation (e.g., an AST); optimizations then produce the “optimized” code (x = a + b; y = x; z = x + y;).
  • 10. What are compilers doing? A compiler can modify programs (e.g., change the execution order of statements) as long as it maintains the semantics of the programs. [Figure: the same pipeline as on the previous slide, shown together with the reordered statements z = x + y; x = a + b;]
  • 11. Examples of optimizations: scalar optimizations. Common subexpression elimination (CSE): x = a + b; y = a + b; z = x + y; becomes x = a + b; y = x; z = x + y; Constant propagation: a = 0; if (a) { … } becomes a = 0; if (0) { … } Dead code elimination: a = 0; if (0) { … } becomes a = 0;
  • 12. Examples of optimizations: loop permutation (interchange). With i as the outer loop, for (i = 0; i < M; i++) { for (j = 0; j < N; j++) { b[i][j] = a[i][j]; } } performs offset access (faster on CPUs). With j as the outer loop, for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { b[i][j] = a[i][j]; } } performs stride access (slower on CPUs). Interchanging the two loops converts one form into the other.
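The interchange above can be checked with a small C sketch. The sizes `M = 4` and `N = 3`, the function names, and the fill pattern are assumptions made only for illustration; the sketch shows that both loop orders compute the same array, not how much faster either one is.

```c
enum { M = 4, N = 3 }; /* hypothetical sizes, chosen only for illustration */

/* i outer, j inner: consecutive inner iterations touch adjacent elements,
   because C stores b[i][0..N-1] contiguously (row-major layout). */
void copy_i_outer(int b[M][N], int a[M][N]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            b[i][j] = a[i][j];
}

/* j outer, i inner: consecutive inner iterations are N elements apart. */
void copy_j_outer(int b[M][N], int a[M][N]) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            b[i][j] = a[i][j];
}

/* Returns 1 if both loop orders produce identical output, 0 otherwise. */
int interchange_preserves_result(void) {
    int a[M][N], b1[M][N], b2[M][N];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 10 * i + j; /* arbitrary test data */
    copy_i_outer(b1, a);
    copy_j_outer(b2, a);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            if (b1[i][j] != b2[i][j])
                return 0;
    return 1;
}
```

The interchange is safe here because every iteration writes a distinct element of `b` and only reads `a`, so no iteration depends on another.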
  • 13. Examples of optimizations: loop fusion/distribution. Fused: for (i = 0; i < N; i++) { a[i] = b[i] + c[i]; d[i] = a[i] + e[i]; } gives better temporal locality on CPUs. Distributed: for (i = 0; i < N; i++) { a[i] = b[i] + c[i]; } for (i = 0; i < N; i++) { d[i] = a[i] + e[i]; } is good for vectorization on CPUs. Which one is better depends on the loop size “N”.
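A minimal C sketch of the two forms, assuming a hypothetical length `LEN = 8` and arbitrary input data. It only checks that the fused and distributed versions agree; which one runs faster depends on the machine and on the loop size.

```c
enum { LEN = 8 }; /* hypothetical loop size "N" */

/* Fused: a[i] is produced and consumed in the same iteration. */
void fused(int *a, int *d, const int *b, const int *c, const int *e, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] + e[i];
    }
}

/* Distributed: two simple loops, each with a single statement. */
void distributed(int *a, int *d, const int *b, const int *c, const int *e, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < n; i++)
        d[i] = a[i] + e[i];
}

/* Returns 1 if both versions compute the same a[] and d[], 0 otherwise. */
int fusion_preserves_result(void) {
    int b[LEN], c[LEN], e[LEN];
    int a1[LEN], d1[LEN], a2[LEN], d2[LEN];
    for (int i = 0; i < LEN; i++) {
        b[i] = i;
        c[i] = 2 * i;
        e[i] = 7 - i;
    }
    fused(a1, d1, b, c, e, LEN);
    distributed(a2, d2, b, c, e, LEN);
    for (int i = 0; i < LEN; i++)
        if (a1[i] != a2[i] || d1[i] != d2[i])
            return 0;
    return 1;
}
```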
  • 14. The phase-ordering problem: which order is better? Starting from a = 0; if (a) { … }: (1) Dead code elimination first leaves the code unchanged (a = 0; if (a) { … }), and constant propagation then gives a = 0; if (0) { … } (2) Constant propagation first gives a = 0; if (0) { … }, and dead code elimination then gives a = 0;
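  • 15. AST vs. the polyhedral model. AST: programs (x = a + b; y = a + b; z = x + y;) are represented as an AST and optimized into “optimized” code (x = a + b; y = x; z = x + y;). Polyhedral model (today's topic): programs are represented as a polyhedron, i.e., affine inequalities such as i >= 0; i < N; …, from which “synthesized” code is produced.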
  • 16. Why the polyhedral model? It is one solution for tackling the phase-ordering problem, and it is good for performing a set of loop transformations: loop permutation, loop fusion/distribution, loop tiling, … “The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility” (OpenScop Specification and Library)
  • 18. The polyhedral model in a nutshell. A polyhedral transformation is “scheduling” (determining the execution order of statements). Three important things: Domain, the set of instances of a statement; Scattering (Scheduling), a map from an instance to a time stamp; Access, a map from an instance to array element(s). Limitation: in general it is only applicable to a Static Control Part (SCoP), in which loop bounds and conditionals are affine functions of the surrounding loop iterators. [Figure: a program (for (i=1; …){ S1; for (j=1; …) S2; …) is turned into inequalities ($1 \le i_{S1} \le 2$; $1 \le i_{S2} \le 2$; $1 \le j_{S2} \le 3$; $i_{S1} = i_{S2}$), then into an ILP with constraints $C_i - C_j \ge 0, \ldots$ and cost function $\delta_e(\vec{s}, \vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})$, and finally into “synthesized” code (for (i=1; …){ S1; } for (i=1; …) { …;).]
  • 19. Representation of “Domain”. For the loop nest for (i=1; i <= 5; i++) for (j=1; j <= 6; j++) S1; observe that S1 is executed 30 times (30 instances) and that each instance is associated with a pair (i,j). “The key aspect of the polyhedral model is to consider statement instances.” (OpenScop Specification and Library)
  • 20. Iteration domain: a set of constraints that represents the instances of a statement, using iteration vectors (i,j). If those constraints are affine, the domain is a polyhedron. For for (i=1; i <= 5; i++) for (j=1; j <= 6; j++) S1; the constraints are $1 \le i \le 5,\ 1 \le j \le 6$, or in matrix form $D_{S1}:\ \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & -1 & 6 \end{pmatrix} \begin{pmatrix} i \\ j \\ 1 \end{pmatrix} \ge \vec{0}$. Credits: Clint (https://www.ozinenko.com/clint)
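As a rough illustration of the matrix form, the following C sketch evaluates each row of the constraint matrix on (i, j, 1) and counts the integer points that satisfy all rows. The names `in_domain` and `count_instances` and the scanning bounds are hypothetical helpers, not part of any polyhedral library.

```c
/* Rows of the constraint matrix for D_S1, applied to the vector (i, j, 1).
   A point belongs to the domain when every row evaluates to >= 0. */
static const int D_S1[4][3] = {
    { 1,  0, -1},  /*  i - 1 >= 0 */
    {-1,  0,  5},  /* -i + 5 >= 0 */
    { 0,  1, -1},  /*  j - 1 >= 0 */
    { 0, -1,  6},  /* -j + 6 >= 0 */
};

/* Returns 1 if (i, j) satisfies all four affine constraints, 0 otherwise. */
int in_domain(int i, int j) {
    const int v[3] = {i, j, 1};
    for (int r = 0; r < 4; r++) {
        int dot = 0;
        for (int c = 0; c < 3; c++)
            dot += D_S1[r][c] * v[c];
        if (dot < 0)
            return 0;
    }
    return 1;
}

/* Counts the domain points inside the square [lo, hi] x [lo, hi]. */
int count_instances(int lo, int hi) {
    int count = 0;
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            count += in_domain(i, j);
    return count;
}
```

Scanning any square that contains the domain, for example `count_instances(-10, 10)`, recovers the 30 instances of S1 mentioned on the previous slide.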
  • 21. Representation of “Scheduling”: 1-dimensional schedules. The function T returns the logical date of each statement. For x = a + b; /* S1 */ y = a + b; /* S2 */ z = x + y; /* S3 */ the schedule is T_S1 = 0; T_S2 = 1; T_S3 = 2; so the statements execute at logical times T=0, T=1, and T=2.
  • 22. Representation of “Scheduling”: multi-dimensional schedules. The function T returns the logical date of each statement. Logical dates may be multi-dimensional and are compared in lexicographical order (c.f. clocks: days, hours, minutes, seconds). For x = a + b; /* S1 */ for (i = 0; i < 2; i++) { a[i] = x; /* S2 */ } z = x + y; /* S3 */ the schedule is T_S1 = (0); T_S2(0) = (1, 0); T_S2(1) = (1, 1); T_S3 = (2), so that $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1,i) \prec (2)$.
  • 23. Representation of “Scheduling”: multi-dimensional schedules. The same schedule can be parameterized by the loop iterator: T_S1 = (0); T_S2(i) = (1, i); T_S3 = (2), where i ranges over the iteration domain $0 \le i < 2$ (recall “Iteration domain”). The code is x = a + b; /* S1 */ for (i = 0; i < 2; i++) { a[i] = x; /* S2 */ } z = x + y; /* S3 */ and the lexicographical order (c.f. clocks: days, hours, minutes, seconds) gives $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1,i) \prec (2)$.
  • 24. Representation of “Scheduling”: multi-dimensional schedules. For x = a + b; /* S1 */ for (i = 0; i < 2; i++) { a[i] = x; /* S2 */ } for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] += a[i]; /* S3 */ } } the schedule is T_S1 = (0); T_S2(i) = (1, i); T_S3(i,j) = (2, i, j). [Figure: logical time axis with S1 at T=0, the S2 instances at T=1, and the S3 instances at T=2.]
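The lexicographical comparison of logical dates can be sketched in C as follows. One assumption is made explicit here: when two dates have different lengths, the shorter one is padded with zeros, so (0) is compared with (1, i) as (0, 0). The function name `lex_compare` is hypothetical.

```c
#include <stddef.h>

/* Compares two logical dates lexicographically.
   Returns -1 if a precedes b, 1 if b precedes a, and 0 if they are equal.
   Assumption: the shorter date is padded with zeros. */
int lex_compare(const int *a, size_t na, const int *b, size_t nb) {
    size_t n = na > nb ? na : nb;
    for (size_t k = 0; k < n; k++) {
        int x = k < na ? a[k] : 0;
        int y = k < nb ? b[k] : 0;
        if (x < y)
            return -1;
        if (x > y)
            return 1;
    }
    return 0;
}
```

With the schedules above, `lex_compare` orders (0) before (1, 0), (1, 0) before (1, 1), and (1, 1) before any (2, i, j), which matches the textual order of S1, S2, and S3.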
  • 25. Loop transformations with schedules. A new schedule is obtained by applying a transformation matrix to the iteration vector. The identity transformation leaves the loop nest for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; /* S1 */ } } unchanged: $T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}$. Original schedule: T_S1(i, j) = (i, j); new schedule: T_S1(i, j) = (i, j).
  • 26. Loop transformations with schedules: loop reversal. Original code: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; /* S1 */ } } with original schedule T_S1(i, j) = (i, j). New schedule: T_S1(i, j) = (-i, j), i.e. $T_{S1}(i,j) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix}$. Substituting $i_{new} = -i_{old}$ (so $i_{old} \to -i_{new}$) gives the new code: for (i = -1; i <= 0; i++) { for (j = 0; j < 3; j++) { b[-i][j] = ...; /* S1 */ } }
  • 27. Loop transformations with schedules: loop permutation. Original code: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; /* S1 */ } } with original schedule T_S1(i, j) = (i, j). New schedule: T_S1(i, j) = (j, i), i.e. $T_{S1}(i,j) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}$. New code: for (j = 0; j < 3; j++) { for (i = 0; i < 2; i++) { b[i][j] = ...; /* S1 */ } }
  • 28. Loop transformations with schedules: loop skewing. Original code: for (i = 1; i <= 5; i++) { for (j = 1; j <= 5; j++) { a[i][j] = a[i-1][j+1]; /* S1 */ } } with original schedule T_S1(i, j) = (i, j). New schedule: T_S1(i, j) = (i, i+j), i.e. $T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ i + j \end{pmatrix}$. Substituting $j_{new} = i + j_{old}$ (so $j_{old} \to j_{new} - i$) gives the new code: for (i = 1; i <= 5; i++) { for (j = i+1; j <= i+5; j++) { a[i][j-i] = a[i-1][j-i+1]; /* S1 */ } }
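The skewed loop nest can be compared against the original with a short C sketch. The array bounds `ROWS` and `COLS` and the initial values are assumptions chosen so that every access a[i-1][j+1] stays in range; the sketch only checks that both nests leave the array in the same state.

```c
enum { ROWS = 6, COLS = 7 }; /* large enough for rows 0..5 and columns 0..6 */

/* Fills the array with arbitrary, distinct test values. */
static void init_array(int a[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = 100 * i + j;
}

/* Original nest, schedule (i, j). */
void original_nest(int a[ROWS][COLS]) {
    for (int i = 1; i <= 5; i++)
        for (int j = 1; j <= 5; j++)
            a[i][j] = a[i - 1][j + 1];
}

/* Skewed nest, schedule (i, i + j): the new j equals i + old j,
   so every use of the old j is rewritten as j - i. */
void skewed_nest(int a[ROWS][COLS]) {
    for (int i = 1; i <= 5; i++)
        for (int j = i + 1; j <= i + 5; j++)
            a[i][j - i] = a[i - 1][j - i + 1];
}

/* Returns 1 if both nests produce the same array contents, 0 otherwise. */
int skewing_preserves_result(void) {
    int x[ROWS][COLS], y[ROWS][COLS];
    init_array(x);
    init_array(y);
    original_nest(x);
    skewed_nest(y);
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            if (x[i][j] != y[i][j])
                return 0;
    return 1;
}
```

Skewing on its own keeps the same execution order; it changes the shape of the iteration space so that later transformations become possible.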
  • 29. Loop transformations with schedules: loop skewing (cont'd). With $T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$, the original points (i,j) = (1,1); (1,2); (1,3); (1,4); (1,5); (2,1); (2,2); (2,3); (2,4); (2,5); (3,1); (3,2); (3,3); (3,4); (3,5); … are mapped to (i,i+j) = (1,2); (1,3); (1,4); (1,5); (1,6); (2,3); (2,4); (2,5); (2,6); (2,7); (3,4); (3,5); (3,6); (3,7); (3,8); (4,5); … [Figure: dependences and execution order in the original and skewed iteration spaces.] Credits: Clint (https://www.ozinenko.com/clint)
  • 30. Scalar dimensions in schedules. Schedules use the 2d+1 format (d + d + 1: d loop dimensions interleaved with d + 1 scalar dimensions), which can represent and transform imperfectly nested loops, e.g., loop fusion/distribution. For for (i = 0; i < 2; i++) { s[i] = ...; /* S1 */ for (j = 0; j < 3; j++) a[i][j] = ...; /* S2 */ } for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) b[i] = ...; /* S3 */ the schedules are T_S1(i) = (0, i, 0); T_S2(i,j) = (0, i, 1, j, 0); T_S3(i,j) = (1, i, 0, j, 0).
  • 31. Loop transformations with schedules: loop fusion with scalar dimensions. Original code: for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) a[i] = ...; /* S1 */ for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) b[i] = ...; /* S2 */ with original schedule T_S1(i,j) = (0, i, 0, j); T_S2(i,j) = (1, i, 0, j). New code: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) a[i] = ...; /* S1 */ for (j = 0; j < 3; j++) b[i] = ...; /* S2 */ } with new schedule T_S1(i,j) = (0, i, 0, j); T_S2(i,j) = (0, i, 1, j). In matrix form, the new schedule of S2 is a transformation of the iteration vector plus scalar dimensions: $T_{S2}(i,j) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}$.
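The matrix form of the fused schedule for S2 can be evaluated directly. The C sketch below hard-codes the 4 x 2 coefficient matrix and the offset vector from the slide; the names `C_S2`, `OFFSET_S2`, and `schedule_s2_fused` are hypothetical.

```c
/* Coefficient matrix and scalar offsets of the new schedule for S2:
   T_S2(i, j) = C_S2 * (i, j)^T + OFFSET_S2 = (0, i, 1, j). */
static const int C_S2[4][2] = {
    {0, 0},
    {1, 0},
    {0, 0},
    {0, 1},
};
static const int OFFSET_S2[4] = {0, 0, 1, 0};

/* Writes the four-dimensional logical date of instance S2(i, j) to out. */
void schedule_s2_fused(int i, int j, int out[4]) {
    const int iter[2] = {i, j};
    for (int r = 0; r < 4; r++) {
        out[r] = OFFSET_S2[r];
        for (int c = 0; c < 2; c++)
            out[r] += C_S2[r][c] * iter[c];
    }
}
```

For example, the instance S2(1, 2) is mapped to the logical date (0, 1, 1, 2).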
  • 32. Schedules in general. $T_S(\vec{i}) = \begin{pmatrix} \phi_S^1(\vec{i}) \\ \phi_S^2(\vec{i}) \\ \vdots \\ \phi_S^d(\vec{i}) \end{pmatrix} = \begin{pmatrix} C_{11}^S & C_{12}^S & \cdots & C_{1 m_S}^S \\ C_{21}^S & C_{22}^S & \cdots & C_{2 m_S}^S \\ \vdots & \vdots & \ddots & \vdots \\ C_{d1}^S & C_{d2}^S & \cdots & C_{d m_S}^S \end{pmatrix} \vec{i} + \begin{pmatrix} C_{10}^S \\ C_{20}^S \\ \vdots \\ C_{d0}^S \end{pmatrix}$. The $d \times m_S$ matrix is a transformation for an iteration vector, and the $d \times 1$ vector holds the scalar dimensions. Example schedule: (0, i, 0, j). Here $d = 2 m_S + 1$, where $m_S$ is the size of the iteration vector.
  • 33. Schedules in general. $T_S(\vec{i}) = \begin{pmatrix} \phi_S^1(\vec{i}) \\ \phi_S^2(\vec{i}) \\ \vdots \\ \phi_S^d(\vec{i}) \end{pmatrix} = \begin{pmatrix} C_{11}^S & C_{12}^S & \cdots & C_{1 m_S}^S \\ C_{21}^S & C_{22}^S & \cdots & C_{2 m_S}^S \\ \vdots & \vdots & \ddots & \vdots \\ C_{d1}^S & C_{d2}^S & \cdots & C_{d m_S}^S \end{pmatrix} \vec{i} + \begin{pmatrix} C_{10}^S \\ C_{20}^S \\ \vdots \\ C_{d0}^S \end{pmatrix}$, with $d = 2 m_S + 1$ and $m_S$ the size of the iteration vector (e.g., (0, i, 0, j)). Goal: compute the coefficients and offsets for each statement.
  • 34. Legality of transformations. Are all transformations valid? NO! Original: for (i = 1; i <= 10; i++) { s[i] = ...; /* S1 */ for (j = 0; j < 3; j++) a[i][j] = s[i]; /* S2 */ } with T_S1(i) = (0, i, 0); T_S2(i,j) = (0, i, 1, j, 0). New (after the transformation): for (i = 1; i <= 10; i++) { for (j = 0; j < 3; j++) a[i][j] = s[i]; /* S2 */ s[i] = ...; /* S1 */ } with T_S2(i,j) = (0, i, 0, j, 0); T_S1(i) = (0, i, 1). In the new order, S2 reads s[i] before S1 has written it.
 • 35. Dependences
   - Three types of dependence:
     - Read-After-Write: (a=1; then b=a;)
     - Write-After-Read: (b=a; then a=1;)
     - Write-After-Write: (a=1; then a=2;)
   - Dependence: computed from domain, access, and schedule
     - Transformation = Find a new schedule that satisfies all dependences
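 • 36. Dependence polyhedron
   - Dependence polyhedron: a set of inequalities ($P_e$)
     - A general and accurate representation of instance-wise dependences
     for (i = 1; i <= 10; i++) {
       s[i] = ...; // S1
       for (j = 0; j < 3; j++)
         a[i][j] = s[i]; // S2
     }
   Constraints over the domains D_S1 and D_S2:
     i_S1 = i_S2; 1 ≤ i_S1 ≤ 10, 1 ≤ i_S2 ≤ 10; 0 ≤ j_S2 < 3;
   In matrix form (the first row is an equality, = 0; the remaining rows are inequalities, ≥ 0):
   $$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} \; \vec{0}$$
   An equality can also be written as two inequalities: i_S1 = i_S2 ⇒ i_S1 − i_S2 ≥ 0 ∧ i_S2 − i_S1 ≥ 0
   Credits: Clint (https://www.ozinenko.com/clint)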
 • 37. Legality of transformations
   - Dependence polyhedron: $P_e$
   - Legality:
     - $\forall \langle s,t \rangle \in P_e,\ (s \in D_{S_i},\ t \in D_{S_j}),\ T_{S_i}(s) \prec T_{S_j}(t)$
     - If the “source” instance must happen before the “target” instance in the original program, the transformed program must preserve this property (must satisfy the dependence)
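 • 38. Putting it all together
   - Goal: Compute all coefficients and offsets such that
     $\forall \langle s,t \rangle \in P_e,\ (s \in D_{S1},\ t \in D_{S2}),\ T_{S1}(s) \prec T_{S2}(t)$
   Schedules:
   $$T_{S1}(i) = \begin{pmatrix} C^{S1}_{11} \\ C^{S1}_{21} \\ C^{S1}_{31} \end{pmatrix} \begin{pmatrix} i_{S1} \end{pmatrix} + \begin{pmatrix} C^{S1}_{10} \\ C^{S1}_{20} \\ C^{S1}_{30} \end{pmatrix}$$
   $$T_{S2}(i,j) = \begin{pmatrix} C^{S2}_{11} & C^{S2}_{12} \\ C^{S2}_{21} & C^{S2}_{22} \\ C^{S2}_{31} & C^{S2}_{32} \\ C^{S2}_{41} & C^{S2}_{42} \\ C^{S2}_{51} & C^{S2}_{52} \end{pmatrix} \begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix} + \begin{pmatrix} C^{S2}_{10} \\ C^{S2}_{20} \\ C^{S2}_{30} \\ C^{S2}_{40} \\ C^{S2}_{50} \end{pmatrix}$$
   Dependence polyhedron $P_e$:
     i_S1 = i_S2; 1 ≤ i_S1 ≤ 10, 1 ≤ i_S2 ≤ 10; 0 ≤ j_S2 < 3;
   $$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} \; \vec{0}$$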
 • 39. Linearizing the legality condition (The Pluto Algorithm)
   - The legality condition (for iteration vectors):
     $$\delta(s,t) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0,\quad \langle s,t \rangle \in P_e$$
   - Uniform dependences: the distance between two dependent iterations is a constant ($i \to i + 1 \Rightarrow \delta(s,t)$ is a constant)
   - Non-uniform dependences: the distance between two dependent iterations varies ($i \to i + j \Rightarrow \delta(s,t)$ is a function of $j$)
     - Apply the Farkas lemma:
     $$(c^{S_j}_1, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0,\ \langle s,t \rangle \in P_e \iff (c^{S_j}_1, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \equiv \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k,\quad \lambda_{ek} \ge 0$$
     where $P_e^k$ is each inequality in the dependence polyhedron.
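   A small worked case may help separate the two situations. It assumes a one-dimensional schedule $\phi(i) = c \cdot i$ and, for the non-uniform case, a dependence polyhedron that only states $j \ge 1$. Both are simplifications chosen here for illustration and are not taken from the slides.

   $$\text{Uniform } (i \to i + 1):\quad \delta(s,t) = c\,(i + 1) - c\,i = c \ge 0$$
   $$\text{Non-uniform } (i \to i + j):\quad \delta(s,t) = c\,(i + j) - c\,i = c\,j$$
   $$\text{Farkas lemma with } P_e^1: j - 1 \ge 0:\quad c\,j \equiv \lambda_{e0} + \lambda_{e1}(j - 1),\quad \lambda_{e0}, \lambda_{e1} \ge 0$$
   $$\text{Matching the coefficient of } j \text{ and the constant term}:\quad c = \lambda_{e1},\quad 0 = \lambda_{e0} - \lambda_{e1} \;\Rightarrow\; c = \lambda_{e0} = \lambda_{e1} \ge 0$$

   In both cases the condition reduces to a constraint on the coefficient alone ($c \ge 0$), with no iterator left in it. That is what makes the legality condition usable inside an ILP.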
 • 40. Cost Function & Objective Function (The Pluto Algorithm)
   - Compute all coefficients and offsets under the legality condition: Solve an ILP problem
   - Cost Function = Transformation policy
     - Pluto’s cost function = dependence distance
       $$\delta(s,t) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s},\quad \langle s,t \rangle \in P_e$$
       - Fuse loops as much as possible
       - Push loops that carry dependences to inner levels
     - Also used in ISL (Polly, PPCG, …)
   - Objective Function:
     $$\operatorname{minimize}_{\prec}\ (u_1, w, c^{S_j}_1, c^{S_j}_2)$$
     - Iteratively find linearly independent solutions
 • 41. Step-by-step example
     for (i = 0; i < N; i++) {
       for (j = 1; j < N; j++) {
         a[i][j] = a[j][i] + a[i][j-1]; // S1
       }
     }
   Statement instances in execution order:
     a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
     a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
     a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
     ...
     a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
     a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
     a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
     ...
     a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
     a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
     a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
     ...
     a[3][1] = a[1][3] + a[3][0]; // S1(3,1)
   Dependences between the instances:
   - Dependence 1 (RAW), e.g., S1(0,1) writes a[0][1], S1(0,2) reads it
   - Dependence 2 (RAW), e.g., S1(1,2) writes a[1][2], S1(2,1) reads it
   - Dependence 3 (WAR), e.g., S1(1,2) reads a[2][1], S1(2,1) writes it
 • 42. Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
   - Dependence 1: RAW (flow dependence)
     Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
     Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
     ...
     $(i_s, j_s) \to (i_t, j_t)$
   Dependence polyhedron $P_{e1}$:
     $$P_{e1}: i_s = i_t,\ j_s = j_t - 1,\ 0 \le i_t \le N - 1,\ 2 \le j_t \le N$$
   Legality constraints:
     $$\delta(s,t) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0,\quad \langle s,t \rangle \in P_e$$
   For this dependence:
     $$(c^{S1}_1, c^{S1}_2) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c^{S1}_1, c^{S1}_2) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0,\quad \langle i_s, j_s, i_t, j_t \rangle \in P_{e1}$$
     $$\Rightarrow c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_s + c^{S1}_2 j_s) = c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_t + c^{S1}_2 (j_t - 1)) \ge 0$$
     $$\Rightarrow c^{S1}_2 \ge 0$$
 • 43. Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
   - Dependence 2: RAW (flow dependence)
     Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
     ...
     Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
     $(i_s, j_s) \to (i_t, j_t)$
   Dependence polyhedron $P_{e2}$:
     $$P_{e2}: i_s = j_t,\ j_s = i_t,\ 1 \le i_t \le N,\ 2 \le j_t \le N,\ i_t - j_t \ge 1$$
   Legality constraints:
     $$\delta(s,t) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0,\quad \langle s,t \rangle \in P_e$$
   For this dependence:
     $$(c^{S1}_1, c^{S1}_2) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c^{S1}_1, c^{S1}_2) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0,\quad \langle i_s, j_s, i_t, j_t \rangle \in P_{e2}$$
     $$\Rightarrow c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_s + c^{S1}_2 j_s) = c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 j_t + c^{S1}_2 i_t) \ge 0$$
     $$\Rightarrow (c^{S1}_1 - c^{S1}_2)\, i_t + (c^{S1}_2 - c^{S1}_1)\, j_t \ge 0,\quad 1 \le i_t \le N,\ 2 \le j_t \le N,\ i_t - j_t \ge 1$$
   Farkas lemma + Fourier-Motzkin elimination:
     $$c^{S1}_1 - c^{S1}_2 \ge 0$$
 • 44. Step-by-step example: Putting it all together (The Pluto Algorithm)
   - Dependence 1: $c^{S1}_2 \ge 0,\ w \ge c^{S1}_2$
   - Dependence 2 & 3: $c^{S1}_1 - c^{S1}_2 \ge 0,\ u_1 \ge 0,\ u_1 \ge c^{S1}_1 - c^{S1}_2,\ 3u_1 + w \ge c^{S1}_1 - c^{S1}_2$
     - The constraints on $u_1$ and $w$ use parameter N and bound the dependence distances
   - Avoiding zero vector: $c^{S1}_1 + c^{S1}_2 \ge 1$
   - Objective Function: $\operatorname{minimize}_{\prec}\ (u_1, w, c^{S1}_1, c^{S1}_2) \to (0, 1, 1, 1)$
   - Find linearly independent answer:
     $$T_{S1}(i,j) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$$
 • 45. Summary
   - The polyhedral transformation = “scheduling (determine the execution order of statements)”
   - 3 important things:
     - Domain: A set of instances for a statement
     - Scattering (Scheduling): an instance -> time stamp
     - Access: an instance -> array element(s)
   - Limitation: Only applicable to Static Control Parts (SCoPs) in general
     - Loop bounds and conditionals are affine functions of the surrounding loop iterators
   Flow shown on the slide:
     Program: for (i=1; …){ S1; for (j=1; …) S2;
     -> Inequalities: 1 ≤ i_S1 ≤ 2; 1 ≤ i_S2 ≤ 2; 1 ≤ j_S2 ≤ 3; i_S1 = i_S2;
     -> ILP: Constraints: $C_i - C_j \ge 0, \ldots$; Cost Function: $\delta_e(\vec{s}, \vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})$
     -> “Synthesized” Code: for (i=1; …){ S1; } for (i=1; …) { …;
 • 46. COMPILERS AND TOOLS
 • 47. Polyhedral Compilers & Tools
   - PoCC (The Polyhedral Compiler Collection)
     - http://web.cs.ucla.edu/~pouchet/software/pocc/
     - Clan: extracts a polyhedral IR from the source code
     - Candl: a dependence analyzer
     - LetSee: legal transformation space explorer
     - PLuTo: an automatic parallelizer and locality optimizer
     - CLooG: code generation from the polyhedral IR
 • 48. Polyhedral Compilers & Tools
   - Polly
     - http://polly.llvm.org/
     - ISL: Integer Set Library (including code generator)
   - Clay/Chlore/Clint
     - https://www.ozinenko.com/projects
     - Clay: “Chunky Loop Alteration wizardrY”
     - Chlore: “Recovering high-level syntactic description of the automatically computed polyhedral optimization”
     - Clint: “Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model”
 • 50. Further readings
   - Fundamentals
     - OpenScop Specification
       - http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
     - ISL
       - https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
   - Pluto algorithm
     - U. Bondhugula, “Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model” (PhD Dissertation, 2010)
     - U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer.” [PLDI’08]
   - Polly
     - T. Grosser, S. Verdoolaege, A. Cohen, “Polyhedral AST generation is more than scanning polyhedra” [ACM TOPLAS 2015]
   - Polyhedral model + AST-based Cost Function
     - J. Shirako, L.N. Pouchet, V. Sarkar, “Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations.” [SC’14]
   - GPU Code Generation
     - S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, “Polyhedral parallel code generation for CUDA” [ACM TACO 2013]
     - J. Shirako, A. Hayashi, V. Sarkar, “Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model” [CC’17]