MILEPOST GCC: machine learning based research compiler
Grigori Fursin, Cupertino Miranda, Olivier Temam (INRIA Saclay, France)
Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson (IBM Haifa, Israel)
Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O’Boyle (University of Edinburgh, UK)
Phil Barnard, Elton Ashton (ARC International, UK)
Eric Courtois, Francois Bodin (CAPS Enterprise, France)

Contact: grigori.fursin@inria.fr


Abstract

Tuning hardwired compiler optimizations for rapidly evolving hardware makes porting an optimizing compiler for each new platform extremely challenging. Our radical approach is to develop a modular, extensible, self-optimizing compiler that automatically learns the best optimization heuristics based on the behavior of the platform. In this paper we describe MILEPOST¹ GCC, a machine-learning-based compiler that automatically adjusts its optimization heuristics to improve the execution time, code size, or compilation time of specific programs on different architectures. Our preliminary experimental results show that it is possible to considerably reduce the execution time of the MiBench benchmark suite on a range of platforms entirely automatically.

1    Introduction

Current architectures and compilers continue to evolve, bringing higher performance, lower power and smaller size while attempting to keep time to market as short as possible. Typical systems may now have multiple heterogeneous reconfigurable cores and a great number of compiler optimizations available, making manual compiler tuning increasingly infeasible. Furthermore, static compilers often fail to produce high-quality code due to a simplistic model of the underlying hardware.

The difficulty of achieving portable compiler performance has led to iterative compilation [11, 16, 15, 26, 33, 19, 30, 24, 25, 18, 20, 22] being proposed as a means of overcoming the fundamental problem of static modeling. The compiler's static model is replaced by a search of the space of compilation strategies to find the one which, when executed, best improves the program. Little or no knowledge of the current platform is needed, so programs can be adapted to different architectures. It is currently used in library generators and by some existing adaptive tools [36, 28, 31, 8, 1, 3]. However, it is largely limited to searching for combinations of global compiler optimization flags and tweaking a few fine-grain transformations within relatively narrow search spaces. The main barrier to its wider use is the currently excessive compilation and execution time needed to optimize each program, which prevents its adoption in general-purpose compilers.

Our approach is to use machine learning, which has the potential to reuse knowledge across iterative compilation runs, gaining the benefits of iterative compilation while reducing the number of executions needed.

The MILEPOST project's [4] objective is to develop compiler technology that can automatically learn how to best optimize programs for configurable heterogeneous embedded processors using machine learning. It aims to dramatically reduce the time to market of configurable systems. Rather than developing a specialised compiler by hand for each configuration, MILEPOST aims to produce optimizing compilers automatically.

¹ MILEPOST: MachIne Learning for Embedded PrOgramS opTimization [4]

A key goal of the project is to make machine learning based compilation a realistic technology for general-purpose compilation. Current approaches [29, 32, 10, 14] are highly preliminary, limited to global compiler flags or simple transformations considered in isolation. GCC was selected as the compiler infrastructure for MILEPOST as it is currently the most stable and robust open-source compiler. It supports multiple architectures and has many aggressive optimizations, making it a natural vehicle for our research. In addition, each new version usually features new transformations, demonstrating the need for a system that automatically retunes its optimization heuristics.

In this paper we present early experimental results showing that it is possible to improve the performance of the well-known MiBench [23] benchmark suite on a range of platforms including x86 and IA64. We ported our tools to the new ARC GCC 4.2.1 that targets ARC International's configurable core family. Using MILEPOST GCC, after a few weeks of training, we were able to learn a model that automatically improves the execution time of the MiBench benchmarks by 11%, demonstrating the use of our machine learning based compiler.

This paper is organized as follows: the next section describes the overall MILEPOST framework and is, itself, followed by a section detailing our implementation of the Interactive Compilation Interface for GCC that enables dynamic manipulation of optimization passes. Section 4 describes machine learning techniques used to predict good optimization passes for programs using static program features and optimization knowledge reuse. Section 5 provides experimental results and is followed by concluding remarks.

2   MILEPOST Framework

The MILEPOST project uses a number of components, at the heart of which is the machine learning enabled MILEPOST GCC, shown in Figure 1. MILEPOST GCC currently proceeds in two distinct phases, in accordance with typical machine learning practice: training and deployment.

Training During the training phase we need to gather information about the structure of programs and record how they behave when compiled under different optimization settings. Such information allows machine learning tools to correlate aspects of program structure, or features, with optimizations, building a strategy that predicts a good combination of optimizations.

In order to learn a good strategy, machine learning tools need a large number of compilations and executions as training examples. These training examples are generated by a tool, the Continuous Collective Compilation Framework [2] (CCC), which evaluates different compilation optimizations, storing execution time, code size and other metrics in a database. The features of the program are extracted from MILEPOST GCC via a plugin and are also stored in the database. Plugins allow fine-grained control and examination of the compiler, driven externally through shared libraries.
Deployment Once sufficient training data is gathered, a model is created using machine learning modeling. The model is able to predict good optimization strategies for a given set of program features and is built as a plugin so that it can be re-inserted into MILEPOST GCC. On encountering a new program, the plugin determines the program's features and passes them to the model, which determines the optimizations to be applied.

Framework In this paper we use a new version of the Interactive Compilation Interface (ICI) for GCC which controls the internal optimization decisions and their parameters using external plugins. It now allows the complete substitution of default internal optimization heuristics as well as the order of transformations.

We use the Continuous Collective Compilation Framework [2] to produce a training set for machine learning models, which learn how to optimize programs for the best performance, code size, power consumption or any other objective function needed by the end-user. This framework allows knowledge of the optimization space to be reused among different programs, architectures and data sets.

Together with the additional routines needed for machine learning, such as program feature extraction, this forms MILEPOST GCC, which transforms the compiler suite into a powerful research tool suitable for adaptive computing.

The next section describes the new ICI structure and explains how program features can be extracted for later machine learning in Section 4.
[Figure 1 diagram: MILEPOST GCC (with ICI and ML routines) uses IC plugins for recording pass sequences and extracting static program features during training, and for predicting and selecting "good" passes during deployment, connected to the Continuous Collective Compilation Framework and its drivers for iterative compilation and model training.]

Figure 1: Framework to automatically tune programs and improve default optimization heuristics using machine learning techniques: MILEPOST GCC with Interactive Compilation Interface (ICI) and program feature extractor, and the Continuous Collective Compilation Framework to train ML models and predict good optimization passes

3   Interactive Compilation Interface

This section describes the Interactive Compilation Interface (ICI). The ICI provides opportunities for external control and examination of the compiler. Optimization settings at a fine-grained level, beyond the capabilities of command line options or pragmas, can be managed through external shared libraries, leaving the compiler uncluttered.

The first version of ICI [21] was reactive and required minimal changes to GCC. It was, however, unable to modify the order of optimization passes within the compiler, so large opportunities for speedup were closed to it. The new version of ICI [9] expands on the capabilities of its predecessor, permitting the pass order to be modified. This version of ICI is used in MILEPOST GCC to automatically learn good sequences of optimization passes. By replacing default optimization heuristics, execution time, code size and compilation time can be improved.

3.1 Internal structure

To avoid the drawbacks of the first version of the ICI, we designed a new version, as shown in Figure 2. This version can transparently monitor the execution of passes or replace the GCC Controller (Pass Manager), if desired. Passes can be selected by an external plugin which may choose to drive them in a very different order to that currently used in GCC, even choosing different pass orderings for each and every function in the program being compiled. Furthermore, the plugin can provide its own passes, implemented entirely outside of GCC.

In an additional set of enhancements, a coherent event and data passing mechanism enables external plugins to discover the state of the compiler and to be informed as it changes. At various points in the compilation process, events (IC Events) are raised indicating decisions about transformations, and auxiliary data (IC Data) is registered if needed.

Since plugins now extend GCC through external shared libraries, experiments can be built with no further modifications to the underlying compiler.
[Figure 2 diagram: the original GCC pipeline (detection of optimization flags, GCC Controller / Pass Manager, Pass 1 .. Pass N, GCC data layer with AST, CFG, CF, etc.) compared with GCC extended by ICI, where IC Events and IC Data expose the same pipeline to dynamically linked IC plugins (selecting pass sequences, extracting static program features), high-level scripting (java, python, etc.), and the Continuous Collective Compilation Framework with ML drivers to optimize programs and tune compiler optimization heuristics.]

Figure 2: GCC Interactive Compilation Interface: a) original GCC, b) GCC with ICI and plugins

Modifications for different analysis, optimization and monitoring scenarios can therefore proceed in a tight engineering environment. These plugins communicate with external drivers and allow both high-level scripting and communication with machine learning frameworks such as MILEPOST GCC.

Note that it is not the goal of this project to develop a fully fledged plugin system. Rather, we show the utility of such approaches for iterative compilation and machine learning in compilers. We may later utilize the GCC plugin systems currently in development, for example [7] and [13].

Figure 3 shows some of the modifications needed to enable ICI in GCC, with an example of a passive plugin that monitors executed passes. The plugin is invoked by the new -fici GCC flag or by setting the ICI_USE environment variable to 1 (to enable non-intrusive optimizations without changes to Makefiles). When GCC detects these options, it loads a plugin (dynamic library) whose name is specified by the ICI_PLUGIN environment variable and checks for two functions, start and stop, as shown in Figure 3a.
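A rough sketch of this loading step, under stated assumptions, might look as follows; only -fici, ICI_USE, ICI_PLUGIN, start and stop are taken from the text, while the function and variable names below are assumptions and error handling is omitted.

    /* Rough sketch of the plugin loading described above (cf. Figure 3a). */
    #include <dlfcn.h>
    #include <stdlib.h>

    typedef void (*ici_hook_fn) (void);

    extern int flag_ici;                 /* assumed: set by the -fici flag */

    static void *ici_plugin_handle;
    static ici_hook_fn ici_plugin_start;
    static ici_hook_fn ici_plugin_stop;

    static void
    ici_load_plugin (void)
    {
      const char *use  = getenv ("ICI_USE");    /* alternative to -fici       */
      const char *name = getenv ("ICI_PLUGIN"); /* path of the plugin library */

      if (!(flag_ici || (use && *use == '1')) || name == NULL)
        return;                                 /* ICI not requested          */

      ici_plugin_handle = dlopen (name, RTLD_NOW);
      if (ici_plugin_handle == NULL)
        return;

      /* The plugin must export the two entry points named in the text.  */
      ici_plugin_start = (ici_hook_fn) dlsym (ici_plugin_handle, "start");
      ici_plugin_stop  = (ici_hook_fn) dlsym (ici_plugin_handle, "stop");

      if (ici_plugin_start)
        ici_plugin_start ();   /* stop() is called when compilation finishes */
    }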
The start function of the example plugin registers an event handler function, executed_pass, on an IC-Event called pass_execution.

Figure 3c shows the simple modifications to the GCC Controller (Pass Manager) needed to enable monitoring of executed passes. When the GCC Controller function execute_one_pass is invoked, we register an IC-Parameter called pass_name giving the real name of the executed pass and trigger the IC-Event pass_execution. This in turn invokes the plugin function executed_pass, where we can obtain the name of the function currently being compiled using ici_get_feature("function_name") and the pass name using ici_get_parameter("pass_name").

IC-Features provide read-only data about the compilation state. IC-Parameters, on the other hand, can be dynamically changed by plugins to alter the subsequent behavior of the compiler. Such behavior modification is demonstrated in the next subsection using an example with an avoid_gate parameter needed for dynamic pass manipulation.

Since we use the name field from the GCC pass structure to identify passes, we have had to ensure that each pass has a unique name. Previously, some passes had no name at all, and we suggest that always giving passes unique names would be good development practice in the future.
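For illustration, such a passive monitoring plugin might look roughly like the sketch below; ici_get_feature and ici_get_parameter are the calls named above, while the registration helper ici_register_event and the exact signatures and return types are assumptions.

    /* Hypothetical sketch of a passive ICI plugin that logs executed passes
       (in the spirit of Figure 3b).  The extern declarations stand in for an
       assumed ICI plugin header; only the event, feature and parameter names
       are taken from the text. */
    #include <stdio.h>

    extern void        ici_register_event (const char *event, void (*handler) (void));
    extern const char *ici_get_feature    (const char *feature_name);
    extern const char *ici_get_parameter  (const char *parameter_name);

    /* Handler invoked each time the IC-Event "pass_execution" is raised by
       the modified GCC Controller (Pass Manager).  */
    static void
    executed_pass (void)
    {
      const char *function_name = ici_get_feature ("function_name");
      const char *pass_name     = ici_get_parameter ("pass_name");

      fprintf (stderr, "pass %s executed for function %s\n",
               pass_name, function_name);
    }

    /* Called by GCC when the plugin is loaded (-fici or ICI_USE=1).  */
    void
    start (void)
    {
      ici_register_event ("pass_execution", executed_pass);
    }

    /* Called by GCC when compilation finishes.  */
    void
    stop (void)
    {
    }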

Figure 3: Some GCC modifications to enable ICI and an example of a plugin to monitor executed passes: a) IC Framework within GCC, b) IC Plugin to monitor executed passes, c) GCC Controller (Pass Manager) modification
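In outline, the Pass Manager change of Figure 3c could resemble the following sketch; pass->name and pass->gate are genuine fields of the GCC pass structure, while the ICI helper names ici_register_parameter and ici_call_event and the exact placement of the hooks are assumptions (the gate_status/avoid_gate hook anticipates Section 3.2).

    /* Rough sketch of the GCC Controller (Pass Manager) modification; a
       fragment of the kind of code found in GCC's passes.c, where bool and
       struct tree_opt_pass are already declared.  The ICI helpers below are
       assumed names for the IC-Parameter/IC-Event mechanism. */

    extern void ici_register_parameter (const char *name, const void *value);
    extern void ici_call_event (const char *event_name);

    static bool
    execute_one_pass (struct tree_opt_pass *pass)
    {
      /* Default gate decision: run the pass only if its associated
         optimization flag enabled it.  */
      bool gate_status = (pass->gate == NULL) ? true : pass->gate ();

      /* Expose the decision to plugins, which may override it through the
         IC-Parameter "gate_status" and the IC-Event "avoid_gate"
         (see Section 3.2).  */
      ici_register_parameter ("gate_status", &gate_status);
      ici_call_event ("avoid_gate");

      if (!gate_status)
        return false;

      /* Announce which pass is about to run so that monitoring plugins
         receive the IC-Event "pass_execution".  */
      ici_register_parameter ("pass_name", pass->name);
      ici_call_event ("pass_execution");

      /* ... original pass execution code follows here ... */
      return true;
    }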

3.2   Dynamic Manipulation of GCC Passes

Previous research shows great potential for improving program execution time or reducing code size by carefully selecting global compiler flags or transformation parameters using iterative compilation. The quality of generated code can also be improved by selecting different optimization orders, as shown in [16, 15, 17, 26]. Our approach combines the selection of optimal optimization orders and the tuning of transformation parameters at the same time.

The new version of ICI enables arbitrary selection of legal optimization passes and has a mechanism to change parameters or transformations within passes. Since GCC currently does not provide enough information about dependencies between passes to detect legal orders, and the optimization space is too large to check all possible combinations, we focused on detecting influential passes and legal orders of optimizations. We examined the pass orders generated by compiler flags that improved program execution time or code size under iterative compilation.

Before attempting to learn good optimization settings and pass orders, we first confirmed that there is indeed performance to be gained within GCC from such actions; otherwise there is no point in trying to learn. Using the Continuous Collective Compilation Framework [2] to randomly search the optimization flag space (50% probability of selecting each optimization flag) with MILEPOST GCC 4.2.2 on an AMD Athlon64 3700+ and an Intel Xeon 2800MHz, we could improve the execution time of susan_corners by around 16%, compile time by 22% and code size by 13% using Pareto-optimal points, as described in previous work [24, 25]. Note that the same combination of flags degrades execution time of this benchmark on an Itanium-2 1.3GHz by 80%, demonstrating the importance of adapting compilers to each new architecture. Figure 4a shows the combination of flags found for this benchmark on the AMD platform, while Figures 4b and 4c show the passes invoked and monitored by MILEPOST GCC for the default -O3 level and for the best combination of flags, respectively.
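The uniform random search strategy just described can be sketched as follows; the driver below and its flag list are purely illustrative and not the actual CCC implementation.

    /* Minimal sketch of the uniform random search over compiler flags
       described above: each flag is selected independently with probability
       50%.  The flag list and command construction are illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static const char *candidate_flags[] = {
      "-funroll-loops", "-ftree-vectorize", "-fgcse-sm", "-fpeel-loops", "-ftracer"
    };

    int
    main (void)
    {
      char cmd[1024] = "gcc -O3";

      srand ((unsigned) time (NULL));
      for (size_t i = 0; i < sizeof candidate_flags / sizeof *candidate_flags; i++)
        if (rand () % 2)                      /* 50% probability per flag */
          {
            strncat (cmd, " ", sizeof cmd - strlen (cmd) - 1);
            strncat (cmd, candidate_flags[i], sizeof cmd - strlen (cmd) - 1);
          }

      /* One randomly generated flag combination, to be compiled, executed
         and recorded by the framework.  */
      printf ("%s\n", cmd);
      return 0;
    }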

Given that there is good performance to be gained by searching for good compiler flags, we now wish to automatically select good optimization passes and transformation parameters. These should enable fine-grained program and compiler tuning as well as non-intrusive continuous program optimization without modifications to Makefiles, etc. The current version of ICI allows passes to be called directly using the ici_run_pass function, which in turn invokes the GCC function execute_one_pass. We can therefore circumvent the default GCC Pass Manager and either execute good sequences of passes previously found by the CCC Framework, as shown in Figure 2b, or search for new good orders of optimizations. However, we leave the determination of pass interactions and dependencies for future work.

To verify that we can change the default optimization pass orders using ICI, we recompiled the same benchmark with the -O3 flag but selecting the passes shown in Figure 4c. Note, however, that the GCC internal function execute_one_pass shown in Figure 3c has gate control (pass->gate()) that executes a pass only if its associated optimization flag is selected. To avoid this gate control we use the IC-Parameter gate_status and the IC-Event avoid_gate so that we can set gate_status to TRUE within plugins and thus force execution of the pass. Execution of the generated binary shows that we improve its execution time by 13% instead of 16%; the reason is that some compiler flags not only invoke an associated pass such as -funroll-loops but also select specific fine-grain transformation parameters and influence code generation in other passes. Thus, at this point we recompile programs with such flags always enabled, and in the future we plan to add support for such cases explicitly.
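A plugin exploiting this mechanism might look roughly like the sketch below; ici_run_pass, gate_status and avoid_gate come from the text, whereas ici_register_event, ici_set_parameter, the event name used to intercept per-function optimization and the pass names in the sequence are assumptions.

    /* Hypothetical sketch of a plugin that bypasses the default Pass Manager
       and runs a pass sequence previously found by the CCC framework,
       forcing pass gates open via the "avoid_gate" event.  The ICI
       declarations and the pass/event names marked below are assumptions. */

    extern void ici_register_event (const char *event, void (*handler) (void));
    extern void ici_set_parameter  (const char *name, const void *value);
    extern void ici_run_pass       (const char *pass_name);

    /* Illustrative pass sequence for one function (not a real tuned order). */
    static const char *good_sequence[] = { "fre", "copyprop", "dce", "pre", 0 };

    /* Handler for the IC-Event "avoid_gate": force the gate decision so that
       every pass we request is actually executed.  */
    static void
    force_gate (void)
    {
      static const int gate_on = 1;                 /* TRUE */
      ici_set_parameter ("gate_status", &gate_on);
    }

    /* Handler invoked when GCC is about to optimize a function (the event
       name "function_optimization" is assumed): run our own sequence.  */
    static void
    optimize_function (void)
    {
      for (const char **p = good_sequence; *p != 0; p++)
        ici_run_pass (*p);                          /* calls execute_one_pass */
    }

    void
    start (void)
    {
      ici_register_event ("avoid_gate", force_gate);
      ici_register_event ("function_optimization", optimize_function);
    }

    void
    stop (void)
    {
    }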
Figure 4: a) Selection of compiler flags found using the CCC Framework with a uniform random search strategy that improves execution and compilation time for the susan_corners benchmark over -O3, b) recorded compiler passes for -O3 using ICI, c) recorded compiler passes for the good selection of flags in (a)

3.3 Adding program feature extractor pass

Our machine-learnt model predicts the best GCC optimization to apply to an input program based on its program structure or program features. The program features are typically a summary of the internal program representation and characterize the essential aspects of a program needed by the model to distinguish between
good and bad optimizations.

The current version of ICI allows invoking auxiliary passes that are not part of the default GCC compiler. These passes can monitor and profile the compilation process or extract the data structures needed to generate program features.

During compilation, the program is represented by several data structures implementing the intermediate representation (tree-SSA, RTL, etc.), the control flow graph (CFG), def-use chains, the loop hierarchy, and so on. The data structures available depend on the compilation pass currently being performed. For statistical machine learning, the information about these data structures is encoded as a constant-size vector of numbers (i.e. features); this process is called feature extraction and is needed to enable optimization knowledge reuse among different programs.

Therefore, we implemented an additional GCC pass, ml-feat, to extract static program features. This pass is not invoked during default compilation but can be called using an extract_program_static_features plugin after any arbitrary pass starting from FRE, when all the GCC data necessary to produce features is ready.

In MILEPOST GCC, feature extraction is performed in two stages. In the first stage, a relational representation of the program is extracted from the compiler's internal data structures. Each such struct data type may introduce an entity, and its fields may introduce relations over the entity representing the enclosing struct data type and the entity representing the data type of the field. This data is collected using the ml-feat pass.

In the second stage, we provide a Prolog program defining the features to be computed from the Datalog relations extracted from the compiler's internal data structures in the first stage. The extract_program_static_features plugin invokes a Prolog compiler to execute this program, the result being the vector of features shown in Table 1, which can later be used by the Continuous Collective Compilation Framework to build machine learning models and predict the best sequences of passes for new programs. This example shows the flexibility and capabilities of the new version of ICI.
ing a extract_program_static_features plugin after any
                                                                    necessary to build a learning compiler. In this section
arbitrary pass starting from FRE when all the GCC data
                                                                    we describe how this infrastructure is used in building a
necessary to produce features is ready.
                                                                    model.
In the MILEPOST GCC, the feature extraction is per-
formed in two stages. In the first stage, a relational rep-          Our approach to selecting good passes for programs is
resentation of the program is extracted; in the second              based upon the construction of a probabilistic model on
stage, the vector of features is computed from this rep-            a set of M training programs and the use of this model in
resentation.                                                        order to make predictions of “good” optimization passes
                                                                    on unseen programs.
In the first stage, the program is considered to be charac-
terized by a number of entities and relations over these            Our specific machine learning method is similar to that
entities. The entities are a direct mapping of similar en-          of [10] where a probability distribution over “good” so-
tities defined by the language reference, or generated               lutions (i.e. optimization passes or compiler flags) is
during the compilation. Such examples of entities are               learnt across different programs. This approach has
variables, types, instructions, basic blocks, temporary             been referred in the literature to as Predictive Search
variables, etc.                                                     Distributions (PSD) [12]. However, unlike [10, 12]
                                                                    where such a distribution is used to focus the search of
A relation over a set of entities is a subset of their Carte-       compiler optimizations on a new program, we use the
sian product. The relations specify properties of the en-           distribution learned to make one-shot predictions on un-
tities or the connections between them. For describing              seen programs. Thus we do not search for the best opti-
the relations we used a notation based on logic - Datalog           mization, we automatically predict it.
is a Prolog-like language but with a simpler semantics,
suitable for expressing relations and operations between            4.1 The Machine Learning Model
them [35, 34]
For extracting the relational representation of the pro-            Given a set of training programs T 1 , . . . , T M , which can
gram, we used a simple method based on the examina-                 be described by (vectors of) features t1 . . . , tM , and for
tion of the include files. The compiler main data struc-             which we have evaluated different sequences of opti-
tures are struct data types, having a number of f ields.            mization passes (x) and their corresponding execution
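To make the two stages concrete, the short Python sketch below mimics the idea: Datalog-style relations over program entities are held as sets of tuples, and features are computed as simple queries over them. The relation names (basic_block, instruction, cfg_edge) and the two features are hypothetical illustrations, not the actual ml-feat relations or the real entries of Table 1; in MILEPOST GCC the computation is done by a Prolog program over the extracted Datalog relations.

# Illustrative sketch only: Datalog-style relations represented as Python sets
# of tuples, and two example features computed by querying them. Relation and
# feature names are hypothetical, not the actual ml-feat output.

basic_block = {("bb1",), ("bb2",), ("bb3",)}                 # entity: basic blocks
instruction = {("i1", "bb1"), ("i2", "bb1"),                 # relation: instruction -> enclosing block
               ("i3", "bb2"), ("i4", "bb3")}
cfg_edge = {("bb1", "bb2"), ("bb1", "bb3"), ("bb2", "bb3")}  # relation: CFG edges

def ft_number_of_basic_blocks():
    return len(basic_block)

def ft_avg_instructions_per_block():
    counts = {}
    for _, bb in instruction:
        counts[bb] = counts.get(bb, 0) + 1
    return sum(counts.values()) / len(basic_block)

# The second stage assembles such per-feature queries into a fixed-size vector.
feature_vector = [ft_number_of_basic_blocks(), ft_avg_instructions_per_block()]
print(feature_vector)   # [3, 1.3333333333333333]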

4 Using Machine Learning to Select Good Optimization Passes

The previous sections have described the infrastructure necessary to build a learning compiler. In this section we describe how this infrastructure is used to build a model.

Our approach to selecting good passes for programs is based on constructing a probabilistic model over a set of M training programs and using this model to predict "good" optimization passes for unseen programs.

Our specific machine learning method is similar to that of [10], where a probability distribution over "good" solutions (i.e. optimization passes or compiler flags) is learnt across different programs. This approach has been referred to in the literature as Predictive Search Distributions (PSD) [12]. However, unlike [10, 12], where such a distribution is used to focus the search for compiler optimizations on a new program, we use the learned distribution to make one-shot predictions on unseen programs. Thus we do not search for the best optimization, we automatically predict it.

4.1 The Machine Learning Model

Given a set of training programs T^1, ..., T^M, which can be described by (vectors of) features t^1, ..., t^M, and for which we have evaluated different sequences of optimization passes (x) and their corresponding execution times (or speed-ups y), we have for each program an associated dataset D^j = {(x_i, y_i)}_{i=1}^{N_j}, with j = 1, ..., M. Our goal is to predict a good sequence of optimization passes x^* when a new program T^* is presented.

We approach this problem by learning the mapping from the features t of a program to a distribution over good solutions q(x|t, θ), where θ are the parameters of the distribution. Once this distribution has been learnt, prediction for a new program T^* is straightforward: we sample at the mode of the distribution. In other words, we obtain the predicted sequence of passes by computing

    x^* = argmax_x q(x|t, θ).    (1)
4.2 Continuous Collective Compilation Framework

We used the Continuous Collective Compilation Framework [2] and MILEPOST GCC, shown in Figure 1, to generate a training set of programs together with compiler flags selected uniformly at random, the associated sequences of passes, program features, and speedups (along with code size and compilation time), which is stored in the externally accessible Global Optimization Database. We use this training set to build the machine learning model described in the next section, which in turn is used to predict the best sequence of passes for a new program given its feature vector. The current version of the CCC Framework requires minimal changes to the Makefile to pass optimization flags or sequences of passes, and has support for verifying the correctness of the binary by comparing the program output with the reference output, to avoid illegal combinations of optimizations.
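As a rough illustration of this correctness check, the sketch below compares the output of a binary built with a candidate flag combination against a reference output. The file names, benchmark source and flag list are placeholders, not the actual CCC Framework scripts.

# Minimal sketch (not the actual CCC Framework scripts): build a benchmark with
# a candidate set of flags, run it, and accept the combination only if its
# output matches the reference output produced by a known-good build.
import subprocess

def output_matches_reference(flags, src="benchmark.c", ref_output="reference.out"):
    subprocess.run(["gcc", *flags, "-o", "candidate", src], check=True)
    result = subprocess.run(["./candidate"], capture_output=True, check=True)
    with open(ref_output, "rb") as f:
        return result.stdout == f.read()

# Illegal or unsafe flag combinations are simply discarded from the training set.
candidate_flags = ["-O2", "-funroll-loops", "-ftree-vectorize"]
if output_matches_reference(candidate_flags):
    print("flag combination accepted for training data")
else:
    print("flag combination rejected")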
4.3 Learning and Predicting

In order to learn the model it is necessary to first fit a distribution over good solutions to each training program. These solutions can be obtained, for example, by uniform sampling or by running an estimation of distribution algorithm (EDA, see [27] for an overview) on each of the training programs. In our experiments we use uniform sampling, and we choose the set of good solutions to be those optimization settings that achieve at least 98% of the maximum speed-up available in the corresponding program-dependent dataset.

Let us denote the distribution over good solutions for each training program by P(x|T^j), with j = 1, ..., M. In principle, these distributions can belong to any parametric family. However, in our experiments we use an IID model where each element of the sequence is considered independently. In other words, the probability of a "good" sequence of passes is simply the product of the individual probabilities describing how likely each pass is to belong to a good solution:

    P(x|T^j) = ∏_{ℓ=1}^{L} P(x_ℓ|T^j),    (2)

where L is the length of the sequence.

As proposed in [12], once the individual training distributions P(x|T^j) have been obtained, the predictive distribution q(x|t, θ) can be learnt by maximizing the conditional likelihood or by using k-nearest-neighbor methods. In our experiments we use a 1-nearest-neighbor approach. In other words, we set the predictive distribution q(x|t, θ) to be the distribution corresponding to the training program that is closest in feature space to the new (test) program.

Note that we currently predict "good" sequences of optimization passes that are associated with the best combination of compiler flags. In future work we will investigate the application of our models to the general problem of determining the "optimal" order of optimization passes for programs.
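The following Python sketch illustrates, under simplifying assumptions (binary pass/flag decisions, Euclidean distance in feature space), the 1-nearest-neighbor variant of this approach: per-program probabilities of each pass belonging to a good solution are estimated from the good runs, and the distribution of the closest training program is used to make a one-shot prediction for a new program. It is an illustration of the method described above, not the actual MILEPOST/CCC implementation; the feature vectors and runs are made up.

# Sketch of the PSD-style 1-nearest-neighbor predictor described in the text.
# Training data is assumed to be a list of (feature_vector, good_runs) pairs,
# where good_runs is a list of 0/1 vectors marking which passes/flags were
# enabled in runs reaching at least 98% of the best speedup for that program.
import numpy as np

def fit_distributions(training_data):
    """For each training program, estimate P(pass l belongs to a good solution)."""
    return [(np.asarray(t), np.mean(good_runs, axis=0))
            for t, good_runs in training_data]

def predict(model, t_new):
    """1-NN in feature space, then the mode of the per-pass (IID) distribution."""
    t_new = np.asarray(t_new)
    _, probs = min(model, key=lambda m: np.linalg.norm(m[0] - t_new))
    return (probs >= 0.5).astype(int)   # mode of independent Bernoulli variables

# Toy usage with made-up feature vectors and runs over three passes.
training_data = [
    ([10.0, 2.0, 0.5], [[1, 0, 1], [1, 0, 1]]),
    ([1.0, 8.0, 3.0],  [[0, 1, 0], [0, 1, 1]]),
]
model = fit_distributions(training_data)
print(predict(model, [9.0, 2.5, 0.4]))  # -> [1 0 1] (closest to the first program)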
5 Experiments

We performed our experiments on four different platforms:

  • AMD – a cluster with 16 AMD Athlon 64 3700+ processors running at 2.4GHz

  • IA32 – a cluster with 4 Intel Xeon processors running at 2.8GHz

  • IA64 – a server with an Itanium2 processor running at 1.3GHz

  • ARC – FPGA implementation of the ARC 725D processor running GNU/Linux with a 2.4.29 kernel

[Figure 5: bar charts of speedup per MiBench benchmark. Panel (a): speedup obtained by iterative compilation on AMD, IA32 and IA64. Panel (b): iterative compilation versus the optimization passes predicted using ML and MILEPOST GCC, including the average across benchmarks.]
Figure 5: Experimental results when using iterative compilation with a random search strategy (500 iterations; 50% probability of selecting each flag; AMD, IA32, IA64) and when predicting the best optimization passes based on program features and the ML model (ARC)


In all cases the compiler used is MILEPOST GCC 4.2.x with ICI version 0.9.6. We decided to use the open-source MiBench benchmark suite with MiDataSets [20, 6] (dataset No. 1 in all cases) due to its applicability to both general-purpose and embedded domains.

5.1 Generating Training Data

In order to build a machine learning model, we need training data. This was generated by a random exploration of a vast optimization search space using the CCC Framework, which generated 500 random sequences of flags, each either turned on or off. These flag settings can be readily associated with different sequences of optimization passes. Although such a number of runs is very small with respect to the optimization space, we have shown that sufficient information can be gleaned from it to allow significant speedup. Indeed, the size of the space left unexplored serves to highlight our lack of knowledge in this area, and the need for further work.
Firstly, features for each benchmark were extracted from the programs using the new pass within MILEPOST GCC, and these features were then sent to the Global Optimization Database within the CCC Framework. An ML model for each benchmark was built using the execution times gathered from 500 separate runs with different random sequences of passes and a fixed data set (each run was repeated 5 times so that speedups were not caused by cache priming, etc.). After each run, the experimental results, including execution time, compilation time, code size and program features, are sent to the database, where they are stored for future reference.

Figure 5a shows that considerable speedups can already be obtained after iterative compilation on all platforms. However, this is a time-consuming process, and the different speedups across different platforms motivate the use of machine learning to automatically build specialized compilers and predict the best optimization flags or sequences of passes for different architectures.

5.2 Evaluating Model Performance

Once a model has been built for each of our benchmarks, we can evaluate the results by introducing a new program to the system and measuring how well the prediction provided by our model performs. In this work we did not use a separate testing benchmark suite due to time constraints, so leave-one-out cross-validation was used instead. Using this method, all training data relating to the benchmark being tested is excluded from the training process and the models are rebuilt. When a second benchmark is tested, the training data pertaining to the first benchmark is returned to the training set, that of the second benchmark is excluded, and so on. In this way we ensure that each benchmark is tested as a new program entering the system for the first time; of course, in real-world usage this process is unnecessary.
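This evaluation protocol can be summarized by the following sketch of leave-one-out cross-validation over the benchmarks; build_model and evaluate_prediction are hypothetical stand-ins for the model construction and measurement steps described in the text.

# Sketch of leave-one-out cross-validation over benchmarks. The helpers passed
# in (build_model, evaluate_prediction) are hypothetical placeholders for the
# model-building and measurement machinery described in the text.
from collections import namedtuple

Benchmark = namedtuple("Benchmark", "name features")

def leave_one_out(benchmarks, build_model, evaluate_prediction):
    """Evaluate each benchmark with a model trained on all the others."""
    speedups = {}
    for held_out in benchmarks:
        training_set = [b for b in benchmarks if b is not held_out]
        model = build_model(training_set)          # model rebuilt without held_out
        speedups[held_out.name] = evaluate_prediction(model, held_out)
    return speedups

# Toy usage with stub helpers; real versions would train the ML model and then
# compile/run the benchmark with the predicted passes versus the -O3 baseline.
benchmarks = [Benchmark("susan_c", [3, 1.2]), Benchmark("dijkstra", [5, 0.7])]
print(leave_one_out(benchmarks,
                    build_model=lambda ts: ts,
                    evaluate_prediction=lambda model, b: 1.0))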
When a new program is compiled, its features are first generated using MILEPOST GCC. These features are then sent to our ML model within the CCC Framework (implemented as a MATLAB server), which processes them and returns a predicted sequence of passes that should improve execution time, reduce code size, or both. We then evaluate the prediction by compiling the program with the suggested sequence of passes, measuring the execution time and comparing it with the original time for the default '-O3' optimization level. It is important to note that only one compilation occurs at evaluation; there is no search involved. Figure 5b shows these results for the ARC725D. It demonstrates that, except for a few pathological cases where the predicted flags degraded performance (whose analysis we leave for future work), using the CCC Framework, MILEPOST GCC and machine learning models we can improve on the original ARC GCC by around 11%.

This suggests that our techniques and tools can be efficient for building future iterative, adaptive, specialized compilers.

6 Conclusions and Future Work

In this paper we have shown that MILEPOST GCC has significant potential for the automatic tuning of GCC optimization. We plan to use these techniques and tools to further investigate the automatic selection of optimal orders of optimization passes and fine-grain tuning of transformation parameters. The overall framework will also allow the analysis of interactions between optimizations and investigation of the influence of program inputs and run-time state on program optimizations. Future work will also include fine-grain run-time adaptation for multiple program inputs on heterogeneous multi-core architectures.

7 Acknowledgments

The authors would like to thank Sebastian Pop for his help with modifying and debugging GCC, and John W. Lockhart for his help with editing this paper. This research is supported by the European Project MILEPOST (MachIne Learning for Embedded PrOgramS opTimization) [4] and by the HiPEAC European Network of Excellence (High-Performance Embedded Architecture and Compilation) [5].

References

 [1] ACOVEA: Using Natural Selection to Investigate Software Complexities. http://www.coyotegulch.com/products/acovea.

 [2] Continuous Collective Compilation Framework. http://cccpf.sourceforge.net.

 [3] ESTO: Expert System for Tuning Optimizations. http://www.haifa.ibm.com/projects/systems/cot/esto/index.html.

 [4] EU MILEPOST project (MachIne Learning for Embedded PrOgramS opTimization). http://www.milepost.eu.

 [5] European Network of Excellence on High-Performance Embedded Architecture and Compilation (HiPEAC). http://www.hipeac.net.

 [6] MiDataSets SourceForge development site. http://midatasets.sourceforge.net.

 [7] Plugin project. http://libplugin.sourceforge.net.

 [8] QLogic PathScale EKOPath Compilers. http://www.pathscale.com.

 [9] GCC Interactive Compilation Interface. http://gcc-ici.sourceforge.net, 2006.

[10] F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. O'Boyle, J. Thomson, M. Toussaint, and C. Williams. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2006.

[11] F. Bodin, T. Kisuki, P. Knijnenburg, M. O'Boyle, and E. Rohou. Iterative compilation in a non-linear optimisation space. In Proceedings of the Workshop on Profile and Feedback Directed Compilation, 1998.

[12] E. V. Bonilla, C. K. I. Williams, F. V. Agakov, J. Cavazos, J. Thomson, and M. F. P. O'Boyle. Predictive search distributions. In W. W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, pages 121–128, New York, NY, USA, 2006. ACM.

[13] S. Callanan, D. J. Dean, and E. Zadok. Extending GCC with modular GIMPLE optimizations. In Proceedings of the GCC Developers' Summit 2007, 2007.

[14] J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. O'Boyle, and O. Temam. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO), March 2007.

[15] K. Cooper, A. Grosul, T. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. ACME: adaptive compilation made efficient. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.

[16] K. Cooper, P. Schielke, and D. Subramanian. Optimizing for reduced code space using genetic algorithms. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 1–9, 1999.

[17] B. Elliston. Studying optimisation sequences in the GNU Compiler Collection. Technical Report ZITE8199, School of Information Technology and Electrical Engineering, Australian Defence Force Academy, 2005.

[18] B. Franke, M. O'Boyle, J. Thomson, and G. Fursin. Probabilistic source-level optimisation of embedded programs. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.

[19] G. Fursin. Iterative Compilation and Performance Prediction for Numerical Applications. PhD thesis, University of Edinburgh, United Kingdom, 2004.

[20] G. Fursin, J. Cavazos, M. O'Boyle, and O. Temam. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization. In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2007), January 2007.

[21] G. Fursin and A. Cohen. Building a practical iterative interactive compiler. In 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'07), colocated with the HiPEAC 2007 conference, January 2007.

[22] G. Fursin, A. Cohen, M. O'Boyle, and O. Temam. A practical method for quickly evaluating program optimizations. In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), pages 29–46, November 2005.

[23] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In IEEE 4th Annual Workshop on Workload Characterization, Austin, TX, December 2001.

[24] K. Heydemann and F. Bodin. Iterative compilation for two antagonistic criteria: Application to code size and performance. In Proceedings of the 4th Workshop on Optimizations for DSP and Embedded Systems, colocated with CGO, 2006.

[25] K. Hoste and L. Eeckhout. COLE: Compiler optimization level exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2008.

[26] P. Kulkarni, W. Zhao, H. Moon, K. Cho, D. Whalley, J. Davidson, M. Bailey, Y. Paek, and K. Gallivan. Finding effective optimization phase sequences. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 12–23, 2003.

[27] P. Larrañaga and J. A. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Norwell, MA, USA, 2001.

[28] F. Matteo and S. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381–1384, Seattle, WA, May 1998.

[29] A. Monsifrot, F. Bodin, and R. Quiniou. A machine learning approach to automatic production of compiler heuristics. In Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pages 41–50, 2002.

[30] Z. Pan and R. Eigenmann. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 319–332, 2006.

[31] B. Singer and M. Veloso. Learning to predict performance from formula modeling and training data. In Proceedings of the Conference on Machine Learning, 2000.

[32] M. Stephenson and S. Amarasinghe. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 123–134, 2005.

[33] S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. August. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 204–215, 2003.

[34] J. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.

[35] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI), 2004.

[36] R. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proc. Alliance, 1998.

MILEPOST GCC: machine learning based research compiler

More Related Content

Viewers also liked

Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
슬라이드 1
슬라이드 1슬라이드 1
슬라이드 1butest
 
MIS 542 Syllabus 08.doc
MIS 542 Syllabus 08.docMIS 542 Syllabus 08.doc
MIS 542 Syllabus 08.docbutest
 
Designed by Identity MLP
Designed by Identity MLP Designed by Identity MLP
Designed by Identity MLP butest
 
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...butest
 
Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...butest
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detectionbutest
 

Viewers also liked (7)

Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
슬라이드 1
슬라이드 1슬라이드 1
슬라이드 1
 
MIS 542 Syllabus 08.doc
MIS 542 Syllabus 08.docMIS 542 Syllabus 08.doc
MIS 542 Syllabus 08.doc
 
Designed by Identity MLP
Designed by Identity MLP Designed by Identity MLP
Designed by Identity MLP
 
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...
Julie Acker, M.S.W., CMHA Lambton Julie Acker holds a Masters ...
 
Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...Basic Notions of Learning, Introduction to Learning ...
Basic Notions of Learning, Introduction to Learning ...
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detection
 

Similar to MILEPOST GCC: machine learning based research compiler

The Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC MicroarchitecturesThe Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC MicroarchitecturesNECST Lab @ Politecnico di Milano
 
Developing Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsDeveloping Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsToradex
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesGrigori Fursin
 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCIJSRD
 
An Optimising Compiler For Generated Tiny Virtual Machines
An Optimising Compiler For Generated Tiny Virtual MachinesAn Optimising Compiler For Generated Tiny Virtual Machines
An Optimising Compiler For Generated Tiny Virtual MachinesLeslie Schulte
 
A REVIEW ON PARALLEL COMPUTING
A REVIEW ON PARALLEL COMPUTINGA REVIEW ON PARALLEL COMPUTING
A REVIEW ON PARALLEL COMPUTINGAmy Roman
 
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]Atmel Corporation
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowIT Arena
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)vaidehi87
 
Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Marcirio Chaves
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...Grigori Fursin
 
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITY
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITYA FRAMEWORK STUDIO FOR COMPONENT REUSABILITY
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITYcscpconf
 
Concurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core ProcessorsConcurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core ProcessorsCSCJournals
 
just in time JIT compiler
just in time JIT compilerjust in time JIT compiler
just in time JIT compilerMohit kumar
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code OptimizationIRJET Journal
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningIRJET Journal
 

Similar to MILEPOST GCC: machine learning based research compiler (20)

The Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC MicroarchitecturesThe Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
 
Developing Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsDeveloping Real-Time Systems on Application Processors
Developing Real-Time Systems on Application Processors
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPC
 
An Optimising Compiler For Generated Tiny Virtual Machines
An Optimising Compiler For Generated Tiny Virtual MachinesAn Optimising Compiler For Generated Tiny Virtual Machines
An Optimising Compiler For Generated Tiny Virtual Machines
 
A REVIEW ON PARALLEL COMPUTING
A REVIEW ON PARALLEL COMPUTINGA REVIEW ON PARALLEL COMPUTING
A REVIEW ON PARALLEL COMPUTING
 
Test PDF file
Test PDF fileTest PDF file
Test PDF file
 
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]
Atmel - Next-Generation IDE: Maximizing IP Reuse [WHITE PAPER]
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
 
01 overview
01 overview01 overview
01 overview
 
01 overview
01 overview01 overview
01 overview
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)
 
Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
 
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITY
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITYA FRAMEWORK STUDIO FOR COMPONENT REUSABILITY
A FRAMEWORK STUDIO FOR COMPONENT REUSABILITY
 
Concurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core ProcessorsConcurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core Processors
 
just in time JIT compiler
just in time JIT compilerjust in time JIT compiler
just in time JIT compiler
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 
VIRTUAL LAB
VIRTUAL LABVIRTUAL LAB
VIRTUAL LAB
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

MILEPOST GCC: machine learning based research compiler

  • 1. MILEPOST GCC: machine learning based research compiler Grigori Fursin, Mircea Namolaru, Edwin Bonilla, Cupertino Miranda, Elad Yom-Tov, John Thomson, Olivier Temam Ayal Zaks, Hugh Leather, INRIA Saclay, France Bilha Mendelson Chris Williams, IBM Haifa, Israel Michael O’Boyle University of Edinburgh, UK Phil Barnard, Elton Ashton Eric Courtois, Francois Bodin ARC International, UK CAPS Enterprise, France Contact: grigori.fursin@inria.fr Abstract The difficulty of achieving portable compiler perfor- mance has led to iterative compilation [11, 16, 15, 26, Tuning hardwired compiler optimizations for rapidly 33, 19, 30, 24, 25, 18, 20, 22] being proposed as a evolving hardware makes porting an optimizing com- means of overcoming the fundamental problem of static piler for each new platform extremely challenging. Our modeling. The compiler’s static model is replaced by radical approach is to develop a modular, extensible, a search of the space of compilation strategies to find self-optimizing compiler that automatically learns the the one which, when executed, best improves the pro- best optimization heuristics based on the behavior of the gram. Little or no knowledge of the current platform is platform. In this paper we describe MILEPOST1 GCC, needed so programs can be adapted to different archi- a machine-learning-based compiler that automatically tectures. It is currently used in library generators and adjusts its optimization heuristics to improve the exe- by some existing adaptive tools [36, 28, 31, 8, 1, 3]. cution time, code size, or compilation time of specific However it is largely limited to searching for combina- programs on different architectures. Our preliminary tions of global compiler optimization flags and tweaking experimental results show that it is possible to consider- a few fine-grain transformations within relatively nar- ably reduce execution time of the MiBench benchmark row search spaces. The main barrier to its wider use is suite on a range of platforms entirely automatically. the currently excessive compilation and execution time needed in order to optimize each program. This prevents its wider adoption in general purpose compilers. 1 Introduction Our approach is to use machine learning which has the potential to reuse knowledge across iterative compila- Current architectures and compilers continue to evolve tion runs, gaining the benefits of iterative compilation bringing higher performance, lower power and smaller while reducing the number of executions needed. size while attempting to keep time to market as short as possible. Typical systems may now have multiple het- The MILEPOST project’s [4] objective is to develop erogeneous reconfigurable cores and a great number of compiler technology that can automatically learn how to compiler optimizations available, making manual com- best optimize programs for configurable heterogeneous piler tuning increasingly infeasible. Furthermore, static embedded processors using machine learning. It aims to compilers often fail to produce high-quality code due to dramatically reduce the time to market of configurable a simplistic model of the underlying hardware. systems. Rather than developing a specialised compiler 1 MILEPOST - MachIne Learning for Embedded PrOgramS op- by hand for each configuration, MILEPOST aims to Timization [4] produce optimizing compilers automatically. 1
A key goal of the project is to make machine-learning-based compilation a realistic technology for general-purpose compilation. Current approaches [29, 32, 10, 14] are highly preliminary and limited to global compiler flags or simple transformations considered in isolation. GCC was selected as the compiler infrastructure for MILEPOST as it is currently the most stable and robust open-source compiler. It supports multiple architectures and has many aggressive optimizations, making it a natural vehicle for our research. In addition, each new version usually features new transformations, demonstrating the need for a system that automatically retunes its optimization heuristics.

In this paper we present early experimental results showing that it is possible to improve the performance of the well-known MiBench [23] benchmark suite on a range of platforms including x86 and IA64. We ported our tools to the new ARC GCC 4.2.1 that targets ARC International's configurable core family. Using MILEPOST GCC, after a few weeks of training, we were able to learn a model that automatically improves the execution time of the MiBench benchmarks by 11%, demonstrating the use of our machine-learning-based compiler.

This paper is organized as follows: the next section describes the overall MILEPOST framework and is followed by a section detailing our implementation of the Interactive Compilation Interface for GCC, which enables dynamic manipulation of optimization passes. Section 4 describes the machine learning techniques used to predict good optimization passes for programs using static program features and optimization knowledge reuse. Section 5 provides experimental results and is followed by concluding remarks.

2 MILEPOST Framework

The MILEPOST project uses a number of components, at the heart of which is the machine-learning-enabled MILEPOST GCC, shown in Figure 1. MILEPOST GCC currently proceeds in two distinct phases, in accordance with typical machine learning practice: training and deployment.

Figure 1: Framework to automatically tune programs and improve default optimization heuristics using machine learning techniques: MILEPOST GCC with the Interactive Compilation Interface (ICI) and program feature extractor, and the Continuous Collective Compilation Framework to train the ML model and predict good optimization passes

Training. During the training phase we need to gather information about the structure of programs and record how they behave when compiled under different optimization settings. Such information allows machine learning tools to correlate aspects of program structure, or features, with optimizations, building a strategy that predicts a good combination of optimizations.

In order to learn a good strategy, machine learning tools need a large number of compilations and executions as training examples. These training examples are generated by a tool, the Continuous Collective Compilation Framework [2] (CCC), which evaluates different compilation optimizations, storing execution time, code size and other metrics in a database. The features of the program are extracted from MILEPOST GCC via a plugin and are also stored in the database. Plugins allow fine-grained control and examination of the compiler, driven externally through shared libraries.

Deployment. Once sufficient training data is gathered, a model is created using machine learning. The model is able to predict good optimization strategies for a given set of program features and is built as a plugin so that it can be re-inserted into MILEPOST GCC. On encountering a new program, the plugin determines the program's features and passes them to the model, which determines the optimizations to be applied.

Framework. In this paper we use a new version of the Interactive Compilation Interface (ICI) for GCC, which controls the internal optimization decisions and their parameters using external plugins. It now allows the complete substitution of default internal optimization heuristics as well as the order of transformations.

We use the Continuous Collective Compilation Framework [2] to produce a training set for machine learning models to learn how to optimize programs for the best performance, code size, power consumption or any other objective function needed by the end-user. This framework allows knowledge of the optimization space to be reused among different programs, architectures and data sets.

Together with additional routines needed for machine learning, such as program feature extraction, this forms MILEPOST GCC, which transforms the compiler suite into a powerful research tool suitable for adaptive computing.

The next section describes the new ICI structure and explains how program features can be extracted for later machine learning in Section 4.
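To make the training phase more concrete, a single record gathered by the CCC Framework can be pictured roughly as the structure below. This is only an illustrative sketch: the field names, sizes and layout are assumptions, not the actual schema of the CCC database.

```c
#define MAX_FLAGS  256
#define N_FEATURES 64          /* size of the static feature vector (illustrative) */

/* One training example: the compilation strategy that was tried, the static
 * program features extracted by MILEPOST GCC, and the metrics recorded after
 * running the compiled program.  Field names and sizes are illustrative. */
struct training_record {
  char   flags[MAX_FLAGS];        /* e.g. "-O3 -funroll-loops ..."          */
  double features[N_FEATURES];    /* static program features                */
  double execution_time;          /* seconds                                */
  double compilation_time;        /* seconds                                */
  long   code_size;               /* bytes                                  */
};
```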
3 Interactive Compilation Interface

This section describes the Interactive Compilation Interface (ICI). The ICI provides opportunities for external control and examination of the compiler. Optimization settings at a fine-grained level, beyond the capabilities of command-line options or pragmas, can be managed through external shared libraries, leaving the compiler uncluttered.

The first version of ICI [21] was reactive and required minimal changes to GCC. It was, however, unable to modify the order of optimization passes within the compiler, so large opportunities for speedup were closed to it. The new version of ICI [9] expands on the capabilities of its predecessor, permitting the pass order to be modified. This version of ICI is used in MILEPOST GCC to automatically learn good sequences of optimization passes. By replacing default optimization heuristics, execution time, code size and compilation time can be improved.

3.1 Internal structure

To avoid the drawbacks of the first version of the ICI, we designed a new version, as shown in Figure 2. This version can transparently monitor the execution of passes or replace the GCC Controller (Pass Manager), if desired. Passes can be selected by an external plugin, which may choose to drive them in a very different order to that currently used in GCC, even choosing different pass orderings for each and every function in the program being compiled. Furthermore, the plugin can provide its own passes, implemented entirely outside of GCC.

In an additional set of enhancements, a coherent event and data passing mechanism enables external plugins to discover the state of the compiler and to be informed as it changes. At various points in the compilation process, events (IC-Events) are raised indicating decisions about transformations. Auxiliary data (IC-Data) is registered if needed.
Figure 2: GCC Interactive Compilation Interface: a) original GCC, b) GCC with ICI and plugins

Since plugins extend GCC through external shared libraries, experiments can be built with no further modifications to the underlying compiler. Modifications for different analysis, optimization and monitoring scenarios can therefore proceed in a tight engineering environment. These plugins communicate with external drivers and allow both high-level scripting and communication with machine learning frameworks such as MILEPOST GCC.

Note that it is not the goal of this project to develop a fully fledged plugin system. Rather, we show the utility of such approaches for iterative compilation and machine learning in compilers. We may later utilize the GCC plugin systems currently in development, for example [7] and [13].

Figure 3 shows some of the modifications needed to enable ICI in GCC, together with an example of a passive plugin that monitors executed passes. The plugin is invoked by the new -fici GCC flag or by setting the ICI_USE environment variable to 1 (to enable non-intrusive optimizations without changes to Makefiles). When GCC detects these options, it loads a plugin (dynamic library) with a name specified by the ICI_PLUGIN environment variable and checks for two functions, start and stop, as shown in Figure 3a.

The start function of the example plugin registers an event handler function executed_pass on an IC-Event called pass_execution.

Figure 3c shows simple modifications to the GCC Controller (Pass Manager) to enable monitoring of executed passes. When the GCC Controller function execute_one_pass is invoked, we register an IC-Parameter called pass_name giving the real name of the executed pass and trigger an IC-Event pass_execution. This in turn invokes the plugin function executed_pass, where we can obtain the name of the function currently being compiled using ici_get_feature("function_name") and the pass name using ici_get_parameter("pass_name").

IC-Features provide read-only data about the compilation state. IC-Parameters, on the other hand, can be dynamically changed by plugins to change the subsequent behavior of the compiler. Such behavior modification is demonstrated in the next subsection using the avoid_gate event and gate_status parameter needed for dynamic pass manipulation.

Since we use the name field from the GCC pass structure to identify passes, we have had to ensure that each pass has a unique name. Previously, some passes had no name at all, and we suggest that always having unique names would be a good development practice in the future.
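To make the plugin mechanics concrete, here is a minimal sketch of such a passive monitoring plugin. The functions start, stop and executed_pass, the pass_execution event, and the calls ici_get_feature("function_name") and ici_get_parameter("pass_name") are the names used above; the registration calls (ici_register_event, ici_unregister_event), the handler signature and the return types are assumptions and may differ from the real ICI headers.

```c
/* Minimal sketch of a passive ICI monitoring plugin, built as a shared
 * library and selected through the ICI_PLUGIN environment variable together
 * with -fici (or ICI_USE=1).  ici_get_feature and ici_get_parameter are the
 * calls named in the text; the registration functions and all prototypes
 * below are assumptions. */
#include <stdio.h>

/* Assumed ICI client API. */
extern void ici_register_event(const char *event_name, void (*handler)(void));
extern void ici_unregister_event(const char *event_name);
extern const void *ici_get_feature(const char *feature_name);
extern const void *ici_get_parameter(const char *parameter_name);

/* Invoked every time the Pass Manager raises the pass_execution event. */
static void executed_pass(void)
{
  const char *function_name = (const char *) ici_get_feature("function_name");
  const char *pass_name     = (const char *) ici_get_parameter("pass_name");
  fprintf(stderr, "ICI: compiling %s, executing pass %s\n",
          function_name, pass_name);
}

/* GCC looks for start and stop in the loaded plugin. */
int start(void)
{
  ici_register_event("pass_execution", executed_pass);
  return 0;
}

int stop(void)
{
  ici_unregister_event("pass_execution");
  return 0;
}
```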
Figure 3: Some GCC modifications to enable ICI and an example of a plugin to monitor executed passes: a) IC Framework within GCC, b) IC Plugin to monitor executed passes, c) GCC Controller (Pass Manager) modification

3.2 Dynamic Manipulation of GCC Passes

Previous research shows a great potential to improve program execution time or reduce code size by carefully selecting global compiler flags or transformation parameters using iterative compilation. The quality of generated code can also be improved by selecting different optimization orders, as shown in [16, 15, 17, 26]. Our approach combines the selection of optimal optimization orders with the tuning of transformation parameters at the same time.

The new version of ICI enables arbitrary selection of legal optimization passes and has a mechanism to change parameters of transformations within passes. Since GCC currently does not provide enough information about dependencies between passes to detect legal orders, and the optimization space is too large to check all possible combinations, we focused on detecting influential passes and legal orders of optimizations. We examined the pass orders generated by compiler flags that improved program execution time or code size using iterative compilation.

Before we attempt to learn good optimization settings and pass orders, we first confirmed that there is indeed performance to be gained within GCC from such actions; otherwise there is no point in trying to learn. Using the Continuous Collective Compilation Framework [2] to randomly search the optimization flag space (50% probability of selecting each optimization flag) with MILEPOST GCC 4.2.2 on an AMD Athlon64 3700+ and an Intel Xeon 2800MHz, we could improve the execution time of susan_corners by around 16%, compile time by 22% and code size by 13% using Pareto-optimal points, as described in previous work [24, 25]. Note that the same combination of flags degrades the execution time of this benchmark on an Itanium-2 1.3GHz by 80%, demonstrating the importance of adapting compilers to each new architecture. Figure 4a shows the combination of flags found for this benchmark on the AMD platform, while Figures 4b and 4c show the passes invoked and monitored by MILEPOST GCC for the default -O3 level and for the best combination of flags, respectively.

Given that there is good performance to be gained by searching for good compiler flags, we now wish to automatically select good optimization passes and transformation parameters. These should enable fine-grained program and compiler tuning as well as non-intrusive continuous program optimization without modifications to Makefiles, etc.
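The Pareto filtering mentioned above (keeping only those flag combinations that are not dominated simultaneously in execution time, compilation time and code size) can be sketched as follows. The struct layout and the sample data are purely illustrative.

```c
#include <stdio.h>

/* One evaluated combination of compiler flags (illustrative layout). */
struct point {
  double exec_time;   /* seconds */
  double comp_time;   /* seconds */
  double code_size;   /* bytes   */
};

/* Returns 1 if a is dominated by b: b is no worse in every metric and
 * strictly better in at least one. */
static int dominated(const struct point *a, const struct point *b)
{
  int no_worse = b->exec_time <= a->exec_time &&
                 b->comp_time <= a->comp_time &&
                 b->code_size <= a->code_size;
  int better   = b->exec_time <  a->exec_time ||
                 b->comp_time <  a->comp_time ||
                 b->code_size <  a->code_size;
  return no_worse && better;
}

/* Print the Pareto-optimal subset of n evaluated points. */
static void pareto_front(const struct point *p, int n)
{
  for (int i = 0; i < n; i++) {
    int is_dominated = 0;
    for (int j = 0; j < n && !is_dominated; j++)
      if (j != i && dominated(&p[i], &p[j]))
        is_dominated = 1;
    if (!is_dominated)
      printf("point %d: %.2fs exec, %.2fs compile, %.0f bytes\n",
             i, p[i].exec_time, p[i].comp_time, p[i].code_size);
  }
}

int main(void)
{
  const struct point evaluated[] = {
    { 1.20, 4.0, 52000.0 },
    { 1.05, 5.5, 61000.0 },
    { 1.25, 3.2, 48000.0 }
  };
  pareto_front(evaluated, 3);
  return 0;
}
```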
Figure 4: a) Selection of compiler flags found using the CCC Framework with a uniform random search strategy that improves execution and compilation time for the susan_corners benchmark over -O3, b) recorded compiler passes for -O3 using ICI, c) recorded compiler passes for the good selection of flags

The current version of ICI allows passes to be called directly using the ici_run_pass function, which in turn invokes the GCC function execute_one_pass. Therefore, we can circumvent the default GCC Pass Manager and either execute good sequences of passes previously found by the CCC Framework, as shown in Figure 2b, or search for new good orders of optimizations. However, we leave the determination of pass interactions and dependencies for future work.

To verify that we can change the default optimization pass orders using ICI, we recompiled the same benchmark with the -O3 flag but selecting the passes shown in Figure 4c. Note, however, that the GCC internal function execute_one_pass shown in Figure 3c has a gate control (pass->gate()) that executes a pass only if the associated optimization flag is selected. To avoid this gate control we use the IC-Parameter gate_status and the IC-Event avoid_gate, so that we can set gate_status to TRUE within plugins and thus force execution of the pass. Execution of the generated binary shows that we improve its execution time by 13% instead of 16%; the reason is that some compiler flags not only invoke an associated pass, such as -funroll-loops, but also select specific fine-grain transformation parameters and influence code generation in other passes. Thus, at this point we recompile programs with such flags always enabled, and in the future we plan to add support for such cases explicitly.
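A hypothetical plugin fragment that replays a recorded pass sequence in this way might look like the sketch below. ici_run_pass, the gate_status IC-Parameter and the avoid_gate IC-Event are the names used above; ici_register_event and ici_set_parameter, the prototypes and the example pass names are assumptions.

```c
/* Sketch of a plugin that replays a pass sequence previously recorded by the
 * CCC Framework and forces gated passes to run.  ici_run_pass, gate_status
 * and avoid_gate are the names used in the text; ici_register_event,
 * ici_set_parameter, the prototypes and the pass names are assumptions. */
extern void ici_register_event(const char *event_name, void (*handler)(void));
extern void ici_set_parameter(const char *parameter_name, const void *value);
extern void ici_run_pass(const char *pass_name); /* calls execute_one_pass */

/* Illustrative pass sequence found by iterative search. */
static const char *good_sequence[] = {
  "fre", "copyprop", "dce", "unroll", 0
};

/* Raised when a pass gate is about to be evaluated; force the gate open. */
static void force_gate(void)
{
  static const int true_value = 1;
  ici_set_parameter("gate_status", &true_value);
}

int start(void)
{
  ici_register_event("avoid_gate", force_gate);
  for (int i = 0; good_sequence[i] != 0; i++)
    ici_run_pass(good_sequence[i]);
  return 0;
}

int stop(void) { return 0; }
```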
3.3 Adding a program feature extractor pass

Our machine-learnt model predicts the best GCC optimizations to apply to an input program based on its program structure, or program features. The program features are typically a summary of the internal program representation and characterize the essential aspects of a program that the model needs in order to distinguish between good and bad optimizations.

The current version of ICI allows auxiliary passes that are not part of the default GCC compiler to be invoked. These passes can monitor and profile the compilation process or extract the data structures needed to generate program features.

During compilation, the program is represented by several data structures implementing the intermediate representation (tree-SSA, RTL, etc.), the control flow graph (CFG), def-use chains, the loop hierarchy, and so on. The data structures available depend on the compilation pass currently being performed. For statistical machine learning, the information about these data structures is encoded as a constant-size vector of numbers (i.e., features); this process is called feature extraction and is needed to enable optimization knowledge reuse among different programs.

Therefore, we implemented an additional GCC pass, ml-feat, to extract static program features. This pass is not invoked during default compilation but can be called using the extract_program_static_features plugin after any arbitrary pass starting from FRE, when all the GCC data necessary to produce features is ready.

In MILEPOST GCC, feature extraction is performed in two stages. In the first stage, a relational representation of the program is extracted; in the second stage, the vector of features is computed from this representation.

In the first stage, the program is considered to be characterized by a number of entities and relations over these entities. The entities are a direct mapping of similar entities defined by the language reference, or generated during compilation. Examples of such entities are variables, types, instructions, basic blocks, temporary variables, etc.

A relation over a set of entities is a subset of their Cartesian product. The relations specify properties of the entities or the connections between them. For describing the relations we used a notation based on logic: Datalog, a Prolog-like language with a simpler semantics, suitable for expressing relations and operations between them [35, 34].

For extracting the relational representation of the program, we used a simple method based on the examination of the include files. The compiler's main data structures are struct data types with a number of fields. Each such struct data type may introduce an entity, and its fields may introduce relations over the entity representing the enclosing struct data type and the entity representing the data type of the field. This data is collected by the ml-feat pass.

In the second stage, we provide a Prolog program defining the features to be computed from the Datalog relations extracted from the compiler's internal data structures in the first stage. The extract_program_static_features plugin invokes a Prolog compiler to execute this program, the result being the vector of features shown in Table 1, which can later be used by the Continuous Collective Compilation Framework to build machine learning models and predict the best sequences of passes for new programs. This example shows the flexibility and capabilities of the new version of ICI.
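As a toy illustration of the second stage, the counts collected over entities and relations are eventually flattened into a constant-size numeric vector, roughly as below. The feature names and the C representation are invented for illustration only; the real vector of Table 1 is produced by the Prolog program over the Datalog facts emitted by ml-feat.

```c
#include <stdio.h>

#define N_FEATURES 5   /* the real vector of Table 1 is much longer */

/* Counts gathered from the relational program representation (illustrative). */
struct program_relations {
  int n_basic_blocks;
  int n_instructions;
  int n_conditional_branches;
  int n_loops;
};

/* Flatten the counts into the constant-size feature vector consumed by the
 * machine learning model; ratios make programs of different sizes comparable. */
static void extract_features(const struct program_relations *r,
                             double features[N_FEATURES])
{
  int bbs = r->n_basic_blocks ? r->n_basic_blocks : 1;
  features[0] = r->n_basic_blocks;
  features[1] = r->n_instructions;
  features[2] = (double) r->n_instructions / bbs;   /* average block size */
  features[3] = r->n_conditional_branches;
  features[4] = r->n_loops;
}

int main(void)
{
  struct program_relations r = { 42, 310, 27, 4 };
  double f[N_FEATURES];
  extract_features(&r, f);
  for (int i = 0; i < N_FEATURES; i++)
    printf("feature[%d] = %.2f\n", i, f[i]);
  return 0;
}
```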
4 Using Machine Learning to Select Good Optimization Passes

The previous sections have described the infrastructure necessary to build a learning compiler. In this section we describe how this infrastructure is used to build a model.

Our approach to selecting good passes for programs is based upon the construction of a probabilistic model on a set of M training programs and the use of this model to make predictions of "good" optimization passes on unseen programs.

Our specific machine learning method is similar to that of [10], where a probability distribution over "good" solutions (i.e., optimization passes or compiler flags) is learnt across different programs. This approach has been referred to in the literature as Predictive Search Distributions (PSD) [12]. However, unlike [10, 12], where such a distribution is used to focus the search for compiler optimizations on a new program, we use the learned distribution to make one-shot predictions on unseen programs. Thus we do not search for the best optimization, we automatically predict it.

4.1 The Machine Learning Model

Given a set of training programs T^1, ..., T^M, which can be described by (vectors of) features t^1, ..., t^M, and for which we have evaluated different sequences of optimization passes x and their corresponding execution times (or speed-ups y), so that we have for each program T^j an associated dataset D_j = {(x_i, y_i)}_{i=1}^{N_j}, with j = 1, ..., M, our goal is to predict a good sequence of optimization passes x* when a new program T* is presented.

We approach this problem by learning a mapping from the features of a program t to a distribution over good solutions q(x | t, θ), where θ are the parameters of the distribution. Once this distribution has been learnt, prediction on a new program T* is straightforward: it is achieved by sampling at the mode of the distribution. In other words, we obtain the predicted sequence of passes by computing

    x* = argmax_x q(x | t, θ).                      (1)

4.2 Continuous Collective Compilation Framework

We used the Continuous Collective Compilation Framework [2] and MILEPOST GCC, shown in Figure 1, to generate a training set of programs together with compiler flags selected uniformly at random, the associated sequences of passes, program features and speedups (as well as code size and compilation time), stored in the externally accessible Global Optimization Database. We use this training set to build the machine learning model described in the next section, which in turn is used to predict the best sequence of passes for a new program given its feature vector. The current version of the CCC Framework requires minimal changes to the Makefile to pass optimization flags or sequences of passes, and it supports verifying the correctness of the binary by comparing program output with a reference output to avoid illegal combinations of optimizations.

4.3 Learning and Predicting

In order to learn the model it is necessary to fit a distribution over good solutions to each training program beforehand. These solutions can be obtained, for example, by uniform sampling or by running an estimation of distribution algorithm (EDA, see [27] for an overview) on each of the training programs. In our experiments we use uniform sampling, and we choose the set of good solutions to be those optimization settings that achieve at least 98% of the maximum speed-up available in the corresponding program-dependent dataset.

Let us denote the distribution over good solutions for each training program by P(x | T^j) with j = 1, ..., M. In principle, these distributions can belong to any parametric family. However, in our experiments we use an IID model where each element of the sequence is considered independently. In other words, the probability of a "good" sequence of passes is simply the product of the individual probabilities of each pass belonging to a good solution:

    P(x | T^j) = ∏_{l=1}^{L} P(x_l | T^j),          (2)

where L is the length of the sequence.

As proposed in [12], once the individual training distributions P(x | T^j) have been obtained, the predictive distribution q(x | t, θ) can be learnt by maximization of the conditional likelihood or by using k-nearest-neighbor methods. In our experiments we use a 1-nearest-neighbor approach. In other words, we set the predictive distribution q(x | t, θ) to be the distribution corresponding to the training program that is closest in feature space to the new (test) program.

Note that we currently predict "good" sequences of optimization passes that are associated with the best combinations of compiler flags. In future work we will investigate the application of our models to the general problem of determining the "optimal" order of optimization passes for programs.
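A compact sketch of the prediction step defined by Equations (1) and (2): find the training program whose feature vector is closest to the new program, take its per-pass probabilities of belonging to a good solution, and sample at the mode of the IID distribution by keeping every pass whose probability exceeds 0.5. All data and dimensions here are illustrative.

```c
#include <stdio.h>

#define N_FEATURES 4
#define N_PASSES   6
#define N_TRAIN    3

/* Per-training-program data: feature vector t_j and the IID distribution
 * P(x_l | T_j), i.e. how often pass l appears in the "good" solutions. */
static const double train_features[N_TRAIN][N_FEATURES] = {
  { 12,  340, 0.20,  4 },
  { 80, 2100, 0.60, 15 },
  {  7,   95, 0.10,  2 }
};
static const double good_pass_prob[N_TRAIN][N_PASSES] = {
  { 0.9, 0.1, 0.7, 0.2, 0.8, 0.3 },
  { 0.2, 0.8, 0.9, 0.6, 0.1, 0.7 },
  { 0.6, 0.4, 0.2, 0.9, 0.5, 0.1 }
};

static double sq_distance(const double *a, const double *b)
{
  double d = 0.0;
  for (int i = 0; i < N_FEATURES; i++)
    d += (a[i] - b[i]) * (a[i] - b[i]);
  return d;
}

/* Equation (1): the prediction is the mode of q(x|t), where q is the
 * distribution of the nearest training program (1-nearest neighbor). */
static void predict(const double *t_new, int selected[N_PASSES])
{
  int nearest = 0;
  for (int j = 1; j < N_TRAIN; j++)
    if (sq_distance(t_new, train_features[j]) <
        sq_distance(t_new, train_features[nearest]))
      nearest = j;

  /* Mode of the product of independent per-pass probabilities
   * (Equation (2)): keep pass l iff P(x_l | T_nearest) > 0.5. */
  for (int l = 0; l < N_PASSES; l++)
    selected[l] = good_pass_prob[nearest][l] > 0.5;
}

int main(void)
{
  const double t_new[N_FEATURES] = { 10, 300, 0.25, 3 };
  int selected[N_PASSES];
  predict(t_new, selected);
  for (int l = 0; l < N_PASSES; l++)
    printf("pass %d: %s\n", l, selected[l] ? "on" : "off");
  return 0;
}
```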
5 Experiments

We performed our experiments on four different platforms:

• AMD – a cluster with 16 AMD Athlon 64 3700+ processors running at 2.4GHz
• IA32 – a cluster with 4 Intel Xeon processors running at 2.8GHz
• IA64 – a server with an Itanium2 processor running at 1.3GHz
• ARC – an FPGA implementation of the ARC 725D processor running GNU/Linux with a 2.4.29 kernel
Figure 5: Experimental results a) when using iterative compilation with a random search strategy (500 iterations; 50% probability of selecting each flag; AMD, IA32, IA64), and b) when comparing iterative compilation against the optimization passes predicted using the ML model and MILEPOST GCC (ARC)

In all cases the compiler used is MILEPOST GCC 4.2.x with ICI version 0.9.6. We decided to use the open-source MiBench benchmark suite with MiDataSets [20, 6] (dataset No. 1 in all cases) due to its applicability to both general-purpose and embedded domains.

5.1 Generating Training Data

In order to build a machine learning model, we need training data. This was generated by a random exploration of a vast optimization search space using the CCC Framework, which generated 500 random sequences of flags, each either turned on or off. These flag settings can be readily associated with different sequences of optimization passes. Although such a number of runs is very small with respect to the optimisation space, we have shown that sufficient information can be gleaned from it to allow significant speedup. Indeed, the size of the space left unexplored serves to highlight our lack of knowledge in this area, and the need for further work.

First, features for each benchmark were extracted from the programs using the new pass within MILEPOST GCC, and these features were then sent to the Global Optimization Database within the CCC Framework. An ML model for each benchmark was built using the execution times gathered from 500 separate runs with different random sequences of passes and a fixed data set. Each run was repeated 5 times so that speedups were not caused by cache priming and similar effects. After each run, the experimental results, including execution time, compilation time, code size and program features, are sent to the database, where they are stored for future reference.
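The random exploration used to generate the training data can be sketched as follows: each training point switches every candidate flag on or off independently with probability 0.5, and the resulting command line is recorded for a CCC run. The flag list is an illustrative subset, not the set actually used.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Illustrative subset of candidate optimization flags. */
static const char *flags[] = {
  "-funroll-loops", "-ftree-vectorize", "-fpeel-loops",
  "-ftracer", "-fno-inline", "-funswitch-loops"
};
#define N_FLAGS     (sizeof(flags) / sizeof(flags[0]))
#define N_SEQUENCES 500

int main(void)
{
  srand((unsigned) time(NULL));
  for (int s = 0; s < N_SEQUENCES; s++) {
    printf("gcc -O3");
    for (size_t f = 0; f < N_FLAGS; f++)
      if (rand() % 2)              /* each flag selected with 50% probability */
        printf(" %s", flags[f]);
    printf(" benchmark.c\n");      /* command line recorded for a CCC run     */
  }
  return 0;
}
```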
Figure 5a shows that considerable speedups can already be obtained after iterative compilation on all platforms. However, this is a time-consuming process, and the different speedups across different platforms motivate the use of machine learning to automatically build specialized compilers and predict the best optimization flags or sequences of passes for different architectures.

5.2 Evaluating Model Performance

Once a model has been built for each of our benchmarks, we can evaluate the results by introducing a new program to the system and measuring how well the prediction provided by our model performs. In this work we did not use a separate testing benchmark suite due to time constraints, so leave-one-out cross-validation was used instead. Using this method, all training data relating to the benchmark being tested is excluded from the training process, and the models are rebuilt. When a second benchmark is tested, the training data pertaining to the first benchmark is returned to the training set, that of the second benchmark excluded, and so on. In this way we ensure that each benchmark is tested as a new program entering the system for the first time—of course, in real-world usage, this process is unnecessary.

When a new program is compiled, features are first generated using MILEPOST GCC. These features are then sent to our ML model within the CCC Framework (implemented as a MATLAB server), which processes them and returns a predicted sequence of passes that should improve execution time, reduce code size, or both. We then evaluate the prediction by compiling the program with the suggested sequence of passes, measuring the execution time and comparing it with the original time for the default -O3 optimization level. It is important to note that only one compilation occurs at evaluation—there is no search involved. Figure 5b shows these results for the ARC725D. It demonstrates that, except for a few pathological cases where the predicted flags degraded performance (whose analysis we leave for future work), using the CCC Framework, MILEPOST GCC and machine learning models we can improve the original ARC GCC by around 11%.

This suggests that our techniques and tools can be effective for building future iterative, adaptive, specialized compilers.
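The leave-one-out protocol described above can be sketched as the following loop over benchmarks. build_model, predict_passes and measure_speedup are stand-ins for the MATLAB model server, the MILEPOST GCC compilation and the timing runs; they are hypothetical helpers, not part of the real framework.

```c
#include <stdio.h>

#define N_BENCHMARKS 20

/* Stubs standing in for the real machinery (the MATLAB model server,
 * MILEPOST GCC and the timing runs); they only make the sketch compile. */
static void build_model(const int *training_set, int n)
{ (void) training_set; (void) n; }
static void predict_passes(int benchmark) { (void) benchmark; }
static double measure_speedup(int benchmark) { (void) benchmark; return 1.0; }

int main(void)
{
  int training_set[N_BENCHMARKS];

  for (int test = 0; test < N_BENCHMARKS; test++) {
    /* Exclude the benchmark under test from the training data. */
    int n = 0;
    for (int b = 0; b < N_BENCHMARKS; b++)
      if (b != test)
        training_set[n++] = b;

    build_model(training_set, n);   /* rebuild the model without it     */
    predict_passes(test);           /* one-shot prediction, no search   */
    printf("benchmark %d: speedup %.2f over -O3\n",
           test, measure_speedup(test));
  }
  return 0;
}
```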
6 Conclusions and Future Work

In this paper we have shown that MILEPOST GCC has significant potential for the automatic tuning of GCC optimizations. We plan to use these techniques and tools to further investigate the automatic selection of optimal orders of optimization passes and the fine-grain tuning of transformation parameters. The overall framework will also allow the analysis of interactions between optimizations and investigation of the influence of program inputs and run-time state on program optimizations. Future work will also include fine-grain run-time adaptation for multiple program inputs on heterogeneous multi-core architectures.

7 Acknowledgments

The authors would like to thank Sebastian Pop for his help with modifying and debugging GCC and John W. Lockhart for his help with editing this paper. This research is supported by the European project MILEPOST (MachIne Learning for Embedded PrOgramS opTimization) [4] and by the HiPEAC European Network of Excellence (High-Performance Embedded Architecture and Compilation) [5].

References

[1] ACOVEA: Using Natural Selection to Investigate Software Complexities. http://www.coyotegulch.com/products/acovea.

[2] Continuous Collective Compilation Framework. http://cccpf.sourceforge.net.

[3] ESTO: Expert System for Tuning Optimizations. http://www.haifa.ibm.com/projects/systems/cot/esto/index.html.

[4] EU MILEPOST project (MachIne Learning for Embedded PrOgramS opTimization). http://www.milepost.eu.

[5] European Network of Excellence on High-Performance Embedded Architecture and Compilation (HiPEAC). http://www.hipeac.net.

[6] MiDataSets SourceForge development site. http://midatasets.sourceforge.net.
[7] Plugin project. http://libplugin.sourceforge.net.

[8] QLogic PathScale EKOPath Compilers. http://www.pathscale.com.

[9] GCC Interactive Compilation Interface. http://gcc-ici.sourceforge.net, 2006.

[10] F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. O'Boyle, J. Thomson, M. Toussaint, and C. Williams. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2006.

[11] F. Bodin, T. Kisuki, P. Knijnenburg, M. O'Boyle, and E. Rohou. Iterative compilation in a non-linear optimisation space. In Proceedings of the Workshop on Profile and Feedback Directed Compilation, 1998.

[12] E. V. Bonilla, C. K. I. Williams, F. V. Agakov, J. Cavazos, J. Thomson, and M. F. P. O'Boyle. Predictive search distributions. In W. W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, pages 121–128, New York, NY, USA, 2006. ACM.

[13] S. Callanan, D. J. Dean, and E. Zadok. Extending GCC with modular GIMPLE optimizations. In Proceedings of the GCC Developers' Summit 2007, 2007.

[14] J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. O'Boyle, and O. Temam. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO), March 2007.

[15] K. Cooper, A. Grosul, T. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. ACME: adaptive compilation made efficient. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.

[16] K. Cooper, P. Schielke, and D. Subramanian. Optimizing for reduced code space using genetic algorithms. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 1–9, 1999.

[17] B. Elliston. Studying optimisation sequences in the GNU Compiler Collection. Technical Report ZITE8199, School of Information Technology and Electrical Engineering, Australian Defence Force Academy, 2005.

[18] B. Franke, M. O'Boyle, J. Thomson, and G. Fursin. Probabilistic source-level optimisation of embedded programs. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.

[19] G. Fursin. Iterative Compilation and Performance Prediction for Numerical Applications. PhD thesis, University of Edinburgh, United Kingdom, 2004.

[20] G. Fursin, J. Cavazos, M. O'Boyle, and O. Temam. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2007), January 2007.

[21] G. Fursin and A. Cohen. Building a practical iterative interactive compiler. In 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'07), colocated with HiPEAC 2007, January 2007.

[22] G. Fursin, A. Cohen, M. O'Boyle, and O. Temam. A practical method for quickly evaluating program optimizations. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2005), pages 29–46, November 2005.

[23] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In IEEE 4th Annual Workshop on Workload Characterization, Austin, TX, December 2001.

[24] K. Heydemann and F. Bodin. Iterative compilation for two antagonistic criteria: Application to code size and performance. In Proceedings of the 4th Workshop on Optimizations for DSP and Embedded Systems, colocated with CGO, 2006.

[25] K. Hoste and L. Eeckhout. COLE: Compiler optimization level exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2008.

[26] P. Kulkarni, W. Zhao, H. Moon, K. Cho, D. Whalley, J. Davidson, M. Bailey, Y. Paek, and K. Gallivan. Finding effective optimization phase sequences. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 12–23, 2003.

[27] P. Larrañaga and J. A. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Norwell, MA, USA, 2001.

[28] M. Frigo and S. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381–1384, Seattle, WA, May 1998.

[29] A. Monsifrot, F. Bodin, and R. Quiniou. A machine learning approach to automatic production of compiler heuristics. In Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pages 41–50, 2002.

[30] Z. Pan and R. Eigenmann. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 319–332, 2006.

[31] B. Singer and M. Veloso. Learning to predict performance from formula modeling and training data. In Proceedings of the Conference on Machine Learning, 2000.

[32] M. Stephenson and S. Amarasinghe. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 123–134, 2005.

[33] S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. August. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 204–215, 2003.

[34] J. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.

[35] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI), 2004.

[36] R. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proc. Alliance, 1998.