Physical Design Flow




Mohammad reza Kakoee
micrellab
m.kakoee@unibo.it
Agenda
Introduction to design flow and Backend
Introduction to design planning
Floorplanning / Hierarchical design
Power planning
Summary
The Physical Design Task
[Figure: the front end delivers a Verilog netlist and SDC constraints; the physical design flow (back end) turns them into GDSII.]
Example Physical Design Flow
 Design/Constraints Import
 Floorplanning
 Placement
 Clock Tree Synthesis
 Routing
 Post Route Optimization
 Layout Verification / Finishing
Fullchip Design Overview
The location of the core, I/O areas, P/G pads and the P/G grid
[Figure: fullchip layout showing the core placement area, the periphery (I/O) area, the P/G grid with rings and straps, and hard macros (RAM, ROM, IP).]
Where Do We Start? - Design Planning
 Inputs to the physical design flow: Verilog netlist, SDC constraints
 How do we handle?
     Die size
     IO / Hard-IP placement
     Global clock distribution
     Power planning
     Flat versus hierarchical design
Design Planning
 Floorplanning
    Determine die size
    Shape and arrange hierarchical blocks
    Integrate hard-IP efficiently
    Predict and prevent congestion hotspots and critical timing
    paths
 Power planning
    Create power distribution grid
        Consider IR drop and Electromigration
    Implement power saving techniques
        Power gating
        Multi-Voltage design / Voltage islands
Agenda
 Introduction to design planning
 Floorplanning
   Setup/configuration
   Die size, utilization, metallization scheme
   IO-ring and macro placement
   Flat versus hierarchical design
   Hierarchical design planning issues
 Power planning
 Summary
Setup/configuration
 Read netlist
 Read SDC
 Read .lib files
 Read footprint for P&R
    LEF : SOC Encounter
    FRAM : Synopsys tools
 Read technology file
    Metal width … (DRC rules)
 Check netlist
    High fanout
    Unique
    Unconnected inputs
    Standard cell area
    Check timing without wire load
Floorplanning – Die Size, Utilization & Metal Stack-up
 Choosing the die size, initial standard cell utilization and
 metallization scheme involves several design tradeoffs
 (schedule, cost, performance)
    Larger die
         Easier to route, less congestion, lower cap (decrease
         signal/power integrity related problems), faster design
         cycle
         Higher cost, higher power
    More dense power grid
         Reduce risk of power related failures
         Increase number of metal layer masks, reduce signal
         route tracks
Floorplanning – Utilization
[Figure: two example layouts, one with low standard-cell utilization and one with high standard-cell utilization.]
Floorplanning – Utilization
  Utilization refers to the percentage of core area that is taken
  up by standard cells.
      A typical starting utilization might be 70%
      This can vary a lot depending on the design
  High utilization can make it difficult to close a design
      Routing congestion
      Negative impact during the optimization and legalization stages
  Utilization changes should be examined after each stage of
  the flow
      Avoid having large increases after placement optimization
      Feedback should be given to front-end designers
      Topographical synthesis is now possible
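
The utilization definition above can be sketched as a small calculation. This is only an illustration: the function names and the example numbers are assumptions, and macro area is ignored for simplicity.

```python
import math

def utilization_pct(std_cell_area, core_area):
    """Percentage of the core area taken up by standard cells."""
    return 100.0 * std_cell_area / core_area

def core_area_for_target(std_cell_area, target_utilization_pct):
    """Core area needed so the standard cells reach the target utilization."""
    return std_cell_area / (target_utilization_pct / 100.0)

def square_core_side(core_area):
    """Side length of a square core (aspect ratio 1 is an assumption)."""
    return math.sqrt(core_area)

# Hypothetical design: 700,000 um^2 of standard cells, 70% starting utilization.
needed_area = core_area_for_target(700_000.0, 70.0)
side = square_core_side(needed_area)
```

Re-running `utilization_pct` after each stage of the flow shows how much optimization has grown the cell area.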
Initialize Floorplan
Define globals (VDD1, VDD2, GND1, ….)
Define core area : (cells + utilization factor)
[Figure: two floorplans with a core surrounded by IO; one contains an [Analog] macro. Shape can be implied by a macro.]
 Place IO (fixed, equidistant, ..)
    Take macros and power domains into account already
IO Ring and Large Macro Placement
IO Ring is often decided by front-end designers, with input from
physical design and packaging engineers.
When placing large macros we must consider impacts on routing,
timing and power.
    For wire-bond, place power hungry macros away from the chip
    center.
[Figure: macro placement with possible routing congestion hotspots marked.]
Flat Versus Hierarchical Design
What happens if the design is too big to be
handled by the EDA tools?
   Hierarchical Design
[Figure: a fullchip design with I/O pads, IP macros and blocks/tiles is split into Blk 1, Blk 2 and Blk 3; each block runs its own P&R flow, followed by fullchip timing & verification.]
Flat Versus Hierarchical Design
Hierarchical Design
  Advantages
    Faster runtime, less memory needed for EDA tools
    Faster ECO turn-around time
    Ability to do design re-use
  Disadvantages
    Much more difficult for fullchip timing closure
    (ILMs)
    More intensive design planning needed:
    feedthrough generation, repeater insertion, timing
    constraint budgeting.
Hierarchical Design : Specify Partitions / Plan Groups
Netlist must have partitions as top level modules.
Partitions generally sized according to a target initial utilization
 ~70% utilization, ~300k-700k instances
Channels or abutment
Rectilinear block shapes are possible
[Figure: partition arrangements using channels, abutment, and rectilinear blocks.]
Hierarchical Design : Pin Assignment
Pin constraints include parameters such as
     Layers, spacing, size, overlap
     Net groups, pin guides
Pins can be assigned placement-based (flightlines) or route-based
(trial route, boundary crossings).
Pin guides can be used to influence automatic
pin placement of particular net groups
Pins at partition corners can make routing difficult
[Figure: a partition with Pin guide 1 and Pin guide 2 on its boundary.]
Hierarchical Design : Timing Budgeting
Chip level constraints must be mapped correctly to block
level constraints
The design must be placed, trial routed and have pins
assigned before running budgeting
Block level constraints will be assigned input or output
delays on I/O ports based on the estimated timing
slack.
    set_input_delay 1.5 [get_ports IN1]
[Figure: port IN1 on the block boundary, with 1.5 ns of delay ahead of it.]
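
To make the budgeting idea concrete, here is a minimal sketch that splits a clock period over the segments of a path that crosses block boundaries, in proportion to the estimated delay of each segment. Proportional splitting is an assumption chosen for illustration, not necessarily what a given tool does, and the segment names and delays are invented so that the result matches the 1.5 ns value used on the slide.

```python
def budget_path(clock_period, est_delays):
    """Split the clock period across path segments in proportion to estimated delay.

    est_delays maps a segment name (block or top-level glue) to its estimated
    delay, listed in the order the signal travels through the segments.
    """
    total = sum(est_delays.values())
    return {name: clock_period * d / total for name, d in est_delays.items()}

def input_delay_for(block, budgets):
    """Input delay seen by `block`: the budget consumed by all earlier segments."""
    delay = 0.0
    for name, b in budgets.items():
        if name == block:
            return delay
        delay += b
    raise KeyError(block)

# Hypothetical path: launched in blk1, crosses the top level, captured in blk2.
budgets = budget_path(3.0, {"blk1": 1.0, "top": 0.5, "blk2": 1.5})
in1_delay = input_delay_for("blk2", budgets)  # value to use with set_input_delay on IN1
```

Because the segment delays are only estimates at this stage, the resulting block constraints are estimates too.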
Hierarchical Design : Timing Budgeting & Fullchip Timing Closure
      Fullchip timing closure is typically a bottleneck for design cycles.
          Block-level P&R flow does not emphasize io-to-flop, flop-to-io, io-to-io
          timing paths, because budgeted constraints are only estimates
      Interface logic models (ILMs) can be used
          To speed up timing analysis runs when fullchip design is too large.
          Required clock and datapaths are preserved, net/cell names are
          identical
[Figure: the original netlist (inputs A, B, Clk; outputs X, Y) next to its Interface Logic Model (ILM) with the same ports.]
Agenda
 Introduction to design planning
 Floorplanning
 Power planning
   Intro to power issues in IC design
   Basic power grid creation
   Multi-voltage design & power gating
   Automated power grid design flows
 Summary
Power Consumption and Reliability
[Figure: dynamic power and static (leakage) power create an average power problem, which leads to IR-drop / voltage drop. The floorplan plus the design of the grid create a power density problem, which leads to electromigration (EM) in the long run. Both can make the chip fail.]
  1 out of 5 chips fail due to excessive power consumption
Power Consumption and Reliability : IR-Drop
 The drop in supply voltage over the length of the supply line
    A resistance matrix of the power grid is constructed
    The average current of each gate is considered
    The matrix is solved for the current at each node, to determine
    the IR-drop.
[Figure: supply line running from the VDD pad, labelled VDD.]
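
As a hedged illustration of the IR-drop idea, the sketch below handles the simplest possible case: a single supply rail fed from one pad at one end, with gates tapping current along its length. A real grid needs the full resistance matrix described above; here the chain structure lets the drop be accumulated segment by segment. All names and values are assumptions.

```python
def rail_voltages(vdd, segment_res, tap_currents):
    """Voltage at each tap of a rail fed from a single pad at its start.

    segment_res[k]  : resistance (ohm) of the rail segment leading to tap k
    tap_currents[k] : average current (A) drawn at tap k
    """
    assert len(segment_res) == len(tap_currents)
    # Current through segment k is everything drawn at tap k and beyond.
    seg_currents = []
    downstream = 0.0
    for i in reversed(tap_currents):
        downstream += i
        seg_currents.append(downstream)
    seg_currents.reverse()

    voltages = []
    v = vdd
    for r, i in zip(segment_res, seg_currents):
        v -= r * i  # IR drop across this segment
        voltages.append(v)
    return voltages

# Hypothetical rail: two 0.1 ohm segments, 1 A drawn at each tap.
v_taps = rail_voltages(1.0, [0.1, 0.1], [1.0, 1.0])
worst_drop = 1.0 - min(v_taps)
```

The tap farthest from the pad sees the largest drop, which is why distance to the power pads matters when placing power-hungry blocks.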
Where does all the power go?
 Total Power = Core + I/O
    Core = Standard Cells + Macros
       •Clock network
    I/O
       •Separate supply ring
       •Often higher voltage
       •Fixed, no optimization
Agenda
  Introduction to design planning
  Floorplanning
  Power planning
    Intro to power issues in IC design
    Basic power grid creation
    Multi-voltage design & power gating
    Automated power grid design flows
  Summary
Power Grid Creation : Macro Placement
 Blocks with the highest performance and highest power
 consumption
    Close to border power pads (IR drop)
    Away from each other (EM)
Agenda
  Introduction to design planning
  Floorplanning
  Power planning
    Intro to power issues in IC design
    Basic power grid creation
    Multi-voltage design & power gating
    Automated power grid design flows
  Summary
Automated Power Grid Design: PNS & PNA
 Power grid creation has usually been done by hand, using
 rules of thumb for widths and number of straps
    Analysis often done late in the design flow
    Grid is typically over-designed to prevent time-intensive
    power grid changes.
 When incorporating advanced low-power strategies,
 there are too many variables to achieve an optimal result
 manually.
 For more complex designs an automated strategy is
 preferred.
    e.g. Power Network Synthesis (PNS) and Power
    Network Analysis (PNA) from Synopsys
    Allows designers to anticipate effects of floorplanning
Power Network Analysis (PNA)
 There are EDA tools that allow early power network
 analysis for designs in the early floorplanning stage.
    Not signoff quality, but good enough for initial design.
    e.g. Synopsys Power Network Analysis (PNA)
[Figure: power network with labels "VDD Pad" and "VDD".]
Power Network Synthesis: PNS – What?
 Goal is to QUICKLY find minimum routing resource
 required to meet specified IR drop target
    More power routing => easier to reach IR-drop
    target, but harder to route clock and signals with
    remaining tracks
[Figure: power grid showing power pads, power rings, power trunks and power straps (in red).]
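
The trade-off above can be illustrated with a toy search for the smallest number of straps that meets an IR-drop target. This is not how PNS works internally; it is a sketch under strong assumptions: identical straps share the total current equally, each strap is fed from both ends, and the load is uniform along the strap, so the worst drop sits at the strap centre and equals (strap current x strap resistance) / 8.

```python
def worst_drop_v(n_straps, total_current_a, strap_res_ohm):
    """Worst-case drop on one strap fed from both ends with a uniform load."""
    per_strap_current = total_current_a / n_straps
    return per_strap_current * strap_res_ohm / 8.0

def min_straps(total_current_a, strap_res_ohm, target_drop_v, max_straps=1000):
    """Smallest strap count meeting the target, or None if max_straps is not enough."""
    for n in range(1, max_straps + 1):
        if worst_drop_v(n, total_current_a, strap_res_ohm) <= target_drop_v:
            return n
    return None

# Hypothetical numbers: 1 A total, 10 ohm per strap, 60 mV allowed drop.
n_needed = min_straps(1.0, 10.0, 0.06)
```

Every strap added beyond `n_needed` uses routing tracks that could otherwise carry clock and signal nets.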
PNS : Running PNS Trials

               Run PNS
PNS : Create Power Routing
 After running trials, an optimal power grid can be chosen and the
 actual rails can be laid out.
 Virtual rails => actual rails
     Outside main PNS : memory footprint + cpu time
     Many options : e.g. % via penetration, order of routing …
 Check legal cell/pin placement (grid aligned ?)
 Depending on the design phase
     What cells, nets and layers
     e.g. first macros and pads, then high voltage areas, …
 Secondary PG ports on level shifters, isolation cells, retention registers
     Later, after placement, during routing : same as the follow pins for
     the normal vdd and gnd of the std cells.
Summary
The goal of design planning is to arrange the chip so that the “Place and
Route” flow can converge quickly and easily.
    Design experience is needed
Floorplan is driven by :
    Power
    Timing
    Congestion
    Minimum area
There is no single way to create a floorplan
    Flat – hierarchical
    Regions, position of the macros
    Order of placement: IO versus macros versus core
This phase can take a significant portion of the complete backend design
time.
Early analysis of power grid is essential for avoiding major problems near
the end of the design cycle.
Automated power grid tools may help reduce necessary safety margins.
Placement
Placement in the Flow
[Figure: design flow. Front-end: design specification, logic design and verification, logic synthesis. Back-end (physical design stage): floorplanning, placement, routing. Inputs to the physical design stage: physical libraries, netlist, physical design constraints.]
Definition of Placement
Placement : Exact placement of the
  modules (modules can be gates, standard
  cells, macros…). The general goal is to
  minimize the total area and interconnect
  cost.
             The quality of the attainable routing is highly
             determined by the placement.

             Circuit placement becomes very critical in 90nm
             and below technologies.
Cost Function for Placement
 Cost components                Methods of consideration
 Area, wire length, overlap     Traditional methods of placement
 Timing                         Timing-driven placement
 Congestion                     Congestion-driven placement
 Clock                          Clock gating
 Power                          Multivoltage and multisupply placement
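
For the wire length component, a commonly used estimate is the half-perimeter wirelength (HPWL) of each net's bounding box. The slides do not say which estimate a given tool uses, so the sketch below is only an illustration with invented cell names and coordinates.

```python
def hpwl(points):
    """Half-perimeter of the bounding box around a net's pin locations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets, placement):
    """Sum of HPWL over all nets; nets maps net name -> list of cell names."""
    return sum(hpwl([placement[c] for c in cells]) for cells in nets.values())

# Hypothetical placement (cell name -> (x, y)) and netlist.
placement = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = {"n1": ["a", "b"], "n2": ["a", "c", "b"]}
cost = total_wirelength(nets, placement)
```

A placer that minimizes this sum pulls connected cells together, which is the behaviour the traditional methods in the table aim for.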
Placement Steps
 Input information:
    Netlist
    Mapped and floorplanned design
    Logical and physical libraries
    Design constraints
 Steps:
    Reading gate-level netlists from synthesis
    Global placement
    Detailed placement
    Placement optimization
 Output information:
    Physical layout information
    Cell placement locations
    Physical layout, timing, and technology information of reference libraries
Inputs for the Placement Tool
 Gate-level netlist
 Design constraints
 Design libraries
    Logical, target
    Physical, reference (macro cell, standard cell)
 Floorplanned design
 Technology file
Inside A Physical Library
 The abstract view of a cell (figure: NAND_1 with pins A, B, Y and VDD / GND rails) contains
    Dimension “bounding box”
    Blockage
    Pins (direction, layer and shape)
    Symmetry (X, Y, or 90º)
    Reference point (typically 0,0)
 Example .lef
   MACRO AN2D0
       CLASS CORE ;
       FOREIGN AN2D0 0.000 0.000 ;
       ORIGIN 0.000 0.000 ;
       SIZE 1.400 BY 2.520 ;
       SYMMETRY x y ;
       SITE core ;
       PIN Z
           ANTENNADIFFAREA 0.1680 ;
           DIRECTION OUTPUT ;
           PORT
           LAYER M1 ;
           RECT 1.300 0.640 1.330 1.675 ;
           RECT 1.190 0.640 1.300 1.780 ;
           RECT 1.140 0.640 1.190 0.900 ;
           RECT 1.140 1.520 1.190 1.780 ;
           END
       END Z
       PIN A2
           ANTENNAGATEAREA 0.0704 ;
           DIRECTION INPUT ;
           PORT
           LAYER M1 ;
           RECT 0.610 0.975 0.770 1.545 ;
           END
       …
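
As a small illustration of how a tool might use the "bounding box" dimension, the sketch below pulls the SIZE of each MACRO out of LEF text and computes the cell area. It is a deliberately naive line-based reader written for this example, not a full LEF parser, and the snippet it reads is a shortened copy of the example above.

```python
LEF_SNIPPET = """
MACRO AN2D0
    CLASS CORE ;
    SIZE 1.400 BY 2.520 ;
    SYMMETRY x y ;
END AN2D0
"""

def macro_sizes(lef_text):
    """Map each MACRO name to its (width, height) from 'SIZE w BY h ;' lines."""
    sizes = {}
    current = None
    for line in lef_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "MACRO":
            current = tokens[1]
        elif (current and len(tokens) >= 4
              and tokens[0] == "SIZE" and tokens[2] == "BY"):
            sizes[current] = (float(tokens[1]), float(tokens[3]))
    return sizes

sizes = macro_sizes(LEF_SNIPPET)
width, height = sizes["AN2D0"]
cell_area = width * height  # in square microns if the LEF units are microns
```

Summing such areas over every instance in the netlist gives the standard cell area used in utilization calculations.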
Technology Information
  For each tool, a specific set of files is required to
  provide details about the metal layers for the chosen
  process technology…
     Number and name designations for each layer/via
     Physical and electrical characteristics for each layer
     Dielectric constant
     Design rules for each layer (min spacing, min width,
     etc…)
     Units and precision for numerical values
  Example filetypes
     .lefhdr, .tf -> contain layer and design rule
     information
  Also, there are files that enable improved RC estimation
  that can be read by the placement engines.
     .captable, .tluplus -> store RC coefficients.
Physical Technology Data
The technology files contain design rule information that
can be read by the tools
  For example, the spacing table sets the minimum spacing
  between adjacent wires on the same layer as a function of
  wire width and parallel run length.
  Wire width and pitch are also described, as well
  as any more complex design rules for routing.
 Example .lefhdr
   LAYER M1
       TYPE ROUTING ;
       DIRECTION HORIZONTAL ;
       OFFSET 0 ;
       PITCH 0.280 ;
       WIDTH 0.120 ;
       MAXWIDTH 12.000 ;
       AREA 0.058 ;
       MINENCLOSEDAREA 0.200 ;
       THICKNESS 0.240 ;
       HEIGHT 0.765 ;
       SPACINGTABLE
       PARALLELRUNLENGTH   0.00   0.52   1.50   4.50
       WIDTH 0.00          0.12   0.12   0.12   0.12
       WIDTH 0.30          0.12   0.17   0.17   0.17
       WIDTH 1.50          0.12   0.17   0.50   0.50
       WIDTH 4.50          0.12   0.17   0.50   1.50 ;
       MINIMUMCUT 2 WIDTH 0.42 ;
       MINIMUMCUT 4 WIDTH 0.98 FROMABOVE ;
       MINIMUMCUT 2 WIDTH 0.70 LENGTH 0.70 WITHIN 1.001 ;
       MINIMUMCUT 2 WIDTH 2.00 LENGTH 2.00 WITHIN 2.001 ;
       MINIMUMCUT 2 WIDTH 3.00 LENGTH 10.0 WITHIN 5.001 ;
       MINIMUMDENSITY 15 ;
       MAXIMUMDENSITY 70 ;
       DENSITYCHECKWINDOW 50 50 ;
       DENSITYCHECKSTEP 50 ;
       FILLACTIVESPACING 0.60 ;
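
A spacing table of this kind can be read as a two-dimensional lookup. The sketch below encodes the M1 table from the example and returns the required spacing for a given wire width and parallel run length. It assumes that a row or column applies once the value is strictly greater than its threshold, and that the first row and column are the fallback; the exact comparison rules should be checked against the LEF reference for the version in use.

```python
# Thresholds and spacings copied from the M1 SPACINGTABLE example (microns).
PRL_COLUMNS = [0.00, 0.52, 1.50, 4.50]
WIDTH_ROWS = [
    (0.00, [0.12, 0.12, 0.12, 0.12]),
    (0.30, [0.12, 0.17, 0.17, 0.17]),
    (1.50, [0.12, 0.17, 0.50, 0.50]),
    (4.50, [0.12, 0.17, 0.50, 1.50]),
]

def required_spacing(width, parallel_run_length):
    """Minimum spacing for a wire of this width and parallel run length."""
    row = WIDTH_ROWS[0][1]                 # fallback: first row
    for threshold, spacings in WIDTH_ROWS:
        if width > threshold:              # assumed strict comparison
            row = spacings
    col = 0                                # fallback: first column
    for i, threshold in enumerate(PRL_COLUMNS):
        if parallel_run_length > threshold:
            col = i
    return row[col]

narrow = required_spacing(0.12, 1.0)  # minimum-width wire
wide = required_spacing(2.0, 2.0)     # wide wire running alongside for 2 um
```

Wider wires and longer parallel runs push the lookup toward the lower right of the table, where the required spacing is largest.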
Global and Detail Placement
    Reading gate-level netlist from synthesis
    Global placement
    Detailed placement
    Placement optimization
Global Placement
 Standard cells are placed into groups such
 that the number of connections between
 groups is minimized.
 This is solved through circuit partitioning.
[Figure: a bad placement with many connections crossing between groups next to a good placement with few.]
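
The quantity being minimized can be sketched as the number of nets whose cells fall into more than one group (the cut size). The example below only evaluates a given grouping; it does not implement a partitioning algorithm, and the netlist and groupings are invented.

```python
def cut_size(nets, group_of):
    """Number of nets that connect cells in more than one group."""
    return sum(
        1 for cells in nets.values()
        if len({group_of[c] for c in cells}) > 1
    )

# Hypothetical netlist: net name -> cells on that net.
nets = {"n1": ["a", "b"], "n2": ["c", "d"], "n3": ["b", "c"]}

good = {"a": 0, "b": 0, "c": 1, "d": 1}  # connected cells kept together
bad = {"a": 0, "b": 1, "c": 0, "d": 1}   # connected cells split apart

good_cut = cut_size(nets, good)
bad_cut = cut_size(nets, bad)
```

A partitioner searches over groupings like these for one with a low cut size, subject to keeping the groups balanced.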
Detail Placement : Coarse Placement
    All the cells are placed in the
    approximate locations but they
    are not legally placed

    No logic optimization is done
Detail Placement : Legalization
Legalization: Ensures that the
final placement is legal before
saving the design.

      Legal placement of cells is not required for analyzing routing
      congestion at an early stage
Hard Macro Placement
 Hard macros are placed during the
 floorplanning stage and then marked as
 FIXED for placement.
 Typically, hard macros are placed near the
 sides of the core area.
Some Guidelines for Placement (2)
 Avoid constrictive channels
 Avoid many pins in the narrow channel. Rotate for pin accessibility
 Use blockage to improve pin accessibility
[Figure: floorplan with RAM 1 to RAM 8 illustrating the guidelines.]
Review of Placement Cost Function
 Cost components                Methods of consideration
 Area, wire length, overlap     Traditional methods of placement
 Timing                         Timing-driven placement
 Congestion                     Congestion-driven placement
 Clock                          Clock gating
 Power                          Multivoltage and multisupply placement
Timing Driven Placement
 Critical paths are determined using static timing
 analysis (STA).
 Tool attempts to minimize wire length of critical
 paths to meet setup timing.
       Net RCs are based on Virtual
       Routing (VR) estimates
Virtual Route / Trial Route
   Manhattan geometry (figure: virtual route)
     Horizontal – Vertical
     NO diagonal routing
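
A virtual route estimate can be sketched as the Manhattan distance between two pins, turned into wire resistance and capacitance with per-unit-length coefficients. The coefficients below are invented, and the delay uses the Elmore approximation for a distributed RC line (0.5 x R x C), which is a simplification of what a real estimator does.

```python
def manhattan_length(p, q):
    """Length of a horizontal-plus-vertical route between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def wire_rc(length_um, r_per_um, c_per_um):
    """Total wire R (ohm), C (fF) and Elmore delay (fs) for a uniform wire."""
    r_total = r_per_um * length_um
    c_total = c_per_um * length_um
    elmore_delay = 0.5 * r_total * c_total  # ohm * fF = femtoseconds
    return r_total, c_total, elmore_delay

# Hypothetical pins and coefficients: 2 ohm/um and 0.5 fF/um.
length = manhattan_length((0, 0), (30, 40))
r, c, delay_fs = wire_rc(100.0, 2.0, 0.5)
```

Because both R and C grow with length, the wire delay grows with the square of the route length, so a detour costs more than its extra length suggests.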
Congestion Driven Placement: Detouring Routes
  Issues with Congestion
      If congestion is not too severe, the actual route can
      be detoured around the congested area
      The detoured nets will have worse RC delay compared to
      the VR estimates
      In highly congested areas, delay estimates during placement
      will be optimistic.
[Figure: congestion map with a congestion hot spot and a detoured route; legend bins ≥2, ≥3, ≥4, ≥5, ≥6, ≥7.]
Congestion Map
 No need to use -congestion unnecessarily
    By default, physical synthesis tools
    perform some congestion optimization
    which has a reasonable chance of
    providing acceptable congestion
 Congestion driven placement increases
 the effort of algorithm to fix congestion
    On average, the -congestion option
    increases runtime by 20%
 For better correlation to post-route,
 congestion-driven placement is enabled
 based on GR congestion map
[Figure: two placements, one labelled "Causes high local utilization" and one labelled "Gives uniform density".]
Congestion Driven Placement: Options
 Some congestion: use medium-effort congestion-driven placement
    Max routing congestion > 90%
    Large hot spots
 Bad congestion: use high-effort congestion-driven placement
    Max routing congestion >> 90%
    Very large hot spots
 Congestion-driven placement might affect timing negatively, but
    Post-route numbers will not create surprises
    Lower congestion will speed up the detailed router
Modifying Physical Constraints
Modifying Physical Constraints: Cell Density
    Cell density can be up to 95% by default
    Density level can also be applied to a specific region
    Lower cell density in congested areas using the –coordinate option
[Figure: a region defined by corner coordinates (x1 y1) and (x2 y2).]
Modifying the Floorplan
 Top-level ports
    Changing to a different metal layer
    Spreading them out, re-ordering or moving to other
    sides
 Macro location or orientation
    Alignment of bus signal pins
    Increase of spacing between macros
 Core aspect ratio and size
    Making block taller to add more horizontal routing
    resource
    Increase of the block size to reduce overall congestion
 Power grid: Fixing any routed or non-preferred layers
Congestion Driven vs. Timing Driven Placement
 In general there is a direct trade-off
 between congestion and timing
    Timing-driven placement tries to shorten nets,
    whereas congestion-driven placement tries to
    spread cells, thus lengthening nets.
 Iterative placement trials should be
 performed to find a balance between the
 different tool options/settings.
Timing and Congestion Optimization
 Some things that can be done for timing optimization…
      Adding / deleting buffers
      Resizing gates
      Restructuring the netlist
      Swapping pins
      Moving instances
      Area recovery
 Congestion optimization tries to reduce local congestion
 hotspots.
      Generally if congestion exists after placement, little
      more can be done, if area recovery is not significant.
 It is essential that sufficient area is available for any
 optimizations that are required
Clock Tree Synthesis (CTS)
General Concept of Clock tree synthesis
[Figure: an unbuffered clock tree next to a buffered/balanced clock tree, both driven from CLK.]
 Quantities to trade off: skew, power, area (#buffers), slew rates
  + Minimize total insertion delay (latency)
Sources of skew
 Not perfectly balanced clock tree
   Different levels of buffering
   Different cells
   Different load due to routing
   Different RC delays
 Setting a skew constraint = 0 ps
   Makes no sense
   Insertion delay (latency) will increase
   Power consumption will increase
   Area will increase
   Rule of thumb: skew values of 100-150 ps for 90 nm
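As a minimal sketch of the two quantities discussed here, skew can be taken as the spread of clock arrival times over all sinks, and insertion delay as the latest arrival. The sink names, arrival times and the 150 ps target below are hypothetical values chosen only to match the rule of thumb.

```python
def clock_skew(arrivals_ps):
    """Return (insertion_delay, skew) in ps for per-sink clock arrival times."""
    latest = max(arrivals_ps.values())
    earliest = min(arrivals_ps.values())
    return latest, latest - earliest

# Hypothetical arrival times at three flip-flop clock pins.
sinks = {"ff_a/CK": 820.0, "ff_b/CK": 905.0, "ff_c/CK": 870.0}
SKEW_TARGET_PS = 150.0  # assumed target, upper end of the 90 nm rule of thumb

latency, skew = clock_skew(sinks)
print(f"insertion delay = {latency} ps, skew = {skew} ps")
print("skew target met" if skew <= SKEW_TARGET_PS else "skew target violated")
```

Tightening the target toward 0 ps would force the tool to pad the faster branches with extra buffers, which is where the extra latency, power and area come from.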
Extra sources of clock skew: variability
 Unwanted skew variations
   Process variations in clock buffers
   Power supply noise
   Temperature variations
   Part of the OCV (lecture 15)
 [Figure: wire cross-section over a ground plane (T, W, S, H) and transistor geometry (gate length, L effective, gate width, tox)]
CTS in a design flow
 VLSI Design Steps
   RTL
   Logic Synthesis
   Physical Synthesis (Placement)
   CTS
   Routing
 Simplified CTS Design Flow
   Inputs: logical clock tree (with clock gating), sequentials (x,y)
   Clock Buffering
   Routing Clock Nets
   Sizing Clock Buffers
Prepare the netlist for CTS
 Analyze the clock trees
 Check the clocks
 Remove unwanted buffering
   Unnecessary pre-existing clock buffers/inverters
   remove_clock_tree
CTS : Goals
 Meeting the clock tree design rule constraints
   Maximum transition delay
   Maximum load capacitance
   Maximum fanout
   [Maximum buffer levels]
   Constraints are upper bound goals. If constraints are not met,
   violations will be reported.
   [Figure label: defaults]
 Meeting the clock tree targets
   Maximum skew (highest priority)
   Min/Max insertion delay (latency)
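The idea of upper-bound constraints with reported violations can be sketched as follows. This is not the output or API of any CTS tool; the limit values, net names and the `ClockNet` record are hypothetical and exist only for the illustration.

```python
from dataclasses import dataclass

# Hypothetical upper-bound limits; real values come from the library and the
# CTS setup, not from the slides.
LIMITS = {"transition_ns": 0.50, "load_pf": 0.60, "fanout": 32}

@dataclass
class ClockNet:
    name: str
    transition_ns: float
    load_pf: float
    fanout: int

def report_violations(nets):
    """Return (net, constraint, value, limit) for every exceeded upper bound."""
    found = []
    for net in nets:
        for constraint, limit in LIMITS.items():
            value = getattr(net, constraint)
            if value > limit:
                found.append((net.name, constraint, value, limit))
    return found

nets = [
    ClockNet("clk_root", transition_ns=0.35, load_pf=0.40, fanout=16),
    ClockNet("clk_leaf_7", transition_ns=0.62, load_pf=0.55, fanout=40),
]
for name, constraint, value, limit in report_violations(nets):
    print(f"{name}: {constraint} = {value} exceeds {limit}")
```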
Effect of Clock Tree Synthesis on placement
 Clock buffers added
 Congestion may increase
 Non-clock cells may have been moved to less ideal locations
 Inserting clock trees can introduce new timing and max tran/cap violations
 "Real" skew taken into account
Summary
Clock tree synthesis is one of the most important steps of IC design and
can have a significant impact on timing, power, area, etc.
The clocking strategy has to be discussed with the frontend people before
CTS is started
  Clocks identification
  Clock dependencies
  Clock balancing
Routing
Overview

Routing fundamentals / Advanced issues intro
The routing flow
Special topics for 90nm and below
Additional routing considerations
Summary
Physical Design Flow
            Design/Constraints Import
                 Floorplanning
                   Placement
              Clock Tree Synthesis
                    Routing
            Post Route Optimization
                    Finishing
Routing Fundamentals
Goal is to realize the metal/copper connections between the pins of
standard cells and macros
    Input :
        placed design
        fixed number of metal/copper layers
    Goal:
        routed design that is DRC clean and meets setup/hold timing

Consists of two phases
 1. Global route
 2. Detail route

[Figure: standard cell pin on a grid of horizontal and vertical routing tracks]
Routing Fundamentals :
Advanced Issues
 Timing driven routing
   Timing budget for each net
   Minimize critical paths
 Signal integrity aware : 90nm and below !!!!
   Minimize crosstalk
 DFM / DFY
   DRC clean
   Rule based versus Model based
General Flow for Routing
           Placement and CTS


            Route Clock Nets


         Global Route Signal Nets


         Detail Route Signal Nets


         Design for Manufacturing (DFM)

                    Geert Vanwijnsberghe - Affiliation
Global Route
 [Figure: grid of global routing cells with X and Y axes; vertical routing capacity = 9 tracks, horizontal routing capacity = 9 tracks]
Global Route
 Input:
   Cell and macro placement
   Routing channel capacity per layer / per direction
 Goal:
   Perform fast, coarse grid routing through global routing
   cells (GCells) while considering the following:
      Wire length
      Congestion
      Timing
      Noise / SI
 Often used by placement engines to predict congestion in the form of a
 "trial route" or "virtual route"
Global Route
 Assigns nets to specific metal layers and global routing cells (GCells)
 Tries to avoid congested GCells while minimizing detours
   Congestion exists when more tracks are needed than available
   Detours increase wire length (delay)
 Also avoids P/G (rings/straps/rails) and routing blockages
 [Figure: global route vs. virtual route around a congested area, with X and Y axes]
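The congestion definition above (more tracks needed than available) can be sketched by counting how many nets cross each boundary between neighbouring GCells and comparing that demand with the track capacity. The data model, net names and the capacity of 2 tracks are assumptions for the illustration, not the behaviour of any specific router.

```python
from collections import Counter

def edge_demand(routes):
    """Count how many nets cross each boundary between adjacent GCells."""
    demand = Counter()
    for path in routes.values():
        for a, b in zip(path, path[1:]):
            demand[tuple(sorted((a, b)))] += 1
    return demand

def overflow(routes, capacity):
    """Return the GCell boundaries whose demand exceeds the track capacity."""
    return {edge: used - capacity
            for edge, used in edge_demand(routes).items() if used > capacity}

# Hypothetical global routes, each a list of GCell (x, y) grid coordinates.
routes = {
    "n1": [(0, 0), (1, 0), (2, 0)],
    "n2": [(0, 0), (1, 0), (1, 1)],
    "n3": [(0, 0), (1, 0)],
}
print(overflow(routes, capacity=2))
```

A global router would respond to a non-empty overflow map by detouring some nets through neighbouring GCells, at the cost of extra wire length.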
Global Route
 [Figure: preroute vs. global route]
Detail Route
 Using global route plan, within each global route cell
   Assign nets to tracks
   Lay down wires
   Connect pins to corresponding nets
 Solve DRC violations
 Reduce cross-coupling capacitance
 Apply special routing rules
Detail Route: Track Assignment
 For nets that traverse multiple GCells
 Assigns each net to a specific track and lays down the actual metal traces
 Makes long, straight traces and reduces the number of vias
 [Figure: preroute; TA metal traces; jog reduces via count]
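One classical way to assign wire segments to tracks is the left-edge algorithm from channel routing. The slides do not say which method a given router uses, so the sketch below is only an illustration of the idea; the segment names and coordinates are hypothetical, and each segment is a straight span (left, right) along one routing direction.

```python
def left_edge_assign(segments):
    """Greedy left-edge assignment: net -> track index, no overlaps per track."""
    track_ends = []   # rightmost occupied coordinate on each track
    assignment = {}
    # Process segments in order of their left end.
    for net, (left, right) in sorted(segments.items(), key=lambda kv: kv[1][0]):
        for track, end in enumerate(track_ends):
            if left > end:            # fits behind the last segment on this track
                track_ends[track] = right
                assignment[net] = track
                break
        else:                         # no existing track is free: open a new one
            track_ends.append(right)
            assignment[net] = len(track_ends) - 1
    return assignment

# Hypothetical straight segments spanning several GCells.
segments = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 10)}
print(left_edge_assign(segments))
```

Here four segments fit on two tracks because `c` can reuse the track of `a`, and `d` the track of `b`.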
Detail route : Solve DRC Violations
 Solve shorts
 Notch spacing
 Thin & fat spacing
 Min spacing
 [Figure: detail route boxes illustrating each violation type]
Detail Route: Analysis of Routing DRC Errors
Timing Driven Routing
 Quality of route can affect timing
 At 90 nm net delay becomes significant
 Optimize critical paths
 Route some nets first
    Most routing freedom at start
    Use shortest paths possible
 Net weights
    Order of routing (priorities, e.g. default: clocks 50, others 2)
 Wire widening
    Reduce resistance
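The effect of wire widening follows from the usual sheet-resistance model, R = R_sheet * L / W. The sketch below assumes that model; the sheet resistance, length and widths are hypothetical numbers, not process data.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a uniform wire: sheet resistance times number of squares."""
    return sheet_res_ohm_per_sq * length_um / width_um

R_SHEET = 0.25      # hypothetical sheet resistance in ohm/square
LENGTH_UM = 1000.0  # hypothetical net length

r_min_width = wire_resistance(R_SHEET, LENGTH_UM, width_um=0.25)
r_double_width = wire_resistance(R_SHEET, LENGTH_UM, width_um=0.5)
print(f"minimum width: {r_min_width:.1f} ohm")
print(f"double width:  {r_double_width:.1f} ohm")
```

Doubling the width halves the resistance, but the wider wire occupies more routing resource and has more capacitance to its surroundings, so widening is normally reserved for critical nets.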
What is Signal Integrity or SI? (1)
 Signal delay caused by crosstalk noise
 Possible in 2 directions: push-out, pull-down
 [Figure: net 1 (aggressor) switching next to net 2 (victim); the victim transition shows speed-up or delay]
What is SI? (2)
 Glitch caused by crosstalk noise
 [Figure: aggressor and victim waveforms (Vdd level marked); the victim glitch reaches a flip-flop (D, Q, Clk)]
 Extra clock cycle!
 Functional failure
Crosstalk Prevention : Design Optimization
 Noise depends on
   Coupling capacitance
   Total net capacitance
   Strength of the driver (Rd of the victim net)
 Design optimization
   Increase drive strength, often easier (only local effect)
   Buffer long nets
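The three dependencies listed above can be seen in two first-order estimates of the glitch on a quiet victim net. Both are simplifying assumptions, not a signoff model: a capacitive-divider bound, Vdd * Cc / Ctotal, and an RC bound in which the coupling current Cc * Vdd / t_rise flows through the victim driver resistance Rd. All numeric values are hypothetical.

```python
def glitch_bound(vdd, c_couple, c_total, r_driver, t_rise):
    """First-order upper bound on the crosstalk glitch peak, in volts."""
    divider = vdd * c_couple / c_total                 # charge sharing only
    rc_limited = r_driver * c_couple * vdd / t_rise    # driver holds the net
    return min(divider, rc_limited)

VDD = 1.0           # volts
C_COUPLE = 20e-15   # coupling capacitance to the aggressor, farads
C_TOTAL = 100e-15   # total victim net capacitance including coupling, farads
T_RISE = 100e-12    # aggressor transition time, seconds

weak = glitch_bound(VDD, C_COUPLE, C_TOTAL, r_driver=2000.0, t_rise=T_RISE)
strong = glitch_bound(VDD, C_COUPLE, C_TOTAL, r_driver=500.0, t_rise=T_RISE)
print(f"weak driver:   {weak:.3f} V")
print(f"strong driver: {strong:.3f} V")
```

With these numbers the weak driver is limited by the capacitive divider, while the stronger driver pulls the bound lower, which matches the advice to increase drive strength on the victim.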
Crosstalk Prevention : Routing
 Routing solution
    Limit length of parallel nets (H&V)
    Wire spreading (skip track - clocks)
    Shield special nets
 Coupling free routing
Crosstalk Prevention : Reduce Cross Coupling Cap
 Spacing: extra space
 Shielding: grounded shields
   Same layer (H)
   Adjacent layers (V)
 Net ordering: critical nets
Effect of Floorplanning on Routing Congestion
For hierarchical designs, good pin placement is essential to preventing
routing congestion.
  Can use pin guides during partitioning
Routing around blockages and over macros
 By default routing tool will:
   Route over macros
   Not route where there is a routing blockage
   Not route through a narrow channel in the non-preferred routing direction
 [Figure: macro with M1-M4 and M1-M3 routing blockages. M4 has a horizontal
 routing channel but its preferred routing direction is vertical.]
 The preferred routing direction needs to be changed
Clock Tree Routing
 For SI prevention we generally want to route our clocks with extra spacing.
 Global H-trees are often routed manually before placement
   H-tree nets may be routed with wide metal and shielding.
 [Figure: wide-metal H-tree net with grounded shields]
Post Route Clock Tree Optimization (CTO)
 Improve the skew on clock nets
   Detail routed design
   Skew OK?
     No: postroute CTO, then ECO route
     Yes: done
 [Figure: before CTO, a short path; after CTO, increased delay]
Options for CPU effort
 # processors
   Routing in parallel on # processors
   Superthreading, multithreading
   Some routers are better at threading than others
 # iterations for detail route
   # of iteration steps done to get a DRC free design
Summary
Starting from 90 nm technologies
  Timing Driven Route
    net delay is becoming more of a factor
  SI Aware Route
    Small geometries make SI timing closure much
    more difficult
  DFM / DFY
    Now a crucial part of the routing flow
  DRC
    Number and complexity of DRC rules has
    increased dramatically

Physical Design Flow - Power Planning

  • 1. Physical Design Flow Mohammad reza Kakoee micrellab m.kakoee@unibo.it @
  • 2. Agenda Introduction to design flow and Backend Introduction to design planning Floorplanning / Hierarchical design Power planning P l i Summary
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. Agenda Introduction to design flow and Backend Introduction to design planning Floorplanning / Hierarchical design Power planning P l i Summary
  • 12. The Physical Design Task Physical Design Verilog netlist Flow GDSII SDC constraints Front End Back End
  • 13. Example Physical Design Flow Design/Constraints Import Floorplanning p g Placement Clock Tree S th i Cl k T Synthesis Routing Post Route Optimization Layout Verification / Finishing
  • 14. Fullchip Design Overview Core placement area The location of the core, I/O areas P/G pads and the P/G grid RAM IP Rings P/G ROM Grid Straps Periphery (I/O) area
  • 15. Where Do We Start? - Design Planning Verilog netlist Physical Design Flow How do we handle? SDC constraints Die size IO / Hard-IP placement Global clock distribution Power planning P l i Flat versus hierarchical design
  • 16. Design Planning Floorplanning Determine die size Shape and arrange hierarchical blocks Integrate hard-IP efficiently Predict and prevent congestion hotspots and critical timing paths Power planning Create power distribution grid Consider IR drop and Electromigration Implement power saving techniques Power gating g g Multi-Voltage design / Voltage islands
  • 17. Agenda Introduction to design planning Floorplanning p g Setup/configuration Die size, utilization, metallization scheme size utilization IO-ring and macro placement Flat versus hierarchical design Hierarchical design planning issues Power planning Summary
  • 18. Setup/configuration S t / fi ti check netlist Read netlist High fanout Read SDC Unique U i Read .lib files Unconnected inputs Standard cell area Read footprint for P&R p Check timing ith t i load Ch k ti i without wire l d LEF : SOC encounter Fram : Synopsys tools Read technology file Metal width … (DRC rules)
  • 19. Floorplanning – Die Size Size, Utilization & Metal Stack-up Choosing the die size, initial standard cell utilization and metallization scheme involves several design tradeoffs ( Schedule, Cost, Performance) Larger die Easier to route, less congestion, lower cap (decrease signal/power integrity related problems) faster design problems), cycle Higher cost, higher power More d M dense power grid id Reduce risk of power related failures Increase number of metal layer masks, reduce signal route tracks
  • 20. Floorplanning – Utilization Low standard-cell High standard-cell utilization tili ti utilization
  • 21. Floorplanning – Utilization Utilization refers to the percentage of core area that is taken up by standard cells. A typical starting utilization might be 70% This can very a lot depending on the design High utilization can make it difficult to close a design Routing congestion, Negative impact during optimization legalization stages. Utilization changes should be examined after each stage of g g the flow Avoid having large increases after placement optimization Feedback should be given to front-end designers front end Topographical synthesis is now possible
  • 22. Initialize Floorplan Define globals (VDD1,VDD2,GND1,….) Define D fi core area : ( ll + utilization f (cells ili i factor) ) IO [ [Analog] macro g] core core IO Shape can be implied by a macro Place IO (fixed, equidistant,..) Take macro’s and power domains into account already
  • 23. IO Ring and Large Macro Placement IO Ring is often decided by front-end designers, with input from physical design and packaging engineers. When placing large macros we must consider impacts on routing, timing and power. For wire-bond place power hungry macros away from the chip center. Possible routing congestion hotspots
  • 24. Flat Versus Hierarchical Design What happens if the design is too big to be handled by the EDA tools? y Hierarchical Design Fullchip Design I/O Pad IP Macro Blk 1 Blk 2 Blk 3 Block / Tile P&R P&R P&R Flow Flow Flow Fullchip Timing & Verification
  • 25. Flat Versus Hierarchical Design Hierarchical Design Advantages Faster runtime, less memory needed for EDA tools Faster eco turn-around time Ability to do design re use re-use Disadvantages Much more difficult for fullchip timing closure (ILMs) More intensive design planning needed, feedthrough generation repeater insertion timing generation, insertion, constraint budgeting.
  • 26. Hierarchical Design : Specify Partitions / Plan Groups Netlist must have partitions as top level modules. Partitions generally sized according to a target initial utilization ~70% utilization, ~300k-700k instances Channels or abutment Ch l b t t Rectilinear block shapes are possible Abutment Channels Rectilinear Blocks
  • 27. Hierarchical Design : Pin Assignment Pin constraints include parameters such as, Pin guide 1 Layers, spacing, size, overlap Net groups, pin guides Pin guide 2 Pins can be assigned placement-based placement based (flightlines) or route-based (trial route, boundary crossings). Partition Pin guides can be used to influence automatic pin placement of particular net groups Pins at partition corners can make routing diffi lt ti difficult
  • 28. Hierarchical Design : Timing Budgeting Chip level constraints must be mapped correctly to block level constraints The d i Th design must b placed, t i l routed and h t be l d trial t d d have pinsi assigned before running budgeting Block level constraints will be assigned input or output delays on I/O ports based off of the estimated timing slack. IN1 set_input_delay 1.5 get port set input delay 1 5 [ get_port IN1 ] 1.5ns Block Boundary
  • 29. Hierarchical Design : Timing Budgeting & Fullchip Timing Closure Fullchip timing closure is typically a bottleneck for design cycles. Block-level P&R flow does not emphasize io-to-flop, flop-to-io, io-to-io timing paths because budgeted constraints are only estimates paths, Interface logic models (ILMs) can be used To speed-up timing analysis runs when fullchip design is too large. Required clock and datapaths are p q p preserved, net/cell names are , identical A X A X B Y B Y Clk Clk Original Netlist Interface Logic Model (ILM)
  • 30. Agenda Introduction to design planning Floorplanning Power planning Intro to power issues in IC design Basic power grid creation Multi-voltage Multi voltage design & power gating Automated power grid design flows Summary
  • 31. Power Consumption and Reliability Dynamic Power IR Drop IR-Drop / Voltage Drop Average Power p ob e problem Static Power Fail (Leakage Power) Electromigration Power density (EM) Floorplan problem in the + Long run Design of the grid 1 out of 5 chips fail due to excessive power consumption
  • 32. Power Consumption and Reliability : IR-Drop The drop in supply voltage over the length of the supply line A resistance matrix of the power grid is constructed The average current of each g g gate is considered The matrix is solved for the current at each node, to determine the IR-drop. VDD Pad VDD
  • 33. Where does the all power go to? Total Power Core + I/O •Separate supply ring •Often higher voltage •Fixed, no optimization Standard Cells + Macros •Clock network
  • 34. Agenda Introduction to design planning Floorplanning Power planning Intro to power issues in IC design p g Basic power grid creation Multi-voltage design & power gating Automated power grid design flows Summary
  • 35. Power Grid Creation : Macro Placement Blocks with the highest performance and highest power consumption Close to border power pads (IR drop) Away from each other (EM)
  • 36. Agenda Introduction to design planning Floorplanning Power planning Intro to power issues in IC design p g Basic power grid creation Multi-voltage design & power gating Automated power grid design flows Summary
  • 37. Agenda Introduction to design planning Floorplanning p g Power planning Intro to power issues in IC design Basic power grid creation Multi-voltage design & p g g power g gating g Automated power grid design flows Summary y
  • 38. Automated Power Grid Design: PNS & PNA Power grid creation has usually done by hand using rules of thumb for widths and number of straps Analysis often done late in the design flow Grid is typically over-designed to prevent time- intensive power grid changes. When incorporating advanced low-power strategies, there are too many variables to achieve an optimal result manually. For more complex designs an automated strategy is preferred. e.g e g Power Network Synthesis (PNS) and Power Network Analysis (PNA) from Synopsys Allows designers to anticipate affects of floorplanning
  • 39. Power Network Analysis (PNA) P N t kA l i There are EDA tools that allow early power network analysis for designs in the early floorplaning stage. Not i N t signoff quality, b t good enough f i iti l d i ff lit but d h for initial design. e.g. Synopsys Power Network Analysis (PNA) VDD Pad VDD
  • 40. Power Network Synthesis: PNS – What? Goal is to QUICKLY find minimum routing resource required to meet specified IR drop target More power routing => easier to reach IR-drop target, but harder to route clock and signals with remaining tracks Power straps (in Red) Power pads Power trunks Power rings
  • 41. PNS : Running PNS Trials Run PNS
  • 42. PNS : C t P Create Power R ti Routing After running trials, an optimal p g , p power g can be chosen and the grid actual rails can be laid out. Virtual rails => actual rails Outside main PNS : memory footprint + cpu time Many options : eg. % Via penetration , order of routing … Check legal cell/pin placement (grid aligned ?) Depending on the design p p g g phase What cells, nets and layers eg. First macros and pads, then high voltage areas, … Seco da y G ports on e e shftrs, so cells, et egs Secondary PG po ts o level s t s, isol. ce s, ret. Regs Later after placement during routing : same as the follow pins for the normal vdd and gnd of the std cells.
  • 43. PNS : Create P C t Power Routing R ti
  • 44. Summary The goal of design p g g planning is to arrange the chip so that the “Place and g g p Route” flow can converge quickly and easily. Design experience is needed Floorplan is driven by : Power P Timing Congestion Minimum area There is no 1 way to create a floorplan Flat – hierarchical Regions, p g , position of the macro’s Order of placement IO versus macros versus core This phase can take a significant portion of the complete backend design time. Early E l analysis of power grid i essential f avoiding major problems near l i f id is ti l for idi j bl the end of the design cycle. Automated power grid tools may help reduce necessary safety margins.
  • 46. Placement in the Flow [Flow diagram] Design Specification -> Front-End: Logic Design and Verification, Logic Synthesis -> Back-End (Physical Design Stage): Floorplanning, Placement, Routing. Inputs to the back-end: Physical Libraries, Netlist, Physical Design Constraints.
  • 47. Definition of Placement Placement : exact placement of the modules (modules can be gates, standard cells, macros…). The general goal is to minimize the total area and interconnect cost. The quality of the attainable routing is highly determined by the placement. Circuit placement becomes very critical in 90nm and below technologies.
  • 48. Cost Function for Placement Cost components -> methods of consideration: area, wire length, overlap -> traditional methods of placement; timing -> timing-driven placement; congestion -> congestion-driven placement; clock -> clock gating; power -> multivoltage and multisupply placement.
  • 49. Placement Steps Input information: netlist; mapped and floorplanned design; logical and physical libraries; design constraints. Steps: reading gate-level netlists from synthesis; global placement; detailed placement; placement optimization. Output information: physical layout information; cell placement locations; physical layout, timing, and technology information of reference libraries.
  • 50. Inputs for the Placement Tool [Diagram] Inputs to the placement tool: gate-level netlist; design constraints; floorplanned design; design libraries: logical (target), physical (reference: macro cell, standard cell); technology file.
  • 51. Inside A Physical Library Example .lef abstract view of a cell. It describes the dimension (“bounding box”), symmetry (X, Y, or 90º), pins (direction, layer and shape), blockage, and the reference point (typically 0,0). [Figure labels: VDD, GND, A, B, Y, F, NAND_1, Abstract View]
      MACRO AN2D0
        CLASS CORE ;
        FOREIGN AN2D0 0.000 0.000 ;
        ORIGIN 0.000 0.000 ;
        SIZE 1.400 BY 2.520 ;
        SYMMETRY x y ;
        SITE core ;
        PIN Z
          ANTENNADIFFAREA 0.1680 ;
          DIRECTION OUTPUT ;
          PORT
            LAYER M1 ;
            RECT 1.300 0.640 1.330 1.675 ;
            RECT 1.190 0.640 1.300 1.780 ;
            RECT 1.140 0.640 1.190 0.900 ;
            RECT 1.140 1.520 1.190 1.780 ;
          END
        END Z
        PIN A2
          ANTENNAGATEAREA 0.0704 ;
          DIRECTION INPUT ;
          PORT
            LAYER M1 ;
            RECT 0.610 0.975 0.770 1.545 ;
          END
        …
  • 52. Technology Information For each tool, a specific set of files is required to provide details about the metal layers for the chosen process technology: number and name designations for each layer/via; physical and electrical characteristics for each layer; dielectric constant; design rules for each layer (min spacing, min width, etc.); units and precision for numerical values. Example filetypes: .lefhdr, .tf -> contain layer and design rule information. Also, there are files that enable improved RC estimation that can be read by the placement engines: .captable, .tluplus -> store RC coefficients.
  • 53. Physical Technology Data The technology files contain design rule information that can be read by the tools. For example, the example spacing table constrains the parallel runlength of adjacent wires on the same layer. Wire width and pitch are also described, as well as any more complex design rules for routing. Example .lefhdr:
      LAYER M1
        TYPE ROUTING ;
        DIRECTION HORIZONTAL ;
        OFFSET 0 ;
        PITCH 0.280 ;
        WIDTH 0.120 ;
        MAXWIDTH 12.000 ;
        AREA 0.058 ;
        MINENCLOSEDAREA 0.200 ;
        THICKNESS 0.240 ;
        HEIGHT 0.765 ;
        SPACINGTABLE
          PARALLELRUNLENGTH 0.00 0.52 1.50 4.50
          WIDTH 0.00        0.12 0.12 0.12 0.12
          WIDTH 0.30        0.12 0.17 0.17 0.17
          WIDTH 1.50        0.12 0.17 0.50 0.50
          WIDTH 4.50        0.12 0.17 0.50 1.50 ;
        MINIMUMCUT 2 WIDTH 0.42 ;
        MINIMUMCUT 4 WIDTH 0.98 FROMABOVE ;
        MINIMUMCUT 2 WIDTH 0.70 LENGTH 0.70 WITHIN 1.001 ;
        MINIMUMCUT 2 WIDTH 2.00 LENGTH 2.00 WITHIN 2.001 ;
        MINIMUMCUT 2 WIDTH 3.00 LENGTH 10.0 WITHIN 5.001 ;
        MINIMUMDENSITY 15 ;
        MAXIMUMDENSITY 70 ;
        DENSITYCHECKWINDOW 50 50 ;
        DENSITYCHECKSTEP 50 ;
        FILLACTIVESPACING 0.60 ;
  • 54. Global and Detail Placement Reading gate-level netlist from synthesis -> Global Placement -> Detailed Placement -> Placement optimization
  • 55. Global Placement Standard cells are placed into groups such that the number of connections between groups is minimized. This is solved through circuit partitioning. [Figure: bad placement vs. good placement]
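As a small illustration of the partitioning objective on this slide, the sketch below counts how many nets cross between groups for a given assignment of cells to groups. The netlist, cell names, and group assignments are made up for the example; a real partitioner would search for the assignment that minimizes this count under balance constraints.

```python
def cut_size(nets, group_of):
    """Number of nets whose cells fall into more than one group.

    nets:     dict net name -> iterable of cell names on that net
    group_of: dict cell name -> group id
    """
    return sum(
        1 for cells in nets.values()
        if len({group_of[c] for c in cells}) > 1
    )


# Hypothetical 4-cell netlist.
nets = {
    "n1": ("a", "b"),
    "n2": ("a", "b"),
    "n3": ("c", "d"),
    "n4": ("b", "c"),
}

good = {"a": 0, "b": 0, "c": 1, "d": 1}  # tightly connected cells kept together
bad = {"a": 0, "c": 0, "b": 1, "d": 1}   # connected cells split apart

print(cut_size(nets, good), cut_size(nets, bad))
```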
  • 56. Detail Placement : Coarse Placement All the cells are placed in approximate locations, but they are not legally placed. No logic optimization is done.
  • 57. Detail Placement : Legalization Legalization: ensures that the final placement is legal before saving the design. Legal placement of cells is not required for analyzing routing congestion at an early stage.
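To make "legal" concrete, here is a minimal sketch of a greedy single-row legalizer. It is an illustration only, not the algorithm of any particular tool. It assumes integer database units, cell widths that are multiples of the site width, and a single row; the function name and the cell data are hypothetical.

```python
def legalize_row(cells, site_width, row_start, row_end):
    """Snap cells to the site grid and remove overlaps, left to right.

    cells: list of (name, x, width) with x from coarse placement.
    Returns dict name -> legal x. Raises ValueError if the row overflows.
    Assumes every width is a multiple of site_width.
    """
    placed = {}
    cursor = row_start  # leftmost free position in the row
    for name, x, width in sorted(cells, key=lambda c: c[1]):
        # Snap the coarse x to the nearest placement site.
        snapped = row_start + round((x - row_start) / site_width) * site_width
        # Push right if the snapped position overlaps the previous cell.
        legal_x = max(snapped, cursor)
        if legal_x + width > row_end:
            raise ValueError(f"row is full, cannot place {name}")
        placed[name] = legal_x
        cursor = legal_x + width
    return placed


# Hypothetical coarse placement: overlapping, off-grid cells.
coarse = [("u1", 130, 400), ("u2", 390, 200), ("u3", 410, 600)]
print(legalize_row(coarse, site_width=200, row_start=0, row_end=2000))
```

After this pass every cell sits on a site boundary and no two cells overlap, which is what the coarse placement of the previous slide does not guarantee.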
  • 58. Hard Macro Placement Hard macros are placed during the floorplanning stage and then marked as FIXED for placement. Typically, hard macros are placed near the sides of the core area.
  • 59. Some Guidelines for Placement (2) Avoid constrictive channels. Avoid many pins in the narrow channel; rotate for pin accessibility. Use blockage to improve pin accessibility. [Figure: arrangement of macros RAM 1 to RAM 8]
  • 60. Review of Placement Cost Function Cost components -> methods of consideration: area, wire length, overlap -> traditional methods of placement; timing -> timing-driven placement; congestion -> congestion-driven placement; clock -> clock gating; power -> multivoltage and multisupply placement.
  • 61. Timing Driven Placement Critical paths are determined using static timing analysis (STA). The tool attempts to minimize the wire length of critical paths to meet setup timing. Net RCs are based on Virtual Routing (VR) estimates.
  • 62. Virtual Route / Trial Route Manhattan geometry: horizontal and vertical only, NO diagonal routing. [Figure: virtual route]
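A short sketch of what Manhattan geometry means for wire-length estimates. The half-perimeter wire length (HPWL) shown here is a common textbook estimate for multi-pin nets; it is used as an assumption for illustration and is not necessarily what the virtual router in the slides computes. The pin coordinates are hypothetical.

```python
def manhattan(p, q):
    """Length of a horizontal-plus-vertical route between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def hpwl(pins):
    """Half-perimeter of the bounding box around all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# A diagonal would be 5 units long; a Manhattan route needs 3 + 4 = 7.
print(manhattan((0, 0), (3, 4)))
# Three-pin net: bounding box is 3 wide and 6 tall.
print(hpwl([(0, 0), (3, 4), (1, 6)]))
```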
  • 63. Congestion Driven Placement: Detouring Routes Issues with congestion: if congestion is not too severe, the actual route can be detoured around the congested area. The detoured nets will have worse RC delay compared to the VR estimates. In highly congested areas, delay estimates during placement will be optimistic. [Figure: congestion map with a congestion hot spot and a detour; legend ≥2 ≥3 ≥4 ≥5 ≥6 ≥7]
  • 64. Congestion Map No need to use -congestion unnecessarily. By default, physical synthesis tools perform some congestion optimization, which has a reasonable chance of providing acceptable congestion. Congestion-driven placement increases the effort of the algorithm to fix congestion. On average, the -congestion option increases runtime by 20%. For better correlation to post-route, congestion-driven placement is enabled based on the GR congestion map. [Figure labels: causes high local utilization; gives uniform density]
  • 65. Congestion Driven Placement: Options Some congestion: use medium-effort congestion-driven placement (max routing congestion > 90%, large hot spots). Bad congestion: use high-effort congestion-driven placement (max routing congestion >> 90%, very large hot spots). Congestion-driven placement might affect timing negatively, but post-route numbers will not create surprises. Lower congestion will speed up the detailed router.
  • 66. Modifying Physical Constraints: Cell Density Cell density can be up to 95% by default. A density level can also be applied to a specific region. Lower the cell density in congested areas using the -coordinate option. [Figure: region from (x1, y1) to (x2, y2)]
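To show what a regional density figure measures, the following sketch computes the fraction of a rectangular region covered by cells. It assumes axis-aligned, non-overlapping cells given as (x, y, width, height); the function names and the sample data are hypothetical, and only the 95% default comes from the slide.

```python
def overlap_area(cell, region):
    """Area of the part of a cell (x, y, w, h) inside region (x1, y1, x2, y2)."""
    cx, cy, cw, ch = cell
    x1, y1, x2, y2 = region
    dx = min(cx + cw, x2) - max(cx, x1)
    dy = min(cy + ch, y2) - max(cy, y1)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def region_density(cells, region):
    """Covered fraction of the region, assuming cells do not overlap each other."""
    x1, y1, x2, y2 = region
    region_area = (x2 - x1) * (y2 - y1)
    return sum(overlap_area(c, region) for c in cells) / region_area


region = (0.0, 0.0, 10.0, 10.0)   # (x1, y1, x2, y2)
cells = [
    (0.0, 0.0, 5.0, 4.0),         # fully inside: 20 units of area
    (8.0, 8.0, 4.0, 4.0),         # partly inside: 4 units of area
    (20.0, 20.0, 2.0, 2.0),       # outside
]
density = region_density(cells, region)
print(density, density <= 0.95)   # compare against the 95% default limit
```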
  • 67. Modifying the Floorplan Top-level ports: changing to a different metal layer; spreading them out, re-ordering or moving to other sides. Macro location or orientation: alignment of bus signal pins; increase of spacing between macros. Core aspect ratio and size: making the block taller to add more horizontal routing resource; increase of the block size to reduce overall congestion. Power grid: fixing any routed or non-preferred layers.
  • 68. Congestion Driven vs. Timing Driven Placement In general there is a direct trade-off between congestion and timing. Timing-driven placement tries to shorten nets, whereas congestion-driven placement tries to spread cells, thus lengthening nets. Iterative placement trials should be performed to find a balance between the different tool options/settings.
  • 69. Timing and Congestion Optimization Some things that can be done for timing optimization: adding / deleting buffers; resizing gates; restructuring the netlist; swapping pins; moving instances; area recovery. Congestion optimization tries to reduce local congestion hotspots. Generally, if congestion exists after placement, little more can be done if area recovery is not significant. It is essential that sufficient area is available for any optimizations that are required.
  • 70. Clock Tree Synthesis (CTS)
  • 71. General Concept of Clock Tree Synthesis Unbuffered clock tree -> buffered/balanced clock tree. Trade-offs: skew, area (#buffers), power, slew rates; plus minimize total insertion delay (latency).
  • 72. Sources of skew Not perfectly balanced clock tree: different levels of buffering; different cells; different load due to routing; different RC delays. Setting a skew constraint = 0 ps makes no sense: insertion delay (latency) will increase, power consumption will increase, area will increase. Rule of thumb : skew values : 100 – 150 ps for 90 nm.
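The two quantities on this slide can be computed directly from clock arrival times at the sinks. The sketch below assumes arrival times are already known (in a real flow they would come from timing analysis); the flip-flop names, the arrival values, and the 100 ps target are hypothetical, with the target chosen from the rule-of-thumb range above.

```python
def clock_skew(arrivals_ps):
    """Global skew: latest minus earliest clock arrival over all sinks."""
    return max(arrivals_ps.values()) - min(arrivals_ps.values())


def insertion_delay(arrivals_ps):
    """Worst-case latency from the clock root to any sink."""
    return max(arrivals_ps.values())


# Hypothetical arrival times (ps) at three flip-flop clock pins.
arrivals = {"ff1/CK": 820, "ff2/CK": 905, "ff3/CK": 870}

skew = clock_skew(arrivals)
latency = insertion_delay(arrivals)
target_skew_ps = 100  # assumed target within the 100 to 150 ps rule of thumb

print(skew, latency, skew <= target_skew_ps)
```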
  • 73. Extra sources of clock skew : variability Unwanted skew variations: process variations in clock buffers (L effective, gate length, gate width, tox); power supply noise; temperature variations. Part of the OCV (lecture 15). [Figure: wire cross-section over ground plane, with dimensions T, W, S, H]
  • 74. CTS in a design flow VLSI design steps: RTL -> Logic Synthesis -> Physical Synthesis (Placement) -> CTS -> Routing. Simplified CTS design flow: clock gating; logical clock tree; sequentials (x,y); clock buffering; routing clock nets; sizing clock buffers.
  • 75. Prepare the netlist for CTS Analyze the clock trees Check the clocks Remove unwanted buffering
  • 76. Remove unwanted buffering Unnecessary pre-existing clock buffers/inverters: remove_clock_tree
  • 77. CTS : Goals Meeting the clock tree design rule constraints: maximum transition delay; maximum load capacitance; maximum fanout; [maximum buffer levels]. Constraints are upper bound goals. If constraints are not met, violations will be reported. Meeting the clock tree targets: maximum skew; min/max insertion delay (latency). (Slide annotations: “defaults”, “Highest priority”.)
  • 78. Effect of Clock Tree Synthesis on placement Clock buffers added Congestion may increase Non clock cells may have been moved to less ideal locations Inserting clock trees can introduce new timing and max tran/cap violations “real” skew taken into account
  • 79. Summary Clock tree synthesis is one of the most important steps of IC design and can have a significant impact on timing, power, area, etc. The clocking strategy has to be discussed with the frontend people before CTS is started: clocks identification; clock dependencies; clock balancing.
  • 81. Overview Routing fundamentals / Advanced issues intro The routing flow Special topics for 90nm and below Additional routing considerations Summary
  • 82. Physical Design Flow Design/Constraints Import -> Floorplanning -> Placement -> Clock Tree Synthesis -> Routing -> Post Route Optimization -> Finishing
  • 83. Routing Fundamentals Goal is to realize the metal/copper connections between the pins of standard cells and macros. Input : placed design; fixed number of metal/copper layers. Goal: routed design that is DRC clean and meets setup/hold timing. Consists of two phases: 1. Global route 2. Detail route. [Figure labels: standard cell pin; horizontal routing tracks; vertical routing tracks]
  • 84. Routing Fundamentals : Advanced Issues Timing driven routing Timing budget for each net Minimize critical paths Signal integrity aware : 90nm and below !!!! Minimize crosstalk DFM / DFY DRC clean Rule based versus Model based
  • 85. General Flow for Routing Placement and CTS -> Route Clock Nets -> Global Route Signal Nets -> Detail Route Signal Nets -> Design for Manufacturing (DFM). (Slide footer: Geert Vanwijnsberghe - Affiliation)
  • 86. Global Route Vertical routing capacity = 9 tracks; horizontal routing capacity = 9 tracks. [Figure: routing tracks along the X and Y axes]
  • 87. Global Route Input: cell and macro placement; routing channel capacity per layer / per direction. Goal: perform fast, coarse grid routing through global routing cells (GCells) while considering the following: wire length; congestion; timing; noise / SI. Often used by placement engines to predict congestion in the form of a “trial route” or “virtual route”.
  • 88. Global Route Assigns nets to specific metal layers and global routing cells (GCells). Tries to avoid congested GCells while minimizing detours. Congestion exists when more tracks are needed than available. Detours increase wire length (delay). Also avoids P/G (rings/straps/rails) and routing blockages. [Figure labels: global route; virtual route; congested area; X and Y axes]
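The congestion definition on this slide (more tracks needed than available) can be written down as a per-GCell overflow. The sketch below assumes a uniform capacity of 9 tracks per GCell, as in the earlier capacity slide, and a made-up demand map; the function name and grid coordinates are hypothetical.

```python
def overflow_map(demand, capacity):
    """Tracks needed beyond capacity, for each congested GCell only.

    demand:   dict (col, row) -> number of tracks requested by routed nets
    capacity: tracks available in every GCell (assumed uniform here)
    """
    return {
        gcell: tracks - capacity
        for gcell, tracks in demand.items()
        if tracks > capacity
    }


# Hypothetical 2x2 GCell grid with 9 tracks of capacity each.
demand = {(0, 0): 7, (1, 0): 12, (0, 1): 9, (1, 1): 10}
congested = overflow_map(demand, capacity=9)

print(congested)                 # GCells the global router should detour around
print(sum(congested.values()))   # total overflow, a simple congestion score
```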
  • 89. Global Route [Figure: preroute vs. global route]
  • 90. Detail Route Using the global route plan, within each global route cell: assign nets to tracks; lay down wires; connect pins to corresponding nets; solve DRC violations; reduce cross-coupling cap; apply special routing rules.
  • 91. Detail Route: Track Assignment For nets that traverse multiple GCells: assigns each net to a specific track and lays down the actual metal traces. Makes long, straight traces and reduces the number of vias. [Figure labels: Preroute; TA metal traces; Jog reduces via count]
  • 92. Detail route : Solve DRC Violations Solve shorts. [Figure labels: Detail Route Boxes; Notch Spacing; Thin&Fat Spacing; Min Spacing]
  • 93. Detail Route: Analysis of Routing DRC Errors
  • 94. Timing Driven Routing Quality of route can affect timing. At 90 nm net delay becomes significant. Optimize critical paths: route some nets first (most routing freedom at start; use shortest paths possible). Net weights: order of routing (priorities, e.g. default : clocks 50, others 2). Wire widening: reduce resistance.
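The net-weight idea can be sketched as a simple priority sort: nets with a higher weight are routed first, while routing freedom is greatest. Only the default weights (clocks 50, others 2) come from the slide; the function name, the net kinds, and the net list are hypothetical, and real routers combine weights with many other criteria.

```python
DEFAULT_WEIGHTS = {"clock": 50, "signal": 2}  # defaults quoted on the slide


def routing_order(nets, weights=DEFAULT_WEIGHTS):
    """Return net names sorted by descending weight.

    nets: list of (name, kind) pairs. Python's sort is stable, so nets with
    equal weight keep their original relative order.
    """
    ranked = sorted(nets, key=lambda net: -weights[net[1]])
    return [name for name, _ in ranked]


nets = [("data0", "signal"), ("clk", "clock"), ("data1", "signal")]
print(routing_order(nets))
```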
  • 95. What is Signal Integrity or SI? (1) Signal delay caused by crosstalk noise. Possible in 2 directions : push-out, pull-down. [Figure labels: net 1 = aggressor; net 2 = victim; speed up; delay]
  • 96. What is SI? (2) Glitch caused by crosstalk noise. Extra clock cycle! Functional failure. [Figure: aggressor net coupling onto the victim Clk net of a D/Q flip-flop; Vdd]
  • 97. Crosstalk Prevention : Design Optimization Noise depends on: coupling capacitance; total net capacitance; strength of the driver (Rd of the victim net). Design optimization: increase drive strength (often easier, only local effect); buffer long nets.
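A first-order feel for the capacitance dependence can be had from a charge-sharing estimate. This is a simplification chosen for illustration, not the model used by SI analysis tools: it treats the victim as floating, so it ignores the third factor on the slide, the victim driver strength, which in practice reduces the glitch. The function name and capacitance values are hypothetical.

```python
def peak_noise_charge_sharing(vdd, c_coupling, c_ground):
    """Upper-bound glitch on a floating victim when the aggressor swings by vdd.

    Capacitive divider: V = vdd * Cc / (Cc + Cg), where Cg is the victim's
    capacitance to ground. Cc and Cg only need to share the same unit.
    """
    return vdd * c_coupling / (c_coupling + c_ground)


# Hypothetical victim net: 10 fF coupling, 40 fF to ground, 1.0 V supply.
print(peak_noise_charge_sharing(1.0, 10.0, 40.0))

# Halving the coupling (e.g. by extra spacing) lowers the estimate.
print(peak_noise_charge_sharing(1.0, 5.0, 40.0))
```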
  • 98. Crosstalk Prevention : Routing Routing solution: limit length of parallel nets (H&V); wire spreading (skip track - clocks); shield special nets; coupling free routing.
  • 99. Crosstalk Prevention : Reduce Cross Coupling Cap Critical nets: spacing (extra space); shielding (grounded shields) on the same layer (H) and adjacent layers (V); net ordering.
  • 100. Effect of Floorplanning on Routing Congestion For hierarchical designs, good pin placement is essential to preventing routing congestion. Can use pin guides during partitioning.
  • 101. Routing around blockages and over macros By default the routing tool will: route over macros; not route where there is a routing blockage; not route through a narrow channel in the non-preferred routing direction. Example: M4 has a horizontal routing channel but its preferred routing direction is vertical, so the preferred routing direction needs to be changed. [Figure labels: Macro; M1-M4 Routing Blockage; M1-M3 Routing Blockage]
  • 102. Clock Tree Routing For SI prevention we generally want to route our clocks with extra spacing. Global H-trees are often routed manually before placement. H-tree nets may be routed with wide metal and shielding. [Figure labels: wide-metal H-tree net; grounded shields]
  • 103. Post Route Clock Tree Optimization (CTO) Improve the skew on clock nets. Flow: detail routed design -> Skew OK? If no: postroute CTO -> ECO route, then check again. [Figure labels: before CTO: short path; after CTO: increased delay]
  • 104. Options for CPU effort # processors: routing in parallel on # processors; superthreading, multithreading; some routers are better at threading than others. # iterations for detail route: # of iteration steps done to get a DRC free design.
  • 105. Summary Starting from 90 nm technologies: Timing Driven Route: net delay is becoming more of a factor. SI Aware Route: small geometries make SI timing closure much more difficult. DFM / DFY: now a crucial part of the routing flow. DRC: number and complexity of DRC rules has increased dramatically.