20-25% CAGR in market volumes
Competitive advantage hinges on speed,
transparency, and proximity to data
sources. The application must be in the
data path – seamlessly

Quest to balance risk/compliance with
performance
HPC on Wall Street - 2012
10GbE Switches for the
Virtualized Data Center,
but a software company
at the core

>1300 Customers
>325 Employees
Profitable, self-funded,
pre-IPO network
infrastructure provider
Open Linux-based OS
Fully automated testing,
and SW development
Arista Application Switch - 7124FX




• Couples ultra-low latency switch with next
  generation programmable FPGA and memory
  subsystem
• Customer programmable FPGA and Control Plane
  provides total control over the network, forwarding,
  inspection, redirection, etc.
• Targeted for early adopters of hardware
Exegy believes…

     •   Exegy believes in continually challenging the status quo of
         market data delivery systems and trading platforms.
         –   First to market with hardware-accelerated market data appliances
             based on FPGA technology.
         –   Best of breed solutions for major use cases faced by low-latency,
             high-capacity consumers of financial market data feeds.


     •   Exegy believes that delivery and consumption of quality
         market data should be as easy and painless as possible.
         –   Fully managed and constantly monitored appliances to assure
             optimal performance and the best customer experience.
         –   A passion to help our customers succeed in the face of escalating
             complexity and the increasing demands placed on them.


Impulse C, Custom FPGA-Accelerated Solutions for the Arista 7124FX
Brian Durwood, Co-founder
Converting C to multiple streaming hardware
processes ain’t that hard.
• Focus on reducing clock cycles
• Verify as you go
• Iterate, iterate, iterate (no “magic button”)

The tool flow is a bit awkward for first timers.
• Visual Studio or equivalent
• Impulse C co-development, analysis & compile
• Altera Quartus II for place & route into FPGA

Things you can do to get up to speed quickly:
• Work from known good sw modules
• Get up-front training or factory engineering
Programming With Impulse C
Not a new language
    Based on standard ANSI C

C-language for FPGA programming
    For embedded and HPC applications
    Supports standard C development tools
    Supports multi-process partitioning

A software-to-hardware compiler
    Optimizes C code for parallelism
    Generates HDL, ready for FPGA synthesis
    Also generates hardware/software interfaces

Purpose
    Describe hardware accelerators using C
    Move compute-intensive functions to FPGAs

[Diagram: C language applications feed three generation steps (generate accelerator hardware, generate hardware interfaces, generate software interfaces), producing HDL files for Arista's on-board FPGA and C software libraries.]



                              www.ImpulseAccelerated.com
Reference slides follow

www.ImpulseC.com
Custom FPGA-Accelerated
   Solutions for the Arista 7124FX
              Brian Durwood, Co-founder

Converting C to Multiple Streaming Hardware Processes
FPGAs – Advantages Over Software
  Massive parallelism
     At system level, loop level, instruction level

  One FPGA can replace multiple CPUs
     For specific tasks/algorithms, using much lower power

  No need for a separate NIC
     Enables inline processing at near line speed

  Minimize OS interference in filtering
     Especially during high transaction load events
     Reduces jitter and other interference

  Offloads standard CPUs with customized pre-processors
     e.g. select limited analysis of X message types that meet X
      criteria for X symbols
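
The pre-processor idea above (forward only selected message types, for selected symbols, that meet some criterion) can be sketched in plain C. This is a minimal software model, not Impulse C output and not any vendor's feed format: the `msg_t` layout, the `filter_t` configuration, and the price-floor criterion are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical normalized message; real feed layouts differ. */
typedef struct {
    char     type;       /* illustrative type code, e.g. 'A' or 'E' */
    char     symbol[8];  /* NUL-padded ticker */
    uint32_t price;      /* fixed-point price in hypothetical units */
} msg_t;

/* Hypothetical filter configuration: accepted types, symbols, price floor. */
typedef struct {
    const char  *types;      /* NUL-terminated set of accepted type codes */
    const char **symbols;    /* accepted tickers */
    size_t       n_symbols;
    uint32_t     min_price;  /* the "criteria" in this sketch */
} filter_t;

/* Return true when the message should be passed on to the host CPU. */
bool filter_accepts(const filter_t *f, const msg_t *m)
{
    assert(f != NULL && m != NULL);
    /* strchr() would match the terminator, so reject a zero type first. */
    if (m->type == '\0' || strchr(f->types, m->type) == NULL)
        return false;
    if (m->price < f->min_price)
        return false;
    for (size_t i = 0; i < f->n_symbols; i++) {
        if (strncmp(m->symbol, f->symbols[i], sizeof m->symbol) == 0)
            return true;
    }
    return false;
}
```

Everything the filter rejects never reaches the host, which is where the CPU offload in this sketch would come from.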


3 Popular FPGA Configurations
 Usage 1: Create a hardware module
    Generated hardware module in the FPGA
 Usage 2: Accelerate an embedded CPU
    Embedded CPU core plus generated hardware accelerator in the FPGA
 Usage 3: Accelerate an external/host CPU or computing cluster
    Host processor or cluster plus an FPGA coprocessor holding
    multiple generated hardware accelerators
Configurations Can Be Combined
 Combining streaming, embedded processor, and host processor

 [Diagram: 10G Ethernet connected to an FPGA containing stream processing and parsing, matching algorithm and strategy, host message generation, an embedded CPU for configuration, and embedded and shared RAM.]

 FPGA strategies can be coded using
 C for hardware and for embedded
 CPU, with shared RAM for hash table
 lookup or other local data
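
A hash table lookup of the kind mentioned above can be sketched as a small fixed-size, open-addressing table in plain C. This is an illustrative model only: the table size, the 64-bit packed-symbol key, the multiplicative hash constant, and the names `table_put` and `table_get` are assumptions, not part of Impulse C or the 7124FX platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS  4u
#define TABLE_SLOTS (1u << TABLE_BITS)   /* power of two so indices can be masked */

typedef struct {
    uint64_t key;    /* symbol packed into 64 bits; 0 marks an empty slot */
    uint32_t value;  /* e.g. a per-symbol limit or strategy id */
} slot_t;

typedef struct { slot_t slots[TABLE_SLOTS]; } table_t;

/* Multiplicative hash: keep the top TABLE_BITS bits of the product. */
static uint32_t slot_index(uint64_t key)
{
    return (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> (64u - TABLE_BITS));
}

/* Insert or update; returns false when the table is full. */
bool table_put(table_t *t, uint64_t key, uint32_t value)
{
    assert(t != NULL && key != 0);
    uint32_t start = slot_index(key);
    for (uint32_t probe = 0; probe < TABLE_SLOTS; probe++) {
        slot_t *s = &t->slots[(start + probe) & (TABLE_SLOTS - 1u)];
        if (s->key == 0 || s->key == key) {
            s->key = key;
            s->value = value;
            return true;
        }
    }
    return false;
}

/* Linear-probe lookup; stops at the first empty slot. */
bool table_get(const table_t *t, uint64_t key, uint32_t *value_out)
{
    assert(t != NULL && value_out != NULL);
    if (key == 0)
        return false;
    uint32_t start = slot_index(key);
    for (uint32_t probe = 0; probe < TABLE_SLOTS; probe++) {
        const slot_t *s = &t->slots[(start + probe) & (TABLE_SLOTS - 1u)];
        if (s->key == key) {
            *value_out = s->value;
            return true;
        }
        if (s->key == 0)
            return false;
    }
    return false;
}
```

A fixed slot count and bounded probe loop are chosen here because they map naturally onto a block of on-chip RAM with a known worst-case lookup time.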

Impulse C Programming Model

 [Diagram: several H/W processes and S/W processes connected to one another.]

 Communicating C-Language Processes
    Supports dataflow and message-based communications
    Supports parallelism at the application level and at the level of
     individual processes
    Allows simulation and debugging of parallel software processes.
Parallelism via Multiple Processes

 [Diagram: C processes arranged to show spatial parallelism and temporal parallelism (system-level pipelining).]
An Impulse C Process

 [Diagram: a C process and its interfaces to other C processes.]
    Inputs: stream, signal, register
    Outputs: stream, signal, register, App Monitor
    Shared memory block reads/writes

 Multiple methods of process-to-process communications are supported
 Processes are independently synchronized
Compile and Optimize
• Optimize the results using interactive tools
    Pipeline analysis
    Loop unrolling
    Instruction scheduling

• Generate FPGA hardware
    VHDL or Verilog
    Low-level interfaces to memory, I/O and buses.
    ModelSim test bench
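
To make "loop unrolling" concrete, here is the transformation written out by hand in plain C. It is a sketch of the general technique only; it does not show how Impulse C itself performs or is instructed to perform unrolling, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one addition per iteration, each depending on the last. */
uint32_t checksum_rolled(const uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Unrolled by four with independent partial sums. The four additions in
 * the loop body have no dependency on each other, so hardware can perform
 * them in the same clock cycle. */
uint32_t checksum_unrolled4(const uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += buf[i];
        s1 += buf[i + 1];
        s2 += buf[i + 2];
        s3 += buf[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        s0 += buf[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same value; the unrolled form trades more adders for fewer sequential steps, which is the clock-cycle reduction the tools aim for.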

Debug and Verify

 Use C tools for application debugging
  Source-level debuggers
  C-language testing

 Test and analyze parallel dataflow with the Impulse
 Application Monitor

 Automatically generate VHDL or Verilog test benches
Constructs Familiar to C Programmers

  Concept is similar to getc(), putc() in C for I/O


  co_stream_create                   Used in configuration

  co_stream_open                     Open the stream (clear eos)
  co_stream_close                    Close the stream (set eos)
  co_stream_eos                      Check end of stream (eos)

  co_stream_read                     Read from stream (with rdy, en)
  co_stream_write                    Write to stream (with rdy, en)

  co_stream_read_nb                  Non-blocking read (no rdy)
  co_stream_write_nb                 Non-blocking write (no rdy)
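
The open / close / eos / read / write pattern in the table above can be modeled in ordinary C. The sketch below is a single-threaded software stand-in written for illustration: the `model_*` names, the FIFO depth, and the struct layout are assumptions, and it does not reproduce the real `co_stream_*` signatures or the rdy/en hardware handshake. Only the non-blocking flavor is modeled, since a blocking call would need a second concurrent process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4u

/* Hypothetical software model of a stream; not the Impulse C implementation. */
typedef struct {
    uint32_t data[FIFO_DEPTH];
    size_t   head, count;
    bool     eos;   /* cleared by open, set by close */
} model_stream;

void model_open(model_stream *s)
{
    assert(s != NULL);
    s->head = 0;
    s->count = 0;
    s->eos = false;
}

void model_close(model_stream *s) { assert(s != NULL); s->eos = true; }

/* End of stream: the producer closed it and the FIFO is drained. */
bool model_eos(const model_stream *s) { return s->eos && s->count == 0; }

/* Non-blocking write: fails instead of waiting when full or closed. */
bool model_write_nb(model_stream *s, uint32_t v)
{
    if (s->eos || s->count == FIFO_DEPTH)
        return false;
    s->data[(s->head + s->count) % FIFO_DEPTH] = v;
    s->count++;
    return true;
}

/* Non-blocking read: fails when nothing is buffered. */
bool model_read_nb(model_stream *s, uint32_t *out)
{
    if (s->count == 0)
        return false;
    *out = s->data[s->head];
    s->head = (s->head + 1) % FIFO_DEPTH;
    s->count--;
    return true;
}

/* A process body in the getc()/putc() style: read, transform, write. */
size_t double_all(model_stream *in, model_stream *out)
{
    uint32_t v;
    size_t n = 0;
    /* Read only while the output has room, so no value is dropped. */
    while (!out->eos && out->count < FIFO_DEPTH && model_read_nb(in, &v)) {
        (void)model_write_nb(out, v * 2u);
        n++;
    }
    if (model_eos(in))
        model_close(out);   /* propagate end of stream downstream */
    return n;
}
```

The shape of `double_all` is the part that carries over: a process is a loop that pulls from input streams and pushes to output streams until end of stream.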

Credible Solution in use by:

    Multiple Confidential Financial
    NDA Covered Financial Teams
Impulse Platform Support Package

 [Diagram: Impulse CoDeveloper™ produces the FPGA fabric processing core. The PSP generates HW/SW wrappers between the FPGA core and system elements: FPGA embedded processor, memory resources, host interfaces, Ethernet, and other I/O.]

  Extensions (scripts and wrapper generators)
  Platform-specific library functions
  Documentation and tutorials
  Current ready to run examples for platform
Examples of FPGA processing:
• Financial feed kernel bypass or Full Hardware based trading
    Direct handling of financial feeds
    Parsing incoming feeds and triggering outbound orders – your
     strategy in hardware
• Normalization or Protocol Conversion
    Gateway sending a sub-feed of data
• Pre-Trade Risk Checking
    Low Latency Broker Dealer Compliance
• Financial valuations
    Co-processor off-loading for Monte Carlo and other algorithms
Stand-Alone Feed Handling Solution
  Usage 3

 [Diagram: a 1G or 10G Ethernet MAC connected through an RX Adapter (Verilog) and a TX Adapter (Verilog) to the Feed Handler and Outbound UDP block (Impulse C).]
Network Processing Pipeline
 UDP and TCP/IP implemented directly in FPGA hardware for low latency

 [Diagram: inside the FPGA, a 1/10GigE MAC, Enet Filter, UDP Parser and/or TCP/IP Stack, Custom Filtering Application, Embedded CPU, and Host I/O Interface. On the Host System: Driver, Host Memory, and User Application.]
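
As a rough software picture of what the Ethernet filter and UDP parser stages decide, the sketch below walks an untagged Ethernet II frame carrying IPv4 and UDP and returns a view of the payload. The header offsets follow the standard Ethernet, IPv4, and UDP layouts; the `udp_view` type and `parse_udp_frame` name are hypothetical, and VLAN tags, IP fragmentation, and checksum verification are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: where the UDP payload sits inside the frame. */
typedef struct {
    uint16_t       dst_port;
    const uint8_t *payload;
    size_t         payload_len;
} udp_view;

/* Read a big-endian (network order) 16-bit field. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns true and fills *out when the frame is untagged IPv4/UDP. */
bool parse_udp_frame(const uint8_t *frame, size_t len, udp_view *out)
{
    assert(frame != NULL && out != NULL);
    if (len < 14 + 20 + 8)                 /* Ethernet + minimal IPv4 + UDP */
        return false;
    if (be16(frame + 12) != 0x0800)        /* EtherType: IPv4 */
        return false;

    const uint8_t *ip = frame + 14;
    if ((ip[0] >> 4) != 4)                 /* IP version */
        return false;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4u;   /* IPv4 header length in bytes */
    if (ihl < 20 || len < 14 + ihl + 8)
        return false;
    if (ip[9] != 17)                       /* protocol: UDP */
        return false;

    const uint8_t *udp = ip + ihl;
    size_t udp_len = be16(udp + 4);        /* UDP header plus payload */
    if (udp_len < 8 || 14 + ihl + udp_len > len)
        return false;

    out->dst_port = be16(udp + 2);
    out->payload = udp + 8;
    out->payload_len = udp_len - 8;
    return true;
}
```

Each early return corresponds to a frame the filter stage would drop before it reaches the custom filtering application.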
Complex Order Support
 Exchanges, feed handlers, order data sources
    Standard and Custom Feed Handler Formats
     e.g.: ITCH, OUCH, OPRA, BATS, & Generic UDP.
    Adapters: RMDS, Bloomberg and Custom.

 FPGA or FPGA-Based Board
    10 Gb/S Ethernet
    Direct connection Impulse UDP/TCP

 Incoming
    Replace NIC
    Decryption
    Decompression
    Normalizing Across Feeds
    Produce Sub-Feed
    Pull and Present Opportunities
    Apply Trade Logic
    Processing without OS
    Ultra-fast pattern matching

 Outgoing
    Replace NIC
    Revert feed to exchange formats
    Hardwire potential X required responses
    Message Management With Exchanges
    Insert risk limitations awaiting confirm
    Manage Risk
Three Ways To Get Started
Learn the tools
   Acquire an Impulse CoDeveloper license.
   Work from the included reference designs.
   Experiment with ways to optimize your algorithms to run efficiently as
    multiple streaming processes in FPGA.
Turn Key System (“Bump in the Wire”)
     License above +
     UDP or other network attached FPGA-enabled reference design.
     FPGA-based accelerator platform.
     Impulse factory engineers to help get your system on line.

Turn Key System Running A Target Algorithm
   License above + Turn Key System above +
   Impulse Engineers, under NDA, refactor your target algorithm(s) for
    efficient compilation to FPGA.
   Impulse Engineers train your team on how the refactoring works.

About Impulse

  Most widely used C to FPGA tool

  Pure ANSI C
      No PAR or HW statements inserted


  Founded in 2002
      By part of the original ABEL team


Additional Resources

  Engineering consultation
   info@ImpulseAccelerated.com
  Tutorials:
   www.ImpulseAccelerated.com/Tutorials
  Book:
   Practical FPGA Programming in C




Arista Application Switch – Systems Design
Compute, Storage, Memory, I/O, Application Acceleration –
Together




Platform Details
 24 Wirespeed 1G/10G SFP/SFP+ Ports
    16 Base SFP/SFP+ Ports
    8 FX SFP/SFP+ Ports
 Console Port, Clock Input, USB Port, Management Port, Air Vents

 High Availability:
 Dual Hot-swappable Power Supplies
 Multiple Hot-swappable Fan Units

 Designed for Data Center + Colocation:
 Flexible Front-to-Rear or Rear-to-Front Airflow
 Choice of AC or DC Power Supplies

Application Switching for Cloud Networks
Arista Application Switch - 7124FX
Ultra Low Latency 24 port 10GbE Switch
• 16 10GbE ports connected to LLE ASIC
• 8 10GbE ports connected through Stratix V FPGA
• Built in 50GB SSD
• Optional Chip-Scale Atomic Clock and External Clock Source
Application Switch Markets




Financial Services Applications

Inline Risk Analysis                 Low Latency Broker Dealer Compliance
Feed Handling and A/B Arbitration    Offload line arbitration to dramatically improve application performance
Real-time Data analysis              Instrument transaction performance at high resolution
Algorithmic trading                  Reducing system latency increases performance of trading strategies
Order Protocol Conversion            Convert or normalize multiple order entry formats to a common format
Order Execution Routing              Set order policies for best execution
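
A/B arbitration, as listed above, means taking two redundant copies of the same feed and forwarding whichever copy of each message arrives first. A minimal sketch in plain C, assuming every message carries a sequence number and that copies arrive within 64 sequence numbers of each other; the `arbiter_t` type, the window size, and the function names are assumptions for illustration, not a description of any vendor's arbiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sliding 64-entry window of sequence numbers already forwarded. */
typedef struct {
    uint64_t base;  /* sequence number represented by bit 0 of seen */
    uint64_t seen;  /* bit i is set once base + i has been forwarded */
} arbiter_t;

void arbiter_init(arbiter_t *a, uint64_t first_seq)
{
    assert(a != NULL);
    a->base = first_seq;
    a->seen = 0;
}

/* Call for every message from either line. Returns true when this is the
 * first copy of seq and it should be forwarded, false for a duplicate or
 * for a message older than the window. */
bool arbiter_accept(arbiter_t *a, uint64_t seq)
{
    assert(a != NULL);
    if (seq < a->base)
        return false;                 /* too old: treated as a duplicate */

    uint64_t off = seq - a->base;
    if (off >= 64) {                  /* slide the window so seq is its top bit */
        uint64_t shift = off - 63;
        a->seen = (shift >= 64) ? 0 : (a->seen >> shift);
        a->base += shift;
        off = 63;
    }

    uint64_t bit = 1ull << off;
    if (a->seen & bit)
        return false;                 /* the other line already delivered it */
    a->seen |= bit;
    return true;
}
```

This sketch only removes duplicates and lets one line fill gaps left by the other; it does not reorder messages, so a consumer that needs strict sequence order would add a reorder stage after it.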




March 19, 2011
Developing on the Application Switch




Application Switch Development Partners

  Complete integrated appliance model
   • Novasparks: 100% Hardware market data solution
   • Exegy: Appliance based robust ticker plant

  System integrators and development support
   • Impulse C: C to RTL tools
   • Enyx: Customer trading solutions and IP blocks
Arista Application Switch 7124FX

A new category of product that provides a network
accelerated platform for high performance app
vendors to develop on

Combining a true network switch, with full routing and
switching protocols, with fully programmable hardware
creates a new market for the most demanding
applications

Application logic inserted into real-time environments
with complete transparency

More Related Content

What's hot

Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developersMichelle Holley
 
AMD Embedded G-Series Product Page
AMD Embedded G-Series Product PageAMD Embedded G-Series Product Page
AMD Embedded G-Series Product PageAMD
 
D2 audio dv_club_verification_flow
D2 audio dv_club_verification_flowD2 audio dv_club_verification_flow
D2 audio dv_club_verification_flowObsidian Software
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)videos
 
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationclCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationIntel® Software
 
Case Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded ProcessorsCase Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded Processorsaccount inactive
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsShinya Takamaeda-Y
 
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionUltra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionIntel® Software
 
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 Cameras
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 CamerasMIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 Cameras
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 CamerasMIPI Alliance
 
Multiple Shared Processor Pools In Power Systems
Multiple Shared Processor Pools In Power SystemsMultiple Shared Processor Pools In Power Systems
Multiple Shared Processor Pools In Power SystemsAndrey Klyachkin
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloadsinside-BigData.com
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Xilinx virtex 7 fpga - Semester Presentation
Xilinx virtex 7 fpga - Semester PresentationXilinx virtex 7 fpga - Semester Presentation
Xilinx virtex 7 fpga - Semester PresentationMuhammad Muzaffar Khan
 

What's hot (18)

Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
AMD Embedded G-Series Product Page
AMD Embedded G-Series Product PageAMD Embedded G-Series Product Page
AMD Embedded G-Series Product Page
 
D2 audio dv_club_verification_flow
D2 audio dv_club_verification_flowD2 audio dv_club_verification_flow
D2 audio dv_club_verification_flow
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)
 
Xilinx track g
Xilinx   track gXilinx   track g
Xilinx track g
 
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationclCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
 
Williams xen summit 2010
Williams   xen summit 2010Williams   xen summit 2010
Williams xen summit 2010
 
Case Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded ProcessorsCase Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded Processors
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
 
513 516
513 516513 516
513 516
 
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionUltra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
 
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 Cameras
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 CamerasMIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 Cameras
MIPI DevCon 2016: Accelerating Software Development for MIPI CSI-2 Cameras
 
Multiple Shared Processor Pools In Power Systems
Multiple Shared Processor Pools In Power SystemsMultiple Shared Processor Pools In Power Systems
Multiple Shared Processor Pools In Power Systems
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Xilinx virtex 7 fpga - Semester Presentation
Xilinx virtex 7 fpga - Semester PresentationXilinx virtex 7 fpga - Semester Presentation
Xilinx virtex 7 fpga - Semester Presentation
 
Graphics virtualization
Graphics virtualizationGraphics virtualization
Graphics virtualization
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 

Similar to Arista @ HPC on Wall Street 2012

15.00 hr van Hilten
15.00 hr van Hilten15.00 hr van Hilten
15.00 hr van HiltenThemadagen
 
FPGA Intro
FPGA IntroFPGA Intro
FPGA Intronaito88
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...mentoresd
 
FPGA Overview
FPGA OverviewFPGA Overview
FPGA OverviewMetalMath
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Arista @ HPC on Wall Street 2012

  • 1. 20-25% CAGR in market volumes Competitive advantage hinges on speed, transparency, and proximity to data sources. The application must be in the data path – seamlessly Quest to balance risk/compliance with performance HPC on Wall Street - 2012
  • 2. 10GbE switches for the virtualized data center, but a software company at the core. >1300 customers; >325 employees. Profitable, self-funded, pre-IPO network infrastructure provider. Open Linux-based OS. Fully automated testing and software development.
  • 3. Arista Application Switch - 7124FX. Couples an ultra-low-latency switch with a next-generation programmable FPGA and memory subsystem. The customer-programmable FPGA and control plane provide total control over the network: forwarding, inspection, redirection, etc. Targeted for early adopters of hardware.
  • 4. Exegy believes… Exegy believes in continually challenging the status quo of market data delivery systems and trading platforms: first to market with hardware-accelerated market data appliances based on FPGA technology; best-of-breed solutions for major use cases faced by low-latency, high-capacity consumers of financial market data feeds. Exegy believes that delivery and consumption of quality market data should be as easy and painless as possible: fully managed and constantly monitored appliances to assure optimal performance and the best customer experience; a passion to help our customers succeed in the face of escalating complexity and the increasing demands placed on them.
  • 5. Impulse C: Custom FPGA-Accelerated Solutions for the Arista 7124FX (Brian Durwood, Co-founder). Converting C to multiple streaming hardware processes ain’t that hard: focus on reducing clock cycles; verify as you go; iterate, iterate, iterate (no “magic button”). The tool flow is a bit awkward for first timers: Visual Studio or equivalent; Impulse C for co-development, analysis, and compile; Altera Quartus II for place and route into the FPGA. Things you can do to get up to speed quickly: work from known-good software modules; get up-front training or factory engineering.
  • 6. Programming With Impulse C. C language for FPGA programming, not a new language: based on standard ANSI C; for embedded and HPC applications; supports standard C development tools; supports multi-process partitioning. A software-to-hardware compiler: optimizes C code for parallelism; generates HDL, ready for FPGA synthesis; also generates hardware/software interfaces. Purpose: describe hardware accelerators using C; move compute-intensive functions to FPGAs. [Diagram: C-language applications are compiled into generated accelerator hardware (HDL files), generated hardware interfaces, and generated software interfaces (C software libraries), targeting Arista’s on-board FPGA.] www.ImpulseAccelerated.com
  • 7. Reference slides follow. www.ImpulseC.com
  • 8. Custom FPGA-Accelerated Solutions for the Arista 7124FX Brian Durwood, Co-founder Converting C to Multiple Streaming Hardware Processes
  • 9. FPGAs: Advantages Over Software. Massive parallelism: at system level, loop level, and instruction level. One FPGA can replace multiple CPUs: for specific tasks/algorithms, using much lower power. No need for a separate NIC card: enables in-line processing at near line speed. Minimize OS interference in filtering: especially during high transaction load events; reduces jitter and other interference. Offloads standard CPUs with customized pre-processors: e.g. select limited analysis of X message types that meet X criteria for X symbols. Confidential
  • 10. 3 Popular FPGA Configurations. Usage 1: create a hardware module (a generated hardware module in the FPGA). Usage 2: accelerate an embedded CPU (an embedded CPU core with generated hardware accelerators in the FPGA). Usage 3: accelerate an external/host CPU or computing cluster (a host processor or cluster with generated hardware accelerators on an FPGA coprocessor).
  • 11. Configurations Can Be Combined. Combining streaming, embedded processor, and host processor. [Diagram labels: 10G Ethernet; FPGA stream processing; matching and parsing; embedded CPU for algorithm and strategy configuration; host message generation; embedded and shared RAM.] FPGA strategies can be coded using C for hardware and for the embedded CPU, with shared RAM for hash table lookup or other local data.
  • 12. Impulse C Programming Model: communicating C-language processes (a mix of hardware and software processes). Supports dataflow and message-based communications. Supports parallelism at the application level and at the level of individual processes. Allows simulation and debugging of parallel software processes.
  • 13. Parallelism via Multiple Processes. Spatial parallelism; temporal parallelism (system-level pipelining).
  • 14. An Impulse C Process. Multiple methods of process-to-process communications are supported: shared memory block reads/writes; stream inputs and outputs; signal inputs and outputs; register inputs and outputs; App Monitor outputs. Processes are independently synchronized.
  • 15. Compile and Optimize. Optimize the results using interactive tools: pipeline analysis; loop unrolling; instruction scheduling. Generate FPGA hardware: VHDL or Verilog; low-level interfaces to memory, I/O, and buses; ModelSim test bench.
  • 16. Debug and Verify. Use C tools for application debugging: source-level debuggers; C-language testing. Test and analyze parallel dataflow with the Impulse Application Monitor. Automatically generate VHDL or Verilog test benches.
  • 17. Constructs Familiar to C Programmers. The concept is similar to getc() and putc() in C for I/O. co_stream_create: used in configuration; co_stream_open: open the stream (clear eos); co_stream_close: close the stream (set eos); co_stream_eos: check end of stream (eos); co_stream_read: read from stream (with rdy, en); co_stream_write: write to stream (with rdy, en); co_stream_read_nb: non-blocking read (no rdy); co_stream_write_nb: non-blocking write (no rdy).
  • 18. Credible solution, in use by multiple confidential, NDA-covered financial teams.
  • 19. Impulse Platform Support Package (PSP). Impulse CoDeveloper™ produces the FPGA fabric processing core; the PSP generates HW/SW wrappers between the FPGA core and system elements (embedded processor, memory resources, host interfaces, Ethernet, other I/O). The package contains: extensions (scripts and wrapper generators); platform-specific library functions; documentation and tutorials; current ready-to-run examples for the platform.
  • 20. Examples of FPGA processing. Financial feed kernel bypass or full hardware-based trading: direct handling of financial feeds; parsing incoming feeds and triggering outbound orders (your strategy in hardware). Normalization or protocol conversion: gateway sending a sub-feed of data. Pre-trade risk checking: low-latency broker-dealer compliance. Financial valuations: co-processor off-loading for Monte Carlo and other algorithms.
  • 21. Stand-Alone Feed Handling Solution (Usage 3). [Diagram labels: 1G or 10G Ethernet MAC; RX adapter (Verilog); feed handler and outbound UDP (Impulse C); TX adapter (Verilog).]
  • 22. Network Processing Pipeline. UDP and TCP/IP are implemented directly in FPGA hardware for low latency. [Diagram labels, FPGA side: 1/10GigE Enet MAC; UDP parser and/or TCP/IP stack; filter; custom filtering application; embedded CPU; host I/O interface. Host system side: driver; host memory; user application.]
  • 23. Complex Order Support. [Diagram: exchanges, feed handlers, and order data sources connect directly over 10 Gb/s Ethernet, replacing the NIC, to a standard FPGA or FPGA-based board. Custom incoming and outgoing feed handlers (Impulse UDP/TCP) cover standard formats, e.g. ITCH, OUCH, OPRA, BATS, and generic UDP, with adapters for RMDS, Bloomberg, and custom sources. Functions named on the board: decompression and decryption; normalizing across feeds; sub-feed produce; pull and present opportunities; ultra-fast pattern matching; apply trade logic; hardwire potential X required responses; manage risk; insert risk limitations awaiting confirm; message management with exchanges; revert feed to exchange formats; processing without OS.]
  • 24. Three Ways To Get Started. (1) Learn the tools: acquire an Impulse CoDeveloper license; work from the included reference designs; experiment with ways to optimize your algorithms to run efficiently as multiple streaming processes in FPGA. (2) Turn-key system (“bump in the wire”): the license above, plus a UDP or other network-attached FPGA-enabled reference design; an FPGA-based accelerator platform; Impulse factory engineers to help get your system online. (3) Turn-key system running a target algorithm: the license and turn-key system above, plus Impulse engineers who, under NDA, refactor your target algorithm(s) for efficient compilation to FPGA and train your team on how the refactoring works.
  • 25. About Impulse. Most widely used C-to-FPGA tool. Pure ANSI C: no PAR or HW statements inserted. Founded in 2002 by part of the original ABEL team.
  • 26. Additional Resources. Engineering consultation: info@ImpulseAccelerated.com. Tutorials: www.ImpulseAccelerated.com/Tutorials. Book: Practical FPGA Programming in C.
  • 27. Arista Application Switch: Systems Design. Compute, storage, memory, I/O, and application acceleration, together.
  • 28. Platform Details. Labeled components: console port; air vents; clock input; 16 base SFP/SFP+ ports; 8 FX SFP/SFP+ ports; USB port; management port. 24 wirespeed 1G/10G SFP/SFP+ ports. High availability: dual hot-swappable power supplies; multiple hot-swappable fan units. Designed for data center and colocation: flexible front-to-rear or rear-to-front airflow; choice of AC or DC power supplies. Application Switching for Cloud Networks
  • 29. Arista Application Switch - 7124FX. Ultra-low-latency 24-port 10GbE switch: 16 10GbE ports connected to the LLE ASIC; 8 10GbE ports connected through a Stratix V FPGA; built-in 50GB SSD; optional chip-scale atomic clock and external clock source.
  • 30. Application Switch Markets
  • 31. Financial Services Applications. Inline risk analysis: low-latency broker-dealer compliance. Feed handling and A/B arbitration: offload line arbitration to dramatically improve application performance. Real-time data analysis: instrument transaction performance at high resolution. Algorithmic trading: reducing system latency increases performance of trading strategies. Order protocol conversion: convert or normalize multiple order entry formats to a common format. Order execution routing: set order policies for best execution. March 19, 2011
  • 32. Developing on the Application Switch
  • 33. Application Switch Development Partners. Complete integrated appliance model: Novasparks (100% hardware market data solution); Exegy (appliance-based robust ticker plant). System integrators and development support: Impulse C (C-to-RTL tools); Enyx (customer trading solutions and IP blocks).
  • 34. Arista Application Switch 7124FX. A new category of product that provides a network-accelerated platform for high-performance app vendors to develop on. Combining a true network switch, with full routing and switching protocols, with fully programmable hardware creates a new market for the most demanding applications. Application logic is inserted into real-time environments with complete transparency.
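
The stream constructs listed on slide 17 suggest a simple pattern: a producer opens a stream, writes, and closes it (setting eos), while a consumer reads until eos. The plain-C sketch below imitates that pattern with an in-memory FIFO. It is an illustration only: the `toy_` names and the `toy_stream` type are hypothetical, this is not the Impulse C API, and real `co_stream` calls take different arguments and connect processes that run concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware stream: a small FIFO plus an
   end-of-stream (eos) flag. Not the Impulse C co_stream type. */
#define TOY_DEPTH 8

typedef struct {
    int32_t buf[TOY_DEPTH];
    size_t head, tail, count;
    int eos;
} toy_stream;

/* Open: reset the FIFO and clear eos. */
void toy_open(toy_stream *s) { s->head = s->tail = s->count = 0; s->eos = 0; }

/* Close: set eos; buffered values can still be drained by the reader. */
void toy_close(toy_stream *s) { s->eos = 1; }

/* Non-blocking write: returns 1 on success, 0 if the FIFO is full. */
int toy_write_nb(toy_stream *s, int32_t v) {
    assert(!s->eos);                      /* writing after close is a bug */
    if (s->count == TOY_DEPTH) return 0;
    s->buf[s->tail] = v;
    s->tail = (s->tail + 1) % TOY_DEPTH;
    s->count++;
    return 1;
}

/* Non-blocking read: returns 1 on success, 0 if nothing is buffered. */
int toy_read_nb(toy_stream *s, int32_t *out) {
    if (s->count == 0) return 0;
    *out = s->buf[s->head];
    s->head = (s->head + 1) % TOY_DEPTH;
    s->count--;
    return 1;
}

/* End of stream: the writer has closed and the FIFO is drained. */
int toy_eos(const toy_stream *s) { return s->eos && s->count == 0; }

/* Filter stage: forward only values above a threshold.
   Returns the number of values forwarded. A value is dropped
   (and not counted) if the output FIFO is full. */
int toy_filter(toy_stream *in, toy_stream *out, int32_t threshold) {
    int32_t v;
    int forwarded = 0;
    while (!toy_eos(in)) {
        if (!toy_read_nb(in, &v)) break;  /* empty but not closed: stop here */
        if (v > threshold && toy_write_nb(out, v)) forwarded++;
    }
    return forwarded;
}
```

Writing 5, 12, 7, 20 into an input stream, closing it, and calling `toy_filter` with a threshold of 10 leaves 12 and 20 in the output stream. In hardware the stages would run side by side and a blocking read would stall rather than return; the sequential loop here only shows the data flow.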

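A/B arbitration, named on slide 31, is commonly done by sequence number: the same messages arrive on two redundant lines, and the handler forwards whichever copy of each sequence number shows up first. The sketch below assumes each message carries a monotonically increasing 64-bit sequence number; the `arb_` names are hypothetical, and the slides do not describe how any partner product implements this. As a simplification, a late copy of a skipped number is treated as stale rather than re-ordered, and gap recovery is out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arbiter state shared by the A and B lines. */
typedef struct {
    uint64_t next_seq;  /* lowest sequence number not yet forwarded */
    uint64_t gaps;      /* sequence numbers skipped before a newer one arrived */
} arb_state;

void arb_init(arb_state *a, uint64_t first_seq) {
    assert(a != NULL);
    a->next_seq = first_seq;
    a->gaps = 0;
}

/* Offer a message seen on either line.
   Returns 1 if it should be forwarded (first copy of a new sequence
   number), 0 if it is a duplicate or older than what was forwarded. */
int arb_offer(arb_state *a, uint64_t seq) {
    assert(a != NULL);
    if (seq < a->next_seq) return 0;   /* already covered by the other line */
    a->gaps += seq - a->next_seq;      /* numbers jumped over count as gaps */
    a->next_seq = seq + 1;
    return 1;
}
```

Each line's receive path would call `arb_offer` with the sequence number it parsed and forward the message only on a return of 1. The state is a single counter and a comparison, which is the kind of per-message logic the deck proposes moving off the host CPU.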