GPU-based Parallelization of System Modeling

Stephan Pachnicke, 18.03.2013
Outline

• Motivation

• Numerical System Modeling

• GPU-Parallelization

• Comparison of Speedup and Accuracy

• Conclusion




2                       © 2013 ADVA Optical Networking. All rights reserved.
Acknowledgments

The author would like to acknowledge the help and
contributions of


Adam Chachaj – Krone Messtechnik

Heinrich Müller – TU Dortmund

Peter Krummrich – TU Dortmund

Markus Roppelt – ADVA Optical Networking

Michael Eiselt – ADVA Optical Networking




Motivation




In Short: Computational Performance




[Figure: Graphics Processing Unit (GPU) vs. CPU Cluster]




Increase in GFlop/s




• Peak GPU performance (GFlop/s) is growing even faster than predicted by
  Moore's law and is significantly higher than CPU performance

• GPUs are therefore also attractive for general-purpose computing
  (e.g., complex numerical simulations)



Optical System Modeling

• Simulation of (long-haul) optical transmission systems requires the
  numerical solution of the nonlinear Schrödinger equation

  → High computational effort, because small step sizes are needed to
    simulate the nonlinear fiber effects accurately

• Precise estimation of the bit error ratio (BER) requires Monte Carlo
  simulations of PMD and noise

  → Requires a large number of simulated bits
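How many bits is "a large number"? A minimal sketch, assuming the simulated bit errors are independent so that the error count k out of N bits is binomially distributed (an assumption not stated on the slide): the relative standard deviation of the estimate k/N is sqrt((1 - BER)/(N·BER)), which can be solved for N. The function name `bits_for_relative_error` is hypothetical.

```python
import math


def bits_for_relative_error(ber: float, rel_err: float) -> int:
    """Bits N needed so that the Monte Carlo BER estimate k/N has a relative
    standard deviation of rel_err, assuming k ~ Binomial(N, ber).

    std(k/N) / ber = sqrt((1 - ber) / (N * ber))  =>  N = (1 - ber) / (ber * rel_err**2)
    """
    if not (0.0 < ber < 1.0) or rel_err <= 0.0:
        raise ValueError("need 0 < ber < 1 and rel_err > 0")
    return math.ceil((1.0 - ber) / (ber * rel_err ** 2))


if __name__ == "__main__":
    for ber in (1e-3, 1e-6, 1e-9):
        n = bits_for_relative_error(ber, 0.1)
        print(f"BER {ber:.0e}: {n:.2e} bits (~{n * ber:.0f} expected errors)")
```

A 10 % relative standard deviation corresponds to roughly 100 counted errors, so the number of bits scales as about 100/BER: roughly 10^5 bits at BER = 10⁻³, but about 10^11 bits at BER = 10⁻⁹.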




Split-Step Fourier Method (SSFM)
•   Splits the nonlinear Schrödinger equation into linear and nonlinear parts
•   Solves the linear and nonlinear parts separately




•   The linear part is solved in the frequency domain and the nonlinear
    part in the time domain (acceptable for small step sizes)




[Figure: chain of FFT and IFFT operations; one split-step marked]
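The split-step loop can be sketched in pure Python, assuming the standard form of the nonlinear Schrödinger equation with attenuation α, group-velocity dispersion β2 and nonlinear coefficient γ: ∂A/∂z = -(α/2)A - i(β2/2)∂²A/∂t² + iγ|A|²A. This is an illustrative, unoptimized first-order (asymmetric) scheme that follows the order on the slide (FFT, linear step in the frequency domain, IFFT, nonlinear step in the time domain); it is not the author's GPU implementation, the function names are hypothetical, and the small recursive radix-2 FFT only stands in for a library FFT such as CUFFT.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ssfm(a, dt, dz, steps, alpha, beta2, gamma):
    """Propagate the sampled envelope `a` (sample spacing dt) over steps * dz.

    One split-step: FFT -> linear step (loss, dispersion) in the frequency domain
    -> IFFT -> nonlinear (Kerr) phase rotation in the time domain.
    """
    n = len(a)
    # Angular frequencies in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1 (times 2*pi/(n*dt))
    omega = [2 * math.pi * (k if k < n // 2 else k - n) / (n * dt) for k in range(n)]
    # Exact solution of the linear part over dz: exp(dz * (-alpha/2 + i*beta2*omega^2/2))
    lin = [cmath.exp(dz * (-alpha / 2 + 0.5j * beta2 * w * w)) for w in omega]
    for _ in range(steps):
        spec = fft(a)
        a = [v / n for v in fft([s * h for s, h in zip(spec, lin)], inverse=True)]
        # Nonlinear part: A -> A * exp(i * gamma * |A|^2 * dz)
        a = [v * cmath.exp(1j * gamma * abs(v) ** 2 * dz) for v in a]
    return a


if __name__ == "__main__":
    # Hypothetical Gaussian pulse in normalised units
    pulse = [cmath.exp(-((k - 32) / 4) ** 2) for k in range(64)]
    out = ssfm(pulse, dt=1.0, dz=0.1, steps=50, alpha=0.0, beta2=-2.0, gamma=1.0)
    print(max(abs(v) for v in out))
```

Every split-step costs one FFT and one IFFT plus element-wise multiplications over all samples, so thousands of small steps mean thousands of FFTs; on a GPU these element-wise operations and the FFTs would typically be executed in parallel over the samples.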
Speedup Factor (GPU vs. CPU)

[Figure: speedup factor for single precision (SP) and double precision (DP).
Legend: DP: Nvidia CUDA FFT; SP: FFT using pre-calculated twiddle factors]




•   Single-precision arithmetic is much faster on the GPU
    (because the main target market is computer gaming)

•   Longer block lengths allow better parallelization

→ A single-precision implementation is desirable

Accuracy (in single precision)

[Figure: accuracy of FFT implementations in single precision, including a LUT-based FFT.
Legend: CUFFT: Nvidia CUDA FFT; FFTW: Fastest Fourier Transform in the West;
IPP: Intel Integrated Performance Primitives; LUT: trigonometric functions precalculated in DP]




 • The total accuracy of the SSFM is dominated by the FFT accuracy

 • The backward error grows linearly with the number of FFTs

 • The CUDA FFT (CUFFT) shows a considerably higher error than the other FFT
   implementations
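The accumulation of FFT round-off error over many transforms can be illustrated with a small experiment. This sketch emulates single precision by rounding the result of every complex addition and multiplication to IEEE-754 float32 (via `struct`), which only approximates real GPU arithmetic; the twiddle factors are computed in double precision and rounded once, as in a look-up table. It measures the relative error after repeated FFT/IFFT round trips; the helper names are hypothetical and no numbers from the slides are reproduced.

```python
import cmath
import math
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def c32(z: complex) -> complex:
    """Round the real and imaginary parts of z to float32."""
    return complex(f32(z.real), f32(z.imag))


def dp(z: complex) -> complex:
    """Identity: keep full double precision."""
    return z


def fft(x, inverse, rnd):
    """Recursive radix-2 FFT (unnormalised); rnd() is applied after every complex op."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse, rnd)
    odd = fft(x[1::2], inverse, rnd)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = rnd(cmath.exp(sign * 2j * math.pi * k / n))  # twiddle: DP value, stored in working precision
        t = rnd(w * odd[k])
        out[k] = rnd(even[k] + t)
        out[k + n // 2] = rnd(even[k] - t)
    return out


def round_trip_error(x, trips, rnd):
    """Relative L2 error after `trips` forward + inverse FFT pairs in precision rnd."""
    n = len(x)
    ref = [rnd(v) for v in x]        # input as representable in the working precision
    y = list(ref)
    for _ in range(trips):
        y = fft(fft(y, False, rnd), True, rnd)
        y = [rnd(v / n) for v in y]  # 1/n normalisation of the inverse FFT (exact for n = 2**m)
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(y, ref)))
    den = math.sqrt(sum(abs(b) ** 2 for b in ref))
    return num / den


if __name__ == "__main__":
    random.seed(1)
    x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(64)]
    for trips in (1, 4, 16, 64):
        print(trips, round_trip_error(x, trips, c32), round_trip_error(x, trips, dp))
```

Comparing the printed errors for different numbers of round trips shows the accumulation in single precision; the exact growth rate depends on the data and on how the rounding errors correlate, so this toy should not be read as reproducing the linear backward-error growth reported on the slide.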

Analysis: Accuracy

• Why is the accuracy of CUFFT in SP relatively low?

  - FFT accuracy depends crucially on the accuracy of the "twiddle
    factors" (i.e., the trigonometric function values)

  - The hardware implementation of SP trigonometric functions on GPUs is
    optimized for peak performance, not accuracy


• What can be done to increase accuracy in single precision?

  - Implement a Taylor series expansion (slow!)

  - Compute the trigonometric functions in DP on the CPU and store them in
    a look-up table on the GPU
    (especially suited to the split-step Fourier method, which performs
    thousands of FFTs of similar length)

                                                         J. C. Schatzman, SIAM J. Scientific Comput. (1996).
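A minimal sketch of the look-up-table idea, assuming a radix-2 FFT of length n that needs the twiddle factors W_n^k = exp(-2πik/n) for k < n/2: the values are computed once in double precision and only rounded to single precision when stored. For comparison, the same table is generated by repeated single-precision multiplication W^k = W^(k-1)·W; this recurrence is a hypothetical stand-in for a low-accuracy twiddle source, not how CUFFT or the GPU hardware computes trigonometric functions. Float32 storage is emulated with `struct`, and all function names are hypothetical.

```python
import cmath
import math
import struct


def f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def c32(z: complex) -> complex:
    """Store a complex value as two float32 numbers (emulated)."""
    return complex(f32(z.real), f32(z.imag))


def twiddle_lut(n: int):
    """Twiddles W_n^k = exp(-2*pi*i*k/n), k < n/2: computed in DP, stored in SP."""
    return [c32(cmath.exp(-2j * math.pi * k / n)) for k in range(n // 2)]


def twiddle_recurrence_sp(n: int):
    """Hypothetical low-accuracy baseline: W^k = W^(k-1) * W, rounded to SP after
    every multiplication, so each rounding error is carried forward."""
    w1 = c32(cmath.exp(-2j * math.pi / n))
    table = [complex(1.0, 0.0)]
    for _ in range(1, n // 2):
        table.append(c32(table[-1] * w1))
    return table


def max_abs_error(table, n: int) -> float:
    """Largest deviation from the exact (DP) twiddle factors."""
    return max(abs(t - cmath.exp(-2j * math.pi * k / n)) for k, t in enumerate(table))


if __name__ == "__main__":
    n = 4096
    print("LUT (DP -> SP):", max_abs_error(twiddle_lut(n), n))
    print("SP recurrence: ", max_abs_error(twiddle_recurrence_sp(n), n))
```

Rounding a DP value once bounds each table entry's error to about half a float32 ulp per component, whereas computing twiddles with several SP operations lets errors compound. Because the SSFM performs thousands of FFTs of the same or similar length, such a table would only have to be built and uploaded to the GPU once.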

Illustrative Example
[Figure: CUDA FFT (SP) vs. LUT-based FFT (SP), GPU and CPU results]




     •   The look-up-table-based FFT provides significantly higher accuracy in
         single-precision arithmetic
     •   The look-up table holds pre-calculated "twiddle factor" values

                                                                                   Source: S. Pachnicke et al., OFC 2011.

System Analysis (SSFM Simulation)

[Figure: deviation of the required OSNR for BER = 10⁻³ [dB]; GPU simulation (in SP or DP)
vs. CPU simulation (in DP); 11 × 112 Gb/s CP-QPSK]




 •   GPU double-precision results are (almost) identical to CPU results

 •   The OSNR penalty of our single-precision implementation remains below
     0.1 dB up to approx. 125,000 split-steps
                                                                                             Source: S. Pachnicke, IEEE ICTON, 2010.


Combined Simulation in SP & DP
•   Approximately divide the parameter space into strata using fast
    single-precision simulations.
•   The ellipses represent parameter combinations for which bit errors
    occur during transmission.
•   Run double-precision simulations sparsely in the different strata to
    assess the BER.


→ Combined simulation in single and double precision with automatic
  (algorithmic) choice of the number of single-precision simulations
                                                                                P. Serena et al., IEEE JLT, 2009.
                                                                                S. Pachnicke et al., OFC 2011.

Discussion




                                                                   The robustness of the algorithm was
                                                                   checked by deliberately choosing a
                                                                   high number of split-steps (880,000)



 •   Results of combined (SP & DP) GPU simulations agree well with results
     of CPU simulations in DP
 •   Speedups of up to a factor of 180 compared to the CPU are possible
 →   Stratified Monte Carlo sampling allows the number of required DP
     simulations to be chosen algorithmically for a given accuracy


                                                                                    Source: S. Pachnicke et al., OFC 2011.
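The stratified estimator behind the combined SP/DP scheme can be sketched under stated assumptions: the single-precision pre-runs give each stratum s a probability weight w_s and a rough spread σ_s of the per-run outcome (e.g., the BER measured in one run), and a fixed budget of double-precision runs is split in proportion to w_s·σ_s. This is Neyman allocation, a standard rule from stratified sampling; the slides do not say which allocation rule the cited work uses. Strata that receive no DP runs fall back to their SP estimate. All names and numbers are hypothetical.

```python
import math


def neyman_allocation(weights, sigmas, budget):
    """Split `budget` DP runs across strata proportionally to w_s * sigma_s.

    Integer counts use the largest-remainder method so they sum to `budget`.
    """
    scores = [w * s for w, s in zip(weights, sigmas)]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("at least one stratum needs positive weight and spread")
    shares = [budget * sc / total for sc in scores]
    counts = [math.floor(sh) for sh in shares]
    # Give the runs lost to flooring to the strata with the largest fractional parts.
    by_remainder = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[: budget - sum(counts)]:
        counts[i] += 1
    return counts


def stratified_estimate(weights, dp_samples, sp_means):
    """Stratified Monte Carlo estimate sum_s w_s * mean_s.

    mean_s is the mean of the DP outcomes in stratum s, or the SP estimate
    sp_means[s] if the stratum received no DP runs.
    """
    estimate = 0.0
    for w, samples, sp_mean in zip(weights, dp_samples, sp_means):
        mean = sum(samples) / len(samples) if samples else sp_mean
        estimate += w * mean
    return estimate


if __name__ == "__main__":
    weights = [0.7, 0.2, 0.1]  # hypothetical stratum probabilities from SP pre-runs
    sigmas = [0.0, 0.3, 0.5]   # hypothetical spread of the outcome per stratum
    print(neyman_allocation(weights, sigmas, 100))  # [0, 55, 45]
```

Strata with zero estimated spread, such as a region where the SP runs show no bit errors at all, receive no DP runs, so the DP effort concentrates on the error regions; a practical version would likely still reserve a few DP runs per stratum to check the SP classification.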


Design Advantages
 •   GPU parallelization allows a long-haul 80-channel WDM system to be
     simulated on a PC in reasonable time




                                                             Source: C. Xia, D. van den Borne, OFC, 2011




 •   Result: The system performance can be estimated much more precisely than with
     CPU-based simulations, which typically model only 10-channel WDM systems




Conclusion

 • GPUs offer a much higher peak computational performance
   than CPUs

 • The full benefit of GPU power is obtained only in single precision

 • Single-precision accuracy can be increased by pre-computing the
   trigonometric function values (twiddle factors) for the FFTs

 • Simulation-time speedups of more than a factor of 100 compared to
   the CPU are possible




Further Reading

 •   N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, J. Manferdelli, “High
     Performance Discrete Fourier Transforms on Graphics Processors”, Proc. of
     IEEE conference on Supercomputing (SC), article no. 2 (2008).

 •   S. Pachnicke, “Fiber-Optic Transmission Networks: Efficient Design and
     Dynamic Operation”, Springer (2011).

 •   J. C. Schatzman, “Accuracy of the Discrete Fourier Transform and the Fast
     Fourier Transform”, SIAM J. Scientific Comput. 17, 1150-1166 (1996).

 •   G. Falcao, V. Silva, L. Sousa, “How GPUs can outperform ASICs for fast LDPC
     decoding”, Proc. of ACM International Conference on Supercomputing
     (ICS), 390-399 (2009).

 •   J. A. Stratton, S. S. Stone, W.-M. W. Hwu, “MCUDA: An Efficient
     Implementation of CUDA Kernels for Multi-core CPUs”, Lecture Notes in
     Computer Science 5335, 16-30 (2008).

 •   R. R. Exposito, G. L. Taboada, S. Ramos, J. Tourino, R. Doallo, “General-
     purpose computation on GPUs for high performance cloud computing”, Wiley J.
     Concurrency and Computation 24 (2012).




Thank you

spachnicke@advaoptical.com


IMPORTANT NOTICE

The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the
content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly
prohibited.

The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or
representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any
liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages,
alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation.

Copyright © for the entire content of this presentation: ADVA Optical Networking.

OFC/NFOEC: GPU-based Parallelization of System Modeling

  • 1. GPU-based Parallelization of System Modeling Stephan Pachnicke, 18.03.2013
  • 2. Outline • Motivation • Numerical System Modeling • GPU-Parallelization • Comparison of Speedup and Accuracy • Conclusion 2 © 2013 ADVA Optical Networking. All rights reserved.
  • 3. Acknowledgments: The author would like to acknowledge the help and contributions of Adam Chachaj (Krone Messtechnik), Heinrich Müller (TU Dortmund), Peter Krummrich (TU Dortmund), Markus Roppelt (ADVA Optical Networking), and Michael Eiselt (ADVA Optical Networking).
  • 4. Motivation
  • 5. In Short: Computational Performance • Graphical Processing Unit (GPU) vs. CPU Cluster
  • 6. Increase in GFlop/s • GPU peak floating-point performance (GFlop/s) is growing even faster than predicted by Moore's law and is significantly higher than CPU performance • GPUs are therefore also attractive for general-purpose computing (e.g., complex numerical simulations)
  • 7. Optical System Modeling • Simulation of (long-haul) optical transmission systems requires the numerical solution of the nonlinear Schrödinger equation → Accurate simulation of nonlinear fiber effects requires small step sizes and therefore a high computational effort • Precise estimation of the bit error ratio (BER) requires Monte-Carlo simulations of polarization-mode dispersion (PMD) and noise → Requires a high number of simulated bits
  • 8. Split-Step Fourier Method (SSFM) • Splits the nonlinear Schrödinger equation into linear and nonlinear parts • The linear and nonlinear parts are solved separately • The linear part is solved in the frequency domain and the nonlinear part in the time domain (an acceptable approximation for small step sizes) [Figure: sequence of FFT and IFFT operations along the fiber; one split step highlighted]
  • 9. Speedup Factor (GPU vs. CPU) [Figure: speedup factor in single precision (SP) and double precision (DP); DP: Nvidia CUDA FFT, SP: FFT using pre-calculated twiddle factors] • Single-precision arithmetic is much faster on the GPU (because the main target market of GPUs is computer gaming) • Longer block lengths allow better parallelization → A single-precision implementation is desirable
  • 10. Accuracy (in Single Precision) [Figure: accuracy of single-precision FFT implementations; CUFFT: Nvidia CUDA FFT, FFTW: Fastest Fourier Transform in the West, IPP: Intel Integrated Performance Primitives, LUT-based FFT: trigonometric functions pre-calculated in DP] • The total accuracy of the SSFM is dominated by the FFT accuracy • The backward error grows linearly with the number of FFTs • CUFFT shows a considerably higher error than the other FFT implementations
  • 11. Analysis: Accuracy Why is the accuracy of CUFFT in SP relatively low? → The accuracy of the FFT depends crucially on the accuracy of the twiddle factors (i.e., the trigonometric function values) → GPU hardware implementations of single-precision trigonometric functions are optimized for peak performance, not accuracy What can be done to increase accuracy in single precision? → Implement a Taylor series expansion (slow!) → Compute the trigonometric functions in DP on the CPU and store them in a look-up table on the GPU (especially suited to the split-step Fourier method with its thousands of FFTs of similar length) J. C. Schatzman, SIAM J. Scientific Comput. (1996).
  • 12. Illustrative Example [Figure: CUDA FFT (SP) vs. LUT-based FFT (SP), GPU and CPU results] • The look-up-table-based FFT provides significantly higher accuracy in single-precision arithmetic • The look-up table holds pre-calculated twiddle-factor values Source: S. Pachnicke et al., OFC 2011.
  • 13. System Analysis (SSFM Simulation) [Figure: deviation of the required OSNR (dB) for BER = 10⁻³, GPU simulation (SP or DP) vs. CPU simulation (DP), 11 × 112 Gb/s CP-QPSK] • GPU double-precision results are (almost) identical to the CPU results • The OSNR penalty of our single-precision implementation remains below 0.1 dB up to approximately 125,000 split steps Source: S. Pachnicke, IEEE ICTON, 2010.
  • 14. Combined Simulation in SP & DP → Use fast single-precision simulations to divide the parameter space approximately into strata. → The ellipses in the figure represent parameter combinations for which bit errors occur during transmission. → Run double-precision simulations sparsely in the different strata to assess the BER. → Combined simulation with single and double precision and automatic (algorithmic) choice of the number of single-precision simulations. P. Serena et al., IEEE JLT, 2009; S. Pachnicke et al., OFC 2011.
  • 15. Discussion The robustness of the algorithm was checked by deliberately choosing a high number of split steps (880,000) • Results of combined (SP & DP) GPU simulations agree well with those obtained from CPU simulations in DP • A speedup of up to a factor of 180 compared to the CPU is possible → Stratified Monte-Carlo sampling allows the number of DP simulations required for a given accuracy to be chosen algorithmically Source: S. Pachnicke et al., OFC 2011.
  • 16. Design Advantages • GPU parallelization allows a long-distance system with 80 WDM channels to be simulated on a PC in reasonable time (Source: C. Xia, D. van den Borne, OFC, 2011) • Result: The system performance can be estimated much more precisely than with CPU-based simulations, which typically model only 10 WDM channels
  • 17. Conclusion • GPUs offer much higher peak computational performance than CPUs • The full benefit of GPU computing power is achieved only in single precision • Single-precision accuracy can be increased by pre-computing the trigonometric function values used in the FFTs • Simulation time can be reduced by more than a factor of 100 compared to the CPU
  • 18. Further Reading • N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, J. Manferdelli, “High Performance Discrete Fourier Transforms on Graphics Processors”, Proc. of IEEE conference on Supercomputing (SC), article no. 2 (2008). • S. Pachnicke, “Fiber-Optic Transmission Networks: Efficient Design and Dynamic Operation”, Springer (2011). • J. C. Schatzman, “Accuracy of the Discrete Fourier Transform and the Fast Fourier Transform”, SIAM J. Scientific Comput. 17, 1150-1166 (1996). • G. Falcao, V. Silva, L. Sousa, “How GPUs can outperform ASICs for fast LDPC decoding”, Proc. of ACM International Conference on Supercomputing (ICS), 390-399 (2009). • J. A. Stratton, S. S. Stone, W.-M. W. Hwu, “MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs”, Lecture Notes in Computer Science 5335, 16-30 (2008). • R. R. Exposito, G. L. Taboada, S. Ramos, J. Tourino, R. Doallo, “General-purpose computation on GPUs for high performance cloud computing”, Wiley J. Concurrency and Computation 24 (2012).
  • 19. Thank you spachnicke@advaoptical.com IMPORTANT NOTICE The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited. The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation. Copyright © for the entire content of this presentation: ADVA Optical Networking.
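Slide 7 states that precise Monte-Carlo BER estimation requires a high number of simulated bits. A simple binomial argument makes this concrete. The relative standard error of a BER estimate p̂ obtained from n bits is sqrt((1 - p)/(n p)), and this can be solved for n. This is a back-of-the-envelope sketch only. It assumes independent bit errors, whereas PMD- and noise-induced errors in a real link may be correlated, which raises the requirement. The helper name `bits_for_ber` is hypothetical.

```python
import math


def bits_for_ber(ber: float, rel_err: float) -> int:
    """Bits needed so that a Monte-Carlo BER estimate reaches a target relative
    standard error, ASSUMING independent bit errors (binomial model).

    rel_err = sqrt((1 - ber) / (n * ber))  =>  n = (1 - ber) / (ber * rel_err**2)
    """
    if not 0.0 < ber < 1.0 or rel_err <= 0.0:
        raise ValueError("require 0 < ber < 1 and rel_err > 0")
    return math.ceil((1.0 - ber) / (ber * rel_err ** 2))


if __name__ == "__main__":
    # BER = 1e-3 (the target used on slide 13) with a 10 % relative standard error.
    print(bits_for_ber(1e-3, 0.1))
    # A much lower BER drives the required bit count up by orders of magnitude.
    print(bits_for_ber(1e-9, 0.1))
```

Under these assumptions, the first call gives roughly 10⁵ bits and the second roughly 10¹¹ bits. This illustrates why brute-force Monte-Carlo BER estimation becomes expensive at low error ratios.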
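The split-step Fourier method on slide 8 can be sketched in a few lines of NumPy. The sketch makes several assumptions:

- It uses a scalar NLSE of the form dA/dz = -(α/2)A - i(β₂/2)∂²A/∂t² + iγ|A|²A. This is one common sign convention; others flip the sign of the dispersion term.
- It uses the symmetric variant of the splitting: a half linear step, a full nonlinear step, then a half linear step.
- The function `ssfm` and its parameters are illustrative and are not the author's GPU code.

On a GPU, each FFT/IFFT pair would instead be a library call such as CUFFT.

```python
import numpy as np


def ssfm(a0, dt, length, steps, beta2, gamma, alpha=0.0):
    """Symmetric split-step Fourier propagation of a complex field envelope.

    Assumed scalar NLSE (one common sign convention):
        dA/dz = -(alpha/2) A - i (beta2/2) d^2A/dt^2 + i gamma |A|^2 A
    a0: samples of A(t) at z = 0 on a uniform time grid with spacing dt.
    """
    a = np.asarray(a0, dtype=complex)
    h = length / steps                                # step size along z
    omega = 2 * np.pi * np.fft.fftfreq(a.size, d=dt)  # angular frequencies in FFT order
    # Linear operator (dispersion + loss) for half a step, applied in the frequency domain.
    # With NumPy's FFT convention, d/dt corresponds to multiplication by +i*omega.
    half_linear = np.exp((0.5j * beta2 * omega**2 - 0.5 * alpha) * (h / 2))
    for _ in range(steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))     # linear half step (FFT -> IFFT)
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * h)  # nonlinear full step, time domain
        a = np.fft.ifft(half_linear * np.fft.fft(a))     # linear half step (FFT -> IFFT)
    return a


if __name__ == "__main__":
    t = np.linspace(-20.0, 20.0, 1024, endpoint=False)
    pulse = np.exp(-t**2 / 2)  # Gaussian test pulse in normalized units
    out = ssfm(pulse, t[1] - t[0], length=5.0, steps=500, beta2=-1.0, gamma=1.0)
    # With alpha = 0, both sub-steps are pure phase rotations, so energy is conserved.
    print(np.sum(np.abs(pulse) ** 2), np.sum(np.abs(out) ** 2))
```

With α = 0, the pulse energy sum(|A|²) should stay constant up to rounding. This makes a convenient sanity check when porting such a loop to the GPU.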
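Slides 11 and 12 propose a different way of obtaining the twiddle factors. The trigonometric functions are evaluated once in double precision and stored in a look-up table. The single-precision FFT then only reads these pre-rounded values.

The sketch below illustrates that idea on the CPU with NumPy. It assumes `complex64` as a stand-in for GPU single-precision arithmetic. It also uses a textbook iterative radix-2 FFT, since the original GPU kernel is not shown in the slides. All function names are hypothetical.

```python
import numpy as np


def twiddle_lut(n: int) -> np.ndarray:
    """Twiddle factors W_n^k = exp(-2*pi*i*k/n), k = 0..n/2-1, evaluated in
    double precision and rounded once to single precision (the look-up table)."""
    k = np.arange(n // 2)
    return np.exp(-2j * np.pi * k / n).astype(np.complex64)


def bit_reversed_order(n: int) -> np.ndarray:
    """Permutation that puts the input into bit-reversed order (n a power of two)."""
    bits = n.bit_length() - 1
    idx = np.arange(n)
    rev = np.zeros(n, dtype=np.int64)
    for b in range(bits):
        rev |= ((idx >> b) & 1) << (bits - 1 - b)
    return rev


def fft_sp_lut(x: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Iterative radix-2 decimation-in-time FFT in single precision.
    Twiddle factors are only read from `lut`; no trigonometric calls happen here."""
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = np.asarray(x).astype(np.complex64)[bit_reversed_order(n)]
    size = 2
    while size <= n:
        half = size // 2
        w = lut[:: n // size]                # W_size^j = W_n^(j*n/size) for j < half
        blocks = a.reshape(-1, size)         # independent butterfly groups
        even, odd = blocks[:, :half], blocks[:, half:] * w
        a = np.concatenate([even + odd, even - odd], axis=1).reshape(-1)
        size *= 2
    return a


if __name__ == "__main__":
    n = 4096
    lut = twiddle_lut(n)  # built once, reused for every FFT of length n
    x = np.random.default_rng(0).standard_normal(n) + 0j
    y = fft_sp_lut(x, lut)
    ref = np.fft.fft(x)   # double-precision reference
    print(np.linalg.norm(y - ref) / np.linalg.norm(ref))  # relative error of the SP result
```

In the SSFM, thousands of FFTs of the same length are typical. The table is therefore built once and its cost is amortized over every transform.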
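Slides 14 and 15 combine two kinds of runs. Cheap single-precision runs partition the parameter space into strata, and sparse double-precision runs are then made inside each stratum. The generic stratified Monte-Carlo estimator underlying such a scheme is sketched below in plain Python.

The following are hypothetical placeholders, not part of the original work:

- the stratum weights and samplers;
- the one-dimensional parameter;
- the `evaluate` callable, which stands in for an expensive DP simulation returning a BER.

The algorithmic choice of strata and sample counts described on the slides is not reproduced here.

```python
import random
from typing import Callable, Sequence


def stratified_estimate(
    weights: Sequence[float],
    samplers: Sequence[Callable[[random.Random], float]],
    evaluate: Callable[[float], float],
    samples_per_stratum: Sequence[int],
    rng: random.Random,
) -> float:
    """Stratified Monte-Carlo estimate of E[evaluate(X)].

    weights[s]             probability mass of stratum s (should sum to 1)
    samplers[s](rng)       draws a parameter point from stratum s
    evaluate(x)            expensive model, e.g. a DP simulation returning a BER
    samples_per_stratum[s] number of expensive evaluations spent in stratum s
    """
    if not (len(weights) == len(samplers) == len(samples_per_stratum)):
        raise ValueError("need one weight, sampler and sample count per stratum")
    estimate = 0.0
    for w, draw, n in zip(weights, samplers, samples_per_stratum):
        if n < 1:
            raise ValueError("every stratum needs at least one sample")
        stratum_mean = sum(evaluate(draw(rng)) for _ in range(n)) / n
        estimate += w * stratum_mean  # weight each stratum mean by its probability mass
    return estimate


if __name__ == "__main__":
    def toy_ber(x: float) -> float:
        # Hypothetical model: bit errors only occur for parameter values x >= 0.9.
        return 1e-3 if x >= 0.9 else 0.0

    samplers = [lambda r: 0.9 * r.random(), lambda r: 0.9 + 0.1 * r.random()]
    # Spend few expensive runs in the benign stratum and more where errors occur.
    print(stratified_estimate([0.9, 0.1], samplers, toy_ber, [5, 50], random.Random(1)))
```

The point of the stratification is the allocation. Expensive evaluations can be concentrated where the cheap pre-screening indicates errors, while every stratum still contributes with its correct weight.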