XStream: Rapid Generation of Custom Processors for ASIC Designs
by Ali Shahbazi
2
Overview
• What is XStream?
• Comparison to Network Processors
• Design Flow
• Design Example: Ethernet Bridge/VLAN Switch
3
What is XStream?
• Software tool to rapidly generate high-performance custom stream processors
  • Stream processing: repeated application of an algorithm kernel to a sequence of packets, subject to throughput specifications
• Resulting custom processors:
  • 40-90% of the performance of a custom ASIC
  • < 5% of the design effort of a custom ASIC
• Rapidly develop your own ultra-high-performance network processors!
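The definition of stream processing above can be illustrated with a small sketch. The names used here (`Packet`, `ttl_kernel`, `run_stream`) are hypothetical and are not part of XStream; the sketch only shows one kernel being applied repeatedly to every packet in a sequence, and it does not model throughput constraints.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: a payload plus a time-to-live counter."""
    payload: bytes
    ttl: int


def ttl_kernel(pkt: Packet) -> Packet:
    """Illustrative kernel: decrement the TTL, never going below zero."""
    return Packet(pkt.payload, max(pkt.ttl - 1, 0))


def run_stream(kernel: Callable[[Packet], Packet],
               packets: Iterable[Packet]) -> Iterator[Packet]:
    """Apply the same kernel to each packet, in arrival order."""
    for pkt in packets:
        yield kernel(pkt)


stream_in = [Packet(b"a", 3), Packet(b"b", 1), Packet(b"c", 0)]
stream_out = list(run_stream(ttl_kernel, stream_in))
```

In a generated stream processor the kernel would be compiled to hardware-scheduled instructions; the Python loop stands in for that only conceptually.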
4
When you use a Network Processor
[Images: "What your product looks like" and "What your competitor's product looks like"]
6
XStream vs Network Processor
What if my application does not look like this?
Network Processor: No help
XStream: Make a system that looks like my app in days
8
XStream vs Network Processor
What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
Network Processor: No help
XStream: Select a different controller from the GUI and plop it on the chip
9
XStream vs Network Processor
• What if I need
  • A different type/number of micro-engines
  • A more capable control processor
  • Additional high-performance processors for value-added services
  • More crypto cores
  • Different trie lookup hardware
  • Different DRAM bandwidth
  • Etc.
• Network processor: No help
• XStream: Yes
10
Design Flow
• Draw an architecture diagram for your application
• Select processors, interfaces, IP blocks, etc. from a GUI
• Specify parameters, throughput requirements, etc.
• Specify the high-level function of any additional custom coprocessors you need
• Press a button and wait...
• XStream generates the hardware for you
11
Design Example
• Objective:
  • Design a platform chip that is shared across different products to save cost
  • Product 1: 16-port Ethernet bridge
  • Product 2: 16-port VLAN switch with advanced filtering abilities
• Major differences:
  • Wimpy ingress/egress processors are OK on the bridge
  • VLAN switch needs high-performance ingress/egress processors
  • VLAN switch needs a high-performance filter rule engine
12
XStream: Designing a Platform Chip
[Block diagram. Blocks: Link Interface, Port Ingress Processor, and Port Egress Processor (repeated for 16 ports); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
13
The Streams in XStream
[Block diagram: same platform-chip blocks as on slide 12]
14
The Streams in XStream
[Block diagram: same platform-chip blocks as on slide 12]
15
The Streams in XStream
[Block diagram: same platform-chip blocks as on slide 12]
16
XStream: Mapping the core processor
[Block diagram: same platform-chip blocks as on slide 12]
17
XStream: Mapping the core processor...
[Diagram detail: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
• Imagine a snazzy GUI here
• Designer says:
  • Stream processor, 8-issue
  • Stream 1: Input, 16x1 queue, N deep
  • Stream 2: Output, 16x1 queue, M deep
  • Stream 3: Inout, RISC processor interface
  • Add a CAM: 2-port, 48-bit keys, 1024 entries, 4-way associative, hash=F(…)
• The tool ponders for a while…
  • Says: "Yes master"
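To make the lookup-table request above concrete, here is a minimal software sketch of a hashed, 4-way set-associative table with 48-bit keys and 1024 entries. Everything beyond those three numbers is an assumption: the class name `SetAssociativeTable`, the stored value (an output port), and in particular `placeholder_hash`, which merely stands in for the unspecified `hash=F(…)`. The two access ports and any eviction policy are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

KEY_BITS = 48            # from the slide: 48-bit keys
ENTRIES = 1024           # from the slide: 1024 entries
WAYS = 4                 # from the slide: 4-way associative
SETS = ENTRIES // WAYS   # 256 sets of 4 ways each


def placeholder_hash(key: int) -> int:
    """Assumed hash: XOR-fold the key, one byte at a time, into a set index.
    The slide leaves hash=F(...) unspecified; this is only a stand-in."""
    folded = 0
    while key:
        folded ^= key & (SETS - 1)
        key >>= 8
    return folded


@dataclass
class SetAssociativeTable:
    sets: List[List[Tuple[int, int]]] = field(
        default_factory=lambda: [[] for _ in range(SETS)])

    def insert(self, key: int, port: int) -> bool:
        """Store key -> port; return False if the selected set is full."""
        if not 0 <= key < (1 << KEY_BITS):
            raise ValueError("key must fit in 48 bits")
        ways = self.sets[placeholder_hash(key)]
        for i, (stored_key, _) in enumerate(ways):
            if stored_key == key:
                ways[i] = (key, port)   # update an existing entry
                return True
        if len(ways) >= WAYS:
            return False                # no eviction policy is modelled
        ways.append((key, port))
        return True

    def lookup(self, key: int) -> Optional[int]:
        """Return the stored port, or None on a miss."""
        for stored_key, port in self.sets[placeholder_hash(key)]:
            if stored_key == key:
                return port
        return None


table = SetAssociativeTable()
table.insert(0x001122334455, 7)
```

A 48-bit key matches the width of an Ethernet MAC address, which is presumably why the bridge example asks for it; a lookup only has to compare against the four ways of one set rather than all 1024 entries.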
18
XStream: Mapping the core processor...
[Diagram detail: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
• Imagine a snazzy GUI here
• Designer writes 15 lines of code for the data plane, say in a subset of C
• Designer says: Schedule and report
• The tool ponders for a while… Says:
  • Compiled 45 instructions
  • Using modulo accelerator
  • Initiation interval = 8 cycles
  • Clock speed: 500 MHz
  • Throughput based on 64-byte (worst case) packet size:
    • 500 MHz / 8 cycles × 64 bytes × 8 bits/byte = 32 Gb/s
  • Area: 2.5 mm x 2.5 mm
  • Power: 1.2 W
• Single stream processor @ 500 MHz = 32 Gb/s
• Have designed up to 1 GHz processor in 0.13 µm process
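The throughput figure on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes, as the slide's formula implies, that one packet completes every initiation interval; the function name `throughput_bps` and the even split across 16 ports are illustrative assumptions rather than tool output.

```python
def throughput_bps(clock_hz: float, initiation_interval: int,
                   packet_bytes: int) -> float:
    """Bits per second when one packet is started every initiation interval."""
    packets_per_second = clock_hz / initiation_interval
    return packets_per_second * packet_bytes * 8  # 8 bits per byte


# Figures from the slide: 500 MHz clock, initiation interval of 8 cycles,
# 64-byte worst-case packets.
core_bps = throughput_bps(500e6, 8, 64)

# Assumption: the aggregate rate is shared evenly by the 16 ports.
per_port_bps = core_bps / 16

print(f"core: {core_bps / 1e9:.0f} Gb/s, per port: {per_port_bps / 1e9:.0f} Gb/s")
```

With these inputs the core rate comes to 62.5 million packets per second, or 32 Gb/s, matching the slide.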
19
XStream: Mapping the ingress processor...
[Block diagram: same platform-chip blocks as on slide 12]
20
XStream: Mapping the ingress processor...
[Diagram detail: Port Ingress Processor, Filter Rule Engine]
• Imagine a snazzy GUI here
• Designer says:
  • RISC processor engine, no cache
  • 2-issue, scratchpad memory
  • Stream 1: Input, link interface
  • Stream 2: Output, StreamProc:Ingress Queue
  • Add a Filter Rule Engine: Rule complexity = 64 terms, …
• The tool ponders for a while… Says:
  • RISC core and compiler generated
  • Area: 1 mm x 1 mm (i.e. this can be replicated 100x on a 10 x 10 mm chip)
  • Power: 250 mW
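The replication claim above is simple area arithmetic, sketched here together with a power estimate for one ingress processor per port. The helper names are hypothetical, and the calculation is an idealised assumption: it ignores routing, I/O, memories, and every other block on the chip.

```python
def max_replicas(chip_side_mm: float, core_side_mm: float) -> int:
    """Number of square cores that tile a square die, ignoring all overhead."""
    per_side = int(chip_side_mm // core_side_mm)
    return per_side * per_side


def total_power_w(copies: int, power_per_core_w: float) -> float:
    """Power of `copies` identical cores, assuming it scales linearly."""
    return copies * power_per_core_w


# Figures from the slide: 1 mm x 1 mm core, 250 mW, 10 mm x 10 mm chip.
replicas = max_replicas(10.0, 1.0)

# Assumption: the 16-port design uses one ingress processor per port.
ingress_power_w = total_power_w(16, 0.250)

print(f"{replicas} cores fit; 16 ingress processors draw {ingress_power_w} W")
```

Under these assumptions, sixteen ingress processors would occupy about 16 mm² and draw about 4 W in total.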
21
Summary
• Showed network processor design
  • But might as well be multimedia or wireless product design
• Very high performance custom processors replace ASIC modules
  • Reduce design time for stream-oriented ASIC modules by 95%
  • Retain 40-90% of ASIC performance
• Software replaces hardware design
  • Software prototype already exists
  • Flexible, fast bug fixes, feature upgrades
  • Share chip across product family

