2. 2
Overview
What is XStream?
Comparison to Network Processors
Design Flow
Design Example: Ethernet Bridge/VLAN Switch
3. 3
What is XStream?
Software tool to rapidly generate high performance custom stream processors
Stream Processing: Repeated application of an algorithm kernel to a sequence of packets, subject to throughput specifications
Resulting custom processors:
40-90% of the performance of a custom ASIC
< 5% of the design effort of a custom ASIC
Rapidly develop your own ultra high performance network processors!
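The definition of stream processing above can be illustrated with a minimal sketch: one kernel function applied, in order, to every packet in a sequence. The kernel here is a learning-bridge forwarding decision, chosen only because the deck's design example is an Ethernet bridge. The names `Packet`, `FLOOD`, `make_bridge_kernel`, and `run_stream` are hypothetical and are not part of XStream.

```python
from typing import Callable, Dict, Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    src_mac: int  # 48-bit source address
    dst_mac: int  # 48-bit destination address
    in_port: int  # port the packet arrived on


FLOOD = -1  # hypothetical marker meaning "destination unknown, flood the packet"


def make_bridge_kernel() -> Callable[[Packet], int]:
    """Return a kernel that learns source addresses and picks an output port."""
    mac_table: Dict[int, int] = {}  # address -> port, kept across packets

    def kernel(pkt: Packet) -> int:
        mac_table[pkt.src_mac] = pkt.in_port      # learn where the sender is
        return mac_table.get(pkt.dst_mac, FLOOD)  # forward if known, else flood

    return kernel


def run_stream(kernel: Callable[[Packet], int],
               packets: Iterable[Packet]) -> Iterator[int]:
    """Apply the same kernel to each packet of the stream, in arrival order."""
    for pkt in packets:
        yield kernel(pkt)


packets = [Packet(0xA, 0xB, 0), Packet(0xB, 0xA, 1), Packet(0xA, 0xB, 0)]
decisions = list(run_stream(make_bridge_kernel(), packets))
```

In this sketch the first packet is flooded because its destination has not been seen yet; the later two are forwarded to the learned ports. The throughput specification is not modelled here.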
4. 4
When you use a Network Processor
What your product looks like
What your competitor’s product looks like
6. 6
XStream vs Network Processor
What if my application does not look like this?
Network Processor: No help
XStream: Make a system that looks like my app in days
8. 8
XStream vs Network Processor
What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
Network Processor: No help
XStream: Select a different controller from the GUI and plop it on the chip
9. 9
XStream vs Network Processor
What if I need:
Different type/number of micro-engines
More capable control processor
Additional high performance processors for value added services
More crypto cores
Different trie lookup hardware
Different DRAM bandwidth
Etc, etc, etc
Network Processor: No help
XStream: Yes
10. 10
Design Flow
Draw an architecture diagram for your application
Select processors, interfaces, IP blocks, etc. from a GUI
Specify parameters, throughput requirements, etc.
Specify the high level function of any additional custom coprocessors you need
Press a button and wait...
XStream generates the hardware for you
11. 11
Design Example
Objective:
Design a platform chip that is shared across different products to save cost
Product 1: 16 port Ethernet Bridge
Product 2: 16 port VLAN switch with advanced filtering abilities
Major differences:
Wimpy ingress/egress processors are OK on the bridge
VLAN switch needs high performance ingress/egress processors
VLAN switch needs a high performance filter rule engine
12. 12
XStream: Designing a Platform Chip
[Figure: platform chip block diagram. Blocks: Link Interface, Port Ingress Processor, and Port Egress Processor (x16 ports); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
13. 13
The Streams in XStream
[Figure: platform chip block diagram. Blocks: Link Interface, Port Ingress Processor, and Port Egress Processor (x16 ports); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
16. 16
XStream: Mapping the core processor
[Figure: platform chip block diagram. Blocks: Link Interface, Port Ingress Processor, and Port Egress Processor (x16 ports); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
17. 17
XStream: Mapping the core processor...
[Figure: Ingress Queue, Egress Queue, and Stream Processor for Switching Decisions]
Imagine a snazzy GUI here
Designer says:
Stream processor, 8 issue
Stream 1: Input, 16x1 queue, N deep
Stream 2: Output, 16x1 queue, M deep
Stream 3: Inout, RISC processor interface
Add a CAM: 2 port, 48 bit keys, 1024 entries, 4 way associative, hash=F(…)
The tool ponders for a while…
Says: “Yes master”
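The designer's request on this slide can be written down as structured data, which is roughly what a GUI would have to capture. This is only a sketch: the class and field names (`Stream`, `Cam`, `StreamProcessorSpec`) are invented for illustration and do not describe XStream's actual input format. The queue depths N and M and the hash function F(…) are left unspecified, as on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stream:
    name: str
    direction: str  # "input", "output", or "inout"
    endpoint: str   # what the stream connects to, copied from the slide


@dataclass(frozen=True)
class Cam:
    ports: int
    key_bits: int
    entries: int
    ways: int

    @property
    def sets(self) -> int:
        # A W-way set-associative table with E entries has E / W sets.
        return self.entries // self.ways


@dataclass(frozen=True)
class StreamProcessorSpec:
    issue_width: int
    streams: Tuple[Stream, ...]
    cam: Cam


# Values below are the ones stated on the slide.
core = StreamProcessorSpec(
    issue_width=8,
    streams=(
        Stream("Stream 1", "input", "16x1 queue, N deep"),
        Stream("Stream 2", "output", "16x1 queue, M deep"),
        Stream("Stream 3", "inout", "RISC processor interface"),
    ),
    cam=Cam(ports=2, key_bits=48, entries=1024, ways=4),
)
```

With 1024 entries and 4-way associativity, `core.cam.sets` works out to 256 sets. The 48-bit key width matches the size of an Ethernet MAC address.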
18. 18
XStream: Mapping the core processor...
[Figure: Ingress Queue, Egress Queue, and Stream Processor for Switching Decisions]
Imagine a snazzy GUI here
Designer writes 15 lines of code for the data plane, say in a subset of C
Designer says: Schedule and report
The tool ponders for a while… Says:
Compiled 45 instructions
Using modulo accelerator
Initiation interval = 8 cycles
Clock speed: 500 MHz
Throughput based on 64 byte (worst case) packet size:
(500 MHz / 8) * 64 bytes * 8 bits/byte = 32 Gb/s
Area: 2.5mm x 2.5mm
Power: 1.2 W
Single stream processor @ 500 MHz = 32 Gb/s
Have designed processors of up to 1 GHz in a 0.13 µm process
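The throughput figure on this slide can be reproduced with a short calculation. The sketch assumes, as the slide's formula implies, that one packet is started every initiation interval. The function name `throughput_bps` is made up for illustration.

```python
def throughput_bps(clock_hz: float,
                   initiation_interval_cycles: int,
                   packet_bytes: int) -> float:
    """Bits per second when one packet starts every initiation interval."""
    packets_per_second = clock_hz / initiation_interval_cycles
    return packets_per_second * packet_bytes * 8  # 8 bits per byte


# Numbers from the slide: 500 MHz clock, 8-cycle interval, 64-byte packets.
packet_rate = 500e6 / 8                # 62.5 million packets per second
rate = throughput_bps(500e6, 8, 64)    # 32e9 bits per second, i.e. 32 Gb/s
```

The 64-byte packet is the worst case for packet rate: the link must be kept full with the smallest packets, so the processor has to make the most decisions per second.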
19. 19
XStream: Mapping the ingress processor...
[Figure: platform chip block diagram. Blocks: Link Interface, Port Ingress Processor, and Port Egress Processor (x16 ports); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
20. 20
XStream: Mapping the ingress processor...
[Figure: Port Ingress Processor and Filter Rule Engine]
Imagine a snazzy GUI here
Designer says:
RISC processor engine, no-cache
2 issue, scratchpad memory
Stream 1: Input, link interface
Stream 2: Output, StreamProc:Ingress Queue
Add a Filter Rule Engine: Rule complexity = 64 terms, …
The tool ponders for a while… Says:
RISC core and compiler generated
Area: 1mm x 1mm (i.e. this can be replicated 100x on a 10x10mm chip)
Power: 250 mW
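The replication claim is simple tiling arithmetic, sketched below. It ignores routing, pads, and every other block on the chip, so it is an upper bound rather than a floorplan. The per-port totals assume one such ingress processor per port on the 16 port design, which the slides suggest but do not state outright. The function name `copies_that_fit` is hypothetical.

```python
def copies_that_fit(chip_w_mm: float, chip_h_mm: float,
                    core_w_mm: float, core_h_mm: float) -> int:
    """How many cores tile the die, counting whole cores per row and column."""
    return int(chip_w_mm // core_w_mm) * int(chip_h_mm // core_h_mm)


# Numbers from the slide: 1mm x 1mm core on a 10x10mm chip.
max_copies = copies_that_fit(10, 10, 1, 1)    # 100 copies

# Assumption: 16 ports, one ingress processor each, 250 mW and 1 mm^2 per core.
PORTS = 16
ingress_power_mw = PORTS * 250                # 4000 mW in total
ingress_area_mm2 = PORTS * 1 * 1              # 16 mm^2 in total
```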
21. 21
Summary
Showed network processor design
But might as well be multi-media or wireless product design
Very high performance custom processors replace ASIC modules
Reduce design time for stream oriented ASIC modules by 95%
Retain 40-90% of ASIC performance
Software replaces hardware design
Software prototype already exists
Flexible, fast bug fixes, feature upgrades
Share chip across product family