SlideShare a Scribd company logo
1 of 14
Parallel And Distributed System Lab (PADS)
National Tsing Hua University, Hsinchu, Taiwan
Attackboard:
A Novel Dependency-Aware
Traffic Generator for Exploring
NoC Design Space
Yoshi Shih-Chieh Huang, June 5, 2012
Yu-Chi Chang
Tsung-Chan Tsai
Yuan-Ying Chang
Chung-Ta King
Problem Domain
 We want to use trace-driven simulation to explore network-on-
chip (NoC) performance for message-passing many-core
architecture
 Traces that record only send/receive events:
 Simple and fast 
 Lacks information about interaction between NoC and PEs 
 Unable to reflect the effects of changes in design space 
 Storage space overhead (in terms of gigabytes) 
 Recent dependency-aware traces: [1][2]
 Packet dependencies are embedded in traces 
 Packet injections can be adjusted based on dependencies when
facing different NoC configurations 
 Trace logs are very complicated and require much more space 
Traces with packet dependencies improve
accuracy but require more storage space!
2
The BIG Problem Is: Size!
 How to reduce size of dependency-aware traces while
maintaining its accuracy?
 Lossless compression, e.g., Gzip?  not enough
 Key insight:
 Each PE has its own BIG trace for NoC operations
 Each BIG trace is actually a log of the execution of the
corresponding State Machine
 10 KB codes may result in 1GB traces!
S1
Sn
S2
…
S1
Sn
S2
…
S1
Sn
S2
…
…
…….
…….
……
….
…...
…
….
…
…….
….
…….
…
3
Rebuild the State Machines
 What if we can rebuild the interacting state
machines from the traces…?
 We don’t have to record the huge traces but only
need to generate them at runtime!
 How to do?
 Intuitive idea: find repetitive patterns in traces
and fold them
 Difficulties:
 May not be matched exactly
 May be discrete and fragmented in traces
 Need to find patterns across the traces resulted by
different PEs
4
Rebuild the State Machines (cont’d)
 State transitions in state machines are
often triggered by arrivals of packets
 Leverage packet dep. information in traces
 Forget about time sequencing in traces
 How long to wait before start detecting
receiving pattern?  interval-based
 Interval I for capturing receiving patterns
 Captured results are attackboards
 Interval I’ for reproducing traffic
 For driving the attackboards
5
PEj
I
Data Structure of an Attackboard
 Each PE has an attackboard
 An injection Y is allowed if the necessary conditions
are satisfied
 Necessary conditions = predecessors = must receive
packets from X
Requires
pkt from
1?
Requires
pkt from
2?
Requires
pkt from
3?
…
Requires
pkt from
X?
Inject to …
Yes Yes No Dest. Y
necessary conditions Inject if satisfied
…
6
Attackboard Traffic Generator
1. Rebuild the state machines as attackboards
 Use packet arrival patterns in traces to rebuild
the state machine
 Space complexity:
O(execution time)  O(# of patterns)
2. Compact the states
 Merge duplicated patterns
3. Drive the state machines, i.e., attackboard
 Inject traffic based on the rebuilt machine
7
An Illustration (Viewpoint of PE 4)
Parallel Program Execution
Executionflowofparallelprogram
PE1 PE2 PE3 PE4
recv 1
recv 2
send 1
Interval I
Packets Dependencies Injection Info.
1 1 0 0 (3, flit counts)
Attackboard entries of PE 4
 Compress the entries with the
same packet dependencies
 Merge duplicated entries
recv 5
recv 6
recv 7
send 3
Interval I
recv 3
recv 4
send 2
Interval I
Packets Dependencies Injection Info.
1 1 1 0 (2, flit counts)
Packets Dependencies Injection Info.
1 1 0 0 (3, flit counts)
Packets Dependencies Injection Info.
1 1 0 0 (3, flit counts)
recv 5
recv 6
recv 7
send 3
Interval I
How long to wait before start
detecting receiving pattern?
Interval I decides the pattern!
8
Driving the Attackboards!
 Traffic generation with
Attackboard
 Router receives stats are
cleared every I’ interval
 What if no exact match in I’?
 Match the entry which has the
highest similarity
 Similar to bloom filter (deny
for sure, allow with high
confidence)
Attackboard Traffic Generator (ATG)
1 1 0 0 traffic
Router
ATG
NoC
0 0 0 0
current
receive sourceInject!
Attackboard of PE4
router receive status
of PE4
1 0 0 01 1 0 0
Match!
Design details are left in the poster.
You are welcome to take a look!
9
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1200 1210 1220 1230 1240 1250 1260 1270 1280 1290
Averagenetworkdelay(normalized)
Interval of traffic generation (I')
100
10K
15K
Dependency
extraction
interval I
Space Overhead and Accuracy
1 S.Bell et al.[ISSCC 2008]
Storage Space Overhead
Storage space can be
greatly reduced!
20% storage space can be
reduced in computation-
intensive benchmark!
0
0.5
1
1.5
2
2.5
3
3.5
4
800 900 10001100120013001400150016001700
Averagenetworkdelay(normalized)
Interval of traffic generation (I')
100
1000
5000
10000
Dependency
extraction
interval I
IMB-BcastParallel Object Detection
I and I’ should be
properly selected to give
accurate results
In this case, I does not have much
impact compared with POD.
IMB-Bcast POD (2 frames)
Simulation Platform
Processor element Tilera TILE64 1
Processor frequency 700 Mhz
Simulated topology 4×4 mesh network
Routing algorithm Dimension-order
Bandwidth 1 flit/cycle per port
Benchmark Intel MPI Benchmarks
Parallel Object Detection
10
Conclusion & Future Works
 Attackboard not only compresses the NoC traces, but
generates them at runtime
 Key ideas:
 Programs  raw trace logs  use arrival patterns to
rebuild state machine (as Attackboards)  rebuild trace
logs
 Benefits
 Strikes a good tradeoff between accuracy and space overhead
 Limitations
 Currently only for message-passing programs
 Only suitable for injections with strong dependencies
 Future works
 Take the computation time before an injection into consideration
 Make the tricky parameters (I and I’) disappeared
11
Thank You!
To learn more, please come to my poster!
Selected References
1. Netrace: dependency-driven trace-based network-on-chip
simulation.
 Joel Hestness, Boris Grot, and Stephen W. Keckler. 2010.
 In Proceedings of the Third International Workshop on Network
on Chip Architectures (NoCArc '10)
2. Inferring packet dependencies to improve trace based
simulation of on-chip networks.
 Christopher Nitta, Matthew Farrens, Kevin Macdonald, and
Venkatesh Akella. 2011.
 In Proceedings of the Fifth ACM/IEEE International Symposium
on Networks-on-Chip (NOCS '11)
Backup: More Frames in POD
Trace logs O(execution time)
Attackboard O(# of patterns)
1
10
100
1000
10000
100000
1000000
10000000
2 50 100 1000
Trace logs
Attackboard
Size(bytes)
# of frames

More Related Content

What's hot

Chap06 block cipher operation
Chap06 block cipher operationChap06 block cipher operation
Chap06 block cipher operation
Nam Yong Kim
 
Debugging With Id
Debugging With IdDebugging With Id
Debugging With Id
guest215c4e
 

What's hot (18)

CNIT 141: 5. More About Block Ciphers + Modular Arithmetic 2
CNIT 141: 5. More About Block Ciphers + Modular Arithmetic 2CNIT 141: 5. More About Block Ciphers + Modular Arithmetic 2
CNIT 141: 5. More About Block Ciphers + Modular Arithmetic 2
 
Secured algorithm for gsm encryption & decryption
Secured algorithm for gsm encryption & decryptionSecured algorithm for gsm encryption & decryption
Secured algorithm for gsm encryption & decryption
 
CNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream CiphersCNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream Ciphers
 
Botnet detection using cluster ensemble with cart
Botnet detection using cluster ensemble with cartBotnet detection using cluster ensemble with cart
Botnet detection using cluster ensemble with cart
 
Block Ciphers and the Data Encryption Standard
Block Ciphers and the Data Encryption StandardBlock Ciphers and the Data Encryption Standard
Block Ciphers and the Data Encryption Standard
 
Block Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For AuthenticationBlock Cipher Modes of Operation And Cmac For Authentication
Block Cipher Modes of Operation And Cmac For Authentication
 
Chap06 block cipher operation
Chap06 block cipher operationChap06 block cipher operation
Chap06 block cipher operation
 
Next generation block ciphers
Next generation block ciphersNext generation block ciphers
Next generation block ciphers
 
Modern Cryptography
Modern CryptographyModern Cryptography
Modern Cryptography
 
Debugging With Id
Debugging With IdDebugging With Id
Debugging With Id
 
Cryptography
CryptographyCryptography
Cryptography
 
RC4&RC5
RC4&RC5RC4&RC5
RC4&RC5
 
Unit 2
Unit 2Unit 2
Unit 2
 
Software Security
Software SecuritySoftware Security
Software Security
 
Post quantum cryptography
Post quantum cryptographyPost quantum cryptography
Post quantum cryptography
 
An effective RC4 Stream Cipher
An effective RC4 Stream CipherAn effective RC4 Stream Cipher
An effective RC4 Stream Cipher
 
Unikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based KernelsUnikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based Kernels
 
RC 4
RC 4 RC 4
RC 4
 

Viewers also liked

Final keeping the promise
Final keeping the promiseFinal keeping the promise
Final keeping the promise
3GDR
 
11h04 lesley ann
11h04 lesley ann11h04 lesley ann
11h04 lesley ann
3GDR
 
História Japonesa
História JaponesaHistória Japonesa
História Japonesa
Isabel Rosa
 

Viewers also liked (20)

Final keeping the promise
Final keeping the promiseFinal keeping the promise
Final keeping the promise
 
Código de ética do profissional de Biblioteconomia
Código de ética do profissional de BiblioteconomiaCódigo de ética do profissional de Biblioteconomia
Código de ética do profissional de Biblioteconomia
 
11h04 lesley ann
11h04 lesley ann11h04 lesley ann
11h04 lesley ann
 
HEADACHE
HEADACHE HEADACHE
HEADACHE
 
Google IO 2015: Whats New?
Google IO 2015: Whats New?Google IO 2015: Whats New?
Google IO 2015: Whats New?
 
História Japonesa
História JaponesaHistória Japonesa
História Japonesa
 
Architectural modeling
Architectural modelingArchitectural modeling
Architectural modeling
 
Las familias navarras, las que mas gastan en ocio, cultura y espectaculos. di...
Las familias navarras, las que mas gastan en ocio, cultura y espectaculos. di...Las familias navarras, las que mas gastan en ocio, cultura y espectaculos. di...
Las familias navarras, las que mas gastan en ocio, cultura y espectaculos. di...
 
Redes Sociales
Redes SocialesRedes Sociales
Redes Sociales
 
Nuance Guide to Advancing the mHealth ecosystem
Nuance Guide to Advancing the mHealth ecosystemNuance Guide to Advancing the mHealth ecosystem
Nuance Guide to Advancing the mHealth ecosystem
 
Nikola Tesla's smart Home
Nikola Tesla's smart Home Nikola Tesla's smart Home
Nikola Tesla's smart Home
 
解密胡子节(Movember) -社会化媒体之现象2015
解密胡子节(Movember) -社会化媒体之现象2015解密胡子节(Movember) -社会化媒体之现象2015
解密胡子节(Movember) -社会化媒体之现象2015
 
Compétition Levis 501 Sup de Pub
Compétition Levis 501 Sup de PubCompétition Levis 501 Sup de Pub
Compétition Levis 501 Sup de Pub
 
Storyboard_APP_TV-ORANGE
Storyboard_APP_TV-ORANGEStoryboard_APP_TV-ORANGE
Storyboard_APP_TV-ORANGE
 
5 Proven Ways to Leverage Net Promoter Score
5 Proven Ways to Leverage Net Promoter Score 5 Proven Ways to Leverage Net Promoter Score
5 Proven Ways to Leverage Net Promoter Score
 
Vertigo
VertigoVertigo
Vertigo
 
Ερνέστο Τσε Γκεβάρα
Ερνέστο Τσε ΓκεβάραΕρνέστο Τσε Γκεβάρα
Ερνέστο Τσε Γκεβάρα
 
O átomo e sua estrutura -
O átomo e sua estrutura - O átomo e sua estrutura -
O átomo e sua estrutura -
 
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
Micro plasma welding technology and applications ppt
Micro plasma welding technology and applications pptMicro plasma welding technology and applications ppt
Micro plasma welding technology and applications ppt
 

Similar to Attackboard slides dac12-0605

Procedia Computer Science 46 ( 2015 ) 1072 – 1078 187.docx
Procedia Computer Science   46  ( 2015 )  1072 – 1078 187.docxProcedia Computer Science   46  ( 2015 )  1072 – 1078 187.docx
Procedia Computer Science 46 ( 2015 ) 1072 – 1078 187.docx
AASTHA76
 
Network Flow Analysis
Network Flow AnalysisNetwork Flow Analysis
Network Flow Analysis
guest23ccda3
 
Network Flow Analysis
Network Flow AnalysisNetwork Flow Analysis
Network Flow Analysis
guest23ccda3
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
igeeks1234
 

Similar to Attackboard slides dac12-0605 (20)

ECET 465 help Making Decisions/Snaptutorial
ECET 465 help Making Decisions/SnaptutorialECET 465 help Making Decisions/Snaptutorial
ECET 465 help Making Decisions/Snaptutorial
 
ECET 465 Technology levels--snaptutorial.com
ECET 465 Technology levels--snaptutorial.comECET 465 Technology levels--snaptutorial.com
ECET 465 Technology levels--snaptutorial.com
 
Ecet 465 Massive Success / snaptutorial.com
Ecet 465  Massive Success / snaptutorial.comEcet 465  Massive Success / snaptutorial.com
Ecet 465 Massive Success / snaptutorial.com
 
IEEE 2014 JAVA NETWORKING PROJECTS Receiver based flow control for networks i...
IEEE 2014 JAVA NETWORKING PROJECTS Receiver based flow control for networks i...IEEE 2014 JAVA NETWORKING PROJECTS Receiver based flow control for networks i...
IEEE 2014 JAVA NETWORKING PROJECTS Receiver based flow control for networks i...
 
2014 IEEE JAVA NETWORKING PROJECT Receiver based flow control for networks in...
2014 IEEE JAVA NETWORKING PROJECT Receiver based flow control for networks in...2014 IEEE JAVA NETWORKING PROJECT Receiver based flow control for networks in...
2014 IEEE JAVA NETWORKING PROJECT Receiver based flow control for networks in...
 
Ecet 465 Success Begins / snaptutorial.com
Ecet 465   Success Begins / snaptutorial.comEcet 465   Success Begins / snaptutorial.com
Ecet 465 Success Begins / snaptutorial.com
 
Us 13-opi-evading-deep-inspection-for-fun-and-shell-wp
Us 13-opi-evading-deep-inspection-for-fun-and-shell-wpUs 13-opi-evading-deep-inspection-for-fun-and-shell-wp
Us 13-opi-evading-deep-inspection-for-fun-and-shell-wp
 
TCP/IP 3RD SEM.2012 AUG.ASSIGNMENT
TCP/IP 3RD SEM.2012 AUG.ASSIGNMENTTCP/IP 3RD SEM.2012 AUG.ASSIGNMENT
TCP/IP 3RD SEM.2012 AUG.ASSIGNMENT
 
OSI model (7 LAYER )
OSI model (7 LAYER )OSI model (7 LAYER )
OSI model (7 LAYER )
 
Analytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based NetworksAnalytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based Networks
 
Simulating the behavior of satellite Internet links to small islands
Simulating the behavior of satellite Internet links to small islandsSimulating the behavior of satellite Internet links to small islands
Simulating the behavior of satellite Internet links to small islands
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
Interactive Data Analysis for End Users on HN Science Cloud
Interactive Data Analysis for End Users on HN Science CloudInteractive Data Analysis for End Users on HN Science Cloud
Interactive Data Analysis for End Users on HN Science Cloud
 
Procedia Computer Science 46 ( 2015 ) 1072 – 1078 187.docx
Procedia Computer Science   46  ( 2015 )  1072 – 1078 187.docxProcedia Computer Science   46  ( 2015 )  1072 – 1078 187.docx
Procedia Computer Science 46 ( 2015 ) 1072 – 1078 187.docx
 
Network Flow Analysis
Network Flow AnalysisNetwork Flow Analysis
Network Flow Analysis
 
Network Flow Analysis
Network Flow AnalysisNetwork Flow Analysis
Network Flow Analysis
 
Network Bottleneck Avoidance Using Edge Routers
Network Bottleneck Avoidance Using Edge RoutersNetwork Bottleneck Avoidance Using Edge Routers
Network Bottleneck Avoidance Using Edge Routers
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
 
Me,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiMe,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsi
 

More from Yoshi Shih-Chieh Huang (6)

Flash knowledge
Flash knowledgeFlash knowledge
Flash knowledge
 
Usage and Comparisons of Control Group in Android AOSP: Marshmallow and Before
Usage and Comparisons of Control Group in Android AOSP: Marshmallow and BeforeUsage and Comparisons of Control Group in Android AOSP: Marshmallow and Before
Usage and Comparisons of Control Group in Android AOSP: Marshmallow and Before
 
Short intro to project butter
Short intro to project butterShort intro to project butter
Short intro to project butter
 
On the End-to-End Traffic Prediction in the On-Chip Networks
On the End-to-End Traffic Prediction in the On-Chip NetworksOn the End-to-End Traffic Prediction in the On-Chip Networks
On the End-to-End Traffic Prediction in the On-Chip Networks
 
Logical Time
Logical TimeLogical Time
Logical Time
 
Introduction to amazon web service (clean)
Introduction to amazon web service (clean)Introduction to amazon web service (clean)
Introduction to amazon web service (clean)
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Attackboard slides dac12-0605

  • 1. Parallel And Distributed System Lab (PADS) National Tsing Hua University, Hsinchu, Taiwan Attackboard: A Novel Dependency-Aware Traffic Generator for Exploring NoC Design Space Yoshi Shih-Chieh Huang, June 5, 2012 Yu-Chi Chang Tsung-Chan Tsai Yuan-Ying Chang Chung-Ta King
  • 2. Problem Domain  We want to use trace-driven simulation to explore network-on- chip (NoC) performance for message-passing many-core architecture  Traces that record only send/receive events:  Simple and fast   Lacks information about interaction between NoC and PEs   Unable to reflect the effects of changes in design space   Storage space overhead (in terms of gigabytes)   Recent dependency-aware traces: [1][2]  Packet dependencies are embedded in traces   Packet injections can be adjusted based on dependencies when facing different NoC configurations   Trace logs are very complicated and require much more space  Traces with packet dependencies improve accuracy but require more storage space! 2
  • 3. The BIG Problem Is: Size!  How to reduce size of dependency-aware traces while maintaining its accuracy?  Lossless compression, e.g., Gzip?  not enough  Key insight:  Each PE has its own BIG trace for NoC operations  Each BIG trace is actually a log of the execution of the corresponding State Machine  10 KB codes may result in 1GB traces! S1 Sn S2 … S1 Sn S2 … S1 Sn S2 … … ……. ……. …… …. …... … …. … ……. …. ……. … 3
  • 4. Rebuild the State Machines  What if we can rebuild the interacting state machines from the traces…?  We don’t have to record the huge traces but only need to generate them at runtime!  How to do?  Intuitive idea: find repetitive patterns in traces and fold them  Difficulties:  May not be matched exactly  May be discrete and fragmented in traces  Need to find patterns across the traces resulted by different PEs 4
  • 5. Rebuild the State Machines (cont’d)  State transitions in state machines are often triggered by arrivals of packets  Leverage packet dep. information in traces  Forget about time sequencing in traces  How long to wait before start detecting receiving pattern?  interval-based  Interval I for capturing receiving patterns  Captured results are attackboards  Interval I’ for reproducing traffic  For driving the attackboards 5 PEj I
  • 6. Data Structure of an Attackboard  Each PE has an attackboard  An injection Y is allowed if the necessary conditions are satisfied  Necessary conditions = predecessors = must receive packets from X Requires pkt from 1? Requires pkt from 2? Requires pkt from 3? … Requires pkt from X? Inject to … Yes Yes No Dest. Y necessary conditions Inject if satisfied … 6
  • 7. Attackboard Traffic Generator 1. Rebuild the state machines as attackboards  Use packet arrival patterns in traces to rebuild the state machine  Space complexity: O(execution time)  O(# of patterns) 2. Compact the states  Merge duplicated patterns 3. Drive the state machines, i.e., attackboard  Inject traffic based on the rebuilt machine 7
  • 8. An Illustration (Viewpoint of PE 4) Parallel Program Execution Executionflowofparallelprogram PE1 PE2 PE3 PE4 recv 1 recv 2 send 1 Interval I Packets Dependencies Injection Info. 1 1 0 0 (3, flit counts) Attackboard entries of PE 4  Compress the entries with the same packet dependencies  Merge duplicated entries recv 5 recv 6 recv 7 send 3 Interval I recv 3 recv 4 send 2 Interval I Packets Dependencies Injection Info. 1 1 1 0 (2, flit counts) Packets Dependencies Injection Info. 1 1 0 0 (3, flit counts) Packets Dependencies Injection Info. 1 1 0 0 (3, flit counts) recv 5 recv 6 recv 7 send 3 Interval I How long to wait before start detecting receiving pattern? Interval I decides the pattern! 8
  • 9. Driving the Attackboards!  Traffic generation with Attackboard  Router receives stats are cleared every I’ interval  What if no exact match in I’?  Match the entry which has the highest similarity  Similar to bloom filter (deny for sure, allow with high confidence) Attackboard Traffic Generator (ATG) 1 1 0 0 traffic Router ATG NoC 0 0 0 0 current receive sourceInject! Attackboard of PE4 router receive status of PE4 1 0 0 01 1 0 0 Match! Design details are left in the poster. You are welcome to take a look! 9
  • 10. 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1200 1210 1220 1230 1240 1250 1260 1270 1280 1290 Averagenetworkdelay(normalized) Interval of traffic generation (I') 100 10K 15K Dependency extraction interval I Space Overhead and Accuracy 1 S.Bell et al.[ISSCC 2008] Storage Space Overhead Storage space can be greatly reduced! 20% storage space can be reduced in computation- intensive benchmark! 0 0.5 1 1.5 2 2.5 3 3.5 4 800 900 10001100120013001400150016001700 Averagenetworkdelay(normalized) Interval of traffic generation (I') 100 1000 5000 10000 Dependency extraction interval I IMB-BcastParallel Object Detection I and I’ should be properly selected to give accurate results In this case, I does not have much impact compared with POD. IMB-Bcast POD (2 frames) Simulation Platform Processor element Tilera TILE64 1 Processor frequency 700 Mhz Simulated topology 4×4 mesh network Routing algorithm Dimension-order Bandwidth 1 flit/cycle per port Benchmark Intel MPI Benchmarks Parallel Object Detection 10
  • 11. Conclusion & Future Works  Attackboard not only compresses the NoC traces, but generates them at runtime  Key ideas:  Programs  raw trace logs  use arrival patterns to rebuild state machine (as Attackboards)  rebuild trace logs  Benefits  Strikes a good tradeoff between accuracy and space overhead  Limitations  Currently only for message-passing programs  Only suitable for injections with strong dependencies  Future works  Take the computation time before an injection into consideration  Make the tricky parameters (I and I’) disappeared 11
  • 12. Thank You! To learn more, please come to my poster!
  • 13. Selected References 1. Netrace: dependency-driven trace-based network-on-chip simulation.  Joel Hestness, Boris Grot, and Stephen W. Keckler. 2010.  In Proceedings of the Third International Workshop on Network on Chip Architectures (NoCArc '10) 2. Inferring packet dependencies to improve trace based simulation of on-chip networks.  Christopher Nitta, Matthew Farrens, Kevin Macdonald, and Venkatesh Akella. 2011.  In Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip (NOCS '11)
  • 14. Backup: More Frames in POD Trace logs O(execution time) Attackboard O(# of patterns) 1 10 100 1000 10000 100000 1000000 10000000 2 50 100 1000 Trace logs Attackboard Size(bytes) # of frames

Editor's Notes

  1. Good afternoon. My name is Yoshi, from Taiwan National Tsing Hua University. It’s my pleasure to present our research work “Attackboard”, which is a novel dependency-aware traffic generator for NoC design spsace exploration.
  2. First of all, our target platform is a message-passing many-core architecture. Here we discuss trace-driven simulation rather than full-system simulation due to several considerations, especially the simulation speed. So what’s the advantages and disadvantages of trace-driven simulation? It is simple and fast, widely used by many research works. But it lacks the information of the interaction between NoC and processing elements, so many changes cannot be reflected by trace-driven simulator. For example, if we increase the speed of processing elements, the trace-driven simulator usually cannot reflect the changes . And the last one is, they are usually large logs, in terms of gigabytes. For tackling the problem of inaccuracy, recent works have done “dependency-aware” traces. The dependency, aka causality, are embedded into the trace logs. While injecting a packet, the packets before and after the injecting one are considered. As a result, it can improve the accuracy of trace-driven simulation. However, the causalities are usually complicated, and it makes the trace logs larger. In summary, traces with packet dependencies improve accuracy, but require more storage spsace!
  3. The BIG problem is: size. So how to reduce the size of traces while maintaining accuracy? Usually general compression is used, such as gzip. Certainly, it is not a good solution. Our key insight is that: (1) Each processing element has its own big trace for NoC operations. (2) Each big trace is actually a log of the execution of the corresponding State Machine. The left part shows the trace logs, and the right part shows the state machines. Certainly, there are relationships between them. For example, 10kb codes may cause more than 1GB traces. 10kb for the state machines, and 1GB for the trace logs.
  4. So now the problem is, we are trying to rebuild the state machine from the resulting traces, not from the source codes. We found that we can use simple and small tables to represent the state machines. That is Attackboard. There are several possible ways to capture the characteristics from the traces. Intuitively, finding the repetitive patterns and folding them are the simplest way. However, it is difficult because of the following considerations: The patterns may not be exactly matched. The repetitive patterns may be discrete and fragmented in trace logs To find the repetitive patterns, we need to traverse the independent trace logs across the processing elements.
  5. Our key observation is that, state transitions in state machine are usually triggered by the arrivals of packets. So we want to leverage packet dependency information in traces. Note that we focus on patterns of received packets. Our proposed solution is an interval-based solution, which periodically capture the traffic patterns. After collection the patterns, we can reproduce the traffic then. Here, I is the interval to capture patterns, and I’ is the interval to reproduce. Selecting I and I’ is tricky, and the details can be found in the paper.
  6. To summarize, we have three key ideas. Idea 1, we want to transform the time-driven logs into pattern-driven tables. Since the reason of causing large files are time, for example, cycle-accurate logs are usually huge files, so Attackboard forgets the time sequencing in traces, and uses packet arrival patterns to drive sequencing of states, and do the corresponding injections. After this, idea 2 is for minimizing the table sizes. Once all the trace logs are transformed into table with patterns, the following issue is how to match the patterns in the table. In the following slides, we use an illustration to show the three key ideas.
  7. Here is the illustration. Let’s use the view point of PE4. Assuming there are four PEs, and PE 1 and 2 send packets to PE four subsequently. By looking behind an interval I, we conclude that send 1 (payload = 2 flits) from core 4 to core 3, is caused by the previous 2 sends. The corresponding attackboard is here. As you can see, each column stands for the “necessary condition” for an injection. So the first two columns, standing for core 1 and 2 are marked as 1. Similarly, we can derive another attackboard, like this. Repeat this process, we can get many attackboards, here we show 3 different attackboards. One intuitive problem is: these attackboards may be similar, or even totally the same. So the following process is try to merge those attackboards which have similar column entries. Certainly, while merging the attackboards, some conditions muse be satisfied. The details are not shown in this slides.
  8. Here we show the process of generating traffic with the help of an attackboard. This simple example has only four columns (predecessors). First, the router receives a packet from node 1, so the first column of the “router receive status” is marked as 1. After a while, the router receives another packet from node 2, so the second column of the router receive status is also marked as 1. As you can see, the router receive status now is the same with the attackboard, so it strikes a match. The corresponding traffic will be injected by this router. After previous and this slide, you should know that how Attackboard works. Of course, there are still some problems to solve. One important problem is to select the suitable I and I’. As we mentioned before, the dependency of an injection may not be totally captured with an interval I. So our third key idea is for solving this problem. We use the idea of bloom filter. Which may inject the entry which has the highest similarity in the attackboard. The design details are left in our poster. You are welcome to join our discussion.
  9. In this slide, we show some experiment results. Our main considerations are space overhead and accuracy. We use Tilera’s TILE64, the settings are shown in this Table. We simulate a 4x4 mesh network, and use two benchmark programs. One is Broadcast in Intel MPI benchmark, which has simple behaviors. The other one is parallel object detection, relatively complicated communication causalities. First let’s see the IMB, the required space by Attackboard is more than 100x smaller than original traces. The improvement in parallel object detection is not that much, because the execution time is quite short and the traffic patterns are not obvious. Second, let’s see the accuracy, here two parameters are I and I’. The former is the interval for capturing dependencies, and the latter is the interval for generating injections. While the capturing interval is too small, the dependencies are hard to be observed, and the accuracy is quite low. While the interval increases, the traffic generated by Attackboard is quite similar to the original accuracy. In IMB-Broadcast, the selection of I and I’ are not that important. The accuracy ranges from 0.8 to 1.1. Very close to original accuracy. This is because that the dependency is quite simple in broadcast, and thus easier to be captured. However, selecting good I and I’ are important and tricky, which behaves as two magic numbers to compress the trace log. Or, in other words, two magic numbers to rebuild the behaviors of original state machines. In our following work, we have emphasized this problem and the values of I and I’ can be automatically selected.