SlideShare a Scribd company logo
1 of 23
Download to read offline
Novel Architectures for Applications in
Data Science and Beyond
Jason Riedy, Jeffrey Young, Tom Conte
Center for Research into Novel Computing Hierarchies at Georgia Tech
1 March 2019
Apps: Massive+-scale data analysis
Cyber-security Identify anomalies, malicious actors
Health care Find outbreaks, population epidemiology, similar
patient association
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating markets, smart &
sustainable cities
Systems biology Understanding interactions, drug design
Power grid / Smart cities Disruptions, conservation, prediction
Irregular data access. Changing data.
Rogues Gallery — 1 Mar 2019 2/23
High-Performance Data Analysis (HPDA)
Novel applications:
• Data at scale and speed needs new ideas for
computing analysis.
• “Big data” platforms fare poorly v. a single thread
plus large SSD even for static data sets. (McSherry,
Isard, Murray. “Scalability! But at what COST?” HotOS
XV, 2015.)
• Many high-level codes are written and re-written to
answer one question: need flexibility.
• Some primitives may be tuned and re-used.
Rogues Gallery — 1 Mar 2019 3/23
HPDA and Architectures
• Current architectures are hitting limits on
manufacturing, heat dissipation, memory latency...
• New architecture proposals are difficult to evaluate
via simulation and modeling alone.
• What happens when novel prototypes hit reality?
• Architects/designers need rapid feedback on new
ideas.
• New ideas only become successful with a
community: software ecosystem, trained students.
Need bridges between apps and architects.
Rogues Gallery — 1 Mar 2019 4/23
Introducing the CRNCH Rogues Gallery
A physical & virtual space for hosting novel computing
architectures, systems, and accelerators.
Emu Chick FPGAs & HMC/HBM
FPAA
Amortize effort and cost of trying novel architectures.
Break the “but it’s too much work” barrier.
http://crnch.gatech.edu/rogues-gallery
Rogues Gallery — 1 Mar 2019 5/23
Emu Technology’s PGAS Architecture
1 nodelet
Gossamer
Core 1
Memory-Side Processor
Gossamer
Core 4
...
Migration Engine
RapidIODisk I/O
8 nodelets
per node
64 nodelets
per Chick
RapidIO
Stationary
Core
• Multithreaded multicore
• Memory-side “processor” for
operations in
narrow-channel DRAM
• Stationary core for OS
• Threads migrate in
hardware on reads!
• Optimize for weak locality
Rogues Gallery — 1 Mar 2019 6/23
3D Stacked Memory and FPGAs
FPGAs enable flexible
“near-memory” research for both
hardware and programming
models:
• Micron/Pico EX700 & HMC
• Nallatech 385s (Arria 10)
• Intel Arria10 DevKit
• Nallatech 520N (Stratix 10)
• Xilinx MpSOC
Rogues Gallery — 1 Mar 2019 7/23
Neuromorphic Systems
• Field-Programmable Analog Array (FPAA) System-On
Chip, designed in the lab of Dr. Jennifer Hasler.
• Analog arrays can achieve unprecedented power and
size reductions.
FPAA “driven” by a RPi Neuromorphic Workshop,
27 Apr 2018
Rogues Gallery — 1 Mar 2019 8/23
Flexible Infrastructure
login /
notebook
rg-adm
Slurm Ctl
toolbox
(NFS)
Scheduling,
Tools, and
Admin
Key:
Schedulable Resource
Physical Resource
VM
USB device
User
Resources
fpaa-host
power-host
nvidia-tegra-N
nvidia-tegra-1
fpaa-dev
rg-db
Slurm DBD
emu-dev emu-chick
..Nfpga-dev-1
fpga-hmcfpga-intel
Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing
Experimental Post-Moore Architectures.”
https://arxiv.org/abs/1808.06334
• Available. Plans to
integrate with NSF
XSEDE.
• Scheduler being
deployed.
• Incorporates
Singularity and virtual
machines for
OS/library versioning.
• Fun tools: usbip for
FPAA... Analog inputs?
Rogues Gallery — 1 Mar 2019 9/23
Rogues Gallery Press
On The Next Platform, a partner of The Register:
https://www.nextplatform.com/2018/08/27/a-rogues-gallery-of-post-moores-law-options/
Rogues Gallery — 1 Mar 2019 10/23
Neuromorphic Workshop - April 2018
FPAA-focused workshop led by Dr.
Hasler introduced the wider
community to FPAA programming
for neuromorphic systems.
• Tutorial-style class
• Lightning talks from GT and
DoE researchers
• http://crnch.gatech.
edu/neuro-workshop18
Rogues Gallery — 1 Mar 2019 11/23
Rogues Gallery ASPLOS Tutorial
Programming Novel Architectures in the Post-Moore Era
with The Rogues Gallery
https://crnch-rg.gitlab.io/rg/asplos-2019/
• To be held at ASPLOS 2019 in Providence, RI.
• 14 April 2019, 8.30am – 12.00pm
• Architecture detailed descriptions
• Hands-on with the Emu Chick and (hopefully) the
FPAA
Rogues Gallery — 1 Mar 2019 12/23
Rogues Gallery VIP Team
Rogues Gallery VIP team has started in Spring 2019. This
course allows undergraduates to get research experience
working with software for novel architectures.
http://www.vip.gatech.edu/teams/
new-team-rogues-gallery
Rogues Gallery — 1 Mar 2019 13/23
Selected Results: Emu Pointer-Chasing Benchmark
Data-dependent loads, fine-grained access1
Ordered
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Intra-block shuffle: weak locality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Full block shuffle: weak locality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1
Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu
Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
Rogues Gallery — 1 Mar 2019 14/23
Selected Results: x86 Pointer-Chasing Benchmark
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
0
20
40
60
80
100Memorybandwidth(GBs)
peak STREAM bandwidth
56 threads
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
peak STREAM bandwidth
112 threads
block_shuffle intra_block_shuffle full_block_shuffle
Haswell results, every pattern is different.2
2
Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization
of the Emu Chick.” https://arxiv.org/abs/1809.07696
Rogues Gallery — 1 Mar 2019 15/23
Selected Results: Emu Pointer-Chasing Benchmark
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4MBlock size (number of 16B elements)
0
2
4
6
8
10
12
Memorybandwidth(GBs)
peak STREAM bandwidth
2048 threads
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
peak STREAM bandwidth
4096 threads
block_shuffle intra_block_shuffle full_block_shuffle
Mostly flat performance, high utilization.2
Rogues Gallery — 1 Mar 2019 16/23
Selected Results: BFS on a Dynamic Data Structure
15 16 17 18 19 20 21
scale
0
20
40
60
80
100
MTEPS
Emu single node - Cilk
Emu multi-node - Cilk
x86 Haswell - STINGER
x86 Haswell - Cilk
0
500
1000
1500
EdgeBandwidth(MB/s)
Note: Streaming data structure, not statically optimized. 3
3
Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar.
“Programming Strategies for Irregular Algorithms on the Emu Chick.” https://arxiv.org/abs/1901.02775
Rogues Gallery — 1 Mar 2019 17/23
Selected Results: Labeled Subgraph Alignment
1 2 4 8 16 32 64 128
Number of Threads
0
10
20
30
40
50
Speedup
Multi-BLK
Multi-HCB
Single-BLK
Single-HCB
gsaNA, the first parallel algorithm, strong scaling on DBLP
graph (2048 vertices)3
Rogues Gallery — 1 Mar 2019 18/23
Selected Results: FPGAs
Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance
Implications of NoCs on 3D-Stacked Memories: Insights from the
Hybrid Memory Cube (HMC),” ISPASS 2018
• Characterizations
with FPGA and Hybrid
Memory Cube show
latency/bandwidth
tradeoff.
• Other FPGA work is
focused on compilers,
HPC prototyping, and
sparse algorithms for
Intel and Xilinx FPGAS.
Rogues Gallery — 1 Mar 2019 19/23
Lessons Learned, Emu Chick i
• Finding appropriate metrics is difficult, e.g. for the
Emu:
• Comparing ASICs (e.g. x86) to FPGA-based prototypes
can be unfair either way.
• Fraction of peak bandwidth for the idealized
problem?
• SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op.
• Graph500 BFS: TEPS ∝ BW
Rogues Gallery — 1 Mar 2019 20/23
Lessons Learned, Emu Chick ii
• Distilling observations on architecture ↔
programming model:
• Program data location for load (BW) balance.
• Remote memory operations v. migration exposes the
architecture.
• Migrations cost more than it appears. Computation?
• Stack spills/access can cause ping-ponging.
• How does HW support for top-down (Cilk-ish) affect
bottom-up (UPC/SHMEM) PGAS programming?
• Memory allocation similar to UPC, SHMEM
• UPC++ rpc_ff v. Emu thread migration?
Rogues Gallery — 1 Mar 2019 21/23
Acknowledgments
Fantastic students and colleagues:
• Srinivas Eswar (GT CSE)
• Dr. Eric Hein (GT ECE ⇒ Emu)
• Patrick Lavin (GT CSE)
• Dr. Jiajia Li (GT CSE ⇒ PNNL)
• Abdurrahman Yaşar (GT CSE)
• Dr. Ümit Çatalürek (GT CSE)
• Dr. Tom Conte (GT CS/ECE)
• Dr. Bora Uçar (ENS Lyon CNRS)
• Dr. Rich Vuduc (GT CSE)
• Dr. Jeffrey S. Young (GT CS)
Code (ideally will have links from crnch.gatech.edu):
• https://gitlab.com/crnch-rg (soon)
• https://github.com/ehein6/emu-microbench
Other testbeds:
• ORNL: ExCL
• PNNL: CENATE
• Argonne
• Sandia
• Berkeley: AQCT
• (others?)
Rogues Gallery — 1 Mar 2019 22/23
Rogues Gallery: Active and Growing
• Added FPGA resources, integrating FPAAs
• Tight development loop with Emu
• Active research projects and publications
• Community outreach and education underway
• One GT student has made the leap to industry
already...
CRNCH Rogues Gallery connects researchers and
students with novel architectures and architects with
upcoming applications.
Let us host / manage your neat stuff!
http://crnch.gatech.edu/rogues-gallery
Rogues Gallery — 1 Mar 2019 23/23

More Related Content

Similar to Novel Architectures for Applications in Data Science and Beyond

CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023OpenACC
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research PlatformLarry Smarr
 
Challenges in end-to-end performance
Challenges in end-to-end performanceChallenges in end-to-end performance
Challenges in end-to-end performanceJisc
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...KTN
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through PrefetchingRohit Deshpande
 
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...Larry Smarr
 
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...Larry Smarr
 
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...Geoffrey Fox
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
[150824]symposium v4
[150824]symposium v4[150824]symposium v4
[150824]symposium v4yyooooon
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223Aysha Khan
 
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 FPGA-based soft-processors: 6G nodes and post-quantum security in space FPGA-based soft-processors: 6G nodes and post-quantum security in space
FPGA-based soft-processors: 6G nodes and post-quantum security in spaceFacultad de Informática UCM
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingeSAT Journals
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputingpchengi
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardLarry Smarr
 

Similar to Novel Architectures for Applications in Data Science and Beyond (20)

CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 
Challenges in end-to-end performance
Challenges in end-to-end performanceChallenges in end-to-end performance
Challenges in end-to-end performance
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through Prefetching
 
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
 
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
[150824]symposium v4
[150824]symposium v4[150824]symposium v4
[150824]symposium v4
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223
 
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 FPGA-based soft-processors: 6G nodes and post-quantum security in space FPGA-based soft-processors: 6G nodes and post-quantum security in space
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 
Priorities Shift In IC Design
Priorities Shift In IC DesignPriorities Shift In IC Design
Priorities Shift In IC Design
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passing
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputing
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path Forward
 

More from Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsJason Riedy
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisJason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 

Recently uploaded

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 

Recently uploaded (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 

Novel Architectures for Applications in Data Science and Beyond

  • 1. Novel Architectures for Applications in Data Science and Beyond Jason Riedy, Jeffrey Young, Tom Conte Center for Research into Novel Computing Hierarchies at Georgia Tech 1 March 2019
  • 2. Apps: Massive+-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Find outbreaks, population epidemiology, similar patient association Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating markets, smart & sustainable cities Systems biology Understanding interactions, drug design Power grid / Smart cities Disruptions, conservation, prediction Irregular data access. Changing data. Rogues Gallery — 1 Mar 2019 2/23
  • 3. High-Performance Data Analysis (HPDA) Novel applications: • Data at scale and speed needs new ideas for computing analysis. • “Big data” platforms fare poorly v. a single thread plus large SSD even for static data sets. (McSherry, Isard, Murray. “Scalability! But at what COST?” HotOS XV, 2015.) • Many high-level codes are written and re-written to answer one question: need flexibility. • Some primitives may be tuned and re-used. Rogues Gallery — 1 Mar 2019 3/23
  • 4. HPDA and Architectures • Current architectures are hitting limits on manufacturing, heat dissipation, memory latency... • New architecture proposals are difficult to evaluate via simulation and modeling alone. • What happens when novel prototypes hit reality? • Architects/designers need rapid feedback on new ideas. • New ideas only become successful with a community: software ecosystem, trained students. Need bridges between apps and architects. Rogues Gallery — 1 Mar 2019 4/23
  • 5. Introducing the CRNCH Rogues Gallery A physical & virtual space for hosting novel computing architectures, systems, and accelerators. Emu Chick FPGAs & HMC/HBM FPAA Amortize effort and cost of trying novel architectures. Break the “but it’s too much work” barrier. http://crnch.gatech.edu/rogues-gallery Rogues Gallery — 1 Mar 2019 5/23
  • 6. Emu Technology’s PGAS Architecture 1 nodelet Gossamer Core 1 Memory-Side Processor Gossamer Core 4 ... Migration Engine RapidIODisk I/O 8 nodelets per node 64 nodelets per Chick RapidIO Stationary Core • Multithreaded multicore • Memory-side “processor” for operations in narrow-channel DRAM • Stationary core for OS • Threads migrate in hardware on reads! • Optimize for weak locality Rogues Gallery — 1 Mar 2019 6/23
  • 7. 3D Stacked Memory and FPGAs FPGAs enable flexible “near-memory” research for both hardware and programming models: • Micron/Pico EX700 & HMC • Nallatech 385s (Arria 10) • Intel Arria10 DevKit • Nallatech 520N (Stratix 10) • Xilinx MpSOC Rogues Gallery — 1 Mar 2019 7/23
  • 8. Neuromorphic Systems • Field-Programmable Analog Array (FPAA) System-On Chip, designed in the lab of Dr. Jennifer Hasler. • Analog arrays can achieve unprecedented power and size reductions. FPAA “driven” by a RPi Neuromorphic Workshop, 27 Apr 2018 Rogues Gallery — 1 Mar 2019 8/23
  • 9. Flexible Infrastructure login / notebook rg-adm Slurm Ctl toolbox (NFS) Scheduling, Tools, and Admin Key: Schedulable Resource Physical Resource VM USB device User Resources fpaa-host power-host nvidia-tegra-N nvidia-tegra-1 fpaa-dev rg-db Slurm DBD emu-dev emu-chick ..Nfpga-dev-1 fpga-hmcfpga-intel Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing Experimental Post-Moore Architectures.” https://arxiv.org/abs/1808.06334 • Available. Plans to integrate with NSF XSEDE. • Scheduler being deployed. • Incorporates Singularity and virtual machines for OS/library versioning. • Fun tools: usbip for FPAA... Analog inputs? Rogues Gallery — 1 Mar 2019 9/23
  • 10. Rogues Gallery Press On The Next Platform, a partner of The Register: https://www.nextplatform.com/2018/08/27/a-rogues-gallery-of-post-moores-law-options/ Rogues Gallery — 1 Mar 2019 10/23
  • 11. Neuromorphic Workshop - April 2018 FPAA-focused workshop led by Dr. Hasler introduced the wider community to FPAA programming for neuromorphic systems. • Tutorial-style class • Lightning talks from GT and DoE researchers • http://crnch.gatech. edu/neuro-workshop18 Rogues Gallery — 1 Mar 2019 11/23
  • 12. Rogues Gallery ASPLOS Tutorial Programming Novel Architectures in the Post-Moore Era with The Rogues Gallery https://crnch-rg.gitlab.io/rg/asplos-2019/ • To be held at ASPLOS 2019 in Providence, RI. • 14 April 2019, 8.30am – 12.00pm • Architecture detailed descriptions • Hands-on with the Emu Chick and (hopefully) the FPAA Rogues Gallery — 1 Mar 2019 12/23
  • 13. Rogues Gallery VIP Team Rogues Gallery VIP team has started in Spring 2019. This course allows undergraduates to get research experience working with software for novel architectures. http://www.vip.gatech.edu/teams/ new-team-rogues-gallery Rogues Gallery — 1 Mar 2019 13/23
  • 14. Selected Results: Emu Pointer-Chasing Benchmark Data-dependent loads, fine-grained access1 Ordered 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Intra-block shuffle: weak locality 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Full block shuffle: weak locality 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018. Rogues Gallery — 1 Mar 2019 14/23
  • 15. Selected Results: x86 Pointer-Chasing Benchmark 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) 0 20 40 60 80 100Memorybandwidth(GBs) peak STREAM bandwidth 56 threads 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) peak STREAM bandwidth 112 threads block_shuffle intra_block_shuffle full_block_shuffle Haswell results, every pattern is different.2 2 Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization of the Emu Chick.” https://arxiv.org/abs/1809.07696 Rogues Gallery — 1 Mar 2019 15/23
  • 16. Selected Results: Emu Pointer-Chasing Benchmark 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4MBlock size (number of 16B elements) 0 2 4 6 8 10 12 Memorybandwidth(GBs) peak STREAM bandwidth 2048 threads 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) peak STREAM bandwidth 4096 threads block_shuffle intra_block_shuffle full_block_shuffle Mostly flat performance, high utilization.2 Rogues Gallery — 1 Mar 2019 16/23
  • 17. Selected Results: BFS on a Dynamic Data Structure 15 16 17 18 19 20 21 scale 0 20 40 60 80 100 MTEPS Emu single node - Cilk Emu multi-node - Cilk x86 Haswell - STINGER x86 Haswell - Cilk 0 500 1000 1500 EdgeBandwidth(MB/s) Note: Streaming data structure, not statically optimized. 3 3 Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar. “Programming Strategies for Irregular Algorithms on the Emu Chick.” https://arxiv.org/abs/1901.02775 Rogues Gallery — 1 Mar 2019 17/23
  • 18. Selected Results: Labeled Subgraph Alignment 1 2 4 8 16 32 64 128 Number of Threads 0 10 20 30 40 50 Speedup Multi-BLK Multi-HCB Single-BLK Single-HCB gsaNA, the first parallel algorithm, strong scaling on DBLP graph (2048 vertices)3 Rogues Gallery — 1 Mar 2019 18/23
  • 19. Selected Results: FPGAs Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube (HMC),” ISPASS 2018 • Characterizations with FPGA and Hybrid Memory Cube show latency/bandwidth tradeoff. • Other FPGA work is focused on compilers, HPC prototyping, and sparse algorithms for Intel and Xilinx FPGAS. Rogues Gallery — 1 Mar 2019 19/23
  • 20. Lessons Learned, Emu Chick i • Finding appropriate metrics is difficult, e.g. for the Emu: • Comparing ASICs (e.g. x86) to FPGA-based prototypes can be unfair either way. • Fraction of peak bandwidth for the idealized problem? • SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op. • Graph500 BFS: TEPS ∝ BW Rogues Gallery — 1 Mar 2019 20/23
  • 21. Lessons Learned, Emu Chick ii • Distilling observations on architecture ↔ programming model: • Program data location for load (BW) balance. • Remote memory operations v. migration exposes the architecture. • Migrations cost more than it appears. Computation? • Stack spills/access can cause ping-ponging. • How does HW support for top-down (Cilk-ish) affect bottom-up (UPC/SHMEM) PGAS programming? • Memory allocation similar to UPC, SHMEM • UPC++ rpc_ff v. Emu thread migration? Rogues Gallery — 1 Mar 2019 21/23
  • 22. Acknowledgments Fantastic students and colleagues: • Srinivas Eswar (GT CSE) • Dr. Eric Hein (GT ECE ⇒ Emu) • Patrick Lavin (GT CSE) • Dr. Jiajia Li (GT CSE ⇒ PNNL) • Abdurrahman Yaşar (GT CSE) • Dr. Ümit Çatalürek (GT CSE) • Dr. Tom Conte (GT CS/ECE) • Dr. Bora Uçar (ENS Lyon CNRS) • Dr. Rich Vuduc (GT CSE) • Dr. Jeffrey S. Young (GT CS) Code (ideally will have links from crnch.gatech.edu): • https://gitlab.com/crnch-rg (soon) • https://github.com/ehein6/emu-microbench Other testbeds: • ORNL: ExCL • PNNL: CENATE • Argonne • Sandia • Berkeley: AQCT • (others?) Rogues Gallery — 1 Mar 2019 22/23
  • 23. Rogues Gallery: Active and Growing • Added FPGA resources, integrating FPAAs • Tight development loop with Emu • Active research projects and publications • Community outreach and education underway • One GT student has made the leap to industry already... CRNCH Rogues Gallery connects researchers and students with novel architectures and architects with upcoming applications. Let us host / manage your neat stuff! http://crnch.gatech.edu/rogues-gallery Rogues Gallery — 1 Mar 2019 23/23