Characterization of the Emu Chick with
Microbenchmarks
E. Jason Riedy
Center for Research into Novel Computing Hierarchies at Georgia Tech
23 January 2019
Outline
Project Background
Microbenchmarks
STREAM ADD and Pointer Chasing
Sparse Matrix – Vector Product (SpMV)
Breadth-First Search (BFS)
Labeled Subgraph Alignment
Observations
Memory-centric HPDA
• “Big data” platforms fare poorly v. a single thread
plus large SSD. (McSherry, Isard, Murray. “Scalability!
But at what COST?” HotOS XV, 2015.)
• New architecture proposals are difficult to evaluate
via simulation and modeling alone.
Evaluate the FPGA-based prototype Emu Chick...
• But by what criteria?
• Chose memory bandwidth utilization.
• Memory-centric architecture
• BW is equivalent to MFLOP/s in SpMV, TEPS in BFS
Emu: µbenchmarks — 23 Jan 2019 3/27
Emu Technology’s PGAS Architecture
[Figure: architecture block diagram. Labels: 1 nodelet (Gossamer Core 1 ... Gossamer Core 4, Memory-Side Processor, Migration Engine); RapidIO; Disk I/O; Stationary Core; 8 nodelets per node; 64 nodelets per Chick.]
• Multithreaded multicore
• Memory-side “processor” for
operations in
narrow-channel DRAM
• Stationary core for OS
• Threads migrate in
hardware on reads!
• Optimize for weak locality
Baseline: Emu STREAM ADD c[i] = a[i] + b[i]
GC Config     Nodelets   Scale   Threads   BW (MB/s)
1             8          30      512        1,599.86
3             4          29      384        1,288.39
1             64         31      4096      12,790.31
3             32         31      6144       7,241.07
Theor. Peak   8                             9,600
Theor. Peak   64                           76,800
STREAM results are used to compare bandwidth
utilization for the current prototype. 3 GC is experimental
and has (had?) half the memory controllers.[1]
[1] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization
of the Emu Chick,” (in submission, https://arxiv.org/abs/1809.07696 ).
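As a rough check on the STREAM ADD table, the fraction of theoretical peak reached by the 1 GC configurations can be computed directly from the listed numbers. The function name and dictionary layout below are illustrative and are not taken from the benchmark code.

```python
# Measured STREAM ADD bandwidth and theoretical peak (MB/s), copied from the table.
# Keys are nodelet counts for the 1 GC configuration.
MEASURED_MB_S = {8: 1599.86, 64: 12790.31}
THEORETICAL_PEAK_MB_S = {8: 9600.0, 64: 76800.0}


def bandwidth_utilization(nodelets):
    """Return measured bandwidth as a fraction of theoretical peak."""
    return MEASURED_MB_S[nodelets] / THEORETICAL_PEAK_MB_S[nodelets]


if __name__ == "__main__":
    for n in sorted(MEASURED_MB_S):
        print(f"{n:2d} nodelets: {bandwidth_utilization(n):.1%} of theoretical peak")
```

Both 1 GC configurations land near one sixth of the theoretical peak.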
Thread Spawning in STREAM ADD
[Figure: memory bandwidth (GB/s, 0 to 12) versus number of threads (64 to 4096) for serial_spawn, recursive_spawn, serial_remote_spawn, and recursive_remote_spawn.]
Global: 1 GC / nodelet, 64 nodelets
Bandwidth Limited by Computation
STREAM and Memory Bandwidths (BW in MB/s)
Operation                  Nodelets   Scale   Threads        BW
Current - arithmetic ops   1                                200
Ideal - all ld ops         1                              1,400
ADD (Measured)             8          30      512         1,600
ADD (Measured)             64         31      4096       12,790
NCDIMM                     8                             12,800
NCDIMM                     64                           102,400
Per-GC peak from instruction counts:
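$$175\,\text{MHz} \Rightarrow \frac{175\text{M cycles}}{\text{second}} \times \frac{1\ \text{instruction}}{\text{cycle}} \times \frac{3\ \text{mem ops}}{21\ \text{instructions}} \times \frac{8\ \text{Bytes}}{1\ \text{mem op}} = 200\,\text{MB/s}$$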
One GC per nodelet hits this peak. Eight GC/nodelet may hit the ideal peak.
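The per-GC peak calculation can be reproduced in a few lines. This is a sketch of the arithmetic only; the function and parameter names are illustrative, and reading the 3:21 ratio as "per loop iteration" is an assumption.

```python
CLOCK_HZ = 175_000_000  # 175 MHz core clock, from the slide
BYTES_PER_MEM_OP = 8    # 8 bytes moved per memory operation, from the slide


def per_gc_peak_mb_s(mem_ops=3, instructions=21):
    """Peak bandwidth of one Gossamer core (GC) in MB/s.

    Assumes one instruction issues per cycle and that `mem_ops` of every
    `instructions` instructions are memory operations (3 of 21 for ADD).
    """
    bytes_per_second = CLOCK_HZ * mem_ops * BYTES_PER_MEM_OP / instructions
    return bytes_per_second / 1e6


if __name__ == "__main__":
    print("Current (arithmetic ops):", per_gc_peak_mb_s(), "MB/s")
    print("Ideal (all load ops):    ", per_gc_peak_mb_s(1, 1), "MB/s")
    print("8 nodelets, 1 GC each:   ", 8 * per_gc_peak_mb_s(), "MB/s")
```

With one GC per nodelet, eight nodelets give 8 × 200 = 1,600 MB/s, which matches the measured ADD figure in the table.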
Emu Pointer-Chasing Benchmark
Data-dependent loads, fine-grained access.[2]
[Figure: a 16-element list (indices 0 to 15) shown in three orderings: ordered; intra-block shuffle (weak locality); full block shuffle (weak locality).]
[2] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu
Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
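The access patterns can be sketched as permutations of a linked list's visiting order. The semantics assigned to each pattern name below are an assumption inferred from the names, and all function names are hypothetical; the actual benchmark may construct its lists differently.

```python
import random

MODES = ("ordered", "block_shuffle", "intra_block_shuffle", "full_block_shuffle")


def make_order(n, block_size, mode, seed=0):
    """Return the visiting order of n elements for one access pattern.

    Assumed semantics:
      ordered             - elements visited 0, 1, 2, ...
      block_shuffle       - blocks in random order, elements in order inside a block
      intra_block_shuffle - blocks in order, elements shuffled inside each block
      full_block_shuffle  - both block order and within-block order are shuffled
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    blocks = [list(range(start, min(start + block_size, n)))
              for start in range(0, n, block_size)]
    if mode in ("intra_block_shuffle", "full_block_shuffle"):
        for block in blocks:
            rng.shuffle(block)
    if mode in ("block_shuffle", "full_block_shuffle"):
        rng.shuffle(blocks)
    return [i for block in blocks for i in block]


def build_chain(order):
    """Turn a visiting order into a cyclic next-pointer array."""
    next_index = [0] * len(order)
    for here, there in zip(order, order[1:] + order[:1]):
        next_index[here] = there
    return next_index


def chase(next_index, start):
    """Follow data-dependent loads until the cycle closes; return the hop count."""
    hops = 0
    i = start
    while True:
        i = next_index[i]
        hops += 1
        if i == start:
            return hops
```

Each load address depends on the previous load's value, so the traversal cannot be prefetched or vectorized; only the locality of successive elements changes between patterns.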
x86 Pointer-Chasing Benchmark
[Figure: two panels (56 threads, 112 threads). Memory bandwidth (GB/s, 0 to 100) versus block size (number of 16 B elements, 1 to 4M) for block_shuffle, intra_block_shuffle, and full_block_shuffle, with peak STREAM bandwidth marked.]
Haswell results: every pattern is different.[1]
Emu Pointer-Chasing Benchmark
[Figure: two panels (2048 threads, 4096 threads). Memory bandwidth (GB/s, 0 to 12) versus block size (number of 16 B elements, 1 to 4M) for block_shuffle, intra_block_shuffle, and full_block_shuffle, with peak STREAM bandwidth marked.]
Mostly flat performance, high utilization.[1]
SpMV Layout, Synthetic (5pt Laplacian)
CSR storage: row, col, v.
Layouts: Local (1 nodelet), 1D (8+ nodelets), 2D (8+ nodelets).
[Figure: schematic of Y = X x under each layout, followed by a plot of bandwidth (MB/s, 0 to 600) versus number of rows (10² to 300²) for the local, 1D, and 2D layouts.]
Single node, integer entries.[1]
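For reference, a minimal CSR SpMV over a synthetic 5-point Laplacian with integer entries might look like the sketch below. It is a serial illustration of the kernel only: the array names follow the slide's row/col/v labels, the sign convention (4 on the diagonal, -1 off the diagonal) is an assumption, and none of the Emu-specific data placement (local, 1D, 2D) is modeled.

```python
def laplacian_5pt_csr(n):
    """Build the 5-point Laplacian on an n x n grid in CSR form.

    Returns (row, col, v): row pointers, column indices, and integer values
    for a matrix with n*n rows.
    """
    row, col, v = [0], [], []
    for i in range(n):
        for j in range(n):
            r = i * n + j
            # Candidate entries in increasing column order:
            # up, left, diagonal, right, down.
            entries = [
                (i > 0, r - n, -1),
                (j > 0, r - 1, -1),
                (True, r, 4),
                (j < n - 1, r + 1, -1),
                (i < n - 1, r + n, -1),
            ]
            for present, c, value in entries:
                if present:
                    col.append(c)
                    v.append(value)
            row.append(len(col))
    return row, col, v


def spmv_csr(row, col, v, x):
    """Compute y = A x for a CSR matrix A."""
    y = [0] * (len(row) - 1)
    for r in range(len(y)):
        acc = 0
        for k in range(row[r], row[r + 1]):
            # x[col[k]] is the irregular access: on the Chick this is where
            # a thread may migrate unless x is local or replicated.
            acc += v[k] * x[col[k]]
        y[r] = acc
    return y
```

An n × n grid gives n² rows, which is consistent with the squared row counts on the plot's horizontal axis.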
SpMV Synthetic, Replicated – Single node, 1 GC
[Figure: SpMV (Emu Chick, single node). Bandwidth (MB/s, 0 to 900) versus matrix size (MB, 0 to 3500) for 64, 128, 256, and 512 threads.]
Good bandwidth utilization with high thread counts and
replicated x.
SpMV Synthetic – Single node, 1 GC
[Figure: SpMV (Emu Chick, single node). Bandwidth (MB/s, 0 to 800) versus matrix size (MB, 0 to 3500) for 256 and 512 threads.]
The 5pt Laplacian without replicating x bounces between
migratory and non-migratory areas.[1]
SpMV Synthetic – Single node, 1 and 3 GC
[Figure: SpMV (Emu Chick, single node, 512 threads). Bandwidth (MB/s, 0 to 1000) versus number of rows (10² to 4000²) for 1 GC and 3 GC.]
3 GC version: half the nodes, half the memory controllers
SpMV Synthetic – Single node, 1 and 3 GC
[Figure: SpMV (Emu Chick, single node, 512 threads). Bandwidth utilization (0.0 to 0.6) versus matrix size (MB, 0 to 1600) for 1 GC and 3 GC.]
3 GC results demonstrate that SpMV is compute-bound
from address computation.
SpMV Synthetic, Replicated – Multinode, 1 GC
[Figure: SpMV (Emu Chick, multinode). Bandwidth (MB/s, 0 to 7000) versus matrix size (MB, 0 to 14000) for 64, 128, 256, 512, 1024, 2048, and 4096 threads.]
SpMV scales up to 50% of bandwidth for high thread
counts and replicated x.[1]
SpMV Synthetic – Multinode, 1 GC
[Figure: SpMV (Emu Chick, multinode). Bandwidth (MB/s, 0 to 6000) versus matrix size (MB, 0 to 14000) for 1024, 2048, and 4096 threads.]
But migrations for fetching x hurt with eight nodes.
SpMV Real-World Results, Replicated
SpMV multinode bandwidths (in MB/s) for real world graphs (Tim Davis’s collection)
along with matrix dimension, number of non-zeros (NNZ), and the average and
maximum row degrees. Run with 4K threads.
Matrix     Rows    NNZ     Avg Deg   Max Deg   BW (MB/s)
mc2depi    526K    2.1M       3.99         4     3870.31
ecology1   1.0M    5.0M       5.00         5     4425.61
amazon03   401K    3.2M       7.99        10     4494.79
Delor295   296K    2.4M       8.12        11     4492.47
roadNet-   1.39M   3.84M      2.76        12     3811.57
mac_econ   206K    1.27M      6.17        44     3735.54
cop20k_A   121K    2.62M     21.65        81     4520.05
watson_2   352K    1.85M      5.25        93     3486.30
ca2010     710K    3.49M      4.91       141     4075.97
poisson3   86K     2.37M     27.74       145     4031.20
gyro_k     17K     1.02M     58.82       360     2446.36
vsp_fina   140K    1.1M       7.90       669     1335.59
Stanford   282K    2.31M      8.20     38606      287.82
ins2       309K    2.75M      8.89    309412       43.91
Breadth-First Search with Remote Writes
1. For each vertex in the frontier, try to set self as
parent of each neighbor vertex
• Done using remote writes, no migrations
• Last writer wins (benign race condition)
2. Double-buffer: Check to see which vertices acquired
a new parent, and add them to the queue
• This step is completely nodelet-local
• Caveat: also scans inactive vertices
BFS Pseudo-code
Listing 1: BFS algorithm using remote writes
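def bfs(root, num_vertices, out_edges):
    parent = [-1] * num_vertices
    new_parent = [-1] * num_vertices
    parent[root] = root
    queue = [root]
    while len(queue) > 0:
        # Step 1: each frontier vertex offers itself as parent of its neighbors.
        for src in queue:
            for dst in out_edges(src):
                # Remote write; last writer wins (benign race)
                new_parent[dst] = src
        # Step 2: scan all vertices (nodelet-local) to build the next frontier.
        queue = []
        for v in range(num_vertices):
            if parent[v] == -1 and new_parent[v] != -1:
                parent[v] = new_parent[v]
                queue.append(v)
    return parent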
BFS on a Dynamic Data Structure
[Figure: MTEPS (0 to 100) and edge bandwidth (MB/s, 0 to 1500) versus scale (15 to 21) for Emu single node - Cilk, Emu multi-node - Cilk, x86 Haswell - STINGER, and x86 Haswell - Cilk.]
Note: Streaming data structure, not statically optimized.
But Erdős-Rényi graphs. RMAT: Load imbalance.[3]
[3] Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar.
“Programming Strategies for Irregular Algorithms on the Emu Chick,” (in submission).
Labeled Subgraph Alignment
[Figure: speedup (0 to 50) versus number of threads (1 to 128) for Multi-BLK, Multi-HCB, Single-BLK, and Single-HCB.]
gsaNA, the first parallel algorithm, strong scaling on DBLP
graph (2048 vertices). Block (BLK) vertex layout is slightly
worse than Hilbert curve (HCB) layout.[3]
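The HCB layout orders vertices along a Hilbert curve. How gsaNA assigns vertices to grid cells is not described here, so the sketch below only shows the standard conversion from a grid cell (x, y) to its position along the curve. The function names are illustrative and the grid side n is assumed to be a power of two.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid to its position along a Hilbert curve.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n):
    """Return all grid cells sorted by their position along the curve."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
```

Cells that are consecutive along the curve are always grid neighbors, which is the locality property that makes the ordering attractive for data placement.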
Lessons Learned i
• Finding appropriate metrics is difficult:
• Comparing ASICs (e.g. x86) to FPGA-based prototypes
can be unfair either way.
• Fraction of peak bandwidth for the idealized
problem?
• Measured peak is much lower than theoretical peak.
• The Chick is compute bound.
• SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op.
• Graph500 BFS: TEPS ∝ BW
Lessons Learned ii
• Distilling observations on architecture ↔
programming model:
• Program data location for load (BW) balance.
• Remote memory operations v. migration exposes the
architecture.
• Migrations cost more than it appears. Computation?
• Stack spills/access can cause ping-ponging.
• How does HW support for top-down (Cilk-ish) affect
bottom-up (UPC) PGAS programming?
• Memory allocation similar to UPC, SHMEM
• UPC++ rpc_ff v. Emu thread migration?
Integrating the Chick with Flexible Infrastructure
[Figure: Rogues Gallery infrastructure diagram. Sections: "Scheduling, Tools, and Admin" and "User Resources". Component labels: login; rg-adm (Slurm Ctl); rg-db (Slurm DBD); toolbox (NFS); emu-dev; emu-chick; fpaa-host; fpaa-dev; power-host; nvidia-tegra-1 to nvidia-tegra-N; fpga-dev-1 to fpga-dev-N; fpga-hmc; fpga-intel. Key: schedulable resource, physical resource, VM, USB device.]
Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing
Experimental Post-Moore Architectures.”
https://arxiv.org/abs/1808.06334
• Available. Plans to
integrate with NSF
XSEDE.
• Scheduler being
deployed.
• Incorporates
Singularity and virtual
machines for
OS/library versioning.
Umbrella Project: CRNCH Rogues Gallery
A physical & virtual space for hosting novel computing
architectures, systems, and accelerators.
Host / manage remote access for novel architectures!
• Emu Chick
• FPGA + HMC: 3D stacked
• FPAA: Analog/Neuromorphic
Amortize effort and cost of trying novel architectures.
Break the “but it’s too much work” barrier.
http://crnch.gatech.edu/rogues-gallery
Acknowledgments
• Srinivas Eswar (GT CSE)
• Dr. Eric Hein (GT ECE ⇒ Emu)
• Patrick Lavin (GT CSE)
• Jiajia Li (GT CSE ⇒ PNNL)
• Abdurrahman Yaşar (GT CSE)
• Dr. Ümit Çatalyürek (GT CSE)
• Dr. Tom Conte (GT CS/ECE)
• Dr. Bora Uçar (ENS Lyon CNRS)
• Dr. Rich Vuduc (GT CSE)
• Dr. Jeffrey S. Young (GT CS)
Code:
• https://gitlab.com/crnch-rg (soon)
• https://github.com/ehein6/emu-microbench
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 

Recently uploaded

RBS学位证,鹿特丹商学院毕业证书1:1制作
RBS学位证,鹿特丹商学院毕业证书1:1制作RBS学位证,鹿特丹商学院毕业证书1:1制作
RBS学位证,鹿特丹商学院毕业证书1:1制作f3774p8b
 
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一C SSS
 
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一ss ss
 
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...ttt fff
 
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...Amil Baba Dawood bangali
 
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一ss ss
 
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best ServicesVip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Servicesnajka9823
 
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一ss ss
 
existing product research b2 Sunderland Culture
existing product research b2 Sunderland Cultureexisting product research b2 Sunderland Culture
existing product research b2 Sunderland CultureChloeMeadows1
 
Dubai Call Girls O525547819 Spring Break Fast Call Girls Dubai
Dubai Call Girls O525547819 Spring Break Fast Call Girls DubaiDubai Call Girls O525547819 Spring Break Fast Call Girls Dubai
Dubai Call Girls O525547819 Spring Break Fast Call Girls Dubaikojalkojal131
 
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作f3774p8b
 
the cOMPUTER SYSTEM - computer hardware servicing.pptx
the cOMPUTER SYSTEM - computer hardware servicing.pptxthe cOMPUTER SYSTEM - computer hardware servicing.pptx
the cOMPUTER SYSTEM - computer hardware servicing.pptxLeaMaePahinagGarciaV
 
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一Fi sss
 
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCRReal Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCRdollysharma2066
 
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degreeyuu sss
 
Call Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile serviceCall Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile servicerehmti665
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...Amil Baba Dawood bangali
 
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degreeyuu sss
 

Recently uploaded (20)

RBS学位证,鹿特丹商学院毕业证书1:1制作
RBS学位证,鹿特丹商学院毕业证书1:1制作RBS学位证,鹿特丹商学院毕业证书1:1制作
RBS学位证,鹿特丹商学院毕业证书1:1制作
 
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
 
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一
定制(Salford学位证)索尔福德大学毕业证成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...
毕业文凭制作#回国入职#diploma#degree美国威斯康星大学麦迪逊分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#d...
 
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
 
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
 
9953330565 Low Rate Call Girls In Jahangirpuri Delhi NCR
9953330565 Low Rate Call Girls In Jahangirpuri  Delhi NCR9953330565 Low Rate Call Girls In Jahangirpuri  Delhi NCR
9953330565 Low Rate Call Girls In Jahangirpuri Delhi NCR
 
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best ServicesVip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Vip Udupi Call Girls 7001305949 WhatsApp Number 24x7 Best Services
 
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
 
existing product research b2 Sunderland Culture
existing product research b2 Sunderland Cultureexisting product research b2 Sunderland Culture
existing product research b2 Sunderland Culture
 
Dubai Call Girls O525547819 Spring Break Fast Call Girls Dubai
Dubai Call Girls O525547819 Spring Break Fast Call Girls DubaiDubai Call Girls O525547819 Spring Break Fast Call Girls Dubai
Dubai Call Girls O525547819 Spring Break Fast Call Girls Dubai
 
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作
Erfurt FH学位证,埃尔福特应用技术大学毕业证书1:1制作
 
the cOMPUTER SYSTEM - computer hardware servicing.pptx
the cOMPUTER SYSTEM - computer hardware servicing.pptxthe cOMPUTER SYSTEM - computer hardware servicing.pptx
the cOMPUTER SYSTEM - computer hardware servicing.pptx
 
young call girls in Khanpur,🔝 9953056974 🔝 escort Service
young call girls in  Khanpur,🔝 9953056974 🔝 escort Serviceyoung call girls in  Khanpur,🔝 9953056974 🔝 escort Service
young call girls in Khanpur,🔝 9953056974 🔝 escort Service
 
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一
(办理学位证)加州州立大学北岭分校毕业证成绩单原版一比一
 
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCRReal Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
 
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国加州州立大学东湾分校毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
 
Call Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile serviceCall Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile service
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uk England Northern ...
 
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国旧金山艺术学院毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
 

Characterization of Emu Chick with Microbenchmarks

  • 1. Characterization of the Emu Chick with Microbenchmarks E. Jason Riedy Center for Research into Novel Computing Hierarchies at Georgia Tech 23 January 2019
  • 2. Outline
      • Project Background
      • Microbenchmarks
      • STREAM ADD and Pointer Chasing
      • Sparse Matrix – Vector Product (SpMV)
      • Breadth-First Search (BFS)
      • Labeled Subgraph Alignment
      • Observations
  • 3. Memory-centric HPDA
      • “Big data” platforms fare poorly vs. a single thread plus a large SSD. (McSherry, Isard, Murray. “Scalability! But at what COST?” HotOS XV, 2015.)
      • New architecture proposals are difficult to evaluate via simulation and modeling alone.
    Evaluate the FPGA-based prototype Emu Chick...
      • But by what criteria?
      • Chose memory bandwidth utilization.
      • Memory-centric architecture
      • BW is equivalent to MFLOP/s in SpMV, TEPS in BFS
  • 4. Emu Technology’s PGAS Architecture
    [Diagram labels: 1 nodelet (Gossamer Core 1 ... Gossamer Core 4, Memory-Side Processor, Migration Engine); RapidIO; Disk I/O; 8 nodelets per node; 64 nodelets per Chick; Stationary Core.]
      • Multithreaded multicore
      • Memory-side “processor” for operations in narrow-channel DRAM
      • Stationary core for OS
      • Threads migrate in hardware on reads!
      • Optimize for weak locality
  • 5. Baseline: Emu STREAM ADD, c[i] = a[i] + b[i]

    GC Config    Nodelets  Scale  Threads  BW (MB/s)
    1            8         30     512       1,599.86
    3            4         29     384       1,288.39
    1            64        31     4096     12,790.31
    3            32        31     6144      7,241.07
    Theor. Peak  8                          9,600
    Theor. Peak  64                        76,800

    STREAM results are used to compare bandwidth utilization for the current prototype. The 3 GC configuration is experimental and has (had?) half the memory controllers. [1]

    [1] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization of the Emu Chick” (in submission, https://arxiv.org/abs/1809.07696).
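The bandwidth-utilization criterion from slide 5 can be sketched as a ratio of measured to theoretical bandwidth. This is a minimal illustration, not the authors' tooling: it uses only the two 1 GC rows, because the slide lists theoretical peaks only for 8 and 64 nodelets, and the names `measured`, `peak`, and `utilization` are hypothetical.

```python
# Measured STREAM ADD bandwidth and theoretical peak (MB/s), copied from
# the slide 5 table. Only the 1 GC rows are used: the slide gives
# theoretical peaks for 8 and 64 nodelets only.
measured = {8: 1599.86, 64: 12790.31}
peak = {8: 9600.0, 64: 76800.0}


def utilization(nodelets):
    """Fraction of theoretical peak bandwidth reached by STREAM ADD."""
    return measured[nodelets] / peak[nodelets]


for n in sorted(measured):
    print(f"{n:2d} nodelets: {utilization(n):.1%} of theoretical peak")
```

Both configurations land at roughly one sixth of the theoretical peak, which is the gap that slide 7 attributes to computation.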
  • 6. Thread Spawning in STREAM ADD
    [Figure: memory bandwidth (GB/s, 0 to 12) vs. number of threads (64 to 4096) for serial_spawn, recursive_spawn, serial_remote_spawn, and recursive_remote_spawn.]
    Global: 1 GC / nodelet, 64 nodelets
  • 7. Bandwidth Limited by Computation
    STREAM and memory bandwidths (BW in MB/s):

    Operation                 Nodelets  Scale  Threads  BW
    Current - arithmetic ops  1                         200
    Ideal - all ld ops        1                         1,400
    ADD (Measured)            8         30     512      1,600
    ADD (Measured)            64        31     4096     12,790
    NCDIMM                    8                         12,800
    NCDIMM                    64                        102,400

    Per-GC peak from instruction counts, at 175 MHz:
    (175M cycles / second) × (1 instruction / cycle) × (3 mem ops / 21 instructions) × (8 bytes / 1 mem op) = 200 MB/s

    One GC per nodelet hits this peak. Eight GCs per nodelet may hit the ideal peak.
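The per-GC peak on slide 7 is plain arithmetic, so it is easy to check. The sketch below reproduces the 200 MB/s and 1,400 MB/s figures from the numbers on the slide; the function name `per_gc_peak_mb_s` and its parameters are hypothetical, chosen for illustration.

```python
CLOCK_HZ = 175e6  # Gossamer core clock from the slide: 175 MHz


def per_gc_peak_mb_s(mem_ops, instructions, bytes_per_op=8, ipc=1.0):
    """Peak bandwidth of one Gossamer core (GC) in MB/s.

    mem_ops / instructions is the fraction of issued instructions that
    move data; the slide counts 3 memory ops per 21 instructions for
    STREAM ADD.
    """
    return CLOCK_HZ * ipc * (mem_ops / instructions) * bytes_per_op / 1e6


current = per_gc_peak_mb_s(mem_ops=3, instructions=21)  # about 200 MB/s
ideal = per_gc_peak_mb_s(mem_ops=1, instructions=1)     # about 1,400 MB/s

# With one GC per nodelet, the instruction-count bound scales linearly.
for nodelets, measured in [(8, 1600), (64, 12790)]:
    bound = nodelets * current
    print(f"{nodelets:2d} nodelets: bound {bound:,.0f} MB/s, measured {measured:,} MB/s")
```

The bounds of about 1,600 and 12,800 MB/s sit right at the measured ADD rows in the table, which is the sense in which one GC per nodelet "hits this peak".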
  • 8. Emu Pointer-Chasing Benchmark
    Data-dependent loads, fine-grained access. [2]
    [Diagram: a 16-element list (indices 0 to 15) in three orderings: ordered; intra-block shuffle (weak locality); full block shuffle (weak locality).]

    [2] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
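One way to picture the pointer-chasing access patterns is to build the linked list explicitly. The sketch below is an assumption about what the mode names mean, not code from the benchmark: `intra_block_shuffle` permutes elements within each block, `block_shuffle` permutes the order of blocks, and `full_block_shuffle` does both. `build_chain` and `chase` are hypothetical helper names.

```python
import random


def build_chain(n, block_size, mode, seed=0):
    """Build a circular linked list over n elements.

    Returns (next_index, start), where next_index[i] is the element
    visited after element i. The mode semantics are assumed, see above.
    """
    rng = random.Random(seed)
    blocks = [list(range(s, min(s + block_size, n)))
              for s in range(0, n, block_size)]
    if mode in ("intra_block_shuffle", "full_block_shuffle"):
        for block in blocks:
            rng.shuffle(block)   # scramble order inside each block
    if mode in ("block_shuffle", "full_block_shuffle"):
        rng.shuffle(blocks)      # scramble the order of the blocks
    order = [i for block in blocks for i in block]
    next_index = [0] * n
    for pos, i in enumerate(order):
        next_index[i] = order[(pos + 1) % n]
    return next_index, order[0]


def chase(next_index, start):
    """Follow pointers until returning to start; each load depends on the last."""
    steps, i = 0, start
    while True:
        i = next_index[i]
        steps += 1
        if i == start:
            return steps


for mode in ("ordered", "intra_block_shuffle", "block_shuffle", "full_block_shuffle"):
    nxt, start = build_chain(16, 4, mode)
    print(mode, chase(nxt, start))
```

Every mode visits all 16 elements exactly once per lap; only the order, and therefore the locality, changes.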
  • 9. x86 Pointer-Chasing Benchmark
    [Figure: two panels (56 threads, 112 threads) of memory bandwidth (GB/s, 0 to 100) vs. block size (1 to 4M elements of 16 B each) for block_shuffle, intra_block_shuffle, and full_block_shuffle, with peak STREAM bandwidth marked.]
    Haswell results: every pattern is different. [1]
  • 10. Emu Pointer-Chasing Benchmark
    [Figure: two panels (2048 threads, 4096 threads) of memory bandwidth (GB/s, 0 to 12) vs. block size (1 to 4M elements of 16 B each) for block_shuffle, intra_block_shuffle, and full_block_shuffle, with peak STREAM bandwidth marked.]
    Mostly flat performance, high utilization. [1]
  • 11. SpMV Layout, Synthetic (5pt Laplacian)
    [Diagram: CSR data layouts for the product with x: Local (1 nodelet), 1D (8+ nodelets), and 2D (8+ nodelets).]
    [Figure: bandwidth (MB/s, 0 to 600) vs. number of rows (10² to 300²) for the local, 1D, and 2D layouts.]
    Single node, integer entries. [1]
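For readers unfamiliar with the synthetic input, the sketch below builds a 5-point Laplacian on an n × n grid in CSR form and multiplies it by a vector. It is a plain serial illustration with integer entries, as on the slide; it does not model the local, 1D, or 2D placement across nodelets, and the stencil values (4 on the diagonal, -1 for neighbors) are an assumption since the slide does not state them. Function names are hypothetical.

```python
def laplacian_5pt_csr(n):
    """CSR arrays (row_ptr, col_idx, vals) for a 5-point stencil on an n x n grid.

    The matrix has n*n rows, which is why the slide's x axis runs over
    squares such as 10^2 and 300^2.
    """
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            r = i * n + j
            if i > 0:
                col_idx.append(r - n); vals.append(-1)   # neighbor above
            if j > 0:
                col_idx.append(r - 1); vals.append(-1)   # neighbor to the left
            col_idx.append(r); vals.append(4)            # diagonal entry
            if j < n - 1:
                col_idx.append(r + 1); vals.append(-1)   # neighbor to the right
            if i < n - 1:
                col_idx.append(r + n); vals.append(-1)   # neighbor below
            row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def spmv(row_ptr, col_idx, vals, x):
    """y = A x for a CSR matrix. Each row reads x at data-dependent indices."""
    y = [0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[r] = acc
    return y


row_ptr, col_idx, vals = laplacian_5pt_csr(3)
print(spmv(row_ptr, col_idx, vals, [1] * 9))
```

The indirect read `x[col_idx[k]]` is the access that, on the Chick, either migrates the thread or is served locally when x is replicated.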
  • 12. SpMV Synthetic, Replicated – Single node, 1 GC
    [Figure: SpMV (Emu Chick, single node): bandwidth (MB/s, 0 to 900) vs. matrix size (MB, 0 to 3500) for 64, 128, 256, and 512 threads.]
    Good bandwidth utilization with high thread counts and replicated x.
  • 13. SpMV Synthetic – Single node, 1 GC
    [Figure: SpMV (Emu Chick, single node): bandwidth (MB/s, 0 to 800) vs. matrix size (MB, 0 to 3500) for 256 and 512 threads.]
    The 5pt Laplacian without replicating x bounces between migratory and non-migratory areas. [1]
  • 14. SpMV Synthetic – Single node, 1 and 3 GC
    [Figure: SpMV (Emu Chick, single node, 512 threads): bandwidth (MB/s, 0 to 1000) vs. number of rows (10² to 4000²) for 1 GC and 3 GC.]
    3 GC version: half the nodes, half the memory controllers
  • 15. SpMV Synthetic – Single node, 1 and 3 GC
    [Figure: SpMV (Emu Chick, single node, 512 threads): bandwidth utilization (0.0 to 0.6) vs. matrix size (MB, 0 to 1600) for 1 GC and 3 GC.]
    3 GC results demonstrate that SpMV is compute-bound from address computation.
  • 16. SpMV Synthetic, Replicated – Multinode, 1 GC
    [Figure: SpMV (Emu Chick, multi-node): bandwidth (MB/s, 0 to 7000) vs. matrix size (MB, 0 to 14000) for 64, 128, 256, 512, 1024, 2048, and 4096 threads.]
    SpMV scales up to 50% of bandwidth for high thread counts and replicated x. [1]
  • 17. SpMV Synthetic – Multinode, 1 GC
    [Figure: SpMV (Emu Chick, multi-node): bandwidth (MB/s, 0 to 6000) vs. matrix size (MB, 0 to 14000) for 1024, 2048, and 4096 threads.]
    But migrations for fetching x hurt with eight nodes.
  • 18. SpMV Real-World Results, Replicated
    SpMV multinode bandwidths (in MB/s) for real-world graphs (Tim Davis’s collection), along with matrix dimension, number of non-zeros (NNZ), and the average and maximum row degrees. Run with 4K threads.

    Matrix    Rows   NNZ    Avg Deg  Max Deg  BW (MB/s)
    mc2depi   526K   2.1M    3.99          4   3870.31
    ecology1  1.0M   5.0M    5.00          5   4425.61
    amazon03  401K   3.2M    7.99         10   4494.79
    Delor295  296K   2.4M    8.12         11   4492.47
    roadNet-  1.39M  3.84M   2.76         12   3811.57
    mac_econ  206K   1.27M   6.17         44   3735.54
    cop20k_A  121K   2.62M  21.65         81   4520.05
    watson_2  352K   1.85M   5.25         93   3486.30
    ca2010    710K   3.49M   4.91        141   4075.97
    poisson3  86K    2.37M  27.74        145   4031.20
    gyro_k    17K    1.02M  58.82        360   2446.36
    vsp_fina  140K   1.1M    7.90        669   1335.59
    Stanford  282K   2.31M   8.20      38606    287.82
    ins2      309K   2.75M   8.89     309412     43.91
  • 19. Breadth-First Search with Remote Writes
      1. For each vertex in the frontier, try to set self as parent of each neighbor vertex
          • Done using remote writes, no migrations
          • Last writer wins (benign race condition)
      2. Double-buffer: Check to see which vertices acquired a new parent, and add them to the queue
          • This step is completely nodelet-local
          • Caveat: also scans inactive vertices
  • 20. BFS Pseudo-code
    Listing 1: BFS algorithm using remote writes

        queue.push(root)
        while len(queue) > 0:
            for src in queue:
                for dst in out_edges(src):
                    # Remote write
                    new_parent[dst] = src
            for v in range(num_vertices):
                if parent[v] == -1:
                    if new_parent[v] != -1:
                        parent[v] = new_parent[v]
                        queue.push(v)
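Listing 1 leaves a few details implicit, such as marking the root as visited and starting a fresh queue for each level. The following is a minimal serial sketch that fills those gaps under stated assumptions: the graph is an adjacency list, the root is its own parent, and the "remote write" is an ordinary list store. It runs on a single thread, so it shows the double-buffered structure rather than the parallel behavior; `bfs_remote_writes` is a hypothetical name.

```python
def bfs_remote_writes(adj, root):
    """Level-synchronous BFS in the style of Listing 1.

    adj[v] lists the out-neighbors of v. Returns parent[], with -1 for
    unreached vertices and parent[root] == root (an assumption; the
    slide does not say how the root is marked).
    """
    n = len(adj)
    parent = [-1] * n
    new_parent = [-1] * n
    parent[root] = root
    queue = [root]
    while queue:
        # Step 1: every frontier vertex offers itself as parent to each
        # neighbor. On the Chick this is a remote write; last writer wins.
        for src in queue:
            for dst in adj[src]:
                new_parent[dst] = src
        # Step 2: scan all vertices (including inactive ones) and accept
        # a parent for any vertex that just acquired its first one.
        queue = []
        for v in range(n):
            if parent[v] == -1 and new_parent[v] != -1:
                parent[v] = new_parent[v]
                queue.append(v)
    return parent


# Vertex 3 is reachable from both 1 and 2; the later write (from 2) wins here.
print(bfs_remote_writes([[1, 2], [3], [3], []], root=0))
```

Either 1 or 2 would be a valid BFS parent for vertex 3, which is why the slide calls the race benign.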
  • 21. BFS on a Dynamic Data Structure
    [Figure: MTEPS (0 to 100) and edge bandwidth (MB/s, 0 to 1500) vs. scale (15 to 21) for Emu single node - Cilk, Emu multi-node - Cilk, x86 Haswell - STINGER, and x86 Haswell - Cilk.]
    Note: Streaming data structure, not statically optimized. But Erdős–Rényi graphs. RMAT: Load imbalance. [3]

    [3] Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar. “Programming Strategies for Irregular Algorithms on the Emu Chick” (in submission).
  • 22. Labeled Subgraph Alignment
    [Figure: speedup (0 to 50) vs. number of threads (1 to 128) for Multi-BLK, Multi-HCB, Single-BLK, and Single-HCB.]
    gsaNA, the first parallel algorithm: strong scaling on the DBLP graph (2048 vertices). Block (BLK) vertex layout is slightly worse than Hilbert curve (HCB) layout. [3]
  • 23. Lessons Learned i
      • Finding appropriate metrics is difficult:
      • Comparing ASICs (e.g. x86) to FPGA-based prototypes can be unfair either way.
      • Fraction of peak bandwidth for the idealized problem?
      • Measured peak is much lower than theoretical peak.
      • The Chick is compute bound.
      • SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op.
      • Graph500 BFS: TEPS ∝ BW
  • 24. Lessons Learned ii
      • Distilling observations on architecture ↔ programming model:
      • Program data location for load (BW) balance.
      • Remote memory operations v. migration exposes the architecture.
      • Migrations cost more than it appears. Computation?
      • Stack spills/access can cause ping-ponging.
      • How does HW support for top-down (Cilk-ish) affect bottom-up (UPC) PGAS programming?
      • Memory allocation similar to UPC, SHMEM
      • UPC++ rpc_ff v. Emu thread migration?
  • 25. Integrating the Chick with Flexible Infrastructure
    [Diagram labels: login; rg-adm (Slurm Ctl); toolbox (NFS); Scheduling, Tools, and Admin; User Resources; fpaa-host; power-host; nvidia-tegra-1 ... nvidia-tegra-N; fpaa-dev; rg-db (Slurm DBD); emu-dev; emu-chick; fpga-dev-1 ..N; fpga-hmc; fpga-intel. Key: schedulable resource, physical resource, VM, USB device.]
    Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing Experimental Post-Moore Architectures.” https://arxiv.org/abs/1808.06334
      • Available. Plans to integrate with NSF XSEDE.
      • Scheduler being deployed.
      • Incorporates Singularity and virtual machines for OS/library versioning.
  • 26. Umbrella Project: CRNCH Rogues Gallery
    A physical & virtual space for hosting novel computing architectures, systems, and accelerators.
    Host / manage remote access for novel architectures!
      • Emu Chick
      • FPGA + HMC: 3D stacked
      • FPAA: Analog/Neuromorphic
    Amortize effort and cost of trying novel architectures. Break the “but it’s too much work” barrier.
    http://crnch.gatech.edu/rogues-gallery
  • 27. Acknowledgments
      • Srinivas Eswar (GT CSE)
      • Dr. Eric Hein (GT ECE ⇒ Emu)
      • Patrick Lavin (GT CSE)
      • Jiajia Li (GT CSE ⇒ PNNL)
      • Abdurrahman Yaşar (GT CSE)
      • Dr. Ümit Çatalyürek (GT CSE)
      • Dr. Tom Conte (GT CS/ECE)
      • Dr. Bora Uçar (ENS Lyon CNRS)
      • Dr. Rich Vuduc (GT CSE)
      • Dr. Jeffrey S. Young (GT CS)
    Code:
      • https://gitlab.com/crnch-rg (soon)
      • https://github.com/ehein6/emu-microbench