The Rogues Gallery is a new experimental testbed focused on tackling "rogue" architectures for the post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies poses system administration challenges that are not always foreseen in traditional data center environments.
We present an overview of the motivations and design of the initial Rogues Gallery testbed and cover some of the unique challenges that we have seen and foresee with upcoming hardware prototypes for future post-Moore research. Specifically, we cover the networking, identity management, scheduling of resources, and tools and sensor access aspects of the Rogues Gallery and techniques we have developed to manage these new platforms. We argue that current tools like the Slurm resource manager can support new rogues without major infrastructure changes.
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore Architectures
1. Wrangling Rogues: A Case Study on Managing Experimental Post-Moore Architectures
Will Powell, Jason Riedy, Jeffrey Young, Tom Conte
Center for Research into Novel Computing Hierarchies at Georgia Tech
1 August 2019
2. Outline
What is the CRNCH Rogues Gallery?
Current Rogues
Emu Chick
3D Stacked Memories and FPGAs
Neuromorphic / Analog Hardware (FPAA)
Management lessons learned
Helpful points
Painful points
3. Apps: Massive+-scale data analysis
• Cyber-security: Identify anomalies, malicious actors
• Health care: Find outbreaks, population epidemiology, similar patient association
• Social networks: Advertising, searching, grouping
• Intelligence: Decisions at scale, regulating markets, smart & sustainable cities
• Systems biology: Understanding interactions, drug design
• Power grid / Smart cities: Disruptions, conservation, prediction
Irregular data access. Changing data.
Wrangling Rogues — 1 Aug 2019 3/22
4. High-Performance Data Analysis (HPDA)
Novel applications:
• Data at scale and speed needs new ideas for computing analysis.
• “Big data” platforms fare poorly vs. a single thread plus large SSD even for static data sets. (McSherry, Isard, Murray. “Scalability! But at what COST?” HotOS XV, 2015.)
• Many high-level codes are written and re-written to answer one question: need flexibility.
• Some primitives may be tuned and re-used.
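The COST question in the cited paper asks at what hardware configuration a scalable platform first outperforms a competent single-threaded implementation. A minimal sketch of that comparison follows. The function name `cost` and every timing below are hypothetical and invented purely for illustration; they are not measurements from the paper or from the Rogues Gallery.

```python
from typing import Mapping, Optional


def cost(single_thread_seconds: float,
         scalable_seconds_by_cores: Mapping[int, float]) -> Optional[int]:
    """Return the smallest core count at which the scalable system runs
    faster than the single-threaded baseline, or None if it never does."""
    for cores in sorted(scalable_seconds_by_cores):
        if scalable_seconds_by_cores[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical timings in seconds, made up for this example only.
baseline_seconds = 300.0
measured_seconds = {1: 2400.0, 16: 620.0, 64: 310.0, 128: 190.0}

print(cost(baseline_seconds, measured_seconds))
```

With these made-up numbers the scalable system needs 128 cores before it beats one thread, which is the kind of result that motivates looking at different architectures rather than only adding more cores.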
5. Why do we need rogues?
Rogue: Someone who goes their own way, who breaks away from the crowd.
• Current architectures are hitting limits on manufacturing, heat dissipation, memory latency...
• What happens when novel prototypes hit reality?
• Designers need feedback, a software ecosystem, and trained students.
6. What is the Rogues Gallery?
Hardware! “I’ll tell you later.”
7. Introducing the CRNCH Rogues Gallery
CRNCH Rogues Gallery
A physical & virtual space for hosting novel computing architectures, systems, and accelerators since fall 2017.
Host / manage remote access for novel architectures to
• kick-start software ecosystems (e.g. Kokkos),
• leverage real applications to train students, and
• provide rapid feedback to architects.
Amortize effort and cost of trying novel architectures.
Break the “but it’s too much work” barrier.
http://crnch.gatech.edu/rogues-gallery
8. Rogues Gallery summary
[Figure: overview of the Rogues Gallery. Labels recoverable from the diagram:]
• Areas: Programmable Interconnection Networks; Neuromorphic Accelerators; FPGA; Traditional Computation and Prototype Accelerators; Near-memory Computation and Data Rearrangement; Non-volatile Memory; High-bandwidth Memory
• Hosted Hardware: Emu Chick, FPAA (GT), Metastrider, others
• Future Devices: RISC-V, RQL Devices, Quantum
• Tools and Resources: Portability APIs (Kokkos, GraphBLAS, Neuromorphic APIs); Training materials and Tutorials; Benchmarks and Data Sets for Irregular and ML applications
9. Current Rogues
Current Rogues
Emu Chick
3D Stacked Memories and FPGAs
Neuromorphic / Analog Hardware (FPAA)
10. Emu Technology’s Chick
• “Migratory Memory Side Processing” to exploit weak locality.
• Data for graph edge attributes, documents / medical records, etc. reside nearby even if accessed irregularly.
• Moving threads to data on reads: all reads are local, so lower latency.
[Figure: Emu Chick block diagram. Labels: 1 nodelet (Gossamer Core 1 ... Gossamer Core 4, Memory-Side Processor, Migration Engine); 8 nodelets per node; 64 nodelets per Chick; Stationary Core; RapidIO; Disk I/O.]
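The idea of moving threads to data can be illustrated with a toy model. The sketch below is not Emu's programming interface or its real address mapping: `home_nodelet` assumes a hypothetical blocked round-robin layout, and `count_migrations` simply counts how often a thread would have to hop to another nodelet so that every read stays local. Only the two nodelet counts come from the slide.

```python
from typing import Iterable

NODELETS_PER_NODE = 8     # from the slide
NODELETS_PER_CHICK = 64   # from the slide
NODES_PER_CHICK = NODELETS_PER_CHICK // NODELETS_PER_NODE  # derived: 8


def home_nodelet(index: int, block: int,
                 nodelets: int = NODELETS_PER_CHICK) -> int:
    """Hypothetical layout: `block` consecutive elements live on one
    nodelet, and blocks are dealt out round-robin across nodelets."""
    return (index // block) % nodelets


def count_migrations(accesses: Iterable[int], block: int,
                     start_nodelet: int = 0) -> int:
    """Count thread migrations needed so that each read is local."""
    here = start_nodelet
    migrations = 0
    for index in accesses:
        target = home_nodelet(index, block)
        if target != here:      # data lives elsewhere: move the thread
            migrations += 1
            here = target
    return migrations


# A sequential scan of 1024 elements under two layouts.
print(count_migrations(range(1024), block=1))    # element-wise striping
print(count_migrations(range(1024), block=16))   # 16-element blocks
```

In this toy model the element-wise layout migrates on almost every read (1023 times) while the blocked layout migrates only 63 times, which is one way to see why keeping related data on the same nodelet matters.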
12. 3D Stacked Memory and FPGAs
• FPGA + HMC / DRAM: Enable experiments with “near-memory” and memory-centric processing.
• FPGA platforms prototype non-traditional accelerators like Automata, sparse data engines, etc.
• Current work is supported in part by Micron hardware donation.
13. FPGA & memory results
Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube (HMC),” ISPASS 2018
• Characterizations with FPGA and Hybrid Memory Cube show latency/bandwidth tradeoff.
• Other FPGA work is focused on compilers, HPC prototyping, and sparse algorithms for Intel and Xilinx FPGAs.
14. Neuromorphic systems
• Field-Programmable Analog Array (FPAA) System-on-Chip, designed in the lab of Dr. Jennifer Hasler.
• Analog + digital to achieve unprecedented power and size reductions.
• Potential on-chip/package accelerator.
• Adding other neuromorphic systems
17. Rogues Gallery structure
[Figure: Rogues Gallery structure diagram. Labels recoverable from the diagram:]
• Scheduling, Tools, and Admin: login / notebook, rg-adm, Slurm Ctl, toolbox (NFS), rg-db, Slurm DBD
• User Resources: fpaa-host, fpaa-dev, power-host, nvidia-tegra-1 ... nvidia-tegra-N, emu-dev, emu-chick, fpga-dev-1 ... N, fpga-hmc, fpga-intel
• Key: Schedulable Resource, Physical Resource, VM, USB device
18. Management lessons learned
• Invest in rogues, but realize some technology may be short-lived.
• Minimize custom management effort.
• Physical hardware resources not dedicated to rogues should be kept to a minimum.
• Don’t spend $ on non-rogues.
• Collaboration and commiseration are key.
• Rogues need a community to succeed.
• Licensing and appropriate identity management are tough but necessary challenges.
• Use network isolation when needed.
19. Helpful points
• Network isolation provides security.
• Well, enough given limited usefulness.
• Singularity is great for build environments.
• HW start-ups cannot afford supporting every OS/arch.
• IT cannot afford supporting every OS/arch.
• Companies must be friendly...
• Inspired undergrads are wonderful!
• Modernizing tools (FPAA)
• Building out demonstrations
• http://www.vip.gatech.edu/teams/rogues-gallery
20. Painful points
• SLURM aspects:
• Managing slurmd.conf.
• Building on all the OS/arch combos.
• Few light-weight management options.
• salt-ssh, ansible on some
• Hardware access for rebooting, reseating.
• Many programming interfaces, few people
• Kokkos, TENNLab, more...
• Still need to tackle “sensitive” data, including some FPGA IP
• Reproducible / replicable / audit-able results
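One light-weight way to reduce the pain of hand-editing Slurm configuration for many dissimilar machines is to generate the node and partition lines from a small inventory. The sketch below is an assumption about how that could look, not the Rogues Gallery's actual tooling. The host names come from the structure slide; the partition names, CPU counts, and `fpga:1` resource are hypothetical. `NodeName`, `CPUs`, `Gres`, `State`, `PartitionName`, and `Nodes` are standard `slurm.conf` keywords.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Rogue:
    node: str        # host name as Slurm should see it
    partition: str   # hypothetical partition name
    cpus: int        # hypothetical core count
    gres: str = ""   # optional generic resource; a real setup would also
                     # need a matching GresTypes entry and gres.conf


def render(rogues: Iterable[Rogue]) -> str:
    """Render NodeName and PartitionName lines for a slurm.conf fragment."""
    rogues = list(rogues)
    lines: List[str] = []
    for r in rogues:
        line = f"NodeName={r.node} CPUs={r.cpus}"
        if r.gres:
            line += f" Gres={r.gres}"
        lines.append(line + " State=UNKNOWN")
    # Group nodes by partition, keeping first-seen order.
    partitions: Dict[str, List[str]] = {}
    for r in rogues:
        partitions.setdefault(r.partition, []).append(r.node)
    for name, nodes in partitions.items():
        lines.append(f"PartitionName={name} Nodes={','.join(nodes)} State=UP")
    return "\n".join(lines)


# Illustrative inventory only; every value except the host names is made up.
ROGUES = [
    Rogue("emu-dev", "rg-emu", 4),
    Rogue("fpga-hmc", "rg-fpga", 8, "fpga:1"),
    Rogue("fpga-intel", "rg-fpga", 8, "fpga:1"),
]
conf_text = render(ROGUES)
print(conf_text)
```

Keeping the inventory in one place means a new rogue is a one-line addition, and the same data could feed other tools such as salt-ssh or ansible on the hosts that support them.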
21. Rogues Gallery: Active and Growing
• Integrating FPAAs and toolchain
• Tight development loop with Emu
• Active research projects and publications
• Community building via tutorials & talks
• New approaches to benchmarking, quantum software stacks, neuromorphic toolchains, ...
CRNCH Rogues Gallery connects researchers and students with novel architectures and architects with upcoming applications.
Let us host / manage your neat stuff!
http://crnch.gatech.edu/rogues-gallery
22. Acknowledgments
Fantastic students and colleagues:
• Srinivas Eswar (GT CSE)
• Dr. Eric Hein (GT ECE ⇒ Emu)
• Patrick Lavin (GT CSE)
• Dr. Jiajia Li (GT CSE ⇒ PNNL)
• Abdurrahman Yaşar (GT CSE)
• Chunxing Yin (GT CSE)
• Dr. Jeffrey S. Young (GT CS)
• Dr. Tom Conte (GT CS/ECE)
• Dr. Vivek Sarkar (GT CS)
• Dr. Ümit Çatalürek (GT CSE)
• Dr. Bora Uçar (ENS Lyon CNRS)
• Dr. Rich Vuduc (GT CSE)
Code (ideally will have links from crnch.gatech.edu):
• https://gitlab.com/crnch-rg
• https://github.com/ehein6/emu-microbench
Other testbeds:
• ORNL: ExCL
• PNNL: CENATE
• Argonne
• Sandia HAAPS
• Berkeley: AQCT
• (others?)
23. External Image Credits
• “What’s that watermelon doing there?”: copyright MGM, used for identification
• Oscar Wilde: public domain, obtained from Wikipedia
• Edna St. Vincent Millay: public domain, obtained from Wikipedia
• Dread Pirate Roberts: copyright 20th Century Fox, used for identification
• Mary Jackson, Katherine Goble Johnson, Dorothy Vaughan (Hidden Figures): copyright 20th Century Fox, used for identification
• Malcolm Reynolds: copyright Universal Pictures, used for identification
• Rogue One: copyright Walt Disney Studios Motion Pictures, used for identification
• The Story of Karrawingi, the Emu (cover): copyright estate of Leslie Rees, used for identification
• Big Hero 6: copyright Walt Disney Studios Motion Pictures, used for identification