5. 5
SMB 2.0 and later are a different thing from legacy SMB 1.0/CIFS
Faster! FC SAN-class performance, made simple
Safer! Mature management features, security, and high availability
SMB dialect timeline:
SMB (LAN Manager): Windows 95 / Windows NT (1980s onward)
SMB 2.0: Windows Vista & Server 2008 (2006)
SMB 2.1: Windows 7 & Server 2008 R2 (2009)
SMB 3.0/3.0.2: Windows 8 & Server 2012 (2012)
SMB 3.1.1: Windows 10 & Server 2016 (2015)
・Command compounding
・Larger read/write sizes
・Transparent reconnection after a disconnect
・Improved message signing (HMAC SHA-256)
・Improved scalability
・Symbolic link support
・Client oplock lease model
・Support for larger MTUs (max 64 KB → 1 MB)
・Improved sleep-mode transitions
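Not on the slide itself, but a quick way to check which SMB dialect a Windows machine actually negotiated, and whether legacy SMB 1.0 is still enabled, is a couple of built-in PowerShell cmdlets (a minimal sketch; output columns may vary by OS version):

# Which dialect did each active client connection negotiate? (e.g. 2.1, 3.0.2, 3.1.1)
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect

# Is legacy SMB 1.0/CIFS still enabled on this server?
Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol, EnableSMB2Protocol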
18. 18
[Diagram: synchronous write flow. An application (local or remote) writes to the Source Server (replication source node, Data + Log volumes); the write is replicated to the Destination Server (replication destination node, Data + Log volumes). Steps are numbered 1-5, and t/t1 mark the later flushes from the log to the data volumes.]
• When an I/O occurs, the write is guaranteed to reach the log disks on both the source and the destination
19. 19
[Diagram: asynchronous write flow. Same layout as the synchronous diagram: application (local or remote), Source Server (replication source node, Data + Log volumes), Destination Server (replication destination node, Data + Log volumes). Steps are numbered 1-6, and t/t1 mark the later flushes from the log to the data volumes.]
• The I/O is treated as complete without waiting for the write on the destination node
• Consider this mode when network bandwidth or latency rules out synchronous mode (a PowerShell sketch follows)
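Not shown on the slide: a minimal PowerShell sketch of switching an existing partnership to asynchronous replication. The server and replication-group names (SRV1/SRV2, RG01/RG02) are hypothetical, and Set-SRPartnership's -ReplicationMode parameter is assumed to be available in your Storage Replica version:

# Switch an existing Storage Replica partnership from synchronous to asynchronous
Set-SRPartnership -SourceComputerName SRV1 -SourceRGName RG01 `
    -DestinationComputerName SRV2 -DestinationRGName RG02 `
    -ReplicationMode Asynchronous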
23. 23
Synchronous or asynchronous mode can be selected
Manual failover only
(PowerShell / Azure Site Recovery)
Best fit for the "general-purpose file server" scenario
《Server to Server》 Replication between two servers
《Cluster to Cluster》 Replication between two different clusters
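Because failover is manual here, a planned failover is typically driven by reversing the replication direction from PowerShell. A minimal sketch with the same hypothetical names (verify the parameter set against your version's Set-SRPartnership documentation before relying on it):

# Reverse direction: make SRV2 the new source (manual/planned failover)
Set-SRPartnership -NewSourceComputerName SRV2 -SourceRGName RG02 `
    -DestinationComputerName SRV1 -DestinationRGName RG01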
[Diagrams: 《Server to Server》 SRV1 replicating to SRV2 via SR over SMB3; 《Cluster to Cluster》 nodes of cluster FSCLUS in ManhattanDC replicating to nodes of cluster DRCLUS in JerseyCityDC via SR over SMB3.]
The cluster version of Server to Server
(features and restrictions are basically the same)
Provides cluster-based high availability (HA) and replication-based disaster recovery (DR) as independent capabilities
Best fit for the "Scale-Out File Server" scenario
24. 24
[Diagram: SRV1; stretch cluster HVCLUS with NODE1/NODE2 in ManhattanDC and NODE3/NODE4 in JerseyCityDC, replicating between the sites via SR over SMB3.]
Provides high availability (HA) and disaster recovery (DR) as a combined capability
Automatic failover is possible
GUI management via Failover Cluster Manager is possible
Synchronous mode only
Best fit for "Hyper-V" and "general-purpose file server" workloads (not suitable for Scale-Out File Server)
《Stretch Cluster》 Replication within a single cluster
《Server to Self》 Replication between volumes within a single server
Best suited for relocating or transferring volumes
(e.g. environments where a plain data copy cannot be performed)
25. 25
Operational tips
◦ If the log size given to New-SRPartnership is too small, the command fails with an error (see the sketch after these tips)
◦ The initial run takes quite a long time
◦ The data volume on the Destination side is no longer displayed
◦ To view the Destination side, temporarily remove the partnership with Remove-SRPartnership
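A minimal sketch of the cmdlets these tips refer to, using hypothetical names (SRV1/SRV2, replication groups RG01/RG02, data volume D:, log volume L:); adjust the paths and the log size for your environment:

# Create the partnership; an explicit, sufficiently large log size avoids the error above
New-SRPartnership -SourceComputerName SRV1 -SourceRGName RG01 `
    -SourceVolumeName D: -SourceLogVolumeName L: `
    -DestinationComputerName SRV2 -DestinationRGName RG02 `
    -DestinationVolumeName D: -DestinationLogVolumeName L: `
    -ReplicationMode Synchronous -LogSizeInBytes 8GB

# Temporarily remove the partnership to make the destination data volume visible again
Remove-SRPartnership -SourceComputerName SRV1 -SourceRGName RG01 `
    -DestinationComputerName SRV2 -DestinationRGName RG02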
Configuration tips
◦ Workgroup authentication is not supported
◦ Storage Replica between two different Windows domains is possible
◦ Cluster-to-Server and Server-to-Cluster replication are not supported; in that case, configure the server side as a single-node cluster
◦ Only cluster configurations can be managed from the GUI
43. What is Gen-Z (pronounced "gen-zee")?
Gen-Z is a protocol.
A protocol is a set of rules for data transmission.
Protocols abound in the world of computers: DDR3, SAS 6G / 12G / 24G, SATA 24G, PCIe Gen 2 / Gen 3, NVLink, inter-processor links (many variants), and more.
46. Participating companies / organizations
Member companies as of May 2018
*HPE is a founder, but does not run the consortium
Alpha Data; AMD; Amphenol Corporation; ARM; Avery Design Systems; Broadcom Ltd.; Cadence Design Systems, Inc.; Cavium Inc.; Cisco Systems Inc.; Cray; Dell EMC; Electronics and Telecommunications Research Institute; Everspin Technologies; FoxConn Interconnect Technologies; Hirose Electric; HPE; Huawei R&D USA; IBM; IDT; IntelliProp, Inc.; Jabil Circuit; Jess-Link Products Co., Ltd.; Keysight Technologies; Lenovo; Lotes Ltd.; Luxshare-ICT; Mellanox Technologies Ltd.; Mentor Graphics; Micron; Microsemi Storage Solutions, Inc.; Mobiveil, Inc.; Molex; NetApp; Nokia; Numascale; Oak Ridge National Laboratory; PLDA Group; Qualcomm Technologies, Inc.; Red Hat; Samsung; Seagate; Senko Advanced Components, Inc.; Simula Research Laboratory; SK hynix; SMART Modular Technologies; Spin Transfer Technologies; TE Connectivity Corporation; Toshiba Memory Corporation; Tyco Electronics (Shanghai) Co., Ltd.; University of New Hampshire InterOperability Laboratory; VMware; Western Digital Technologies, Inc. (Sandisk); Xilinx; YADRO Company
Two curves on chart build from left to right. Numbers appear over each data point. “Capability gap” arrow appears once the two curves start to diverge
Every two years, we’re creating more data than through all of history
Our ambitions are growing faster than our computers can improve.
The definition of real time is changing
The old world analyzes the past, which gives us hindsight
Real time means analyzing the new while it’s still new
Real time means insight and foresight
For example, Walmart’s transactional database is about 40 petabytes. And they changed not just the face, but the entire practice of in-store retail.
By May 2016, Facebook was processing 4 petabytes of data a day. That’s the size of Walmart’s database every ten days.
We’re about to take another leap in magnitude in the amount of data available to give us insight.
By 2020, the driver assistance systems alone of 10 million autonomous cars will generate 40,000 petabytes every day. And that’s just self-driving cars. We’ll be generating such levels of data from IoT sensors all over our “physical world”.
Can you capitalize on the data explosion?
99% of data created at the edge is discarded today
Deciding what to keep introduces bias
Bias precludes new insight
Raw data is priceless.
Someone is going to work this out and create whole new industries.
Will it be you, or your competition?
Data Source: K. Rupp, http://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/
IDC’s Data Age 2025 study, sponsored by Seagate, April 2017
[Next slide: The New Normal if you want to go deeper into this graph]
The point here is that hardware stopped getting better a while ago and can’t keep up with the data explosion nearly doubling every couple of years.
The IEEE Computer Society, the International Technology Roadmap for Semiconductors (ITRS), and R&D investments and capital expenses channeled through government consortiums come together to help shape the development of the modern semiconductor fab structure and computer architecture. They’ve also been saying the same thing.
The graph on the left shows the different aspects of microprocessor performance improvement since the dawn of Moore’s Law. Every time you go up the graph, the performance is 10 times better. The orange line is transistor density (transistors per unit area) and will start to flatten out when Moore’s Law ends (about 4-5 years out if things don’t change).
This is called Dennard scaling: a scaling law stating that as transistors get smaller, their power density stays constant, so that power use stays in proportion with area. You can see that single-thread performance, frequency, and typical power had already flattened out by 2015. It started flattening out approximately a decade ago, which in turn led to the rise of multicore processors. Since one processor could no longer be run faster, successive generations of Moore’s law transistors were consumed by making copies of the processor core. This increased the demands on memory systems because of the increase in voracious consumers of data as well as multiple independent access patterns to memory, which defeated decades’ worth of hierarchy-hiding techniques.
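A compact way to state Dennard scaling (added here for clarity, not from the original notes): if device dimensions and supply voltage both shrink by a factor \kappa, dynamic power per transistor and transistor area shrink together, so power density stays flat:

P \propto C V^2 f, \qquad C \to C/\kappa,\; V \to V/\kappa,\; f \to \kappa f,\; A \to A/\kappa^2 \;\Rightarrow\; P \to P/\kappa^2,\qquad P/A \approx \text{const.}

Once supply voltage could no longer keep scaling down with dimensions, that constant broke, which is the flattening of frequency and typical power visible in the chart.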
Additionally, memory is closer to the end of scaling than processors, due to the physical characteristics of semiconductor memory devices. There are a few more generations left to processor designers, but as they eke out those last few generations, the increases in performance are stagnating and the operational and capital costs associated with these penultimate systems are rising.
(FYI: Penultimate is the next to last, as in 7nm is the penultimate silicon process step, while 5nm is the ultimate or terminal improvement, but neither offer the step over step improvement of prior years.)
The only thing that is not flattening out is the data growth graph on the right. We generate zettabytes of data which nearly doubles every two years according to IDC. There is an exponential growth curve in rich data that conventional technology with linear responses just can’t keep up with.
Link: but it’s not just about the sheer size of the data. You’ve all heard that story a million times. It’s about new use cases that use data in a whole different way.
[Next: systems of record, engagement and action]
Detail dive on “The End of Cheap Hardware”.
The point here is that hardware stopped getting better a while ago.
Dennard Scaling ended approximately a decade ago, which in turn led to the rise of multicore processors. Since one processor could no longer be run faster, successive generations of Moore’s law transistors were consumed by making copies of the processor core. This increased the demands on memory systems because of the increase in voracious consumers of data as well as multiple independent access patterns to memory, which defeated decades’ worth of hierarchy-hiding techniques.
Additionally, memory is closer to the end of scaling than processors, due to the physical characteristics of semiconductor memory devices. There are a few more generations left to processor designers, but as they eke out those last few generations, the increases in performance are stagnating and the operational and capital costs associated with these penultimate systems are rising.
So what are we actually going to do? Let’s start with this picture, one that would be very familiar to Dr. von Neumann and one that you’ll find in everything from a smartphone to a supercomputer. We have compute on the left, attached over a dedicated connection to a precious amount of high-performance volatile memory. Connected over a copper interconnect we have I/O devices that let us send information through time (storage) or through space (network). But when we do, it is thousands to tens of thousands of times slower than getting to memory, so we do it by blocking and copying; we do it with file opens, seeks, and sockets, and hundreds of thousands of lines of library, kernel, and driver software.
[Next: MDC concept build out…]
So the first step is to look at where the data lives, where are we going to contain that exponential growth? We start by pooling the vast majority of information in a high performance pool with the performance characteristics of memory and the cost characteristics of storage.
[Next: MDC concept build out…]
We pool the memory and place it on a memory-semantic fabric, an interconnect that is specifically designed to access memory as memory, directly from individual load and store instructions from the microprocessor. You’ll see that we’ve drawn here the symbol for the Memristor, the fourth fundamental electrical device, first theorized by Leon Chua at UC Berkeley, first realized as a practical device by our Senior Fellow Stan Williams, and now being co-developed for production by HPE and Western Digital. It is one of a new class of memory devices (Intel 3D XPoint, Phase Change, Spin Torque, Resistive RAM) that share breakthroughs in capacity and cost and are also natively non-volatile.
When you can’t shrink devices any more in X or Y, there is only one way to continue to grow, and that is up. The regular arrays of rows and columns in these new memory devices are much more amenable to Z-axis scaling than the high-power random logic of computation.
[Next: MDC concept build out…]
The next step is to look at the interconnect, and here is where we bring in the photonics. Now we’ve had fiber-optic communications for decades; it’s the basic technology for the global telecommunications backbone. Whenever we examine a point-to-point communications link and we want to decide between photons and electrons, we compute a ratio: how many bits do you want to send versus how much time (seconds), money (dollars), energy (watts) and distance (meters) do you have. Think of trying to talk to someone in a crowded bar. You can talk slower [seconds], you can lean in [meters], you can talk louder [watts] or you can go to a better place [$$$$]. Plug in all those numbers for electronics and photonics and you’ll come up with two ratios and you’ll pick the best. At top-end Ethernet or IB speeds, if you’re going over 10m you’ll want at least an active optical cable. With our VCSEL and Silicon Photonics and the ability to integrate these with both computation and memory devices, that crossover point at top-end speeds will drop from tens of meters to tens of centimeters.
But the real win isn’t in individual point-to-point link efficiency; it is in design freedom. If all you do is adjudicate point-to-point connections between electronics and photonics, you’ll get an ROI, but the real payoff comes when you fully capitalize on the relatively lossless nature of photonics. What’s really amazing is that whether a photon travels 10cm or 1000m, it does it for the same energy profile. The only difference is 5 nanoseconds per meter of fiber travelled. Lossless also means no emissions or interference. It all adds up to enabling topologies that simply cannot be executed with electronic communications, and that ends up being fundamental, because when the software teams look at that memory pool they have only two questions: “what’s the capacity and what’s the latency?” There isn’t a firm line, but most software teams begin to lose interest in memory if it’s more than 300ns to 500ns away; beyond that they’ll treat it as an I/O device. Photonics allows us to realize the memory pool accessible as memory at every scale we find interesting: rack to aisle to datacenter.
Now the last bit is the compute and here the electron is still king. But when the transistors stop getting smaller (and they’ve already stopped getting cooler, faster, and cheaper) there has to be something better to do with them than stamp out yet another copy of a general purpose standard instruction set core. We need to be able to quickly and economically add application specific accelerators to compute, and the memory semantic fabric allows us to do just that. Heterogeneous compute with GPUs, DSPs, ASICs, FPGAs as peers with general purpose compute all sharing access to the same massive memory pools.
And we still have networks, but we’ve now re-established the I/O in the role it was originally designed for, I/O as a peripheral because it’s at the periphery of the system. So much of the “I/O” we do today is actually memory sharing between those isolated islands of memory segregated behind compute. With memory emancipated away from compute, we can now let all the compute elements share memory structures directly, never having to copy, serialize, buffer or expose information to corruption or compromise.
[Next: MDC concept build out…]
[This is why Memory-Driven Computing is “Powerful” (addressing the first pillar). Note: this slide contains the single most important concept in the Memory-Driven Computing architecture.]
The challenges are always on building enough memory to keep up with compute. Memory has always been the scarce resource (never enough volume/resources).
Traditional computers chop up your information – the data – to match the limitations of the processor
Processor as gatekeeper
We flip that around and put the data first – bringing processing to the data
Processor almost irrelevant – can swap out to suit task
We call this Memory-Driven Computing
SoC, universal memory and photonics are the key parts of the architecture of the future
With this architecture, we can ingest, store and manipulate truly massive datasets while simultaneously achieving multiple orders of magnitude less energy/bit
Q: What is HPE doing here that is truly different?
A: New technologies are not substitutional - we’re re-architecting
[Next: Memory-Driven Computing concept build out…]
The point of this slide is that you have to have all of these in order to be MDC
[Next: MDC is Open]
What is Gen-Z and how does it set the stage for true Memory-Driven Computing for HPE?
Gen-Z is an open systems interconnect designed to provide memory semantic access to data and devices via direct attached, switched or fabric topologies. We need a new standard because existing interconnects have critical limitations that prevent them being useful in future computing architecture. Basically, we need more speed, the ability to address a bigger memory space, and more flexibility. Today, each computer component is connected using a different type of interconnect. Memory connects using DDR, hard drives via SATA, flash drives and graphics processing units via PCIe, and so on.
This technology will also set the stage for true Memory-Driven Computing, allowing the manipulation of huge data sets residing in large pools of fast, persistent memory. We also see Gen-Z as a critical component of HPE’s Memory-Driven Computing architecture. The memory fabric for The Machine program will contribute to shaping specifications of Gen-Z. This work will also shape the future of High-Performance Computing platforms, as well as extend our composable infrastructure technology.
Referring back to how the Photonics Fabric destroys distance and how hundreds of racks can behave as a single server.
The Gen-Z protocol can address a flat fabric of 16 million components.
Gen-Z could theoretically address 2^92 bytes, or 4,096 yottabytes, roughly a thousand times bigger than our digital universe today.
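A quick check of the arithmetic behind that figure (added here; it assumes the binary sense of “yottabyte”, i.e. 2^{80} bytes):

2^{92}\ \text{bytes} = 2^{12} \times 2^{80}\ \text{bytes} = 4{,}096 \times 2^{80}\ \text{bytes} \approx 4.95 \times 10^{27}\ \text{bytes}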
Please avoid characterizing Gen-Z as an HPE-led initiative or driving The Machine research project
HPE is a founding member, but is one of the many companies participating in the Gen-Z Consortium
HPE hopes to benefit from the consortium just as any other participating consortium member – Gen-Z is owned by the consortium
The Machine prototype is where we’ve been working out our memory-semantic fabric ideas and this has shaped our contributions into the Gen-Z consortium. We don’t want to and shouldn’t claim that it is Gen-Z because that specification is still in draft form and we can’t claim compliance to it.
[Next: Gen-Z Consortium]
The point here is that HPE isn’t the only one seeing the need for a new interconnect. Industry leaders across the value chain agree with us.
A consortium called Gen-Z was unveiled on Oct. 11, 2016; it comprises leading technology companies dedicated to creating and commercializing the Gen-Z technology. HPE is a member of the consortium, which includes AMD, ARM, Broadcom, Cray, Dell, Hewlett Packard Enterprise, Huawei, IDT, Micron, Samsung, SK Hynix, and Xilinx, and the list of members is constantly expanding.
For more information…
Gen-Z Consortium Press Release http://genzconsortium.org/news-type/press-release/
Gen-Z Website http://genzconsortium.org/
[Next: Memory-Driven security framework]
Enabling new software to accomplish the (hitherto) impossible
Changing the physics enables Memory-Driven Computing, and Memory-Driven Computing lets us create optimal algorithms that take advantage of the new architecture:
Memory abundance. Memory was scarce, now it’s abundant, how can we attack a problem differently? Precompute and look up, not recalculate. Much faster, but also more energy efficient: One energy expenditure per answer – sustainability. Need non-volatility to make this work – want the “energy tax of storing” to be zero.
Memory shared with just the right compute– Heterogeneous compute – right compute the right distance from memory – all looking at the same memory simultaneously. No passing data around. Neural net teams are “in love” with this concept
Non-volatility of memory – store in perpetuity with no energy except when accessing
Dynamic range – one code from gigabytes to petabytes – no re-implementing as you scale up
Different problems will find one or more of these axes transformational
[Next: Memory-Driven Computing for Machine Learning]
How do we get to MDC?
The Machine project
Introduction to The Machine program – This is an example of how we’re making MDC happen from HPE/Hewlett Packard Labs.
[Next: First instantiation of The Machine prototype unveiled]
This is what you can achieve if you throw away 60 years of software assumptions and legacy.
Spark is one of the leading open source tools to do in-memory analytics on a cluster of servers.
As part of another project, we wanted to see what we could do with Spark if we adapted it to a large-memory system. The results were astounding.
15x speedup compared to “vanilla” Spark. We can also run 20x bigger data sets that won’t run at all normally. All from rewriting a couple of hundred lines of code.
[harnesses the “well-connected” vector. Not big memory, just many processing elements on the pool of memory]
[The other three harness abundance: need all the memory the same “distance” away]
Similarity search is used for things like image search, genomics etc. and is outpacing supercomputer development today.
Comparison is with standard disk-based scale-out Map/Reduce. This is also on a 20x bigger data set.
Graphs increasingly represent our connected world today. Graph inference is how you make predictions using a small known set of data.
Comparison is with GraphLab, the state-of-the-art today.
Financial modelling is Monte Carlo simulations used to predict things like derivatives pricing and portfolio risk management.
Comparison is with open source QuantLib package.
[Next: What makes up MDC]
Developer Experience for The Machine
Programming and analytics tools:
Sparkle: Optimized Spark to run 10x faster on large-memory systems.
Managed Data Structures (MDS): Enables developers to declare in-memory data structures as persistent, and directly reuse them across programming languages and processes.
Fabric optimistic engine for data unification: A database engine that speeds up applications by taking advantage of a large number of CPU cores and persistent memory.
Fault-tolerant programming model for NVM: Adapts existing multi-threaded code to store and use data directly in persistent memory. Provides simple, efficient fault-tolerance in the event of power failures or program crashes.
Persistent memory toolkit: We have tools that support memory whose contents outlive the processes that allocate, populate, and manipulate the memory.
Operating system support:
Linux for Memory-Driven Computing: Modifications to the Linux operating system necessary to support Memory-Driven Computing.
LFS: Exposes Fabric-attached memory as a memory-mapped shared file system.
FAM-Atomics Library: Provides synchronization and locking capabilities that are atomic across the whole fabric attached system.
Librarian: Manages cross-node allocation of fabric memory.
Emulation/simulation tools:
Fabric Attached Memory Emulation: An environment designed to allow users to explore the new architectural paradigm of The Machine
Performance emulation for NVM latency and bandwidth: A DRAM-based performance emulation platform that leverages features available in commodity hardware to emulate different latency and bandwidth characteristics of future byte-addressable NVM technologies.
[Next: The Machine User Group]
On May 16, 2017, Hewlett Packard Enterprise announced the first Memory-Driven Computing prototype, the largest single-memory system on the planet.
It’s capable of working with up to 160 terabytes (TB) of data at the same time.
Linux-based operating system running on ThunderX2, Cavium’s flagship ARMv8-A workload-optimized System on a Chip.
The interconnect between enclosures uses photonics, with our new X1 photonics module.
The image on this slide is the real 40 nodes that live in Ft. Collins, CO. Here are other assets that you might find helpful to show:
Memory-Driven Computing using large volumes of nonvolatile memory is expected to dramatically increase the ability to analyze very large and varied types of data.
The German Center for Neurodegenerative Diseases (DZNE) is an organization focused on research into Alzheimer’s disease, which is said to affect 1 in 10 people over the age of 65 worldwide.
In order to comprehensively analyze patients’ diagnoses, genetic information, MRI images, and so on,
they concluded that this research required a new computer architecture capable of processing large amounts of data.
Starting with one component of their overall data analytics pipeline, they are already getting over 40x speed improvements. Furthermore, it is thought that speed-ups of up to 100x are possible by raising the accuracy of the machine learning.
Until now, DZNE has not been able to handle large amounts of data at once, but processing large-scale data simultaneously makes it possible to find hidden correlations that could not be found before.
For more technical explanation:
For that application, we saw over a 40x speed improvement (on the Superdome X) over published results. Thus some of the advantage we saw was from better hardware (Superdome X), and some was from our code refactoring to Memory-Driven Computing principles.
Our results on the 40-Node Prototype showed that as the number of nodes is increased, we can increase the number of application instances linearly. This means that the overall work DZNE can do will scale linearly as nodes are added for their pipeline.
This is just one of many applications that are used by DZNE. If other applications show similar results, we believe that the overall computational pipeline can be made up to two orders of magnitude (100x) faster.
[Next: talk about our The Machine prototype node boards]
This is our longer term research.
After several years of intense activity from the big key players, all indicators show that cognitive research and development, tools, and applications for business solutions are accelerating. While many of the trends focus on machine and deep learning, the whole field of artificial intelligence and cognitive systems is attracting high interest. At Hewlett Packard Labs we’re exploring futuristic research in silicon photonics, neuromorphic computing, and optical computing. Here are 2 examples of the types of accelerators that would highly benefit from Memory-Driven Computing.
Neuromorphic Computing:
Images of memristor array & the Dot Product Engine testbed (at the bottom)
At Hewlett Packard Labs, we’re doing research based on brain inspired architectures, not to replace humans, but to replicate the tricks the brain uses. We call this “Neuromorphic Computing”.
Our goal is to duplicate the fact that humans consume very little energy to do the computations that we do. Tens to thousands of times less energy than digital computers.
And brains don’t look like the processor units in computers. They are tremendous networks of neurons and synapses – vast webs of interconnections.
We are taking advantage of an emerging technology developed at HPE over the past few years – Memristor. It turns out that the brain’s interconnectedness can be duplicated perfectly in Memristor crossbar arrays. They are very low-power, shrinkable devices that let us build lower-power computing systems.
According to our forecasts, this will have 5 orders of magnitude improvement in processing throughput divided by power for some important applications.
Matrix multiplication lies at the heart of many computationally intensive applications and algorithms today – everything from signal processing and image processing to fraud detection, language transcription, and translation.
Most of these applications are performing very similar computations (or matrix multiplications) over and over again. Instead of moving those stored values around every time we want to do this multiplication, we are bringing the computation to the data in memory. This can be incredibly power efficient, save time and reduce computing complexity.
We’ve developed our Memristor technology to mimic the brain’s synapses in our Dot Product Engine to perform these parallel operations and complex computations at lower power and faster speeds.
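As a sketch of the idea behind the Dot Product Engine (this is the standard analog-crossbar formulation, added here rather than taken from these notes): program each memristor at row i, column j to a conductance G_{ij}, drive the columns with voltages V_j, and by Ohm’s and Kirchhoff’s laws each row wire sums a dot product, so a single read of the array performs a matrix-vector multiply in place:

I_i = \sum_j G_{ij} V_j \quad\Longrightarrow\quad \mathbf{I} = G\,\mathbf{V}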
The key behind the Dot Product Engine is what we are doing with in-memory computing. This ties into the Memory-Driven Computing path, where we go to high-density, non-volatile memory that is accessible to every accelerator so that computation can be performed in memory. This way it can plug into the Memory-Driven Computing architecture. It is not general purpose; it is one part of a system of general-purpose cores and accelerators that tackle intelligent-system workloads and adapt to the changing workloads of the business.
The notion of a system having heterogeneous types of processors ties into the BIG advantages of Memory-Driven Computing. Ultimately, we want to build systems that determine the best type of processing for the task at hand. It is about having the right processing for what is needed, and we are positioning this next phase of computing architecture to be open to what comes along the way in the future. The Memory-Driven Computing architecture is designed for this; a future neuromorphic chip could slot right in.
Optical Computing:
Images of a mock up of an optical neuromorphic chip and optical computing chip (at the bottom)
Optical computing is the attempt to use photons instead of electrons for computation, in other words: we process data signals in the optical domain instead of the traditional electronic domain. This is what is cool about Optical Computing!
The data is processed in the optical domain while it is travelling, instead of transferring the data to where we can do processing on it.
Similar to Memory-Driven Computing, you process data in memory, but here you can process the data while it is in flight. You don’t have to stop the data movement and reconvert the data. We are bringing processing closer to data, instead of data to the processor.
Today, every time we send and receive optical information through optical fibers, when we want to process it, we need to convert it to the electronic domain, process the data there and then convert the result back to the optical domain.
With over 1,000 optical parts in the chip, a record number of photonic components work together to compute with light.
[Next: Roadmap for Persistent Memory]