M. S. Ramaiah School of Advanced Studies
CSN2502 ACA Presentation
Anshuman Biswal
PT 2012 Batch, Reg. No.: CJB0412001
M. Sc. (Engg.) in Computer Science and Networking
Module Leader: Padma Priya Dharishini P.
Module Name: Advanced Computer Architecture
Module Code : CSN2502
Array Processor
Marking
Head                                        Maximum Score
Technical Content                           10
Grasp and Understanding                     10
Delivery – Technical and General Aspects    10
Handling Questions                          10
Total                                       40
Presentation Outline
• History of Array Processor
• Array Processor
• How Array Processor can help?
• Array Processor classification
• Array Processor architecture
• Performance and scalability of array processor
• Why use the array processor?
• When to use and when not to use the array processor?
History of Array Processor
• Array processor development began in the early 1960s at Westinghouse,
in its Solomon project.
• Solomon's goal was to improve math performance by using a large
number of math co-processors under the control of a single CPU.
• The CPU fed a single common instruction to all of the arithmetic logic
units (ALUs), one per "cycle", but with a different data point for each
one to work on.
• This allowed the Solomon machine to apply a single algorithm to a
large data set, fed in the form of an array.
• In 1962, Westinghouse cancelled the project, but the effort was
restarted at the University of Illinois as the ILLIAC IV.
• The ILLIAC IV was finally delivered in 1972, and designs of this kind
remained the basis of the fastest machines until the 1990s.
Array Processor
• An array processor is a synchronous parallel computer with multiple
ALUs, called processing elements (PEs), that operate in parallel in
lock-step fashion.
• It is composed of N identical PEs under the control of a single control
unit, plus a number of memory modules.
• Array processors also frequently use a form of parallel computation
called pipelining, in which an operation is divided into smaller steps
and the steps of successive operations are performed simultaneously.
• This can greatly improve performance on certain workloads, mainly
numerical simulation.
• These machines appeared in the 1970s and dominated supercomputer
design from then into the 1990s, notably the various Cray platforms.
• The rapid fall in the price-to-performance ratio of conventional
microprocessor designs led to the vector supercomputer's demise in the
later 1990s.
How array processor can help?
• In general terms, CPUs are able to manipulate one or two pieces of
data at a time. For instance, most CPUs have an instruction that
essentially says "add A to B and put the result in C". The data for A, B
and C could be—in theory at least—encoded directly into the
instruction. However, in an efficient implementation things are rarely that
simple. The data is rarely sent in raw form, and is instead "pointed to"
by passing in an address to a memory location that holds the data.
Decoding this address and getting the data out of the memory takes
some time.
• In order to reduce the amount of time this takes, most modern CPUs
use a technique known as instruction pipelining in which the
instructions pass through several sub-units in turn.
• Array processors take this concept one step further. Instead of
pipelining just the instructions, they also pipeline the data itself. This
allows for significant savings in decoding time.
How Array Processor can help?: An Example
• Consider the simple task of adding two groups of 10 numbers
together. In a conventional programming language you might write
something like this:
  – execute this loop 10 times
    – read the next instruction and decode it
    – fetch this number
    – fetch that number
    – add them
    – put the result here
  – end loop
• To an array processor, the same task looks like this:
  – read the instruction and decode it
  – fetch these 10 numbers
  – fetch those 10 numbers
  – add them
  – put the results here
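The two procedures above can be sketched in Python. This is only an illustrative model: the function names and the `decodes` counter are assumptions introduced here to count instruction fetch/decode steps, and the list comprehension merely stands in for work that the PEs would do in parallel.

```python
def scalar_add(a, b):
    """Add two lists element by element, as a conventional CPU loop would."""
    decodes = 0
    result = []
    for x, y in zip(a, b):
        decodes += 1              # the add instruction is fetched and decoded again
        result.append(x + y)
    return result, decodes


def array_add(a, b):
    """Model one vector instruction applied to whole operand arrays."""
    decodes = 1                   # a single fetch/decode covers every element
    result = [x + y for x, y in zip(a, b)]   # stands in for parallel PE work
    return result, decodes


a = list(range(10))
b = list(range(10, 20))
scalar_result, scalar_decodes = scalar_add(a, b)
array_result, array_decodes = array_add(a, b)
print(scalar_decodes, array_decodes)   # prints: 10 1
```

Both functions produce the same sums; only the number of modelled fetch/decode steps differs.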
How Array Processor can help?
• There are several savings inherent in this approach (based on the
example on the previous slide):
A. Only two address translations are needed.
B. The instruction is fetched and decoded only once instead of ten
times.
C. The code itself is smaller, which can lead to more efficient
memory use.
D. It improves performance by avoiding stalls.
Array Processor Classification
• SIMD (Single Instruction, Multiple Data): an array processor with a
single-instruction, multiple-data organization.
  – It executes vector instructions by means of multiple functional units
  responding to a common instruction.
  – Examples: ILLIAC IV, CM-2 (Connection Machine), MP-1 (MasPar-1), BSP
  (Burroughs Scientific Processor)
• Attached array processor: an auxiliary processor attached to a
general-purpose computer.
  – Its purpose is to improve the performance of the host computer in
  specific numeric calculation tasks.
Array Processor Architecture - SIMD
• SIMD has two basic configurations:
– a. Array processor using RAM, also known as the dedicated memory
organization
• ILLIAC IV, CM-2, MP-1
– b. Associative processor using content-addressable memory, also
known as the global memory organization
• BSP
SIMD Architecture – Array Processor using RAM
[Figure: block diagram including the host computer]
• This configuration has a control unit (CU) and multiple synchronized
PEs.
• The control unit controls all the PEs below it.
• The control unit decodes every instruction given to it and decides
where the decoded instruction should be executed.
• Vector instructions are broadcast to all the PEs. Broadcasting
achieves spatial parallelism through duplicated PEs.
• Scalar instructions are executed directly inside the CU.
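The dispatch rule described above (scalar instructions stay in the CU, vector instructions are broadcast to every PE) can be sketched as a toy model. The class name, the `load` and `add` operations, and the instruction encoding are hypothetical and chosen only for illustration; they are not the instruction set of any real machine.

```python
class ControlUnit:
    """Toy control unit: runs scalar instructions itself, broadcasts vector ones."""

    def __init__(self, num_pes):
        self.scalar_regs = {}            # registers local to the CU
        self.pe_acc = [0] * num_pes      # one accumulator per PE

    def execute(self, kind, op, operand):
        if kind == "scalar":
            # Scalar instructions run inside the CU; no PE is involved.
            name, value = operand
            if op == "load":
                self.scalar_regs[name] = value
            elif op == "add":
                self.scalar_regs[name] = self.scalar_regs.get(name, 0) + value
            else:
                raise ValueError(f"unknown scalar op: {op}")
        elif kind == "vector":
            # Vector instructions are broadcast: every PE applies the same
            # operation to its own element of the operand vector.
            if len(operand) != len(self.pe_acc):
                raise ValueError("operand vector needs one element per PE")
            if op == "load":
                self.pe_acc = list(operand)
            elif op == "add":
                self.pe_acc = [a + x for a, x in zip(self.pe_acc, operand)]
            else:
                raise ValueError(f"unknown vector op: {op}")
        else:
            raise ValueError(f"unknown instruction kind: {kind}")


cu = ControlUnit(num_pes=4)
cu.execute("scalar", "load", ("count", 3))        # handled by the CU alone
cu.execute("vector", "load", [1, 2, 3, 4])        # broadcast to all PEs
cu.execute("vector", "add", [10, 20, 30, 40])     # same op, different data
print(cu.scalar_regs, cu.pe_acc)
```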
SIMD Architecture – Array Processor using RAM
Control Unit
• A simple CPU
• Can execute instructions without PE intervention
• Coordinates all PEs
• 64 64-bit registers, D0-D63
• 4 64-bit accumulators, A0-A3
• Operations:
– Integer operations
– Shifts
– Boolean operations
– Loop control
– Index memory
[Figure: control unit (CU) with registers D0-D63, accumulators A0-A3, and an ALU]
SIMD Architecture – Array Processor using RAM
Processing Element
• A PE consists of an ALU with working registers and a local memory,
PMEMi, which is used to store distributed data.
• All PEs perform the same function synchronously, in lock-step fashion,
under the supervision of the CU.
• Before a vector instruction is executed in a PE, its vector operands
must be loaded into that PE's PMEM.
• Data can be loaded into the PMEM from an external source or by the
CU.
• Not every PE has to take part in executing an instruction; only the
enabled PEs do. Several masking schemes can be used to enable and
disable PEs during the execution of an instruction.
[Figure: PEi with registers A, B, R, S, X and D, an ALU, and local memory PMEMi (words 0, 1, ..., 2043), linked to PEi-1, PEi+1, PEi-8 and PEi+8]
• A PE contains the following 64-bit registers, unless noted otherwise:
• A: accumulator
• B: second operand for binary operations
• R: routing register for inter-PE communication
• S: status register
• X: index register for PMEM, 16 bits
• D: mode register, 8 bits
• Communication:
– PMEM is accessed only by its local PE
– PEs communicate with each other through R
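Inter-PE communication through the R register can be sketched as a single routing step in which every PE sends its R value to one of the four neighbours shown in the figure (i-1, i+1, i-8, i+8). The PE count of 64 and the wrap-around at the ends are assumptions made for this sketch; the slide itself only names the four neighbours.

```python
NUM_PES = 64                          # assumed PE count, not stated on the slide
ALLOWED_DISTANCES = (-8, -1, 1, 8)    # the four neighbour links in the figure


def route(r_values, distance):
    """One routing step: PE i sends its R register to PE (i + distance) mod N."""
    if distance not in ALLOWED_DISTANCES:
        raise ValueError("PEs are only linked to neighbours at -8, -1, +1, +8")
    n = len(r_values)
    received = [None] * n
    for i, value in enumerate(r_values):
        received[(i + distance) % n] = value   # wrap-around is an assumption
    return received


r = list(range(NUM_PES))      # PE i starts with the value i in its R register
after_plus_8 = route(r, 8)
after_minus_1 = route(r, -1)
print(after_plus_8[8], after_plus_8[0], after_minus_1[0])   # prints: 0 56 1
```

All PEs move their data in the same step, which matches the lock-step behaviour described for the rest of the machine.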
SIMD Architecture – Array Processor using RAM
Interconnection Network and Host Computer
• Interconnection network (IN): All communication between PEs goes
through the interconnection network, which performs all the routing and
data manipulation functions. The interconnection network is under the
control of the CU.
• Host computer: The array processor is interfaced to the host computer
through the control unit. The host computer handles resource
management and supervises peripherals and I/O.
SIMD Architecture – Masking and data routing organization
[Figure: components of PEi for i = 0, 1, 2, ..., N-1: registers Ai, Bi, Di, Ii, Ri, Si and Xi, an ALU, and local memory PEMi, with links to the CU and to other PEs via the interconnection network]
• Each PE is connected to other PEs through its routing register R.
• When one PE communicates with another, it is the contents of the R
register that are transferred.
• All inputs and outputs go through this register; the inputs and
outputs are isolated by master-slave flip-flops.
• The D register is the address register; it stores the 8-bit address of
the PE.
• During an instruction cycle, only the enabled PEs accept the operands
sent to them; the other PEs discard them. For an enabled PE the status
register S = 1, and for a masked PE S = 0.
• A is the accumulator and B holds the second operand of binary
operations.
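The masking rule above (S = 1 for an enabled PE, S = 0 for a masked one) can be sketched as a lock-step add in which masked PEs keep their accumulator unchanged. The function name and the list-based representation of the PE registers are assumptions for illustration.

```python
def masked_add(acc, operand, status):
    """Lock-step add: PE i updates its accumulator only if status[i] == 1."""
    if not (len(acc) == len(operand) == len(status)):
        raise ValueError("need one accumulator, operand and status bit per PE")
    return [a + b if s == 1 else a          # masked PEs discard the operand
            for a, b, s in zip(acc, operand, status)]


acc = [1, 2, 3, 4]            # accumulator A of each PE
operand = [10, 10, 10, 10]    # operand broadcast to each PE
status = [1, 0, 1, 0]         # S register: 1 = enabled, 0 = masked
result = masked_add(acc, operand, status)
print(result)                 # prints: [11, 2, 13, 4]
```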
SIMD Architecture – Associative processor using content-addressable
memory
[Figure: block diagram including the host computer and the alignment network]
• In this configuration the PEs do not have private memory. The
memories attached to the PEs are replaced by parallel memory modules
shared by all PEs through an alignment network.
• The alignment network switches paths between the PEs and the
parallel memory modules.
• PE-to-PE communication also goes through the alignment network.
• The alignment network is controlled by the CU.
• The number of PEs (N) and the number of memory modules (K) need
not be equal; in fact, they are chosen to be relatively prime.
• An alignment network should allow conflict-free access to the shared
memories by as many PEs as possible.
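A small sketch can show why the choice of K matters for conflict-free access. It assumes the simplest interleaving, in which address `a` lives in module `a mod K`, and assumes that the N PEs access addresses a fixed stride apart; both the mapping and the values N = 16, K = 17 and K = 16 are illustrative assumptions, not details given on the slide.

```python
def modules_accessed(num_pes, num_modules, stride, base=0):
    """Memory module hit by each PE when PE i reads address base + i * stride."""
    return [(base + i * stride) % num_modules for i in range(num_pes)]


def conflict_free(num_pes, num_modules, stride, base=0):
    """True if no two PEs hit the same module in the same cycle."""
    hits = modules_accessed(num_pes, num_modules, stride, base)
    return len(set(hits)) == num_pes


N = 16                                     # assumed number of PEs
for stride in (1, 2, 8):
    print(stride,
          conflict_free(N, 17, stride),    # K = 17, relatively prime to N
          conflict_free(N, 16, stride))    # K = 16, equal to N
```

Under these assumptions, K = 17 stays conflict-free for strides 1, 2 and 8, while K = 16 already collides at stride 2, because an even stride revisits the same modules.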
Attached Array Processor
• In this configuration the attached array processor has an
input-output interface to a common processor and another interface to
a local memory.
• The local memory connects to the main memory through a high-speed
memory bus.
Performance and Scalability of array processor
Why use the array processor?
• The principal reason for using an array processor is speed.
• Most array processors are designed to optimize repetitive arithmetic
operations, which makes them much faster at vector arithmetic than the
host CPU. Since most array processors operate asynchronously from the
host CPU, they act as a co-processor that increases the capacity of the
system.
• The second advantage is that an AP has its own local memory. On
systems with limited physical memory or address space, this can be an
important consideration.
When to use and not to use the array processor?
• The AP (array processor) is most efficient at repetitive operations
such as FFTs and multiplying large vectors. Its efficiency degrades for
non-repetitive operations, or for operations that require many
decisions based on the results of computations.
• Since APs have their own program and data memory, the AP
instructions and data must be transferred to the AP, and the results
transferred back from it. These I/O operations may cost more CPU time
than the array processor saves.
• As a general rule, the AP is more efficient than the CPU when
multiple or complex operations (such as FFTs) that are highly
repetitive are applied to a relatively large amount of data (thousands
of words or more). In other cases the AP will not help much and will
keep other processes from using a valuable resource.
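The trade-off between transfer overhead and faster arithmetic can be sketched as a simple linear cost model. Every parameter name and every number below is hypothetical and chosen only to show the shape of the argument; real costs depend on the machine.

```python
def host_time(n_words, host_per_word):
    """Time for the host CPU to process n_words on its own."""
    return n_words * host_per_word


def ap_time(n_words, ap_per_word, transfer_per_word, setup):
    """Time when offloading: fixed setup, plus transfer and compute per word."""
    return setup + n_words * (transfer_per_word + ap_per_word)


def ap_is_worthwhile(n_words, host_per_word, ap_per_word,
                     transfer_per_word, setup):
    """True if offloading to the AP beats running on the host."""
    return (ap_time(n_words, ap_per_word, transfer_per_word, setup)
            < host_time(n_words, host_per_word))


# Hypothetical costs in arbitrary time units.
costs = dict(host_per_word=10, ap_per_word=1, transfer_per_word=2, setup=5000)
small_job = ap_is_worthwhile(100, **costs)      # 5300 vs 1000 units
large_job = ap_is_worthwhile(10_000, **costs)   # 35000 vs 100000 units
print(small_job, large_job)                     # prints: False True
```

With these made-up figures the fixed setup cost dominates a 100-word job, while a 10,000-word job amortizes it, which is the pattern the rule of thumb above describes.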
Conclusion
• Although an array processor can improve performance, not every problem
can be attacked with this sort of solution. Instructions that process an
array of data at a time necessarily add complexity to the core CPU. That
complexity typically makes other instructions run slower. The more complex
instructions also add to the complexity of the decoders, which might slow
down the decoding of more common instructions such as ordinary additions.
• Array processors therefore work best only when there are large amounts of
data to be worked on. For this reason, these sorts of CPUs were found
primarily in supercomputers, and the supercomputers themselves were, in
general, found in places such as weather prediction centres and physics
labs, where huge amounts of data are "crunched".
• This architecture relies on a single instruction acting on all the data
sets at once. If the data sets depend on each other, they cannot be
processed in parallel. For example, if data A has to be processed before
data B, then A and B cannot be processed simultaneously. This dependency
is what makes parallel processing difficult to implement, and it is why
sequential machines are extremely common.
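The dependency point can be illustrated with two small loops. Both function names are made up for this sketch: in the first, each output depends only on its own input, so every element could go to a different PE; in the second, written in its straightforward sequential form, each output needs the previous one.

```python
def elementwise_double(data):
    """Independent work: output i depends only on input i."""
    return [2 * x for x in data]


def running_sum(data):
    """Dependent work: output i needs output i-1 before it can be computed."""
    total = 0
    out = []
    for x in data:
        total += x            # uses the result of the previous iteration
        out.append(total)
    return out


values = [1, 2, 3, 4]
doubled = elementwise_double(values)
sums = running_sum(values)
print(doubled, sums)          # prints: [2, 4, 6, 8] [1, 3, 6, 10]
```

The first loop gives the same answer in any evaluation order, which is what lock-step PEs require; the second, as written, fixes the order in which the elements must be handled.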
M. S. Ramaiah School of Advanced Studies 23
References
[1] Array or Vector Processing [Online] Available From: http://www.teach-
ict.com/as_as_computing/ocr/H447/F453/3_3_3/parallel_processors/miniweb/
pg3.htm# (Accessed:05 December 2012)
[2] Hennessy J. and Patterson D. (2007) Computer Architecture: A Quantitative
Approach, 4th edition, Morgan Kaufmann.
[3] Martin,J.(November 2011) Array processors- SIMD computer organisations
[Online] Available from :http://www.martinjacob.info/2011/11/17/array-
processors-simd-computer-organizations/ (Accessed:05 December 2012)
[4] Schaum.(2009)Theory and Problems of Computer Architecture, Indian special
edition,McGraw-Hill Companies Inc.

More Related Content

What's hot

instruction cycle ppt
instruction cycle pptinstruction cycle ppt
instruction cycle ppt
sheetal singh
 

What's hot (20)

Superscalar processor
Superscalar processorSuperscalar processor
Superscalar processor
 
pipelining
pipeliningpipelining
pipelining
 
Instruction cycle
Instruction cycleInstruction cycle
Instruction cycle
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUB
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
General register organization (computer organization)
General register organization  (computer organization)General register organization  (computer organization)
General register organization (computer organization)
 
Context switching
Context switchingContext switching
Context switching
 
Instruction Formats
Instruction FormatsInstruction Formats
Instruction Formats
 
instruction cycle ppt
instruction cycle pptinstruction cycle ppt
instruction cycle ppt
 
DMA and DMA controller
DMA and DMA controllerDMA and DMA controller
DMA and DMA controller
 
Computer architecture multi processor
Computer architecture multi processorComputer architecture multi processor
Computer architecture multi processor
 
Lect 2 ARM processor architecture
Lect 2 ARM processor architectureLect 2 ARM processor architecture
Lect 2 ARM processor architecture
 
Basic MIPS implementation
Basic MIPS implementationBasic MIPS implementation
Basic MIPS implementation
 
Instruction Cycle in Computer Organization.pptx
Instruction Cycle in Computer Organization.pptxInstruction Cycle in Computer Organization.pptx
Instruction Cycle in Computer Organization.pptx
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processing
 
Demand paging
Demand pagingDemand paging
Demand paging
 
Instruction Execution Cycle
Instruction Execution CycleInstruction Execution Cycle
Instruction Execution Cycle
 
Paging and segmentation
Paging and segmentationPaging and segmentation
Paging and segmentation
 
Computer organization memory
Computer organization memoryComputer organization memory
Computer organization memory
 
ARM Exception and interrupts
ARM Exception and interrupts ARM Exception and interrupts
ARM Exception and interrupts
 

Similar to Array Processor

4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf
arpowersarps
 
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptxassignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
23mu36
 
Basic operational concepts.ppt
Basic operational concepts.pptBasic operational concepts.ppt
Basic operational concepts.ppt
ssuser586772
 

Similar to Array Processor (20)

Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 
Overview of HPC.pptx
Overview of HPC.pptxOverview of HPC.pptx
Overview of HPC.pptx
 
Aca module 1
Aca module 1Aca module 1
Aca module 1
 
Computer system Architecture. This PPT is based on computer system
Computer system Architecture. This PPT is based on computer systemComputer system Architecture. This PPT is based on computer system
Computer system Architecture. This PPT is based on computer system
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptxassignment_presentaion_jhvvnvhjhbhjhvjh.pptx
assignment_presentaion_jhvvnvhjhbhjhvjh.pptx
 
Co notes3 sem
Co notes3 semCo notes3 sem
Co notes3 sem
 
Co module 1 2019 20-converted
Co module 1 2019 20-convertedCo module 1 2019 20-converted
Co module 1 2019 20-converted
 
Hpc 4 5
Hpc 4 5Hpc 4 5
Hpc 4 5
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
High Performance Computer Architecture
High Performance Computer ArchitectureHigh Performance Computer Architecture
High Performance Computer Architecture
 
Basic operational concepts.ppt
Basic operational concepts.pptBasic operational concepts.ppt
Basic operational concepts.ppt
 
Advanced processor principles
Advanced processor principlesAdvanced processor principles
Advanced processor principles
 

More from Anshuman Biswal

Ir da in_linux_presentation
Ir da in_linux_presentationIr da in_linux_presentation
Ir da in_linux_presentation
Anshuman Biswal
 
Message Signaled Interrupts
Message Signaled InterruptsMessage Signaled Interrupts
Message Signaled Interrupts
Anshuman Biswal
 
Bangalore gayatri pariwar gayatri ashwamedha mahayagya
Bangalore gayatri pariwar gayatri ashwamedha mahayagyaBangalore gayatri pariwar gayatri ashwamedha mahayagya
Bangalore gayatri pariwar gayatri ashwamedha mahayagya
Anshuman Biswal
 
Six Sigma and/For Software Engineering
Six Sigma and/For Software EngineeringSix Sigma and/For Software Engineering
Six Sigma and/For Software Engineering
Anshuman Biswal
 
Fast web development using groovy on grails
Fast web development using groovy on grailsFast web development using groovy on grails
Fast web development using groovy on grails
Anshuman Biswal
 

More from Anshuman Biswal (13)

भक्ति वृक्षा – CHAPTER 1 (1).pptx
भक्ति वृक्षा – CHAPTER 1 (1).pptxभक्ति वृक्षा – CHAPTER 1 (1).pptx
भक्ति वृक्षा – CHAPTER 1 (1).pptx
 
Wireless Networking Security
Wireless Networking SecurityWireless Networking Security
Wireless Networking Security
 
Pervasive Computing
Pervasive ComputingPervasive Computing
Pervasive Computing
 
Observer Pattern
Observer PatternObserver Pattern
Observer Pattern
 
Undecidabality
UndecidabalityUndecidabality
Undecidabality
 
Turing Machine
Turing MachineTuring Machine
Turing Machine
 
Ir da in_linux_presentation
Ir da in_linux_presentationIr da in_linux_presentation
Ir da in_linux_presentation
 
Message Signaled Interrupts
Message Signaled InterruptsMessage Signaled Interrupts
Message Signaled Interrupts
 
Bangalore gayatri pariwar gayatri ashwamedha mahayagya
Bangalore gayatri pariwar gayatri ashwamedha mahayagyaBangalore gayatri pariwar gayatri ashwamedha mahayagya
Bangalore gayatri pariwar gayatri ashwamedha mahayagya
 
Six Sigma and/For Software Engineering
Six Sigma and/For Software EngineeringSix Sigma and/For Software Engineering
Six Sigma and/For Software Engineering
 
SNMP
SNMPSNMP
SNMP
 
Fibonacci Heap
Fibonacci HeapFibonacci Heap
Fibonacci Heap
 
Fast web development using groovy on grails
Fast web development using groovy on grailsFast web development using groovy on grails
Fast web development using groovy on grails
 

Recently uploaded

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Recently uploaded (20)

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Array Processor

  • 1. M. S. Ramaiah School of Advanced Studies 1 CSN2502 ACA Presentation Anshuman Biswal PT 2012 Batch, Reg. No.: CJB0412001 M. Sc. (Engg.) in Computer Science and Networking Module Leader: Padma Priya Dharishini P. Module Name: Advanced Computer Architecture Module Code : CSN2502 Array Processor
  • 2. M. S. Ramaiah School of Advanced Studies 2 Marking Head Maximum Score Technical Content 10 Grasp and Understanding 10 Delivery – Technical and General Aspects 10 Handling Questions 10 Total 40
  • 3. M. S. Ramaiah School of Advanced Studies 3 Presentation Outline • History of Array Processor • Array Processor • How Array Processor can help? • Array Processor classification • Array Processor architecture • Performance and scalability of array processor • Why use the array processor? • When to use and when not to use the array processor?
  • 4. M. S. Ramaiah School of Advanced Studies 4 History of Array processor • Array processor development began in early 1960’s at Westinghouse in their Solomon project. • Solomon’s goal was to improve the math performance by using large number of math co-processors under the control of a single CPU. • The CPU fed a single common instruction to all of the arithmetic logic units (ALUs), one per "cycle", but with a different data point for each one to work on. • This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. • In 1962, Westinghouse cancelled the project, but the effort was restarted at the University of Illinois as the ILLIAC IV. • In 1972 , it was finally delivered to the world and till 1990’s it formed the basic design of the fastest machine.
  • 5. M. S. Ramaiah School of Advanced Studies 5 Array Processor • Array processor is a synchronous parallel computer with multiple ALU called processing elements ( PE) that can operate in parallel in lock step fashion. • It is composed of N identical PE under the control of a single control unit and a number of memory modules. • Array processors also frequently use a form of parallel computation called pipelining where an operation is divided into smaller steps and the steps are performed simultaneously. • It can greatly improve performance on certain workloads mainly in numerical simulation. • These machines appeared in the 1970’s and dominated supercomputer design through the 1970s into the 90s, notably the various Cray platforms. • The rapid rise in the price-to-performance ratio of conventional microprocessor designs led to the vector supercomputer's demise in the later 1990s.
  • 6. M. S. Ramaiah School of Advanced Studies 6 How array processor can help? • In general terms, CPUs are able to manipulate one or two pieces of data at a time. For instance, most CPUs have an instruction that essentially says "add A to B and put the result in C". The data for A, B and C could be—in theory at least—encoded directly into the instruction. However, in efficient implementation things are rarely that simple. The data is rarely sent in raw form, and is instead "pointed to" by passing in an address to a memory location that holds the data. Decoding this address and getting the data out of the memory takes some time. • In order to reduce the amount of time this takes, most modern CPUs use a technique known as instruction pipelining in which the instructions pass through several sub-units in turn. • Array processors take this concept one step further. Instead of pipelining just the instructions, they also pipeline the data itself. This allows for significant savings in decoding time.
  • 7. M. S. Ramaiah School of Advanced Studies 7 How Array Processor can help?: An Example • Consider the simple task of adding two groups of 10 numbers together. In a normal programming language you might have done something as • execute this loop 10 times • read the next instruction and decode it • fetch this number fetch that number • add them • put the result here • End loop • But to an array processor this tasks looks as • read instruction and decode it • fetch these 10 numbers • fetch those 10 numbers • add them • put the results here
  • 8. M. S. Ramaiah School of Advanced Studies 8 How Array Processor can help? • There are several savings inherent in this approach.(Based on the example in previous slide) A. Only two address translations are needed B. Fetching and decoding the instruction is done only one time instead of ten times C. The code itself is also smaller, which can lead to more efficient memory use. D. It improve performance by avoiding stalls.
  • 9. M. S. Ramaiah School of Advanced Studies 9 Array Processor Classification • SIMD ( Single Instruction Multiple Data ): is an array processor that has a single instruction multiple data organization.  It manipulates vector instructions by means of multiple functional unit responding to a common instruction.  ILLIAC-IV, CM -2( Connection Machine ),MP-1(MasPar-1), BSP (Bulk Synchronous Parallel ) • Attached array processor: is an auxiliary processor attached to a general purpose computer.  Its intent is to improve the performance of the host computer in specific numeric calculation tasks.
  • 10. Array Processor Architecture - SIMD
    • SIMD has two basic configurations:
      • a. Array processors using RAM, also known as dedicated memory organization: ILLIAC-IV, CM-2, MP-1
      • b. Associative processors using content accessible memory, also known as global memory organization: BSP
  • 11. SIMD Architecture - Array Processor using RAM
    • This configuration has a control unit (CU) and multiple synchronized PEs.
    • The control unit controls all the PEs below it.
    • The control unit decodes every instruction given to it and decides where the decoded instruction should be executed.
    • Vector instructions are broadcast to all the PEs. Broadcasting achieves spatial parallelism through the duplicated PEs.
    • Scalar instructions are executed directly inside the CU.
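The division of work between the CU and the PEs can be sketched as a toy model. The class names, the tuple-based instruction format and the two operations (`add`, `inc`) are assumptions made for this illustration; they do not describe any real machine's instruction set.

```python
class ProcessingElement:
    """One PE holding its own pair of operands (toy model)."""
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.result = None

    def execute(self, op):
        if op == "add":
            self.result = self.a + self.b
        else:
            raise ValueError(f"unknown vector op: {op}")

class ControlUnit:
    """Decodes each instruction and decides where it runs."""
    def __init__(self, pes):
        self.pes = pes
        self.scalar_acc = 0   # scalar state kept inside the CU

    def issue(self, instruction):
        kind, op, *args = instruction
        if kind == "vector":
            # Broadcast: every PE receives the same decoded operation.
            for pe in self.pes:
                pe.execute(op)
        elif kind == "scalar" and op == "inc":
            # Scalar instructions never reach the PEs.
            self.scalar_acc += args[0]
        else:
            raise ValueError(f"unknown instruction: {instruction}")

pes = [ProcessingElement(i, 10) for i in range(4)]
cu = ControlUnit(pes)
cu.issue(("vector", "add"))      # runs on all four PEs
cu.issue(("scalar", "inc", 5))   # runs inside the CU only
results = [pe.result for pe in pes]
```

A real array processor executes the broadcast step on all PEs at the same time; the `for` loop here only simulates that.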
  • 12. SIMD Architecture - Array Processor using RAM: Control Unit
    • A simple CPU
    • Can execute instructions without PE intervention
    • Coordinates all PEs
    • 64 64-bit registers, D0-D63
    • 4 64-bit accumulators, A0-A3
    • Operations:
      • Integer ops
      • Shifts
      • Boolean
      • Loop control
      • Index memory
    • [Figure: CU block diagram with registers D0-D63, accumulators A0-A3 and ALU]
  • 13. SIMD Architecture - Array Processor using RAM: Processing Element
    • A PE consists of an ALU with working registers and a local memory, PMEMi, which stores the distributed data.
    • All PEs perform the same function synchronously, in lock step, under the supervision of the CU.
    • Before a vector instruction executes in a PE, its operands must be loaded into that PE's PMEM.
    • Data can be loaded into the PMEM from an external source or by the CU.
    • Not every PE has to take part in every instruction; only the enabled PEs execute it. Several masking schemes can be used to enable and disable PEs during the execution of an instruction.
    • A PE contains the following registers (64 bits unless noted):
      • A: accumulator
      • B: second operand for binary operations
      • R: routing register for inter-PE communication
      • S: status register
      • X: index for PMEM (16 bits)
      • D: mode (8 bits)
    • Communication:
      • PMEM is accessed only from the local PE
      • Between PEs, through the R register
    • [Figure: PEi with registers A, B, R, S, X and D, an ALU and local memory PMEMi; links to PEi-1, PEi+1, PEi-8 and PEi+8]
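The neighbour links PEi-1, PEi+1, PEi-8 and PEi+8 named on this slide can be sketched as index arithmetic. The sketch assumes 64 PEs with wrap-around indexing (as in ILLIAC IV); the slide itself does not state the PE count or the wrap-around, and the function names are hypothetical.

```python
N_PES = 64   # assumption: 64 PEs, as in ILLIAC IV

def neighbours(i, n=N_PES, row=8):
    """Indices of the four PEs directly linked to PE i,
    assuming wrap-around (modulo n) connections."""
    return [(i - 1) % n, (i + 1) % n, (i - row) % n, (i + row) % n]

def route(r_values, offset):
    """Simulate one routing step: every PE sends the contents of its
    R register to PE (i + offset) mod n at the same time."""
    n = len(r_values)
    received = [None] * n
    for i, value in enumerate(r_values):
        received[(i + offset) % n] = value
    return received

r = list(range(N_PES))   # PE i starts with the value i in R
shifted = route(r, 8)    # every PE now holds the value from PE i-8
```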
  • 14. SIMD Architecture - Array Processor using RAM: Interconnection Network and Host Computer
    • Interconnection network (IN): All communication between PEs goes through the interconnection network, which performs all routing and data manipulation functions. The network is under the control of the CU.
    • Host computer: The array processor is interfaced through a host computer, which handles resource management and supervises peripherals and I/O.
  • 15. SIMD Architecture - Masking and data routing organization
    • [Figure: PEi, for i = 0, 1, 2, ..., N-1, with registers Ai, Bi, Di, Ii, Ri and Xi, status flag Si, an ALU and local memory PEMi; connections run to other PEs via the interconnection network and to the CU]
  • 16. SIMD Architecture - Masking and data routing organization
    • Each PE is connected to other PEs through its routing register R.
    • When one PE communicates with another, it is the contents of the R register that are transferred.
    • All inputs and outputs go through this register; the inputs and outputs are isolated by master-slave flip-flops.
    • The D register is the address register; it stores the 8-bit address of the PE.
    • During an instruction cycle only the enabled PEs accept the operands sent to them; the other PEs discard them. An enabled PE has status register S = 1 and a masked PE has S = 0.
    • A = accumulator, B = second operand of binary operations.
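The effect of the status bit S can be sketched as a masked vector add. This is a minimal illustration: the function name `masked_add` is hypothetical, and the choice that a masked PE keeps its accumulator unchanged is an assumption about what "discarding the operand" means.

```python
def masked_add(a, b, s):
    """Per-PE add under a mask.

    a: accumulator value held by each PE
    b: operand sent to each PE
    s: status bit of each PE (1 = enabled, 0 = masked)
    """
    # Enabled PEs add the operand; masked PEs discard it and keep A as is.
    return [x + y if enabled == 1 else x
            for x, y, enabled in zip(a, b, s)]

accumulators = [1, 2, 3, 4]
operands = [10, 10, 10, 10]
status = [1, 0, 1, 0]            # only PE 0 and PE 2 are enabled
after = masked_add(accumulators, operands, status)
```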
  • 17. SIMD Architecture - Associative processor using content accessible memory
    • In this configuration the PEs do not have private memory. The memories attached to the PEs are replaced by parallel memory modules shared by all PEs through an alignment network.
    • The alignment network switches paths between the PEs and the parallel memory modules.
    • PE-to-PE communication also goes through the alignment network.
    • The alignment network is controlled by the CU.
    • The number of PEs (N) and the number of memory modules (K) need not be equal; in fact, they are chosen to be relatively prime.
    • An alignment network should allow conflict-free access to the shared memories by as many PEs as possible.
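One way to see why N and K are chosen relatively prime is to count module conflicts for a strided access. The sketch assumes that a word at address `addr` lives in module `addr % K` (simple interleaving) and uses N = 16 with K = 16 or 17 as example sizes; the mapping, the sizes and the function names are assumptions, not details given on the slide.

```python
def modules_hit(n_pes, k_modules, stride, base=0):
    """Memory module touched by each PE when PE i reads address
    base + i * stride, assuming address -> module is addr % k_modules."""
    return [(base + i * stride) % k_modules for i in range(n_pes)]

def is_conflict_free(n_pes, k_modules, stride, base=0):
    """True when no two PEs need the same module in the same cycle."""
    hit = modules_hit(n_pes, k_modules, stride, base)
    return len(set(hit)) == len(hit)

# 16 PEs reading every second word:
same_count = is_conflict_free(16, 16, stride=2)    # K = N = 16
prime_count = is_conflict_free(16, 17, stride=2)   # K = 17, prime to N
```

With K = 16 the sixteen PEs collide on eight modules, while with K = 17 each PE reaches a different module. A stride that is a multiple of K would still collide, so the choice reduces conflicts rather than removing them entirely.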
  • 18. Attached Array Processor
    • In this configuration the attached array processor has an input/output interface to a common processor and another interface to a local memory.
    • The local memory connects to the main memory through a high-speed memory bus.
  • 19. Performance and Scalability of array processor
  • 20. Why use the array processor?
    • The principal reason for using an array processor is speed.
    • Most array processors are designed to optimize repetitive arithmetic operations, which makes them much faster at vector arithmetic than the host CPU. Since most array processors operate asynchronously from the host CPU, they act as a co-processor that increases the capacity of the system.
    • The second advantage is that the AP has its own local memory. On systems with limited physical memory or address space, this can be an important consideration.
  • 21. When to use and not to use the array processor?
    • The AP (array processor) is most efficient at repetitive operations such as FFTs and multiplying large vectors. Its efficiency degrades for non-repetitive operations and for operations that require many decisions based on the results of computations.
    • Since APs have their own program and data memory, the AP instructions and data must be transferred to the AP, and the results transferred back from it. These I/O operations may cost more CPU time than is saved by using the array processor.
    • As a general rule, the AP is more efficient than the CPU when multiple or complex, highly repetitious operations (such as FFTs) are performed on relatively large amounts of data (thousands of words or more). In other cases the AP will not help much and will keep other processes from using a valuable resource.
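The trade-off between transfer cost and compute savings can be sketched as a simple break-even check. The linear cost model, the function name `offload_pays_off` and every number below are hypothetical; they are chosen only to show the shape of the rule, not to describe a real system.

```python
def offload_pays_off(n_words, cpu_time_per_word, ap_time_per_word,
                     transfer_time_per_word, fixed_overhead):
    """Return True when running on the AP is faster than on the host CPU
    under a simple linear cost model (all times in the same unit)."""
    cpu_total = n_words * cpu_time_per_word
    # The AP pays a fixed set-up cost plus per-word transfer and compute.
    ap_total = fixed_overhead + n_words * (transfer_time_per_word
                                           + ap_time_per_word)
    return ap_total < cpu_total

# Hypothetical costs in microseconds.
costs = dict(cpu_time_per_word=10, ap_time_per_word=1,
             transfer_time_per_word=4, fixed_overhead=5000)

small_job = offload_pays_off(100, **costs)     # transfer cost dominates
large_job = offload_pays_off(5000, **costs)    # compute saving dominates
```

Under these assumed costs the break-even point is 1000 words, which is in line with the slide's rule of thumb of "thousands of words or more".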
  • 22. Conclusion
    • Although an array processor can improve performance, not every problem can be attacked with this sort of solution. Instructions that process a whole array of data at a time necessarily add complexity to the core CPU. That complexity typically makes other instructions run slower. The more complex instructions also add to the complexity of the decoders, which might slow down the decoding of more common instructions such as ordinary addition.
    • Array processors therefore work best only when there are large amounts of data to be worked on. For this reason, these sorts of CPUs were found primarily in supercomputers, and the supercomputers themselves were generally found in places such as weather prediction centres and physics labs, where huge amounts of data are "crunched".
    • This architecture relies on a single instruction acting on all of the data sets. If the data sets depend on each other, they cannot be processed in parallel: for example, if data A has to be processed before data B, then A and B cannot be processed simultaneously. This dependency is what makes parallel processing difficult to implement, and it is why sequential machines are extremely common.
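The dependency problem in the last bullet can be illustrated by comparing two small computations. Both function names are hypothetical, and the second function is only an example of a loop that, as written, carries a value from one step to the next.

```python
def independent_add(a, b):
    """Each result depends only on its own pair of inputs, so all
    elements could be handled at the same time by separate PEs."""
    return [x + y for x, y in zip(a, b)]

def running_total(a):
    """Each result needs the previous one, so as written the steps
    must run one after another."""
    total = 0
    out = []
    for x in a:
        total += x          # depends on the total from the previous step
        out.append(total)
    return out

sums = independent_add([1, 2, 3, 4], [10, 20, 30, 40])
totals = running_total([1, 2, 3, 4])
```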