DSP Processor

2nd Half of UNIT 6 --- DSP Processor

BY
Prof L.S.Kalkonde
Department of Electronics & Telecommunication
Prof Ram Meghe College of Engineering & Technology, Badnera

Digital Signal Processor: Definition

A digital signal processor (DSP) is an integrated circuit designed for
high-speed data manipulation. It is used in:
• Audio
• Communications
• Image manipulation
• Other data-acquisition and data-control applications

How Digital Signal Processing Works
1. To explain how digital signal processing works, you must
understand the difference between analog and digital signals.
2. Analog signals, which include sound intensity, pressure, light
intensity, etc., are continuously variable.
3. Each of our senses is sensitive to different kinds of analog
signals.
4. Our ears are sensitive to sound, our eyes are sensitive to light,
and so on.
5. Once we receive a signal, our sensory organs convert it to an
electrical signal and send it to our analog computer (the brain).

6. Our brains are very powerful parallel computers whose
performance is currently unmatched by any digital computer.
7. Our brains not only analyze the information received, but also
make decisions using this data.

8. Digital signals are those that are transmitted within or
between computers, in which information is represented by
discrete states.

How Analog and Digital Signals Work Together

Digital technology, such as personal computers (PCs), assists us in
many ways: writing documents, spell checking, and drawing.
Unfortunately, the world is analog, and electronic analog
computers are not as versatile as digital computers.

Therefore, in order to make use of the tremendous processing
power that digital technology offers us, we must do the following:
· Convert the analog signals into electrical signals, using a transducer
(such as a microphone, as shown in the diagram).
· Digitize these signals (i.e., convert them from analog to digital using an
analog-to-digital converter (ADC)), as shown in the diagram.
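
As a rough illustration of the digitizing step, the sketch below models an ideal ADC in Python. The function name `adc_sample`, the 3-bit resolution, and the 0 V to 1 V input range are assumptions chosen for the example; they do not come from the slides.

```python
def adc_sample(voltage, n_bits=3, v_ref=1.0):
    """Map an analog voltage in [0, v_ref] to an n-bit unsigned code (ideal ADC model)."""
    levels = 2 ** n_bits                      # number of discrete output codes
    clamped = min(max(voltage, 0.0), v_ref)   # keep the input inside the converter range
    code = int(clamped / v_ref * levels)      # truncate down to the nearest code
    return min(code, levels - 1)              # a full-scale input maps to the top code


if __name__ == "__main__":
    # Hypothetical transducer voltages, e.g. from a microphone preamplifier
    for v in (0.0, 0.30, 0.50, 0.99):
        print(f"{v:.2f} V -> code {adc_sample(v):03b}")
```

Every voltage between two thresholds produces the same code, which is the sense in which the continuous signal becomes a set of discrete states.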

Why Do We Need Digital Signal Processors?

Add and Subtract
Add and subtract operations are performed quite simply by general-purpose
microprocessors in a single clock cycle or very few clock cycles. Digital addition is
similar to decimal addition. Our example shows adding 1 plus 2; the result is decimal 3.
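
The 1 plus 2 example can be traced bit by bit. The sketch below is a software model of a ripple-carry adder; the name `ripple_add` and the 4-bit width are assumptions made for illustration, and real hardware computes all bit positions with logic gates rather than a loop.

```python
def ripple_add(a, b, width=4):
    """Add two unsigned integers bit by bit, the way a ripple-carry adder does."""
    result, carry = 0, 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry      # value between 0 and 3
        result |= (total & 1) << i         # sum bit for this position
        carry = total >> 1                 # carry into the next position
    return result, carry                   # final carry is the overflow out of the top bit


if __name__ == "__main__":
    s, c = ripple_add(1, 2)
    print(f"{1:04b} + {2:04b} = {s:04b} (carry out {c})")
```

With 4-bit operands, `ripple_add(1, 2)` gives binary 0011, i.e. decimal 3, with no carry out.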

Multiply and Divide
The multiply and divide operations are more complex. A digital multiply operation
consists of a series of shift and add operations. Our example shows a multiplication
of 3 and 5. General-purpose microprocessors are quite slow at multiply and divide
operations: they typically execute a series of shift, add, and subtract operations
sequentially from their microcode, so a single multiply operation may take many
cycles to complete.

The DSP performs multiplication in a single cycle by implementing all
shift and add operations in parallel.
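
The shift-and-add procedure for 3 times 5 can be sketched as follows. The function name `shift_add_multiply` is hypothetical, and the step count it returns only stands in for the sequential cycles a microcoded multiply might take; it is not a cycle-accurate model of any processor.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds.

    Returns the product and the number of loop iterations
    (one per bit of the multiplier b that had to be examined).
    """
    product, steps = 0, 0
    while b:
        if b & 1:
            product += a    # add the shifted multiplicand when the current bit is 1
        a <<= 1             # next bit of b is worth twice as much
        b >>= 1
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(shift_add_multiply(3, 5))
```

For 3 times 5 the loop runs three times (5 is binary 101) and returns 15. A hardware multiplier performs the same partial-product additions, but all at once instead of one per step.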

What’s Inside DSP (Elements of DSP)

• Program Memory:
– Stores the programs the DSP will use to process data

• Data Memory:
– Stores the information to be processed

• Compute Engine:
– Performs the math processing, accessing the program from the Program Memory and
the data from the Data Memory

• Input / Output:
– Serves a range of functions to connect to the outside world

Types of Architecture
• Von Neumann Architecture
• Harvard Architecture
• Super / Modified Harvard Architecture

Von Neumann Architecture
[Diagram: the CPU is connected to a single memory, holding both instructions and data, through one address bus and one data bus.]

Harvard Architecture
[Diagram: the CPU is connected to a separate program memory and data memory, each through its own address bus and data bus.]

Which Architecture is Best Suited for DSP?
1. Common general-purpose personal computers use processors designed with the
von Neumann architecture, while the Harvard architecture is more commonly used in
specialized microprocessors for real-time and embedded applications.
2. DSPs typically use the Harvard architecture, although von Neumann DSPs also exist.
3. Many signal and image processing applications require fast, real-time machines.
4. The drawback to using a true Harvard architecture is that since it uses separate
program and data memories, it needs twice as many address and data pins on the chip
and twice as much external memory. Unfortunately, as the number of pins or chips
increases, so does the price.

An elegant solution:
• A single data and address bus is used externally.
• Two (or more) separate buses for program and data are used internally.
• Timing (multiplexing) handles the separation of program and data information.

In one clock cycle, program information flows on the pins; in the second cycle,
data follows on the same pins. Program and data information is then routed onto
separate internal program and data buses. Such machines are called modified
Harvard architecture processors because the internal architecture is Harvard while
the external architecture is von Neumann. They also include multiple internal
RAM/ROM cells for high-use instructions and data.
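
The time-multiplexing idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function names `external_bus` and `route_internally`, the instruction mnemonics, and the strict alternation of one program word and one data word are all invented for illustration.

```python
def external_bus(program_words, data_words):
    """Interleave program and data words on one shared external bus, one word per cycle."""
    bus = []
    for p, d in zip(program_words, data_words):
        bus.append(p)   # first cycle: program information on the pins
        bus.append(d)   # second cycle: data on the same pins
    return bus


def route_internally(bus):
    """Split the shared external stream back onto separate internal buses by cycle parity."""
    program_bus = bus[0::2]   # even cycles carried program words
    data_bus = bus[1::2]      # odd cycles carried data words
    return program_bus, data_bus


if __name__ == "__main__":
    shared = external_bus(["LD", "MAC", "ST"], [10, 20, 30])
    print(shared)
    print(route_internally(shared))
```

Only one set of pins is needed externally, while the core still sees separate program and data streams internally.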

Fixed vs. Floating Point

Characteristic | Floating point (32-bit) | Fixed point (16-bit)
Dynamic range | much larger | smaller
Resolution | comparable | comparable
Ease of programming | much easier | more difficult
Compiler efficiency | more efficient | less efficient
Power consumption | comparable | comparable
Chip cost | comparable | comparable
System cost | comparable | comparable
Design cost | less | more
Time to market | faster | slower
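
To put a number on the dynamic range row, the sketch below compares the ratio of the largest to the smallest nonzero magnitude in each format. It assumes a signed 16-bit integer for fixed point and IEEE 754 single precision (normalized values only) for the 32-bit floating-point case; a particular DSP may use a different floating-point format, so treat the results as indicative only. The function names are hypothetical.

```python
import math


def fixed_point_range_db(n_bits=16):
    """Dynamic range of a signed n-bit integer: largest magnitude over smallest nonzero step, in dB."""
    largest = 2 ** (n_bits - 1) - 1    # 32767 for 16 bits
    smallest = 1
    return 20 * math.log10(largest / smallest)


def float32_range_db():
    """Dynamic range of normalized IEEE 754 single-precision values, in dB (assumed format)."""
    largest = (2 - 2 ** -23) * 2.0 ** 127   # largest finite value
    smallest = 2.0 ** -126                  # smallest positive normalized value
    return 20 * math.log10(largest / smallest)


if __name__ == "__main__":
    print(f"16-bit fixed point : {fixed_point_range_db():.1f} dB")
    print(f"32-bit float       : {float32_range_db():.1f} dB")
```

Under these assumptions the fixed-point range is roughly 90 dB, while the floating-point range exceeds 1500 dB.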

TMS320 Family

16-Bit Fixed Point Devices
• C5x: Voice Processing
• C54x: Digital Cellular Phones

32-Bit Floating Point Devices
• C6x (Advanced VLIW Processor): Wireless Base Stations / Pooled Modems

Features of TMS320C54xx
• 16-bit CPU
• Can execute 40 to 120 million instructions per second (MIPS)
• 17 × 17-bit MAC
• 64K × 16-bit physical program memory address space
• 64K × 16-bit external data memory address space
• 64K × 16-bit external I/O address space
• Programmable timer and PLL
• DMA interface
• 100/128/144-pin TQFP and BGA packages

Functional Units
• 40-bit ALU
• Two 40-bit accumulators, ACCA and ACCB
• Barrel shifter
• 17 × 17-bit multiplier
• 40-bit adder
• CSSU: compare, select, and store unit
• Exponent encoder
• Data address generation unit
• Program address generation unit

TMS320C54xx Bus Structure
• Uses an advanced, modified Harvard architecture
• Maximizes processing power by providing 4 pairs of buses:
– 3 pairs for data memory
– 1 pair for program memory

ALU
• 40-bit ALU
• Performs a wide range of arithmetic and logic operations in a
single clock cycle.
• After an ALU operation, the result is sent to either:
– an accumulator, or
– memory

Accumulators
• Two 40-bit accumulators, ACCA and ACCB
• Store the results of ALU and multiply/add operations.
• Temporary storage for other.

Barrel Shifter
• The barrel shifter can produce a left shift of 0 to 31
bits and a right shift of 0 to 16 bits on the input data.
• The shift requirements are defined in:
– the shift count field of the instruction,
– the shift count field (ASM) of status register ST1, or
– the temporary register T.
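
A barrel shifter applies the whole shift in one step rather than one bit per cycle. The sketch below models only the shift ranges given above on a signed 40-bit value; the function names, the sign convention (positive count for left, negative for right), and the wrap to 40 bits are assumptions for illustration, not the exact behavior of the device.

```python
MASK40 = (1 << 40) - 1


def to_signed40(x):
    """Interpret the low 40 bits of x as a two's-complement number."""
    x &= MASK40
    return x - (1 << 40) if x & (1 << 39) else x


def barrel_shift(value, count):
    """Shift left by count (0..31) or arithmetically right by -count (count -16..-1)."""
    if not -16 <= count <= 31:
        raise ValueError("shift count must be between -16 and 31")
    shifted = value << count if count >= 0 else value >> -count
    return to_signed40(shifted)


if __name__ == "__main__":
    print(barrel_shift(1, 31))      # single-step left shift by 31
    print(barrel_shift(-256, -4))   # arithmetic right shift keeps the sign
```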

Multiplier/Adder Unit
• The multiplier/adder block consists of several elements:
– a multiplier and an adder
– signed/unsigned input control logic and fractional control logic
– a zero detector, a rounder, and overflow/saturation logic
– a 16-bit temporary storage register (T)

• The multiplier/adder unit performs a 17 × 17-bit 2s-complement
multiplication with a 40-bit addition in a single instruction cycle.
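
The multiply-accumulate (MAC) operation is the core of filters and dot products. The sketch below is a simplified software model, assuming 16-bit signed operands and a 40-bit accumulator that wraps on overflow; the real unit also offers rounding and saturation, which are omitted here, and the function names are hypothetical.

```python
ACC_BITS = 40


def wrap_signed(value, bits=ACC_BITS):
    """Wrap an integer into the signed two's-complement range of the given width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def mac(acc, x, y):
    """One multiply-accumulate step: acc + x * y, kept in a 40-bit accumulator."""
    return wrap_signed(acc + x * y)


def dot_product(xs, ys):
    """Sum of products, one MAC per pair of samples (as in an FIR filter tap loop)."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

Because the accumulator is 8 bits wider than a 32-bit product, 256 full-scale 16-bit products can be summed in this model without wrapping.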

CSSU: Compare, Select, and Store Unit
• The compare, select, and store unit (CSSU) performs
maximum comparisons between the accumulator's high and low words,
and allows both the test/control flag bit (TC) in status register ST0
and the transition register (TRN) to keep their transition histories.

Exponent Encoder
• Implementing floating-point arithmetic on a
fixed-point processor requires separating
the exponent and mantissa of the floating-point
data.
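
One common way to separate exponent and mantissa on fixed-point hardware is to count the redundant sign bits of a value and shift them out. The sketch below does this for a signed 16-bit integer; the name `normalize16` and the 16-bit width are assumptions, and the code is not a model of the exact instruction semantics of any particular device.

```python
def normalize16(value):
    """Split a signed 16-bit integer into (mantissa, exponent).

    exponent is the number of redundant sign bits removed, and
    mantissa is value shifted left by that amount, so that
    value == mantissa / 2 ** exponent.
    """
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit in 16 bits")
    if value == 0:
        return 0, 0
    exponent = 0
    # While the two top bits agree, the sign bit is duplicated and can be shifted out
    while -16384 <= value < 16384:
        value <<= 1
        exponent += 1
    return value, exponent


if __name__ == "__main__":
    for v in (1, 3, -1, 16384):
        print(v, normalize16(v))
```

Small values get a large exponent and a full-scale mantissa, which preserves precision in later fixed-point multiplications.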

Data Address Generation Unit
• Two auxiliary register arithmetic units, ARAU0 and ARAU1,
generate addresses for the indirect addressing modes
(increment, decrement, indexing, bit-reversed, and circular
addressing).
• Eight auxiliary registers, AR0 to AR7 (used to generate two data
memory addresses simultaneously).
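
Circular and bit-reversed addressing are easiest to see in software form. The sketch below is a minimal model with hypothetical function names; on the DSP these address updates are done by the address generation hardware alongside the main computation, not by extra instructions.

```python
def circular_next(addr, base, size, step=1):
    """Advance addr by step, wrapping inside the buffer [base, base + size)."""
    return base + (addr - base + step) % size


def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits of index (the reordering used by FFT algorithms)."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


if __name__ == "__main__":
    # Walk a 4-word circular buffer that starts at a hypothetical address 0x100
    addr = 0x100
    for _ in range(6):
        print(hex(addr), end=" ")
        addr = circular_next(addr, 0x100, 4)
    print()
    # Bit-reversed order for an 8-point FFT
    print([bit_reverse(i, 3) for i in range(8)])
```

For an 8-point FFT the bit-reversed index order is 0, 4, 2, 6, 1, 5, 3, 7.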

JTAG: Joint Test Action Group
JTAG, as defined by IEEE Std 1149.1, is an integrated method for testing
interconnects on printed circuit boards (PCBs) that is implemented at the
integrated circuit (IC) level.

The JTAG test architecture provides a means to test interconnects
between integrated circuits on a board without using physical test
probes.

Potential benefits from JTAG:
• Shorter test times
• Higher test coverage
• Increased diagnostic capability
• Lower capital equipment cost

Instruction Pipelining in TMS320C54X Processors
1. Program Prefetch
The program address bus (PAB) is loaded with the address of the next instruction to be fetched.

2. Program Fetch
The opcode is fetched from the program bus (PB) and loaded into the instruction register.

3. Decode
The opcode is decoded to determine the access operation.


4. Access
The operand address is loaded onto the data address bus (DAB). If a second operand
is required, another address is loaded onto CAB.

5. Read
The operands are read from the buses DB and CB.

6. Execute
The task specified by the instruction is performed.
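
The benefit of the six stages comes from overlapping them across instructions. The sketch below builds the schedule of an idealized pipeline with no stalls or branches, which is an assumption made to keep the example small; the function names are hypothetical.

```python
STAGES = ["Prefetch", "Fetch", "Decode", "Access", "Read", "Execute"]


def pipeline_schedule(n_instructions):
    """Return {cycle: [(instruction, stage), ...]} for an ideal stall-free pipeline."""
    schedule = {}
    for instr in range(n_instructions):
        for depth, stage in enumerate(STAGES):
            cycle = instr + depth   # instruction i reaches stage k in cycle i + k
            schedule.setdefault(cycle, []).append((instr, stage))
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to finish n instructions once the pipeline has to fill first."""
    return n_instructions + len(STAGES) - 1 if n_instructions > 0 else 0


if __name__ == "__main__":
    for cycle, work in sorted(pipeline_schedule(3).items()):
        print(cycle, work)
    print(total_cycles(10))
```

In this model one instruction takes 6 cycles from prefetch to execute, but 10 instructions finish in 15 cycles rather than 60, so the sustained rate approaches one instruction per cycle.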

Sr no | Parameter | DSP Processor | GPP Processor
1 | Instruction cycle | Single cycle (i.e., true instruction cycle) | Multiple instruction cycles for one instruction
2 | Instruction execution | Parallel execution is possible | Execution is always sequential
3 | Operands fetched from memory | Multiple operands are fetched simultaneously | Operands are fetched sequentially
4 | Memories | Separate program memory and data memory | Normally no such separate memories are present
5 | On-chip/off-chip memories | Program memory and data memory are present on-chip and expandable off-chip | Normally on-chip cache memory is present; main memory is off-chip
6 | Address generation | Addresses are generated jointly by DAGs and the program sequencer | Program counter is incremented sequentially to generate addresses
7 | Address/data bus multiplexing | Address and data buses are not multiplexed; they are separate on-chip as well as off-chip | Address/data buses can be separate on the chip but are usually multiplexed off-chip
8 | Computational units | Three separate computational units: ALU, MAC, and shifter | ALU is the main computational unit
9 | Suitable for | Array processing operations | General-purpose processing
10 | Queuing/Pipelining | Queuing is implemented through the instruction register and instruction cache | Queuing is performed explicitly by a queuing register for pipelining of instructions
