1. Unit 7 & 8
Performance Analysis and Optimization
By
Leena Chandrashekar,
Assistant Professor, ECE Dept,
RNSIT, Bangalore
09-Nov-14 ECE Dept, RNSIT,VTU, Aug - Dec 2014 1
2. Performance or Efficiency Measures
• Expressed in terms of time, space, power, and cost
• Depends on input data, hardware platform,
compiler, and compiler options
• Measured in terms of complexity, time,
power, memory, cost, and weight
• Also: development time, ease of maintenance,
and extensibility
3. The System
• Hardware
oComputational and control elements
oCommunication system
oMemory
• Software
oAlgorithms and data structures
oControl and Scheduling
4. Some Limitations
• Amdahl’s Law
Example
Consider a system with the following characteristics: the task
to be analyzed and improved currently executes in 100 time
units, and the goal is to reduce the execution time to 80 time
units. The algorithm under consideration within the task uses
40 time units.
Solving 60 + 40/n = 80 gives n = 2: if the algorithm is made
twice as fast (40 units reduced to 20), the required result is
met. This indicates the necessary speedup.
5. • Example
Consider a system with the following characteristics: the task
to be analyzed and improved currently executes in 100 time
units, and the goal is to reduce the execution time to 50 time units.
The algorithm to be improved uses 40 time units.
Solving 60 + 40/n = 50 gives n = -4: the algorithm would have
to run in negative time to meet the new specification. This is a
non-causal system – the goal cannot be met by improving the
algorithm alone.
6. Complexity Analysis – A High-Level Measure
Instructions                        Operations
int total (int myArray[], int n)    --- 2
{
    int sum = 0;                    --- 1
    int i = 0;                      --- 1
    for (i = 0; i < n; i++)         --- 2*n + 1
    {
        sum = sum + myArray[i];     --- 3*n
    }
    return sum;                     --- 1
}
Total = 5n + 6 operations
7. • 5n+6: for a given n, the number of operations is
n = 10 → 56
n = 100 → 506
n = 1,000 → 5,006
n = 10,000 → 50,006
Growth is linear in n, and the constant term 6
becomes insignificant as n increases.
8. The Methodology
1. Decompose the problem into a set of operations
2. Count the total number of such operations
3. Derive a formula, based on some parameter n that
is the size of the problem
4. Use order of magnitudes estimation to assess
behavior
Most Important Slide
9. A Simple experiment
• Linear
• Quadratic
• Logarithmic
• Exponential
10. Asymptotic Complexity
• F(n) = 5n + 6
• We study how the function grows asymptotically; this is
referred to as the asymptotic complexity
• This is only an approximation, since many other factors
need to be considered – for example, different operations
require varying amounts of time
• As n increases, concentrate on the highest-order
term and drop the lower-order terms, such as the
constant term 6
11. Comparing Algorithms
Based on
• Worst case performance(upper bound)
• Average case
• Best performance(lower bound)
f(N) = O(g(N)) – complexity function – Big-O notation
The complexity of an algorithm approaches a bound, called the
order of the bound.
If the bound is expressed as a function of the problem size N,
and that function is called g(N), the comparison is written
f(N) = O(g(N)).
f(N) is of order g(N) if there are constants c and N0 such that
f(N) ≤ c·g(N) for all N ≥ N0.
13. Analyzing Code
• Constant Time Statements
int x,y; Declarations & Initializations
char myChar=‘a’;
x=y; Assignment
x=5*y+4*z; Arithmetic
A[j] Array Referencing
if(x<12) Conditional tests
Cursor = Head -> Next; Referencing/dereferencing pointers.
14. Looping Constructs
• For loops, while loops
• Determine the number of iterations and the number of steps
per iteration.
int sum = 0;                    1
for (int j = 0; j < N; j++)     3*N
    sum = sum + j;              1*N
Each iteration takes 4 steps = O(1) steps per iteration.
Total time is N·O(1) = O(N·1) = O(N): the loop is linear
because the work per iteration is constant.
15. While Loop
bool done = false;
int result = 1;
int n;                       // assumed initialized to the problem size N
while (!done)
{
    result = result * n;     ---- 1 (multiply) + 1 (assignment)
    n--;                     ---- 1 (decrement)
    if (n <= 1)
        done = true;
}
Total time is N·O(1) = O(N)
16. Sequences of Statements
int i, j, k, sum = 0;
for (j = 0; j < N; j++)
    for (k = 0; k < j; k++)
        sum = sum + k * j;
for (i = 0; i < N; i++)
    sum = sum + i;
The complexity is given by the sum of the parts: the nested
loops execute the inner body N(N-1)/2 times and the final loop
N times, so the total time is O(N²) + O(N) = O(N²)
17. Conditional Statements
if (condition)
    statement1;   ----- O(n²)
else
    statement2;   ----- O(n)
Consider worst case complexity/maximum running time.
18. Function Calls
• Cost = the call + passing the arguments + executing
the function + returning a value.
• Making and returning from call – O(1)
• Passing arguments – depends on how it is passed –
passed by value/reference
• Cost of execution – body of function
• Determining cost of return – values returned
20. Analyzing Data Structures
• Insert/delete at the beginning
• Insert/delete at the end
• Insert/delete in the middle
• Access at the beginning, the end and in the
middle.
• Each operation is at worst O(N); which operations are O(1)
differs between the two structures:
Array
Linked List
21. Instructions in Detail
• Addressing Mode
• Flow of control – Sequential
Branch
Loop
Function Call
• Analyzing the flow of control – Assembly and C language
• Example
ld r0,#0AAh --- 400ns
push r0 ---600ns
add r0,r1 ----400ns
22. Co-routine
• A co-routine is a special kind of procedure call in
which there is a mutual call exchange between
cooperating procedures – 2 procedures sharing time.
• Similar to procedures operating under a shared time budget.
• Procedures execute to completion, whereas co-routines
can exit and return at points throughout the body of the
procedure.
• A control procedure starts the process. Each
context switch is triggered by one of the
following – the control procedure, an external event (such
as a timing signal), or an internal event (such as a data value).
23. • The process continues until both procedures are
completed.
• It is time-burdened; for faster response,
preemption must be used.
(Figure: a control procedure coordinating Procedure2 and Procedure3)
24. Interrupt call
(Figure: a foreground task is interrupted; the interrupt handler/ISR runs, then control returns to the foreground task)
25. Time Metrics
• Response Time
• Execution time
• Throughput
• Time loading – percentage of time that CPU is
doing useful work.
• Memory loading – percentage of usable
memory in use by the application.
26. Response Time
• Time interval between the event and completion of
associated action
• Ex – A/D command and acquisition
• Polled loops – the response time consists of 3
components:
  Hardware delays in the external device to set the
  signaling event
  Time to test the flag
  Time needed to respond to and process the
  event associated with the flag.
27. External Hardware Device Delay
• Two Cases considered
a) Case 1 - The response through external system to
prior internal event
b) Case 2- An asynchronous external event
(Figure: an internal event in the causal system passes through the external system, with its delay, before the response reaches the responding system)
28. • Time to get to the polling loop from the internal causal event
• The delay through an external device
• The time to generate the response
• Flag time - Determined from the execution time of the
machine's bit test instruction
• Processing time – time to perform the task associated with
triggering event
29. Case 2 Asynchronous Event from External Device
• The time of occurrence of the event cannot be determined in advance.
31. Interrupt Driven Environment
• Context switch to the interrupt handler
• Acknowledge the interrupt
• Context switch to the processing routine
• Context switch back to the original routine
32. Preemptive Schedule
• Context Switch
• Task Execution
• Interrupt latency – Highest Priority
Lowest Priority
Case 1 Highest Priority – 3 Factors
• The time from the leading edge of the interrupt in the
external device until that edge is recognized inside the
system.
• The time to complete the current instruction if interrupts are
enabled. Most processors complete the current instruction
before switching context. Some permit an interrupt to be
recognized at the micro instruction level. Thus the time is
going to be bounded by the longest instruction.
• The time to complete the current task if interrupts are
disabled. This time will be bounded by the task size.
33. Case 2 Low Priority Task
• 2 Cases
First, the interrupt occurs and is processed.
Second, the interrupt occurs and is interrupted. Unless
interrupts are disabled, the situation is non-deterministic.
In critical cases, one may have to change
the priority or place limits on the number of
preemptions.
• Non-Preemptive Schedule
Since preemption is not allowed, times are computed as
in highest priority case.
34. Time Loading
• Is percentage of time that the CPU is doing useful
work – execution of tasks assigned to embedded
system
• The time loading is measured in terms of the execution
times of primary and secondary (support) tasks.
• Time loading = primary / (primary + secondary) × 100%
• To compute the time, 3 methods are used:
  Instruction counting
  Simulation
  Physical measurement
35. Instruction Counting
• For periodic systems, the total execution time per
cycle is computed from the times of the individual
modules and divided by the cycle period
• For sporadic systems, the maximum task execution
rates are used, and the percentages are combined
over all of the tasks.
• Effective instruction counting requires understanding
of basic flow of control through a piece of software.
Altering the flow involves context switch
36. Simulation
• Complete understanding of the system and accurate
workload, accurate model of system
• Model can include hardware or software or both
• Tools like Verilog or VHDL are used for hardware
modeling
• SystemC or a variety of software languages can be
used for software modeling
37. Model
• 2 major categories of models: behavioral (conceptual) and
structural (analytic)
• Behavioral – symbols for qualitative aspects
• Structural – mathematical or logical relations that represent
the behavior
System-level model
Functional model
Physical model
Structural model
Behavioral model
Data model
38. Timers
• Timers can be associated with various buses
or pieces of code in the system
• Start timer at beginning of the code and end
timer at end of code
• For determining the timing of blocks
39. Instrumentation
• Numerous instruments – logic analyzer, code
analyzer
• Measure maximum and minimum times, time loops, identify
non-executed code, capture rates of execution and
frequently used code
• Limitation – they are only as good as the inputs given to the
system, and not good for typical and boundary conditions
• They are not predictive – they don't guarantee
performance under all circumstances
• Still, they provide significant information
40. Memory Loading
• Most devices come with large
memory
• But amount of memory may be
reduced to save weight
(aircraft/spacecraft)
• Memory loading is defined as the
percentage of usable memory in use by
an application
• Memory map – useful in
understanding the allocation and
use of available memory
(Figure: A Memory Map – memory-mapped I/O and DMA, firmware, RAM, stack space, system memory)
41. • The total memory loading is the sum of the individual
loadings for instructions, stack and RAM, weighted by the
share of memory given to each: MT = MI·PI + MR·PR + MS·PS
• The values of Mi reflect the memory loading for each
portion of memory
• The Pi represent the percentage of total memory
allocated to each portion
• MT is expressed as a percentage
• Memory-mapped I/O and DMA are not included in
the calculation; these are fixed by the hardware design
42. Example
• Let the system be implemented as follows:
MI = 15 MB; MR = 100 KB; MS = 150 KB
PI = 55%; PR = 33%; PS = 10%
Find the value of MT
43. Designing a Memory Map
• Allocate minimum amount of memory necessary for
the instructions and the stack
• The firmware contains the program that implements
the application
• Memory loading is computed by dividing the number
of user locations by the maximum allowable
• RAM area – global variables, registers
• RAM improves the instruction fetch speed
• The size of the RAM area is decided at design time
44. Stack Area
• Stores context information and automatic variables
• Multiple stacks, depending on the design
• Capacity is fixed at design time
• The maximum stack usage can be computed using
US = Smax × Tmax
• Memory loading is then computed from US
45. Evaluating Performance
• Depends on information
• Exact times if computable
• Measurement technique
Criterion              Analytic method   Simulation           Measurement
Stage                  Any               Any                  Post-prototype
Time required          Small             Medium               Varies
Tools                  Analysis          Computer languages   Instrumentation
Accuracy               Low               Moderate             Varies
Trade-off evaluation   Easy              Moderate             Difficult
Cost                   Small             Medium               High
Scalability            Low               Medium               High
46. Early Stages
• The model should be hierarchical. Complex system
can be modeled by decomposing it to simpler parts.
Progressive refinement, abstraction, reuse of existing
components
• The model should express concurrent and temporal
interdependencies among physical and modeled
elements. Understand dynamic performance and
interaction between other elements
• The model should preferably be graphical, though this is
not strictly necessary
• It should permit worst-case and scenario analysis, including
boundary conditions
47. Mid Stages
• Real components of design
• Prototype modules and integrate them into
subsystems
Later Stages
• Integrate into larger system
48. Performance Optimization
• What is being optimized ?
• Why is it being optimized?
• What is the effect on overall system?
• Is the optimization appropriate for the operating context?
49. Common Mistakes
• Expecting an improvement in one aspect of the design
to improve overall performance in proportion to that
improvement
• Using hardware-independent metrics to predict
performance
• Using peak performance
• Comparing performance based on only a couple of metrics
• Using synthetic benchmarks
50. Tricks of the Trade
Response times and time loading can be reduced in
number of ways
1. Match measurements and computations to the rate of
change and values of the data, the type of data, the
number of significant digits, and the operations required
2. Use look-up tables or combinational logic
3. Modify certain operations to reduce time or other
parameters
4. Learn from compiler experts
5. Loop management
6. Flow of control optimization
51. Tricks of the Trade
7. Use registers and caches
8. Use of only necessary values
9. Optimize a common path of frequently used
code block
10. Use page-mode accesses
11. Know when to use recursion vs. iteration
12. Use macros and inline functions
52. Hardware Accelerators
• One technique to improve the performance of software
implementation is to move some functionality to hardware
• Such a collection of components is called hardware
accelerators
• Often attached to CPU bus
• Communication with CPU is accomplished by – shared
variables, shared memory
• An accelerator is distinguished from a coprocessor: the
accelerator does not execute instructions; its interface
appears as I/O
• Designed to perform a specific operation, and generally
implemented as an ASIC, FPGA, or CPLD
53. Hardware Accelerators
• Hardware accelerators are used when there are
functions whose operations do not map onto the
CPU
• Examples – bit and bit field operations, differing
precisions, high speed arithmetic, FFT calculations,
high speed/demand input output operations,
streaming applications
54. Optimizing for Power Consumption
• Safe mode, low-power mode, sleep mode
• The Advanced Configuration and Power Interface (ACPI)
is an industry standard
• Optimization involves both software and hardware
• Software:
  The algorithms used
  Location of code
  Use of software to control various subsystems
55. Techniques to measure power consumption
• Identify the portion of the code to be analyzed
• Measure the current/power consumed by the processor while
the code is being executed
• Modify the loop so that the code comprising the loop is
disabled; ensure the compiler has not optimized the loop or
section of code away
• Measure the current/power consumed by the processor again;
the difference is due to the code under test
• Power depends on:
  The kinds of instructions
  The collection or sequence of instructions executed
  The locations of the instructions and their operands
57. Relative Power Consumption for Common Processor Operations
Operation                      Relative Power Consumption
16-bit add                     1
16-bit multiply                3.6
8x128x16 SRAM read             4.4
8x128x16 SRAM write            9
I/O access                     10
16-bit DRAM memory transfer    33
Using a cache has a significant effect on system power consumption: SRAM consumes
more power than DRAM on a per-cell basis, and a cache is generally SRAM. The size of
the cache should be optimized.
58. Other Techniques
• Power aware compilers
• Use of registers effectively
• Look for Cache conflicts and eliminate if
possible
• Unroll loops
• Eliminate recursive procedures
59. Hardware Power Optimization Techniques
Power Management Schemes
• The best option is to turn the system off when not in use –
power consumption is then limited to leakage, the lower
bound of consumption (static power)
• Upper bound – apply power to all parts of the system –
the maximum value (dynamic power)
• The goal is to find a mid-level power consumption value,
governed by the specs
• Example – a topographic-mapping satellite
• Approaches:
  Decide which portions of the system to power down
  Identify components which can shut down instantly
  Recognize which components do not power up instantly
60. Basis for a system power-down / power-up sequence (figure)
61. Predictive Shutdown
• The approaches discussed on the previous slide are not
possible everywhere.
• Knowledge of the current status and previous states must be
considered in deciding to shut the system down – predictive
shutdown
• Such a technique is analogous to the branch prediction logic
in an instruction prefetch pipeline
• Mispredictions can lead to premature shutdown or restart
Timers
• Another technique is to use timers
• Timers monitor the system behavior and turn the device off
when the timer expires
• Device turns on again based on demand
62. Producer, service, consumer
• Based on queuing theory
• The producer is the part of the system which is to be
powered on and off
• The consumer is the part of the system which needs the service
• A power manager monitors the behavior of the system and applies
a schedule, based on Markov modeling, which maximizes system
computational performance while satisfying the power budget
63. Example
• The operating system is responsible for dynamically
controlling the power in a simple I/O subsystem
• The dynamically controlled portion supports two modes – OFF
and ON
• The dynamic subcomponents consume 10 W when on and
0 W when off
• Switching takes 2 seconds and consumes 40 J from the off
state to the on state, and 1 second and 10 J from on to off
• Requests arrive with a period of 25 seconds
• Three alternative schemes are illustrated graphically
• Observe the same average throughput with substantially reduced
power consumption
65. Advanced Configuration and Power Interface (ACPI)
• ACPI is an industry-standard power management scheme that
was initially applied to PCs, specifically Windows.
• The standard provides some basic power management
facilities as well as interfaces to the hardware
• The software, more specifically the operating system, provides
the power management module
• It is the responsibility of the OS to specify the power
management policy for the system
• The OS uses the ACPI module to send the required controls to
the hardware and to monitor the state of the hardware as an
input to the power manager
• The behavior of the ACPI scheme is expressed in a state
diagram
67. • The standard supports 5 global power states:
1. G3 – hard off or full off – defined as the physically-off
state – the system consumes no power
2. G2 – soft off – requires a full OS reboot to restore the
system to a fully operational condition
3. G1 – sleeping state – the system appears to be off.
The time required to return to an operational
condition is inversely proportional to the power
consumption
4. G0 – working state, in which the system is fully
usable
5. Legacy state – the system does not comply with ACPI
68. • Substates
1. S1 – low wake-up latency state – ensures no loss of
system context
2. S2 – low wake-up latency state – loss of
CPU and system cache state
3. S3 – low wake-up latency state – all system
state except main memory is lost
4. S4 – lowest-power sleeping state – all
devices are off
69. Caches and Performance
• Based on locality-of-reference characteristics, small amounts
of high-speed memory can be used to hold a subset of
instructions and data for immediate use
• Such a scheme gives the illusion that the program has
unlimited amounts of high-speed memory
• The bulk of the instructions and data are held in memory with
much longer cycle/access times than those of the system
CPU
• One major problem in real time embedded application is that
cache behavior is non deterministic
• It is difficult to predict when there will be a cache hit or miss
• It is difficult to set reasonable upper bounds on execution
times for tasks
70. Pipelining
• The problem is due to 2 sources – conditional branches, and
shared access with preemption
• Conditional branches are handled with good branch-prediction
algorithms, but the problem cannot be solved completely
• The path taken, and whether a cache access succeeds, may
vary from iteration to iteration
• This is mitigated with pipelined architectures
• Pipelining techniques are used to prefetch data and
instructions while other activities are taking place
• However, taking the alternate branch requires that the pipe be
flushed and refilled
• This can lead to cache misses and time delays
71. Preemption and multi tasking
• In a multitasking interrupt context, one task may
preempt another
• The new task requires a different block of data/instructions,
so there will be a significant number of cache misses at each
task switch
• A similar situation arises on a von Neumann machine –
the same memory holds code and data
72. Shared Access
• Example – consider a direct-mapped caching scheme
• With a 1K cache and blocks of 64 words, blocks from main
memory addresses 0, 1024, 2048 and so on map to the same
cache block
• Assume the following memory map: instructions are loaded
starting at location 1024, and data is loaded starting at
location 8192. Consider the simple code fragment:
for (i = 0; i < 10; i++)
{
    a[i] = b[i] + 4;
}
73. • On the first access, the instruction access will miss
and bring in the appropriate block from main
memory
• The instruction will execute and have to bring in data
• The data access will miss and bring in the
appropriate block from main memory
• Because block 0 is occupied, the data block will
overwrite the instructions in cache block 0
• On second access, the instruction access will again
miss and bring in the appropriate block from the
main memory
• The miss occurs because the instructions had been
overwritten by the incoming data
74. • The instruction will execute and have to bring in the
data again. Because cache block 0 is again occupied, the
data block will overwrite block 0 again
• This process repeats causing serious degradation
• There is also a time burden of searching and
managing the cache
• The continuing main memory accesses can also
increase the power consumption of the system
75. Possible solutions
1. Use a set-associative rather than a direct-mapping
scheme
2. Move to a Harvard or Aiken architecture
3. Support separate instruction and data caches
76. Smart Memory Allocation for Real time (SMART)
• Cache is decomposed into restricted regions and
common portions
• A critical task is assigned a restricted portion at
start-up
• All cache accesses are restricted to those partitions
and to common area
• The task retains exclusive rights to such areas until
terminated or aborted
• This remains an open problem and various heuristic
schemes have been explored and utilized