The document provides an overview of techniques for faster computation in MATLAB, including:
- Closing other programs and setting the CPU to high performance to improve performance.
- Preallocating variables to avoid recalculating memory requirements in loops.
- Using suppression with semicolons to avoid displaying variable outputs.
- Converting variables to more efficient data types like single precision.
- Implementing parallel processing on CPUs and GPUs for large datasets using techniques like vectorization and avoiding dependencies between iterations.
4. Performance
❑Performance is key when we find ourselves running code that takes
a long time to finish.
❑The following videos outline many facets of theory, but there are
some common factors we can eliminate before we delve into any
further material.
5. Performance
❑MATLAB® is best run with no other applications open. If your code
takes a long time to run, consider closing other programs, e.g.
browsers. These programs add load to the CPU and use up memory
that could otherwise be used to hold variables.
❑Under your computer's power options, make sure your CPU is not
set to a power saver profile. Instead, set it to high performance to
get the best performance out of your CPU.
7. Memory Types
❑When we think of memory whilst discussing computers, we
typically think of hard drives, our standard storage devices.
❑Virtual memory is the memory that drives our computation. We
usually think of RAM (Random-Access Memory) when we think of
virtual memory but virtual memory can be a mix of both storage
memory from hard drives and RAM.
8. Memory Types
❑RAM is much faster and is built for high-speed
computation; storage memory is slower.
❑By adjusting the page file one can add storage memory to the pool
of virtual memory.
❑This can be thought of as adding slower RAM to your system
without having to buy any more.
❑We would typically do this if we needed to hold a large variable and
the amount of RAM available was not sufficient.
10. Preallocation
❑Memory preallocation is the act of setting aside memory for a
variable in advance, so that the memory it requires does not need to
be recalculated and reallocated each time the variable grows.
❑In iterative processes we want to have as few steps as possible.
❑By preallocating variables, their sizes and memory requirements
stay constant.
❑This results in fewer steps that need to be computed on every
iteration.
❑This saves time and increases our performance, as we have
removed a potential bottleneck.
11. Preallocation
❑You are encouraged to run the code examples on the following
slides.
❑The preallocated code took 0.031262 seconds, while the
non-preallocated code took roughly 15 times longer at 0.472885 seconds.
12. Non-Preallocated Variable X
tic
x = 0;
%% As the dimensions of x change with each iteration, an extra
% step to set aside memory takes place on each iteration.
for k = 2:1000000
    x(1,k) = x(1,k-1) + 5;
end
toc
%%
13. Preallocated Variable X
tic
x = zeros(1,1000000);
%% As the dimensions of x remain constant, performance is higher
% here as we only carry out the steps we need to on each iteration.
for k = 2:1000000
    x(1,k) = x(1,k-1) + 5;
end
toc
%%
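❑For this particular loop, a vectorised form is also possible. The sketch below is an illustration rather than part of the timed examples above. It assumes the loop only builds the arithmetic sequence 0, 5, 10, and so on; the names N, xLoop, xVec and sameResult are hypothetical.

```matlab
% Sketch: vectorised alternative to the preallocated loop.
% Assumption: the loop only builds the sequence 0, 5, 10, ...
N = 1000000;                      % illustrative length, as on the slides

% Preallocated loop, as on the slide above.
xLoop = zeros(1,N);
for k = 2:N
    xLoop(1,k) = xLoop(1,k-1) + 5;
end

% Vectorised form: element k equals 5*(k-1), built in one statement.
xVec = 5*(0:N-1);

% Check that both approaches give identical values.
sameResult = isequal(xLoop, xVec);
```

Wrapping each approach in tic and toc, as on the slides, lets you compare the timings on your own machine.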
15. Suppression
❑Suppression is the act of ending a statement with a semicolon so
that MATLAB® does not unnecessarily display the result in the
Command Window.
16. Suppression
❑You are encouraged to run the code examples below.
❑The suppressed code calculates all the values for variable x and
took 0.0025 seconds.
❑The unsuppressed code calculates all the values for variable x and
took 9.6777 seconds.
❑Note that the unsuppressed code takes much longer.
❑Each time a value of x is calculated, because there is no semicolon
to suppress the statement, the value of x is displayed.
17. Suppressed Code
%% Note the semicolon suppressing x in the for loop
tic
x = zeros(1,1000);
for k = 2:1000
    x(1,k) = x(1,(k-1)) + 5;
end
toc
%%
18. Unsuppressed Code
%% Note that x is unsuppressed: since there is no semicolon, the
% value of x will print out every time the loop runs.
tic
x = zeros(1,1000);
for k = 2:1000
    x(1,k) = x(1,(k-1)) + 5
end
toc
%%
20. Template
Name              Bits Used  Can Represent Non-Integers  Can Represent Negative Numbers
Double Precision  64         Yes                         Yes
Single Precision  32         Yes                         Yes
Int64             64         No                          Yes
Int32             32         No                          Yes
Int16             16         No                          Yes
Int8              8          No                          Yes
Uint64            64         No                          No
Uint32            32         No                          No
Uint16            16         No                          No
Uint8             8          No                          No
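❑The properties in the table can be checked from the Command Window. The following is a minimal sketch using the built-in functions intmax, intmin and eps; the variable names are illustrative only.

```matlab
% Sketch: query the limits of a few of the types in the table.
maxU8  = intmax('uint8');     % largest value a uint8 can hold
minI8  = intmin('int8');      % smallest (most negative) int8 value
minU16 = intmin('uint16');    % unsigned types cannot go below zero

% Values outside a type's range saturate at the nearest limit.
a = uint8(300);               % too large for uint8, clamps to the maximum
b = uint8(-5);                % negative, clamps to zero

% Integer types cannot hold fractions; MATLAB rounds to the nearest integer.
c = int8(3.7);

% Single keeps fractions, but with coarser spacing than double.
epsDouble = eps('double');    % gap between 1 and the next double
epsSingle = eps('single');    % gap between 1 and the next single
```

Saturation and rounding are worth keeping in mind before converting a variable to a smaller type, since values can change silently.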
21. Converting to Double and Single
%% Below we declare a variable x. The default bit precision is
% Double.
x = ones(1,100000);
%% Below is the code to change the bit precision of a variable to
% Single.
x = single(x);
% To change from another bit precision to Double use
x = double(x);
22. Converting to Int
x=ones(1,100000);
%% Below is the code to change the bit precision of a variable to Int64.
x=int64(x);
%% Below is the code to change the bit precision of a variable to Int32.
x=int32(x);
%% Below is the code to change the bit precision of a variable to Int16.
x=int16(x);
%% Below is the code to change the bit precision of a variable to Int8.
x=int8(x);
23. Converting to Uint
x=ones(1,100000);
%% Below is the code to change the bit precision of a variable to Uint64.
x=uint64(x);
%% Below is the code to change the bit precision of a variable to Uint32.
x=uint32(x);
%% Below is the code to change the bit precision of a variable to Uint16.
x=uint16(x);
%% Below is the code to change the bit precision of a variable to Uint8.
x=uint8(x);
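❑To see what a conversion saves, the memory used by each version of a variable can be compared with the built-in whos function. This is a sketch with illustrative variable names, not a measurement from the slides.

```matlab
% Sketch: compare the memory footprint of the same data in several types.
x  = ones(1,100000);          % double by default, 8 bytes per element
xS = single(x);               % 4 bytes per element
xI = int16(x);                % 2 bytes per element
xU = uint8(x);                % 1 byte per element

% whos returns a struct whose bytes field holds the variable's size.
infoDouble = whos('x');
infoSingle = whos('xS');
infoInt16  = whos('xI');
infoUint8  = whos('xU');

fprintf('double: %d bytes\n', infoDouble.bytes);
fprintf('single: %d bytes\n', infoSingle.bytes);
fprintf('int16:  %d bytes\n', infoInt16.bytes);
fprintf('uint8:  %d bytes\n', infoUint8.bytes);
```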
25. Parallel Processing
❑You will have noticed by now that for loops are computed in a sequential (serial) fashion: the
first iteration in the loop is computed, then the second, and so on.
❑Parallel processing allows multiple iterations to be computed simultaneously, with the aim of
reducing the time taken to compute a task or carry out a calculation.
26. Setting Up Parallel Processing
❑In the Home tab, beneath the Preferences cog, you will see the Parallel drop-down box. Click on
it, click Manage Cluster Profiles and edit the local (default) cluster profile. Set “Number of
workers to start on your local machine” (NumWorkers) to 12, then scroll down and set “Range of
number of workers to run job” (NumWorkersRange) to 12.
❑Click the Validate tick icon. This lets MATLAB® test your system to see if the settings above
can be used. Each worker is a parallel process, so with the settings above we have set 12 parallel
processes to run on your local default cluster profile. If the validation test fails,
reduce the number of workers and the range to a lower number, such as 4.
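❑The worker count can also be inspected and used from code. The sketch below assumes the Parallel Computing Toolbox is installed and uses its parcluster, gcp and parpool functions. The cap of 4 workers is an arbitrary illustrative choice, and the variable names are hypothetical.

```matlab
% Sketch: inspect the default cluster profile and start a small pool.
% Assumption: the Parallel Computing Toolbox is installed.
cluster = parcluster;               % default cluster profile
maxWorkers = cluster.NumWorkers;    % number of workers the profile allows

% Only start a pool if one is not already running.
pool = gcp('nocreate');             % returns empty if no pool exists
if isempty(pool)
    nWorkers = min(4, maxWorkers);  % illustrative cap of 4 workers
    pool = parpool(cluster, nWorkers);
end

fprintf('Profile allows %d workers; pool is using %d.\n', ...
    maxWorkers, pool.NumWorkers);

delete(pool);                       % shut the pool down again
```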
27. Setting Up Parallel Processing
❑You cannot run a loop in parallel if an iteration depends on future or past iterations, as each
iteration is carried out independently on a different worker with no knowledge of any of the
other iterations.
❑Each worker in turn uses a lot of virtual memory, roughly 150-350MB each. If you run a
machine with little virtual memory available, you will have to use fewer workers.
❑Although there are several caveats, imagination and creativity with your coding can overcome
these problems. Parallel computation is very fast at computing moving averages over larger
datasets when compared to serial computation.
28. Setting Up Parallel Processing
❑To initialise parallel processing we use the code: parpool local
❑Note that initialising parallel processing takes a non-negligible amount of time.
❑Therein lies the crux of the problem. If your code runs in less time than the initialisation takes,
don’t bother with parallel computing.
❑The benefit of parallel computing is seen when you are working on large datasets. This is where
serial computation really shows its inefficiency.
❑To shut down workers after computation we use the code: delete(gcp('nocreate')). Make sure
to shut down workers, as this will free up virtual memory and reduce the load on your CPU.
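❑One way to judge whether parallel computing is worthwhile is to measure the start-up cost directly. The following sketch assumes the Parallel Computing Toolbox is installed and that the default profile is acceptable; startupSeconds is an illustrative name.

```matlab
% Sketch: measure how long the parallel pool takes to start.
% Assumption: the Parallel Computing Toolbox is installed.
delete(gcp('nocreate'));            % make sure no pool is already running

tStart = tic;
pool = parpool;                     % start a pool with the default profile
startupSeconds = toc(tStart);

fprintf('Pool start-up took %.2f seconds.\n', startupSeconds);

delete(pool);                       % free the workers again
```

If the serial version of your code finishes in less time than startupSeconds, the pool start-up alone outweighs any gain.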
30. Parallel Computation Implemented Incorrectly
% Note that the following code will fail, as we cannot address past or future indexed
% iterations that are computed within a parallel loop.
% parpool initialises the parallel pool using the local profile. Older
% versions of MATLAB® may use matlabpool open local instead of parpool local.
parpool local
x(1:10) = 1;
parfor i = 2:10
    x(1,i) = x(1,(i-1)) + 1;
end
% This shuts down the workers after computation. Older
% versions of MATLAB® may use matlabpool close instead of delete(gcp('nocreate')).
delete(gcp('nocreate'))
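❑A dependent loop can sometimes be rewritten so that each iteration depends only on the loop index. The sketch below is one illustrative way to do this for the loop above, under the assumption that the intended result is x(i) = i. MATLAB starts a pool automatically if the toolbox is present, and otherwise runs the parfor loop serially.

```matlab
% Sketch: the same result without a dependency between iterations.
% Assumption: the intended values are x(1) = 1 and x(i) = x(i-1) + 1.
x = ones(1,10);                     % preallocate; x(1) keeps the value 1

parfor i = 2:10
    % Closed form: x(i) = 1 + (i-1), which uses only the loop index.
    x(1,i) = 1 + (i-1);
end

expected = 1:10;
matches = isequal(x, expected);     % compare with the serial recurrence
```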
31. Parallel Computation Implemented Correctly
%% Here on each iteration we calculate the moving average of c and store the value in x using a
% window of 5 samples.
c = rand(1,1000000); % One random value per element (c(1:1000000)=rand repeats a single value).
% When using a large variable it is prudent to cut down on bit precision if possible to
% save virtual memory.
c = single(c);
x = zeros(1,1000000,'single'); % Preallocate the output.
parpool local
tic
parfor i = 3:999998
    x(1,i) = mean(c(i-2:i+2));
end
toc
delete(gcp('nocreate'))
%%
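❑It is worth checking a parallel result against a serial reference. The sketch below uses a smaller, illustrative dataset and compares the parfor moving average with a vectorised version built on the built-in conv function. The names N, lastIdx, xPar, xVec and maxDiff are hypothetical.

```matlab
% Sketch: verify the parfor moving average against a vectorised reference.
N = 1000;                           % small illustrative size
c = rand(1,N);
lastIdx = N - 2;                    % last index with a full 5-sample window

% Parallel version, as on the slide above (runs serially without a pool).
xPar = zeros(1,N);
parfor i = 3:lastIdx
    xPar(1,i) = mean(c(i-2:i+2));
end

% Vectorised version: convolving with five weights of 1/5 gives the same
% 5-sample mean; 'valid' keeps only the positions with a full window.
xVec = zeros(1,N);
xVec(3:lastIdx) = conv(c, ones(1,5)/5, 'valid');

maxDiff = max(abs(xPar - xVec));    % should be at rounding-error level
```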
33. GPU Parallel Processing and Programming
❑A GPU or Graphics Processing Unit can carry out computation and store variables due to having
on-board memory.
❑The GPU was designed with a very specific task in mind: to render frames and carry out simple
calculations.
❑GPUs do not perform as well as CPUs when working on branching code such as if and while statements.
❑However, they are incredibly fast at simple repetitive computations such as addition,
subtraction, division and multiplication.
❑This makes GPUs ideal for image processing and signal processing.
34. GPU Parallel Processing and Programming
❑We can compute with variables on the CPU and on the GPU simultaneously, but
we are blind to modifications made to GPU-based variables until we call them back into the
CPU-driven workspace.
❑Sending information to and from your GPU generates an overhead. If managed poorly, it can
cancel out any time saved or become a bottleneck.
❑Once a variable is on the GPU you can use it like any other variable. A number of
MATLAB® functions are optimised to accelerate computation with variables that are stored on a
GPU; refer to the MathWorks website for an updated list.
35. GPU Parallel Processing and Programming
❑To parallel process variables on a GPU, simply set up a parfor loop where all the variables being
acted on are stored on the GPU. Note that GPU-enabled functions applied to a whole GPU variable
already run in parallel on the GPU without any loop.
❑The same rules apply as in parallel processing on a CPU: we can’t access past or future
iterations of variables that are defined inside the loop and rely on the loop index, as the workers
work independently of one another on various iterations simultaneously.
36. GPU Code
%% X is a variable and G is that variable put on the GPU.
X = rand(10,'single');
G = gpuArray(X);
% To bring it back to the workspace and view your GPU variables,
% we “gather” it into a new variable c.
c = gather(G);
%%
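❑Simple element-wise arithmetic on a GPU variable stays on the GPU until it is gathered. The sketch below assumes the Parallel Computing Toolbox and a supported GPU. It checks for a device with gpuDeviceCount and skips the computation otherwise; the variable names are illustrative.

```matlab
% Sketch: element-wise arithmetic on the GPU, checked against the CPU.
% Assumption: Parallel Computing Toolbox installed; a GPU may or may not exist.
if gpuDeviceCount > 0
    X = rand(1000,'single');        % 1000-by-1000 matrix on the CPU
    G = gpuArray(X);                % copy it to GPU memory

    H = 2*G + 1;                    % computed on the GPU, result stays there
    onGpu = isa(H,'gpuArray');      % true while H lives on the GPU

    result = gather(H);             % bring the result back to the workspace
    maxErr = max(abs(result(:) - (2*X(:) + 1)));   % compare with CPU result
    fprintf('Largest CPU/GPU difference: %g\n', maxErr);
else
    disp('No supported GPU found; skipping the GPU sketch.');
end
```

Keeping intermediate results such as H on the GPU, and gathering only once at the end, limits the transfer overhead described earlier.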