Faster Computation with MATLAB
BY MUHAMMAD BILAL ALLI
Course Outline
❑Performance
❑Memory Types
❑Preallocation
❑Suppression
❑Bit Precision
❑CPU Parallel Processing and Programming
❑GPU Parallel Processing and Programming
Performance
Performance
❑Performance is key when we find ourselves running code that takes
a long time to finish.
❑The following videos outline many facets of theory, but there are
some common factors we can eliminate before we delve into any
further material.
Performance
❑MATLAB® is best run with no other applications open. If your code takes a long time to run, consider closing other programs, e.g. browsers. These programs add load to the CPU and use up memory that could be used elsewhere to handle variables.
❑Under your computer's power options, make sure your CPU is not set to a power-saver profile; instead, set it to high performance to get the best performance out of your CPU.
Memory Types
Memory Types
❑When we think of memory whilst discussing computers, we
typically think of hard drives, our standard storage devices.
❑Virtual memory is the memory that drives our computation. We usually think of RAM (Random-Access Memory) when we think of virtual memory, but virtual memory can be a mix of RAM and storage memory from hard drives.
Memory Types
❑RAM is much faster and built for high-speed calculation/computation; storage memory is slower.
❑By adjusting the page file one can add storage memory to the pool
of virtual memory.
❑This can be thought of as adding slower RAM to your system
without having to buy any more.
❑We would typically do this if we needed to hold a large variable and the amount of RAM available was not sufficient; a short sketch for checking how much memory is currently available follows below.
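Before adjusting the page file, it can be worth checking how much memory MATLAB® can actually use. A minimal sketch is below; note that the memory function is available on Windows only, and this check is offered as an optional aid rather than part of the original material.
%% Check how much memory is available to MATLAB® (Windows only).
[userView, systemView] = memory;
% Largest single array MATLAB® could allocate right now, in gigabytes.
maxArrayGB = userView.MaxPossibleArrayBytes / 1e9
% Physical RAM currently free on the system, in gigabytes.
freeRamGB = systemView.PhysicalMemory.Available / 1e9
%%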
Preallocation
Preallocation
❑Memory preallocation is the act of setting aside memory for a variable so that the memory that variable requires does not need to be recalculated and reallocated on every iteration.
❑In iterative processes we want to have as few steps as possible.
❑By preallocating variables, their sizes and memory requirements stay constant.
❑This results in fewer steps that need to be computed on every
iteration.
❑This saves time and increases our performance, as we have
removed a potential bottleneck.
Preallocation
❑You are encouraged to run the code examples on the following
slides.
❑The preallocated code took 0.031262 seconds, while the non-preallocated code took roughly 15 times longer at 0.472885 seconds.
Non-Preallocated Variable X
tic
x = 0;
%% As the dimensions of x change with each iteration, an extra step to set aside memory takes place on each iteration.
for k = 2:1000000
x(1,k) = x(1,k-1) + 5;
end
toc
%%
Preallocated Variable X
tic
x =zeros(1,1000000);
%% As the dimensions of x remain constant, performance is higher here as we only carry out steps we need to on each iteration.
for k = 2:1000000
x(1,k) = x(1,k-1) + 5;
end
toc
%%
Suppression
Suppression
❑Suppression is the act of placing a semicolon at the end of a statement that declares or calculates a variable, so that MATLAB® does not unnecessarily display its value in the Command Window.
Suppression
❑You are encouraged to run the code examples below.
❑The suppressed code calculates all the values for variable x and
took 0.0025 seconds.
❑The unsuppressed code calculates all the values for variable x and
took 9.6777 seconds.
❑Note that the unsuppressed code takes much longer.
❑Each time a value of x is calculated, because there is no semicolon
to suppress the statement, the value of x is displayed.
Suppressed Code
%%Note the semicolon suppressing x in the for loop
tic
x=zeros(1,1000);
for k = 2:1000
x(1,k) = x(1,(k-1)) + 5;
end
toc
%%
Unsuppressed Code
%% Note that x is unsuppressed; since there is no semicolon, the value of x will print out every time the loop runs
tic
x=zeros(1,1000);
for k = 2:1000
x(1,k) = x(1,(k-1)) + 5
end
toc
%%
Bit Precision
Bit Precision
Name               Bits Used   Can Represent Non-Integers   Can Represent Negative Numbers
Double precision   64          Yes                          Yes
Single precision   32          Yes                          Yes
Int64              64          No                           Yes
Int32              32          No                           Yes
Int16              16          No                           Yes
Int8               8           No                           Yes
Uint64             64          No                           No
Uint32             32          No                           No
Uint16             16          No                           No
Uint8              8           No                           No
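The practical pay-off of the table above is memory: halving the bits roughly halves the bytes a variable occupies. A small sketch, assuming a vector of one million elements, uses whos to show the difference.
%% Compare the memory used by the same data at different bit precisions.
a = ones(1,1000000); % Double, 8 bytes per element.
b = single(a); % Single, 4 bytes per element.
c = uint8(a); % Uint8, 1 byte per element.
whos a b c % The Bytes column shows the difference.
%%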
Converting to Double and Single
%% Below we declare a variable x; the default bit precision is Double.
x=ones(1,100000);
%% Below is the code to change the bit precision of a variable to Single.
x=single(x);
% To change from another bit precision to Double use
x=double(x);
Converting to Int
x=ones(1,100000);
%% Below is the code to change the bit precision of a variable to Int64.
x=int64(x);
%% Below is the code to change the bit precision of a variable to Int32.
x=int32(x);
%% Below is the code to change the bit precision of a variable to Int16.
x=int16(x);
%% Below is the code to change the bit precision of a variable to Int8.
x=int8(x);
Converting to Uint
x=ones(1,100000);
%% Below is the code to change the bit precision of a variable to Uint64.
x=uint64(x);
%% Below is the code to change the bit precision of a variable to Uint32.
x=uint32(x);
%% Below is the code to change the bit precision of a variable to Uint16.
x=uint16(x);
%% Below is the code to change the bit precision of a variable to Uint8.
x=uint8(x);
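One caveat worth knowing before converting: MATLAB® integer types saturate at their limits rather than wrapping around, and non-integer values are rounded. The short sketch below illustrates this; the specific values are only examples.
%% Integer conversion saturates at the type limits and rounds non-integers.
int8(200) % Returns 127, the largest value an Int8 can hold.
uint8(-5) % Returns 0, the smallest value a Uint8 can hold.
uint8(3.7) % Returns 4, non-integer values are rounded to the nearest integer.
%%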
CPU Parallel Processing and Programming
Parallel Processing
❑You will have noticed by now that for loops are computed in a sequential or serial fashion: the first iteration in the loop is computed, then the second, and so on.
❑Parallel processing allows multiple iterations to be computed simultaneously, with the aim of
reducing the time taken to compute a task or carry out a calculation.
Setting Up Parallel Processing
❑In the Home tab, beneath the Preferences cog, you will see the Parallel drop-down menu. Click on it, click Manage Cluster Profiles, and edit the local (default) cluster profile: set “Number of workers to start on your local machine (NumWorkers)” to 12, then scroll down and set “Range of number of workers to run job (NumWorkersRange)” to 12. (A programmatic alternative is sketched below.)
❑Click the Validate tick icon; this lets MATLAB® test your system to see whether the settings above can be used. Each worker is a parallel process, and with the settings above we have set 12 parallel processes to run on your local default cluster profile. If the validation test fails, reduce the number of workers and the range to a lower number such as 4.
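If you prefer to set the profile from the Command Window rather than through the dialog, a sketch along the following lines should achieve the same NumWorkers setting; the worker count of 12 is the same assumption as above and should be lowered if validation fails.
%% Programmatic alternative to editing the local (default) cluster profile.
c = parcluster('local'); % Load the local cluster profile.
c.NumWorkers = 12; % Number of workers to start on this machine.
saveProfile(c); % Persist the change to the profile.
%%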
Setting Up Parallel Processing
❑You cannot carry out a parallel process if it depends on future or past iterations as each
iteration is carried out independently on a different thread with no knowledge of any of the
other iterations.
❑Each worker in turn uses a lot of virtual memory, roughly 150-350 MB each; if you run a machine with little virtual memory available, you will have to use fewer workers.
❑Although there are several caveats, imagination and creativity with your coding can overcome
these problems. Parallel computation is very fast at computing moving averages over larger
datasets when compared to serial computation.
Setting Up Parallel Processing
❑To initialise parallel processing we use the code: parpool local
❑Note that initialising parallel processing takes a non-negligible amount of time.
❑Therein lies the crux of the problem: if your code computes faster than the initialisation time, don't bother with parallel computing.
❑The benefit of parallel computing is seen when you are working on large datasets. This is where serial computation really shows its inefficiency.
❑To shut down workers after computation we use the code: delete(gcp('nocreate')). Make sure to shut down workers, as this will free up virtual memory and reduce the load on your CPU (a defensive start-up and shut-down pattern is sketched below).
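As a defensive pattern, you can check whether a pool is already running before paying the start-up cost again. A minimal sketch, using the same local profile assumed above, is shown here.
%% Start a pool only if one is not already running.
if isempty(gcp('nocreate')) % gcp('nocreate') returns [] when no pool exists.
    parpool('local'); % Start the pool using the local profile.
end
% ... parallel work goes here ...
delete(gcp('nocreate')) % Shut the workers down when finished.
%%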
Serial Computation
%%
x(1)=1;
for i =2:10
x(i)=x(i-1)+1
end
% Output would be x= [1,2,3,4,5,6,7,8,9,10].
Parallel Computation Implemented Incorrectly
% Note that the following code will fail, as we cannot address past or future
% iterations computed within a parallel loop.
parpool local % This initialises the parallel pool and accesses the local profile; older
% versions of MATLAB® may use matlabpool open local instead of parpool local.
x(1:10)=1;
parfor i =2:10
x(1,i)=x(1,(i-1))+1
end
delete(gcp('nocreate')) % This shuts down the workers after computation; older
% versions of MATLAB® may use matlabpool close instead of delete(gcp('nocreate')).
Parallel Computation Implemented Correctly
%% Here on each iteration we calculate the moving average of c and store the
% value in x using a window of 5 samples.
c(1:1000000)=rand;
c=single(c); % When using a large variable it is prudent to cut down on bit
% precision if possible to save virtual memory.
parpool local
tic
parfor i =3:999998
x(1,i)=mean(c(i-2:i+2));
end
toc
delete(gcp('nocreate'))
%%
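Before committing to the parfor version, it is also worth timing MATLAB®'s own vectorised route. The sketch below assumes the built-in movmean function is available (introduced in R2016a); it computes the same 5-sample centred moving average in a single call and is often faster than either loop.
%% Vectorised moving average over the same data, for comparison.
c(1:1000000)=rand;
c=single(c);
tic
x = movmean(c,5); % Centred window of 5 samples, matching the parfor example.
toc
%%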
GPU Parallel Processing and Programming
GPU Parallel Processing and Programming
❑A GPU or Graphics Processing Unit can carry out computation and store variables due to having
on-board memory.
❑The GPU was designed with a very specific task in mind: to render frames and carry out simple calculations.
❑GPUs do not perform as well as CPUs when working on If and While statements.
❑However, they are incredibly fast at simple repetitive computations such as addition,
subtraction, division and multiplication.
❑This makes GPUs ideal for image processing and signal processing.
GPU Parallel Processing and Programming
❑We can compute variables that are on the CPU and GPU simultaneously with one another, but we are blind to modifications made to GPU-based variables until we call them back into the CPU-driven workspace.
❑Sending information to and from your GPU generates an overhead; if managed poorly, it can defeat any time saved or become a bottleneck, making things time consuming.
❑Once a variable is on the GPU you can use it like any other variable. A number of MATLAB® functions are optimised to accelerate computation with variables that are stored on a GPU; refer to the MathWorks website for an up-to-date list.
GPU Parallel Processing and Programming
❑To parallel process variables on a GPU, simply set up a parfor loop where all the variables being acted on are stored on the GPU.
❑The same rules apply as in parallel processing on a CPU: we can't access past or future iterations of variables defined inside the loop that rely on the loop index, as the workers work independently of one another on various iterations simultaneously.
GPU Code
%% X is a variable and G is that variable put on the GPU.
X = rand(10,'single');
G = gpuArray(X);
% To bring it back to the workspace and view your GPU variables we “gather” it
% onto a new variable c.
c = gather(G);
%%
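To illustrate the point about simple repetitive computations, the sketch below performs element-wise arithmetic directly on the GPU copy and only gathers the result at the end; it assumes a supported GPU is present.
%% Element-wise arithmetic runs on the GPU when the inputs are gpuArrays.
X = rand(1000,'single');
G = gpuArray(X);
H = sqrt(G).*2 + 1; % Computed on the GPU; H is also a gpuArray.
result = gather(H); % Bring the final answer back to the CPU workspace.
%%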