This file contains a brief description of a computer vision project I created as an undergraduate student in 2014 on detecting object motion and tracking moving objects. The MATLAB code of the system is also included in the document.
Find me on:
AFCIT
http://www.afcit.xyz
YouTube
https://www.youtube.com/channel/UCuewOYbBXH5gwhfOrQOZOdw
Google Plus
https://plus.google.com/u/0/+AhmedGadIT
SlideShare
https://www.slideshare.net/AhmedGadFCIT
LinkedIn
https://www.linkedin.com/in/ahmedfgad/
ResearchGate
https://www.researchgate.net/profile/Ahmed_Gad13
Academia
https://www.academia.edu/
Google Scholar
https://scholar.google.com.eg/citations?user=r07tjocAAAAJ&hl=en
Mendeley
https://www.mendeley.com/profiles/ahmed-gad12/
ORCID
https://orcid.org/0000-0003-1978-8574
Stack Overflow
http://stackoverflow.com/users/5426539/ahmed-gad
Twitter
https://twitter.com/ahmedfgad
Facebook
https://www.facebook.com/ahmed.f.gadd
Pinterest
https://www.pinterest.com/ahmedfgad/
Real-Time Object Motion Detection Using Background Subtraction
Real-Time Object Motion Detection and Tracking
2014
By
Ahmed Fawzy Gad
Faculty of Computers and Information (FCI)
Menoufia University
Egypt
ahmed.fawzy@ci.menofia.edu.eg
Information Technology Department
Computer Vision
Proposed system specification:
The main goal is to build a system that can track moving objects in a video, suitable for real-time, real-life surveillance. Given an input video, it must be analyzed to extract valuable information that helps distinguish between still (static) parts and moving parts. The major steps are to read input video frames from an existing file or a live camera, then analyze those frames to find the background. If the background is detected with high accuracy, it becomes easy to find the moving objects afterwards. After successfully finding the background, the system checks whether a moving object appears in successive frames. The objects can be extracted using simple segmentation operations. Each detected moving object is then tracked until it is hidden or the video ends. After each frame, all moving objects are labeled.
The steps involved in the system are shown in the following diagram:
Key terms and algorithms involved in the system:
Retinex Algorithm
Gaussian Mixture Models (GMM)
Morphological Operations
Zero-Mean Gaussian Distribution
Shadow Effect
Gradient Filter
The system can be divided into two major steps:
1. Background Model Creation
2. Objects Segmentation and Tracking
Background Model Creation:
At first, we need to find what is the background and what is the foreground of the video frame. Using a
collection of frames we can fullfill this task. By assuming that the first frame signifies the objects mov-
ing and the still background. Then for the next frame we can detect moving objects by finding the
frame difference mask and testing if there is a change over the currectly used background (consisting of
just one frame). The pass the second frame and go to third frame and calculate the frame difference
mask between the second and third frames. But before processing the third one, we need to update the
background of the image. But why to update the background? The answer is that no fixed background
can be used for all the video because the content is continously changing. Because of changes in the
scene (environmental, shadows, etc.) it is necessary to update the background model. Say that the the
first 10 frames are used to get the background and no other frames will be used. Say that in those 10
frames, someone was standing in them, so you 100% will regard this person a part of the background.
But what if he started to move after the 10 frames. So your background will not be consistent and may
lead to some false positives/negtives. So you need to continously update your background model. Back
to third frame, extract objects the try to upate the background model. Continue in this form until you
you reached a consistent background model if from each successive frame adds no valiable information
to the currently used background model.
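As a minimal sketch of this idea (the variable names and the threshold value are illustrative assumptions, not taken from the original code), the frame difference mask between two consecutive grayscale frames can be computed in MATLAB as follows:

% Sketch: frame difference mask between two consecutive frames.
% prevFrame and currFrame are assumed grayscale uint8 images of equal size.
FD = double(currFrame) - double(prevFrame);  % signed frame difference
TH = 25;                                     % illustrative threshold value
fdMask = abs(FD) > TH;                       % 1 = moving pixel, 0 = still pixel
% Pixels that stay 0 in this mask across many frames are the candidates
% that feed the background model update.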
To simplify the system and make it computationally feasible, start by finding the frame difference mask over some specified number of frames; the higher this number, the better the accuracy. The frame difference mask is obtained by thresholding the frame difference, and the output of the thresholding operation is accumulated across the desired (high) number of frames; the result is fed to the background model to enhance it further.
For the frame difference test to be applicable, the frame difference must obey a zero-mean Gaussian distribution, whose probability density function under the null hypothesis is:

p(FD | H0) = (1 / sqrt(2*pi*sigma^2)) * exp(-FD^2 / (2*sigma^2))

where FD is the frame difference and sigma^2 is the variance of the frame difference, equal to twice the camera noise variance. H0 denotes the null hypothesis, i.e., the hypothesis that there is no change at the current pixel. The threshold value is decided by the required significance level.
Their relation is as follows:

alpha = P(|FD| > TH | H0)

i.e., the significance level alpha equals the probability mass of the two Gaussian tails beyond the threshold TH.
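As a small worked example of this relation (the significance level and camera noise figures below are assumed for illustration, not taken from the original), the threshold can be computed in MATLAB:

% Sketch: derive the frame difference threshold from the significance level.
alpha = 0.01;              % required significance level (assumed value)
sigmaC = 15;               % assumed camera noise standard deviation
sigma = sqrt(2) * sigmaC;  % FD variance is twice the camera noise variance
TH = sqrt(2) * sigma * erfcinv(alpha);  % solves P(|FD| > TH | H0) = alpha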
An example is shown in the following figure:
Assume that just one background image is taken into regard; then all classification depends on this single image, and the result is as follows:

After some more successive frames, when not just a single frame but a collection of frames forms the background model, the results are certainly more accurate and the model gets more and more enhanced. This is the result:
According to the measured frame difference masks, a suitable background model can be created. In this approach to determining the background, only motion-based information is used to extract the moving objects and hence to find the background; no shape information is extracted. This may help generalize the approach and avoids erroneous results in case a specific object stops. Also, in low-quality video, and especially in real-time and outdoor frames, it is difficult to extract specific information about shape and texture because of the high variation in object characteristics.
So rather than using a single static background model, a moving average background model is used, especially because basing the model on a single image is not suitable for outdoor frames. A percentage of 5% of the total frames can initially be suitable, and this can be made dynamic for higher accuracy.
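A minimal sketch of such a moving-average update (the learning rate, names, and masking choice are assumptions, not from the original code):

% Sketch: running (moving average) background model update.
% bg is the current background estimate (double), frame is the new grayscale
% frame, and fdMask marks pixels classified as moving in this frame.
rate = 0.05;                     % learning rate; illustrative value
still = ~fdMask;                 % only still pixels update the model
bg(still) = (1 - rate) * bg(still) + rate * double(frame(still));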
Objects Segmentation and Tracking:
After finding the background mask, the next step is to calculate the background difference. The background difference step generates a background difference mask by thresholding the difference between the current frame and the background information stored in the last background, similar to what is done in the frame difference mask step. As with the frame difference mask, the threshold value is determined by the required significance level according to the same equation given above.
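A sketch of this step, computed the same way as the frame difference mask (the names carry over from the earlier sketches and are illustrative):

% Sketch: background difference mask against the stored background model.
BD = double(frame) - bg;   % background difference
TH_BD = 25;                % threshold chosen by the significance level
bdMask = abs(BD) > TH_BD;  % 1 = pixel differs from the background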
Now that the background can be detected efficiently, the next step is to detect objects in successive frames that do not correspond to any pixel region in the currently used background model. The object detection step generates the initial object mask from the frame difference mask and the background difference mask. The following table lists the criteria for object detection,
where |BD| is the absolute value of the difference between the current frame and the background information stored in the background buffer, |FD| is the absolute value of the frame difference, and the OM field indicates whether or not the pixel is included in the object mask. TH(BD) and TH(FD) are the threshold values for generating the background difference mask and the frame difference mask, respectively.
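The exact decision table was given as a figure that is not reproduced here; the following sketch shows one plausible reading of such criteria (the fallback rule and the bgValid flag are assumptions for illustration):

% Sketch: initial object mask from the two difference masks.
% bgValid marks pixels for which a reliable background has been registered.
omMask = bdMask;                      % trust the background difference...
omMask(~bgValid) = fdMask(~bgValid);  % ...else fall back to frame difference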
After the object detection step, an initial object mask is generated. However, due to camera noise and irregular object motion, some noise regions exist in the initial object mask. The left-hand image below was produced at the beginning of the process, when not enough frames had been collected to serve as the background; the right-hand image was obtained after the background mask was sufficiently trained.
This degradation is shown in the above images, and a post-processing step is needed to eliminate such noise. One way to remove this noise is to use simple morphological operations such as opening, closing, dilation, erosion, or combinations of them. However, morphological operations are best used on simple regions containing narrow holes; applied to a large region, they require a larger mask, which increases computational complexity and may also degrade performance.
Since there are two kinds of noise, noise in the background region and noise in the foreground region, two passes are included in this step. The first pass removes small black regions inside foreground regions, i.e., holes in the change detection mask. The second pass removes small white regions in background regions, i.e., false alarm regions in the change detection mask. After that, a close-open morphological operation can be applied, as in the sketch below.
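A minimal sketch of this two-pass cleanup using Image Processing Toolbox calls, carrying on from the earlier omMask sketch (the minimum region size is an assumed value):

% Sketch: two-pass noise removal followed by a close-open operation.
minRegion = 50;                            % illustrative minimum region size
% Pass 1: fill small black holes inside foreground regions.
omMask = ~bwareaopen(~omMask, minRegion);
% Pass 2: remove small white false-alarm regions in the background.
omMask = bwareaopen(omMask, minRegion);
% Close-open to smooth the remaining object boundaries.
se = strel('square', 3);
omMask = imopen(imclose(omMask, se), se);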
This system is dedicated to tracking moving objects, and no regions other than the desired objects should be taken into regard. But when an object moves, falling light or sunshine may create a shadow for it that is very difficult to distinguish from the moving object itself. This is the shadow effect. The background model cannot reject the shadow, because the shadow is regarded as a moving entity rather than a still part, and the background model can only distinguish between still and moving content. The gradient filter is used to eliminate this effect:
G = (I ⊕ B) − (I ⊖ B)

where I is the input image, B is a 3x3 structuring element, ⊕ and ⊖ denote morphological dilation and erosion respectively, and G is the gradient image.
The output of the gradient filter (G) is used as the input to the segmentation process instead of the original image. After taking the gradient, the values in the shadow region tend to be very small while the edges have large gradient values, so the effect of the shadow can be reduced significantly. Another benefit of the gradient filter is that if the illumination or the camera gain changes within the sequence, the effect is small in the gradient domain.
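A short sketch of this filter as a morphological gradient in MATLAB (assuming the 3x3 structuring element named in the text; frame is an assumed grayscale input):

% Sketch: gradient filter applied before segmentation.
B = strel('square', 3);                 % 3x3 structuring element
G = imdilate(double(frame), B) - imerode(double(frame), B);
% Flat shadow regions yield small values in G while edges stay large,
% so segmenting G instead of the frame suppresses the shadow effect.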
Performance Evaluation:
The metric used in evaluating the accuracy of the system is to count the number of objects detected and tracked in each frame and compare the results with ground truth data. This is a frame-based approach to evaluating the system: each frame is treated independently of the other frames, which does not take into account how well the system preserves the identity of an object over its lifespan.
Starting with the first frame of the test sequence, frame-based metrics are computed for every frame in the sequence. From each frame in the video sequence, a few true and false detection and tracking quantities are first computed:
True Negative (TN): Number of frames where both ground truth and system results agree on the absence of any object.
True Positive (TP): Number of frames where both ground truth and system results agree on the presence of one or more objects.
False Negative (FN): Number of frames where the ground truth contains at least one object while the system results do not contain any object.
False Positive (FP): Number of frames where the system results contain at least one object while the ground truth does not contain any object.
After testing the system over 1135 frames from the PETS 2000 database, 360 frames were used to measure TN and FP and the remaining 775 frames were used to measure TP and FN.
The system was robust in not mistakenly detecting an object when there is actually no object in the frame. Its weakness appears when objects are actually present: some of them are missed.
From the first 360 frames, TN is 348 and FP is 12.
From the remaining 775 frames, TP is 723 and FN is 52.
This yields an accuracy of (TP+TN)/Total = (723+348)/1135 = 94.4%.
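As a sketch of how these frame-based quantities and the final accuracy can be computed (gtCount and sysCount are assumed per-frame object-count vectors for the ground truth and the system results; they are not part of the original code):

% Sketch: frame-based evaluation from per-frame object counts.
TN = sum(gtCount == 0 & sysCount == 0);
TP = sum(gtCount >  0 & sysCount >  0);
FN = sum(gtCount >  0 & sysCount == 0);
FP = sum(gtCount == 0 & sysCount >  0);
accuracy = (TP + TN) / numel(gtCount);  % here: (723 + 348) / 1135 = 0.944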
MATLAB Code
% Read frames from a video file as grayscale (intensity) uint8 images.
videoSource = vision.VideoFileReader('vidName.mp4', ...
    'ImageColorSpace', 'Intensity', ...
    'VideoOutputDataType', 'uint8');

% GMM-based foreground detector (background subtraction).
detector = vision.ForegroundDetector(...
    'NumTrainingFrames', 5, ...   % 5 because of short video
    'InitialVariance', 30*30);    % initial standard deviation of 30

% Blob analysis to extract bounding boxes of the foreground regions.
blob = vision.BlobAnalysis(...
    'CentroidOutputPort', false, 'AreaOutputPort', false, ...
    'BoundingBoxOutputPort', true, ...
    'MinimumBlobAreaSource', 'Property', 'MinimumBlobArea', 250);

% Draw white rectangles around detected objects and display the result.
shapeInserter = vision.ShapeInserter('BorderColor', 'White');
videoPlayer = vision.VideoPlayer();

inc = 1;                          % frame counter
while ~isDone(videoSource)
    frame = step(videoSource);              % read the next frame
    fgMask = step(detector, frame);         % foreground (object) mask
    bbox = step(blob, fgMask);              % bounding boxes of moving objects
    out = step(shapeInserter, frame, bbox); % draw bounding boxes around objects
    step(videoPlayer, out);                 % view results in the video player
    inc = inc + 1;
end

release(videoPlayer);
release(videoSource);