Literature Review on Content Based Image Retrieval (CBIR)
2015 July MCS3108 Image Processing and Vision
Name : U. V Vandebona
Index No : 13440722
Registration No: 2013/MCS/072
A B S T R A C T
The purpose of this study is to explore the field of content based image retrieval, which falls
under the image processing and vision domain. This paper further discusses the research work carried out in
this arena, focusing specifically on how this image retrieval technique impacts computer vision and its
future.
Keywords: Content Based Image Retrieval (CBIR), Computer Vision, Reverse Image Search
INTRODUCTION
The term “Content Based Image Retrieval” was coined by Toshikazu Kato in 1992 in his research
article “Database architecture for content-based image retrieval”. His experiments concerned the
automatic retrieval of desired images from a large image database, based on image features such as
color and shape (Kato 1992).
Content based image retrieval, or CBIR for short, is an application of computer vision techniques
to image retrieval. In the earlier days, image retrieval was only concept based: metadata such as
keywords, tags, or descriptions associated with the image gave it a concept, a descriptive meaning
(Khutwad and Vaidya 2013). But we cannot guarantee that every image has associated text annotations,
or complete ones; consider, for example, images captured from surveillance cameras. Looking at
content is a good option to fill that gap. In this context, the term "content" refers to colors, shapes,
textures, or any other feature information that can be derived from the image itself. The main drawback
of the traditional concept based approach is that creating such a text-descriptive database is time
consuming, since it must be done manually, and the annotations may not capture the keywords desired to
describe the image. As CBIR is an automated approach, the effect of human error is much smaller.
ARCHITECTURE
A typical CBIR system contains four parts in its process, as depicted in Figure 1.
1. Create the image data collection.
2. Build the feature database by automatically extracting features from the images in the
collection.
3. Search for a required image using the feature database.
4. Arrange the order of the retrieved results.
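The four steps above can be sketched as a minimal pipeline. The descriptor used here (a normalized intensity histogram) and all names are illustrative stand-ins, not a specific published system.

```python
import numpy as np

def extract_feature(image):
    # Hypothetical stand-in descriptor: a normalized 8-bin intensity histogram.
    hist, _ = np.histogram(image, bins=8, range=(0, 256))
    return hist / hist.sum()

def build_feature_database(images):
    # Step 2: automatically extract features from the whole collection.
    return {name: extract_feature(img) for name, img in images.items()}

def search(query_image, feature_db):
    # Steps 3-4: compare the query feature against the database and
    # return names ordered by increasing distance (best match first).
    q = extract_feature(query_image)
    distances = {name: np.linalg.norm(q - f) for name, f in feature_db.items()}
    return sorted(distances, key=distances.get)

# Step 1: a toy image collection (two synthetic grayscale images).
rng = np.random.default_rng(0)
images = {"dark": rng.integers(0, 64, (32, 32)),
          "light": rng.integers(192, 256, (32, 32))}
db = build_feature_database(images)
print(search(images["dark"], db))  # "dark" ranks first
```

In a real system the descriptor would be one of the color, texture, or shape features discussed later, and the database would be indexed rather than scanned linearly.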
Search
In most CBIR systems, there are two ways to search for an image; which method to use depends on the
application domain.
• Query by Example
Query by example (QBE) is a query technique in which the CBIR system is given an example
image, or part of an image, on which it bases its search. Searches can also be performed with
multiple sample images or with a sketched image. The result images should all share common
elements with the provided sample. This is also called reverse image search, and Google image
search is a popular example of the technique. Commonly used reverse image search algorithms
include:
• Scale-invariant feature transform (SIFT) - to extract local features of an image
• Maximally stable extremal regions
• Vocabulary tree
• Text Semantics
Apart from QBE, images can be retrieved by providing text semantics. For example, for the
query text “elephant with a flower”, the retrieved images should contain an elephant and a
flower. This type of open-ended task is very difficult for computers to perform, as some training
is needed to match image features to the semantics. This method needs some form of human
feedback to optimize the resulting images: human interaction can progressively refine the
search results by marking images in the results as "relevant", "not relevant", or "neutral" with
respect to the query, then repeating the search with the new information.
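The relevance-feedback loop just described can be sketched with a Rocchio-style update on feature vectors; the weights and vectors below are illustrative assumptions, not values from the reviewed literature.

```python
import numpy as np

def refine_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Rocchio-style update: move the query feature vector toward the mean of
    # images marked "relevant" and away from those marked "not relevant".
    # Images marked "neutral" are simply left out of both sets.
    q = alpha * query
    if relevant:
        q = q + beta * np.mean(relevant, axis=0)
    if non_relevant:
        q = q - gamma * np.mean(non_relevant, axis=0)
    return q

query = np.array([0.5, 0.5])
relevant = [np.array([1.0, 0.0])]
non_relevant = [np.array([0.0, 1.0])]
new_query = refine_query(query, relevant, non_relevant)
# The refined query shifts toward the relevant example's feature direction.
```

Repeating the search with `new_query` in place of `query` implements the "repeat the search with the new information" step.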
Figure 1 - General Architecture of a CBIR System
THE “CONTENT” COMPARISON
The most common method for comparing two images in CBIR is an image distance measure.
It compares the similarity of two images along various dimensions such as color, texture and shape: the
visual features described in the next section. For example, a distance of 0 signifies an exact match with
the query, with respect to the dimensions considered. Search results can then be sorted by their distance
to the queried image. Many techniques for measuring image distance, known as similarity models, have
been developed to fulfill this requirement. The distance formulas used by many researchers for image
retrieval include Histogram Euclidean Distance, Histogram Intersection Distance, Histogram Manhattan
Distance and Histogram Quadratic Distance (Singha and Hemachandran 2012). The results can be
evaluated in terms of precision and recall.
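Three of the four histogram distances named above are straightforward to sketch for normalized histograms (the quadratic distance is omitted here, since it additionally requires a cross-bin similarity matrix):

```python
import numpy as np

def euclidean(h1, h2):
    # Histogram Euclidean distance: L2 norm of bin-wise differences.
    return float(np.sqrt(np.sum((h1 - h2) ** 2)))

def manhattan(h1, h2):
    # Histogram Manhattan distance: L1 norm of bin-wise differences.
    return float(np.sum(np.abs(h1 - h2)))

def intersection(h1, h2):
    # Histogram intersection similarity turned into a distance:
    # for normalized histograms, identical inputs give 0.
    return 1.0 - float(np.sum(np.minimum(h1, h2)))

h1 = np.array([0.2, 0.3, 0.5])
h2 = np.array([0.2, 0.3, 0.5])
h3 = np.array([0.5, 0.3, 0.2])
print(euclidean(h1, h2))     # 0.0, an exact match
print(manhattan(h1, h3))     # ~0.6
print(intersection(h1, h3))  # ~0.3
```

As the text notes, a distance of 0 signifies an exact match, and results would be ranked by increasing distance.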
THE “CONTENT” - VISUAL FEATURES
Comparing two identical images may be an easy task. But if the two images differ in scale,
rotation or other transformations, the comparison becomes challenging, and more so if the object in the
image is itself transformed relative to the other. To solve this problem, CBIR uses a description of the
required content in terms of visual features of an image. The features described can be either general
purpose or domain specific. General features include low level features such as color and texture, and
middle level features such as shape, whereas domain specific features are those used in special
applications such as biometrics.
Color
This method has the advantages of fast retrieval, low memory demand, and insensitivity to
changes in image size and rotation. Therefore this type of CBIR is widely used. Color based general
purpose image retrieval systems roughly fall into three categories depending on the feature extraction
approach used.
1. Histogram Based
2. Color Layout Based
3. Region Based
Histogram-based search methods are investigated in two different color spaces, RGB and HSV.
The first order (mean), second order (variance) and third order (skewness) color moments have been
proved to be efficient and effective in representing the color distributions of images (Khutwad and
Vaidya 2013). Computing distance measures based on color similarity is achieved by computing a color
histogram for each image that identifies the proportion of pixels within an image holding specific values.
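The three color moments named above can be sketched per channel as follows; storing the second moment as a standard deviation and the skewness as a signed cube root are common conventions, assumed here rather than taken from the cited papers.

```python
import numpy as np

def color_moments(channel):
    # First three color moments of one channel: mean (first order),
    # standard deviation (from the variance, second order), and
    # skewness (signed cube root of the third central moment).
    mean = channel.mean()
    var = ((channel - mean) ** 2).mean()
    third = ((channel - mean) ** 3).mean()
    skew = np.sign(third) * np.abs(third) ** (1 / 3)
    return mean, np.sqrt(var), skew

# Toy RGB image: 3 moments per channel give a compact 9-value color feature.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, (16, 16, 3)).astype(float)
feature = [m for c in range(3) for m in color_moments(image[:, :, c])]
print(len(feature))  # 9
```

Such a 9-value vector is far smaller than a full histogram, which is why the moments are considered efficient for representing color distributions.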
Many research results suggest that extending the global color feature to a local one yields
better image retrieval results. A good approach is therefore to divide the whole image into sub-blocks
and extract color features from each of them. A variation of this approach is the quad tree-based color
layout approach, where the entire image is split into a quad tree structure and each tree branch has its
own histogram describing its color content.
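The sub-block idea can be sketched as below: each block gets its own histogram, so the concatenated feature preserves rough spatial layout. The block count, bin count, and grayscale simplification are all assumptions for illustration.

```python
import numpy as np

def block_histograms(image, blocks=2, bins=8):
    # Split the image into blocks x blocks sub-blocks and compute one
    # normalized intensity histogram per block. Concatenating them keeps
    # local color information that a single global histogram would lose.
    h, w = image.shape
    feats = []
    for i in range(blocks):
        for j in range(blocks):
            sub = image[i * h // blocks:(i + 1) * h // blocks,
                        j * w // blocks:(j + 1) * w // blocks]
            hist, _ = np.histogram(sub, bins=bins, range=(0, 256))
            feats.append(hist / hist.sum())
    return np.concatenate(feats)

rng = np.random.default_rng(2)
image = rng.integers(0, 256, (32, 32))
print(block_histograms(image).shape)  # (32,): 4 blocks x 8 bins
```

The quad tree variant would apply the same per-region histogram idea recursively, splitting each block into four children.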
Even though the color layout based approach is conceptually simple, its computation and storage
costs are high. A more sophisticated approach is to segment the image into regions with salient color
features by color-set back projection, and then to store the position and color-set feature of each region
(Kaur and Banga 2011).
Texture
This method looks for visual patterns in images and how they are spatially defined. Textures are
represented by textons, which are placed into a number of sets depending on how many textures are
detected in the image. These sets define not only the texture but also where in the image it is located.
Texture based general purpose image retrieval systems usually adopt texture statistic features and
structure features by transforming the spatial domain into the frequency domain (Kodituwakku and
Selvarajah 2011). The following methods are used to classify textures.
1. Co-occurrence matrix
2. Laws' texture energy measures
3. Wavelet Transform and Gabor Transform (Singh and Minu 2013)
4. Orthogonal Transforms
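The first method above, the co-occurrence matrix, can be sketched as follows; the offset, gray-level count, and the two Haralick-style statistics chosen are illustrative, and real systems typically use several offsets and more statistics.

```python
import numpy as np

def cooccurrence(image, dx=1, dy=0, levels=4):
    # Gray-level co-occurrence matrix (GLCM): counts how often gray level j
    # appears at offset (dx, dy) from gray level i, then normalizes to
    # joint probabilities.
    glcm = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[image[y, x], image[y + dy, x + dx]] += 1
    return glcm / glcm.sum()

def glcm_features(p):
    # Two classic texture statistics computed from the GLCM.
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)  # high for abrupt gray-level changes
    energy = np.sum(p ** 2)              # high for uniform, orderly textures
    return contrast, energy

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
p = cooccurrence(image)
print(glcm_features(p))
```

Vectors of such statistics, computed over several offsets and directions, form the "texture statistic features" the text refers to.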
Shape
Shapes are often determined by first applying segmentation or edge detection to an image.
Edges convey essential visual information about images. The edge descriptor captures five categories
of spatial distribution of edges: vertical, horizontal, 45 degree diagonal, 135 degree diagonal, and
isotropic. This model expects the input as query by example, and any combination of features can be
selected for retrieval.
Most shape descriptors are unable to address the variety of shape variations found in nature.
Shapes of natural objects can be viewed from different angles and can be rotated, scaled, skewed,
stretched, deformed, affected by noise, and so on. It is generally recognized that an effective shape
representation should be invariant to rotation, translation and scaling. A shape representation should
also be invariant or robust to affine and perspective transforms, to address the skew, stretching, and
different views of objects. Generally, there are two groups of shape descriptors:
1. Contour-based shape descriptors
2. Region-based shape descriptors
Contour-based shape descriptors employ only shape boundary information and capture shape
boundary features. Region-based shape descriptors make use of all the pixel information across the
shape region (Kaur and Banga 2011).
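The contrast between the two groups can be sketched on a binary shape mask. The particular descriptors chosen here (a pixel-count perimeter for the contour side, normalized second-order central moments for the region side) are simple illustrative choices, not the specific descriptors surveyed in the literature.

```python
import numpy as np

def region_descriptor(mask):
    # Region-based: uses ALL pixels inside the shape. Central moments are
    # translation invariant; dividing by area**2 makes the second-order
    # moments scale invariant as well.
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()
    eta20 = np.sum((xs - cx) ** 2) / area ** 2
    eta02 = np.sum((ys - cy) ** 2) / area ** 2
    return eta20, eta02

def contour_descriptor(mask):
    # Contour-based: uses ONLY boundary pixels, found with a 4-neighbour
    # test (a pixel is interior if all four neighbours are inside the shape).
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = mask & ~interior
    return int(boundary.sum())  # perimeter, counted in pixels

square = np.zeros((10, 10), dtype=bool)
square[2:8, 2:8] = True  # a 6x6 filled square
print(region_descriptor(square), contour_descriptor(square))
```

A square's symmetry makes its two normalized moments equal, while the perimeter count depends only on the boundary, which is the distinction the text draws between the two descriptor families.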
SUMMARY AND DISCUSSION
What features and representations should be used in image retrieval depends on the
application domain. By combining content based image retrieval techniques with concept based
ones, the overall retrieval performance can be increased.
REFERENCE
1. Kato, Toshikazu. "Database architecture for content-based image retrieval." Proceedings of SPIE Image
Storage and Retrieval Systems. 1992.
2. Kaur, Simardeep, and V K Banga. "Content Based Image Retrieval." International Conference on Advances
in Electrical and Electronics Engineering, 2011.
3. Khutwad, Harshada Anand, and Ravindra Jinadatta Vaidya. "Content Based Image Retrieval."
International Journal of Image Processing and Vision Sciences (ISSN Print: 2278 – 1110) Vol 2, no. 1
(2013).
4. Kodituwakku, Saluka Ranasinghe, and S Selvarajah. "Analysis and Comparison of Texture Features for
Content Based Image Retrieval." International Journal of Latest Trends in Computing (E-ISSN: 2045-5364)
Vol 2, no. 1 (March 2011).
5. Singh, Garima, and Priyanka Bansal Minu. "Content Based Image Retrieval." International Journal of
Innovative Research and Studies (ISSN: 2319-9725) Vol 2, no. 7 (July 2013).
6. Singha, Manimala, and K Hemachandran. "Content Based Image Retrieval using Color and Texture." Signal
& Image Processing : An International Journal (SIPIJ) Vol 3, no. 1 (2012).
COMPARISON OF RETRIEVAL APPROACHES

Concept Based Image Retrieval
• Method: uses metadata
• Advantage: speed
• Disadvantages: not every image has complete metadata; creating the feature
database requires time-consuming manual labor

Content Based Image Retrieval - Low Level Features

Color
• Approaches: Histogram (RGB, HSV), Color Layout (local color features),
Region (segmentation)
• Advantages: speed; low demand for memory space; not sensitive to image
transformations
• Disadvantages: histograms do not consider local color information; the
color layout approach is more expensive in computation and storage than
the region based method

Texture
• Approaches: texture statistic features and structure features; Wavelet
Transform and Gabor Transform
• Advantage: performance
• Disadvantage: variances in textons can lead to confusion in search

Shape
• Approaches: contour-based descriptors (shape boundary information),
region-based descriptors (regional pixel information)
• Advantage: performance
• Disadvantage: unable to accurately address the variety of shape variations