International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 10, Issue 6 (June 2014), PP.05-13
Feature extraction for content-based mammogram retrieval
Dr. K. Karteeka Pavan1, Sri M. Brahmaiah2, Ms. Sk. Habi Munnissa3
1,2,3 R.V.R & J.C. College of Engineering, ANU, Chowdavaram, Guntur-19.
Abstract:- Extracting image features is one way to classify images. Image texture is used in CBIR
(Content-Based Image Retrieval) to represent and index images. Many statistical matrix representations have
been proposed to distinguish textures by the statistical distribution of image intensities. This paper studies the
performance of various gray level statistical matrices, each with thirteen statistical texture features, for the
classification of mammograms. The relative performance of the various statistical matrices is evaluated using
classification accuracy and retrieval time in experiments on MIAS (Mammography Image Analysis
Society) images.
Keywords:- Content-based image retrieval, Mammogram, Texture, Gray level statistical matrix.
I. INTRODUCTION
Content-Based Image Retrieval (CBIR) has become more and more popular for various applications
[12]. Medical image diagnosis is one of the primary application domains for content-based access technologies
[23]. In the medical field, enormous numbers of digital images such as X-ray, MRI, CT and mammogram are
produced every day and used for diagnosis [17]. Finding anatomic structures and other regions of interest is
important in the clinical decision making process; hence, decision support systems in radiology create a
need for powerful data retrieval [8]. Breast cancer is one of the leading causes of cancer deaths in women.
Mammography is among the most reliable methods for the early detection of breast cancer, and it is one of the
most frequent application areas within the radiology department with respect to content-based search [11, 22].
Texture is one of the visual features used in CBIR to represent an image and to retrieve similar regions [27].
Mammograms possess discriminative textural information [16]; specific textural patterns can be revealed on
mammograms for the calcification, architectural distortion, asymmetry and mass categories [30]. In statistical texture analysis, the
texture information in an image is represented by a gray level statistical matrix from which the textural features
are estimated [9]. Second order and higher order gray level statistical matrices have been found to be
powerful statistical tools for the discrimination of textures [24]. In [29], the gray level co-occurrence matrices
(GLCMs) of pixel distance one, three and five are generated in order to estimate the Haralick’s texture features
for the retrieval of abnormal mammograms from the MIAS database. Mohamed Eisa et al. [5] investigated the
retrieval of mass and calcification mammograms from the MIAS database using texture and moment-based
features [1]. In [7], the textural features of a medical database consisting of brain, spine, heart, lung, breast,
adiposity, muscle, liver and bone images, 11 of each, are extracted from gray level co-occurrence matrices. The
descriptor combining gradient, entropy, and homogeneity performs better than the remaining features. In [3], for
the classification and retrieval of benign and malignant mammograms in the MIAS database, Gabor and
GLCM based texture features are used in addition to shape features. Sun et al. [26] proposed texture features
based on the combination of distortion constraint and weighted moments for the retrieval of abnormal
mammograms from the MIAS database, and the results show that their performance is better than region [14] and
Gabor features. In [31], the gray level aura matrix (GLAM) is used to extract texture information for the
retrieval of four categories of mammograms from the DDSM database [32].
The objectives of this work are i) to extract texture features from various types of mammograms; ii) to
investigate the effectiveness of the texture features for the retrieval of mammograms; iii) to compare the
retrieval performance of GLCM, GLAM, GLNM texture features extraction methods. Section 2 explains the
methodology for mammogram retrieval using the proposed gray level statistical matrix. Section 3 presents the
experimental results and discussions. Finally, Section 4 gives the conclusion.
II. METHODOLOGY
Content-based mammogram retrieval using the proposed gray level statistical matrix consists of feature
extraction and image retrieval. During the first stage, in the pre-processing step the regions of interest (ROIs) of
the database images are normalized to zero mean and unit variance [4]. From the pre-processed images, the gray
level statistical matrices are generated in order to estimate the texture features and to form the feature dataset
using GLCM, GLAM, and GLNM. During the on-line retrieval stage, texture features based
on the gray level statistical matrix are likewise estimated from the pre-processed ROI of the given query image.
Finally, the performance measures are calculated using SVM Classification in order to analyse the effectiveness
of the proposed method towards mammogram retrieval.
Fig. 2.1 Overview of mammogram retrieval using the proposed approach
2.1 GLCM
The texture filter functions provide a statistical view of texture based on the image histogram. These
functions can provide useful information about the texture of an image but cannot provide information about
shape, i.e., the spatial relationships of pixels in an image.
Another statistical method that considers the spatial relationship of pixels is the gray-level co-occurrence
matrix (GLCM), also known as the gray-level spatial dependence matrix. The MATLAB Image Processing
Toolbox provides functions to create a GLCM and derive statistical measurements from it.
2.1.1 Creating a Gray-Level Co-Occurrence Matrix
To create a GLCM, use the graycomatrix function. The graycomatrix function creates a gray-level co-occurrence
matrix (GLCM) by calculating how often a pixel with the intensity (gray-level) value i occurs in a
specific spatial relationship to a pixel with the value j. By default, the spatial relationship is defined as the pixel
of interest and the pixel to its immediate right (horizontally adjacent), but you can specify other spatial
relationships between the two pixels. Each element (i, j) in the resultant GLCM is simply the number of
times that a pixel with value i occurred in the specified spatial relationship to a pixel with value j in the input
image. Because the processing required to calculate a GLCM over the full dynamic range of an image is prohibitive,
graycomatrix scales the input image. By default, graycomatrix uses scaling to reduce the number of intensity
values in a grayscale image from 256 to eight. The number of gray levels determines the size of the GLCM. To
control the number of gray levels in the GLCM and the scaling of intensity values, use the NumLevels and the
GrayLimits parameters of the graycomatrix function. See the graycomatrix reference page for more information.
The gray-level co-occurrence matrix can reveal certain properties about the spatial distribution of the
gray levels in the texture image. For example, if most of the entries in the GLCM are concentrated along the
diagonal, the texture is coarse with respect to the specified offset. You can also derive several statistical
measures from the GLCM. See Deriving Statistics from a GLCM for more information. To illustrate, the
following figure shows how graycomatrix calculates the first three values in a GLCM. In the output GLCM,
element (1,1) contains the value 1 because there is only one instance in the input image where two horizontally
adjacent pixels have the values 1 and 1, respectively. Element (1,2) contains the value 2 because there are two
instances where two horizontally adjacent pixels have the values 1 and 2. Element (1,3) in the GLCM has the
value 0 because there are no instances of two horizontally adjacent pixels with the values 1 and 3. graycomatrix
continues processing the input image, scanning the image for other pixel pairs (i,j) and recording the sums in the
corresponding elements of the GLCM.
Figure 2.1.1: Process Used to Create the GLCM
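As a sketch of this procedure, the following pure-Python function mimics graycomatrix's default horizontally-adjacent relationship. The sample image below is a hypothetical one chosen so that its first three GLCM entries match the values described above; it is not necessarily the image in the figure.

```python
def glcm(image, levels, offset=(0, 1)):
    """Count how often gray level i co-occurs with gray level j at the
    given (row, col) offset; the default offset (0, 1) is the pixel to
    the immediate right, matching graycomatrix's default relationship."""
    rows, cols = len(image), len(image[0])
    dr, dc = offset
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

# Gray levels written 1..8 as in the text, shifted to 0-based for counting.
img = [[1, 1, 5, 6, 8],
       [2, 3, 5, 7, 1],
       [4, 5, 7, 1, 2],
       [8, 5, 1, 2, 5]]
img0 = [[v - 1 for v in row] for row in img]
M = glcm(img0, levels=8)
print(M[0][0], M[0][1], M[0][2])  # prints: 1 2 0, i.e. glcm(1,1), glcm(1,2), glcm(1,3)
```

Summing the matrix with its transpose would give the symmetric GLCM that some texture definitions assume.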
2.2 GLAM
An image can be modelled as a rectangular lattice S of m × n sites. Furthermore, a
neighbourhood system N = {Ns, s ∈ S} can be defined, where the neighbourhood Ns is built from the
basic neighbourhood E at site s; the basic neighbourhood is thereby a chosen structural element [9].
Aura measure [9]: Given two subsets A, B ⊆ S, where |A| is the total number of elements in A, the aura
measure of A with respect to B is given in (1):
m(A, B) = Σs∈A |Ns ∩ B| (1)
GLAM (Gray Level Aura Matrix) [9]: Let N be the neighbourhood system over S and {Si, 0 ≤ i ≤ G − 1}
be the gray level sets of an image over S, with G the number of different gray levels; then the GLAM of the image
is given in (2):
GLAM = [m(Si, Sj, N)], 0 ≤ i, j ≤ G − 1 (2)
where Si = {s ∈ S | xs = i} is the gray level set corresponding to the ith level, and m(Si, Sj, N) is the
aura measure of Si with respect to Sj under the neighbourhood system N.
Figure 2.2.1: Process Used to Create the GLAM
Fig 2.2.1(a) shows a sample binary lattice S, where the subset A is the set of all 1's and B the set of all 0's. Fig
2.2.1(b) shows the structural element of the neighbourhood system. Fig 2.2.1(c) shades the sites
involved in building m(S1, S0, N). Fig 2.2.1(d) shows the corresponding GLAM. The aura of A with respect to B
characterizes how the subset B is represented in the neighbourhood of A. The GLAM of an image measures the
amount of each gray level in the neighbourhood of each gray level. As an example, the GLAM for the image
shown in Figure 2.2.1(a) is shown in Figure 2.2.1(d), calculated using the structural element of the
four-nearest-neighbour neighbourhood system.
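A minimal pure-Python sketch of this computation, assuming the four-nearest-neighbour structural element just described and clipping neighbours at the image boundary:

```python
def glam(image, levels, structuring=((-1, 0), (1, 0), (0, -1), (0, 1))):
    """Gray level aura matrix: entry (i, j) is the aura measure
    m(Si, Sj, N), i.e. the total number of neighbours with gray level j
    over all sites with gray level i (four-nearest neighbours by default)."""
    rows, cols = len(image), len(image[0])
    A = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            i = image[r][c]
            for dr, dc in structuring:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    A[i][image[r2][c2]] += 1
    return A

# A small binary lattice: a cross of 1's on a background of 0's.
lattice = [[0, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]
A = glam(lattice, levels=2)
print(A)  # [[0, 8], [8, 8]]
```

Because the four-nearest neighbourhood is symmetric, m(S0, S1, N) = m(S1, S0, N), which the off-diagonal entries confirm.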
2.3 Gray level neighbours matrix (GLNM)
The proposed gray level statistical matrix, termed the gray level neighbours matrix (GLNM), extracts
textural information that contains the size information of texture elements and is based on the occurrence of
gray level neighbours within a specified neighbourhood.
The gray level neighbours are the pixels in the specified neighbourhood with similar gray level as the
centre pixel. In the case of a 3×3 neighbourhood, the maximum number of possible gray level neighbours is
eight. The number of rows and columns of the GLNM are equal to the number of gray levels and maximum gray
level neighbours, respectively. If the number of gray levels and the neighbourhood size are larger it may result
in an array of larger dimension, which can be controlled to a considerable extent by reducing the quantization
level of the image. The matrix element (i, j) of the GLNM counts the pixels of intensity 'i' that have exactly 'j'
gray level neighbours within the given neighbourhood, which can be written as
G(i, j) = #{(x, y) | f(x, y) = i and #{(p, q) ∈ Nxy : f(p, q) = i} = j}
where # denotes the number of elements in the set and Nxy is the defined neighbourhood around pixel (x, y) in the
image. The generation of the GLNM is simple: the number of operations required to
process an image to obtain the GLNM is directly proportional to the total number of pixels.
Consider Fig. 2.3.2(a), which shows a 6×6 image matrix with eight gray levels ranging from 0 to 7.
Figure 2.3.2(b) shows the corresponding GLNM generated. In this case, the row size of the GLNM equals the
number of gray levels in the image matrix, i.e., eight, and the column size of the GLNM equals the maximum
number of gray level neighbours for the specified neighbourhood, i.e., also eight for the 3×3 neighbourhood considered. For
example, the element in the (1, 2) position (medium shaded) of the GLNM, whose value is five, indicates that
two gray level neighbours occur five times for centre pixels with gray level zero. Likewise, the element
in the (5, 3) position (light shaded), whose value is four, indicates that three gray level neighbours occur four
times for centre pixels with gray level four. Also, the element in the (8, 2) position (dark shaded), whose
value is three, indicates that two gray level neighbours occur three times for centre pixels with gray level
seven.
Fig. 2.3.2 (a) Sample image matrix (b) Gray level neighbours matrix
(c) Illustration for generating the value of G(1, 2)
Zero values in the rightmost columns of the GLNM indicate the absence of higher numbers of
same-gray pixel neighbours for all the gray levels.
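The construction described above can be sketched in pure Python as follows. This is an illustrative implementation only: since the paper's defining equation was not reproduced, the handling of pixels with zero same-gray neighbours (skipped here, as the columns index counts 1 through 8) is an assumption.

```python
def glnm(image, levels, max_neighbours=8):
    """Gray level neighbours matrix: row i is a gray level, column j
    (1..max_neighbours, stored 0-based) is a count of same-gray
    neighbours in the 3x3 window; the entry records how many pixels of
    level i have exactly j such neighbours."""
    rows, cols = len(image), len(image[0])
    G = [[0] * max_neighbours for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            v = image[r][c]
            j = 0  # number of neighbours sharing the centre's gray level
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    r2, c2 = r + dr, c + dc
                    if 0 <= r2 < rows and 0 <= c2 < cols and image[r2][c2] == v:
                        j += 1
            if j > 0:
                G[v][j - 1] += 1  # column j holds "exactly j neighbours"
    return G

# Tiny example: three 0-pixels each see two other 0's; the lone 1 has none.
img = [[0, 0],
       [0, 1]]
G = glnm(img, levels=2)
print(G[0][1], sum(G[1]))  # prints: 3 0
```

One pass over the pixels with a constant-size window gives the linear-time behaviour claimed in the text.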
2.4 Texture features
The texture coarseness or fineness of the ROI can be interpreted from the distribution of the elements in
the GLNM. If a texture is smooth, then a pixel and its neighbours will probably have similar gray levels. This
means that the entries in the GLNM take larger values and concentrate in the rightmost columns. On the other
hand, if a texture has fine details, then the difference between a pixel and its neighbouring pixels will probably
be large. This means that the entries in the GLNM take smaller values and concentrate in the leftmost
columns. The 13 texture features [9] computed from the GLNM are as follows:
Contrast:
When i and j are equal, the diagonal elements are considered and (i − j) = 0. These values represent pixels
entirely similar to their neighbours, so they are given a weight of 0. If i and j differ by 1, there is a small contrast,
and the weight is 1. If i and j differ by 2, the contrast is increasing and the weight is 4. The weights continue to
increase quadratically as (i − j) increases.
Homogeneity:
This statistic is also called the Inverse Difference Moment. It measures image homogeneity, as it assumes
larger values for smaller gray tone differences in pair elements. It is more sensitive to the presence of
near-diagonal elements in the GLNM. Because its weights decrease away from the diagonal, the result is larger
for windows with little contrast. It attains its maximum value when all elements in the image are the same.
GLNM contrast and homogeneity are strongly, but inversely, correlated in terms of equivalent
distribution in the pixel pair population: homogeneity decreases as contrast increases while energy is
kept constant.
Dissimilarity:
In the Contrast measure, weights increase quadratically (0, 1, 4, 9, etc.) as one moves away from the
diagonal, whereas in the Dissimilarity measure weights increase linearly (0, 1, 2, 3, etc.). Dissimilarity and
Contrast both yield larger values for windows with more contrast. Homogeneity, by comparison, weights values
by the inverse of the Contrast weight, with weights decreasing away from the diagonal.
ASM:
ASM and Energy use each Pij as a weight for itself. High values of ASM or Energy occur when the
window is very orderly. The name ASM comes from physics and reflects the similar form of the physics
equation used to calculate the angular second moment, a measure of rotational acceleration. The square root of
the ASM is sometimes used as a texture measure and is called Energy. This statistic is also called Uniformity. It
measures textural uniformity, that is, pixel pair repetitions, and detects disorder in textures. Energy reaches a
maximum value equal to one. High energy values occur when the gray level distribution has a constant or
periodic form.
Entropy:
Entropy is a notoriously difficult term to understand; the concept comes from thermodynamics. It refers
to the quantity of energy that is permanently lost to heat ("chaos") every time a reaction or a physical
transformation occurs; this energy cannot be recovered to do useful work. Because of this, the term is used in
nontechnical speech to mean irremediable chaos or disorder. As with ASM, the equation used to calculate
physical entropy is very similar to the one used for the texture measure. This statistic measures the disorder or
complexity of an image.
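The features described so far can be sketched from a normalized gray level statistical matrix. The paper's exact equations were not reproduced (they appeared as images), so the formulas below are the standard Haralick-style definitions and should be read as a stand-in:

```python
import math

def texture_features(M):
    """Contrast, homogeneity, dissimilarity, ASM, energy and entropy of a
    gray level statistical matrix M, normalized so its entries sum to 1."""
    total = sum(sum(row) for row in M) or 1
    P = [[v / total for v in row] for row in M]
    cells = [(i, j) for i in range(len(P)) for j in range(len(P[i]))]
    contrast = sum((i - j) ** 2 * P[i][j] for i, j in cells)          # weights 0,1,4,9,...
    homogeneity = sum(P[i][j] / (1 + (i - j) ** 2) for i, j in cells)  # inverse difference moment
    dissimilarity = sum(abs(i - j) * P[i][j] for i, j in cells)        # linear weights 0,1,2,...
    asm = sum(P[i][j] ** 2 for i, j in cells)                          # angular second moment
    energy = math.sqrt(asm)                                            # sqrt of ASM
    entropy = -sum(P[i][j] * math.log(P[i][j]) for i, j in cells if P[i][j] > 0)
    return {"contrast": contrast, "homogeneity": homogeneity,
            "dissimilarity": dissimilarity, "ASM": asm,
            "energy": energy, "entropy": entropy}

# A perfectly diagonal matrix: zero contrast, maximal homogeneity.
f = texture_features([[2, 0], [0, 2]])
```

For this diagonal example, contrast and dissimilarity are 0, homogeneity is 1, ASM is 0.5 and entropy is ln 2, which matches the intuitions stated above.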
Difference Entropy:
Sum Entropy:
Sum Average:
Variance:
The means here are taken relative to the ith and jth pixel values, respectively. This statistic is a measure of
heterogeneity and is strongly correlated with first order statistical variables such as the standard deviation.
Variance increases when the gray level values differ from their mean.
The standard deviation is given by the square root of the variance.
Variance as a texture measure performs the same task as the common descriptive statistic of the same
name.
Difference Variance:
Correlation:
The Correlation texture measure describes the linear dependency of gray levels on those of neighbouring pixels.
Correlation can be calculated for successively larger window sizes; the window size at which the Correlation
value declines suddenly may be taken as one definition of the size of definable objects within an image.
G is the number of gray levels used; μx, μy and σx, σy are the means and standard deviations of Px and Py.
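The marginal quantities just named can be made concrete with a short sketch. This uses the standard Haralick correlation formula, since the paper's own equation was not reproduced:

```python
import math

def correlation(M):
    """Correlation feature of a square gray level statistical matrix:
    linear dependence between the gray level of a pixel (marginal Px)
    and that of its neighbour (marginal Py)."""
    total = sum(sum(row) for row in M) or 1
    P = [[v / total for v in row] for row in M]
    G = len(P)
    Px = [sum(P[i]) for i in range(G)]                 # row marginal
    Py = [sum(P[i][j] for i in range(G)) for j in range(G)]  # column marginal
    mux = sum(i * Px[i] for i in range(G))
    muy = sum(j * Py[j] for j in range(G))
    sx = math.sqrt(sum((i - mux) ** 2 * Px[i] for i in range(G)))
    sy = math.sqrt(sum((j - muy) ** 2 * Py[j] for j in range(G)))
    if sx == 0 or sy == 0:
        return 0.0  # degenerate: a constant marginal has no linear trend
    cov = sum(i * j * P[i][j] for i in range(G) for j in range(G)) - mux * muy
    return cov / (sx * sy)
```

A matrix concentrated on the diagonal gives correlation +1 (neighbours track the centre pixel exactly); one concentrated on the anti-diagonal gives −1.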
Cluster Shade:
Cluster Prominence:
III. RESULTS AND DISCUSSIONS
The digital mammograms available in the MIAS database [25] were used for the experiments. The
database includes 322 mammograms belonging to normal (Norm) and six abnormal classes—architectural
distortion (Arch), asymmetry (Asym), calcification (Calci), circumscribed (Circ) masses, spiculated (Spic)
masses and ill-defined (Ill-def) masses. Each mammogram is of size 1,024×1,024 pixels, and annotated for the
class, severity, centre of abnormality, background tissue character and radius of a circle enclosing the
abnormality. As shown in Fig. 3, the ROIs from abnormal mammograms were extracted. Hence, the abnormal
ROIs are of different sizes. But, in the case of normal mammograms, the ROIs of uniform size 200×200 pixels
were cropped about the centre, which is a new approach that avoids bias in the case of normal mammograms.
Of the extracted ROIs, there are 209 normal, 19 architectural distortions, 15 asymmetry cases, 26 calcification regions,
24 circumscribed masses, 19 spiculated masses and 15 ill-defined masses. In this work, all 327 ROIs were
used to create the feature dataset, and 110 ROIs comprising one-third from each mammogram class were
selected as queries. The performance analysis of the proposed gray level statistical matrix for texture feature
extraction on the mammogram retrieval problem is presented in this section. The overall time and performance
offered by the various methods are reported.
Method    Time (seconds)    Performance (error rate)
GLNM      76.22             53%
GLAM      1072.28           63%
GLCM      1200.08           60%
Table 1: Time and performance rates of the proposed method and competing methods
Fig. 3 Graph for time and performance analysis
IV. CONCLUSION
This paper reports the retrieval performance of the GLCM, GLAM and GLNM when applied to the MIAS
database, and demonstrates the capability of these methods in extracting texture features. Such retrieval
approaches may help physicians to search effectively for relevant mammograms during diagnosis.
From the results, the least computational time is observed for GLNM, while a comparatively better classification
rate is observed for GLAM. Developing a more efficient feature estimation method is our future endeavour.
REFERENCES
[1]. Chang HD, Shi XJ, Min R, Hu LM, Cai XP, Du HN (2006) Approaches for automated detection and
classification of masses in mammograms. Pattern Recognit 39:646–668
[2]. Chen CH, Pau LF, Wang PSP (eds) (1998) The handbook of pattern recognition and computer vision,
(2nd edn). World Scientific Publishing pp 207–248
[3]. Choraś RS (2008) Feature extraction for classification and retrieval mammogram in databases. Int J
Med Eng Inf 1(1):50–61
[4]. Do MN, Vetterli M (2002) Wavelet-based texture retrieval using generalized gaussian density and
Kullback–Leibler distance. IEEE Tans Image Proc 11(2):146–158
[5]. Eisa M, Refaat M, El-Gamal AF (2009) Preliminary diagnostics of mammograms using moments and
texture features. ICGST-GVIP J 9(5):21–27
[6]. El-Naqa I, Yang Y, Galatsanos NP, Nishikawa RM,Wernick MN (2004) A similarity learning approach
to content-based image retrieval: application to digital mammography. IEEE Trans Med Imaging
23(10):1233–1244
[7]. Felipe JC, Traina AJM, Ribeiro MX, Souza EPM, Junior CT (2006) Effective shape-based retrieval
and classification of mammograms. In: Proceedings of the Twenty First Annual ACM symposium on
Applied Computing. pp 250–255
[8]. Greenspan H, Pinhas AT (2007) Medical image categorization and retrieval for PACS using the GMM-KL
framework. IEEE Trans Inf Technol Biomed 11:190–202
[9]. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans
Syst Man Cybern 3(6):610–621
[10]. Khotanzad A, Hong YH (1990) Invariant image recognition by zernike moments. IEEE Trans Pattern
Anal Mach Intell 12(5):489–497
[11]. Korn P, Sidiropoulos N, Faloutsos C, Siegel E, Protopapas Z (1998) Fast and effective retrieval of
medical tumor shapes. IEEE Trans Knowl Data Eng 10(6):889–904
[12]. Kwitt R, Meerwald P, Uhl A (2011) Efficient texture image retrieval using copulas in a Bayesian
framework. IEEE Trans Image Process 20(7):2063–2077
[13]. Lamard M, Cazuguel G, Quellec G, Bekri L, Roux C, Cochener B (2007) Content-based image
retrieval based on wavelet transform coefficients distribution. In: Proceedings of the Twenty Ninth
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE
Press, Lyon, France, pp 4532–4535
[14]. Lu S, Bottema MJ (2003). Structural image texture and early detection of breast cancer. In:
Proceedings of the 2003 APRS Workshop on Digital Image Computing. pp 15–20
[15]. Manjunath BS, Ma WY (1996) Texture features for browsing and retrieval of image data. IEEE Trans
Pattern Anal Mach Intell 18(8):837–842
[16]. Mudigonda NR, Rangayyan RM, Leo Desautels JE (2000) Gradient and texture analysis for the
classification of mammographic masses. IEEE Trans Med Imaging 19(10):1032–1043
[17]. Muller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval
systems in medical applications—clinical benefits and future directions. Int J Med Inform 73:1–23
[18]. Muller H, Muller W, Squire DM, Marchand-Maillet S, Pun T (2005) Performance evaluation in
content-based image retrieval: overview and proposals. Pattern Recognit Lett 22(5):593–601
[19]. Pandey D, Kumar R (2011) Inter space local binary patterns for image indexing and retrieval. J Theor
Appl Inf Technol 32(2):160–168
[20]. Qin X, Yang Y (2004) Similarity measure and learning with Gray Level Aura Matrices (GLAM) for
texture image retrieval. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit Washington DC
USA 1:326–333
[21]. Quellec G, Lamard M, Cazuguel G, Cochener B, Roux C (2010) Wavelet optimization for content-based
image retrieval in medical databases. Med Image Anal 14:227–241
[22]. Schnorrenberg F, Pattichis CS, Schizas CN, Kyriacou K (2000) Content-based retrieval of breast
cancer biopsy slides. Technol Health Care 8:291–297
[23]. Smeulders AVM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the
end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
[24]. Srinivasan GN, Shobha G (2008) Statistical texture analysis. Proc World Acad Sci Eng Technol
36:1264–1269
[25]. Suckling J, Parker J, Dance DR, Astley SM, Hutt I, Boggis CRM, Ricketts I, Stamatakis E, Cerneaz N,
Kok SL, Taylor P, Betal D, Savage J (1994) Mammographic image analysis society digital
mammogram database. Proceedings of International Workshop on Digital Mammography pp 211–221
[26]. Sun J, Zhang Z (2008) An effective method for mammograph image retrieval. In: Proceedings of
International Conference on Computational Intelligence and Security. pp 190–193
[27]. Tourassi GD (1999) Journey toward computer-aided diagnosis: role of image texture analysis.
Radiology 213:317–320
[28]. Tourassi G, Harrawood B, Singh S, Lo J, Floyd C (2007) Evaluation of information theoretic similarity
measure for content-based retrieval and detection of masses in mammograms. Med Phys 34:140–150
[29]. Wei CH, Li CT, Wilson R (2005) A general framework for content-based medical image retrieval with
its application to mammogram retrieval. Proc SPIE Int Symp Med Imaging 5748:134–143
[30]. Wei CH, Li CT, Wilson R (2006) A content-based approach to medical image database retrieval. In:
Ma ZM (ed) Database modeling for industrial data management: emerging technologies and
applications. Idea Group Publishing, Hershey, pp 258–291
[31]. Wiesmuller S, Chandy DA (2010) Content-based mammogram retrieval using gray level aura matrix.
Int J Comput Commun Inf Syst (IJCCIS) 2(1):217–222
[32]. Chandy DA, Johnson JS, Selvan SE (2013) Texture feature extraction using gray level statistical
matrix for content-based mammogram retrieval. Springer Science+Business Media, New York
Dr. K. Karteeka Pavan completed her Ph.D. (CSE) from ANU. She is presently working as
a Professor at R.V.R & J.C. College of Engineering, Chowdavaram, Guntur-19, India.
She has about 15 years of teaching experience, has published papers in bioinformatics,
and is an associate member of CSI and a life member of ISTE. E-mail:
kkp@rvrjcce.ac.in
Sri Madamanchi Brahmaiah completed his M.Tech from ANU. He is presently working
as an Asst. Prof. at R.V.R & J.C. College of Engineering, Chowdavaram, Guntur-19, India.
He has about 4 years of teaching experience, worked almost 14 years as a
programmer, and is an associate member of CSI, a member of ISTE and a member of
IAENG. E-mail: brahmaiah_m@yahoo.com
Ms. Sk. Habi Munnissa is studying final year M.C.A. at R.V.R & J.C. College of
Engineering, Chowdavaram, Guntur-19, affiliated to ANU. She is an associate member of
CSI. E-mail: habi.hr43@gmail.com