Automation of Preprocessing and Recognition of
Historical Document Images
A Thesis submitted to
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Belgaum
for the award of degree of
Doctor of Philosophy in
Computer Science & Engineering
by
B Gangamma
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering), Bangalore, Karnataka, India.
2013
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India.
CERTIFICATE
This is to certify that B Gangamma has worked under my supervision
for her doctoral thesis titled “Automation of Preprocessing and
Recognition of Historical Document Images”. I also certify that
the work is original and has not been submitted to any other University
wholly or in part for any other degree.
Dr. Srikanta Murthy K
Professor & Head,
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India.
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India
DECLARATION
I hereby declare that the entire work embodied in this doctoral thesis
has been carried out by me at Research Centre, Department of Com-
puter Science & Engineering, P E S Institute of Technology - Bangalore
South Campus(formerly P E S School of Engineering) under the super-
vision of Dr. Srikanta Murthy K. This thesis has not been submitted
in part or full for the award of any diploma or degree of this or any
other University.
B Gangamma
Research scholar
Department of Computer Science & Engineering
P E S Institute of Technology - Bangalore South Campus,
(formerly P E S School of Engineering), Bangalore.
Acknowledgements
Any accomplishment requires the efforts of many people and
this work is not an exception. I will be failing in my duty if I
do not express my gratitude to those who have helped in my
endeavor.
With deep gratitude and reverence, I would like to express
my sincere thanks to my research supervisor Dr. Srikanta
Murthy K, Professor & Head, Department of Computer Sci-
ence & Engineering, P E S Institute of Technology - Banga-
lore South Campus, Bangalore, for his constant and untiring
efforts to guide me right through the research work. His tremen-
dous enthusiasm, inspiration, and constant support through-
out my research work have encouraged me to complete this
dissertation work. His wide knowledge and logical way of
thinking, detailed and constructive comments have provided
a good basis for the research work and thesis. I would like
to thank Dr. J Suryaprasad, Principal & Director, P E S In-
stitute of Technology - Bangalore South Campus, Bangalore,
for his constant support.
I owe special thanks and sincere gratitude to Mrs. Shylaja
S S, Professor & Head, Department of Information Science &
Engineering, P E S Institute of Technology, Bangalore for mo-
tivating, encouraging and providing the necessary support to
complete the research and thesis work. I am also thankful to Dr.
S Natarajan, Professor, Department of Information Science
and Engineering, P E S Institute of Technology, Bangalore,
for providing proper directions to my research work. I wish
to express my warm and sincere thanks to Dr. K. N. Bala-
subramanya Murthy, Principal & Director, P E S Institute of
Technology, Bangalore for inspiring me to take up research
and work towards a doctoral degree. I would like to express
sincere thanks to P E S management for providing motivation
and a platform to carry out the research.
I thank whole heartedly Mr. Jayasimha, Mythic Society of
India, Bangalore, for providing me the scanned copies of the
palm leaf manuscripts. My warm thanks are due to Mr. M
P Shelva Thirunarayana, R Narayana Iyangar, Academy of
Sanskrit Research Center, Melukote and Sri. S N Cheluva-
narayana, Principal, Sanskrit College, Melukote, Karnataka,
for providing knowledge about the historical documents along
with sample manuscripts of paper and palm leaf.
I extend my sincere thanks to Dr. Veeresh
Badiger, Professor, Kannada University Hampi, Karnataka,
for providing information about resources and guiding my re-
search work. Further I would like to extend special thanks to
him for providing digitized samples of palm leaf manuscripts.
I would like to thank Dr. G. Hemantha Kumar, Professor
& Chairman, Department of Studies in Computer Science,
University of Mysore, for his valuable suggestions and direc-
tions during pre Ph.D. viva voce. I whole heartedly thank
Dr. M Ashwath Kumar, Professor, Department of Infor-
mation Science & Engineering, M S R Institute of Technol-
ogy, Bangalore, for his valuable directions given during pre
Ph.D. viva-voce. I warmly thank Dr. Bhanumathi, Reader
at Manasa Gangothri, Mysore, for providing useful informa-
tion about palm leaf manuscripts. Detailed discussion about
manuscripts and interesting explorations with her have been
very helpful for my work.
I wish to thank Dr. Suryakantha Gangashetty, Assistant
Professor, IIIT Hyderabad, for his suggestions, and Dhanan-
jaya, Archana Ramesh, Dilip, research scholars at IIIT Hy-
derabad, for their valuable discussions. I am grateful to Dr.
Basavaraj Anami, Principal, K L E Institute of Technology,
Hubli, for his guidance and wonderful interactions, which
helped me in shaping my research work properly.
I would like to express my heartfelt thanks to Dr. Punitha
P Swamy, Professor & Head, Department of Master of Com-
puter Application, P E S Institute of Technology, Bangalore,
for her detailed review, constructive criticism and excellent
advice throughout my research work and also during prepa-
ration of the thesis. My sincere thanks to Dr. Avinash N.
Professor, Department of Information Science & Engineering,
P E S Institute of Technology, Bangalore, for his valuable dis-
cussions during thesis write up.
I owe my most sincere thanks to my brother-in-law Dr.
Mallikarjun Holi, Professor & Head, Department of Bio-medical
Engineering, Bapuji Institute of Engineering & Technology,
Davanagere, for reviewing my thesis and giving valuable sug-
gestions.
I owe my loving thanks to my husband Suresh Holi and
my children Anish and Trisha, who have extended constant
support in completing my work. Without their encourage-
ment and understanding it would have been impossible for
me to finish this work. I express deepest sense of gratitude
to my father-in-law Prof. S. M. Holi, who has motivated me
towards research. His inspiring and encouraging nature has
stimulated me to take up research. I would like to express
my heartfelt thanks to my mother-in-law, Mrs. Rudramma
Holi for her loving support. I also extend my sincere thanks
to my sisters-in-law Dr. Prema S Badami, Mrs. Shivaleela S
Patil, Sharanu Holi, brother-in-law Mr. Sanganna Holi and
their families for giving me moral support.
I express my heartfelt thanks to my parents Mr. Somaraya
Biradar and Mrs. Shivalingamma Biradar for encouraging
and helping me in my activities. I would like to place my grat-
itude to my sisters Mrs. Nirmala Marali, Suvarna Patil and
brothers Manjunath Biradar and Vishwanath Biradar along
with their families for providing moral support during my re-
search work.
During this work, I have collaborated with many colleagues
for whom I have great regard, and wish to extend my warmest
thanks to all faculty colleagues, Department of Information
Science and Engineering in P E S Institute of Technology,
Bangalore. I wish to thank my team mates Mr. Arun Vikas,
Jayashree, Mamatha H R, Karthik S and friends Sangeetha
J, Suvarna Nandyal, Srikanth H R for their support. Lastly,
and most importantly, I am indebted to my faculty colleagues
for providing a stimulating and healthy environment to learn
and grow. It is a pleasure to thank many people who have
helped me directly or indirectly and who made this thesis
possible. I also place my sincere gratitude to external review-
ers for providing critical comments which significantly helped
in improving the standard of the thesis. I take this oppor-
tunity to thank the VTU e-learning center for providing the
template used to prepare my doctoral thesis using LaTeX.
B Gangamma
DEDICATED TO MY FAMILY,
MENTORS AND WELL
WISHERS
Abstract
Historical documents are the priceless property of any country; they
provide insight and information about ancient culture and civilization.
These documents are found in the form of inscriptions on a variety of
hard and fragile materials like stones, pillars, rocks, metal plates, palm
leaves, birch leaves, cloth, and paper. Most of these documents are
nearing the end of their natural lifetime and face various
problems due to climatic conditions, methods of preservation, materials
used to inscribe, etc. Some of the problems are due to the worn out
condition of the material, such as brittleness, strains and stains,
sludge and smudges, fossil deposition, fungus attack, dust accumulation,
wear and tear, breakage and other damage. These damages
create problems in processing the historical documents, render the
inscriptions illegible and make the historical documents
indecipherable.
Although preservation through digitization is in progress by various
organizations, deciphering the documents is very difficult and demands
the expertise of Paleographers and Epigraphists. Since such experts are
few in number and may be unavailable in the near future, there is a
need to automate the process of deciphering these document images.
The problems and complexities posed by these documents call for
the design of a robust system which automates the processing and
deciphering of these document images, and such a system in turn
demands thorough preprocessing algorithms to enhance them.
The accuracy of the recognition system always depends on the seg-
mented characters and their extracted features. Historical document im-
ages usually pose uneven line spacing, inscriptions over curved lines, over-
lapping text lines etc., making segmentation of the document difficult.
In addition, the documents also pose challenges like low contrast, dark
and uneven backgrounds, blotched (stained) characters etc., usually re-
ferred to as noise. The presence of noise also leads to erroneous segmen-
tation of the document image. Therefore there is a need for thorough
preprocessing techniques to eliminate the noise and enhance the doc-
ument image. To decipher documents belonging to various eras,
we need a character set pertaining to each era. Hence this warrants a
recognition system to recognize the era of the character.
In this context, this research work focuses on developing algorithms:
to preprocess and enhance historical document images of Kannada -
a South Indian language, to eliminate noise, to segment the enhanced
document image into lines and characters and to predict the era of the
scripts.
To preprocess the noisy document images, three image enhancement
algorithms in spatial domain and two algorithms in frequency domain
are proposed. Among the spatial domain methods, the first
utilizes the morphological reconstruction technique to eliminate the
dark uneven noisy background. This algorithm is used as a background
elimination technique in the other four algorithms proposed for image
enhancement. Although the gray scale morphological operations elim-
inate the noisy dark background, this method fails to enhance severely
degraded document images and is unable to preserve sharp edges.
To enhance the image by eliminating the noise without smoothing the
edges, a second algorithm is developed using a bilateral filter, which com-
bines domain and range filtering. The third algorithm, a non local
means filter based on a similarity measure between non local
windows, is proposed to denoise the document images.
Frequency domain transforms and their varied versions are used
in image denoising, feature extraction, compression and reconstruction.
An algorithm based on the wavelet transform is developed to analyze and
restore the degraded document images. The wavelet transform
works well in handling point discontinuities, but fails to handle curve
discontinuities. To overcome this problem, a
curvelet transform based approach is proposed, which provides better
results in comparison with the wavelet transform approach. The
performances of all the image enhancement techniques are compared
using Peak Signal-to-Noise Ratio (PSNR), computational time and human
visual perception.
Two segmentation algorithms have been developed to address the
problem of segmenting historical document images: one is based on a
piecewise projection profile method and the other is based on mor-
phological closing and connected component analysis (CCA). The first
method addresses the uneven line spacing by dividing the image into
vertical pieces, extracting each line from each piece and combining lines
of all the vertical pieces. The second method addresses the problems of
both uneven spacing and the touching (overlapping) lines using closing
operation and CCA.
Document skew might be introduced during image capture, and the document needs
to be deskewed. Since historical documents usually contain uneven
spacing between lines, correcting document skew will not help in seg-
menting the handwritten document image correctly. Uneven line spac-
ing will usually cause multiple skews within the document. To correct
the skew within the document lines, an extended version of the second
segmentation algorithm is developed.
To predict the era of the script/character, a curvelet transform based
algorithm is designed to extract the characteristic features, and a mini-
mum distance classifier is employed to recognize the era of the charac-
ters. To sum up, in this research work three spatial domain techniques and
two frequency domain based approaches have been implemented for
denoising and enhancing the degraded historical document images and
two segmentation algorithms have been designed to segment the lines
and characters from the document images, one algorithm is designed to
detect and correct the multiple skews within the document and another
algorithm is presented to predict the era of the segmented character so
that the respective character set belonging to that particular era can
be referred to in order to decipher the documents.
Contents
1 Preface 1
1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Historical Documents . . . . . . . . . . . . . . . . . . . 3
1.2.1 Kannada Scripts/Character . . . . . . . . . . . 6
1.3 Motivation for the Research Work . . . . . . . . . . . . 7
1.3.1 Data Collection . . . . . . . . . . . . . . . . . . 7
1.3.2 Enhancement/Preprocessing . . . . . . . . . . . 10
1.3.3 Segmentation . . . . . . . . . . . . . . . . . . . 12
1.3.4 Feature Extraction and Recognition . . . . . . . 13
1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Organization of the Thesis . . . . . . . . . . . . . . . . 16
2 Literature Survey 17
2.1 Computer Vision . . . . . . . . . . . . . . . . . . . . . 17
2.2 Preprocessing and Segmentation . . . . . . . . . . . . . 18
2.2.1 Enhancement of Historical Document Image . . 24
2.2.2 Segmentation of Historical Documents . . . . . 26
2.3 Character Recognition . . . . . . . . . . . . . . . . . . 28
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 34
3 Enhancement of Degraded Historical Documents : Spa-
tial Domain Techniques 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Gray Scale Morphological Reconstruction (MR) Based
Approach . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 Overview of Mathematical Morphology . . . . . 38
3.2.2 Adaptive Histogram Equalization(AHE) . . . . 42
3.2.3 Gaussian Filter . . . . . . . . . . . . . . . . . . 42
3.2.4 Proposed Methodology . . . . . . . . . . . . . . 43
3.2.5 Results and Discussion . . . . . . . . . . . . . . 48
3.3 Bilateral Filter (BF) Based Approach . . . . . . . . . . 54
3.3.1 Overview of Bilateral Filter . . . . . . . . . . . 55
3.3.2 Proposed Methodology . . . . . . . . . . . . . . 56
3.3.3 Results and Discussion . . . . . . . . . . . . . . 59
3.4 Non Local Means Filter (NLMF) Based Approach . . . 66
3.4.1 Overview of Non Local Means Filter . . . . . . 67
3.4.2 Proposed Algorithm . . . . . . . . . . . . . . . 68
3.4.3 Results and Discussion . . . . . . . . . . . . . . 73
3.5 Discussion of Three Spatial Domain Techniques . . . . 77
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 82
4 Enhancement of Degraded Historical Documents : Fre-
quency Domain Techniques 84
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 Wavelet Transform (WT) Based Approach . . . . . . . 85
4.2.1 Overview of Wavelet Transform . . . . . . . . . 86
4.2.2 Denoising Method . . . . . . . . . . . . . . . . 88
4.2.2.1 Thresholding Algorithms . . . . . . . . 88
4.2.3 Proposed Methodology . . . . . . . . . . . . . . 90
4.2.3.1 Stage 1: Mathematical Reconstruction 92
4.2.3.2 Stage 2: Denoising by Wavelet Transform 93
4.2.3.3 Stage 3: Postprocessing . . . . . . . . 94
4.2.3.4 Algorithm . . . . . . . . . . . . . . . . 94
4.2.4 Results and Discussions . . . . . . . . . . . . . 94
4.3 Curvelet Transform (CT) Based Approach . . . . . . . 98
4.3.1 Overview of Curvelet Transform . . . . . . . . . 100
4.3.2 Proposed Method . . . . . . . . . . . . . . . . 104
4.3.2.1 Denoising Using Curvelet Transform . 104
4.3.2.2 Algorithm . . . . . . . . . . . . . . . . 104
4.3.3 Results and Discussions . . . . . . . . . . . . . 106
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5 Discussion on Enhancement Algorithms . . . . . . . . . 108
5 Segmentation of Document Images 116
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 Proposed Methodologies . . . . . . . . . . . . . . . . . 117
5.3 Method 1: Piece-wise Horizontal Projection Profile Based
Approach . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3.1 Division into Vertical Strips . . . . . . . . . . . 120
5.3.2 Horizontal Projection Profile of a Strip . . . . . 120
5.3.3 Reconstruction of the Line Using Vertical Strips 120
5.3.4 Character Extraction . . . . . . . . . . . . . . . 122
5.3.5 Algorithm for Document Image Segmentation. . 122
5.3.6 Results and Discussion . . . . . . . . . . . . . . 125
5.4 Method 2: Mathematical Morphology and Connected
Component Analysis(CCA) Based Approach . . . . . . 126
5.4.1 Morphological Closing Operation . . . . . . . . 128
5.4.2 Line Extraction Using Connected Components
Analysis . . . . . . . . . . . . . . . . . . . . . . 129
5.4.3 Finding the Height of Each Line and Checking
the Touching Lines. . . . . . . . . . . . . . . . . 130
5.4.4 Character Extraction . . . . . . . . . . . . . . . 130
5.4.5 Algorithm for Segmentation of the Document Im-
age into Lines. . . . . . . . . . . . . . . . . . . . 131
5.4.6 Results and Discussion . . . . . . . . . . . . . . 132
5.5 Discussion on Method 1 and Method 2 . . . . . . . . . 133
5.6 Skew Detection and Correction Algorithm . . . . . . . 135
5.6.1 Skew Angle Detection . . . . . . . . . . . . . . 137
5.6.2 Skew Correction . . . . . . . . . . . . . . . . . . 138
5.6.3 Algorithm for Deskewing . . . . . . . . . . . . . 140
5.6.4 Results and Discussion . . . . . . . . . . . . . . 140
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 144
6 Prediction of Era of Character Using Curvelet Trans-
form Based Approach 146
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 146
6.2 Related Literature . . . . . . . . . . . . . . . . . . . . 147
6.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . 151
6.3.1 Data Set Creation . . . . . . . . . . . . . . . . 152
6.3.2 Preprocessing . . . . . . . . . . . . . . . . . . . 152
6.3.3 Feature Extraction using FDCT . . . . . . . . . 153
6.3.4 Classification . . . . . . . . . . . . . . . . . . . 153
6.3.5 Algorithm for Era Prediction . . . . . . . . . . 153
6.4 Experimentation and Results . . . . . . . . . . . . . . 154
6.4.1 Experimentation 1 . . . . . . . . . . . . . . . . 154
6.4.2 Experimentation 2 . . . . . . . . . . . . . . . . 155
6.4.3 Experimentation 3 . . . . . . . . . . . . . . . . 156
6.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . 157
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 159
7 Conclusion and Future Work 160
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . 164
A Palm Leaf Images 167
B Paper Images 170
C Stone Inscription Images 174
D Author’s Publications 178
List of Figures
1.1 6th Century Ganga Dynasty Inscription. . . . . . . . . . 4
1.2 13th Century Hoysala Dynasty Inscription. . . . . . . . . 5
1.3 Inscriptions on palm leaf belonging to 16th − 18th century. 6
1.4 Stone inscription belonging to 3rd century BC. . . . . . . 7
3.1 (a) Input image. (b) Result of binary morphological
dilation operation. (c) Result of binary morphological
erosion operation. . . . . . . . . . . . . . . . . . . . . . 39
3.2 (a) Input image. (b) Result of binary morphological
opening operation. (c) Result of binary morphological
closing operation. . . . . . . . . . . . . . . . . . . . . . 40
3.3 (a) Original Gray scale image. (b) Result of gray scale
dilate operation on image. (c) Result of gray scale ero-
sion operation on image. . . . . . . . . . . . . . . . . . 41
3.4 (a) Original Gray scale image. (b) Result of gray scale
closing operation on image. (c) Result of gray scale
opening operation on image. . . . . . . . . . . . . . . . 41
3.5 Noisy palm leaf document image belonging to 16th century. 43
3.6 Binarized noisy images of Figure(3.5). . . . . . . . . . . 43
3.7 Original image of palm leaf script belonging to 16th century. 44
3.8 Binarized noisy image of Figure(3.7). . . . . . . . . . . 44
3.9 Flow chart for MR based method. . . . . . . . . . . . . 45
3.10 AHE result on images shown in Figure(3.5) and Figure(3.7) 46
3.11 Result of stage 2. (a), (b) are results of opening opera-
tion on images shown in Figure(3.10)(a), (b). and (c),
(d) are results of reconstruction technique. . . . . . . . 47
3.12 Result of stage 3. (a), (b) Results of closing operation
on stage 2 output images shown in Figure(3.11)(a), (b).
(c), (d) Subtraction of R1 from R4. (e), (f) Subtraction
of result of previous step from R2. . . . . . . . . . . . . 47
3.13 (a), (b) Results of Gaussian filter on images shown in
Figure(3.12)(e), (f). . . . . . . . . . . . . . . . . . . . . . 48
3.14 Morphological reconstruction technique on images shown
in Figure(3.13)(a), (b). . . . . . . . . . . . . . . . . . . 48
3.15 Binarized images of Figure(3.14)(a),(b). . . . . . . . . 49
3.16 (a), (b), (c), (d) Results of MR based method on paper im-
ages shown in Appendix 1 Figure(B.1), Figure(B.2), Fig-
ure(B.3) and Figure(B.4) belonging to nineteenth and
beginning of twentieth century. . . . . . . . . . . . . . 51
3.17 (a), (b) Results of MR based method on image of palm
leaf shown in Appendix 1 Figure(A.1) and (A.3) belong-
ing to 16th to 18th century. . . . . . . . . . . . . . . . . . 52
3.18 Result of MR based method on sample image taken from
Belur temple inscriptions Figure(C.2) belonging to 17th
century AD. . . . . . . . . . . . . . . . . . . . . . . . . 53
3.19 (a), (b) Result of MR based method on stone inscriptions
shown in Appendix 1 Figure(C.1), Figure(C.3) belong-
ing to 14 − 17th century. . . . . . . . . . . . . . . . . . . 53
3.20 Comparison of proposed method with Gaussian, Aver-
age and Median filter. Figures (a), (b), (c), (d) show the
result of respective methods and figures (e), (f), (g), (h)
show the binarized images of (a), (b), (c), (d). . . . . . 54
3.21 Flow chart for BF based method. . . . . . . . . . . . . 57
3.22 (a) Input image of the palm leaf manuscript belonging
to 18th century. (b) Its binarized version. . . . . . . . . . 58
3.23 (a) Filtered image using BF method. (b) Final result
of the BF method. (c) Binarized version of enhanced
image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.24 (a), (b),(c),(d) Results of BF based method on input
paper images in Figure(B.1), Figure(B.2), Figure(B.3)
and Figure(B.4) respectively. . . . . . . . . . . . . . . . 62
3.25 (a), (b) Results of BF based method on Figure(A.4) and
Figure(A.5). . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.26 (a) Input image of palm leaf manuscript. (b) Result of
MR based method. (c) Enhanced image using BF based
method. . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.27 (a) (b) are results of BF based method on input image
in Figure(A.2) and Figure(3.7). . . . . . . . . . . . . . 64
3.28 Result of BF based method on image Figure(A.6) . . . 64
3.29 (a), (b) Results of BF based method on image in Fig-
ure(C.1) and Figure(C.3). . . . . . . . . . . . . . . . . 65
3.30 Result of BF based method on Figure(C.2) Belur temple
inscriptions belonging to 17th century AD. . . . . . . . . 65
3.31 Non Local Mean Filter Approach. Small patch of size
2p + 1 by 2p + 1 centred at x is the candidate pixel, y
and y′ are the non local patches within search window size
2k + 1 by 2k + 1. . . . . . . . . . . . . . . . . . . . . . 66
3.32 Input palm script image with low contrast. . . . . . . . 68
3.33 Result of NLMF method with residual image on Fig-
ure(3.32). . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.34 (a) Result of NLMF based method on image shown in
Figure(3.32). (b) Binarized image. . . . . . . . . . . . . 70
3.35 Flow chart for NLMF based method. . . . . . . . . . . 71
3.36 (a) Original image. (b) Filtered image using NLMF. (c)
Binarized image of the proposed NLMF method. (d)
Binarized noisy image using Otsu method. . . . . . . . 72
3.37 Results of NLMF based method on input images in Ap-
pendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Fig-
ure(B.4) . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.38 (a) Result of MR based method, (b) enhanced image
using BF based method, and (c) result of NLMF based
method on input image shown in Figure(3.26). . . . . . 76
3.39 (a) and (b) Results of NLMF based method on input
images shown in Figure(A.2) and Figure(A.1). . . . . . 76
3.40 Result of NLMF based method on input image in Fig-
ure(A.6). . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.41 Results of NLMF based method on images Figure(C.1)
and Figure(C.3). . . . . . . . . . . . . . . . . . . . . . . . 77
3.42 (a), (b) Results of NLMF based method on images shown
in Figure(C.2) and Figure(C.4). . . . . . . . . . . . . 78
4.1 Comparison of all thresholding methods . . . . . . . . 92
4.2 (a) Paper manuscript image-3 of previous century. (b)
Enhanced image using WT based approach. . . . . . . 95
4.3 Enhanced images using WT based approach on paper
manuscript images shown in Appendix 1: (a) Figure(B.2)
and (b) Figure(B.3). . . . . . . . . . . . . . . . . . . . . 96
4.4 (a) Palm leaf manuscript image belonging to 16th - 18th
century. (b) Enhanced image using WT based approach. 96
4.5 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 97
4.6 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 98
4.7 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 99
4.8 (a) Stone inscription image belonging to seventeenth cen-
tury. (b) Result of WT based approach. . . . . . . . . 100
4.9 (a) and (c) Stone inscription images belonging to 14th -
17th century. (b) and (d) Results of WT based approach. 101
4.10 Result of WT based approach on stone inscription be-
longing to seventeenth century shown in Appendix 1 Fig-
ure (C.2). . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.11 (a) Wrapping data, initially inside a parallelogram, into
a rectangle by periodicity (figures reproduced from pa-
per [172]). The shaded region represents the trapezoidal
wedge. (b) Discrete curvelet frequency tiling. . . . . . . . 102
4.12 (a), (c) and (e) Input images paper, palm leaf and stone.
(b), (d) and (f) Result of CT based approach. . . . . . 103
4.13 (a)-(b) Input images. (c)-(d) Results of first and second
stage of curvelet based approach. (e)-(f) Result of last
stage (image 15-49). . . . . . . . . . . . . . . . . . . . . . 105
4.14 (a) Palm leaf manuscript image belonging to the 16th to
18th century. (b) Enhanced image using WT based ap-
proach. (c) Result of CT based approach. . . . . . . . . . 106
4.15 (a) Input image of palm script. (b) Result of WT based
method. (c) Result of CT method. . . . . . . . . . . . 107
4.16 (a) Input image of palm script. (b) Result of WT based
method. (c) Result of CT method. . . . . . . . . . . . 107
4.17 (a) Result of WT based approach, (b) result of CT based
approach on image shown in Figure(4.8)(a). . . . . . . 108
4.18 Results of WT based method shown in (a), (c) and result
of CT based method shown in (b)-(d) for stone inscrip-
tion images shown in Figure(4.9)(a) and (c). . . . . . . 109
5.1 (a) Handwritten Kannada document image. (b) Hori-
zontal projection profile of handwritten document image. 118
5.2 Handwritten Kannada document image. . . . . . . . . 119
5.3 Horizontal projection profile of the input image Fig-
ure(5.2). . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.4 Non-Zero Rows (NZRs) and rows labelled NZR1 and
NZR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Horizontal projection profile of a strip. . . . . . . . . . 121
5.6 Extracted text lines. . . . . . . . . . . . . . . . . . . . 122
5.7 Character extraction from line. . . . . . . . . . . . . . 123
5.8 (a), (c), (e) are the extracted lines and (b), (d), (f) show
the extracted characters from lines (a), (c), (e). . . . . . . 123
5.9 Input handwritten image and extracted Lines. . . . . . 124
5.10 Extracted characters. . . . . . . . . . . . . . . . . . . . 124
5.11 Input image with uneven spacing between lines . . . . 126
5.12 Result of method 1 on the image shown in Figure(5.11). 126
5.13 Result of closing operation. . . . . . . . . . . . . . . . 127
5.14 Extracted text lines. . . . . . . . . . . . . . . . . . . . 127
5.15 (a) Line and extracted characters from line (a). . . . . 128
5.16 Input image. . . . . . . . . . . . . . . . . . . . . . . . . 128
5.17 Result of closing operation. . . . . . . . . . . . . . . . 130
5.18 Result of extraction of connected components(lines). . . 131
5.19 Result of binarization operation. . . . . . . . . . . . . . 133
5.20 Result of closing operation. . . . . . . . . . . . . . . . 133
5.21 Result of extraction of connected components and cor-
responding lines. . . . . . . . . . . . . . . . . . . . . . 134
5.22 (a) Touching line portion. (b) Result of closing and
opening operation. . . . . . . . . . . . . . . . . . . . . 135
5.23 Extraction of lines. . . . . . . . . . . . . . . . . . . . . 135
5.24 Input skewed image. . . . . . . . . . . . . . . . . . . . 137
5.25 Horizontal projection profile of the input image(5.24). . 138
5.26 Result of closing operation. . . . . . . . . . . . . . . . 139
5.27 Skew angle calculation from single connected component. 139
5.28 Result of deskewing. . . . . . . . . . . . . . . . . . . . 141
5.29 Reconstructed image of Figure(5.24). . . . . . . . . . . 142
5.30 (a) Input Image. (b) Deskewed image. . . . . . . 143
5.31 Input skewed image. . . . . . . . . . . . . . . . . . . . 143
5.32 Deskewed image. . . . . . . . . . . . . . . . . . . . . . 144
6.1 Sample epigraphical characters belonging to different era. 148
6.2 Prediction Rate for Gabor, Zernike and proposed method. 157
A.1 Original image of palm leaf script of 18th century. . . . . 167
A.2 Input images of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.3 Palm leaf image belonging to 18th century: noisy input
image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.4 Input image of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.5 Input image of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
A.6 Input images of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
B.1 Sample paper image belonging to previous century. . . 170
B.2 Original paper image -1 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . . 171
B.3 Original paper image -2 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . . 172
B.4 Original paper image-3 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . 173
C.1 Stone inscription image belonging to 14 − 17th century. . 174
C.2 Digitized image of Belur temple inscription belonging to
17th century AD. . . . . . . . . . . . . . . . . . . . . . . 175
C.3 Digitized image of Belur temple inscriptions belonging
to 17th century AD. . . . . . . . . . . . . . . . . . . . . . 176
C.4 Digitized image of Shravanabelagola temple inscriptions
belonging to 14th century AD. . . . . . . . . . . . . . . . 177
List of Tables
1.1 Evolution of Kannada Character . . . . . . . . . . . . . 8
3.1 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the paper document
images of 512 × 512 size. . . . . . . . . . . . . . . . . . 79
3.2 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the palm leaf docu-
ment images of 512 × 512 size. . . . . . . . . . . . . . . 80
3.3 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the stone inscription
images of 512 × 512 size. . . . . . . . . . . . . . . . . . 81
4.1 Comparison of various wavelet thresholding methods for
five images along with PSNR values. . . . . . . . . . . 91
4.2 PSNR values obtained from five different thresholding
methods for few images. . . . . . . . . . . . . . . . . . 93
4.3 Result of Curvelet Transform based approach. . . . . . 111
4.4 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on paper images. 112
4.5 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on palm leaf im-
ages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.6 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on stone inscrip-
tion images. . . . . . . . . . . . . . . . . . . . . . . . . 114
4.7 Comparison of PSNR values of two frequency domain
based approaches. . . . . . . . . . . . . . . . . . . . . . 115
5.1 Result of skew detection and correction. . . . . . . . . 141
5.2 Skew angle detected for each line in the document image. 145
6.1 Confusion Matrix and Recognition Rate(RR) for char-
acter image size 100 × 50. . . . . . . . . . . . . . . . . 155
6.2 Confusion Matrix and Recognition Rate (RR) for char-
acter image size 40 × 40 with first scale. . . . . . . . . 156
6.3 Recognition Rate(RR) of the data set 64 × 64 and Con-
fusion Matrix for character image size 64 × 64 with first
scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.4 Comparison of the Recognition Rates(RR) for various
character image sizes 40 × 40, 64 × 64, 100 × 50. . . . 157
Chapter 1
Preface
1.1 Preamble
Documents are the major source of data, information and knowledge, which are writ-
ten, printed, circulated and stored for future use. Nowadays computers are gaining
dominance as they are used virtually everywhere to store information from handwritten
as well as printed documents and also to produce printed documents [1], [2]. The often
repeated slogan of the paperless office for all organizations has now given way to a
different objective. In order to achieve such a paperless office, information needs to be
entered into a computer manually. Due to the substantial amount of labor required
to do so, the only solution is to make computers capable of reading paper documents
efficiently without the intervention of human operators. There exists massive scope
for research in the field of document image processing, particularly in the conversion
of document images into editable forms [3].
For the past few years, a lot of ambitious large-scale projects have been proposed
to make all written material available online in a digital form. Universities initiated
the Million Book Project and industry initiated projects such as Google Books Library
in order to make this goal achievable, but a lot of challenges still need to be handled
in the processing of these documents [4]. The main purpose of the digital library is
to consolidate all the documents that are spread across the globe and enable access
to their digital contents. The Optical Character Recognition (OCR) technology has
helped in converting document images into machine editable format. Even though
the OCR system adequately recognizes the documents, the recognition of handwrit-
ten documents is not completely reliable and is still an open challenge to researchers.
Inaccurate recognition is due to many factors like scanning errors, lighting conditions,
quality of the documents etc. Further inaccuracies stem from the age of these docu-
ments and the condition of the materials these documents are inscribed upon. Some
operations that can be performed on document images include: pre-processing of the
noisy image, enhancement of the low contrast image, de-blurring of the blurred im-
age, estimation of the skew introduced during image acquisition, segmentation of the
document image into lines, words, and characters and recognition of the character.
Historical documents contain vital information about our an-
cestors. They encompass every aspect of their life, religion, education etc. These are
inscribed or printed on a variety of materials and they substantially differ from vari-
ous other documents that are prevalent today mainly because of the major differences
in their layout structure. Due to their variable structure, extraction of the contents
of historical documents is a complicated task. Additional complexity is posed by
the various states of degradation that the historical documents are found in. The
primary causes for this degradation are factors like aging, faint typing, ink seepage
and bleeding, holes, spots, ornamentation and seals. Historical documents exhibit
additional abnormalities like the presence of narrow spaced lines (with overlapping
and touching components) and the unusual and varying shapes in which the charac-
ters and words are found, due to differences in writing techniques and variations in
location and the particular period in which they were drafted. These problems also
create complications in segmenting the document image into lines, words and char-
acters, which is required to extract characteristic features for recognition purposes.
Thus, the removal of noise in the input document image and segmentation of the
document image into lines, words and characters are important factors in improving
the efficiency of OCR. Since processing of degraded documents plays a significant
role in deciding the overall result of the recognition product, it is essential that it be
handled effectively. With this background, in this thesis, we explore some efficient
image enhancement algorithms to enhance the degraded historical document images,
segment the enhanced image into lines, words and characters, and predict the era of
documents belonging to different periods. In this thesis, the terms document images and documents are
used interchangeably to refer to historical document images.
In the subsequent section, we present a brief introduction to historical documents,
their relevance and the need for preservation. In the section that follows, we present
the motivation for the research work with a brief introduction to document image pro-
cessing techniques: data acquisition/collection, pre-processing, segmentation, feature
extraction and recognition. Contribution of the research work and organization of
the thesis are presented in the last two sections.
1.2 Historical Documents
Written scripts have been the primary mode of communication and information stor-
age for hundreds of centuries. Prehistoric humans inscribed on stones, rocks and cave
walls. While some of these were used as a means of communication, others served
a more religious or ceremonial purpose. Over the ages, evolving from primi-
tive objects like stones and rocks, novel media like palm or birch leaves, cloth
and paper became prevalent for information storage. In later centuries,
they were more predominantly used to record information about education, religion,
health and socio-political advancement. These ancient artifacts are conventionally
referred to as historical documents and are a crucial part of any nation's cultural
heritage. Some of the sample images shown in Figure(1.1), Figure(1.2) are stone
inscriptions of the 6th and 13th centuries and Figure(1.3) is a palm leaf document.
According to Sircar [5], it has been estimated that about 80 percent of
all knowledge about Indian history (before the 10th century A.D.) has been derived from
inscriptional sources. Inscriptions are most commonly found inscribed on
walls of caves, pillars, big rocks, metal plates, coins etc. The remarkable durability
of these materials compelled ancestors to record vital information imperative for
future generations. Many of these inscriptions were inscribed to preserve truths
about battles and recognize acts of bravery and courage pertaining to our ancestors.
Some of them are: Edicts of rulers (achievements of rulers), Eulogies (awards
given to persons in praise) and Commemorative inscriptions; the last type again has five sub
categories: Donatory inscriptions, Hero stones, Sathi stones, Epitaphs (inscriptions on
tombs) and Miscellaneous.

Figure 1.1: 6th Century Ganga Dynasty Inscription.
These inscriptions not only comprise text/characters, but also contain paintings
and carvings of humans, animals, nature and spiritual deities. An expert is required
to study and decipher their contents in the context in which they were envisioned in
a particular era. The study of such inscriptions is known as Epigraphy and an expert
involved in deciphering inscriptions is known as an Epigraphist. The inscriptions
on rocks, stones, caves and metals are vital resources which enlighten the present
generation about our past [6].
Stones, rocks, and metals were also used to inscribe significant community mes-
sages to people. Detailed information and stories could not be inscribed on materials
like rocks and stone. Therefore early ancestors used palm leaves and birch leaves
as a medium for imparting such information. They comprise mythological stories,
spiritual teachings, and knowledge which spans a plethora of fields like science, educa-
tion, politics, law, medicine, literature etc. It has been estimated that India has more
than a hundred lakh (ten million) palm and birch leaf documents available in various conditions.
Literature has revealed that the first usage of paper discovered through excavations
was in China, from the 2nd century BC [7]. People in India started writing on paper
during the 17th century.

Figure 1.2: 13th Century Hoysala Dynasty Inscription.

As these documents contain vital information pertaining to our
past and are reminiscent of our cultural integrity, there is a dire need to preserve
them and prevent any further degradation.
It is rightly said that the nation or the society, which does not know its heritage,
cannot fully comprehend its present and hence is unable to lead its future. This
heritage encompasses almost every aspect of human inquiry, be it culture, spiritual-
ity, philosophy, astronomy, medicine, religion, literature or education that prevailed
during different ages [8]. Majority of the details about a civilization can be obtained
from their ancient scriptures which help in understanding the past.

Figure 1.3: Inscriptions on palm leaf belonging to 16th − 18th century.

Since these documents have degraded due to various factors like weather conditions, fossil deposition,
fungus attacks, wear and tear, strain and stain, brittleness due to dry weather, ink
seepage, bleeding through and scratches etc., they cannot be preserved in their origi-
nal form for prolonged duration. Therefore, automated tools are required to capture
the document, enhance the documents images, recognize the era to which they belong
and finally convert them into digital form for long-term preservation. In our research
work, we have considered Kannada historical document images for experimentation.
Hence information about Kannada script and its evolution is provided in the next
section.
1.2.1 Kannada Scripts/Character
In South East Asia and East Asia, including India, inscriptions are found in one of
three scripts, namely Indus valley, Brahmi and Kharosti. The Kannada script, a
South Indian language script, is one among the many evolved versions of Brahmi;
Figure(1.4) shows an instance of Kannada script inscribed during the 3rd century
BC, and Table(1.1) shows the evolution of the Kannada script since the 3rd century.
The evolution of the script has brought changes in the structure and shape
of the script, mainly due to factors like writing materials, writing tools, method of
inscribing and the background of the inscriber [9], [10], [11].
Kannada script has a history of more than 2000 years and has taken shape from
early Brahmi script to the present Kannada as shown in Table(1.1). It has undergone
various changes and modifications during the dynasties of Satavahana (2nd century A D),
Kadamba (4th - 5th century A D), Ganga (6th century A D), Badami Chalukya (6th
century A D), Rastrakuta (9th century A D), Kalyani Chalukya (11th century A D),
Hoysala (13th century A D), Vijayanagar (15th century A D) and Mysore (18th century A
D).

Figure 1.4: Stone inscription belonging to 3rd century BC.

Since experts are few in number and are fast decreasing, it is the need of the hour
to preserve and automate the process of deciphering these inscriptions.
1.3 Motivation for the Research Work
Historical documents are national treasures and provide valuable insight into past
cultures and civilizations, the significance of which has been extensively discussed in
the previous sections. The preservation of these documents is of vital importance and
is being strenuously carried out with the help of an assortment of advanced tools and
technologies. These kinds of documents are being digitized, processed and preserved
using a noteworthy set of image processing and pattern recognition techniques. The
major steps involved in the processing of an image are: image acquisition/collection,
preprocessing, segmentation, feature extraction and recognition [12], [13]. These and
other related works are discussed in the following sub sections.
1.3.1 Data Collection
The historical documents considered for this research work were collected from var-
ious libraries and universities across Karnataka, one of the prominent South Indian
States.

Table 1.1: Evolution of Kannada Character
Character 'a' (glyph images not reproduced here)  Century
    Ashoka, 3rd Century B C
    Saathavahana, 2nd Century A D
    Kadamba, 4th - 5th Century A D
    Ganga, 6th Century A D
    Badami Chalukya, 6th Century A D
    RashtraKuta, 9th Century A D
    Kalyani Chalukya, 11th Century A D
    Hoysala, 13th Century A D
    Vijayanagara, 15th Century A D
    Mysore, 18th Century A D

These digitized documents are inscribed/written in Kannada, which is the
regional and official language of Karnataka. About 2700 digitized document images
were considered for our study. Majority of these are palm leaf documents and the
rest are paper and stone inscriptions, spanning different eras from the 13th to 19th
centuries.
Since these images are collected using different setups i.e. either using a camera
or a scanner, the particular resolution details are unavailable. Differences in setup
cause significant variations in image size and resolution and introduce complexities
in setting up parameter values for experimentation. Therefore, each image set has
to be manually inspected and adjusted to get suitable image and character size. The
image set consists of documents inscribed by different individuals, and the length of
the palm leaves used to inscribe also varies across the collection.
Paper documents are categorized into two groups: Good-quality images and Noisy
images. Uneven illumination, brown colored and low contrast paper images without
spots, stains, and smears etc are grouped under Good-quality images. Images with
spots, stains, smears or smudges, with less or more background noise, wrinkles due
to humidity, illumination variation, ink seeping from the other side of the page,
oily pages, thin pen strokes, breaks, dark lines due to folding, de-coloring, etc are
grouped under Noisy images. Approximately 200 documents were collected with
varying resolutions. During experimentation, the images are divided into different
sizes depending on their overall size and also into 512 × 512 sized images. Higher
resolution images are re-sized and divided into smaller sized images. Lower resolution
images are divided without re-sizing. Very large images cannot be processed
in one pass due to hardware constraints; therefore, images have to be divided into
smaller sized images. More than 500 images were created out of the 200 originals.
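A minimal sketch of this division into fixed-size working images is shown below in Python with OpenCV. The fragment is illustrative only, not code from the thesis: the function name, the grayscale loading and the policy of discarding undersized border patches are our own assumptions.

```python
import cv2


def tile_image(path, tile=512):
    # Split a digitized document image into tile x tile patches,
    # mirroring the 512 x 512 working size described above.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise IOError("could not read " + path)
    h, w = img.shape
    patches = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]
            # Keep only full-size patches; alternatively the source
            # image could be re-sized first, as was done for the
            # higher resolution images in the collection.
            if patch.shape == (tile, tile):
                patches.append(patch)
    return patches
```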
Palm leaves are classified into two groups viz. Degraded and Severely Degraded.
Leaves with low contrast due to repeated application of preservatives (oil), stains
due to uneven application of oils, accumulation of dust, holes introduced due to
tying of the leaves together are classified as Degraded. Subsequently, leaves with
dark and brown colored lines introduced due to cracks, strains, breaks, wear and
tear and noise due to scanning errors are grouped under Severely Degraded. These
documents are hard to enhance and segment. About 1000 palm leaf documents were
collected with their sizes varying from 2cm to 24cm in length and 2cm to 6cm in
width. Furthermore, the lengthy images (of size more than 10 cm in length) were
re-sized and divided into smaller size, based on the size and character size within the
document. Therefore images were divided into two to three segments and used for
subsequent experimentation. Approximately 2000 images were obtained from 1000
images.
The percentage of degradation was found to be significantly higher in earlier stone
inscriptions, particularly those from 3rd century BC to 13th century AD. Capturing
stone inscriptions under different lighting conditions creates illumination and inten-
sity problems along with scratches, cracks, breaks and also leads to erased characters
due to wear and tear. So, stone inscriptions tend to be more severely degraded than
palm leaves and paper. Therefore, it is difficult to enhance the entire image. Ap-
proximately 200 digitized images of stone inscriptions were collected. Even though
more than 400 images were created out of 200, we have considered only 200 resized
samples for our study.
Some of the sample images belonging to paper, palm leaves and stone inscriptions
used for experimentation are shown in Appendix A, Appendix B and Appendix C.
1.3.2 Enhancement/Preprocessing
The primary objective of pre-processing is to improve the image quality by adequately
suppressing unwanted distortions and suitably enhancing the image
features that are important for further processing. Even though we have a myriad
of advanced photography and scanning equipment at our disposal, natural aging and
perpetual deterioration have rendered many historical document images thoroughly
unreadable. Aging of these documents has led to the deterioration of the writing
media employed, due to influences like seepage of ink, smearing along the cracks,
damage to the leaf due to holes used for binding the manuscript leaves and other
extraneous factors such as dirt and discoloration.
In order to suitably preserve these fragile materials, digital images are predomi-
nantly captured using High Definition (HD) digital cameras in the presence of an appropri-
ate light source instead of platen scanners. Digitizing palm leaf and birch manuscripts
poses a variety of problems. The leaves cannot be forced flat, the lighting used with
digital cameras is usually uneven, and the very process of capturing a digital image of
the leaf introduces many complications. These factors lead to poor contrast between
the background and the foreground text. Therefore, innovative digital image pro-
cessing techniques are necessary to improve the legibility of the manuscripts. To sum
up, historical document images pose several challenges to preprocessing algorithms,
namely low contrast, nonuniform illumination, noise, scratches, holes, etc.
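As a concrete illustration of the kind of local contrast correction these challenges call for, the fragment below applies contrast limited adaptive histogram equalization with OpenCV, a contrast limited variant of the AHE step listed under Section 3.2.2. It is a hedged sketch: the clip limit and tile size are illustrative assumptions, not the values used in the thesis.

```python
import cv2


def enhance_contrast(gray):
    # Adaptive (tile-based) histogram equalization: each 8 x 8 region
    # is equalized separately, so dark uneven backgrounds and faint
    # strokes are stretched locally rather than globally.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```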
It has been observed from literature that many spatially linear, nonlinear and
spectral filters are used to denoise the image [14], [15], [16], [17]. Gatos et al. [18] pro-
posed a noise reduction technique based on Wiener filtering and an adaptive binarization
method. Unsharp masking has been proposed to enhance the edge detail information in
the degraded document. These filters eliminate noise, smooth the image and give
a blurring effect [19]. In degraded documents, the text information is crucial
for subsequent stages of character recognition, and losing text informa-
tion while smoothing is unacceptable. Therefore a suitable algorithm is required to
eliminate noise without losing much of the textual content.
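One edge-preserving option in this spirit, sketched below with OpenCV, is a bilateral filter (which combines domain and range filtering, in the manner of the second spatial domain method of Chapter 3) followed by Otsu binarization. The filter parameters here are illustrative assumptions, not the thesis's settings.

```python
import cv2


def denoise_document(gray):
    # Bilateral filtering averages only pixels that are close both
    # spatially and in intensity, so background noise is smoothed
    # while stroke edges are preserved.
    smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
    # Otsu's method then picks a global threshold separating text
    # from background in the smoothed image.
    _, binary = cv2.threshold(smoothed, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return smoothed, binary
```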
Literature survey reveals that very little work has been reported on Indian historical
document processing owing to the fact that preservation of ancient physical resources
has gained priority only recently. India is a country of vast cultural heritage and
is one of the largest repositories of cultural heritage in the world. It houses an
estimated 5 million ancient manuscripts available in various archives and museums
throughout the country. The preservation of these resources was never a priority
subject in the past, so large resources have either vanished or gone out of our country.
Furthermore, even the ones which have survived have undergone massive degradation.
Therefore preservation of these historical heritages through digitization is of utmost
importance. However, any degradation in the original document will be transferred
directly to its digitized version, rendering it illegible. To improve the legibility
of the document, images have to be pre-processed in order to get an enhanced copy.
So this warrants the development of novel image processing algorithms to preprocess
the digitized images.
1.3.3 Segmentation
Image segmentation is the process of splitting a digital image into multiple groups of
pixels, each of which is assigned a unique label so that pixels with the same label share
certain visual characteristics. In general terms, it can be considered as simplifying
the representation of an image into something that is more meaningful and easier
to analyze. Image segmentation is typically used to trace objects, boundaries and
regions of interest. In case of document images, segmentation refers to extraction of
lines, words, and characters from the given document. Segmentation of a document
image into text, lines and words is a critical phase in moving towards unconstrained
handwritten document recognition. Extracting lines from handwritten documents
is more complicated, as these documents contain non uniform line spacing, narrow
spacing between lines, scratches, holes and other factors which are elaborated in the
previous section on historical documents. Apart from variations of the skew angle
between text lines or along the same text line, the existence of overlapping or touching
lines, uneven character size and non-Manhattan layout pose considerable challenges
to text line extraction.
Due to inconsistency in writing styles, scripts, etc., methods that do not use any
prior knowledge but instead adapt to the properties of the document image, like
those proposed here, are more robust. Line extraction techniques may be categorized as projection
based, grouping, smearing and Hough transform based [20]. Global projection based
approaches are very effective for machine printed documents but cannot handle text
lines with variable skew angles. However, they can be applied for skew correction in
documents with a constant skew angle [21]. Hough transform based methods handle
documents with variation in the skew angle between text lines, but are not very
effective when the skew of a text line varies along its width [22].
The best known of these segmentation algorithms are the following: X-Y cuts or
projection profile based methods [23], the Run Length Smoothing Algorithm (RLSA) [24],
component grouping [25], document spectrum [26], constrained text lines [27], Hough
transform [28], [29], and scale space analysis [30]. All of the above segmentation algorithms
are mainly devised for present-day documents. For historical and handwritten document
segmentation, projection profiles [31], the Run Length Smoothing Algorithm [32],
Hough transform [33] and scale space analysis algorithms [34] are mainly used. As
segmentation of historical document images is another focus of our research work,
a detailed literature survey is given in the next chapter and the algorithms developed
for line segmentation are detailed in chapter 5.
1.3.4 Feature Extraction and Recognition
Feature extraction involves reducing the amount of resources required to describe
a large set of data accurately. When performing analysis of complex data, one of the
major problems stems from the number of variables involved. Analysis involving a
large number of variables generally requires a large amount of memory and computation
power, and/or a classification algorithm which over-fits the training
sample and generalizes poorly to new samples.
Feature extraction is a general term for methods which construct
combinations of the variables to get around these problems, while still describing
the data with sufficient accuracy. Features are used as input to classifiers in order
to classify and recognize the object. To recognize a character, features have to
be extracted from the segmented document. The literature survey reveals a wide array
of creative works in the diverse field of document image processing and recognition.
Many authors have developed efficient algorithms for segmentation of the document
into lines, words and characters [35], [36], and for feature extraction and classification
of characters [37]. Feature extraction and recognition are important parts of the recognition
system. Major feature extraction algorithms are based on structural features, statistical
features and spectral methods. Structural features are based on topological and
geometrical characteristics such as maxima and minima, reference lines, ascenders,
descenders, strokes and their direction between two points, horizontal curves at top
or bottom, cross points, end points, branch points, etc. [38]. A detailed literature
survey on the enhancement, segmentation and recognition stages is presented in the
next chapter.
Although significant efforts have been made to digitize historical content, understanding
these documents is beyond the reach of the common reader. The
underlying reason is that the character set has evolved and changed from
ancient times to what it is now; the scripts and characters used to inscribe the contents
are no longer prevalent. Hence expert knowledge is required to decipher these documents.
At present, expert epigraphists are few in number
and fast decreasing, which could become a major problem in deciphering these precious
resources in the future. There is therefore a need to develop supplementary tools,
based on computer vision techniques, that recognize the era of a character, which in
turn helps in referring to the corresponding character set to understand the document.
Only a few authors have attempted to recognize Brahmi scripts and predict the
corresponding era. Fewer still have worked on deciphering South Indian Kannada
epigraphical (stone inscription) scripts and proposed algorithms for predicting
the era of the script [8]. In our research work, palm leaf and paper manuscripts
belonging to various eras are considered for predicting the era of the document, and the
algorithms devised for era prediction are provided in chapter 6.
1.4 Contribution
In this research work, the severe degradation of the documents has been addressed
by developing spatial and frequency domain based algorithms. In the spatial domain,
three algorithms have been designed, based on 1) Gray Scale Morphological Reconstruction
(MR); 2) Bilateral filtering; and 3) Non Local Means filtering in combination
with morphological operations. In the frequency domain, two algorithms have been
devised using the wavelet and curvelet transforms.
In the spatial domain, a gray scale morphological reconstruction technique is devised
using gray scale opening and closing operations. Gray scale opening is applied to
compensate for non-uniform background intensity and suppress bright details smaller
than the structuring element, while the closing operation suppresses the darker details.
This algorithm is further used as a background elimination method in combination
with the remaining algorithms in this thesis. The method works well for images
with less degradation. Severely degraded images are handled using a Bilateral Filter
(BF) in combination with the gray scale morphological reconstruction technique.
The bilateral filter based method, along with the MR algorithm, is employed to eliminate
noise, enhance the contrast and eliminate the dark background. The bilateral filter is
a non-linear filter which uses a combination of range filtering and domain filtering.
A combination of the Non Local Means Filter (NLMF) and the MR technique is employed
in designing an enhancement algorithm that de-noises the documents based on a similarity
measure between non-local windows.
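The following Python sketch illustrates the general shape of such a spatial-domain pipeline using OpenCV primitives; the structuring-element size, filter parameters and file names are illustrative assumptions, not the exact settings of the algorithms developed in this thesis.

    import cv2

    # A minimal sketch of morphological background elimination followed by
    # edge-preserving denoising (illustrative parameters only).
    img = cv2.imread("palm_leaf.png", cv2.IMREAD_GRAYSCALE)

    # Gray scale closing removes the dark text, leaving a background estimate;
    # subtracting the image from it flattens the uneven background.
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    background = cv2.morphologyEx(img, cv2.MORPH_CLOSE, se)
    flattened = cv2.bitwise_not(cv2.subtract(background, img))

    # Bilateral filtering combines range and domain filtering, smoothing
    # noise while preserving stroke edges.
    denoised = cv2.bilateralFilter(flattened, d=9, sigmaColor=50, sigmaSpace=50)
    # For heavier noise, Non Local Means denoising could be used instead:
    # denoised = cv2.fastNlMeansDenoising(flattened, h=15)
    cv2.imwrite("enhanced.png", denoised)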
Since simple spatial domain techniques cannot handle all types of degradation, it
becomes necessary to transform the problem into another domain to obtain better results.
An attempt has been made to eliminate the noise using frequency domain based
methods. An algorithm based on the wavelet transform is
devised to analyse and enhance the image. Since the wavelet transform cannot handle
discontinuities along curves, an extension of it known as the curvelet transform is
used to design the second algorithm for enhancing the degraded documents.
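The sketch below shows the basic wavelet-domain denoising idea in Python, assuming the PyWavelets package; the wavelet family, decomposition level and threshold are illustrative assumptions rather than the thesis settings.

    import numpy as np
    import pywt
    import cv2

    # A minimal sketch of wavelet denoising by soft-thresholding the
    # detail sub-bands, where most of the noise energy resides.
    img = cv2.imread("manuscript.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

    coeffs = pywt.wavedec2(img, "db4", level=2)
    approx, details = coeffs[0], coeffs[1:]

    thr = 20.0  # illustrative threshold
    details = [tuple(pywt.threshold(d, thr, mode="soft") for d in level)
               for level in details]

    denoised = pywt.waverec2([approx] + details, "db4")
    cv2.imwrite("denoised.png", np.clip(denoised, 0, 255).astype(np.uint8))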
Due to the presence of uneven spacing, curved lines and touching lines in historical
documents, segmentation becomes quite complicated. To
address this problem, two segmentation algorithms have been proposed. The first
algorithm, based on a piecewise projection profile, is suitable for extracting curved lines
but fails to segment touching lines. Therefore a second algorithm, based on
mathematical morphology and Connected Component Analysis (CCA), is developed;
it segments touching as well as curved lines. An extended version of this combined
morphology and CCA approach is designed to detect skewed lines and correct them
within the document. Handwritten documents usually contain uneven spacing, which
causes skewed lines; detecting and correcting the skew of individual
lines simplifies the segmentation task.
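For illustration, the following Python sketch shows the general morphology-plus-CCA idea using OpenCV; the kernel width, height filter and file names are illustrative assumptions and do not reproduce the exact algorithm detailed in chapter 5.

    import cv2

    # A minimal sketch of line segmentation by horizontal smearing followed
    # by connected component analysis (illustrative parameters only).
    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

    # Dilate horizontally so the characters of one line merge into a blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
    smeared = cv2.dilate(binary, kernel)

    # Each surviving connected component approximates one text line.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(smeared)
    for i in range(1, n):                  # label 0 is the background
        x, y, w, h, area = stats[i]
        if h > 5:                          # discard small specks
            cv2.imwrite(f"line_{i}.png", binary[y:y + h, x:x + w])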
The segmented characters are used in the further stages of image processing, viz. feature
extraction, recognition and classification. To recognize and classify the characters,
features of the individual characters have to be extracted and used as input to the
classifiers. In this research work, recognizing the era of a character is taken up
so that the character set belonging to that era can be used to decipher the document.
Hence, an algorithm for era prediction of the segmented characters is devised using the
curvelet transform.
1.5 Organization of the Thesis
The thesis is organized into seven chapters. Chapter one provides an introduction to
historical document image processing, the motivation for the research and the contributions
of the thesis. Chapter two presents the literature survey. Chapter three provides
the algorithms designed using spatial domain techniques. Chapter four
explains the algorithms developed to enhance historical document images using
frequency (wavelet) domain techniques. Chapter five presents the algorithms
developed for segmenting handwritten documents into lines and characters, together
with the skew detection and correction algorithms. Chapter six deals with the development
of algorithms for feature extraction and recognition of the era of a character.
Chapter seven provides the conclusions and future scope of the work.
Chapter 2
Literature Survey
2.1 Computer Vision
The visual system has been the greatest source of information for all living things since
the beginning of history. To interact effectively with the world, a vision system must
be able to extract, process, and recognize a large variety of visual structures from the
captured images [1]. The saying "one picture is worth a thousand words" is well known
and describes the importance of visual data. Visual information transmitted in
the form of digital images is becoming a major method of communication in the
present scenario. This has resulted in a new field of computer technology known
as Computer Vision [2]. It is a rapidly growing field with increasing applications
in science and engineering, and it carries the responsibility of developing
machines that can perform the visual functions of the eye. It is mainly concerned
with modeling and replicating human vision using computer software and hardware
[12], [13], combining knowledge from many fields of engineering in order to understand
and simulate the operation of the human vision system.
Computer vision finds applications in various fields such as the military, medicine,
remote sensing, forensic science and transportation. Some of these applications are
content based image retrieval, automated image and video annotation, semantics
retrieval, document image processing, mining, warehousing, augmented reality, biometrics,
non-photorealistic rendering, and knowledge extraction. These applications
involve various sub-fields of Computer Vision such as Digital Image Processing, Pattern
Classification and/or Object Recognition, Video Processing, Data Mining and Artificial
Intelligence. These sub-fields are required to process the image/video data
in various combinations to obtain the desired output.
One sub-field of computer vision is Document Image Analysis and Recognition
(DIAR), which aims to develop techniques to automatically read and understand
the contents of documents through machines. A DIAR system consists of four
major stages: document image acquisition, image preprocessing, feature extraction
and recognition. Document image acquisition deals with capturing the document
image using scanners and cameras. Image preprocessing mainly deals with noise
elimination, restoration and segmentation. Feature extraction deals with extracting
the characteristic features of the segmented character (document) for recognition
of the character. Pattern recognition or classification is mainly used to recognize
the object or pattern in the image using the extracted features. In our research work,
algorithms for image enhancement of historical documents, segmentation of the
document and prediction/recognition of the era of the document are presented, and a
detailed literature survey of these topics is given in the following
sections.
2.2 Preprocessing and Segmentation
Degraded documents often create problems in acquiring good quality images. In
large-volume document digitization projects, the main challenge is to automatically
select the correct and proper enhancement technique, since image enhancement
techniques may adversely influence image quality if applied to the wrong image.
Boutros [39] proposed a prototype which can automate the image enhancement process.
It is clear that the quality of image acquisition affects the later stages of document image
processing; hence proper image preprocessing algorithms are needed.
Ideally, text line extraction would operate on document images free of background noise
and non-textual elements. In practice, it is very difficult to obtain such images,
so some preprocessing needs to be performed before segmentation.
Non-textual elements around the text, such as book bindings, book edges, and parts of
fingers, should be removed. On the document itself, holes and stains may be removed
by high-pass filtering. Other non-textual elements (stamps, seals) as well as ornamentation
and decorated initials can be removed using knowledge about the shape, color
or position of these elements. Extracting text from figures (text segmentation)
can also be performed on texture grounds [40], [41], or by morphological filters.
Intensive research has gone into algorithms that address text line distortion [42], [43],
[44]. These methods are aimed at correcting the nonlinear folding of documents;
folding (warping) can sometimes become so serious that the
contents of the document become unreadable. Fan et al. [45] proposed a hybrid
method combining two cropping algorithms, the first based on line detection and the
second on text region growing, to achieve robust cropping.
Jayadevan et al. [46] presented a survey on bank cheque processing that covers many
aspects of document image processing. Almost all documents
that are part of any organization, viz. business letters, newspapers, technical reports,
legal documents and bank cheques, need to be processed to extract information. The
authors discussed various aspects of cheque processing techniques. As cheques are scanned
under various conditions, low contrast, slant and tilt are common problems. Cheques
may also contain scratches, lines and overwritten ink marks on the cheque leaf. These
create problems in recognizing the correct date, account number, amount, cheque
number, etc. Cheque writers also often cross the printed text lines and write above
them.
Suen et al. [47] proposed a method to process bank cheques in which the
image was first smoothed using a mean filter and the background was then eliminated
through iterative thresholding. Madasu and Lovell [48] proposed a bank cheque processing
method based on gradient and Laplacian values, which are used to decide whether
an image pixel belongs to the background or foreground. The binarization approach
proposed in [49] was based on Tsallis entropy to find the best threshold value, and
histogram specification was adopted for preprocessing some images. To eliminate
the background from the cheque image in [51], a stored background sample image
was subtracted from the skew corrected test image. This background subtraction method
was adapted to extract written information from Indian bank cheques, with erosion and
dilation operations used to eliminate the residual background noise. Logical
smearing was applied with the help of the end-point co-ordinates of detected lines to deal
with broken lines in [57].
Binarization of images is a very important step in any recognition system, and a lot
of work on finding suitable threshold values for binarization can be found in
the literature. Sahoo et al. [52] compared the performance of more than 20
global thresholding algorithms using uniformity or shape measures. The comparison
showed that Otsu's class separability method [53] performed best.
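A minimal Python sketch of global binarization with Otsu's method [53] is given below, assuming OpenCV; the input path is a placeholder.

    import cv2

    # Otsu's method selects the threshold that maximizes the between-class
    # variance of the foreground and background pixel populations.
    img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
    thresh, binary = cv2.threshold(img, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite("binary.png", binary)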
Sezgin and Sankur [54] discussed various thresholding techniques in their survey
paper. The binarization algorithm proposed in [55] defines an initial threshold value
using the percentage of black pixels desired to appear in the final binarized
image; to improve the efficiency of the algorithm, a cubic function was used to
establish a relationship between the initial threshold value and the final one. In [56],
the binarization of the grey-scale image was done with a threshold value calculated
dynamically, based on the number of connected components in the area of the courtesy
amount.
Slant is the deviation of handwritten strokes from the vertical direction (Y
axis) due to different writing styles, while skew may be introduced while scanning
the documents and can be detected by finding the angle that the baseline makes
with the horizontal direction. Both have to be detected and corrected for successful
segmentation and recognition of handwritten input. Skew correction is done
by simply rotating the image in the opposite direction by an angle equal to the
inclination of the guidelines. A comprehensive survey of different skew detection
techniques was reported in [50]. Due to the presence of guidelines, the longest peak
of the histogram corresponds to the skew of the image. To correct the rotation
and translation introduced during the image acquisition process, a method based on
projection profiles has been used in [51].
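The following Python sketch illustrates one common projection-profile approach to skew detection and correction, assuming a binarized page with text as white pixels; the search range and step are illustrative assumptions, not the procedure of [50] or [51].

    import cv2
    import numpy as np

    # A minimal sketch: try candidate angles and keep the one whose row
    # projection profile has the sharpest peaks (highest variance).
    def estimate_skew(binary, angles=np.arange(-10, 10.1, 0.5)):
        h, w = binary.shape
        center = (w / 2, h / 2)
        best_angle, best_score = 0.0, -1.0
        for angle in angles:
            M = cv2.getRotationMatrix2D(center, angle, 1.0)
            rotated = cv2.warpAffine(binary, M, (w, h))
            score = rotated.sum(axis=1).var()   # peaked profile when deskewed
            if score > best_score:
                best_angle, best_score = angle, score
        return best_angle

    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
    angle = estimate_skew(binary)
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(binary, M, (w, h))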
Kim and Govindaraju [58] proposed a chain code representation for calculating
the slant angle of handwritten information. In [59] and [60], the average slant of
a word was determined by an algorithm based on the analysis of slanted vertical
histograms [61]; the heuristic for finding the average slant was to search for the
greatest positive derivative across all the slanted histograms, and the slant was then
corrected through a shear transformation in the opposite direction. Also, in [62] and
[63], the slant of handwritten information was computed using the histogram of the
directions of the contour pixels.
Many techniques have been developed for page segmentation of printed documents,
viz. newspapers, scientific journals, magazines and business letters produced with
modern editing tools [64], [65], [66], [26]. The segmentation of handwritten documents
has also been addressed, through the segmentation of address blocks on envelopes and
mail pieces [68], [67], [69], [70] and for authentication or recognition purposes [71],
[72].
Various methods are available for text line extraction. One of the fundamental
ones is the projection profile method, which is used for printed documents and for
handwritten documents with proper spacing between lines. The vertical projection
profile is obtained by summing pixel values along the horizontal axis for each y value.
The profile curve can be smoothed by a Gaussian or median filter to eliminate local
maxima [34], and is then analyzed to find its maxima and minima.
There are two drawbacks: short lines produce low peaks, and very narrow lines, as
well as those including many overlapping components, do not produce significant
peaks. In the case of skew or moderate fluctuations of the text lines, the image may be
divided into vertical stripes and profiles sought inside each stripe [73]. These piecewise
projections are thus a means of adapting to local fluctuations within a more
global scheme.
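A minimal Python sketch of this projection-profile scheme is shown below, assuming a binarized image with text as white pixels; the smoothing sigma and valley threshold are illustrative assumptions.

    import cv2
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
    profile = (binary > 0).sum(axis=1).astype(float)  # ink count per row
    profile = gaussian_filter1d(profile, sigma=3)     # suppress local maxima

    # Rows whose smoothed profile falls below a fraction of the mean are
    # treated as inter-line gaps; runs of text rows become line boundaries.
    is_text = profile > 0.2 * profile.mean()
    boundaries, start = [], None
    for y, t in enumerate(is_text):
        if t and start is None:
            start = y
        elif not t and start is not None:
            boundaries.append((start, y))
            start = None
    if start is not None:
        boundaries.append((start, len(is_text)))
    print(f"Detected {len(boundaries)} text lines")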
In the work of Shapiro et al. [74], the global orientation or skew angle of a handwritten
page was first estimated by applying a Hough transform to the entire image. Once this
skew angle was obtained, projections were computed along this angle. The number of
maxima of the profile gives the number of lines; low maxima were discarded based
on their value, which was compared to the highest maximum. Lines were delimited by
strips, searching for the minima of the projection profile around each maximum.
In the work of Antonacopoulos and Karatzas [75], each minimum of the profile curve
was a potential segmentation point. Potential points were then scored according to
their distance to adjacent segmentation points, with the reference distance obtained
from the histogram of distances between adjacent potential segmentation points. The
highest scored segmentation point was used as an anchor to derive the remaining ones.
The method was applied to printed records of the Second World War, which have
regularly spaced text lines; the logical structure was used to derive the text regions
where the names of interest can be found. The RXY cuts method applied by He and
Downton [31] uses alternating projections along the X and Y axes, which results in a
hierarchical tree structure. Cuts were found within white spaces, and thresholds were
necessary to derive inter-line or inter-block distances. This method can be applied
to printed documents (which are assumed to have these regular distances) or to well-
separated handwritten lines.
For printed and binarized documents, smearing methods such as the Run-Length
Smoothing Algorithm [76] can be applied. Consecutive black pixels along the horizontal
direction are smeared: the white space between them is filled with black
pixels if their distance is within a predefined threshold, and the bounding boxes of the
connected components in the smeared image then enclose the text lines. A variant of this
method, adapted to gray level images and applied to printed books from the sixteenth
century, consists of accumulating the image gradient along the horizontal direction
[77]. The method has also been adapted to old printed documents within the Debora
project [78]; for this purpose, numerous adjustments were made concerning the
tolerance for character alignment and line justification. Shi and Govindaraju [79]
proposed a method for text line separation using fuzzy run length, which imitates an
extended running path through a pixel of a document image.
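The horizontal smearing step of RLSA [76] is simple enough to sketch directly in Python; the run threshold below is an illustrative assumption.

    import numpy as np

    # A minimal sketch of horizontal run-length smearing: white runs shorter
    # than max_gap between two black (ink = 1) pixels are filled with black.
    def rlsa_horizontal(binary, max_gap=20):
        smeared = binary.copy()
        for row in smeared:
            ink = np.flatnonzero(row)              # columns containing ink
            for a, b in zip(ink[:-1], ink[1:]):
                if b - a <= max_gap:               # short white run
                    row[a:b] = 1                   # fill the gap
        return smeared

    page = (np.random.rand(100, 200) > 0.95).astype(np.uint8)  # toy page
    lines = rlsa_horizontal(page)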
The Hough transform [28] is a very popular technique for finding straight lines in
images, and it can also be applied to fluctuating lines in handwritten
drafts [80]. An approach based on attractive-repulsive forces was presented by Oztop
et al. [81]. It works directly on gray level images and consists of iteratively adapting
the y position of a predefined number of baseline units. Baselines are constructed
one by one from the top of the image to the bottom; pixels of the image act as
attractive forces for baselines, while already extracted baselines act as repulsive forces.
Tseng and Lee [82] presented a method based on a probabilistic Viterbi algorithm,
which derives non-linear paths between overlapping text lines. In the method of
Likforman-Sulem et al. [33], touching and overlapping components are detected using
the Hough transform. Pal and Datta [35] proposed a line segmentation method based
on the piecewise projection profile.
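For reference, the following is a minimal Python sketch of Hough-based line detection [28] using OpenCV; the Canny and vote thresholds are illustrative assumptions.

    import cv2
    import numpy as np

    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(binary, 50, 150)
    # Each returned (rho, theta) pair parameterizes a straight line;
    # near-horizontal lines (theta near 90 degrees) are candidate baselines.
    lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=200)
    if lines is not None:
        for rho, theta in lines[:, 0]:
            print(f"rho={rho:.1f}, theta={np.degrees(theta):.1f} deg")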
Some solutions for separating units belonging to several text lines can be found
in the literature on recognition. In the method of Bruzzone and Coffetti [83],
the contact point between ambiguous strokes was detected and processed from their
external border; an accurate analysis of the contour near the contact point was performed
in order to separate the strokes according to two registered configurations: a
loop in contact with a stroke, or two loops in contact. Khandelwal et al. [84] presented
a methodology based on comparing neighboring connected components
to check whether they belong to the same text line or not. Components smaller than
the average height are ignored and addressed later in postprocessing.
A new algorithm for segmenting overlapping lines and multi-touching components
was proposed by Zahour et al. [85] using a block covering method with
three steps. The first step classifies the document using fractal analysis and the fuzzy C-
means algorithm, the second classifies the blocks using statistical analysis of block
heights, and the last step is a neighborhood analysis for constructing text lines. Fractal
analysis together with the fuzzy C-means classifier determined the type of
the document with high accuracy.
Bloomberg's [87] text line segmentation algorithm was specially designed for separating
text and halftone images in a document image, but it was unable to
discriminate between text and drawing-type non-text components and therefore failed
to separate them from each other. Hence Syed et al. [88] presented a method to overcome
the limitations of Bloomberg's algorithm; it was able to separate text and non-text regions
properly, including halftones, drawings, maps and graphs.
Bansal and Sinha [89] proposed an algorithm based on the structural
properties of the Devanagari script, implemented in two passes:
1) words were segmented into characters or composite characters; 2) the height and width
of the character box were used to check whether a segmented character is single or
composite. Ashkan et al. [90] proposed a skew estimation algorithm using an eigenvalue
technique to detect and correct the skew in a document.
2.2.1 Enhancement of Historical Document Image
Ancient and historical documents differ strongly from recent documents because their
layout structure is completely different. As these documents have variable structure,
extraction of their contents is complicated. Besides, historical documents are
degraded in nature, due to ageing, faint typing, ink seepage and bleed-through,
and they include various disturbing artifacts such as holes, spots, ornamentation and
seals. Handwritten pages include narrowly spaced lines with overlapping and touching
components, and characters and words have unusual and varying shapes, depending
on the writer, the period and the place.
Relatively good progress can be found in the area of historical document image processing.
Shi and Govindaraju [91] proposed a method for enhancing degraded historical
document images using background light normalization; it captures the background
intensity with a best-fit linear function and normalizes the image with respect to this
approximation. Shi and Govindaraju [92] also proposed a method for segmenting
historical document images using background light intensity normalization. Yan and
Leedham [93] proposed a thresholding technique for the binarization of historical
documents which uses local feature vectors for analysis.
Gatos et al. [18] presented a new adaptive approach for the binarization and
enhancement of degraded documents. The proposed method does not require any parameter
tuning by the user and can deal with degradations which occur due to shadows, non-
uniform illumination, low contrast, large signal-dependent noise, smear and strain.
It consists of several distinct steps: a pre-processing procedure using a low-pass Wiener
filter, a rough estimation of foreground regions, a background surface calculation
by interpolating neighboring background intensities, thresholding by combining the
calculated background surface with the original image while incorporating image up-
sampling, and finally a post-processing step to improve the quality of text
regions and preserve stroke connectivity.
Gatos et al. [95] presented a new approach for document image binarization, mainly
based on a combination of several state-of-the-art binarization methodologies as well
as the efficient incorporation of the edge details of the gray scale image. An
enhancement step based on mathematical morphology operations was also involved
in order to produce a high quality result while preserving stroke information. The
proposed method demonstrated superior performance against six well-known
techniques on numerous degraded handwritten and machine printed documents.
Shi et al. [96] proposed methods for enhancing digital images of palm leaf and
other historical manuscripts. They approximated the background of a gray-scale
image using piecewise linear and nonlinear models, and applied normalization
algorithms to the color channels of the palm leaf image to obtain an enhanced gray-
scale image. Experimental results showed a significant improvement in readability.
An adaptive local connectivity map was used to segment lines of text from
the enhanced images, with the objective of enabling further techniques such as keyword
spotting or partial OCR, thereby making it possible to index these documents
for retrieval from a digital library.
A probabilistic model for text extraction from degraded document images
has been presented in [86]. The document image is considered as a mixture of Gaussian
densities corresponding to the groups of pixels belonging to the foreground and
background of the document image. The Expectation Maximization (EM) algorithm
is used to estimate the parameters of the Gaussian mixture, and using these parameters
the image is divided into two classes, text foreground and background, with a maximum
likelihood approach.
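The core of this idea can be sketched in a few lines of Python using scikit-learn's EM-based GaussianMixture; this is an illustration of the general technique, not the implementation of [86], and the file names are placeholders.

    import cv2
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit a two-component Gaussian mixture (ink vs. background) with EM,
    # then assign each pixel to its most likely component.
    img = cv2.imread("degraded.png", cv2.IMREAD_GRAYSCALE)
    pixels = img.reshape(-1, 1).astype(np.float64)

    gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
    labels = gmm.predict(pixels).reshape(img.shape)

    # The component with the darker mean is taken as the text foreground.
    text_label = int(np.argmin(gmm.means_.ravel()))
    text_mask = (labels == text_label).astype(np.uint8) * 255
    cv2.imwrite("text_mask.png", text_mask)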
2.2.2 Segmentation of Historical Documents
Louloudis et al. [94] presented a new text line detection method for unconstrained
handwritten documents. The proposed technique consists of three distinct steps: the
first includes preprocessing for image enhancement, connected component extraction
and average character height estimation; in the second, a block-based Hough transform
is used for the detection of potential text lines; and the third corrects possible false
alarms. The performance of the proposed methodology was assessed with a consistent
and concrete evaluation technique that relies on comparing the text line detection
result with the corresponding ground truth annotation.
Surinta and Chamchong [36] presented a paper on the segmentation of historical
handwriting in palm leaf manuscript images. The process is composed of the following
steps: background elimination to separate text and background using Otsu's algorithm,
followed by line segmentation and character segmentation based on the image histogram.
Shi et al. [97] presented a new text line extraction method for handwritten
Arabic documents. The proposed technique is based on a generalized adaptive local
connectivity map computed using a steerable directional filter. The algorithm was
designed to solve the particularly complex problems seen in handwritten documents,
such as fluctuating, touching or crossing text lines.
Nikolaou et al. [98] presented efficient techniques for segmenting document pages
resulting from the digitization of historical machine-printed sources. To address the
problems posed by degraded documents, they implemented an algorithm with the
following steps: first, an Adaptive Run Length Smoothing Algorithm (ARLSA) handles
the problem of dense and complex document layout; second, the noise areas and
punctuation marks that are usually present in historical machine-printed documents
are detected; third, possible obstacles created by background areas are detected in
order to separate neighboring text columns or text lines; and the last step performs
segmentation using segmentation paths in order to isolate possibly connected characters.
Fadoua et al. [99] proposed the enhancement of documents suffering from ink bleed-
through using a recursive unsupervised classification technique. The method recursively
applies the K-means algorithm to the degraded image, combined with a principal
component analysis of the document image; the cluster values are then back-projected
onto the image space. The iterative method computes a logarithmic histogram and
separates background from foreground using K-means until a clear separation of the
background and foreground of the document is achieved.
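The following Python sketch captures the K-means-plus-PCA separation idea using scikit-learn; the feature choice, cluster count and single (non-recursive) pass are illustrative assumptions and do not reproduce the procedure of [99].

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    # Project pixel colors onto their principal components, then cluster
    # into two groups; keep the darker cluster as the text foreground.
    img = cv2.imread("bleed_through.png")          # color document image
    pixels = img.reshape(-1, 3).astype(np.float64)

    features = PCA(n_components=2).fit_transform(pixels)
    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=0).fit_predict(features)
    mask = labels.reshape(img.shape[:2])

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    fg = int(np.argmin([gray[mask == k].mean() for k in (0, 1)]))
    # In the recursive scheme, this step would be repeated on the foreground.
    cv2.imwrite("foreground.png", ((mask == fg) * 255).astype(np.uint8))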
Kishore and Rege [19] used unsharp masking to enhance the edge detail information
in degraded documents. The method proposed by Gatos et al. [100] is mainly based on
a combination of several state-of-the-art binarization methodologies as well as on
the efficient incorporation of the edge information of the gray scale source image; an
enhancement step based on mathematical morphology operations is also involved
in order to produce a high quality result while preserving stroke information.
Halabi and Zaid [101] presented an enhancement system for degraded old documents.
The developed system is able to deal with degradations which occur due to shadows,
non-uniform illumination, low contrast and noise. Ferhat et al. [102] proposed image
restoration using Singular Value Decomposition and restored even blurred images.
Lu and Tan [14] proposed a technique which estimates the document background surface
using an iterative polynomial smoothing procedure. Various types of document
degradation are then compensated by using the estimated document background
surface intensity. Using the L1-norm image gradient, the text stroke edges are detected
from the compensated document image, and finally the document text is segmented by
a local threshold estimated from the detected text stroke edges. Ntogas
and Ventzas [15] proposed a binarization procedure consisting of five discrete
image processing steps for different classes of document images.
Badekas and Papamarkos [103] proposed a new method which estimates the best
parameter values for each document binarization technique and then selects the best
binarization result among all techniques. Likforman-Sulem
et al. [16] presented a novel method for document enhancement which combines
two recent powerful noise-reduction steps, the first based on the total variation
framework and the second on Non-Local Means; the computational complexity of the
Non-Local Means filter depends on the size of the patch and of the search window.
Layout analysis is then required to extract text lines and identify the reading order
properly, which provides proper input to the classifiers.
A generic layout analysis for a variety of typed, handwritten and ancient Arabic
document images has been proposed in [104]. The proposed system performs
text and non-text separation, then text line detection, and lastly reading order
determination; this method can be combined with an efficient OCR engine for the
digitization of documents. A considerable amount of work on the segmentation of
historical documents can be found in [105]. Hénault et al. [106] proposed a method
based on the linear level set concept for binarizing degraded documents, taking
advantage of local probabilistic models and a flexible active contour scheme. In the
next section, we present a detailed literature survey on character recognition.
2.3 Character Recognition
The history of character recognition can be traced back as far as 1940, when the
Russian scientist Tyuring attempted to develop an aid for the visually handicapped
[107]. The first character recognizers appeared in the mid-1940s with the development of
digital computers. Early work on the automatic recognition of characters concentrated
either on machine printed content or on a small set of well distinguished handwritten
texts or symbols. Machine printed OCR systems of that period generally used
template matching, in which an image is compared to a library of images. For
handwritten text, low level image processing techniques were used on the binary images
to extract feature vectors, which were then fed to statistical classifiers. With the
explosion of information technology, the previously developed methodologies found a
very fertile environment for rapid growth in many application areas, including OCR
systems development [108], [109]. Structural approaches were initiated in many systems
in addition to statistical methods [110], [111].
Character recognition research was focused basically on shape recognition
techniques without using any semantic information. This led to an upper limit on the
recognition rate, which was not sufficient for many practical applications. A historical
review of OCR research and development during this period, covering both offline and
online cases, can be found in [112].
Stubberud et al. [113] proposed a method to improve the performance of an optical
character recognition (OCR) system using an adaptive technique that restores
touching or broken character images. Using the output from an OCR system
and a distorted text image, the technique trains an adaptive restoration filter and
then applies the filter to the distorted text image that the OCR system could not
recognize.
Indian language character recognition systems are still at the research stage. Most
of the research work concerns Devanagari and Bangla scripts, which serve the
two most popular languages in India. Research on Bangla character recognition
started in the early 90s. Chaudhuri and Pal [114] discussed different works done
for Indian script identification, as well as the various steps needed to
improve Indian script OCR development, and developed a complete OCR system
for printed Bangla script. Their approach involved skew correction, segmentation
and noise removal, with a combination of feature and template matching used for
recognition, and achieved a high recognition rate.
Sural and Das [115] proposed a Hough transform based fuzzy feature extraction
method for Bangla script recognition. Some studies have been reported on the
recognition of other languages such as Tamil, Telugu, Oriya, Kannada, Punjabi and
Gujarati. Pal et al. [116] presented an OCR with error detection and correction
for Bangla, a highly inflectional Indian language. The technique is based
on morphological parsing: using two separate lexicons of root words and suffixes,
the candidate root-suffix pairs of each input string are detected, their grammatical
agreement is tested, and the root or suffix part in which the error occurred is noted.
The correction is then made to the corresponding error part of the input string by means
of a fast dictionary access technique.
Pal and Chaudhuri [117] proposed a system for classifying machine printed and
handwritten text lines, using structural and statistical features of the two kinds of
text lines, and achieved a recognition score of 98.6%. The technique used string
features extracted through row and column wise scanning of the character matrix.
Pal et al. [118] proposed a new method for the automatic segmentation of touching
numerals using the water reservoir concept. A reservoir is a metaphor for the region
where numerals touch, obtained by considering the accumulation of water
poured from the top or from the bottom of the numerals. The touching position
(top, middle or bottom) can be decided by considering the reservoir location and size.
Next, by analyzing the reservoir boundary, the touching position and the topological
features of the touching pattern, the best cutting point can be determined, and combining
this with morphological structural features generates the cutting path for segmentation.
Tree classifiers and neural network classifiers based on structural and topological
features have been used for most Indian languages [119].
Some work on the recognition of Telugu characters can be found in the literature.
Elastic matching using eigen-deformations for handwritten character recognition was
proposed by Uchida and Sakoe [120]; the recognition accuracy was found to be
99.47%. The deformations within each character category are of an intrinsic nature
and can be estimated by principal component analysis of the actual deformations
collected automatically by elastic matching.
Pujari et al. [121] proposed an algorithm for Telugu character recognition that
uses wavelet multi-resolution analysis to extract features and an associative memory
model to accomplish the recognition task. A multi-font Telugu character recognition
algorithm was proposed by Rasanga et al. [122] using the histogram of oriented
gradients (HOG) as a spatial feature. Sastry et al. [123] implemented a methodology
to extract and recognize Telugu characters from palm leaves using a decision tree.
Human-machine interaction using optical character recognition for Devanagari
scripts has been designed in [124]. Shelke and Apte [125] proposed a novel method to
recognize handwritten characters using feature extraction based on structural features,
with classification based on the extracted parameters; the final stage of feature
extraction was done by the radon transform, and classification was carried out with a
combination of Euclidean distance and feed-forward and back-propagation neural
networks. An extended version of their work employs feature extraction based on
kernels generated using the wavelet transform [126] and neural networks [127]. Malayalam
character recognition was proposed by John et al. [128] using the Haar wavelet transform
for feature extraction and a support vector machine as the classifier. Pal et al. [129]
proposed a method to recognize unconstrained handwritten Malayalam numerals using
the reservoir method. The main reservoir based features used were the number of
reservoirs, the positions of reservoirs with respect to the bounding box of the touching
pattern, the height and width of the reservoirs, and the water flow direction. Topological
and structural features were also used alongside the water reservoir
features.
Nagabhushan and Pai [130] have worked in the area of Kannada character recognition.
They proposed a method for recognizing Kannada characters, which can
spread in both the vertical and horizontal directions. The method uses a standard sized
rectangle which circumscribes standard sized characters. This rectangle is interpreted
as a 2-dimensional, 3×3 structure of nine parts defined as bricks; equivalently, it can
be seen as three consecutively placed row structures of three
bricks each, or three adjacently placed column structures of three bricks each.
Recognition is based on an optimal-depth logical decision tree developed
during the learning phase and does not require any mathematical computation.
A printed Kannada character recognition system was designed by Ashwin and Sastry
[131] using a zonal approach and a support vector machine (SVM). In their zonal
approach, the character image is divided into a number of circular tracks and sectors.
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

Automation of Historical Document Recognition

I would like to express sincere thanks to the P E S management for providing motivation and a platform to carry out the research. I wholeheartedly thank Mr. Jayasimha, Mythic Society of India, Bangalore, for providing me the scanned copies of the palm leaf manuscripts. My warm thanks are due to Mr. M P Shelva Thirunarayana and R Narayana Iyangar, Academy of Sanskrit Research Center, Melukote, and Sri. S N Cheluvanarayana, Principal, Sanskrit College, Melukote, Karnataka, for providing knowledge about the historical documents along with sample paper and palm leaf manuscripts.

I sincerely thank Dr. Veeresh Badiger, Professor, Kannada University Hampi, Karnataka, for providing information about resources and guiding my research work, and in particular for providing digitized samples of palm leaf manuscripts.
I would like to thank Dr. G. Hemantha Kumar, Professor & Chairman, Department of Studies in Computer Science, University of Mysore, for his valuable suggestions and directions during the pre-Ph.D. viva voce. I wholeheartedly thank Dr. M Ashwath Kumar, Professor, Department of Information Science & Engineering, M S R Institute of Technology, Bangalore, for his valuable directions given during the pre-Ph.D. viva voce. I warmly thank Dr. Bhanumathi, Reader at Manasa Gangothri, Mysore, for providing useful information about palm leaf manuscripts; the detailed discussions about manuscripts and the interesting explorations with her have been very helpful for my work.

I wish to thank Dr. Suryakantha Gangashetty, Assistant Professor, IIIT Hyderabad, for his suggestions, and Dhananjaya, Archana Ramesh and Dilip, research scholars at IIIT Hyderabad, for their valuable discussions. I am grateful to Dr. Basavaraj Anami, Principal, K L E Institute of Technology, Hubli, for his guidance and wonderful interactions, which helped me in shaping my research work properly.

I would like to express my heartfelt thanks to Dr. Punitha P Swamy, Professor & Head, Department of Master of Computer Application, P E S Institute of Technology, Bangalore, for her detailed review, constructive criticism and excellent advice throughout my research work and also during the preparation of the thesis. My sincere thanks to Dr. Avinash N., Professor, Department of Information Science & Engineering, P E S Institute of Technology, Bangalore, for his valuable discussions during the thesis write-up.

I owe my most sincere thanks to my brother-in-law Dr. Mallikarjun Holi, Professor & Head, Department of Bio-medical Engineering, Bapuji Institute of Engineering & Technology, Davanagere, for reviewing my thesis and giving valuable suggestions.

I owe my loving thanks to my husband Suresh Holi and my children Anish and Trisha, who have extended constant support in completing my work. Without their encouragement and understanding it would have been impossible for me to finish this work. I express my deepest sense of gratitude to my father-in-law Prof. S. M. Holi, whose inspiring and encouraging nature stimulated me to take up research. I would like to express my heartfelt thanks to my mother-in-law, Mrs. Rudramma Holi, for her loving support. I also extend my sincere thanks to my sisters-in-law Dr. Prema S Badami, Mrs. Shivaleela S Patil and Sharanu Holi, my brother-in-law Mr. Sanganna Holi, and their families for their moral support.

I express my heartfelt thanks to my parents Mr. Somaraya Biradar and Mrs. Shivalingamma Biradar for encouraging and helping me in my activities. I would also like to place on record my gratitude to my sisters Mrs. Nirmala Marali and Suvarna Patil and my brothers Manjunath Biradar and Vishwanath Biradar, along with their families, for providing moral support during my research work.

During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all faculty colleagues of the Department of Information Science and Engineering, P E S Institute of Technology, Bangalore. I wish to thank my team mates Mr. Arun Vikas, Jayashree, Mamatha H R and Karthik S, and my friends Sangeetha J, Suvarna Nandyal and Srikanth H R for their support. Lastly, and most importantly, I am indebted to my faculty colleagues for providing a stimulating and healthy environment in which to learn and grow. It is a pleasure to thank the many people who have helped me directly or indirectly and who made this thesis possible. I also place my sincere gratitude to the external reviewers for providing critical comments which significantly helped in improving the standard of the thesis. I take this opportunity to thank the VTU e-learning center for providing the LaTeX template used to prepare this doctoral thesis.

B Gangamma
DEDICATED TO MY FAMILY, MENTORS AND WELL WISHERS
Abstract

Historical documents are the priceless property of any country; they provide insight and information about ancient culture and civilization. These documents are found in the form of inscriptions on a variety of hard and fragile materials such as stone, pillars, rocks, metal plates, palm leaves, birch leaves, cloth and paper. Most of these documents are nearing the end of their natural lifetime and suffer from various problems caused by climatic conditions, methods of preservation, the materials used to inscribe, and so on. Some of the problems are due to the worn-out condition of the material: brittleness, strain and stain, sludge and smudge, fossil deposition, fungus attack, dust accumulation, wear and tear, breakage and other damage. These damages create problems in processing the historical documents, make the inscriptions illegible and render the documents indecipherable. Although preservation through digitization is in progress at various organizations, deciphering the documents is very difficult and demands the expertise of paleographers and epigraphists. Since such experts are few in number and their number is fast dwindling, there is a need to automate the process of deciphering these document images. The problems and complexities posed by these documents have led to the design of a robust system that automates their processing and deciphering, and this in turn demands thorough preprocessing algorithms to enhance the images.
The accuracy of a recognition system always depends on the segmented characters and the features extracted from them. Historical document images usually exhibit uneven line spacing, inscriptions along curved lines, overlapping text lines and similar irregularities, making segmentation of the document difficult. In addition, the documents pose challenges such as low contrast, dark and uneven backgrounds, and blotched (stained) characters, usually referred to as noise. The presence of noise also leads to erroneous segmentation of the document image. Therefore thorough preprocessing techniques are needed to eliminate the noise and enhance the document image. To decipher documents belonging to various eras, we need the character set pertaining to each era; this warrants a recognition system that identifies the era of a character.

In this context, this research work focuses on developing algorithms to preprocess and enhance historical document images of Kannada, a South Indian language; to eliminate noise; to segment the enhanced document image into lines and characters; and to predict the era of the scripts.

To preprocess the noisy document images, three image enhancement algorithms in the spatial domain and two in the frequency domain are proposed. Among the spatial domain methods, the first utilizes the morphological reconstruction technique to eliminate the dark, uneven, noisy background; this algorithm is reused as the background elimination step in the other four enhancement algorithms. Although grayscale morphological operations eliminate the noisy dark background, this method fails to enhance severely degraded document images and is unable to preserve sharp edges.
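As a rough illustration of this style of background elimination (a minimal sketch, not the exact reconstruction-based algorithm developed in this thesis), the Python fragment below estimates the slowly varying background of a grayscale document with a large morphological closing and divides it out; the kernel size and file names are illustrative assumptions.

    import cv2

    # Minimal sketch of morphology-based background flattening for a
    # document with dark text on an uneven background. The kernel size
    # (much larger than a character stroke) and the file names are
    # assumptions, not values taken from the thesis.
    img = cv2.imread("palm_leaf.png", cv2.IMREAD_GRAYSCALE)

    # A large closing removes the (dark) text, leaving an estimate of
    # the smooth background.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31, 31))
    background = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

    # Dividing by the estimated background flattens uneven illumination.
    flattened = cv2.divide(img, background, scale=255)
    cv2.imwrite("flattened.png", flattened)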
To enhance the image by eliminating noise without smoothing the edges, a second algorithm is developed using the bilateral filter, which combines domain and range filtering. The third algorithm, proposed to denoise the document images, is a non-local means filter based on a similarity measure between non-local windows.

Frequency domain transforms and their variants are widely used in image denoising, feature extraction, compression and reconstruction. An algorithm based on the wavelet transform is developed to analyze and restore the degraded document images. The wavelet transform handles point discontinuities well but fails to handle curve discontinuities. To overcome this, a curvelet transform based approach is proposed, which yields better results than the wavelet approach. The performance of all the image enhancement techniques is compared using Peak Signal to Noise Ratio (PSNR), computational time and human visual perception.
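For readers who want a feel for these filters, the sketch below applies OpenCV's stock bilateral and non-local means implementations to a scanned page and scores each result against a reference with PSNR. The parameter values and file names are illustrative assumptions; the proposed algorithms wrap additional stages (such as the background elimination above) around these basic filters.

    import cv2

    # Illustrative comparison of two off-the-shelf spatial-domain
    # denoisers; parameters and file names are assumptions.
    noisy = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)
    clean = cv2.imread("reference_page.png", cv2.IMREAD_GRAYSCALE)

    # Bilateral filter: combines a domain (spatial) and a range
    # (intensity) kernel, so strong edges are preserved.
    bilateral = cv2.bilateralFilter(noisy, 9, 75, 75)

    # Non-local means: averages pixels whose surrounding patches look
    # alike, even when the patches are far apart in the image.
    nlm = cv2.fastNlMeansDenoising(noisy, None, 10, 7, 21)

    # PSNR against a reference, as used for comparison in the thesis.
    print("bilateral PSNR:", cv2.PSNR(clean, bilateral))
    print("NLM PSNR:     ", cv2.PSNR(clean, nlm))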
Two segmentation algorithms have been developed to address the problem of segmenting historical document images: one based on a piecewise projection profile method and the other based on morphological closing and connected component analysis (CCA). The first method addresses uneven line spacing by dividing the image into vertical strips, extracting each line from each strip and then combining the lines across strips. The second method addresses both uneven spacing and touching (overlapping) lines using the closing operation and CCA.

Document skew may be introduced during image capture and needs to be corrected. Since historical documents usually contain uneven spacing between lines, correcting a single global skew will not segment a handwritten document image correctly: uneven line spacing usually produces multiple skews within one document. To correct the skew of individual document lines, an extended version of the second segmentation algorithm is developed.

To predict the era of a script/character, a curvelet transform based algorithm is designed to extract the characteristic features, and a minimum distance classifier is employed to recognize the era of the characters.

To sum up, in this research work three spatial domain techniques and two frequency domain approaches have been implemented for denoising and enhancing degraded historical document images; two segmentation algorithms have been designed to segment lines and characters from the document images; one algorithm detects and corrects the multiple skews within a document; and another algorithm predicts the era of a segmented character, so that the character set belonging to that particular era can be consulted in order to decipher the documents.
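To make the last step concrete: the minimum distance classifier used for era prediction is a nearest-class-mean rule. The sketch below shows that rule on generic feature vectors; the curvelet feature extraction is abstracted away, and all names, labels, values and array shapes are illustrative assumptions.

    import numpy as np

    # Minimal sketch of a minimum distance (nearest class mean)
    # classifier. The feature vectors stand in for curvelet-based
    # features; shapes, values and era labels are assumptions.
    def fit_class_means(features: np.ndarray, labels: np.ndarray) -> dict:
        """Compute one mean feature vector per era label."""
        return {era: features[labels == era].mean(axis=0)
                for era in np.unique(labels)}

    def predict_era(means: dict, x: np.ndarray):
        """Assign x to the era whose mean vector is closest (Euclidean)."""
        return min(means, key=lambda era: np.linalg.norm(x - means[era]))

    # Toy usage: six training vectors from two hypothetical eras.
    train = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
                      [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]])
    labels = np.array(["Ganga", "Ganga", "Ganga",
                       "Hoysala", "Hoysala", "Hoysala"])
    means = fit_class_means(train, labels)
    print(predict_era(means, np.array([0.12, 0.18])))  # -> "Ganga"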
Contents

1 Preface  1
  1.1 Preamble  1
  1.2 Historical Documents  3
    1.2.1 Kannada Scripts/Character  6
  1.3 Motivation for the Research Work  7
    1.3.1 Data Collection  7
    1.3.2 Enhancement/Preprocessing  10
    1.3.3 Segmentation  12
    1.3.4 Feature Extraction and Recognition  13
  1.4 Contribution  14
  1.5 Organization of the Thesis  16

2 Literature Survey  17
  2.1 Computer Vision  17
  2.2 Preprocessing and Segmentation  18
    2.2.1 Enhancement of Historical Document Image  24
    2.2.2 Segmentation of Historical Documents  26
  2.3 Character Recognition  28
  2.4 Summary  34

3 Enhancement of Degraded Historical Documents: Spatial Domain Techniques  35
  3.1 Introduction  35
  3.2 Gray Scale Morphological Reconstruction (MR) Based Approach  37
    3.2.1 Overview of Mathematical Morphology  38
    3.2.2 Adaptive Histogram Equalization (AHE)  42
    3.2.3 Gaussian Filter  42
    3.2.4 Proposed Methodology  43
    3.2.5 Results and Discussion  48
  3.3 Bilateral Filter (BF) Based Approach  54
    3.3.1 Overview of Bilateral Filter  55
    3.3.2 Proposed Methodology  56
    3.3.3 Results and Discussion  59
  3.4 Non Local Means Filter (NLMF) Based Approach  66
    3.4.1 Overview of Non Local Means Filter  67
    3.4.2 Proposed Algorithm  68
    3.4.3 Results and Discussion  73
  3.5 Discussion of Three Spatial Domain Techniques  77
  3.6 Summary  82

4 Enhancement of Degraded Historical Documents: Frequency Domain Techniques  84
  4.1 Introduction  84
  4.2 Wavelet Transform (WT) Based Approach  85
    4.2.1 Overview of Wavelet Transform  86
    4.2.2 Denoising Method  88
      4.2.2.1 Thresholding Algorithms  88
    4.2.3 Proposed Methodology  90
      4.2.3.1 Stage 1: Mathematical Reconstruction  92
      4.2.3.2 Stage 2: Denoising by Wavelet Transform  93
      4.2.3.3 Stage 3: Postprocessing  94
      4.2.3.4 Algorithm  94
    4.2.4 Results and Discussions  94
  4.3 Curvelet Transform (CT) Based Approach  98
    4.3.1 Overview of Curvelet Transform  100
    4.3.2 Proposed Method  104
      4.3.2.1 Denoising Using Curvelet Transform  104
      4.3.2.2 Algorithm  104
    4.3.3 Results and Discussions  106
  4.4 Summary  108
  4.5 Discussion on Enhancement Algorithms  108

5 Segmentation of Document Images  116
  5.1 Introduction  116
  5.2 Proposed Methodologies  117
  5.3 Method 1: Piece-wise Horizontal Projection Profile Based Approach  118
    5.3.1 Division into Vertical Strips  120
    5.3.2 Horizontal Projection Profile of a Strip  120
    5.3.3 Reconstruction of the Line Using Vertical Strips  120
    5.3.4 Character Extraction  122
    5.3.5 Algorithm for Document Image Segmentation  122
    5.3.6 Results and Discussion  125
  5.4 Method 2: Mathematical Morphology and Connected Component Analysis (CCA) Based Approach  126
    5.4.1 Morphological Closing Operation  128
    5.4.2 Line Extraction Using Connected Components Analysis  129
    5.4.3 Finding the Height of Each Line and Checking the Touching Lines  130
    5.4.4 Character Extraction  130
    5.4.5 Algorithm for Segmentation of the Document Image into Lines  131
    5.4.6 Results and Discussion  132
  5.5 Discussion on Method 1 and Method 2  133
  5.6 Skew Detection and Correction Algorithm  135
    5.6.1 Skew Angle Detection  137
    5.6.2 Skew Correction  138
    5.6.3 Algorithm for Deskewing  140
    5.6.4 Results and Discussion  140
  5.7 Summary  144

6 Prediction of Era of Character Using Curvelet Transform Based Approach  146
  6.1 Introduction  146
  6.2 Related Literature  147
  6.3 Proposed Method  151
    6.3.1 Data Set Creation  152
    6.3.2 Preprocessing  152
    6.3.3 Feature Extraction using FDCT  153
    6.3.4 Classification  153
    6.3.5 Algorithm for Era Prediction  153
  6.4 Experimentation and Results  154
    6.4.1 Experimentation 1  154
    6.4.2 Experimentation 2  155
    6.4.3 Experimentation 3  156
    6.4.4 Discussion  157
  6.5 Summary  159

7 Conclusion and Future Work  160
  7.1 Conclusion  160
  7.2 Future Work  164

A Palm Leaf Images  167
B Paper Images  170
C Stone Inscription Images  174
D Author's Publications  178
List of Figures

1.1 6th Century Ganga Dynasty Inscription.  4
1.2 13th Century Hoysala Dynasty Inscription.  5
1.3 Inscriptions on palm leaf belonging to the 16th-18th century.  6
1.4 Stone inscription belonging to the 3rd century BC.  7
3.1 (a) Input image. (b) Result of binary morphological dilation operation. (c) Result of binary morphological erosion operation.  39
3.2 (a) Input image. (b) Result of binary morphological opening operation. (c) Result of binary morphological closing operation.  40
3.3 (a) Original gray scale image. (b) Result of gray scale dilation operation. (c) Result of gray scale erosion operation.  41
3.4 (a) Original gray scale image. (b) Result of gray scale closing operation. (c) Result of gray scale opening operation.  41
3.5 Noisy palm leaf document image belonging to the 16th century.  43
3.6 Binarized noisy images of Figure(3.5).  43
3.7 Original image of palm leaf script belonging to the 16th century.  44
3.8 Binarized noisy image of Figure(3.7).  44
3.9 Flow chart for MR based method.  45
3.10 AHE result on images shown in Figure(3.5) and Figure(3.7).  46
3.11 Result of stage 2. (a), (b) Results of opening operation on images shown in Figure(3.10)(a), (b); (c), (d) results of reconstruction technique.  47
3.12 Result of stage 3. (a), (b) Results of closing operation on stage 2 output images shown in Figure(3.11)(a), (b). (c), (d) Subtraction of R1 from R4. (e), (f) Subtraction of the result of the previous step from R2.  47
3.13 (a), (b) Results of Gaussian filter on images shown in Figure(3.12)(e), (f).  48
3.14 Morphological reconstruction technique on images shown in Figure(3.13)(a), (b).  48
3.15 Binarized images of Figure(3.14)(a), (b).  49
3.16 (a)-(d) Results of MR based method on paper images shown in Appendix B, Figure(B.1)-Figure(B.4), belonging to the nineteenth and beginning of the twentieth century.  51
3.17 (a), (b) Results of MR based method on palm leaf images shown in Appendix A, Figure(A.1) and Figure(A.3), belonging to the 16th to 18th century.  52
3.18 Result of MR based method on a sample image taken from the Belur temple inscriptions, Figure(C.2), belonging to the 17th century AD.  53
3.19 (a), (b) Results of MR based method on stone inscriptions shown in Appendix C, Figure(C.1) and Figure(C.3), belonging to the 14th-17th century.  53
3.20 Comparison of the proposed method with Gaussian, average and median filters. (a)-(d) Results of the respective methods; (e)-(h) binarized images of (a)-(d).  54
3.21 Flow chart for BF based method.  57
3.22 (a) Input image of a palm leaf manuscript belonging to the 18th century. (b) Its binarized version.  58
3.23 (a) Filtered image using BF method. (b) Final result of the BF method. (c) Binarized version of the enhanced image.  60
3.24 (a)-(d) Results of BF based method on input paper images in Figure(B.1)-Figure(B.4) respectively.  62
3.25 (a), (b) Results of BF based method on Figure(A.4) and Figure(A.5).  63
3.26 (a) Input image of palm leaf manuscript. (b) Result of MR based method. (c) Enhanced image using BF based method.  63
3.27 (a), (b) Results of BF based method on input images in Figure(A.2) and Figure(3.7).  64
3.28 Result of BF based method on image Figure(A.6).  64
3.29 (a), (b) Results of BF based method on images in Figure(C.1) and Figure(C.3).  65
3.30 Result of BF based method on Figure(C.2), Belur temple inscriptions belonging to the 17th century AD.  65
3.31 Non local means filter approach. A small patch of size 2p+1 by 2p+1 is centred at the candidate pixel x; y and y' are non-local patches within a search window of size 2k+1 by 2k+1.  66
3.32 Input palm script image with low contrast.  68
3.33 Result of NLMF method with residual image on Figure(3.32).  69
3.34 (a) Result of NLMF based method on image shown in Figure(3.32). (b) Binarized image.  70
3.35 Flow chart for NLMF based method.  71
3.36 (a) Original image. (b) Filtered image using NLMF. (c) Binarized image of the proposed NLMF method. (d) Binarized noisy image using Otsu method.  72
3.37 Results of NLMF based method on input images in Figure(B.1)-Figure(B.4).  75
3.38 (a) Result of MR based method, (b) enhanced image using BF based method, and (c) result of NLMF based method on the input image shown in Figure(3.26).  76
3.39 (a), (b) Results of NLMF based method on input images shown in Figure(A.2) and Figure(A.1).  76
3.40 Result of NLMF based method on input image in Figure(A.6).  77
3.41 Results of NLMF based method on images Figure(C.1) and Figure(C.3).  77
3.42 (a), (b) Results of NLMF based method on images shown in Figure(C.2) and Figure(C.4).  78
4.1 Comparison of all thresholding methods.  92
4.2 (a) Paper manuscript image 3 of the previous century. (b) Enhanced image using WT based approach.  95
4.3 Enhanced images using WT based approach on paper manuscript images shown in Appendix B: (a) Figure(B.2) and (b) Figure(B.3).  96
4.4 (a) Palm leaf manuscript image belonging to the 16th-18th century. (b) Enhanced image using WT based approach.  96
4.5 (a) Palm leaf manuscript image belonging to the 18th century. (b) Enhanced image using WT based approach.  97
4.6 (a) Palm leaf manuscript image belonging to the 18th century. (b) Enhanced image using WT based approach.  98
4.7 (a) Palm leaf manuscript image belonging to the 18th century. (b) Enhanced image using WT based approach.  99
4.8 (a) Stone inscription image belonging to the seventeenth century. (b) Result of WT based approach.  100
4.9 (a), (c) Stone inscription images belonging to the 14th-17th century. (b), (d) Results of WT based approach.  101
4.10 Result of WT based approach on the stone inscription belonging to the seventeenth century shown in Appendix C, Figure(C.2).  102
4.11 (a) Wrapping data, initially inside a parallelogram, into a rectangle by periodicity (figures reproduced from [172]); the shaded region represents a trapezoidal wedge. (b) Discrete curvelet frequency tiling.  102
4.12 (a), (c), (e) Input images of paper, palm leaf and stone. (b), (d), (f) Results of CT based approach.  103
4.13 (a)-(b) Input images. (c)-(d) Results of the first and second stages of the curvelet based approach. (e)-(f) Result of the last stage (image 15-49).  105
4.14 (a) Palm leaf manuscript image belonging to the 16th-18th century. (b) Enhanced image using WT based approach. (c) Result of CT based approach.  106
4.15 (a) Input image of palm script. (b) Result of WT based method. (c) Result of CT method.  107
4.16 (a) Input image of palm script. (b) Result of WT based method. (c) Result of CT method.  107
4.17 (a) Result of WT based approach and (b) result of CT based approach on the image shown in Figure(4.8)(a).  108
4.18 Results of WT based method shown in (a), (c) and results of CT based method shown in (b), (d) for the stone inscription images shown in Figure(4.9)(a) and (c).  109
5.1 (a) Handwritten Kannada document image. (b) Horizontal projection profile of the handwritten document image.  118
5.2 Handwritten Kannada document image.  119
5.3 Horizontal projection profile of the input image in Figure(5.2).  119
5.4 Non-zero rows (NZRs) and rows labelled NZR1 and NZR2.  121
5.5 Horizontal projection profile of a strip.  121
5.6 Extracted text lines.  122
5.7 Character extraction from a line.  123
5.8 (a), (c), (e) Extracted lines; (b), (d), (f) characters extracted from lines (a), (c), (e).  123
5.9 Input handwritten image and extracted lines.  124
5.10 Extracted characters.  124
5.11 Input image with uneven spacing between lines.  126
5.12 Result of method 1 on the image shown in Figure(5.11).  126
5.13 Result of closing operation.  127
5.14 Extracted text lines.  127
5.15 (a) Line and (b) characters extracted from line (a).  128
5.16 Input image.  128
5.17 Result of closing operation.  130
5.18 Result of extraction of connected components (lines).  131
5.19 Result of binarization operation.  133
5.20 Result of closing operation.  133
5.21 Result of extraction of connected components and corresponding lines.  134
5.22 (a) Touching line portion. (b) Result of closing and opening operations.  135
5.23 Extraction of lines.  135
5.24 Input skewed image.  137
5.25 Horizontal projection profile of the input image in Figure(5.24).  138
5.26 Result of closing operation.  139
5.27 Skew angle calculation from a single connected component.  139
5.28 Result of deskewing.  141
5.29 Reconstructed image of Figure(5.24).  142
5.30 (a) Input image. (b) Deskewed image.  143
5.31 Input skewed image.  143
5.32 Deskewed image.  144
6.1 Sample epigraphical characters belonging to different eras.  148
6.2 Prediction rate for Gabor, Zernike and the proposed method.  157
A.1 Original image of palm leaf script of the 18th century.  167
A.2 Input images of palm leaf document belonging to the 17th century.  168
A.3 Palm leaf image belonging to the 18th century: noisy input image.  168
A.4 Input image of palm leaf document belonging to the 17th century.  168
A.5 Input image of palm leaf document belonging to the 17th century.  169
A.6 Input images of palm leaf document belonging to the 17th century.  169
B.1 Sample paper image belonging to the previous century.  170
B.2 Original paper image 1 belonging to the nineteenth and beginning of the twentieth century.  171
B.3 Original paper image 2 belonging to the nineteenth and beginning of the twentieth century.  172
B.4 Original paper image 3 belonging to the nineteenth and beginning of the twentieth century.  173
C.1 Stone inscription image belonging to the 14th-17th century.  174
C.2 Digitized image of Belur temple inscription belonging to the 17th century AD.  175
C.3 Digitized image of Belur temple inscriptions belonging to the 17th century AD.  176
C.4 Digitized image of Shravanabelagola temple inscriptions belonging to the 14th century AD.  177
List of Tables

1.1 Evolution of Kannada Character.  8
3.1 Comparison of PSNR values and execution time for three spatial domain methods to enhance the paper document images of 512 × 512 size.  79
3.2 Comparison of PSNR values and execution time for three spatial domain methods to enhance the palm leaf document images of 512 × 512 size.  80
3.3 Comparison of PSNR values and execution time for three spatial domain methods to enhance the stone inscription images of 512 × 512 size.  81
4.1 Comparison of various wavelet thresholding methods for five images along with PSNR values.  91
4.2 PSNR values obtained from five different thresholding methods for a few images.  93
4.3 Result of curvelet transform based approach.  111
4.4 Comparison of PSNR values and execution time for wavelet and curvelet transform based methods on paper images.  112
4.5 Comparison of PSNR values and execution time for wavelet and curvelet transform based methods on palm leaf images.  113
4.6 Comparison of PSNR values and execution time for wavelet and curvelet transform based methods on stone inscription images.  114
4.7 Comparison of PSNR values of the two frequency domain based approaches.  115
5.1 Result of skew detection and correction.  141
5.2 Skew angle detected for each line in the document image.  145
6.1 Confusion matrix and recognition rate (RR) for character image size 100 × 50.  155
6.2 Confusion matrix and recognition rate (RR) for character image size 40 × 40 with first scale.  156
6.3 Recognition rate (RR) of the data set 64 × 64 and confusion matrix for character image size 64 × 64 with first scale.  156
6.4 Comparison of the recognition rates (RR) for various character image sizes 40 × 40, 64 × 64, 100 × 50.  157
Chapter 1

Preface

1.1 Preamble

Documents are a major source of data, information and knowledge; they are written, printed, circulated and stored for future use. Nowadays computers are gaining dominance, as they are used virtually everywhere to store information from handwritten as well as printed documents and to produce printed documents [1], [2]. The oft-repeated slogan of the paperless office has given way to a more practical objective: since entering information into a computer manually requires a substantial amount of labor, the only scalable solution is to make computers capable of reading paper documents efficiently without the intervention of human operators. There exists massive scope for research in the field of document image processing, particularly in the conversion of document images into editable forms [3].

For the past few years, many ambitious large-scale projects have been proposed to make all written material available online in digital form. Universities initiated the Million Book Project, and industry initiated projects such as Google Books Library, in order to make this goal achievable; many challenges still need to be handled in the processing of these documents [4]. The main purpose of a digital library is to consolidate documents that are spread across the globe and enable access to their digital contents. Optical Character Recognition (OCR) technology has helped in converting document images into machine editable format. Even though OCR systems adequately recognize printed documents, the recognition of handwritten documents is not completely reliable and is still an open challenge for researchers. Inaccurate recognition is due to many factors, such as scanning errors, lighting conditions and the quality of the documents; further inaccuracies stem from the age of the documents and the condition of the materials they are inscribed upon. Operations that can be performed on document images include: pre-processing of the noisy image, enhancement of the low contrast image, de-blurring of the blurred image, estimation of the skew introduced during image acquisition, segmentation of the document image into lines, words and characters, and recognition of the characters.

Historical documents contain vital information about our ancestors, encompassing every aspect of their life, religion, education and so on. They are inscribed or printed on a variety of materials and differ substantially from documents prevalent today, mainly because of major differences in layout structure. Due to this variable structure, extraction of the contents of historical documents is a complicated task. Additional complexity is posed by the various states of degradation in which historical documents are found; the primary causes of this degradation are factors such as aging, faint typing, ink seepage and bleeding, holes, spots, ornamentation and seals. Historical documents also exhibit abnormalities such as narrowly spaced lines (with overlapping and touching components) and unusual, varying character and word shapes, due to differences in writing techniques and variations in location and the particular period in which they were drafted. These problems complicate segmenting the document image into lines, words and characters, which is required in order to extract characteristic features for recognition. Thus, the removal of noise in the input document image and the segmentation of the document image into lines, words and characters are important factors in improving the efficiency of OCR. Since the processing of degraded documents plays a significant role in deciding the overall result of the recognition system, it is essential that it be handled effectively. With this background, in this thesis we explore efficient image enhancement algorithms to enhance degraded historical document images, segment the enhanced images into lines, words and characters, and predict the era to which the documents belong.
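To give the segmentation step a concrete shape before its detailed treatment in Chapter 5, the sketch below shows the classic horizontal projection profile idea that underlies the first proposed segmentation method; the binarization convention and the zero-ink threshold are simplified assumptions.

    import numpy as np

    # Minimal sketch of horizontal-projection-profile line segmentation.
    # Input convention (text pixels = 1, background = 0) and the simple
    # "zero rows separate lines" rule are assumptions of this sketch.
    def segment_lines(binary: np.ndarray) -> list:
        """Return (start_row, end_row) pairs, one per text line."""
        profile = binary.sum(axis=1)          # ink count per row
        rows = profile > 0                    # rows containing any text
        lines, start = [], None
        for r, has_ink in enumerate(rows):
            if has_ink and start is None:
                start = r                     # a text line begins
            elif not has_ink and start is not None:
                lines.append((start, r - 1))  # the line ended at r-1
                start = None
        if start is not None:
            lines.append((start, len(rows) - 1))
        return lines

    # Toy usage: two "lines" of ink separated by a blank row.
    img = np.array([[1, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
    print(segment_lines(img))  # -> [(0, 1), (3, 3)]

On clean printed pages this rule suffices; the piecewise variant of Chapter 5 exists precisely because, on historical documents with uneven and multiply skewed lines, a single whole-page profile blurs the gaps between lines.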
In this thesis, the terms document images and documents are used interchangeably to refer to historical document images.

In the next section, we present a brief introduction to historical documents, their relevance and the need for their preservation. The following section presents the motivation for the research work, with a brief introduction to document image processing techniques: data acquisition/collection, pre-processing, segmentation, feature extraction and recognition. The contribution of the research work and the organization of the thesis are presented in the last two sections.

1.2 Historical Documents

Written scripts have been the primary mode of communication and information storage for thousands of years. Prehistoric humans inscribed on stones, rocks and cave walls. While some of these inscriptions served as a means of communication, others carried a religious or ceremonial purpose. Over the ages, evolving from primitive media such as stones and rocks, materials like palm or birch leaves, cloth and paper became prevalent mediums for information storage. In later centuries, they were predominantly used to record information about education, religion, health and socio-political advancement. These ancient artifacts are conventionally referred to as historical documents and are a crucial part of any nation's cultural heritage. Among the sample images, Figure(1.1) and Figure(1.2) are stone inscriptions of the 6th and 13th centuries, and Figure(1.3) is a palm leaf document.

According to Sircar [5], it has been confidently estimated that about 80 percent of all knowledge of Indian history before the 10th century A.D. has been derived from inscriptional sources. Inscriptions are commonly found on the walls of caves, pillars, large rocks, metal plates, coins and similar surfaces. The remarkable durability of these materials led our ancestors to record on them information vital for future generations. Many of these inscriptions were made to preserve accounts of battles and to recognize acts of bravery and courage. Common types include edicts of the rulers (achievements of rulers), eulogies (praise of persons honored with awards), and commemorative inscriptions, this last type again having five subcategories: donatory inscriptions, hero stones, Sathi stones, epitaphs (inscriptions on tombs) and miscellaneous.
Figure 1.1: 6th Century Ganga Dynasty Inscription.

These inscriptions comprise not only text and characters but also paintings and carvings of humans, animals, nature and spiritual deities. An expert is required to study and decipher their contents in the context in which they were envisioned in a particular era. The study of such inscriptions is known as epigraphy, and an expert involved in deciphering inscriptions is known as an epigraphist. The inscriptions on rocks, stones, caves and metals are vital resources which enlighten the present generation about our past [6].

Stones, rocks and metals were also used to inscribe significant community messages. Detailed information and stories, however, could not be inscribed on materials like rocks and stone, so early ancestors used palm leaves and birch leaves as a medium for imparting such information. These comprise mythological stories, spiritual teachings and knowledge spanning a plethora of fields: science, education, politics, law, medicine, literature and more. It has been estimated that India has more than ten million palm and birch leaf documents available in various conditions. The literature reveals that the first use of paper discovered through excavations was in China, from the 2nd century BC [7]. People in India started writing on paper during the 17th century.
Figure 1.2: 13th Century Hoysala Dynasty Inscription.

As these documents contain vital information pertaining to our past and are reminiscent of our cultural integrity, there is a dire need to preserve them and prevent any further degradation. It is rightly said that a nation or society which does not know its heritage cannot fully comprehend its present and hence is unable to shape its future. This heritage encompasses almost every aspect of human inquiry, be it culture, spirituality, philosophy, astronomy, medicine, religion, literature or education, that prevailed during different ages [8]. The majority of the details about a civilization can be obtained from its ancient scriptures, which help in understanding the past.
Figure 1.3: Inscriptions on palm leaf belonging to the 16th-18th century.

Since these documents have degraded due to various factors (weather conditions, fossil deposition, fungus attacks, wear and tear, strain and stain, brittleness due to dry weather, ink seepage, bleed-through, scratches, etc.), they cannot be preserved in their original form for a prolonged duration. Therefore, automated tools are required to capture the documents, enhance the document images, recognize the era to which they belong and finally convert them into digital form for long-term preservation. In our research work, we have considered Kannada historical document images for experimentation; information about the Kannada script and its evolution is provided in the next section.

1.2.1 Kannada Scripts/Character

In South East Asia and East Asia, including India, inscriptions are found in one of three scripts, namely Indus Valley, Brahmi and Kharosthi. The Kannada script, a South Indian language script, is one among the many evolved versions of Brahmi; the stone inscription in Figure(1.4) is an instance of Kannada script inscribed during the 3rd century BC. The evolution of the script has brought changes in its structure and shape, mainly due to factors such as writing materials, writing tools, methods of inscribing and the background of the inscriber [9], [10], [11]. The Kannada script has a history of more than 2000 years and has taken shape from the early Brahmi script to present-day Kannada, as shown in Table(1.1).
Figure 1.4: Stone inscription belonging to the 3rd century BC.

It has undergone various changes and modifications during the dynasties of Satavahana (2nd century A.D.), Kadamba (4th-5th century A.D.), Ganga (6th century A.D.), Badami Chalukya (6th century A.D.), Rashtrakuta (9th century A.D.), Kalyani Chalukya (11th century A.D.), Hoysala (13th century A.D.), Vijayanagara (15th century A.D.) and Mysore (18th century A.D.). Since experts are few in number and fast decreasing, it is the need of the hour to preserve these inscriptions and automate the process of deciphering them.

1.3 Motivation for the Research Work

Historical documents are national treasures and provide valuable insight into past cultures and civilizations, the significance of which has been extensively discussed in the previous sections. The preservation of these documents is of vital importance and is being strenuously carried out with the help of an assortment of advanced tools and technologies. These documents are being digitized, processed and preserved using a noteworthy set of image processing and pattern recognition techniques. The major steps involved in the processing of an image are: image acquisition/collection, preprocessing, segmentation, feature extraction and recognition [12], [13]. These and other related works are discussed in the following subsections, and the sketch below illustrates how the steps fit together.
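The skeleton below is purely illustrative: every function is a placeholder standing in for a full chapter of this thesis, and all names, signatures and trivial bodies are assumptions rather than interfaces defined here.

    import numpy as np

    # Illustrative pipeline skeleton for the stages named above; each
    # placeholder stands in for a chapter of the thesis.
    def preprocess(image: np.ndarray) -> np.ndarray:
        """Denoise and enhance the raw document image (Chapters 3-4)."""
        return image

    def segment(image: np.ndarray) -> list:
        """Split the enhanced image into character images (Chapter 5)."""
        return [image]

    def extract_features(char_image: np.ndarray) -> np.ndarray:
        """Compute a feature vector for one character (Chapter 6)."""
        return char_image.ravel().astype(float)

    def recognize(features: np.ndarray) -> str:
        """Classify the character, or its era, from features (Chapter 6)."""
        return "unknown"

    def process_document(image: np.ndarray) -> list:
        enhanced = preprocess(image)
        return [recognize(extract_features(c)) for c in segment(enhanced)]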
Table 1.1: Evolution of the Kannada character 'a' (the glyph images are not reproduced here)

    Ashoka, 3rd century BC
    Satavahana, 2nd century AD
    Kadamba, 4th-5th century AD
    Ganga, 6th century AD
    Badami Chalukya, 6th century AD
    Rashtrakuta, 9th century AD
    Kalyani Chalukya, 11th century AD
    Hoysala, 13th century AD
    Vijayanagara, 15th century AD
    Mysore, 18th century AD

These digitized documents are inscribed/written in Kannada, which is the regional and official language of Karnataka. About 2700 digitized document images were considered for our study. The majority of these are palm leaf documents; the rest are paper documents and stone inscriptions, spanning different eras from the 13th to the 19th centuries. Since these images were collected using different setups, i.e. either a camera
or a scanner, the particular resolution details are unavailable. Differences in setup cause significant variations in image size and resolution and introduce complexities in setting up parameter values for experimentation. Therefore, each image set has to be manually inspected and adjusted to obtain a suitable image and character size. The image set consists of documents inscribed by different individuals, and the length of the palm leaves used also varies across the collection.

Paper documents are categorized into two groups: good-quality images and noisy images. Unevenly illuminated, brown-colored and low-contrast paper images without spots, stains or smears are grouped under good-quality images. Images with spots, stains, smears or smudges, varying amounts of background noise, wrinkles due to humidity, illumination variation, ink seeping from the other side of the page, oily pages, thin pen strokes, breaks, dark lines due to folding, de-coloring, etc. are grouped under noisy images. Approximately 200 paper documents were collected at varying resolutions. During experimentation, each image is divided into smaller images depending on its overall size, typically into 512 × 512 tiles. Higher-resolution images are resized and then divided; lower-resolution images are divided without resizing. Very large images cannot be processed in one piece owing to hardware constraints, so they have to be divided into smaller images. More than 500 images were thus created from the 200 originals.

Palm leaves are classified into two groups, viz. degraded and severely degraded. Leaves with low contrast due to repeated application of preservatives (oil), stains due to uneven application of oils, accumulation of dust, and holes introduced by tying the leaves together are classified as degraded. Leaves with dark and brown lines introduced by cracks, strains, breaks, wear and tear, and noise due to scanning errors are grouped under severely degraded; these documents are hard to enhance and segment. About 1000 palm leaf documents were collected, with sizes varying from 2 cm to 24 cm in length and 2 cm to 6 cm in width. Lengthy images (more than 10 cm in length) were resized and divided into smaller images based on the overall size and the character size within the document; such images were divided into two or three segments for subsequent experimentation.
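As an illustration of this tiling step, the following sketch (not the exact procedure used in this work; the file path and tile size are placeholders) splits a grayscale document image into 512 × 512 blocks after rounding its dimensions to tile multiples:

```python
import cv2

def tile_image(path, tile=512):
    """Split a grayscale document image into tile x tile blocks,
    resizing so that each dimension becomes a multiple of the tile size."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    new_h = max(tile, (h // tile) * tile)   # round down to a tile multiple
    new_w = max(tile, (w // tile) * tile)
    img = cv2.resize(img, (new_w, new_h))   # cv2.resize takes (width, height)
    return [img[y:y + tile, x:x + tile]
            for y in range(0, new_h, tile)
            for x in range(0, new_w, tile)]
```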
Approximately 2000 images were obtained from the 1000 originals. The percentage of degradation was found to be significantly higher in the earlier stone inscriptions, particularly those from the 3rd century BC to the 13th century AD. Capturing stone inscriptions under different lighting conditions creates illumination and intensity problems in addition to scratches, cracks, breaks and characters erased by wear and tear. Stone inscriptions therefore tend to be more severely degraded than palm leaves and paper, and it is difficult to enhance the entire image. Approximately 200 digitized images of stone inscriptions were collected. Even though more than 400 images were created out of these 200, we have considered only 200 resized samples for our study. Some of the sample paper, palm leaf and stone inscription images used for experimentation are shown in Appendix A, Appendix B and Appendix C.

1.3.2 Enhancement/Preprocessing

The primary objective of preprocessing is to improve image quality by adequately suppressing unwanted distortions and suitably enhancing those features of the image that are important for further processing. Even though we have a myriad of advanced photography and scanning equipment at our disposal, natural aging and perpetual deterioration have rendered many historical document images thoroughly unreadable. Aging has led to the deterioration of the writing media employed, through influences like seepage of ink, smearing along cracks, damage to the leaf from the holes used for binding the manuscript leaves, and other extraneous factors such as dirt and discoloration.

In order to suitably preserve these fragile materials, digital images are predominantly captured using High Definition (HD) digital cameras in the presence of an appropriate light source instead of platen scanners. Digitizing palm leaf and birch manuscripts poses a variety of problems: the leaves cannot be forced flat, the light sources used with digital cameras are usually uneven, and the very process of capturing a digital image of the leaf introduces many complications.
These factors lead to poor contrast between the background and the foreground text. Therefore, innovative digital image processing techniques are necessary to improve the legibility of the manuscripts. To sum up, historical document images pose several challenges to preprocessing algorithms, namely low contrast, nonuniform illumination, noise, scratches, holes, etc.

It has been observed from the literature that many spatially linear, nonlinear and spectral filters are used to denoise such images [14], [15], [16], [17]. Gatos et al. [18] proposed a noise reduction technique based on a low-pass Wiener filter together with an adaptive binarization method. Unsharp masking has been proposed to enhance edge detail in degraded documents [19]. Such filters eliminate noise and smoothen the image, but introduce a blurring effect. In degraded documents the text information is crucial for the subsequent stages of character recognition, and losing text information while smoothing is unacceptable. Therefore a suitable algorithm is required that eliminates noise without losing much of the textual content.

The literature survey reveals that very little work has been reported on Indian historical document processing, owing to the fact that preservation of ancient physical resources has become a priority only recently. India is a country of vast cultural heritage and is one of the largest repositories of cultural heritage in the world; it houses an estimated 5 million ancient manuscripts in various archives and museums throughout the country. The preservation of these resources was never a priority in the past, so large parts of them have either vanished or left the country, and even the ones which have survived have undergone massive degradation. Therefore the preservation of this historical heritage through digitization is of utmost importance. However, any degradation in the original document will be transferred directly to its digitized version, rendering it illegible. To improve the legibility of the document, images have to be preprocessed in order to obtain an enhanced copy. This warrants the development of novel image processing algorithms to preprocess the digitized images.
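By way of illustration only, and not as the exact pipeline of [18] or [19], the following sketch denoises a grayscale page with SciPy's adaptive Wiener filter and then applies a simple unsharp mask; the file names and parameter values are assumptions:

```python
import cv2
import numpy as np
from scipy.signal import wiener

img = cv2.imread("palm_leaf.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
denoised = wiener(img, mysize=(5, 5))            # adaptive local Wiener filter
blur = cv2.GaussianBlur(denoised.astype(np.float32), (0, 0), sigmaX=2.0)
# Unsharp mask: add back a scaled copy of the high-frequency detail.
sharp = np.clip(denoised + 1.5 * (denoised - blur), 0, 255).astype(np.uint8)
cv2.imwrite("enhanced.png", sharp)
```

As the text notes, the sharpening gain (here 1.5) trades edge enhancement against amplified noise, so it would have to be tuned per document class.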
1.3.3 Segmentation

Image segmentation is the process of splitting a digital image into multiple groups of pixels, each of which is assigned a unique label, so that pixels with the same label share certain visual characteristics. In general terms, it can be seen as simplifying the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to trace objects, boundaries and regions of interest. In the case of document images, segmentation refers to the extraction of lines, words and characters from the given document. Segmenting a document image into text lines and words is a critical phase on the way to unconstrained handwritten document recognition.

Extracting lines from handwritten documents is more complicated, as these documents contain non-uniform line spacing, narrow spacing between lines, scratches, holes and the other factors elaborated in the previous section on historical documents. Apart from variations of the skew angle between text lines or along the same text line, the existence of overlapping or touching lines, uneven character sizes and non-Manhattan layouts pose considerable challenges to text line extraction. Due to inconsistency in writing styles, scripts, etc., methods that do not use any prior knowledge but adapt to the properties of the document image, such as the ones proposed here, are more robust.

Line extraction techniques may be categorized as projection based, grouping based, smearing based and Hough transform based [20]. Global projection based approaches are very effective for machine-printed documents but cannot handle text lines with variable skew angles; they can, however, be applied for skew correction in documents with a constant skew angle [21]. Hough transform based methods handle documents with variation in the skew angle between text lines, but are not very effective when the skew of a text line varies along its width [22].

The best known segmentation algorithms are the following: X-Y cuts or projection profile based methods [23], the Run Length Smoothing Algorithm (RLSA) [24], component grouping [25], the document spectrum [26], constrained text lines [27], the Hough transform [28], [29], and scale space analysis [30]. All of the above segmentation algorithms are mainly devised for present-day documents. For historical and handwritten document segmentation, projection profiles [31], the Run Length Smoothing Algorithm [32], the Hough transform [33] and scale space analysis [34] are mainly used.
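A minimal sketch of the projection profile idea, assuming a binarized page with text pixels equal to 1; the 2% threshold and minimum band height are illustrative values, not ones taken from this work:

```python
import numpy as np

def segment_lines(binary, min_height=3):
    """Return (top, bottom) row bands for text lines in a binarized page
    (text = 1), using the horizontal projection profile."""
    profile = binary.sum(axis=1)                 # ink pixels in each row
    is_text = profile > 0.02 * profile.max()     # rows that contain text
    lines, start = [], None
    for y, text_row in enumerate(is_text):
        if text_row and start is None:
            start = y                            # a line band begins
        elif not text_row and start is not None:
            if y - start >= min_height:          # ignore spurious thin bands
                lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(is_text)))
    return lines
```

This global variant exhibits exactly the weaknesses discussed above: short lines give low peaks, and skewed or touching lines merge bands, which is what motivates piecewise and morphology-based alternatives.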
As segmentation of historical document images is another focus of our research work, a detailed literature survey is given in the next chapter, and the algorithms developed for line segmentation are detailed in Chapter 5.

1.3.4 Feature Extraction and Recognition

Feature extraction involves reducing the amount of resources required to describe a large set of data accurately. When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis involving a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm which overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods that construct combinations of the variables to get around these problems while still describing the data with sufficient accuracy. Features are used as input to classifiers in order to classify and recognize objects; to recognize a character, features have to be extracted from the segmented document.

The literature reveals a wide array of creative works in the diverse field of document image processing and recognition. Many authors have developed efficient algorithms for segmentation of documents into lines, words and characters [35], [36], and for feature extraction and classification of characters [37]. Feature extraction and recognition are important parts of a recognition system. The major feature extraction algorithms are based on structural features, statistical features and spectral methods. Structural features are based on topological and geometrical characteristics such as maxima and minima, reference lines, ascenders, descenders, strokes and their directions between two points, horizontal curves at top or bottom, cross points, end points, branch points, etc. [38]. A detailed literature survey on the enhancement, segmentation and recognition stages is presented in the next chapter.

Although significant efforts have been made to digitize historical content, the understanding of these documents is beyond the reach of the common man.
The underlying reason for this is that the character set has evolved and changed from ancient times to what it is now; the scripts/characters used to inscribe the contents are no longer prevalent. Hence expert knowledge is required to decipher these documents. In the present scenario, expert epigraphists are few in number and fast decreasing, which could lead to a major problem in deciphering these precious resources in the future. Hence there is a need to develop supplementary tools that recognize the era of a character, which in turn makes it possible to consult the corresponding character set and understand the document through applications of computer vision techniques.

Only a few authors have attempted to recognize Brahmi scripts and predict the corresponding era. Fewer still have worked on deciphering South Indian Kannada epigraphical (stone inscription) scripts and proposed algorithms for predicting the era of the script [8]. In our research work, palm leaf and paper manuscripts belonging to various eras are considered for predicting the era of the document, and the algorithms devised for era prediction are provided in Chapter 6.

1.4 Contribution

In this research work, the severe degradation of the documents has been addressed by developing spatial and frequency domain based algorithms. In the spatial domain, three algorithms have been designed, based on 1) Gray Scale Morphological Reconstruction (MR); 2) bilateral filtering; and 3) Non Local Means filtering in combination with morphological operations. In the frequency domain, two algorithms have been devised using the wavelet and curvelet transforms.

In the spatial domain, a gray scale morphological reconstruction technique is devised using gray scale opening and closing operations. Gray scale opening is applied to compensate for non-uniform background intensity and suppress bright details smaller than the structuring element, while the closing operation suppresses the darker details. This algorithm is further used as a background elimination method in combination with the remaining algorithms in this thesis. The method works well for images with less degradation.
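The following sketch illustrates background compensation through gray scale morphology in the spirit described above; it is not the thesis's MR algorithm itself, and the file name and structuring element size are assumptions:

```python
import cv2

gray = cv2.imread("inscription.png", cv2.IMREAD_GRAYSCALE)
# A structuring element larger than the stroke width: closing removes the
# dark text and leaves an estimate of the uneven bright background.
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (25, 25))
background = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, se)
# Background minus original leaves the text on a flattened field.
text = cv2.subtract(background, gray)
text = cv2.normalize(text, None, 0, 255, cv2.NORM_MINMAX)
```

The structuring element size is the key design choice: it must exceed the stroke width but stay smaller than the background variations to be removed.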
Severely degraded images are handled using a Bilateral Filter (BF) in combination with the gray scale morphological reconstruction technique. The bilateral filter based method, together with the MR algorithm, is employed to eliminate noise, enhance the contrast and remove the dark background. The bilateral filter is a non-linear filter which combines range filtering and domain filtering. A combination of the Non Local Means filter (NLMF) and the MR technique is employed in designing an enhancement algorithm that denoises the documents based on a similarity measure between non-local windows (a small sketch of both filters appears later in this section).

Since simple spatial domain techniques cannot handle all types of degradation, it becomes necessary to transform the problem into another domain to get better results. An attempt has been made to eliminate the noise using frequency domain based methods. An algorithm based on the wavelet transform is devised to analyse and enhance the image. Since the wavelet transform is unable to handle curve discontinuities, an extension known as the curvelet transform is used to design the second algorithm for enhancing degraded documents.

Due to the presence of uneven spacing, curved lines and touching lines in a historical document, segmentation becomes quite complicated. To address this problem, two segmentation algorithms have been proposed. The first algorithm, based on a piecewise projection profile, is suitable for extracting curved lines but fails to segment touching lines. Therefore a second algorithm, based on mathematical morphology and Connected Component Analysis (CCA), is developed; it segments touching as well as curved lines. An extended version of this combined morphology and CCA approach is designed to detect and correct skewed lines within the document. Handwritten documents usually contain uneven spacing, causing skewed lines, and detecting and correcting the skew of individual lines makes the segmentation task simpler.

The segmented characters are used in the further stages of image processing, viz. feature extraction, recognition and classification. To recognize and classify the characters, features of the individual characters have to be extracted and used as input to the classifiers.
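For concreteness, the sketch referred to above: OpenCV ships ready-made versions of both spatial filters, applied here to a grayscale manuscript image. The parameter values are illustrative, and this is not the combined MR-based method developed in the thesis:

```python
import cv2

gray = cv2.imread("manuscript.png", cv2.IMREAD_GRAYSCALE)
# Bilateral filter: Gaussian weights in space AND in intensity,
# so smoothing stops at strong edges such as character strokes.
smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=50, sigmaSpace=7)
# Non-local means: averages patches that look alike anywhere in the
# search window, which preserves repeated stroke structure well.
denoised = cv2.fastNlMeansDenoising(gray, h=12,
                                    templateWindowSize=7, searchWindowSize=21)
```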
In this research work, recognizing the era of a character is taken up so that the character set belonging to that era can be used to decipher the document. Hence, algorithms for era prediction of the segmented characters are devised using the curvelet transform.

1.5 Organization of the Thesis

The thesis is organized into seven chapters. Chapter one provides an introduction to historical document image processing, the motivation for the research and the contributions of the thesis. Chapter two presents the literature survey. Chapter three provides the algorithms designed using spatial domain techniques. Chapter four explains the algorithms developed to enhance historical document images using frequency (wavelet) domain techniques. Chapter five presents the algorithms developed for segmenting handwritten documents into lines and characters, together with the skew detection and correction algorithms. Chapter six deals with the development of algorithms for feature extraction and recognition of the era of a character. Chapter seven provides the conclusion and the future scope of the work.
Chapter 2

Literature Survey

2.1 Computer Vision

The visual system has been the greatest source of information for all living things since the beginning of history. To interact effectively with the world, a vision system must be able to extract, process and recognize a large variety of visual structures from captured images [1]. The well-known saying that one picture is worth a thousand words describes the importance of visual data. Visual information transmitted in the form of digital images is becoming a major method of communication in the present scenario, and this has given rise to a field of computer technology known as Computer Vision [2]. It is a rapidly growing field with increasing applications in science and engineering, charged with developing machines that can perform the visual functions of the eye. It is mainly concerned with modeling and replicating human vision using computer software and hardware [12], [13], and it combines knowledge from many fields of engineering in order to understand and simulate the operation of the human visual system.

Computer vision finds applications in various fields such as the military, medicine, remote sensing, forensic science and transportation. Some of these applications are content based image retrieval, automated image and video annotation, semantics retrieval, document image processing, mining, warehousing, augmented reality, biometrics, non-photorealistic rendering and knowledge extraction. These applications involve various subfields of Computer Vision such as Digital Image Processing, Pattern
Classification and/or Object Recognition, Video Processing, Data Mining and Artificial Intelligence. These subfields are required to process image/video data in various combinations to obtain the desired output.

One subfield of computer vision is Document Image Analysis and Recognition (DIAR), which aims to develop techniques to automatically read and understand the contents of documents by machine. A DIAR system consists of four major stages: document image acquisition, image preprocessing, feature extraction and recognition. Document image acquisition deals with capturing the document image using scanners and cameras. Image preprocessing mainly deals with noise elimination, restoration and segmentation. Feature extraction deals with extracting the characteristic features of the segmented character (document) for recognition. Pattern recognition or classification recognizes the object/pattern in the image using the extracted features. In our research work, algorithms for image enhancement of historical documents, segmentation of the documents and prediction/recognition of the era of the document are presented, and a detailed literature survey of these topics is given in the following sections.

2.2 Preprocessing and Segmentation

Degraded documents often create problems in acquiring good quality images. In large-volume document digitization projects, the main challenge is to automatically decide on the correct and proper enhancement technique; image enhancement techniques may adversely affect image quality if applied to the wrong image. Boutros [39] proposed a prototype which can automate the image enhancement process. It is clear that the quality of image acquisition affects the later stages of document image processing; hence proper image preprocessing algorithms are needed.

Ideally, text line extraction would segment document images free of background noise and non-textual elements. In practice, it is very difficult to obtain document images without noise, and some preprocessing needs to be performed before segmentation. Non-textual elements around the text, such as book bindings, book edges and parts of
fingers, should be removed. On the document itself, holes and stains may be removed by high-pass filtering. Other non-textual elements (stamps, seals) as well as ornamentation and decorated initials can be removed using knowledge about the shape, the color or the position of these elements. Extracting text from figures (text segmentation) can also be performed on texture grounds [40], [41], or by morphological filters.

Intensive research can be found on the development of algorithms based on text line distortion [42], [43], [44]. These methods are aimed at correcting the nonlinear folding of documents; folding (warping) can sometimes become serious enough that the contents of the document become unreadable. Fan et al. [45] proposed a hybrid method combining two cropping algorithms, the first based on line detection and the second based on text region growing, to achieve robust cropping.

Jayadevan et al. [46] presented a survey on bank cheque processing which covers many aspects of document image processing. Almost all documents that are part of any organization, viz. business letters, newspapers, technical reports, legal documents and bank cheques, need to be processed to extract information. The authors discussed various aspects of cheque processing techniques. As cheques are scanned under various conditions, low contrast, slant and tilt are common problems; cheques may also contain scratches, lines and overwritten ink marks on the cheque leaf. These create problems in recognizing the correct date, account number, amount, cheque number, etc. Cheque writers also often cross the printed text lines and write above them.

Suen et al. [47] proposed a method to process bank cheques in which the image was first smoothed using a mean filter and the background was then eliminated through iterative thresholding. Madasu and Lovell [48] proposed a bank cheque processing method based on gradient and Laplacian values, which are used to decide whether an image pixel belongs to the background or the foreground. The binarization approach proposed in [49] was based on Tsallis entropy to find the best threshold value, with histogram specification adopted for preprocessing some images. To eliminate the background from the cheque image in [51], a stored background sample image was subtracted from the skew-corrected test image. A background subtraction method
was adapted to extract written information from Indian bank cheques, with erosion and dilation operations used to eliminate residual background noise. Logical smearing based on the end-point coordinates of detected lines was applied to deal with broken lines in [57].

Binarization is a very important step in any recognition system, and a lot of work on finding a suitable threshold value for binarization can be found in the literature. Sahoo et al. [52] compared the performance of more than 20 global thresholding algorithms using uniformity and shape measures; the comparison showed that Otsu's class separability method [53] performed best. Sezgin and Sankur [54] discussed various thresholding techniques in their survey paper. The binarization algorithm proposed in [55] defines an initial threshold value using the desired percentage of black pixels in the final binarized image; to improve the efficiency of the algorithm, a cubic function was used to relate the initial threshold value to the final one. In [56], the binarization of the gray-scale image was done with a threshold value calculated dynamically from the number of connected components in the area of the courtesy amount.

Slant/skew is the deviation of handwritten strokes from the vertical direction (Y-axis) due to different writing styles. Skew may also be introduced while scanning documents and can be detected by finding the angle that the baseline makes with the horizontal direction. It has to be detected and corrected for successful segmentation and recognition of handwritten input. Skew correction is done by simply rotating the image in the opposite direction by an angle equal to the inclination of the guidelines. A comprehensive survey of different skew detection techniques is reported in [50]. Due to the presence of guidelines, the histogram bin with the longest peak corresponds to the skew of the image. To correct the rotation and translation occurring during image acquisition, a method based on projection profiles was used in [51].
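A rough sketch of the rotation-based skew correction described above, assuming a binary uint8 image with ink pixels set to 1; note that OpenCV's minAreaRect angle convention varies between versions, so the angle mapping below may need adjustment:

```python
import cv2
import numpy as np

def deskew(binary):
    """Rotate a binary image (ink = 1) to undo its estimated global skew."""
    ys, xs = np.nonzero(binary)
    pts = np.column_stack((xs, ys)).astype(np.float32)
    angle = cv2.minAreaRect(pts)[-1]   # angle of the tightest bounding box
    if angle > 45:
        angle -= 90                    # map to the range (-45, 45]
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_NEAREST)
```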
Kim and Govindaraju [58] proposed a chain code representation for calculating the slant angle of handwritten information. In [59] and [60], the average slant of a word was determined by an algorithm based on the analysis of slanted vertical histograms [61]; the heuristic for finding the average slant was to search for the greatest positive derivative across all the slanted histograms, and the slant was then corrected through a shear transformation in the opposite direction. In [62] and [63], the slant of handwritten information was computed using the histogram of the directions of the contour pixels.

Many techniques have been developed for page segmentation of printed documents, viz. newspapers, scientific journals, magazines and business letters produced with modern editing tools [64], [65], [66], [26]. The segmentation of handwritten documents has also been addressed, for the segmentation of address blocks on envelopes and mail pieces [68], [67], [69], [70] and for authentication or recognition purposes [71], [72].

Various methods are available for text line extraction. One of the fundamental methods is the projection profile method, used for printed documents and for handwritten documents with proper spacing between lines. The vertical projection profile is obtained by summing pixel values along the horizontal axis for each y value. The profile curve can be smoothed by a Gaussian or median filter to eliminate local maxima [34], and then analyzed to find its maxima and minima. There are two drawbacks: short lines produce low peaks, and very narrow lines, as well as those including many overlapping components, do not produce significant peaks. In the case of skew or moderate fluctuations of the text lines, the image may be divided into vertical stripes and profiles sought inside each stripe [73]; these piecewise projections are a means of adapting to local fluctuations within a more global scheme.

In the work of Shapiro et al. [74], the global orientation or skew angle of a handwritten page was first found by applying a Hough transform to the entire image; once this skew angle was obtained, projections were computed along this angle. The number of maxima of the profile gives the number of lines. Low maxima were discarded based
on their value, which was compared to the highest maximum. Lines were delimited by strips, searching for the minima of the projection profile around each maximum. In the work of Antonacopoulos and Karatzas [75], each minimum of the profile curve was a potential segmentation point. Potential points were then scored according to their distance to adjacent segmentation points, with the reference distance obtained from the histogram of distances between adjacent potential segmentation points. The highest-scored segmentation point was used as an anchor to derive the remaining ones. The method was applied to printed records of the Second World War, which have regularly spaced text lines; the logical structure was used to derive the text regions where the names of interest can be found. The RXY cuts method applied by He and Downton [31] uses alternating projections along the X and Y axes, resulting in a hierarchical tree structure. Cuts are found within white spaces, and thresholds are necessary to derive inter-line or inter-block distances. This method can be applied to printed documents (which are assumed to have these regular distances) or to well-separated handwritten lines.

For printed and binarized documents, smearing methods such as the Run-Length Smoothing Algorithm [76] can be applied. Consecutive black pixels along the horizontal direction are smeared: the white space between them is filled with black pixels if their distance is within a predefined threshold, and the bounding boxes of the connected components in the smeared image then enclose text lines. A variant of this method, adapted to gray-level images and applied to printed books from the sixteenth century, consists in accumulating the image gradient along the horizontal direction [77]. The method has also been adapted to old printed documents within the Debora project [78]; for this purpose, numerous adjustments concern the tolerance for character alignment and line justification. Shi and Govindaraju [79] proposed a method for text line separation using fuzzy run lengths, which imitate an extended running path through a pixel of a document image.
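A minimal sketch of the horizontal run-length smoothing just described, for a binary image with ink pixels equal to 1; the gap threshold is an illustrative value:

```python
import numpy as np

def rlsa_horizontal(binary, max_gap=20):
    """Fill background runs shorter than max_gap between two ink pixels
    (ink = 1), so that words and lines merge into connected components."""
    out = binary.copy()
    for row in out:                      # each row is a view into out
        ink = np.nonzero(row)[0]         # column indices of ink pixels
        for left, right in zip(ink[:-1], ink[1:]):
            if 1 < right - left <= max_gap:
                row[left:right] = 1      # smear the short gap
    return out
```

Connected component analysis on the smeared image then yields the text line boxes, exactly as the paragraph above outlines.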
The Hough transform [28] is a very popular technique for finding straight lines in images, and it can also be applied to fluctuating lines in handwritten drafts [80]. An approach based on attractive-repulsive forces was presented by Oztop et al. [81]. It works directly on gray-level images and consists of iteratively adapting the y position of a predefined number of baseline units. Baselines are constructed one by one from the top of the image to the bottom; pixels of the image act as attractive forces for baselines, and already extracted baselines act as repulsive forces. Tseng and Lee [82] presented a method based on a probabilistic Viterbi algorithm, which derives non-linear paths between overlapping text lines. In the method of Likforman-Sulem et al. [33], touching and overlapping components are detected using the Hough transform. Pal and Datta [35] proposed a line segmentation method based on a piecewise projection profile.

Some solutions for separating units belonging to several text lines can be found in the literature on recognition. In Bruzzone and Coffetti's method [83], the contact point between ambiguous strokes was detected and processed from their external border; an accurate analysis of the contour near the contact point was performed in order to separate the strokes according to two registered configurations, a loop in contact with a stroke or two loops in contact. Khandelwal et al. [84] presented a methodology based on comparing neighboring connected components to check whether text belongs to the same line or not; components less than the average height are ignored and addressed later in postprocessing. A new algorithm for segmenting overlapping lines and multi-touching components was proposed by Zahour et al. [85] using a block covering method with three steps: the first step classifies the document using fractal analysis and the fuzzy C-means algorithm, the second classifies the blocks using a statistical analysis of block heights, and the last is a neighborhood analysis for constructing text lines. Fractal analysis and a fuzzy C-means classifier were used to determine the type of the document with high accuracy.

Bloomberg's [87] text line segmentation algorithm was specially designed for separating text and halftone images in a document image, but it was unable to discriminate between text and drawing-type non-text components and therefore fails to separate them from each other.
Hence Syed et al. [88] presented a method that overcomes the limitations of Bloomberg's algorithm and is able to separate text and non-text regions properly, including halftones, drawings, maps and graphs. Bansal and Sinha [89] proposed an algorithm based on the structural properties of the Devanagari script, implemented in two passes: 1) words were segmented into characters/composite characters; 2) the height and width of the character box were used to check whether the segmented character is single or composite. Ashkan et al. [90] proposed a skew estimation algorithm using an eigenvalue technique to detect and correct skew in the document.

2.2.1 Enhancement of Historical Document Images

Ancient and historical documents differ strongly from recent documents because their layout structure is completely different. As these documents have variable structure, extracting their contents is complicated. Besides, historical documents are degraded in nature due to ageing, faint typing, ink seepage and bleed-through, and they include various disturbances like holes, spots, ornamentation or seals. Handwritten pages include narrowly spaced lines with overlapping and touching components, and characters and words have unusual and varying shapes, depending on the writer, the period and the place.

Relatively good progress can be found in the area of historical document image processing. Shi and Govindaraju [91] proposed a method for enhancing degraded historical document images using background light normalization: the method captures the background intensity with a best-fit linear function and normalizes the image with respect to this approximation. Shi and Govindaraju [92] also proposed a method for segmenting historical document images using background light intensity normalization. Yan and Leedham [93] proposed a thresholding technique for the binarization of historical documents which uses local feature vectors for analysis.

Gatos et al. [18] presented a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter
tuning by the user and can deal with degradations which occur due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. It consists of several distinct steps: a preprocessing procedure using a low-pass Wiener filter, a rough estimation of foreground regions, background surface calculation by interpolating neighboring background intensities, thresholding by combining the calculated background surface with the original image while incorporating image up-sampling, and finally a post-processing step to improve the quality of text regions and preserve stroke connectivity.

Gatos et al. [95] presented a new approach to document image binarization based on the combination of several state-of-the-art binarization methodologies as well as the efficient incorporation of the edge details of the gray scale image. An enhancement step based on mathematical morphology operations is also involved, in order to produce a high quality result while preserving stroke information. The method demonstrated superior performance against six well-known techniques on numerous degraded handwritten and machine-printed documents.

Shi et al. [96] proposed methods for enhancing digital images of palm leaf and other historical manuscripts. They approximated the background of a gray-scale image using piecewise linear and nonlinear models, and used normalization algorithms on the color channels of the palm leaf image to obtain an enhanced gray-scale image. Experimental results showed significant improvement in readability. An adaptive local connectivity map was used to segment lines of text from the enhanced images, with the objective of supporting techniques such as keyword spotting or partial OCR and thereby making it possible to index these documents for retrieval from a digital library.

Probabilistic models for a text extraction algorithm from degraded document images were presented in [86]. The document image is considered a mixture of Gaussian densities corresponding to the groups of pixels belonging to the foreground and background of the document image. The Expectation Maximization (EM) algorithm is used to estimate the parameters of the Gaussian mixture, and using these parameters the image is divided into two classes, text foreground and background, using a maximum likelihood approach.
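The following sketch illustrates this kind of two-class Gaussian mixture separation using scikit-learn, whose GaussianMixture fits the mixture by EM internally; it is an illustration of the idea in [86], not a reproduction of that method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_binarize(gray):
    """Fit a two-component Gaussian mixture to the pixel intensities (via
    EM) and label the darker component as text foreground."""
    x = gray.reshape(-1, 1).astype(np.float64)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    labels = gmm.predict(x).reshape(gray.shape)       # ML class per pixel
    text_component = int(np.argmin(gmm.means_.ravel()))  # darker mean = ink
    return labels == text_component
```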
2.2.2 Segmentation of Historical Documents

Louloudis et al. [94] presented a new text line detection method for unconstrained handwritten documents. The technique consists of three distinct steps: the first includes preprocessing for image enhancement, connected component extraction and average character height estimation; in the second, a block-based Hough transform is used for the detection of potential text lines; and the third corrects possible false alarms. The performance of the methodology was measured with a consistent and concrete evaluation technique that relies on comparing the text line detection result with the corresponding ground truth annotation.

Surinta and Chamchong [36] presented a paper on image segmentation of historical handwriting in palm leaf manuscripts. The process is composed of the following steps: background elimination separating text from background by Otsu's algorithm, followed by line segmentation and character segmentation based on image histograms.
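Otsu's algorithm, used above for background elimination, picks the global threshold that maximizes the between-class variance of the intensity histogram; a minimal sketch with OpenCV, where the file name is a placeholder:

```python
import cv2

gray = cv2.imread("palm_leaf.png", cv2.IMREAD_GRAYSCALE)
# With THRESH_OTSU, OpenCV ignores the dummy threshold (0) and chooses the
# value maximizing between-class variance; INV makes the dark ink white.
t, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```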
Shi et al. [97] presented a new text line extraction method for handwritten Arabic documents, based on a generalized adaptive local connectivity map computed with a steerable directional filter. The algorithm was designed to solve particularly complex problems seen in handwritten documents, such as fluctuating, touching or crossing text lines.

Nikolaou et al. [98] presented a method towards the development of efficient techniques for segmenting document pages resulting from the digitization of historical machine-printed sources. To address the problems posed by degraded documents, they implemented an algorithm with the following steps: first, an Adaptive Run Length Smoothing Algorithm (ARLSA) handles the problem of dense and complex document layout; the second step detects the noise areas and punctuation marks that are usually present in historical machine-printed documents; the third detects possible obstacles created by background areas, to separate neighboring text columns or text lines; and the last step performs segmentation using segmentation paths in order to isolate possibly connected characters.

The enhancement of documents with ink bleed-through using a recursive unsupervised classification technique was proposed by Fadoua et al. [99]. The method recursively performs the K-means algorithm on the degraded image together with principal component analysis of the document image; cluster values are taken and back-projected onto the space. The iterative method uses a logarithmic histogram and separates background and foreground with K-means until a clear separation of the background and foreground of the document is achieved. Kishore and Rege [19] used unsharp masking to enhance edge detail in degraded documents. Gatos et al. [100] proposed a method based mainly on the combination of several state-of-the-art binarization methodologies and on the efficient incorporation of the edge information of the gray scale source image; an enhancement step based on mathematical morphology operations is also involved in order to produce a high quality result while preserving stroke information. Halabi and Zaid [101] presented an enhanced system for degraded old documents which is able to deal with degradations that occur due to shadows, non-uniform illumination, low contrast and noise. Ferhat et al. [102] proposed image restoration using Singular Value Decomposition and restored even blurred images.

Lu and Tan [14] proposed a technique which estimates the document background surface using an iterative polynomial smoothing procedure. Various types of document degradation are then compensated using the estimated background surface intensity. Using the L1-norm image gradient, text stroke edges are detected from the compensated document image, and finally the document text is segmented by a local threshold estimated from the detected text stroke edges. Ntogas and Ventzas [15] proposed a binarization procedure consisting of five discrete image processing steps, for different classes of document images.
Badekas and Papamarkos [103] proposed a new method which estimates the best parameter values for each document binarization technique, as well as the best binarization result over all techniques. Likforman-Sulem et al. [16] presented a novel method for document enhancement which combines two recent, powerful noise-reduction steps: the first based on the total variation framework and the second based on Non-Local Means, whose computational complexity depends on the sizes of the patch and the search window.

Layout analysis is required to extract text lines and identify the reading order properly, which provides proper input to classifiers. A generic layout analysis for a variety of typed, handwritten and ancient Arabic document images was proposed in [104]. The system performs text and non-text separation, then text line detection, and lastly reading order determination; it can be combined with an efficient OCR engine for the digitization of documents. A considerable amount of work on the segmentation of historical documents can be found in [105]. Hénault et al. [106] proposed a method based on a linear level set concept for binarizing degraded documents; it takes advantage of local probabilistic models and a flexible active contour scheme. In the next section, we present a detailed literature survey on character recognition.

2.3 Character Recognition

The history of character recognition can be traced back as far as 1940, when the Russian scientist Tyuring attempted to develop an aid for the visually handicapped [107]. The first character recognizers appeared in the mid 1940s with the development of digital computers. Early work on the automatic recognition of characters concentrated either on machine-printed content or on a small set of well-distinguished handwritten texts or symbols. Machine-printed OCR systems in that period generally used template matching, in which an image is compared against a library of images. For handwritten text, low level image processing techniques were used on binary images to extract feature vectors, which were then fed to statistical classifiers. With the explosion of information technology, the previously developed methodologies found a very
fertile environment for rapid growth in many application areas, including OCR systems development [108], [109]. Structural approaches were initiated in many systems in addition to statistical methods [110], [111]. Character recognition research focused basically on shape recognition techniques without using any semantic information, which led to an upper limit on the recognition rate that was not sufficient for many practical applications. A historical review of OCR research and development during this period can be found in [112] for the offline and online cases, respectively.

Stubberud et al. [113] proposed a method to improve the performance of an optical character recognition (OCR) system by using an adaptive technique that restores touching or broken character images. Using the output from an OCR system and a distorted text image, this technique trains an adaptive restoration filter and then applies the filter to the distorted text image that the OCR system could not recognize.

Indian language character recognition systems are still at the research stage. Most of the research work is concerned with Devanagari and Bangla script characters, the two most popular scripts in India. Research work on Bangla character recognition started in the early 90s. Chaudhuri and Pal [114] have discussed different works on Indian script identification, as well as the various steps needed to improve Indian script OCR development, and have developed a complete OCR system for printed Bangla script. Their approach involved skew correction, segmentation and removal of noise, and a technique with feature and template matching was implemented for recognition, achieving a high recognition rate. Sural and Das [115] proposed a Hough transform based fuzzy feature extraction method for Bangla script recognition. Some studies have been reported on the recognition of other languages like Tamil, Telugu, Oriya, Kannada, Punjabi, Gujarati, etc.
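To make the template matching idea mentioned above concrete, here is a hypothetical sketch that scores a character image against a small library of templates using normalized cross-correlation; it assumes each template is no larger than the input image, and the function and dictionary are illustrative constructions, not any cited system:

```python
import cv2

def best_match(char_img, templates):
    """Score a character image against a dict of label -> template images
    and return the label with the highest normalized correlation."""
    scores = {}
    for label, tmpl in templates.items():
        result = cv2.matchTemplate(char_img, tmpl, cv2.TM_CCOEFF_NORMED)
        scores[label] = float(result.max())   # best alignment for this label
    return max(scores, key=scores.get)
```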
Pal et al. [116] presented an OCR with an error detection and correction technique for a highly inflectional Indian language, Bangla. The technique was based on morphological parsing: using two separate lexicons of root words and suffixes, the candidate root-suffix pairs of each input string are detected, their grammatical agreement is tested, and the root/suffix part in which the error occurred is noted. The correction is made to the corresponding error part of the input string by means of a fast dictionary access technique. Pal and Chaudhuri [117] proposed a system for classifying machine-printed and handwritten text lines, based on the structural and statistical features of the two kinds of text line; they achieved a recognition score of 98.6%. This technique used string features extracted through row- and column-wise scanning of the character matrix.

Pal et al. [118] proposed a new method for the automatic segmentation of touching numerals using water reservoirs. A reservoir is a metaphor for the region where numerals touch; it is obtained by considering the accumulation of water poured from the top or from the bottom of the numerals. The touching position (top, middle or bottom) can be decided by considering the reservoir location and size. Then, by analyzing the reservoir boundary, the touching position and the topological features of the touching pattern, the best cutting point can be determined, and the cutting path for segmentation is generated by combining this with morphological structural features. Tree classifiers based on structural and topological features, and neural network classifiers, have been used for most Indian languages [119].

Some work on the recognition of Telugu characters can be traced in the literature. Elastic matching using eigen-deformations for handwritten character recognition was proposed by Uchida and Sakoe [120]; the accuracy of recognition was found to be 99.47%. The deformations within each character category are of an intrinsic nature and can be estimated by principal component analysis of the actual deformations automatically collected by the elastic matching. Pujari et al. [121] proposed an algorithm for Telugu character recognition that uses wavelet multiresolution analysis to extract features and an associative memory model to accomplish the recognition task. A multifont Telugu character recognition algorithm was proposed by Rasanga et al. [122] using the spatial histogram of orientation (HOG) feature.
Sastry et al. [123] implemented a methodology to extract and recognize Telugu characters from palm leaves using a decision tree concept. Human-machine interaction using optical character recognition for Devanagari scripts has been designed in [124]. Shelke and Apte [125] proposed a novel method for recognizing handwritten characters using feature extraction based on structural features, with classification done using their parameters; the final stage of feature extraction was done with the Radon transform, and classification was carried out with a combination of Euclidean distance, feed-forward and back-propagation neural networks. An extended version of their work employs feature extraction through the generation of kernels using the wavelet transform [126] and neural networks [127]. Malayalam character recognition was proposed by John et al. [128] using the Haar wavelet transform for feature extraction and a support vector machine as the classifier. Pal et al. [129] proposed a method to recognize unconstrained handwritten Malayalam numerals using the reservoir method; the main reservoir based features used were the number of reservoirs, the positions of the reservoirs with respect to the bounding box of the touching pattern, the heights and widths of the reservoirs, the water flow direction, etc. Topological and structural features were also used alongside the water reservoir features.

Nagabhushan and Pai [130] have worked in the area of Kannada character recognition. They proposed a method for the recognition of Kannada characters which can have spread in the vertical and horizontal directions. The method uses a standard-sized rectangle which circumscribes standard-sized characters. This rectangle is interpreted as a 2-dimensional, 3 × 3 structure of nine parts defined as bricks; the structure can also be read as three consecutively placed row structures of three bricks each, or as three adjacently placed column structures of three bricks each. Recognition is done using an optimal-depth logical decision tree developed during the learning phase and does not require any mathematical computation.

A printed Kannada character recognition system was designed by Ashwin and Sastry [131] using a zonal approach and a support vector machine (SVM). In their zonal approach, the character image is divided into a number of circular tracks and sectors.
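As a rough illustration of such a zonal scheme (not Ashwin and Sastry's exact formulation), the following sketch computes normalized foreground-pixel densities over circular tracks and sectors centred on the character image; the resulting vector could then be fed to an SVM classifier such as scikit-learn's SVC:

```python
import numpy as np

def zonal_features(char_img, n_tracks=3, n_sectors=8):
    """Normalized foreground-pixel densities over circular tracks and
    sectors centred on the character image (foreground = nonzero)."""
    h, w = char_img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.nonzero(char_img)
    if len(ys) == 0:
        return np.zeros(n_tracks * n_sectors)
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx) + np.pi          # range [0, 2*pi]
    r_bin = np.minimum((r / (r.max() + 1e-9) * n_tracks).astype(int),
                       n_tracks - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_sectors).astype(int),
                       n_sectors - 1)
    feats = np.zeros((n_tracks, n_sectors))
    np.add.at(feats, (r_bin, t_bin), 1)                   # count per zone
    return feats.ravel() / len(ys)
```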