optical character recognition system

OCR System
Presented By:- Vijay Apurva (9910103462), 4th year, CSE
Guided By:- Mr. Ankur Kulhari

ABSTRACT:-
Optical character recognition technology can now translate paper
documents quickly and accurately into machine-readable form, which
widens the opportunities for document searching and storing, as
well as for automated document processing. Translating large
collections of image-based electronic documents into structured
electronic documents with a fast response time is still a problem.
The large number of processing units available in Grid environments,
together with free optical character recognition tools, can be
exploited to produce a fast translation.
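The idea in the abstract of spreading a large collection over many processing units can be sketched as follows. This is only an illustration under stated assumptions: a thread pool stands in for Grid nodes, and `recognize_page` is a hypothetical placeholder for a call to a real OCR tool, not part of the presented system.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page_image: bytes) -> str:
    """Hypothetical stand-in for running an OCR tool on one page image."""
    # A real worker would call an OCR engine here; this placeholder
    # only decodes the bytes so that the sketch stays runnable.
    return page_image.decode("ascii").upper()


def recognize_collection(pages, workers=4):
    """Recognize pages independently and return the texts in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the input order even if pages finish out of order.
        return list(pool.map(recognize_page, pages))


if __name__ == "__main__":
    print(recognize_collection([b"page one", b"page two", b"page three"]))
```

Because each page is recognized independently, the work divides cleanly among workers; on a real grid the same pattern would hand pages to remote nodes instead of local threads.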

CONTENTS :-
 What is OCR?
 When and Why OCR?
 Existing System.
 Proposed System.
 Architecture of OCR.
 Algorithms of OCR.
 Modules of OCR.
 Design of OCR.
 Design of Screen shots for OCR.
 Conclusion.

WHAT IS OCR? :-
OCR stands for Optical Character Recognition. It is a
system that scans printed, typewritten or handwritten text
(numerals, letters or symbols) and converts the scanned image
into a computer-processable format, either as plain text or as
a word document.
 The converted documents can later be edited, used or reused
in other documents. Thus the documents become editable.

WHEN AND WHY OCR? :-
 OCR is used when retyping a paper document by hand to
recreate it in electronic form would take more time.
 The converted text files take less space than the original
image file and can be indexed. Hence OCR is an advantage to
users who have to convert large amounts of paperwork into
electronic form.
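To illustrate why indexable text matters, here is a minimal sketch of an inverted index over recognized documents. The function names and the sample documents are assumptions made for this example only; the slides do not describe how the system implements searching.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    docs = {1: "Invoice for paper", 2: "Paper documents scanned"}
    idx = build_index(docs)
    print(search(idx, "scanned paper"))
```

A scanned image cannot be searched this way until OCR has turned it into text, which is the advantage the slide refers to.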

EXISTING SYSTEM:-
Today there is a growing demand from users to convert
printed documents into electronic documents in order to keep
their data secure. Hence the basic OCR system was invented to
convert the data available on paper into computer-processable
documents, so that the documents can be edited and reused.

PROPOSED SYSTEM:-
Our proposed system is OCR ON A GRID
INFRASTRUCTURE, a character recognition system that
supports recognition of the characters of multiple languages. We
call this feature the grid infrastructure; it eliminates the
problem of heterogeneous character recognition. In this context,
grid infrastructure means an infrastructure that supports a
specific set of languages. Thus OCR on a grid infrastructure is
multilingual.

ARCHITECTURE :-
 The architecture of the optical character recognition system on a
grid infrastructure consists of three main components. They are:-
 Scanner
 OCR Hardware or Software
 Output Interface

[Figure: architecture block diagram. Scanner (Document, Illuminator,
Detector) -> Document image -> OCR Hardware or Software (Document
Analysis, Character Recognition, Contextual Processing) ->
Recognition Results -> Output Interface -> To application user]

TYPES OF TRAINING:-
There are two major types of training that can be used to train a
neural network system. They are:-
 Supervised Training
 Unsupervised Training

FLOWCHART FOR UNSUPERVISED LEARNING:-

KOHONEN NETWORK:-
The Kohonen network is presented with data, but the correct
output that corresponds to that data is not specified. Using the
Kohonen network this data can be classified into groups.
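The grouping behaviour described above can be sketched as winner-take-all training. This is a minimal illustration, not the authors' implementation: the learning rate of 0.3 matches the LearnRate value listed for KohenNetwork in the class diagram, while the function names, the random initialisation, the epoch count and the sample data are assumptions.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else list(vector)


def winner(weights, sample):
    """Index of the output neuron whose weight vector best matches the sample."""
    scores = [sum(w * x for w, x in zip(row, sample)) for row in weights]
    return scores.index(max(scores))


def train(samples, neuron_count, learn_rate=0.3, epochs=50, seed=0):
    """Unsupervised training: only the winning neuron moves toward each sample."""
    rng = random.Random(seed)
    size = len(samples[0])
    weights = [normalize([rng.random() for _ in range(size)])
               for _ in range(neuron_count)]
    data = [normalize(s) for s in samples]
    for _ in range(epochs):
        for sample in data:
            k = winner(weights, sample)
            moved = [w + learn_rate * (x - w) for w, x in zip(weights[k], sample)]
            weights[k] = normalize(moved)
    return weights


if __name__ == "__main__":
    samples = [[1.0, 0.1], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0]]
    net = train(samples, neuron_count=2)
    print([winner(net, normalize(s)) for s in samples])
```

No correct output is ever supplied during training; the two groups emerge because each neuron is pulled toward the samples it wins.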

FLOWCHART FOR KOHONEN TRAINING:-

ALGORITHMS OF OCR:-
TRAINING ALGORITHM:-
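One of the most common learning algorithms is called Hebb’s
Rule. This rule was developed to assist with unsupervised training.
 Hebb’s rule is expressed as:
\Delta w_{ij} = \mu \, a_i \, a_j
where \Delta w_{ij} is the change in the weight of the connection
from neuron i to neuron j, \mu is the learning rate, and a_i and a_j
are the activations of the two neurons. When a desired output d is
known, as in supervised training, the related delta rule replaces
the second activation with the error between desired and actual
output:
\Delta w_{ij} = \mu \, a_i \, (d_j - a_j)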
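A minimal sketch of the plain Hebbian update, in which each weight changes by the learning rate times the product of the two activations, with no error factor. The function name, the nested-list weight layout and the rate value are assumptions made for illustration.

```python
def hebbian_update(weights, pre, post, rate=0.1):
    """Return new weights after one Hebbian step.

    weights[i][j] connects neuron i (activation pre[i]) to neuron j
    (activation post[j]); each weight grows by rate * pre[i] * post[j].
    """
    return [
        [w + rate * a_i * a_j for w, a_j in zip(row, post)]
        for row, a_i in zip(weights, pre)
    ]


if __name__ == "__main__":
    start = [[0.0, 0.0], [0.0, 0.0]]
    print(hebbian_update(start, pre=[1, 0], post=[1, 1], rate=0.5))
```

Only connections whose two neurons are active together are strengthened, which is why the rule needs no target output.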

MODULES :-
The modules identified in the Optical Character
Recognition system are as follows:-
 Document Processing
 Neural Network System Training
 Document Recognition
 Document Editing
 Document Searching

DESIGN OF OCR :-
The design of our OCR system can be best explained
with the following diagram:-
[Figure: design diagram with the elements Scan, Store, Recognize,
Editing, Searching, and a Document and users Database]

OVERALL USECASE DIAGRAM:-
[Figure: use case diagram.
Actors: administrator, end-user, end-user1, end-user2.
Use cases: scan documents, store documents, Document processing,
Document recognition, Document editing, Document modification,
Document deletion, Trains the system.
Two <<includes>> relations are shown.]

OVERALL CLASS DIAGRAM:-
[Figure: class diagram. Classes with their attributes and operations:]
Document
  attributes: docid : integer, docname : String, docsize : integer,
    doctype : String
  operations: getDocumentDetails(), scanDocument(), covertToImage(),
    storeImage()
Editor
  operations: cut(), copy(), paste(), new(), open(), find()
HelpFrame
HEntry
  operations: hLineClear(), vLineClear(), findBounds()
TrainingSet
  attributes: inputCount : int, outputcount : int,
    trainingSetCount : int
  operations: setInputCount(), setOutputCount(),
    setTrainingSetCount(), setClassify()
MainScreen
  operations: editor(), helpFrame(), printedFrame(),
    handWrittenFrame()
Entry
  attributes: recog : int, downSampleLeft : int,
    downSampleRight : int, downSampleTop : int,
    downSampleBottom : int
  operations: hLineClear(), hLineClearWithin(), vLineClear(),
    vLineClearWithin()
PrintedFrame
  operations: open_action(), train_action(), topen_action(),
    recogniseAll_action()
KohenNetwork
  attributes: LearnMethod : int = 1, LearnRate : double = 0.3,
    quitError : double
  operations: copyWeights(), clearWeights(), winner(),
    normalizeInput()
Association multiplicities shown: 1..* to 1, and 1..* to 1..*.
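The class diagram lists hLineClear(), vLineClear(), findBounds() and downSample bounds but the slides do not show their bodies. The following is a hedged sketch of what such operations commonly do: find the inked region of a character image and reduce it to a small fixed grid for the network input. The Python names, the 0/1 pixel representation and the 5 x 7 default grid are assumptions, not taken from the presented system.

```python
def h_line_clear(image, y):
    """True if row y contains no ink (every pixel is 0)."""
    return not any(image[y])


def v_line_clear(image, x):
    """True if column x contains no ink."""
    return not any(row[x] for row in image)


def find_bounds(image):
    """Return (left, top, right, bottom) of the inked area, or None if blank."""
    rows = [y for y in range(len(image)) if not h_line_clear(image, y)]
    if not rows:
        return None
    cols = [x for x in range(len(image[0])) if not v_line_clear(image, x)]
    return cols[0], rows[0], cols[-1], rows[-1]


def down_sample(image, width=5, height=7):
    """Crop to the inked bounds and reduce to a width x height grid of 0/1."""
    bounds = find_bounds(image)
    if bounds is None:
        return [[0] * width for _ in range(height)]
    left, top, right, bottom = bounds
    w, h = right - left + 1, bottom - top + 1
    grid = []
    for gy in range(height):
        y0 = top + gy * h // height
        y1 = max(y0 + 1, top + (gy + 1) * h // height)
        row = []
        for gx in range(width):
            x0 = left + gx * w // width
            x1 = max(x0 + 1, left + (gx + 1) * w // width)
            # A grid cell is "on" if any pixel inside it is inked.
            row.append(int(any(image[y][x]
                               for y in range(y0, y1)
                               for x in range(x0, x1))))
        grid.append(row)
    return grid


if __name__ == "__main__":
    glyph = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
    print(find_bounds(glyph))
    print(down_sample(glyph, width=2, height=2))
```

Cropping before down-sampling makes the grid independent of where the character sits in the image, so the same character drawn in a different position yields the same network input.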

DESIGN OF SCREEN SHOTS FOR OCR:-
The screenshots that describe the operations carried out by our
system are as follows :-
 Main Screen
 Hand Written Recognition Screen
 Scanned Document Recognition Screen
 Training Screen
 Recognition Screen
 Editor Screen

CONCLUSION:-
The Grid infrastructure used in the implementation of the
Optical Character Recognition system can be used efficiently to
speed up the translation of image-based documents into structured
documents that are easy to discover, search and process.
The automated entry of data by OCR is one of the most
attractive labor-reducing technologies.
The system recognizes characters in new fonts easily and
quickly.
The information in the documents can be edited more
conveniently, and the edited information can be reused as and
when required.
Extending the software beyond editing and searching is a
topic for future work.

• Training and recognition speeds can
be increased further, and the system
can be made more user-friendly.
• Many applications exist where it
would be desirable to read
handwritten entries. Reading
handwriting is a very difficult task
considering the diversity that exists
in ordinary penmanship. However,
progress is being made.
