
optical character recognition system


  1. OCR System. Presented By:- Vijay Apurva (9910103462), 4th year, CSE. Guided By:- Mr. Ankur Kulhari
  2. ABSTRACT:- Optical character recognition technology can now translate paper documents quickly and accurately into machine-readable form, which widens the opportunities for document searching, storage, and automated processing. Translating large collections of image-based electronic documents into structured electronic documents with a fast response time is still a problem. The large number of processing units available in Grid environments, together with free optical character recognition tools, can be exploited to produce a fast translation.
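The idea of spreading a large document collection over many processing units can be sketched as follows. This is only an illustration under stated assumptions: a local thread pool stands in for Grid nodes, and `recognize_page` is a hypothetical placeholder for a call to a real OCR tool, not part of the presented system.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page_image: bytes) -> str:
    """Hypothetical stand-in for running an OCR tool on one page image."""
    # Assumption: a real system would call an OCR engine here.
    return f"<text of {len(page_image)}-byte page>"


def recognize_collection(pages: list[bytes], workers: int = 4) -> list[str]:
    """Recognize all pages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize_page, pages))
```

Calling `recognize_collection(pages, workers=8)` hands each page to whichever worker is free, so the total time depends on the number of workers rather than only on the size of the collection.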
  3. CONTENTS:- • What is OCR? • When and Why OCR? • Existing System. • Proposed System. • Architecture of OCR. • Algorithms of OCR. • Modules of OCR. • Design of OCR. • Design of Screen shots for OCR. • Conclusion.
  4. WHAT IS OCR?:- OCR stands for Optical Character Recognition. It is a system that scans printed, typewritten, or handwritten text (numerals, letters, or symbols) and converts the scanned image into a computer-processable format, either plain text or a word-processor document. • The converted documents can later be edited, used, or reused in other documents. Thus the documents become editable.
  5. WHEN AND WHY OCR?:- • OCR is used when recreating a paper document by hand in electronic form would take more time. • The converted text files take less space than the original image file and can be indexed. OCR is therefore an advantage to users who must convert large amounts of paperwork into electronic form.
  6. EXISTING SYSTEM:- Today there is a growing demand from users to convert printed documents into electronic documents in order to keep their data secure. The basic OCR system was invented to convert the data available on paper into computer-processable documents, so that the documents can be edited and reused.
  7. PROPOSED SYSTEM:- Our proposed system is OCR ON A GRID INFRASTRUCTURE, a character recognition system that supports recognition of the characters of multiple languages. This feature is what we call grid infrastructure, and it eliminates the problem of heterogeneous character recognition. In this context, Grid infrastructure means the infrastructure that supports a specific set of languages. Thus OCR on a grid infrastructure is multi-lingual.
  8. ARCHITECTURE:- • The architecture of the optical character recognition system on a grid infrastructure consists of three main components: • Scanner • OCR Hardware or Software • Output Interface
  9. [Architecture diagram: Document; Scanner (Illuminator, Detector); Document image; OCR Hardware or Software (Document Analysis, Character Recognition, Contextual Processing); Recognition Results; Output Interface; To application / user]
  10. TYPES OF TRAINING:- There are two major types of training that can be used to train a neural network system: • Supervised Training • Unsupervised Training
  11. FLOWCHART FOR UNSUPERVISED LEARNING:-
  12. KOHONEN NETWORK:- The Kohonen network is presented with data, but the correct output that corresponds to that data is not specified. The Kohonen network can then be used to classify this data into groups.
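How such a network assigns an input to a group can be sketched as below. This is a minimal illustration, assuming each output neuron holds one weight vector and that the group is the neuron whose weights give the largest dot product with the normalized input. The function names are illustrative and only loosely echo `normalizeInput()` and `winner()` from the class diagram.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in vector))
    return list(vector) if length == 0 else [x / length for x in vector]


def winner(weights, sample):
    """Return the index of the output neuron that best matches the sample."""
    unit = normalize(sample)
    # One score per output neuron: dot product of its weights with the input.
    scores = [sum(w * x for w, x in zip(row, unit)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `weights = [[1.0, 0.0], [0.0, 1.0]]`, the input `[0.9, 0.1]` falls into group 0 and `[0.2, 0.8]` into group 1.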
  13. FLOWCHART FOR KOHONEN TRAINING:-
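The flowchart itself is not reproduced in this transcript, so the following is only a sketch of one common way to train such a network, not necessarily the procedure shown on the slide. It assumes winner-take-all competitive learning without a neighborhood function: for each sample, only the winning neuron's weights move toward the input. The default learning rate of 0.3 is taken from the `LearnRate` value in the class diagram; the number of epochs is an arbitrary assumption.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in vector))
    return list(vector) if length == 0 else [x / length for x in vector]


def winner(weights, unit):
    """Index of the neuron whose weights have the largest dot product."""
    scores = [sum(w * x for w, x in zip(row, unit)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train(weights, samples, learn_rate=0.3, epochs=20):
    """Competitive training: only the winning neuron is updated per sample."""
    weights = [normalize(row) for row in weights]
    for _ in range(epochs):
        for sample in samples:
            unit = normalize(sample)
            best = winner(weights, unit)
            # Move the winner's weights a fraction of the way toward the input.
            moved = [w + learn_rate * (x - w) for w, x in zip(weights[best], unit)]
            weights[best] = normalize(moved)
    return weights
```

No target outputs are passed to `train`, which is what makes the procedure unsupervised: the groups emerge from the samples alone.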
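  14. ALGORITHMS OF OCR:- TRAINING ALGORITHM:- One of the most common learning algorithms is Hebb’s Rule, which was developed to assist with unsupervised training. • Hebb’s rule is expressed as: ΔW_ij = µ · a_i · a_j, where ΔW_ij is the change in the weight of the connection between neurons i and j, µ is the learning rate, and a_i and a_j are the activations of the two neurons. An error factor (d − a), the difference between the desired and the actual output, appears only in supervised variants such as the delta rule.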
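A single update step of Hebb's rule in its classic form, ΔW_ij = µ · a_i · a_j, can be sketched as below. The square weight matrix over one layer of activations and the default learning rate are assumptions made for the example.

```python
def hebb_update(weights, activations, learn_rate=0.1):
    """Return new weights after one Hebbian step: w_ij += rate * a_i * a_j."""
    n = len(activations)
    return [
        [weights[i][j] + learn_rate * activations[i] * activations[j]
         for j in range(n)]
        for i in range(n)
    ]
```

Connections between neurons that are active together are strengthened, and connections between neurons with opposite activations are weakened. No desired output is needed, which is why the rule suits unsupervised training.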
  15. MODULES:- The modules identified in the Optical Character Recognition system are as follows: • Document Processing • Neural Network System Training • Document Recognition • Document Editing • Document Searching
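The Document Searching module depends on recognized text being indexable. A minimal sketch of such an index follows, assuming documents are held as a name-to-text mapping and that words are plain ASCII letters and digits. A multi-lingual system would need a broader tokenizer, and none of these names come from the presented system.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```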
  16. DESIGN OF OCR:- The design of our OCR system is best explained with the following diagram: [Diagram: Scan, Store, Recognize, Editing, Searching; Document and users; Database]
  17. OVERALL USECASE DIAGRAM:- [Diagram. Actors: end-user, end-user1, end-user2, administrator. Use cases: scan documents, store documents, Document processing, Document recognition, Document editing, Document modification, Document deletion, Trains the system. The diagram contains two <<includes>> relationships.]
  18. OVERALL CLASS DIAGRAM:-
      • Document: docid : integer, docname : String, docsize : integer, doctype : String; getDocumentDetails(), scanDocument(), covertToImage(), storeImage()
      • Editor: cut(), copy(), paste(), new(), open(), find()
      • HelpFrame
      • HEntry: hLineClear(), vLineClear(), findBounds()
      • TrainingSet: inputCount : int, outputcount : int, trainingSetCount : int; setInputCount(), setOutputCount(), setTrainingSetCount(), setClassify()
      • MainScreen: editor(), helpFrame(), printedFrame(), handWrittenFrame()
      • Entry: recog : int, downSampleLeft : int, downSampleRight : int, downSampleTop : int, downSampleBottom : int; hLineClear(), hLineClearWithin(), vLineClear(), vLineClearWithin()
      • PrintedFrame: open_action(), train_action(), topen_action(), recogniseAll_action()
      • KohenNetwork: LearnMethod : int = 1, LearnRate : double = 0.3, quitError : double; copyWeights(), clearWeights(), winner(), normalizeInput()
      • Association multiplicities shown: 1..* to 1 and 1..* to 1..*
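The `downSample*` fields and the `hLineClear()`, `vLineClear()`, and `findBounds()` methods suggest that a character image is cropped to its inked area and reduced to a small grid before it reaches the network. The sketch below illustrates that idea under assumptions: the image is a list of rows of 0/1 pixels containing at least one inked pixel, and the 7-by-5 default grid is an arbitrary choice, not a value from the slides.

```python
def h_line_clear(image, y):
    """True if row y contains no inked pixel."""
    return not any(image[y])


def v_line_clear(image, x):
    """True if column x contains no inked pixel."""
    return not any(row[x] for row in image)


def find_bounds(image):
    """Return (top, bottom, left, right) of the inked area, inclusive."""
    height, width = len(image), len(image[0])
    top = next(y for y in range(height) if not h_line_clear(image, y))
    bottom = next(y for y in reversed(range(height)) if not h_line_clear(image, y))
    left = next(x for x in range(width) if not v_line_clear(image, x))
    right = next(x for x in reversed(range(width)) if not v_line_clear(image, x))
    return top, bottom, left, right


def downsample(image, rows=7, cols=5):
    """Reduce the inked area to a rows-by-cols grid of 0/1 cells."""
    top, bottom, left, right = find_bounds(image)
    h, w = bottom - top + 1, right - left + 1
    grid = []
    for r in range(rows):
        y0 = top + r * h // rows
        y1 = max(y0 + 1, top + (r + 1) * h // rows)
        line = []
        for c in range(cols):
            x0 = left + c * w // cols
            x1 = max(x0 + 1, left + (c + 1) * w // cols)
            # A cell is set if any pixel in its region is inked.
            inked = any(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            line.append(int(inked))
        grid.append(line)
    return grid
```

Flattening the resulting grid row by row gives a fixed-length input vector, whatever the size of the original character image.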
  19. DESIGN OF SCREEN SHOTS FOR OCR:- The screenshots that describe the operations carried out by our system are as follows: • Main Screen • Hand Written Recognition Screen • Scanned Document Recognition Screen • Training Screen • Recognition Screen • Editor Screen
  20. CONCLUSION:- The Grid infrastructure used in the implementation of the Optical Character Recognition system can be used efficiently to speed up the translation of image-based documents into structured documents that are easy to discover, search, and process. The automated entry of data by OCR is one of the most attractive labor-reducing technologies. The system recognizes characters in new fonts easily and quickly. We can edit the information in the documents more conveniently and reuse the edited information as and when required. Extending the software beyond editing and searching is a topic for future work.
  21. • Training and recognition speeds can be increased further, and the system can be made more user-friendly. • Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task, considering the diversity that exists in ordinary penmanship. However, progress is being made.
