OCR with Neural Network Made By: Marwa Fadhel Jassim Karam Samir Khalid
Introduction Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.
<ul><li>OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. OCR systems require calibration to read a specific font; early versions needed to be programmed with images of each character, and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components. </li></ul>
OCR: picture of text -> text <ul><li>And Bruno, conqueror of Carthage, strode up to me and said: </li></ul><ul><li>"Devil take you, Edith!" </li></ul><ul><li>"Finally, you scoundrel - are you going to confess your love for me?" I retorted. </li></ul><ul><li>The German warrior stood stoically. He surveyed the landscape before him; grinned; spoke: </li></ul><ul><li>"You are much better with an axe than Jane - I grant you that." </li></ul><ul><li>(Killing, I admit, was my favourite pastime. Long before I enlisted in the Order of the Knights of Malta, I liked playing with knives. No-one objected.) </li></ul><ul><li>"Overall, how would you rank/rate my performance in axing?" </li></ul><ul><li>"Performance evaluations are meaningless!" </li></ul><ul><li>(Quite true.) </li></ul><ul><li>Radically changing the topic, I asked: </li></ul><ul><li>"So, what are your thoughts on Empress Teresa?" </li></ul><ul><li>"Unshareable; irrelevant; bitter." </li></ul><ul><li>"Secret? Very unsurprising. We warrior/troubadours are quite reserved - nay - silent." </li></ul><ul><li>(Xenophobia played a role too. You knew that. So did my friend, Zoe.) </li></ul>
OCR Step 2: Identify each letter <ul><li>“P” </li></ul>
Identifying letters is hard <ul><li>Letters can be: </li></ul><ul><ul><li>Blurry </li></ul></ul><ul><ul><li>Rotated / squashed / skewed </li></ul></ul><ul><ul><li>In different fonts </li></ul></ul><ul><ul><li>Bold or in italics </li></ul></ul><ul><li>Background can have: </li></ul><ul><ul><li>Speckles, dirt </li></ul></ul><ul><ul><li>Texture from paper </li></ul></ul>
Approaches <ul><li>Compare with reference images </li></ul><ul><li>Find major lines and use heuristics, e.g. “vertical line on left, vertical line on right, horizontal line in the middle -> H” </li></ul><ul><li>Etc... </li></ul><ul><li>How do humans do it? -> neural networks </li></ul>
What is a Neural Network? <ul><li>A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. </li></ul>
...which in turn excite others [Diagram label: firing threshold]
Inputs can be weighted [Diagram labels: firing threshold; weights 0.7, 0.4]
Neurons can suppress others [Diagram labels: firing threshold; weights 0.7, 0.4, -0.5]
And they can have a starting bias [Diagram labels: firing threshold; bias; 0.3]
(So they're basically logic gates) [Diagram labels: 0.5, 0.5, 1, 1, -1, Bias: 1; gates: AND, OR, NOT]
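The logic-gate idea can be sketched in a few lines of Python. This is an illustration, not the slides' own code: it assumes the diagram labels map to AND (weights 0.5, 0.5), OR (weights 1, 1) and NOT (weight -1, bias 1), and it assumes a firing threshold of 1, which is the value that makes all three gates behave correctly. The function names are hypothetical.

```python
def neuron(inputs, weights, bias=0.0, threshold=1.0):
    """Fire (return 1) when bias + weighted input sum reaches the threshold.

    The threshold of 1.0 is an assumption; the slides do not state it.
    """
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Both inputs are needed to reach the threshold: 0.5 + 0.5 = 1.
    return neuron([a, b], [0.5, 0.5])


def or_gate(a, b):
    # A single active input is already enough: 1 >= 1.
    return neuron([a, b], [1, 1])


def not_gate(a):
    # The bias alone fires the neuron; an active input suppresses it.
    return neuron([a], [-1], bias=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```

A negative weight plays the role of suppression, and the bias plays the role of the starting excitation mentioned on the previous slides.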
Neurons arranged in layers <ul><li>Neurons in one layer excite/suppress neurons in the next one </li></ul><ul><li>Excitation of neurons in first layer set according to the input </li></ul><ul><li>“Hidden” layer(s) in between </li></ul><ul><li>Final layer is output </li></ul>[Diagram: layers labelled In, Hid, Out]
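A minimal sketch of how excitation might propagate from the input layer through a hidden layer to the output layer. The sigmoid activation, the 2-3-1 layout and every weight value below are assumptions chosen for illustration; the slides do not specify them.

```python
import math


def sigmoid(x):
    """Smooth stand-in for a firing threshold (an assumption, not from the slides)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer; weights[j][i] connects input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the activations of each layer into the next one."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden = ([[0.7, 0.4], [-0.5, 0.3], [0.2, 0.2]], [0.0, 0.3, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.0])
result = network_forward([1.0, 0.0], [hidden, output])
print(result)
```

Positive weights excite the next neuron, negative weights suppress it, and only the first layer is set directly from the input.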
Simple letter identification network <ul><li>One input neuron per pixel in scaled picture of letter </li></ul><ul><li>One output neuron per possible letter </li></ul><ul><li>Train network to excite the output neuron that corresponds to the letter input </li></ul>A B C D E F G H I J K L M ...
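Reading the answer off such a network could look like the sketch below: with one output neuron per letter, the recognized letter is the one whose neuron is most excited. The function name and the sample activations are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase  # one output neuron per possible letter


def identify_letter(output_activations):
    """Return the letter whose output neuron has the highest activation."""
    best = max(range(len(output_activations)), key=lambda i: output_activations[i])
    return LETTERS[best]


# Hypothetical activations: every neuron is quiet except the eighth one.
activations = [-0.5] * len(LETTERS)
activations[7] = 0.4
print(identify_letter(activations))
```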
Training Method The most popular and simplest approach to the OCR problem is based on a feed-forward neural network with backpropagation learning. The main idea is to first prepare a training set and then train a neural network to recognize the patterns in it. In the training step we teach the network to respond with the desired output for a specified input. For this purpose each training sample consists of two components: a possible input and the network's desired output for that input. Once training is done, we can give the network an arbitrary input and it will produce an output from which we can determine the type of pattern presented to it.
Let's assume that we want to train a network to recognize the 26 capital letters, each represented as an image of 5x6 pixels. [Figure: sample 5x6 letter image] One of the most obvious ways to convert an image into the input part of a training sample is to create a vector of size 30 (in our case) containing "1" in all positions corresponding to letter pixels and "0" in all positions corresponding to background pixels. However, in many neural network training tasks it is preferable to represent training patterns in a so-called "bipolar" way, placing "0.5" instead of "1" and "-0.5" instead of "0" in the input vector. This kind of pattern coding usually improves learning performance.
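The bipolar input coding can be sketched as follows. The glyph below is a hand-drawn stand-in for the missing sample image, and the `#`/`.` notation and function name are assumptions made for this example.

```python
def image_to_bipolar(rows, ink="#"):
    """Flatten a glyph row by row: 0.5 for a letter pixel, -0.5 for background."""
    return [0.5 if ch == ink else -0.5 for row in rows for ch in row]


# Hypothetical 5x6 glyph for "K" (5 columns, 6 rows); not the original figure.
LETTER_K = [
    "#...#",
    "#..#.",
    "##...",
    "#.#..",
    "#..#.",
    "#...#",
]

input_vector = image_to_bipolar(LETTER_K)
print(len(input_vector))
print(input_vector[:5])
```

Each of the 30 values feeds one input neuron, matching the "one input neuron per pixel" layout.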
Our training sample should therefore look something like this: a vector of 30 values, each either "0.5" or "-0.5". For each possible input we also need to create the desired network output to complete the training sample. For the OCR task it is very common to code each pattern as a vector of size 26 (because we have 26 different letters), placing "0.5" in the position corresponding to the pattern's type number and "-0.5" in all other positions.
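The desired-output coding can be sketched like this. It assumes letters are numbered in alphabetical order starting from zero; the function name is hypothetical.

```python
import string


def target_vector(letter, alphabet=string.ascii_uppercase):
    """Return 0.5 at the letter's position and -0.5 everywhere else."""
    index = alphabet.index(letter)
    return [0.5 if i == index else -0.5 for i in range(len(alphabet))]


target_k = target_vector("K")
print(len(target_k))
print(target_k.index(0.5))
```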
So, the desired output vector for the letter "K" will look something like this: "-0.5" in every position except the 11th (K is the 11th letter of the alphabet), which holds "0.5". Once we have such training samples for all letters, we can start to train our network. The last question concerns the network's structure. For the above task we can use a single-layer neural network with 30 inputs, corresponding to the size of the input vector, and 26 neurons in the layer, corresponding to the size of the output vector.
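A rough sketch of training such a 30-input, 26-neuron layer. To keep the example short, it uses linear output neurons updated with the delta (Widrow-Hoff) rule, which is what gradient-descent learning reduces to for one layer without a squashing function; the text does not name an activation function, so this is a simplification. The three glyphs, the learning rate and the epoch count are all assumptions.

```python
import string

ALPHABET = string.ascii_uppercase  # 26 output neurons, one per letter

# Hypothetical 5x6 glyphs; a real training set would cover all 26 letters.
GLYPHS = {
    "K": ["#...#", "#..#.", "##...", "#.#..", "#..#.", "#...#"],
    "O": [".###.", "#...#", "#...#", "#...#", "#...#", ".###."],
    "T": ["#####", "..#..", "..#..", "..#..", "..#..", "..#.."],
}


def encode(rows):
    """30 bipolar inputs: 0.5 for letter pixels, -0.5 for background."""
    return [0.5 if ch == "#" else -0.5 for row in rows for ch in row]


def target(letter):
    """26 desired outputs: 0.5 for the right letter, -0.5 elsewhere."""
    return [0.5 if c == letter else -0.5 for c in ALPHABET]


def forward(weights, biases, x):
    """Linear layer: one weighted sum per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(samples, n_inputs=30, n_outputs=26, rate=0.05, epochs=200):
    """Delta rule: nudge each weight in proportion to the output error."""
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    for _ in range(epochs):
        for x, t in samples:
            y = forward(weights, biases, x)
            for k in range(n_outputs):
                err = t[k] - y[k]
                biases[k] += rate * err
                for i in range(n_inputs):
                    weights[k][i] += rate * err * x[i]
    return weights, biases


def classify(weights, biases, x):
    """Pick the letter whose neuron responds most strongly."""
    y = forward(weights, biases, x)
    return ALPHABET[max(range(len(y)), key=y.__getitem__)]


samples = [(encode(rows), target(letter)) for letter, rows in GLYPHS.items()]
weights, biases = train(samples)
for letter, rows in GLYPHS.items():
    print(letter, "->", classify(weights, biases, encode(rows)))
```

With a sigmoid-style activation, the update would also include the derivative of the activation function. The overall loop stays the same.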
Another Method The OCR software breaks the image into sub-images, each containing a single character. The sub-images are then translated from an image format into a binary format, where each 0 and 1 represents an individual pixel of the sub-image. The binary data is then fed into a neural network that has been trained to make the association between the character image data and a numeric value that corresponds to the character. The output from the neural network is then translated into ASCII text and saved as a file.
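The pipeline described above could be sketched as follows. The text does not say how the image is split or binarized, so the blank-column segmentation and the fixed grayscale threshold are assumptions, and `fake_network` is a hypothetical stand-in for a trained neural network.

```python
def binarize(gray, threshold=128):
    """Convert grayscale pixels to binary: 1 = ink (dark), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def split_characters(binary):
    """Cut the image into sub-images at columns that contain no ink."""
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    spans, start = [], None
    for c, ink in enumerate(has_ink + [False]):  # sentinel closes the last span
        if ink and start is None:
            start = c
        elif not ink and start is not None:
            spans.append((start, c))
            start = None
    return [[row[a:b] for row in binary] for a, b in spans]


def recognize(gray, classify):
    """Binarize, segment, classify each sub-image, and build the text.

    `classify` maps a binary sub-image to a numeric character code.
    """
    sub_images = split_characters(binarize(gray))
    return "".join(chr(classify(sub)) for sub in sub_images)


def fake_network(sub_image):
    """Stand-in for a trained network: guesses from the sub-image width only."""
    return ord("I") if len(sub_image[0]) == 1 else ord("L")


# Tiny hypothetical scan: 255 = white paper, 0 = black ink.
scan = [
    [255, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 0, 0],
]
text = recognize(scan, fake_network)
print(text)
```

The resulting string could then be written to a file to complete the last step.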