
Neural Networks for OCR


Slides from a Cambridge Geek Night talk about using convolutional neural networks for optical character recognition, specifically in the Longan library.

Published in: Technology


  1. Neural Networks for OCR
  2. OCR: picture of text -> text
     And Bruno, conqueror of Carthage, strode up to me and said: "Devil take you, Edith!" "Finally, you scoundrel - are you going to confess your love for me?" I retorted. The German warrior stood stoically. He surveyed the landscape before him; grinned; spoke: "You are much better with an axe than Jane - I grant you that." (Killing, I admit, was my favourite pastime. Long before I enlisted in the Order of the Knights of Malta, I liked playing with knives. No-one objected.) "Overall, how would you rank/rate my performance in axing?" "Performance evaluations are meaningless!" (Quite true.) Radically changing the topic, I asked: "So, what are your thoughts on Empress Teresa?" "Unshareable; irrelevant; bitter." "Secret? Very unsurprising. We warrior/troubadours are quite reserved - nay - silent." (Xenophobia played a role too. You knew that. So did my friend, Zoe.)
  3. OCR Step 1: Find letters
  4. OCR Step 2: Identify each letter -> "P"
  5. Identifying letters is hard. Letters can be:
     - Blurry
  6. - Rotated / squashed / skewed
  7. - In different fonts
  8. - Bold or in italics
     The background can have:
     - Speckles, dirt
  9. - Texture from paper
  10. Approaches:
      - Compare with reference images
  11. - Find major lines and use heuristics, e.g. "vertical line on left, vertical line on right, horizontal line in the middle -> H"
  12. - Clustering algorithms
  13. - Etc...
  14. - How do humans do it? -> neural networks
  15. Artificial Neural Networks
      - Based on a simplified model of biological neurons
  16. - Can be trained to categorize inputs
  17. Real Neurons
  18. Neuronal Connections
  19. Firing neurons excite others (diagram; label: firing threshold)
  20. Firing neurons excite others (animation step)
  21. Firing neurons excite others (animation step)
  22. ...which in turn excite others
  23. Inputs can be weighted (example weights: 0.7, 0.4)
  24. Neurons can suppress others (example weights: 0.7, 0.4, -0.5)
  25. And they can have a starting bias (example bias: 0.3)
  26. (So they're basically logic gates) (figure: AND, OR, NOT gates; weights 0.5, 0.5, 1, 1, -1; bias: 1)
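The gate analogy can be sketched as code. The weight and bias values below follow the numbers on the slide, but the mapping of values to gates is my reading of the figure, and the firing threshold of 1 is an assumption:

```python
def neuron(inputs, weights, bias=0.0, threshold=1.0):
    """A step-threshold neuron: fires (outputs 1) exactly when the
    weighted sum of its inputs plus its bias reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= threshold else 0

def AND(a, b): return neuron([a, b], [0.5, 0.5])  # fires only if both inputs fire
def OR(a, b):  return neuron([a, b], [1, 1])      # fires if either input fires
def NOT(a):    return neuron([a], [-1], bias=1)   # bias fires it; input suppresses it
```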
  27. Smooth threshold (figure labels: smaller effect, larger effect)
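The smooth threshold is typically a sigmoid (the slide doesn't name the function, so that choice is an assumption): instead of a hard fire/don't-fire step, the output varies continuously, so a nudge to the input near the threshold has a larger effect on the output than the same nudge far from it:

```python
import math

def sigmoid(x):
    """Smooth threshold: output rises continuously from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Near the threshold, a small input change shifts the output a lot...
near = sigmoid(0.5) - sigmoid(0.0)
# ...while far from the threshold the same change barely matters.
far = sigmoid(4.5) - sigmoid(4.0)
```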
  28. Neurons arranged in layers
      - Neurons in one layer excite/suppress neurons in the next one
  29. - Excitation of neurons in the first layer is set according to the input
  30. - "Hidden" layer(s) in between
  31. - Final layer is the output
      (diagram: In -> Hid -> Out)
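The layered arrangement can be sketched as a forward pass (sigmoid activations assumed; the network and its weights below are made up for illustration): each neuron weights every activation from the previous layer, adds its bias, and applies the smooth threshold.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the previous
    layer's activations, adds its bias, applies the smooth threshold."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """In -> Hid -> Out: propagate activations layer by layer."""
    acts = inputs
    for weights, biases in layers:
        acts = layer(acts, weights, biases)
    return acts

# Tiny 3-input, 2-hidden, 1-output network with illustrative weights
net = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),  # hidden layer
    ([[0.7, -0.6]], [0.05]),                              # output layer
]
out = feed_forward([1.0, 0.0, 1.0], net)
```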
  32. Simple letter identification network
      - One input neuron per pixel in a scaled picture of the letter
  33. - One output neuron per possible letter
  34. - Train the network to excite the output neuron that corresponds to the input letter
      (outputs: A B C D E F G H I J K L M ...)
  35. Training Networks
      - Start out with all connection weights and biases randomized
  36. - Present the network with a sample input and a desired output
  37. - Then adjust the input weightings of its neurons so that the output becomes more like the desired one
  38. - The algorithm for this is called back-propagation
  39. Do this 100,000 times:
      "What's this letter? Is it an A?" "No, it's a Q."
      "What's this letter? Is it a B?" "No, it's a P."
      "What's this letter? Is it a W?" "Yes, it's a W."
      etc.
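That training loop can be sketched for a single sigmoid neuron; back-propagation pushes the same kind of weight update backwards through the hidden layers as well. The task (learning logical OR instead of letters), the learning rate, and the iteration count here are all illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy stand-in for the letter samples: teach one neuron logical OR.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(2)]  # start randomized
bias = random.uniform(-1, 1)
rate = 0.5  # learning rate (assumed)

for _ in range(10000):  # "do this lots of times"
    inputs, target = random.choice(samples)
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    # Nudge the weights so the output moves towards the desired one
    # (gradient of the squared error through the sigmoid).
    delta = (out - target) * out * (1 - out)
    weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
    bias -= rate * delta
```

After training, the neuron's output rounds to the desired answer for every sample.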
  40. Problem: Lots of neurons and connections
      - Each input pixel is a neuron -> ~1000 inputs
  41. - At least one layer between the input and the output, of similar size -> ~1 million connections -> lots of CPU time
  42. - All neurons in the hidden layer(s) are given the same inputs and targets, so they gravitate towards behaving the same
  43. Result: It doesn't work
      - Neural network takes forever to train
  44. - Gives pretty much the same response to all inputs: "Meh, it could be any letter really"
  45. How to make it work: Local receptive fields
      - Connect neurons in the hidden layer only to a small number of "local" input neurons
  46. - Different parts of the hidden layer notice different bits of the image
  47. - Far fewer connections
  48. How to make it work: Weight sharing
      - Use the same weight for all connections of neurons in the same relative position
  49. - Causes the network to notice local features (e.g. horizontal lines) consistently
      (shared weight in diagram: -0.133)
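Weight sharing amounts to convolution: one small kernel of shared weights slides over the whole image, so every hidden neuron applies the same weights to its own local receptive field. The kernel values below are illustrative (a rough horizontal-line detector), not taken from the talk:

```python
def convolve(image, kernel):
    """Apply one shared-weight kernel at every position of the image.
    Each output value is what one hidden neuron, looking at its own
    local receptive field, computes with the shared weights."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[di][dj] * image[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# Illustrative shared weights: respond strongly to horizontal lines.
kernel = [[-1, -1, -1],
          [ 2,  2,  2],
          [-1, -1, -1]]

# A 5x5 image with a horizontal line on row 2.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
feature_map = convolve(image, kernel)
```

Because the same weights are used everywhere, the feature map responds to the line consistently, wherever it appears.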
  50. How to make it work, cont'd
      - Add more hidden layers, and parallel layers
  51. - Train the network to output a different pattern for each letter instead of just exciting one neuron per letter. NNs learn better if each output neuron is trained to be active roughly half of the time.
  52. -> "Convolutional neural network" (LeCun '95)
  53. The result: Longan
      - A Java-based OCR system I'm working on
  54. - Goals:
        - Well-documented (eventually)
  55.   - Easy to install
  56.   - Easy to use
  57.   - Reasonable accuracy
      - Currently:
        - Not well-documented; very experimental
  58.   - Pretty accurate on clean sans-serif greyscale text
  59. Further reading
      - An introduction to neural networks:
  60. - Simple neural network implementation in Python:
  61. - Back propagation paper:
  62. - Convolutional neural networks:
  63. - Exhaustive NN FAQ: