OCR PROCESSING WITH
DEEP LEARNING: APPLY TO
VIETNAMESE DOCUMENTS
VIET-TRUNG TRAN, ANH PHI NGUYEN, KHUYEN NGUYEN
  1. OCR PROCESSING WITH DEEP LEARNING: APPLY TO VIETNAMESE DOCUMENTS VIET-TRUNG TRAN, ANH PHI NGUYEN, KHUYEN NGUYEN
  2. OUTLINE • OCR overview • History • Pipelining • Deep learning for OCR • Motivation • Connectionist temporal classification (CTC) network • LSTM + CTC for sequence recognition
  3. WHAT IS OCR • Optical character recognition (OCR), also known as an optical character reader, is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text
  4. OCR TYPES • Optical Character Recognition (OCR) • Targets typewritten text, one character at a time • Optical Word Recognition (OWR) • Typewritten text, one word at a time • Intelligent Character Recognition (ICR) • Handwritten print script, one character at a time • Intelligent Word Recognition (IWR) • Handwritten, one word at a time
  5. HISTORY OF OCR: TESSERACT OCR ENGINE TIMELINE
  6. TESSERACT SYSTEM ARCHITECTURE
  7. ARCHITECTURE [CONT’D]
  8. ADAPTIVE THRESHOLDING
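Adaptive thresholding binarizes each pixel against a locally computed threshold rather than a single global one, which copes with uneven illumination across a scanned page. A minimal mean-filter sketch in Python/NumPy; the window size and offset are illustrative assumptions, not Tesseract's actual parameters:

```python
import numpy as np

def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image: a pixel becomes foreground (0) when it
    is darker than the mean of its local window minus a small offset."""
    h, w = gray.shape
    pad = window // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # A summed-area table lets us take any window sum in O(1).
    integral = padded.cumsum(axis=0).cumsum(axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0)))  # leading zero row/col
    out = np.empty((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            y1, x1 = y + window, x + window
            s = (integral[y1, x1] - integral[y, x1]
                 - integral[y1, x] + integral[y, x])
            mean = s / (window * window)
            out[y, x] = 0 if gray[y, x] < mean - offset else 255
    return out
```

Production systems would vectorize the inner loops and often use a smarter rule (e.g. Otsu per tile or Sauvola), but the local-mean comparison above is the core idea.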
  9. PAGE LAYOUT ANALYSIS Smith, Ray. "Hybrid page layout analysis via tab-stop detection." Proc. 10th International Conference on Document Analysis and Recognition (ICDAR 2009), IEEE, 2009.
  10. IMAGE LEVEL PAGE LAYOUT ANALYSIS • Using morphological processing from Leptonica • http://www.slideshare.net/versae/javier-de-larosacs9883-5912825
  11. CONNECTED COMPONENT ANALYSIS
  12. COLUMN FINDING
  13. BLOCK FINDING
  14. TESSERACT WORD RECOGNIZER http://www.slideshare.net/temsolin/2-architecture-anddatastructures
  15. FEATURES AND WORD CLASSIFIER Classical character classification
  16. SEGMENTATION GRAPH
  17. CHAR SEGMENTATION, LANGUAGE MODEL AND BEAM SEARCH
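Beam search over the segmentation graph amounts to keeping only the top-k partial hypotheses at each expansion step. A generic Python sketch; the state representation, scoring, and expansion function are illustrative, not Tesseract's internals:

```python
import heapq

def beam_search(start, expand, is_goal, beam_width=3, max_steps=20):
    """Generic beam search.

    start:   initial hypothesis as (state, cumulative log-score).
    expand:  maps a state to a list of (next_state, log_prob) pairs.
    is_goal: predicate marking complete hypotheses.
    Keeps only the `beam_width` best partial hypotheses per step.
    """
    beam = [start]
    for _ in range(max_steps):
        candidates = []
        for state, score in beam:
            if is_goal(state):
                candidates.append((state, score))  # carry finished ones along
                continue
            for nxt, logp in expand(state):
                candidates.append((nxt, score + logp))
        # Prune: keep only the highest-scoring hypotheses.
        beam = heapq.nlargest(beam_width, candidates, key=lambda t: t[1])
        if all(is_goal(s) for s, _ in beam):
            break
    return max(beam, key=lambda t: t[1])
```

In a word recognizer, `expand` would propose character segmentations and `logp` would combine classifier confidence with a language-model score.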
  18. OCR CHALLENGES 1. Font specifics: classical engines never overcome their limited ability to understand only a small number of fonts and page formats 2. Character bounding boxes 3. Unreliable feature extraction 4. Slow performance
  19. TESSERACT TUTORIAL @ DAS 2014
  20. RECENT IMPROVEMENTS 1. Multi-language support 2. Full layout analysis 3. Table detection 4. Equation detection 5. Better language models 6. Handwritten text
  21. LSTM FOR TEXT RECOGNITION
  22. MOTIVATION • Segmentation is difficult for cursive or unconstrained text • R. Smith, “History of the Tesseract OCR engine: what worked and what didn’t,” in Proc. DRR XX, San Francisco, USA, Feb. 2013. • No single method proposed for OCR could achieve very low error rates without the aforementioned sophisticated post-processing techniques
  23. RESEARCH BREAKTHROUGH A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A Novel Connectionist System for Unconstrained Handwriting Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, May 2009.
  24. TEXT LINE NORMALIZATION
  25. TEXT LINE RECOGNITION
  26. MOTIVATION • Real-world sequence learning tasks • OCR (optical character recognition) • ASR (automatic speech recognition) • Require • prediction of sequences of labels from noisy, unsegmented input data • Recurrent neural networks (RNNs) can be used for sequence learning, but require • pre-segmented training data • post-processing to transform outputs into label sequences
  27. CONNECTIONIST TEMPORAL CLASSIFICATION (CTC) • Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006. • WHAT IS CTC ALL ABOUT? • A novel method for training RNNs to label unsegmented sequences directly
  28. THE SPEECH RECOGNITION PROBLEM
  29. DYNAMIC TIME WARPING • Because the length of y may differ from (and is often longer than) l, inferring l from y is in fact a dynamic time warping problem.
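The alignment idea behind that slide can be sketched as classic dynamic time warping between two sequences; a minimal distance-based version in Python, purely illustrative of the DP recurrence and not the CTC algorithm itself:

```python
def dtw_distance(a, b):
    """Dynamic time warping: cost of the best monotonic alignment
    between numeric sequences a and b, with cost |a_i - b_j|."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step may advance in a, in b, or in both (diagonal),
            # so one element can align to several in the other sequence.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The key property for OCR/ASR: a longer frame sequence can align to a shorter label sequence at zero extra cost when frames repeat, which is exactly the many-to-one mapping CTC later formalizes probabilistically.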
  30. CONNECTIONIST TEMPORAL CLASSIFICATION • To transform the network outputs into a conditional probability distribution over label sequences • A CTC network has a softmax output layer with one more unit than there are labels in L • The activations of the first |L| units are interpreted as the probabilities of observing the corresponding labels at particular times • The activation of the extra unit is the probability of observing a ‘blank’, or no label
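A common way to read that output layer is best-path decoding: take the argmax output at each timestep, merge consecutive repeats, then drop blanks. This is the greedy approximation, not the prefix search of the next slide; a minimal Python sketch:

```python
def best_path_decode(probs, blank=0):
    """Greedy CTC decoding.

    probs: per-timestep probability vectors over |L|+1 outputs,
           where index `blank` is the extra 'no label' unit.
    Returns the collapsed label sequence.
    """
    # 1. Pick the most probable output at each timestep.
    path = [max(range(len(p)), key=p.__getitem__) for p in probs]
    decoded, prev = [], None
    for label in path:
        # 2. Merge consecutive repeats, then 3. drop blanks.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

Note that repeats separated by a blank survive the collapse: the path [1, blank, 1] decodes to [1, 1], which is how CTC represents genuine double letters.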
  31. PREFIX SEARCH DECODING ON THE LABEL ALPHABET X,Y
  32. LONG SHORT-TERM MEMORY (LSTM) • A type of RNN • The RNN vanishing gradient problem: the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections • LSTM is designed to address the vanishing gradient problem • An LSTM hidden layer consists of recurrently connected subnets, called memory blocks • Each block contains a set of internal units, or cells, whose activation is controlled by three multiplicative gates: the input gate, forget gate and output gate
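The three gates described above can be written out as one forward step of an LSTM cell. A NumPy sketch; the stacked weight layout and gate ordering are expository assumptions, not a particular framework's convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM timestep.

    x: input (D,), h_prev/c_prev: previous hidden and cell state (H,),
    W: (4*H, D+H) stacked gate weights, b: (4*H,) stacked biases.
    Gate order in the stack: input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell content
    c = f * c_prev + i * g       # updated cell state (the "memory")
    h = o * np.tanh(c)           # new hidden state
    return h, c
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow across many timesteps without vanishing, provided the forget gate stays near 1.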
  33. LSTM MEMORY BLOCK
  34. FORGET GATE
  35. INPUT GATE
  36. OUTPUT GATE
  37. LSTM -> CTC OUTPUT LAYER: OCR
  38. DEMO TIME: OCR FOR VIETNAMESE DOCUMENTS Thank you!
  39. REFERENCES - CREDITS • https://github.com/yiwangbaidu/notes/blob/master/CTC/CTC.pdf • http://colah.github.io/posts/2015-08-Understanding-LSTMs/ • Ray Smith, “Everything You Always Wanted to Know About Tesseract,” Tesseract tutorial @ DAS 2014
