OCR PROCESSING WITH
DEEP LEARNING: APPLY TO
VIETNAMESE DOCUMENTS
VIET-TRUNG TRAN, ANH PHI NGUYEN, KHUYEN NGUYEN
OUTLINE
• OCR overview
• History
• Pipelining
• Deep learning for OCR
• Motivation
• Connectionist temporal classification (CTC) network
• LSTM + CTC for sequence recognition
WHAT IS OCR
• Optical character recognition (optical character reader) (OCR) is the
mechanical or electronic conversion of images of typed, handwritten or
printed text into machine-encoded text
OCR TYPES
• Optical Character Recognition (OCR)
• Targets typewritten text, one character at a time
• Optical Word Recognition (OWR)
• Typewritten text, one word at a time
• Intelligent Character Recognition (ICR)
• Handwritten print script, one character at a time
• Intelligent Word Recognition (IWR)
• Handwritten, one word at a time
HISTORY OF OCR: TESSERACT OCR ENGINE
TIMELINE
TESSERACT SYSTEM ARCHITECTURE
ARCHITECTURE [CONT’D]
ADAPTIVE THRESHOLDING
PAGE LAYOUT ANALYSIS
Smith, Ray. "Hybrid page layout analysis via tab-stop
detection." Document Analysis and Recognition, 2009. ICDAR'09. 10th
International Conference on. IEEE, 2009.
IMAGE LEVEL PAGE LAYOUT ANALYSIS
• Using the morphological processing from Leptonica
• http://www.slideshare.net/versae/javier-de-larosacs9883-5912825
CONNECTED COMPONENT ANALYSIS
COLUMN FINDING
BLOCK FINDING
TESSERACT WORD RECOGNIZER
http://www.slideshare.net/temsolin/2-architecture-anddatastructures
FEATURES AND WORD CLASSIFIER
Classical character classification
SEGMENTATION GRAPH
CHAR SEGMENTATION, LANGUAGE MODEL AND
BEAM SEARCH
OCR CHALLENGES
1. Font specifics
OCR engines never overcame their restriction to a limited number of fonts and page
formats
2. Character bounding boxes
3. Unreliable feature extraction
4. Slow performance
TESSERACT TUTORIAL @ DAS 2014
RECENT IMPROVEMENTS
1. Multiple languages
2. Full layout analysis
3. Table detection
4. Equation detection
5. Better language models
6. Handwritten text
LSTM FOR TEXT RECOGNITION
MOTIVATION
• Segmentation is difficult for cursive or unconstrained text
• R. Smith, “History of the Tesseract OCR engine: what worked and
what didn’t,” in DRR XX, San Francisco, USA, Feb. 2013.
• There was not a single method proposed for OCR that can achieve
very low error rates without using the aforementioned sophisticated
post-processing techniques.
RESEARCH BREAKTHROUGH
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J.
Schmidhuber, “A Novel Connectionist System for Unconstrained
Handwriting Recognition,” IEEE Trans. on Pattern Analysis and Machine
Intelligence, vol. 31, no. 5, pp. 855–868, May 2009.
TEXT LINE NORMALIZATION
TEXT LINE RECOGNITION
MOTIVATION
• Real-world sequence learning tasks
• OCR (Optical character recognition)
• ASR (Automatic speech recognition)
• Requires
• prediction of sequences of labels from noisy, unsegmented input data
• Recurrent neural networks (RNNs) can be used for sequence learning, but
require
• pre-segmented training data
• post-processing to transform outputs into label sequences
CONNECTIONIST TEMPORAL CLASSIFICATION
(CTC)
• Graves, Alex, et al. "Connectionist temporal classification: labelling
unsegmented sequence data with recurrent neural
networks." Proceedings of the 23rd international conference on Machine
learning. ACM, 2006.
• What is CTC all about?
• A novel method for training RNNs to label
unsegmented sequences directly
THE SPEECH RECOGNITION PROBLEM
DYNAMIC TIME WARPING
• The length of y may differ from (and is often longer than) that of l, so
inferring l from y is in effect a dynamic time warping problem.
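The alignment idea can be illustrated with a minimal sketch of classic dynamic time warping on two numeric sequences. This is a generic textbook formulation, not the CTC alignment itself; the function name `dtw_distance` and the absolute-difference cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local mismatch cost (assumed metric)
            # Either sequence may be stretched, or both may advance together.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

A short sequence aligns to a stretched copy of itself at zero cost, which mirrors a short label sequence l being matched against a longer frame-wise sequence y.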
CONNECTIONIST TEMPORAL CLASSIFICATION
• To transform the network outputs into a conditional probability
distribution over label sequences
• A CTC network has a softmax output layer with one more unit than there
are labels in L
• activations of the first |L| units are interpreted as the probabilities of observing the
corresponding labels at particular times
• activation of the extra unit is the probability of observing a ‘blank’, or no label
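The output layout above can be sketched with best-path (greedy) decoding: take the most active unit at each time step, merge repeated symbols, then drop blanks. This is a minimal sketch under assumptions: the blank is stored as the last unit, and `BLANK`, `collapse`, and `best_path_decode` are illustrative names, not taken from the slides.

```python
from itertools import groupby

BLANK = "-"  # assumed symbol for the extra 'blank' unit


def collapse(path):
    """Merge repeated symbols, then remove blanks."""
    merged = [s for s, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)


def best_path_decode(probs, labels):
    """probs[t] holds the softmax output at time t: |L| label units, then blank."""
    symbols = list(labels) + [BLANK]
    # Pick the most probable unit independently at every time step.
    path = [symbols[max(range(len(p)), key=p.__getitem__)] for p in probs]
    return collapse(path)
```

For example, the path `aa-ab` collapses to `aab`: the blank between the two runs of `a` is what allows a doubled letter to survive the merge.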
PREFIX SEARCH DECODING ON THE LABEL
ALPHABET X,Y
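As a hedged illustration of the quantity that decoding tries to maximize, the sketch below computes the probability of a labelling over the alphabet X, Y by brute force: it sums the probabilities of all paths that collapse to the target. This is not prefix search itself, and it is feasible only for tiny inputs. The names `label_probability` and `collapse`, the blank symbol, and the blank-last layout are assumptions.

```python
from itertools import groupby, product

BLANK = "-"  # assumed symbol for the blank unit


def collapse(path):
    """Merge repeated symbols, then remove blanks."""
    merged = [s for s, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)


def label_probability(probs, alphabet, target):
    """Sum the probabilities of all paths that collapse to `target`.

    probs[t] lists the per-step probabilities for each alphabet symbol,
    followed by the blank.
    """
    symbols = list(alphabet) + [BLANK]
    total = 0.0
    for idx_path in product(range(len(symbols)), repeat=len(probs)):
        if collapse([symbols[k] for k in idx_path]) == target:
            p = 1.0
            for t, k in enumerate(idx_path):
                p *= probs[t][k]  # steps are treated as independent given the input
            total += p
    return total
```

With two time steps and `probs = [[0.5, 0.2, 0.3], [0.4, 0.1, 0.5]]`, the labelling `X` collects the paths `XX`, `X-`, and `-X`. Prefix search reaches the most probable labelling without enumerating every path.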
LONG SHORT-TERM MEMORY (LSTM)
• A type of recurrent neural network (RNN)
• The RNN vanishing gradient problem
• influence of a given input on the hidden layer, and therefore on the network output,
either decays or blows up exponentially as it cycles around the network’s recurrent
connections
• LSTM is designed to address the vanishing gradient problem
• An LSTM hidden layer consists of recurrently connected subnets, called
memory blocks
• Each block contains a set of internal units, or cells, whose activation is
controlled by three multiplicative gates: the input gate, forget gate and
output gate
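The gating described above can be sketched for a single memory cell with scalar input and state. This is a minimal sketch of the common LSTM formulation without peephole connections, which may differ from the variant shown on the slides. The weight layout (`w` maps a gate name to a hypothetical `(w_x, w_h, bias)` tuple) is an assumption made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single LSTM cell (scalar input, output, and state).

    w maps each of 'i', 'f', 'o', 'g' to an illustrative (w_x, w_h, bias) tuple.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new content is written
    f = sigmoid(pre("f"))    # forget gate: how much old cell state is kept
    o = sigmoid(pre("o"))    # output gate: how much of the cell is exposed
    g = math.tanh(pre("g"))  # candidate cell content
    c = f * c_prev + i * g   # gated update of the cell state
    h = o * math.tanh(c)     # gated output of the block
    return h, c
```

Because the cell state is updated additively and scaled only by the forget gate, a gate value near 1 lets information persist across many steps, which is how the design counters the decay described above.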
LSTM MEMORY BLOCK
FORGET GATE
INPUT GATE
OUTPUT GATE
LSTM -> CTC OUTPUT LAYER: OCR
DEMO TIME: OCR FOR VIETNAMESE DOCUMENTS
Thank you!
REFERENCES - CREDITS
• https://github.com/yiwangbaidu/notes/blob/master/CTC/CTC.pdf
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Ray Smith. Everything you always wanted to know about
Tesseract. Tesseract tutorial @ DAS 2014


Editor's Notes

  1. http://www.slideshare.net/100002968637682/seminar-p2
  2. http://www.slideshare.net/DocuFi/improve-ocr-accuracy-with-cleanup?qid=a439a1d4-ad96-4fb3-b376-9b301e58674b&v=default&b=&from_search=13
  3. http://www.slideshare.net/temsolin/6-char-segmentation