Entering the Fourth Dimension of OCR with Tesseract
Hanno Embregts @hannotify
What is OCR?
/ first dimension
Atechnology that can take written
words and convert then aes mo
aulerseadans Yom, proved meyren ne
fgntrom sng tne cortec cars ometimes,
ate fant gait sae and ptch, dar engugn
onthe pase, ana Your Bropared to
spend Several cantinas covtecing al me
one PrP areee e SP es ts anine Oe tat
came autas zersee
A technology that can take written words and convert them back into
computer-readable form, provided they're in the right font, using the
correct colors sometimes, at the right point size and pitch, dark enough on
the paper, and you're prepared to spend several centuries correcting all the ones
that came out as l's, all the O's that came out as zeroes, and all the colons
that come out like semicolons.
A Proper Definition
Optical character recognition (...) is the mechanical or electronic conversion
of images of typed, handwritten or printed text into machine-encoded text,
whether from a scanned document, a photo of a document, a scene-photo
(...) or from subtitle text superimposed on an image.
Pattern recognition
OCR-A font
Feature detection
1929
Gustav Tauschek patents a basic OCR 'reading machine'.
1960s
Postal services start using OCR for mail sorting.
1993
The Apple Newton becomes one of the first handheld computers to feature handwriting recognition.
Applications
Financial transfers
Catch me if you can!
Book digitization
Also supports Ctrl+F.
Passport scanning
Gets you to your gate in time.
Number plate recognition
Get your speeding ticket even faster!
Getting Started
/ second dimension
Tesseract
Development started at Hewlett-Packard in 1985
Ported to Windows in 1996
Released as open source in 2005
Google has sponsored development of Tesseract since 2006
https://github.com/tesseract-ocr/tesseract
"The core feature, text recognition, is drastically better than anything else
I've tried from the Open Source community."
Anthony Kay in "Linux Journal", July 2007
Features
character recognition
support for Unicode
input: JPEG, GIF, PNG, TIFF or BMP
output: searchable PDF, TSV, plain text or hOCR
hOCR example
<p class="ocr_par" lang="deu" title="bbox930">
<span class="ocr_line" title="bbox 348 797 1482 838; baseline -
<span class="ocrx_word" title="bbox 348 805 402 832; x_wconf
<span class="ocrx_word" title="bbox 421 804 697 832; x_wconf
<span class="ocrx_word" title="bbox 717 803 755 831; x_wconf
<span class="ocrx_word" title="bbox 773 803 802 831; x_wconf
<span class="ocrx_word" title="bbox 821 803 917 830; x_wconf
<span class="ocrx_word" title="bbox 935 799 1180 838; x_wconf
<span class="ocrx_word" title="bbox 1199 797 1343 832; x_wcon
<span class="ocrx_word" title="bbox 1362 805 1399 823; x_wcon
<span class="ocrx_word" title="bbox 1417 x_wconf 96">ver-</sp
</span>
</p>
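In full hOCR output, each word-level span carries a bounding box and a word confidence in its title attribute. The following is a minimal sketch of extracting those fields with the JDK's regex classes. The names HocrWords and Word are hypothetical, the pattern assumes the title holds exactly "bbox ...; x_wconf N" and that the word contains no nested tags, and the input in the usage note is made up rather than taken from the truncated slide. A real pipeline would more likely use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Tesseract or Tess4J.
class HocrWords {

    // One recognized word: text, bounding box corners and confidence (0-100).
    record Word(String text, int x0, int y0, int x1, int y1, int confidence) {}

    // Matches spans of the form
    //   <span class="ocrx_word" title="bbox 10 20 110 48; x_wconf 93">Hello</span>
    // and tolerates single quotes and extra attributes such as id.
    private static final Pattern WORD = Pattern.compile(
            "<span class=[\"']ocrx_word[\"'][^>]*title=[\"']bbox (\\d+) (\\d+) (\\d+) (\\d+); "
            + "x_wconf (\\d+)[\"'][^>]*>([^<]*)</span>");

    // Returns every word span found in the given hOCR markup, in document order.
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            words.add(new Word(
                    m.group(6),
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Integer.parseInt(m.group(4)),
                    Integer.parseInt(m.group(5))));
        }
        return words;
    }
}
```

Calling HocrWords.parse on the sample span shown in the comment yields a single Word with text "Hello", box (10, 20) to (110, 48) and confidence 93.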
Used by Google
For text detection on mobile devices
In video
In Gmail image spam detection
New features
in v3.0
support for over 100 languages
page layout analysis
in v4.0
LSTM recognition engine
Tess4J
A Java JNA wrapper for Tesseract
Tess4J features
PDF input
Multi-page TIFF input
Image optimization
https://github.com/nguyenq/tess4j
Demo
Install Tesseract (for multilanguage support)
Add Tess4J dependency
Convert image to plain text (English)
Convert image to plain text (Greek)
Choosing the Right Library
/ third dimension
Competitors
ABBYY FineReader
Development started at ABBYY in 1993
Supports 192 languages
20 million users worldwide
Outputs to MS Office, RTF, HTML, (searchable) PDF and plain text
https://www.abbyy.com/en-eu/finereader
Google Cloud Vision API
Launched in 2016 by Google
Supports 56 languages
Outputs to JSON
Integrates nicely with Google Images and Google SafeSearch
https://cloud.google.com/vision/
feature | ABBYY | GCV | Tesseract
costs | $200 per computer | $1.50 per 1000 images, per month | $0
languages | 192 | 56 | 102
Java integration | through SDK | through REST API | through JNA wrapper
handwriting recognition | 'handprinted' text | supported | not supported
custom training | supported | not supported | supported
accuracy | 9/10 | 8/10 | 7/10
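Which of the two paid options is cheaper depends on volume. Below is a rough worked example that uses only the prices from the comparison. It assumes, and none of this is stated on the slides, that the ABBYY price is a one-time cost per computer, that GCV pricing is linear with no free tier, and that the monthly image volume is constant. The class OcrCost and its method names are hypothetical.

```java
// Hypothetical cost model built from the prices in the comparison;
// not an official pricing calculator for either product.
class OcrCost {
    static final double ABBYY_PER_COMPUTER_USD = 200.0;
    static final double GCV_PER_1000_IMAGES_USD = 1.50;

    // Monthly GCV bill for a given image volume, assuming linear pricing.
    static double gcvMonthlyCost(long imagesPerMonth) {
        return imagesPerMonth / 1000.0 * GCV_PER_1000_IMAGES_USD;
    }

    // First month in which the cumulative GCV bill exceeds the one-time ABBYY cost.
    static int firstMonthGcvCostsMore(long imagesPerMonth, int computers) {
        if (imagesPerMonth <= 0 || computers <= 0) {
            throw new IllegalArgumentException("volume and computer count must be positive");
        }
        double abbyyTotal = computers * ABBYY_PER_COMPUTER_USD;
        double gcvMonthly = gcvMonthlyCost(imagesPerMonth);
        int month = 1;
        while (month * gcvMonthly <= abbyyTotal) {
            month++;
        }
        return month;
    }
}
```

At 10,000 images per month the GCV bill is $15 per month, so under these assumptions it overtakes a single $200 ABBYY license in month 14. Tesseract stays at $0 in license cost either way.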
Case study
Paper archives going digital.
Advanced Features
/ fourth dimension
What Advanced Features?
Reporting confidence
Multiple languages in a single document
Image optimization
Speed/accuracy tradeoffs
Training
Improving accuracy
To better recognize the expected input documents.
What is confidence?
Reporting confidence
Tess4J supports two return types:
String (containing the OCR'ed text)
List<OCRResult> (OCR result is written to a file); each OCRResult exposes:
    int confidence
    List<Word> words
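One way to use per-word confidence is to route uncertain words to manual review. The sketch below uses a hypothetical RecognizedWord record as a stand-in for Tess4J's Word type so that it runs without the library. It assumes confidences on a 0 to 100 scale, and the review threshold is an arbitrary value chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; RecognizedWord stands in for a library word type.
class ConfidenceReport {

    record RecognizedWord(String text, float confidence) {}

    // Average confidence over all words, or 0 when nothing was recognized.
    static double meanConfidence(List<RecognizedWord> words) {
        return words.stream()
                .mapToDouble(w -> w.confidence())
                .average()
                .orElse(0.0);
    }

    // Texts of all words whose confidence falls below the given threshold.
    static List<String> needsReview(List<RecognizedWord> words, float threshold) {
        return words.stream()
                .filter(w -> w.confidence() < threshold)
                .map(RecognizedWord::text)
                .collect(Collectors.toList());
    }
}
```

With the words ("Tesseract", 96) and ("0CR", 50) and a threshold of 80, needsReview returns only "0CR" and the mean confidence is 73.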
Multiple languages in a single document
Concatenate the language codes and separate them with a plus sign:
tesseract.setLanguage("eng+nld");
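When the set of languages is only known at run time, the combined string can be built rather than hard-coded. This is a small sketch; the Languages class is hypothetical, and it checks only the shape of the codes, not whether matching trained data is actually installed.

```java
import java.util.List;

// Hypothetical helper that builds the "eng+nld" style language string.
class Languages {

    // Joins language codes with a plus sign, rejecting empty or malformed codes.
    static String combine(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        for (String code : codes) {
            if (code.isBlank() || code.contains("+")) {
                throw new IllegalArgumentException("invalid language code: '" + code + "'");
            }
        }
        return String.join("+", codes);
    }
}
```

The result can be passed straight on, for example tesseract.setLanguage(Languages.combine(List.of("eng", "nld"))).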
Demo
Reporting confidence
Multiple languages in a single document
Image optimization
Tess4J is bundled with the ImageHelper class, which contains a few image optimization tricks.
convertImageToBinary(BufferedImage image)
convertImageToGrayscale(BufferedImage image)
invertImageColor(BufferedImage image)
rotateImage(BufferedImage image, double angle)
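To make concrete what binarization and inversion do to an image before OCR, here is a sketch that uses only java.awt.image. It is not ImageHelper's implementation. The class ImagePrep, the luma weights and the fixed threshold parameter are assumptions chosen for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical stand-in showing what binarize/invert style helpers do.
class ImagePrep {

    // Perceived brightness of an RGB pixel (0-255), using ITU-R BT.601 weights.
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;
    }

    // Maps every pixel to pure black or pure white around a fixed threshold.
    static BufferedImage toBinary(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                boolean light = luma(src.getRGB(x, y)) >= threshold;
                out.setRGB(x, y, light ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    // Flips each colour channel, turning light-on-dark text into dark-on-light.
    static BufferedImage invert(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                out.setRGB(x, y, 0xFF000000 | (~src.getRGB(x, y) & 0xFFFFFF));
            }
        }
        return out;
    }
}
```

A fixed threshold is the simplest possible choice. Scans with uneven lighting usually need an adaptive threshold instead, which this sketch does not attempt.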
Still having problems?
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
Speed/accuracy tradeoffs
Two types of training data:
https://github.com/tesseract-ocr/tessdata_fast
https://github.com/tesseract-ocr/tessdata_best
Demo
Image optimization
Speed/accuracy tradeoffs
Training data
400,000 textlines
4500 fonts
(for Latin-based languages)
Custom training
Fine tune (e.g. for an unusual font)
Cut off the top layer (e.g. for a new language)
Retrain from scratch (e.g. don't do this!)
Further reading
"An Overview of the Tesseract OCR Engine" by Ray Smith (https://research.google.com/pubs/archive/33418.pdf)
Useful resources
Tesseract on GitHub (https://github.com/tesseract-ocr/tesseract)
Try Tesseract online (newocr.com)
Any questions?
Thank you! ☺
https://hannotify.github.io
@hannotify
hanno.embregts@infosupport.com

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy Lรณpez
ย 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
ย 
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdfGOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdfAlina Yurenko
ย 
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœqr0udbr0
ย 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
ย 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
ย 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
ย 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
ย 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
ย 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
ย 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
ย 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
ย 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
ย 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
ย 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
ย 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
ย 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
ย 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
ย 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
ย 

Recently uploaded (20)

Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
ย 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
ย 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
ย 
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdfGOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM โ€“ DEVOXX GREECE.pdf
ย 
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ
่‹ฑๅ›ฝUNๅญฆไฝ่ฏ,ๅŒ—ๅฎ‰ๆ™ฎ้กฟๅคงๅญฆๆฏ•ไธš่ฏไนฆ1:1ๅˆถไฝœ
ย 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
ย 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
ย 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
ย 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
ย 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
ย 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
ย 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
ย 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
ย 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
ย 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
ย 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
ย 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
ย 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
ย 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
ย 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
ย 

Entering the Fourth Dimension of OCR with Tesseract

  • 1. Entering the Fourth Dimension of OCR with Tesseract. Hanno Embregts, @hannotify
  • 2.
  • 3. What is OCR? / first dimension
  • 4. Atechnology that can take written words and convert then aes mo aulerseadans Yom, proved meyren ne fgntrom sng tne cortec cars ometimes, ate fant gait sae and ptch, dar engugn onthe pase, ana Your Bropared to spend Several cantinas covtecing al me one PrP areee e SP es ts anine Oe tat came autas zersee
  • 5.
  • 6.
  • 7. A technology that can take written words and convert them back into computer-readable form, provided they're in the right font, using the correct colors sometimes, at the right point size and pitch, dark enough on the paper, and you're prepared to spend several centuries correcting all the ones that came out as l's, all the O's that came out as zeroes, and all the colons that come out like semicolons.
  • 8. A Proper Definition: Optical character recognition (...) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (...) or from subtitle text superimposed on an image.
  • 12. 1929: Gustav Tauschek patents a basic OCR 'reading machine'.
  • 13. 1960s: Postal services start using OCR for mail sorting.
  • 14. 1993: The Apple Newton becomes the first handheld computer to feature handwriting recognition.
  • 16. Financial transfers: Catch me if you can!
  • 17. Book digitization: Also supports Ctrl+F.
  • 18. Passport scanning: Gets you to your gate in time.
  • 19. Number plate recognition: Get your speeding ticket even faster!
  • 20.
  • 21. Getting Started / second dimension
  • 22. Tesseract: Development started at Hewlett-Packard in 1985. Ported to Windows in 1996. Released as open source in 2005. Development sponsored by Google since 2006. (https://github.com/tesseract-ocr/tesseract)
  • 23. "The core feature, text recognition, is drastically better than anything else I've tried from the Open Source community." - Anthony Kay in "Linux Journal", July 2007
  • 24. Features: character recognition; support for Unicode; input: JPEG, GIF, PNG, TIFF or BMP; output: searchable PDF, TSV, plain text or HOCR.
  • 25. HOCR example: <p class="ocr_par" lang="deu" title="bbox930"> <span class="ocr_line" title="bbox 348 797 1482 838; baseline - <span class="ocrx_word" title="bbox 348 805 402 832; x_wconf <span class="ocrx_word" title="bbox 421 804 697 832; x_wconf <span class="ocrx_word" title="bbox 717 803 755 831; x_wconf <span class="ocrx_word" title="bbox 773 803 802 831; x_wconf <span class="ocrx_word" title="bbox 821 803 917 830; x_wconf <span class="ocrx_word" title="bbox 935 799 1180 838; x_wconf <span class="ocrx_word" title="bbox 1199 797 1343 832; x_wcon <span class="ocrx_word" title="bbox 1362 805 1399 823; x_wcon <span class="ocrx_word" title="bbox 1417 x_wconf 96">ver-</sp </span> </p>
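
The `title` attributes in the HOCR output above carry each word's bounding box (`bbox`) and confidence (`x_wconf`). The following is a minimal sketch of pulling those values out with plain JDK regular expressions. It assumes complete title strings of the form `bbox x0 y0 x1 y1; x_wconf N` (the slide truncates most of them), and the class and method names are illustrative, not part of Tesseract or Tess4J.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not a Tesseract or Tess4J class) for reading hOCR title attributes. */
final class HocrTitle {
    // Matches "bbox x0 y0 x1 y1" anywhere in the title string.
    private static final Pattern BBOX = Pattern.compile("bbox (\\d+) (\\d+) (\\d+) (\\d+)");
    // Matches "x_wconf N", the per-word confidence.
    private static final Pattern WCONF = Pattern.compile("x_wconf (\\d+)");

    /** Returns {x0, y0, x1, y1}, or null if the title has no complete bbox. */
    static int[] boundingBox(String title) {
        Matcher m = BBOX.matcher(title);
        if (!m.find()) {
            return null;
        }
        int[] box = new int[4];
        for (int i = 0; i < 4; i++) {
            box[i] = Integer.parseInt(m.group(i + 1));
        }
        return box;
    }

    /** Returns the word confidence, or -1 if the title has none. */
    static int wordConfidence(String title) {
        Matcher m = WCONF.matcher(title);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Assumed complete title: the bbox and confidence come from different words on the slide.
        String title = "bbox 348 805 402 832; x_wconf 96";
        int[] box = boundingBox(title);
        System.out.println("width=" + (box[2] - box[0]) + ", confidence=" + wordConfidence(title));
    }
}
```

For anything beyond a quick inspection, an HTML parser is likely a safer choice than regular expressions, since real HOCR files nest lines, paragraphs and words.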
  • 26. Used by Google: for text detection on mobile devices; in video; in Gmail image spam detection.
  • 27. New features: in v3.0, support for over 100 languages and page layout analysis; in v4.0, LSTM recognition engine.
  • 29. Tess4J features: PDF input; multi-page TIFF input; image optimization. (https://github.com/nguyenq/tess4j)
  • 30. Demo: Install Tesseract (for multilanguage support); add Tess4J dependency; convert image to plain text (English); convert image to plain text (Greek).
  • 31. Choosing the Right Library / third dimension
  • 33. ABBYY FineReader: Development started at ABBYY in 1993. Supports 192 languages. 20 million users worldwide. Outputs to MS Office, RTF, HTML, (searchable) PDF and plain text. (https://www.abbyy.com/en-eu/finereader)
  • 34. Google Cloud Vision API: Launched in 2016 by Google. Supports 56 languages. Outputs to JSON. Integrates nicely with Google Images and Google SafeSearch. (https://cloud.google.com/vision/)
  • 36-41. Comparison (ABBYY | GCV | Tesseract):
      costs: $200 per computer | $1.50 per 1000 images, per month | $0
      languages: 192 | 56 | 102
      Java integration: through SDK | through REST API | through JNA wrapper
      handwriting recognition: 'handprinted' text | supported | not supported
      custom training: supported | not supported | supported
      accuracy: 9/10 | 8/10 | 7/10
  • 42. Case study: Paper archives going digital.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50. Advanced Features / fourth dimension
  • 51. What Advanced Features? Reporting confidence; multiple languages in a single document; image optimization; speed/accuracy tradeoffs; training.
  • 52. Improving accuracy: To better recognize the expected input documents.
  • 53. What is confidence?
  • 54. What is confidence?
  • 55. Reporting confidence: Tess4J supports two return types: String (containing the OCR'ed text), or List<OCRResult> (OCR result is written to a file), where each result holds int confidence and List<Word> words.
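
One possible use of per-word confidence is to flag doubtful words for manual review. The sketch below shows that idea under stated assumptions: `RecognizedWord` is a hypothetical stand-in holding a text and a 0-100 confidence (Tess4J's own `Word` type is deliberately not used, so the example runs without the library), and the threshold of 80 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative confidence report; not part of Tess4J. */
final class ConfidenceReport {

    /** Hypothetical stand-in for an OCR word result: text plus a confidence from 0 to 100. */
    static final class RecognizedWord {
        final String text;
        final float confidence;

        RecognizedWord(String text, float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Returns the words whose confidence is below the threshold, in input order. */
    static List<RecognizedWord> needsReview(List<RecognizedWord> words, float threshold) {
        List<RecognizedWord> flagged = new ArrayList<>();
        for (RecognizedWord word : words) {
            if (word.confidence < threshold) {
                flagged.add(word);
            }
        }
        return flagged;
    }

    /** Mean confidence over all words, or 0 for an empty list. */
    static float meanConfidence(List<RecognizedWord> words) {
        if (words.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (RecognizedWord word : words) {
            sum += word.confidence;
        }
        return sum / words.size();
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<RecognizedWord> words = Arrays.asList(
                new RecognizedWord("Optical", 95f),
                new RecognizedWord("charaeter", 60f),
                new RecognizedWord("recognition", 85f));
        for (RecognizedWord word : needsReview(words, 80f)) {
            System.out.println("review: " + word.text + " (" + word.confidence + ")");
        }
        System.out.println("mean confidence: " + meanConfidence(words));
    }
}
```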
  • 56. Multiple languages in a single document: Concatenate the language codes and separate them by a plus sign: tesseract.setLanguage("eng+nld");
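
When the set of languages is only known at runtime, the plus-separated string can be built from a list. This is a small sketch using only the JDK; `LanguageCodes.combine` is an illustrative helper name, and the resulting string would be passed to `tesseract.setLanguage(...)` as shown on the slide.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (not a Tess4J class) that builds a multi-language string. */
final class LanguageCodes {

    /** Joins language codes with '+', so ["eng", "nld"] becomes "eng+nld". */
    static String combine(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        for (String code : codes) {
            // A code containing '+' would silently turn into two languages.
            if (code.isEmpty() || code.contains("+")) {
                throw new IllegalArgumentException("invalid language code: '" + code + "'");
            }
        }
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        String languages = combine(Arrays.asList("eng", "nld"));
        // With Tess4J on the classpath: tesseract.setLanguage(languages);
        System.out.println(languages);
    }
}
```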
  • 57. Demo: Reporting confidence; multiple languages in a single document.
  • 58-62. Image optimization: Tess4J is bundled with the ImageHelper class, which contains a few image optimization tricks: convertImageToBinary(BufferedImage image); convertImageToGrayscale(BufferedImage image); invertImageColor(BufferedImage image); rotateImage(BufferedImage image, double angle).
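
To give a feel for what such preprocessing does to the pixels, here is a minimal JDK-only sketch of grayscale conversion, thresholding to black and white, and color inversion. It is not Tess4J's `ImageHelper` implementation, whose algorithms may differ. The class name `ImagePrep`, the luminance weights and the fixed threshold are all assumptions made for illustration, and rotation is left out.

```java
import java.awt.image.BufferedImage;

/** Illustrative preprocessing steps; an assumption-laden sketch, not Tess4J's ImageHelper. */
final class ImagePrep {

    /** Perceived brightness (0-255) of a packed RGB pixel, using common Rec. 601 weights. */
    private static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Replaces every pixel by a gray value of the same brightness. */
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int lum = luminance(src.getRGB(x, y));
                out.setRGB(x, y, (lum << 16) | (lum << 8) | lum);
            }
        }
        return out;
    }

    /** Pixels at or above the threshold become white, all others black. */
    static BufferedImage toBinary(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                out.setRGB(x, y, luminance(src.getRGB(x, y)) >= threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    /** Inverts each color channel, turning light-on-dark text into dark-on-light. */
    static BufferedImage invert(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                out.setRGB(x, y, ~src.getRGB(x, y) & 0xFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny in-memory image: one light pixel and one dark pixel.
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xF0F0F0);
        img.setRGB(1, 0, 0x202020);
        BufferedImage binary = toBinary(img, 128);
        System.out.println(Integer.toHexString(binary.getRGB(0, 0) & 0xFFFFFF));
        System.out.println(Integer.toHexString(binary.getRGB(1, 0) & 0xFFFFFF));
    }
}
```

A fixed threshold is the simplest choice; scans with uneven lighting usually need an adaptive one, which this sketch does not attempt.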
  • 63. Still having problems? https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
  • 64. Speed/accuracy tradeoffs: Two types of training data: https://github.com/tesseract-ocr/tessdata_fast and https://github.com/tesseract-ocr/tessdata_best
  • 66. Training data: 400,000 textlines; 4500 fonts (for Latin-based languages).
  • 68-70. Custom training: Fine tune (e.g. for an unusual font); cut off the top layer (e.g. for a new language); retrain from scratch (e.g. don't do this!).
  • 71.
  • 72.
  • 73.
  • 75. Further reading: "An Overview of the Tesseract OCR Engine" by Ray Smith (https://research.google.com/pubs/archive/33418.pdf). Useful resources: Tesseract on GitHub (https://github.com/tesseract-ocr/tesseract); Try Tesseract online (newocr.com).
  • 77. Thank you! ☺ https://hannotify.github.io @hannotify hanno.embregts@infosupport.com