VitaFlow
Video Image Text Audio - Flow
Mageswaran Dhandapani <mageswaran.dhandapani@imaginea.com>
Agenda
- Introduction
- Receipt Information Extraction
- How it is done?
- Demo
Flashback
- We deal with a lot of text-related problems, such as document clustering/classification, information extraction, etc.
- Form filling: https://github.com/Imaginea/i-tagger
- Infinity : Pramati level innovation competition
- 2018 with GANs @ https://github.com/dhiraa/asariri
- 2019 with Audio and CNNs
- Our exploration and code base were scattered.
- We organized all our explorations under one code base called VitaFlow
- Planned R&D
- Information extraction that aims to generalize, i.e. to be independent of the dataset (Text + Image)
- Audio-related Android applications (TensorFlow Lite + Audio + Android Applications) (need funding and resources ;))
Introduction
- Problem
- A pipeline to train and deploy DL models
- Address domain specific information extraction
- Traditional IE depends on rule based engines
- Often not easily extensible for new data
- Solution
- Design an ML/DL model pipeline
- Plug and play modules at each stage
- Data sets
- Annotations
- Pre-processing / Post processing
- Training and serving the models
- Metrics to evaluate
- Feedback loop
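The plug-and-play idea above can be sketched as a list of named, swappable stages. This is only an illustration, not VitaFlow's actual API: the `Pipeline` class and the stand-in stages `detect_lines` and `recognise` are hypothetical names invented for this sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Any], Any]  # each stage maps one artifact to the next


class Pipeline:
    """Runs named stages in order and keeps every intermediate output."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def replace(self, name: str, stage: Stage) -> None:
        """Swap one stage (e.g. another OCR engine) without touching the rest."""
        for i, (existing, _) in enumerate(self.stages):
            if existing == name:
                self.stages[i] = (name, stage)
                return
        raise KeyError(name)

    def run(self, data: Any) -> Dict[str, Any]:
        trace: Dict[str, Any] = {"input": data}
        for name, stage in self.stages:
            data = stage(data)
            trace[name] = data  # kept so every stage can be inspected later
        return trace


# Hypothetical stand-in stages; real ones would wrap a detector and an OCR model.
def detect_lines(image: str) -> List[str]:
    return image.split("|")


def recognise(lines: List[str]) -> List[str]:
    return [line.strip().upper() for line in lines]


pipeline = Pipeline().add("detect", detect_lines).add("ocr", recognise)
trace = pipeline.run("milk | total 1.46")
print(trace["ocr"])
```

Keeping the per-stage `trace` is one way to make the pipeline debuggable at each stage and to give a human reviewer something to correct in a feedback loop.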
Pipeline
1. Raw Images
2. Image Annotations (Bounding boxes)
3. Text Detection - Text Localisation / Document Orientation Analysis + Fix
a. EAST
b. DOCT2TEXT
4. Text-Cleaner/Binarization
5. Text Recognition - OCR
a. Calamari (CNN+LSTM models)
b. Tesseract
6. Text Annotations
7. ML / Statistical Inference / Rules
8. Domain Specific Extraction
9. Data Store
Plug and Play Design
[Diagram: receipt image → text localisation → OCR → information extraction]
- EAST/FOTS: (natural scene) text segmentation; extracts text line segments
- CNN + LSTM models (OCR): image to text. Sample OCR output:
tan chay yee
0.2s4 JALAH HARMOHI
312
Date 09/0112019 8:01:11 PM
Total Amount : 31.00
….
- Positional Information
- Information Extraction
  - Rules
  - Statistical Inference
  - ML/DL Models
- Domain Specific Information Extraction. Sample result:
Vendor : tan chay yee
Total: 31
Date : 09/01/2019
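As a rough sketch of the "Rules" option for information extraction, the snippet below pulls vendor, total and date out of OCR'd receipt lines with regular expressions. The patterns, the "first line is the vendor" rule and the function name `extract_fields` are assumptions made for illustration; the input lines are a cleaned-up version of the receipt sample, not real pipeline output.

```python
import re
from typing import Dict, List, Optional

# Assumed patterns: "Total" or "Total Amount" followed by a number; dd/mm/yyyy dates.
TOTAL_RE = re.compile(r"total(?:\s+amount)?\s*:?\s*(\d+(?:\.\d{1,2})?)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_fields(lines: List[str]) -> Dict[str, Optional[str]]:
    """Apply simple hand-written rules to OCR'd receipt lines."""
    fields: Dict[str, Optional[str]] = {"vendor": None, "total": None, "date": None}
    # Rule (assumption): the first non-empty line is the vendor name.
    for line in lines:
        if line.strip():
            fields["vendor"] = line.strip()
            break
    for line in lines:
        if fields["total"] is None:
            match = TOTAL_RE.search(line)
            if match:
                fields["total"] = match.group(1)
        if fields["date"] is None:
            match = DATE_RE.search(line)
            if match:
                fields["date"] = match.group(1)
    return fields


ocr_lines = [
    "tan chay yee",
    "Date 09/01/2019 8:01:11 PM",
    "Total Amount : 31.00",
]
print(extract_fields(ocr_lines))
```

A date that the OCR step garbles (such as `09/0112019`) does not match the date rule and comes back as `None`, which is the kind of case where statistical or ML-based extraction, or a human in the loop, has to take over.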
Annotation Tool
OCR
- OCR : Text Localization + Text Extraction
- Text Localization
- EAST (https://arxiv.org/abs/1704.03155)
- FOTS (https://arxiv.org/abs/1801.01671)
- Text Extraction
- Calamari (https://github.com/Calamari-OCR/calamari)
- Tesseract
ICDAR Dataset
- ICDAR 2015
Natural images with incidental scene text
- ICDAR 2019
Receipts and invoices
- What's unique about data preparation for OCR text recognition?
- It's text + image
- Format of images: JPEG or PNG
- Ground truth:
● One text file per image
● UTF-8 format
● Each line specifies the coordinates of one word's bounding box and its transcription, in a comma-separated format
img_01.png <-> img_01.txt
x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript_1
x1_2,y1_2,x2_2,y2_2,x3_2,y3_2,x4_2,y4_2,transcript_2
x1_3,y1_3,x2_3,y2_3,x3_3,y3_3,x4_3,y4_3,transcript_3
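A minimal parser for the ground-truth format above might look like this. The names `WordBox`, `parse_gt_line` and `parse_gt_text` are made up for the sketch, the sample line is invented, and tolerating a leading byte-order mark is an assumption about how the files are encoded.

```python
from typing import List, NamedTuple, Tuple


class WordBox(NamedTuple):
    quad: List[Tuple[int, int]]  # four (x, y) corners, in file order
    transcript: str


def parse_gt_line(line: str) -> WordBox:
    # Split on the first 8 commas only: the transcription may itself contain commas.
    parts = line.strip().lstrip("\ufeff").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcript: {line!r}")
    coords = [int(p) for p in parts[:8]]
    quad = list(zip(coords[0::2], coords[1::2]))
    return WordBox(quad, parts[8].strip())


def parse_gt_text(text: str) -> List[WordBox]:
    """Parse the whole content of one img_XX.txt file, skipping blank lines."""
    return [parse_gt_line(line) for line in text.splitlines() if line.strip()]


# Invented sample line, for illustration only.
box = parse_gt_line("10,20,110,20,110,50,10,50,Total: 31,00")
print(box.quad, box.transcript)
```

Limiting the split to eight commas matters for receipts, where amounts such as `31,00` can appear inside the transcription.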
Data Preparation
Input:
- RGB color image (height×width×3) or a grayscale image (height×width×1)
Output
- Image matrix (height×width×3)
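- Score map matrix (height×width×1): per-pixel text / no-text label (in the EAST paper, 1 inside a shrunk copy of each text box and 0 elsewhere)
- Geometry map matrix (height×width×5): for the rotated-box (RBOX) output in the EAST paper, four channels hold each pixel's distance to the top, right, bottom and left edges of its text box, and the fifth holds the box's rotation angle. Bit complicated, expect a post on this soon!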
https://www.jeremyjordan.me/semantic-segmentation/
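To make the two target maps concrete, here is a deliberately simplified sketch for a single axis-aligned box. It assumes the RBOX-style geometry of the EAST paper (four edge distances plus an angle), skips the box shrinking and rotation handling a real data loader needs, and stores the geometry channels-first (`[5][height][width]`) rather than height×width×5. `make_targets` is a hypothetical helper name.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def make_targets(height: int, width: int,
                 box: Tuple[int, int, int, int]) -> Tuple[Map2D, List[Map2D]]:
    """box = (x_min, y_min, x_max, y_max) in inclusive pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    score = [[0.0] * width for _ in range(height)]
    # Channels: distance to top, right, bottom, left edge, then rotation angle.
    geometry = [[[0.0] * width for _ in range(height)] for _ in range(5)]
    for y in range(max(y_min, 0), min(y_max, height - 1) + 1):
        for x in range(max(x_min, 0), min(x_max, width - 1) + 1):
            score[y][x] = 1.0                      # pixel belongs to text
            geometry[0][y][x] = float(y - y_min)   # distance to top edge
            geometry[1][y][x] = float(x_max - x)   # distance to right edge
            geometry[2][y][x] = float(y_max - y)   # distance to bottom edge
            geometry[3][y][x] = float(x - x_min)   # distance to left edge
            geometry[4][y][x] = 0.0                # axis-aligned box: no rotation
    return score, geometry


score, geometry = make_targets(6, 8, (2, 1, 5, 3))
print(sum(map(sum, score)))  # number of text pixels
```

From any text pixel, the four distances are enough to reconstruct the whole box, which is why the detector can predict a box per pixel and merge the predictions afterwards.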
Some Refreshers....
Conv Net
ResNet
- http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
- http://teleported.in/posts/decoding-resnet-architecture/
- Increases the depth of the network without hurting its generalization power
- A residual block can be written as:
H(x) = F(x) + x, where F(x) = W2*relu(W1*x+b1)+b2
- During training, the residual block learns its weights such that, if the identity mapping were optimal, the weights are driven toward 0. In effect F(x) becomes 0, so x is mapped directly to H(x) and no corrections need to be made. These identity mappings are what let the network grow deep. If the optimal mapping deviates from the identity, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
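The formula H(x) = F(x) + x can be checked with a few lines of plain Python. This is a toy, fully connected version written for illustration (real ResNet blocks use convolutions and batch normalisation); the helper names are made up.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    hidden = relu([a + b for a, b in zip(matvec(w1, x), b1)])
    f_x = [a + b for a, b in zip(matvec(w2, hidden), b2)]  # F(x)
    return [f + xi for f, xi in zip(f_x, x)]               # H(x) = F(x) + x


x = [1.0, -2.0]
zeros_m = [[0.0, 0.0], [0.0, 0.0]]
zeros_v = [0.0, 0.0]
eye = [[1.0, 0.0], [0.0, 1.0]]

identity_out = residual_block(x, zeros_m, zeros_v, zeros_m, zeros_v)
adjusted_out = residual_block(x, eye, zeros_v, eye, zeros_v)
print(identity_out, adjusted_out)
```

With all weights at zero the block passes `x` through unchanged, which is the identity-mapping case; with non-zero weights, F(x) adds a correction on top of `x`.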
EAST
- An Efficient and Accurate Scene Text Detector
- No hand-crafted image processing steps such as edge detection, filtering, smoothing, etc.
- Character and word segmentation graphs
- Basically no complicated algorithms
- Detects text in images and videos
- Outputs geometry and confidence scores for the detected text
- The network architecture is based on U-Net
- The feed-forward “stem” of this network may vary
- PVANet, VGG16 used in the paper
- Our pipeline uses ResNet
- A popular text detector; adopted by the OpenCV library
EAST ARCHITECTURE
These skip connections from earlier layers in the network (prior to a downsampling operation) should provide
the necessary detail in order to reconstruct accurate shapes for segmentation boundaries.
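The skip-connection idea can be sketched as "upsample the deep feature map, then concatenate it with the earlier, higher-resolution one". The version below uses nearest-neighbour upsampling on nested lists purely for illustration and leaves out the convolutions that normally follow the concatenation; the function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: rows of values


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour upsampling: repeat every value twice along both axes."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge_with_skip(deep: List[FeatureMap],
                    skip: List[FeatureMap]) -> List[FeatureMap]:
    """Upsample the deep channels and stack them with the skip channels."""
    up = [upsample2x(channel) for channel in deep]
    for channel in up:
        if len(channel) != len(skip[0]) or len(channel[0]) != len(skip[0][0]):
            raise ValueError("spatial sizes differ after upsampling")
    return up + skip  # channel-wise concatenation


deep = [[[1.0, 2.0]]]                 # 1 channel, 1x2 (coarse, after downsampling)
skip = [[[0.0] * 4, [0.0] * 4]]       # 1 channel, 2x4 (earlier, finer layer)
merged = merge_with_skip(deep, skip)
print(len(merged), merged[0])
```

The merged tensor carries both the coarse, semantically rich features and the fine spatial detail from before the downsampling step, which is what lets the network draw accurate boundaries.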
Loss Function
- Cross-entropy loss won't work well in this case, as one class (e.g. background) can dominate the other
- Dice Loss
Dice = 2|A∩B| / (|A| + |B|), Dice loss = 1 - Dice
- Where |A∩B| represents the common elements between sets A and B, and |A| represents the number of elements in set A (and likewise for set B).
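A soft Dice loss over flattened score maps can be sketched as follows. Using probabilities instead of hard sets and adding a small `eps` for numerical stability are common conventions assumed here, not details taken from the slides.

```python
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float],
              eps: float = 1e-6) -> float:
    """pred: per-pixel probabilities in [0, 1]; target: 0/1 labels (flattened)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))  # soft |A∩B|
    total = sum(pred) + sum(target)                          # |A| + |B|
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


perfect = dice_loss([1, 0, 0, 1], [1, 0, 0, 1])
partial = dice_loss([1, 1, 0, 0], [1, 0, 0, 0])
print(perfect, partial)
```

Because the loss is a ratio of overlaps, a large background area does not swamp a small text area the way it can with a plain per-pixel average.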
Demo - VitaFlow in Action
Looking for Demo… We just have to move to the next slide… ;)
Calamari OCR
Output - Image to Text
GUARANTEE
N.ASDA.COM/PRICEGUARANT
MILK
£1.46D
ACCOUNT WILL BE DEBITED AS
Calamari OCR
Challenges
250 KG @ E0.67/KG
PACQTQN
SNUING YOU MONEY EUERY
£180D
OHN LEUIS NEUBURY AT HONE
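One way to put a number on recognition errors like these is the character error rate (CER): the edit distance between the OCR output and the reference text, divided by the reference length. In the sketch below the reference string is only a guess at what the receipt said, used for illustration; it is not ground truth from the talk.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(predicted: str, reference: str) -> float:
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(predicted, reference) / len(reference)


ocr_output = "SNUING YOU MONEY EUERY"
assumed_reference = "SAVING YOU MONEY EVERY"  # assumption, not from the slides
print(char_error_rate(ocr_output, assumed_reference))
```

Tracking CER per stage and per model is one concrete form the "metrics to evaluate" step of the pipeline could take.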
Takeaways
- Deliver a model pipeline, not just models
- Make the pipeline debuggable at each stage
- Provide a feedback loop, so that humans can aid the whole process
- Extracting text from images has its own challenges
- Identifying the text from its background
- Varying size and fonts
- Similar looking characters (o/0, y/g)
- Recovering text from scanned/aged images
- Multi-oriented text
- Black and white images
VitaFlow
For more information
● Code: https://github.com/Imaginea/vitaFlow
** We are in the process of piecing together all our individual efforts as a pipeline
** README will be updated shortly with an end-to-end replication of this talk
Thank You! ** Conditions Apply ;)
Q & A
