Evaluation on Google Cloud Vision API 1.1 (beta)
- POC Report for Japanese Health Care Report OCR
26-May, 2017
Asia Technology Office
Shinichi Hashitani
Executive Summary
• Google Cloud Vision API is an AI-as-a-service (AIaaS) offering from Google, built on a machine
learning engine. It provides OCR (Optical Character Recognition), image classification, landmark
detection, and other features. The OCR feature is highly sophisticated and recognizes
Japanese characters with almost perfect accuracy.
• Its OCR capability is suited to standard document scanning. It does not recognize multi-column
document structure well and is not well suited to tabular documents. The format of health care
reports varies across medical institutions, but they are all primarily tabular, which makes it
difficult to extract meaningful data accurately. Only about 30% of the text is extracted, and
the extracted text is not structured well enough for processing.
• The OCR feature does not accept any parameter other than the image itself; therefore, the
response JSON must be processed in-house. There are two primary approaches: 1. Text mining on
the content section of the response. 2. Programmatic text composition based on character
coordinate information. For health care report scanning, neither approach is feasible.
• Google Cloud Vision API can still be used for standard-format documents (books,
whitepapers, academic writing, public announcements, etc.). It can also be used on publications
(newspapers, magazines, case study reports) for text mining purposes. Other formats were also
evaluated during the POC.
About Google Cloud Vision API
Google Cloud Vision API is a REST-based service, accessible from any system and any language
that can exchange JSON over HTTP. Requests are authenticated with either OAuth2 (recommended)
or a Cloud API key.
The request format is common to all Cloud Vision services; a “type” must be specified to select
a specific feature. It is also possible to request multiple features in a single call on the same
image. (In this case, each type specified is counted as one unit.)
The round-trip response time for an A4 page with 500 characters is 3 to 6 seconds.
Pricing is per unit of task and relatively inexpensive. (1.5 USD per 1,000 units per month;
1.0 USD beyond 20 million units per month; free below 1,000 units per month.)
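As a rough illustration of the multi-feature request and unit counting described above, the following sketch builds one request body that asks for two feature types on the same image. The helper names build_annotate_request and count_units are hypothetical and not part of any Google library; the placeholder image string is also an assumption.

```python
import json


def build_annotate_request(image_base64, feature_types, max_results=1):
    """Build one request body that asks for several feature types on one image."""
    features = [{"type": t, "maxResults": max_results} for t in feature_types]
    return {"requests": [{"image": {"content": image_base64}, "features": features}]}


def count_units(request_body):
    """Count units, assuming each feature type requested on an image is one unit."""
    return sum(len(r["features"]) for r in request_body["requests"])


# Placeholder image content; a real call needs the base64-encoded image bytes.
body = build_annotate_request("<base64-encoded image>",
                              ["TEXT_DETECTION", "LABEL_DETECTION"])
print(json.dumps(body, indent=2))
print("units:", count_units(body))
```

Requesting two types on one image yields two units here, in line with the note that each type specified is counted as one unit.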
{"requests": [
  {"image": {"content": image_base64},
   "features": [{"type": "TEXT_DETECTION",
                 "maxResults": 1}]
  }
]}
{"responses": [
….
{"blockType": "TEXT",
"boundingBox": {
"vertices": [
{"x": 594, "y": 327}
….
"text": "遥"
….
Google Cloud Vision API – Output format
The output is in JSON format and consists of two sets of information:
1. Character (or small group of characters) information.
2. Restructured full text of the converted text.
The full description mimics the actual text structure by concatenating characters based on
their coordinates and appending a line break character at the end of each line.
The output confirms that the engine analyzes the image character by character and can
accurately process text containing characters from different languages. At this point in time,
this capability is far superior to that of Microsoft Cognitive Services, which confuses
alphabetic characters with similar-looking Kanji characters.
{"boundingBox":
{"vertices": [{"x": 444, "y": 71}, {"x": 485, "y": 67},
{"x": 488, "y": 104}, {"x": 447, "y": 108}]},
"property": {"detectedLanguages":
[{"languageCode": "ja"}]},
"text": "基"
}
….
"textAnnotations": [
{"boundingPoly": {"vertices": [{"x": 24, "y": 62}, {"x": 1538, "y": 62},
{"x": 1538, "y": 3096}, {"x": 24, "y": 3096}]},
"description": "-基準値\n|今回ー前回ー前々回ー\n総合判定\n要経過観察!
要経過観察|要経過観察\nメタボリックシンドローム判定\n非該当 1予備群
該当1基準該当\n【心電図】不完全右脚ブロック\n甩\n血中脂質] LDLコレス
テロールやや高値。食べ過ぎに注意し、動物\n性脂肪や卵などコレステ
ロールの多いものを制限し、経過を見て下\nさい。\n[尿酸]尿酸が高めです。
注意してください。\n治療中の場合は、この結果表を主治医にお見せ下さ
い。\n総合判定医師名: 川口 毅 ーーーーーッ童\n総合所見\n",
"locale": "ja",
….
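The two sets of information in the output can be separated with a few lines of Python. This is a minimal sketch that assumes, as in the excerpt above, that the first "textAnnotations" entry carries the full text and the remaining entries carry smaller pieces with their own bounding polygons. The sample dictionary is hand-made for illustration (its coordinates are invented), and split_annotations is a hypothetical helper name.

```python
# Hand-made stand-in for a parsed response; the values are illustrative only.
sample_response = {
    "responses": [{
        "textAnnotations": [
            {"locale": "ja",
             "description": "総合判定\n要経過観察\n",
             "boundingPoly": {"vertices": [{"x": 24, "y": 62}, {"x": 1538, "y": 62},
                                           {"x": 1538, "y": 3096}, {"x": 24, "y": 3096}]}},
            {"description": "総合判定",
             "boundingPoly": {"vertices": [{"x": 30, "y": 70}, {"x": 150, "y": 70},
                                           {"x": 150, "y": 100}, {"x": 30, "y": 100}]}},
            {"description": "要経過観察",
             "boundingPoly": {"vertices": [{"x": 30, "y": 110}, {"x": 180, "y": 110},
                                           {"x": 180, "y": 140}, {"x": 30, "y": 140}]}},
        ]
    }]
}


def split_annotations(response):
    """Return (full_text, pieces); each piece is a (text, vertices) tuple."""
    annotations = response["responses"][0].get("textAnnotations", [])
    if not annotations:
        return "", []
    full_text = annotations[0]["description"]
    pieces = [(a["description"], a["boundingPoly"]["vertices"])
              for a in annotations[1:]]
    return full_text, pieces


full_text, pieces = split_annotations(sample_response)
print(repr(full_text))
for text, vertices in pieces:
    print(text, vertices[0])
```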
Google Cloud Vision API – Output Processing
Since Google Cloud Vision API does not support structured documents and does not accept any
additional information for processing, in-house output processing is needed to extract the
desired data from the output. There are two ways:
1. Restructure the data from individual characters, based on their coordinates.
2. Text mining on the structured full text.
The approach to take depends on the accuracy, the composition of the structured text, and what
needs to be extracted. Text mining is the simpler of the two methods.
{"boundingBox":
{"vertices": [{"x": 444, "y": 71}, {"x": 485, "y": 67},
{"x": 488, "y": 104}, {"x": 447, "y": 108}]},
"property": {"detectedLanguages": ….
"textAnnotations": [
…
"description": "-基準値\n|今回ー前回ー
前々回ー\n総合判定\n要経過観察!要経過観
察|要経過観察\n…
….
"要" + "経" + "過" + "観" + "察"
= "要経過観察"
"…\n総合判定\n要経過観察…"
= "要経過観察"
Output Processing – Text Restructuring
This approach processes the raw data. Like the structured full text provided in the output itself,
it restructures text by concatenating individual characters based on their coordinates.
Pros:
- Targets specific area to be extracted. (Suitable for structured document.)
- Less affected by the accuracy of the scan.
Cons:
- Requires complex logic to process. (Requires coordinate-based calculation for each string)
- Requires tailored logic for each type of document.
It is ideal for extracting a small amount of information from the entire document. The logic
depends on coordinates; therefore, it cannot process unstructured or semi-structured
documents. It also depends strongly on the positioning of the scanned document; a small
misalignment in the scan can cause the logic to fail to fetch the characters to process.
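A minimal sketch of coordinate-based restructuring follows. It assumes each recognized character has already been reduced to a flat record with "text", "x", and "y" (top-left corner); that record shape, the region values, and the helper name text_in_region are hypothetical simplifications, not the API's own format.

```python
def text_in_region(symbols, region, line_tolerance=10):
    """Concatenate the symbols inside region, line by line and left to right."""
    left, top, right, bottom = region
    inside = [s for s in symbols
              if left <= s["x"] <= right and top <= s["y"] <= bottom]
    inside.sort(key=lambda s: (s["y"], s["x"]))
    lines = []
    for s in inside:
        # A symbol joins the current line when its y is close to the line's first symbol.
        if lines and abs(s["y"] - lines[-1][0]["y"]) <= line_tolerance:
            lines[-1].append(s)
        else:
            lines.append([s])
    return "\n".join("".join(c["text"] for c in sorted(line, key=lambda c: c["x"]))
                     for line in lines)


# Illustrative coordinates only; slight y jitter imitates an imperfect scan.
symbols = [
    {"text": "基", "x": 444, "y": 71},
    {"text": "要", "x": 100, "y": 200},
    {"text": "経", "x": 130, "y": 198},
    {"text": "過", "x": 160, "y": 201},
    {"text": "観", "x": 190, "y": 199},
    {"text": "察", "x": 220, "y": 200},
]
region = (90, 180, 260, 220)  # (left, top, right, bottom) of the target field

print(text_in_region(symbols, region))

# A scan shifted down by 50 pixels moves every character out of the fixed region.
shifted = [dict(s, y=s["y"] + 50) for s in symbols]
print(repr(text_in_region(shifted, region)))
```

The second call returns an empty string, which illustrates how a fixed target region stops matching once the scan is slightly mispositioned.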
Output Processing – Full Text Mining
Text mining disregards the coordinate information of each character. Instead, it takes the
restructured full text as input and searches through the string to extract text.
Pros:
- Logic is simple and known text mining techniques are directly applicable.
- One set of logic can possibly be reused across multiple document formats.
Cons:
- The accuracy depends entirely on the accuracy of the full text extraction.
- If the “key” text is not read, the value cannot be extracted either.
It is ideal for processing large text, especially for analytical purposes. It can still be used for
extracting a particular set of information if the accuracy of the extracted text is high.
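The key-and-value lookup can be sketched with a regular expression. This is only an illustration under two assumptions: the value sits on the line right after the key, and border characters such as "!" and "|" separate the columns. The helper name extract_after_key is hypothetical, and the sample string is a shortened piece of the description shown earlier.

```python
import re


def extract_after_key(full_text, key):
    """Return the line that follows the first occurrence of key, or None."""
    match = re.search(re.escape(key) + r"\n([^\n]*)", full_text)
    return match.group(1) if match else None


full_text = "-基準値\n|今回ー前回ー前々回ー\n総合判定\n要経過観察!要経過観察|要経過観察\n"

value = extract_after_key(full_text, "総合判定")
print(value)

# Split the value line on characters that table borders were converted to.
print(re.split(r"[!|]", value))

# If the key itself is misread (one character wrong here), nothing is extracted.
misread = full_text.replace("総合判定", "総合判走")
print(extract_after_key(misread, "総合判定"))
```

The last call returns None, which corresponds to the second drawback listed above: a misread key loses the value as well.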
Google Cloud Vision API – Restructured Full Text
Google Cloud Vision is designed for standard single-column documents, reading and processing
from top to bottom and left to right. It cannot restructure the full text well if the document
is in a multi-column format.
Google Cloud Vision tries to read and process the image line by line. Therefore, an entire row
is output as one line, with the columns concatenated with spaces in between.
AAAAA\n
BBBBB EEEEE HHHHH\n
CCCCC FFFFF IIIII\n
DDDDD GGGGG JJJJJ\n
The sentence flows from B to C to D, but the text comes out in the order B, E, H. A word can
be split across two lines; therefore, words that span multiple lines cannot be recognized
correctly.
Also, when lines do not align horizontally across columns, or when the space between columns is
too wide, the entire sentence is often not processed.
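The reading-order problem can be reproduced on the placeholder layout above. The sketch below compares the row-by-row order of the flattened text with the column-by-column order a reader would follow. It assumes every multi-cell row keeps the same number of whitespace-separated cells, which real output often does not, so it is an illustration rather than a fix. The function names are hypothetical.

```python
flattened = "AAAAA\nBBBBB EEEEE HHHHH\nCCCCC FFFFF IIIII\nDDDDD GGGGG JJJJJ\n"


def row_order(full_text):
    """Cells in the order they appear in the flattened text (row by row)."""
    return full_text.split()


def column_order(full_text):
    """Regroup cells column by column; rows with fewer cells are kept first."""
    rows = [line.split() for line in full_text.splitlines() if line.strip()]
    width = max(len(row) for row in rows)
    single = [cell for row in rows if len(row) < width for cell in row]
    table = [row for row in rows if len(row) == width]
    return single + [cell for column in zip(*table) for cell in column]


print(row_order(flattened))     # B is followed by E and H
print(column_order(flattened))  # B is followed by C and D
```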
Reading Health Care Report – POC Procedures
In this POC, an actual health care report is scanned with an MFP in both color and monochrome
modes. The Cloud API is called from a Python program running on a local machine. The same report
is tested in three modes (color, monochrome, and grayscale) at the same resolution (300 dpi,
JPEG). Since the MFP does not support grayscale, a color TIFF is converted into a grayscale JPEG.
1. The program reads the image and encodes it into a text format (base64).
2. The program constructs a JSON request that includes the encoded image and sends it to the Cloud API.
3. The Cloud API processes the image and sends back the text in JSON format.
4. The program dumps the JSON response into a physical file for analysis.
[Diagram: Program (Python), steps 1 to 4]
POC Result - Monochrome
- Overall read accuracy is very poor. The left-most pane is not scanned entirely.
- Only limited parts of the document are scanned. When scanned, characters are recognized
correctly in most cases.
- Traditional OCR works better with monochrome images, but this is not the case with Google
Cloud Vision.
Legend: Correct / Incorrect / Not Scanned
POC Result - Grayscale
- Overall read accuracy is the worst among the three options.
- The left-most pane is recognized well; outlined characters are read as well.
- Only limited parts of the document are scanned. When scanned, characters are recognized
correctly in most cases.
Legend: Correct / Incorrect / Not Scanned
POC Result - Color
- Overall read accuracy is poor, but better than the other two options.
- The left-most pane is recognized well; outlined characters are read as well.
- Only limited parts of the document are scanned. When scanned, characters are recognized
correctly in most cases.
Legend: Correct / Incorrect / Not Scanned
POC Result – Summary
All patterns failed to deliver dependable results for production use.
- The results vary among the three patterns, but none of them recognized even half of the
fields of interest.
- Character recognition accuracy itself is high (around 95%). Still, it is not reliable enough for
production use.
Health care reports are often in multi-pane or tabular format and are not suited to this solution.
- Due to the document structure, a large part of the document is not recognized as text areas
for scanning.
- Table column borders are wrongly recognized as characters.
- Table columns are often not fully scanned. (Whitespace between columns is recognized as
the end of a sentence.)
POC Result – Critical Issues
Rows not scanned in multi-column structure
- Since the entire image is scanned as a single-column paragraph, some rows are skipped
entirely, depending on how lines align across columns.
Table border is often wrongly converted to “!” or “1”
- Because each row is processed as a single line, table borders are also converted to
characters such as “|”, and often to a meaningful character such as “1”.
- This happens by chance, and the wrongly converted character can alter the actual value.
(In the example shown, 80 is read as 180.)
POC Result – Critical Issues cont’d
Columns are skipped due to whitespace between them.
- In tabular format, the whitespace between column values is often treated as the end of the
line, and the remaining columns are not scanned.
Follow Up Case – Overview
Because document structure significantly affects scan accuracy, the complexity of health care
reports is particularly challenging for Google Cloud Vision API to process correctly.
An additional test is conducted in which the image is divided into three independent images, so
that a single three-pane tabular image becomes three tabular images. Each divided image is sent
to Cloud Vision API as a separate request.
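The report does not say how the image was divided, so the following is only a sketch of one possible way: computing crop boxes for equal-width vertical panes. The equal widths, the helper name pane_boxes, and the example page size (an A4 page at 300 dpi, about 2480 by 3508 pixels) are assumptions; real panes would need boxes measured from the actual report layout.

```python
def pane_boxes(width, height, panes=3):
    """Return (left, upper, right, lower) boxes for equal-width vertical panes."""
    edges = [i * width // panes for i in range(panes + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(panes)]


boxes = pane_boxes(2480, 3508)
for box in boxes:
    print(box)

# Each cropped pane is sent as its own request, so one report costs
# len(boxes) requests instead of one.
print("requests per report:", len(boxes))
```

Boxes in this (left, upper, right, lower) form match what an image library such as Pillow expects for cropping, so each tuple could be used to cut one pane before encoding it.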
Follow Up Case - Result
- Read accuracy is significantly improved. Around 90% of the fields of interest are scanned.
- Character recognition accuracy is high, at about the same level as in the previous cases.
- Still, all critical issues remain. (They are caused not by the multi-pane document structure
but by the tabular format.)
Legend: Correct / Incorrect / Not Scanned
Overall Summary
Google Cloud Vision API is not suitable for Health Care Report (HCR) scanning.
- The nature of the document structure prevents it from scanning the desired values.
- Due to some critical issues in tabular data scanning, incorrect values can be extracted.
- For HCR, neither the Text Restructuring nor the Full Text Mining approach can compensate for
the scanning inaccuracy.
By dividing or cropping the image and processing it in parts, Google Cloud Vision API could
possibly be used as part of a solution. However:
- Each image sent is counted as one request. The number of partial images per HCR multiplies
the cost and response time of processing.
- A fair amount of effort is needed for pre-processing and post-processing in order to extract
the right set of data.
- The required logic depends strongly on the accuracy of the service. There is a high risk that
a change in Cloud Vision API behavior will affect the entire solution.
- By the same token, there is a chance that improvements to Google Cloud Vision API will
significantly simplify the overall solution. (Cloud Vision API is still in beta.)
Appendix 1 – Sample Scanning 1
Standard Report with a footer annotation
Scan Rate: 100%
Scan Accuracy (without punctuation): 100%
Scan Accuracy (with punctuation): 99%
Source: Reinsurance Trend Report by SOMPO Japan
Legend: Correct / Incorrect / Not Scanned
Appendix 1 – Sample Scanning 2
Standard Report within a single-column table
Scan Rate: 99%
Scan Accuracy (without punctuation): 100%
Scan Accuracy (with punctuation): 99%
Source: Overview on Japan Pension System by Ministry of Health, Labour, and Welfare
Legend: Correct / Incorrect / Not Scanned
Appendix 1 – Sample Scanning 3
Standard Report within a single-column table and a standard paragraph
Scan Rate: 100%
Scan Accuracy (without punctuation): 99%
Scan Accuracy (with punctuation): 99%
Source: Reinsurance Trend Report by SOMPO Japan
Legend: Correct / Incorrect / Not Scanned
Appendix 1 – Sample Scanning 4
Standard Report within a row-wide image
Scan Rate: 100%
Scan Accuracy (without punctuation): 100%
Scan Accuracy (with punctuation): 99%
Source: Overview on Japan Pension System by Ministry of Health, Labour, and Welfare
Legend: Correct / Incorrect / Not Scanned
Appendix 1 – Sample Scanning 5
Case Study Report in two columns with a row-wide image
Scan Rate: 94%
Scan Accuracy (without punctuation): 99%
Scan Accuracy (with punctuation): 98%
Source: IoT Case Study on Fujitsu i Network Systems by CISCO Solution
Legend: Correct / Incorrect / Not Scanned
Appendix 1 – Sample Scanning 6
Case Study Report in three columns with in-text images
Scan Rate: 93%
Scan Accuracy (without punctuation): 100%
Scan Accuracy (with punctuation): 99%
Source: IoT Case Study on Fujitsu i Network Systems by CISCO Solution
Legend: Correct / Incorrect / Not Scanned
Appendix 2 – Program Source Code (Python)
#coding:utf-8
import sys
import json
import base64
import requests

def process_image(image_path):
    GOOGLE_CLOUD_VISION_API_URL = "https://vision.googleapis.com/v1/images:annotate?key="
    GOOGLE_CLOUD_VISION_API_KEY = "you_need_a_real_API_key_here"
    REQUEST_HEADER = {'Content-Type': 'application/json'}

    # load the image in binary and encode it as base64 text
    with open(image_path, 'rb') as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

    request_json = {
        'requests': [
            {
                'image': {
                    'content': image_base64
                },
                'features': [
                    {
                        'type': "TEXT_DETECTION",
                        'maxResults': 1
                    }
                ]
            }
        ]
    }

    # prep & execution
    ocr_session = requests.Session()
    ocr_request = requests.Request("POST", GOOGLE_CLOUD_VISION_API_URL + GOOGLE_CLOUD_VISION_API_KEY,
                                   data=json.dumps(request_json),
                                   headers=REQUEST_HEADER)
    ocr_response = ocr_session.send(ocr_session.prepare_request(ocr_request),
                                    verify=True, timeout=60)

    # response
    if ocr_response.status_code == requests.codes.ok:
        print("Process Successful")
        # dump the JSON response into a file for analysis (raw string keeps the backslash literal)
        with open(r"D:\ocr_result.json", 'w', encoding="utf-8") as json_file:
            json.dump(ocr_response.json(), json_file, ensure_ascii=False, indent=4, sort_keys=True)
        return ocr_response.json()
    else:
        print("Process Failed")
        # raises requests.HTTPError for 4xx/5xx responses
        ocr_response.raise_for_status()
        return "error"

if __name__ == '__main__':
    # Execute process_image with the file name passed as a command line parameter
    print("File name:" + sys.argv[1])
    process_image(sys.argv[1])

More Related Content

Similar to Google Cloud Vision API Evaluation for Japanese Healthcare Reports

Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) Systemiosrjce
 
IRJET- A Novel Approach – Automatic paper evaluation system
IRJET-  	  A Novel Approach – Automatic paper evaluation systemIRJET-  	  A Novel Approach – Automatic paper evaluation system
IRJET- A Novel Approach – Automatic paper evaluation systemIRJET Journal
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Recordscsandit
 
IRJET-Optical Character Recognition using ANN
IRJET-Optical Character Recognition using ANNIRJET-Optical Character Recognition using ANN
IRJET-Optical Character Recognition using ANNIRJET Journal
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR RecognitionBharat Kalia
 
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...IRJET Journal
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET Journal
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character RecognitionIOSR Journals
 
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...IRJET Journal
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionRahul Mallik
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET Journal
 
Document Analyser Using Deep Learning
Document Analyser Using Deep LearningDocument Analyser Using Deep Learning
Document Analyser Using Deep LearningIRJET Journal
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontIRJET Journal
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...Alex Zeltov
 
Optical character recognization word
Optical character recognization wordOptical character recognization word
Optical character recognization wordDhana K
 
IRJET- Image to Text Conversion using Tesseract
IRJET-  	  Image to Text Conversion using TesseractIRJET-  	  Image to Text Conversion using Tesseract
IRJET- Image to Text Conversion using TesseractIRJET Journal
 

Similar to Google Cloud Vision API Evaluation for Japanese Healthcare Reports (20)

Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) System
 
D017222226
D017222226D017222226
D017222226
 
IRJET- A Novel Approach – Automatic paper evaluation system
IRJET-  	  A Novel Approach – Automatic paper evaluation systemIRJET-  	  A Novel Approach – Automatic paper evaluation system
IRJET- A Novel Approach – Automatic paper evaluation system
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Records
 
IRJET-Optical Character Recognition using ANN
IRJET-Optical Character Recognition using ANNIRJET-Optical Character Recognition using ANN
IRJET-Optical Character Recognition using ANN
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR Recognition
 
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...
IRJET- Structuring Mobile Application for Retrieving Book Data Utilizing Opti...
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using Android
 
Ocr 1
Ocr 1Ocr 1
Ocr 1
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character Recognition
 
CRC Final Report
CRC Final ReportCRC Final Report
CRC Final Report
 
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
Mr bi
Mr biMr bi
Mr bi
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech Recognition
 
Document Analyser Using Deep Learning
Document Analyser Using Deep LearningDocument Analyser Using Deep Learning
Document Analyser Using Deep Learning
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English Font
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
 
Optical character recognization word
Optical character recognization wordOptical character recognization word
Optical character recognization word
 
IRJET- Image to Text Conversion using Tesseract
IRJET-  	  Image to Text Conversion using TesseractIRJET-  	  Image to Text Conversion using Tesseract
IRJET- Image to Text Conversion using Tesseract
 

Recently uploaded

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Data Cloud, More than a CDP by Matt Robison

Google Cloud Vision API Evaluation for Japanese Healthcare Reports

  • 1. Evaluation on Google Cloud Vision API 1.1 (beta) - POC Report for Japanese Health Care Report OCR. 26-May, 2017. Asia Technology Office. Shinichi Hashitani
  • 2. Executive Summary
      • Google Cloud Vision API is an AIaaS offering from Google built on a machine learning engine. It provides OCR (Optical Character Recognition), image classification, landmark detection, and other features. The OCR feature is highly sophisticated and recognizes Japanese characters with almost perfect accuracy.
      • Its OCR capability is suited to standard document scanning. It does not recognize multi-column document structure well and is not well suited to tabular documents. The format of health care reports varies across medical institutions, but they are all primarily tabular, which makes it difficult to extract meaningful data accurately. Only about 30% of the text is extracted, and what is extracted is not structured well enough for processing.
      • The OCR feature does not accept any parameter other than the image itself; therefore, it requires in-house processing of the response JSON. There are two primary approaches: 1. Text mining on the content section of the response. 2. Programmatic text composition based on character coordinate information. For health care report scanning, neither approach is feasible.
      • Google Cloud Vision API can still be used for standard-format documents (books, whitepapers, academic writing, public announcements, etc.). It can also be used on publications (newspapers, magazines, case study reports) for text mining purposes. Other document formats were also evaluated during the POC.
  • 3. About Google Cloud Vision API
      Google Cloud Vision API is a REST-based service, accessible from any system and any language that can exchange JSON over HTTP. Requests are authenticated with either OAuth2 (recommended) or a Cloud API key.
      The request format is common to all Cloud Vision services; a "type" must be specified for each specific use. Multiple services can also be requested in a single call on the same image. (In this case, each type specified is counted as one unit.)
      The round-trip response time for an A4 page with 500 characters is 3 to 6 seconds.
      Pricing is per unit of task and relatively inexpensive: 1.5 USD per 1,000 units per month, 1.0 USD beyond 20 million units per month, and free below 1,000 units per month.
      Request: {"requests": [{"image": {"content": image_base64}, "features": [{"type": "TEXT_DETECTION", "maxResults": 1}]}]}
      Response (excerpt): {"responses": [ …. {"blockType": "TEXT", "boundingBox": {"vertices": [{"x": 594, "y": 327} …. "text": "遥" ….
  • 4. Google Cloud Vision API – Output format
      The output is JSON composed of two sets of information:
      1. Character (or small group of characters) information.
      2. Restructured full text of the converted text.
      The full description mimics the actual text structure by concatenating characters based on their coordinates and appending a line break character for each line. The output confirms that the engine analyzes character by character and can accurately process text that mixes characters from different languages. At the time of writing, this capability is far superior to that of Microsoft Cognitive Services, which confuses Latin letters with similar-looking Kanji characters.
      Character information (excerpt): {"boundingBox": {"vertices": [{"x": 444, "y": 71}, {"x": 485, "y": 67}, {"x": 488, "y": 104}, {"x": 447, "y": 108}]}, "property": {"detectedLanguages": [{"languageCode": "ja"}]}, "text": "基"} ….
      Full text (excerpt): "textAnnotations": [{"boundingPoly": {"vertices": [{"x": 24, "y": 62}, {"x": 1538, "y": 62}, {"x": 1538, "y": 3096}, {"x": 24, "y": 3096}]}, "description": "-基準値\n|今回ー前回ー前々回ー\n総合判定\n要経過観察! 要経過観察|要経過観察\nメタボリックシンドローム判定\n非該当 1予備群 該当1基準該当\n【心電図】不完全右脚ブロック\n甩\n血中脂質] LDLコレス テロールやや高値。食べ過ぎに注意し、動物\n性脂肪や卵などコレステ ロールの多いものを制限し、経過を見て下\nさい。\n[尿酸]尿酸が高めです。 注意してください。\n治療中の場合は、この結果表を主治医にお見せ下さ い。\n総合判定医師名: 川口 毅 ーーーーーッ童\n総合所見\n", "locale": "ja", ….
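The two parts of the output described on this slide can be pulled apart with a few lines of Python. The following is a minimal sketch, not the program used in the POC: the function name `extract_text_and_symbols` and the hand-built `sample_response` are illustrative assumptions, and the nesting (`fullTextAnnotation` → pages → blocks → paragraphs → words → symbols) is assumed to match the response shape of the v1 `images:annotate` endpoint.

```python
def extract_text_and_symbols(response):
    """Return (full_text, symbols) from one Cloud Vision annotate response.

    symbols is a list of (character, [(x, y), ...]) tuples.
    """
    first = response["responses"][0]

    # The first textAnnotations entry carries the restructured full text.
    annotations = first.get("textAnnotations", [])
    full_text = annotations[0]["description"] if annotations else ""

    # Per-character information sits at the bottom of the page hierarchy.
    symbols = []
    for page in first.get("fullTextAnnotation", {}).get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    for symbol in word.get("symbols", []):
                        vertices = symbol.get("boundingBox", {}).get("vertices", [])
                        points = [(v.get("x", 0), v.get("y", 0)) for v in vertices]
                        symbols.append((symbol["text"], points))
    return full_text, symbols


# Hand-built sample shaped like the excerpts in this report (not real API output).
sample_response = {
    "responses": [{
        "textAnnotations": [
            {"description": "総合判定\n要経過観察\n", "locale": "ja"},
        ],
        "fullTextAnnotation": {"pages": [{"blocks": [{
            "blockType": "TEXT",
            "paragraphs": [{"words": [{"symbols": [{
                "boundingBox": {"vertices": [
                    {"x": 444, "y": 71}, {"x": 485, "y": 67},
                    {"x": 488, "y": 104}, {"x": 447, "y": 108},
                ]},
                "text": "基",
            }]}]}],
        }]}]},
    }]
}

full_text, symbols = extract_text_and_symbols(sample_response)
print(repr(full_text))
print(symbols)
```

Keeping the full text and the per-character list separate mirrors the two processing approaches discussed in this report: the string feeds text mining, the coordinate list feeds text restructuring.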
  • 5. Google Cloud Vision API – Output Processing
      Since Google Cloud Vision API does not support structured documents and accepts no additional information for processing, in-house output processing is needed to extract the desired data from the output. There are two ways:
      1. Restructure the data from individual characters based on their coordinates.
      2. Text mining on the structured full text.
      The approach to take depends on the accuracy, the composition of the structured text, and what needs to be extracted. Text mining is the simpler of the two.
      Output (excerpt): {"boundingBox": {"vertices": [{"x": 444, "y": 71}, {"x": 485, "y": 67}, {"x": 488, "y": 104}, {"x": 447, "y": 108}]}, "property": {"detectedLanguages": …. "textAnnotations": [ … "description": "-基準値\n|今回ー前回ー前々回ー\n総合判定\n要経過観察!要経過観察|要経過観察\n… ….
      Restructuring: "要" + "経" + "過" + "観" + "察" = "要経過観察"
      Text mining: "…\n総合判定\n要経過観察…" = "要経過観察"
  • 6. Output Processing – Text Restructuring
      This approach processes the raw data. Like the structured full text provided in the output itself, it rebuilds text by concatenating individual characters based on their coordinates.
      Pros:
      - Targets the specific area to be extracted. (Suitable for structured documents.)
      - Less affected by the accuracy of the scan.
      Cons:
      - Requires complex logic. (Coordinate-based calculation is needed for each string.)
      - Requires tailored logic for each type of document.
      It is ideal for extracting a small amount of information from the entire document. Because the logic depends on coordinates, it cannot process unstructured or semi-structured documents. It also depends strongly on how the document is positioned in the scan; a small misalignment can cause the logic to fail to fetch the characters to process.
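A rough sketch of the coordinate-based approach, under stated assumptions: characters arrive as `(character, vertices)` pairs, the target field is a fixed rectangle on the page, and characters whose vertical centres lie within `line_tolerance` pixels belong to the same line. The helper names (`text_in_region`, `box`) and all coordinates are hypothetical and chosen only for illustration.

```python
def text_in_region(symbols, left, top, right, bottom, line_tolerance=10):
    """Concatenate the characters whose box centre falls inside a rectangle."""
    picked = []
    for char, vertices in symbols:
        cx = sum(x for x, _ in vertices) / len(vertices)
        cy = sum(y for _, y in vertices) / len(vertices)
        if left <= cx <= right and top <= cy <= bottom:
            picked.append((cx, cy, char))

    # Group characters into lines by vertical position.
    picked.sort(key=lambda item: item[1])
    lines = []
    for cx, cy, char in picked:
        if lines and abs(cy - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((cx, char))
        else:
            lines.append((cy, [(cx, char)]))

    # Within each line, order characters from left to right.
    return "\n".join(
        "".join(char for _, char in sorted(chars, key=lambda item: item[0]))
        for _, chars in lines
    )


def box(x, y, width=30, height=30):
    """Four corner points of an axis-aligned character box (illustrative)."""
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


# Five characters of one field plus one character from a neighbouring column.
symbols = [(c, box(100 + 40 * i, 90)) for i, c in enumerate("要経過観察")]
symbols.append(("今", box(500, 90)))

field = (90, 80, 300, 130)  # left, top, right, bottom of the target field
print(text_in_region(symbols, *field))

# The same page scanned 250 px further to the right: the fixed field is now empty.
shifted = [(c, [(x + 250, y) for x, y in v]) for c, v in symbols]
print(repr(text_in_region(shifted, *field)))
```

The second call illustrates the positioning weakness noted above: the logic itself is unchanged, but a shifted scan moves every character out of the hard-coded rectangle.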
  • 7. Output Processing – Full Text Mining
      Text mining disregards the coordinate information of each character. Instead, it takes the restructured full text as input and searches through the string to extract text.
      Pros:
      - The logic is simple, and known text mining techniques are directly applicable.
      - One piece of logic can possibly be reused across multiple document formats.
      Cons:
      - The accuracy depends entirely on the accuracy of the full text extraction.
      - Failing to read the "key" text also means failing to extract its value.
      It is ideal for processing large amounts of text, especially for analytical purposes. It can still be used to extract a particular set of information if the accuracy of the extracted text is high.
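A small sketch of the text mining approach, assuming the value of interest appears on the line immediately after its key, as in the "総合判定" example from this report. The function name `value_after_key` is hypothetical, and splitting on "!" and "|" is an assumption made here to discard table borders that were read as characters.

```python
import re


def value_after_key(full_text, key):
    """Return the first value on the line following the line that contains key."""
    lines = full_text.split("\n")
    for index, line in enumerate(lines[:-1]):
        if key in line:
            # Split on whitespace and on "!" / "|", which table borders
            # are sometimes converted to, then keep the first fragment.
            parts = [p for p in re.split(r"[!|\s]+", lines[index + 1]) if p]
            return parts[0] if parts else None
    return None


# Beginning of the full text excerpt quoted in this report.
sample_text = (
    "-基準値\n|今回ー前回ー前々回ー\n総合判定\n"
    "要経過観察! 要経過観察|要経過観察\nメタボリックシンドローム判定\n"
)

print(value_after_key(sample_text, "総合判定"))
print(value_after_key(sample_text, "血圧"))  # key never read, so no value
```

The second call shows the dependency listed under Cons: when the key text is missing from the OCR output, the value cannot be located at all.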
  • 8. Google Cloud Vision API – Restructured Full Text
      Google Cloud Vision is designed for standard single-column documents, read and processed from top to bottom, left to right. It cannot restructure the full text well when the document is in multi-column format. Google Cloud Vision tries to read and process line by line, so an entire row comes out as one line, with the columns concatenated with spaces in between.
      AAAAA\n
      BBBBB EEEEE HHHHH\n
      CCCCC FFFFF IIIII\n
      DDDDD GGGGG JJJJJ\n
      The sentence flows from B to C to D, but the text comes out as B to E to H. A word can be split across two lines, so words that span multiple lines cannot be recognized correctly. Also, when lines do not align horizontally across columns, or the space between columns is too wide, the entire sentence is often not processed.
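The reading-order problem can be reproduced without calling the API. The sketch below only simulates the behaviour described on this slide; it models a three-column page as lists of lines and compares row-by-row reading with the intended column-by-column reading. The function names are illustrative.

```python
# Each inner list is one column of the page, top to bottom.
columns = [
    ["BBBBB", "CCCCC", "DDDDD"],
    ["EEEEE", "FFFFF", "GGGGG"],
    ["HHHHH", "IIIII", "JJJJJ"],
]


def row_wise(columns):
    """Read across the page: one output line per physical row."""
    return "\n".join(" ".join(row) for row in zip(*columns))


def column_wise(columns):
    """Read down each column in turn: the order a human reader follows."""
    return "\n".join(line for column in columns for line in column)


print(row_wise(columns))
print("---")
print(column_wise(columns))
```

Any post-processing that wants the B to C to D flow back would need to know where the column boundaries are, which the restructured full text alone does not provide.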
  • 9. Reading Health Care Report – POC Procedures
      In this POC, an actual health care report is scanned by an MFP in both color and monochrome modes. The Cloud API is called from a Python program running on a local machine.
      The same report is prepared in three modes (color/monochrome/grayscale) at the same resolution (300 dpi, JPEG). Since the MFP does not support grayscale, a color TIFF is converted into a grayscale JPEG.
      1. The program reads the image and encodes it into a text format (base64).
      2. The program constructs a JSON request including the encoded image and sends it to the Cloud API.
      3. The Cloud API processes the image and sends back the text in JSON format.
      4. The program dumps the JSON response into a physical file for analysis.
  • 10. POC Result - Monochrome
      - Overall read accuracy is very poor. The left-most pane is not scanned entirely.
      - Only limited parts of the document are scanned. When scanned, characters are recognized correctly in most cases.
      - Traditional OCR worked better with monochrome, but that is not the case with Google Cloud Vision.
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 11. POC Result - Grayscale
      - Overall read accuracy is the worst among the three options.
      - The left-most pane is recognized well; outlined characters are also read.
      - Only limited parts of the document are scanned. When scanned, characters are recognized correctly in most cases.
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 12. POC Result - Color
      - Overall read accuracy is poor, but better than the other two options.
      - The left-most pane is recognized well; outlined characters are also read.
      - Only limited parts of the document are scanned. When scanned, characters are recognized correctly in most cases.
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 13. POC Result – Summary
      All patterns failed to deliver results dependable enough for production use.
      - The results vary among the three patterns, but none of them recognized even half of the fields of interest.
      - Character recognition accuracy itself is high (around 95%). Still, it is not reliable enough for production use.
      Health care reports are often in multi-pane/tabular format and are not suited to this solution.
      - Due to the document structure, large parts of the document are not recognized as text areas for scanning.
      - Table column borders are wrongly recognized as characters.
      - Table columns are often not fully scanned. (Whitespace between columns is recognized as the end of a sentence.)
  • 14. POC Result – Critical Issues
      Rows not scanned in multi-column structure
      - Since the entire image is scanned as a single-column paragraph, some rows are skipped entirely, depending on how lines align across columns.
      Table border is often wrongly converted to "!" or "1"
      - Since the scan is processed as a single line, a table border is also converted to "|", but it is often converted to a meaningful value such as "1".
      - This happens by chance, and the wrongly converted character can alter the actual value. (In the example on the slide, 80 is read as 180.)
  • 15. POC Result – Critical Issues cont’d
      Columns are skipped due to whitespace between them.
      - In tabular format, the whitespace between column values is often treated as the end of the line, and the remaining columns are not scanned.
  • 16. Follow Up Case – Overview
      Because document structure significantly affects scan accuracy, the complexity of the health care report is particularly challenging for Google Cloud Vision API to process correctly.
      An additional test was conducted in which the image is divided into three independent images, so that a single 3-pane tabular image becomes three tabular images. Each divided image is sent to Cloud Vision API as a separate request.
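How the image was divided in the POC is not described, so the following is only a sketch of one way to compute the crop rectangles. It assumes three panes of equal width, which real reports may not have; the function name `pane_boxes` and the optional `overlap` margin are hypothetical. The boxes use the (left, upper, right, lower) convention that image libraries such as Pillow accept in `Image.crop`.

```python
def pane_boxes(width, height, panes=3, overlap=0):
    """Return (left, upper, right, lower) crop boxes for equal-width vertical panes.

    overlap widens each pane by that many pixels on both sides, so characters
    sitting on a pane boundary are not cut in half.
    """
    boxes = []
    for index in range(panes):
        left = max(0, index * width // panes - overlap)
        right = min(width, (index + 1) * width // panes + overlap)
        boxes.append((left, 0, right, height))
    return boxes


# An A4 page scanned at 300 dpi is roughly 2480 x 3508 pixels.
for pane in pane_boxes(2480, 3508):
    print(pane)
```

Each resulting box would then be cropped, encoded, and sent as its own request, which is what makes every report count as three units instead of one.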
  • 17. Follow Up Case - Result
      - Read accuracy is significantly improved. Around 90% of the fields of interest are scanned.
      - Character recognition accuracy is high, at about the same level as in the previous cases.
      - All critical issues are still present. (They are caused not by the multi-pane document structure, but by the tabular format.)
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 18. Overall Summary
      Google Cloud Vision API is not suitable for Health Care Report (HCR) scanning.
      - The nature of the document structure prevents it from scanning the desired values.
      - Due to critical issues in tabular data scanning, incorrect values can be extracted.
      - For HCR, neither the Text Restructuring nor the Full Text Mining approach can compensate for the scanning inaccuracy.
      By processing the report in parts, dividing or cutting the image, Google Cloud Vision API could possibly be used as part of a solution. However…
      - Each image sent is counted as one request. The number of partial images per HCR multiplies the cost and response time of processing.
      - A fair amount of effort is needed for pre-processing and post-processing to extract the right set of data.
      - The logic required depends strongly on the accuracy of the service. There is a high risk that a change in Cloud Vision API behavior affects the entire solution.
      - By the same token, improvements to Google Cloud Vision API may significantly simplify the overall solution. (Cloud Vision API is still in beta.)
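The cost multiplication can be made concrete with a back-of-the-envelope estimate. This sketch uses the prices quoted in the "About Google Cloud Vision API" slide and makes two assumptions that the report does not state: the first 1,000 units of a month are free, and both rates apply per 1,000 units. The function name and the monthly volume are hypothetical.

```python
FREE_UNITS = 1_000               # assumed: first 1,000 units per month are free
TIER_LIMIT = 20_000_000          # units per month billed at the standard rate
STANDARD_RATE = 1.5              # USD per 1,000 units (figure from this report)
HIGH_VOLUME_RATE = 1.0           # USD per 1,000 units beyond 20 million (assumed unit)


def estimate_monthly_cost(reports, images_per_report=1, features_per_image=1):
    """Return (units, cost in USD) for one month of processing."""
    units = reports * images_per_report * features_per_image
    standard_units = max(0, min(units, TIER_LIMIT) - FREE_UNITS)
    high_volume_units = max(0, units - TIER_LIMIT)
    cost = (standard_units / 1000 * STANDARD_RATE
            + high_volume_units / 1000 * HIGH_VOLUME_RATE)
    return units, cost


# 10,000 reports a month, sent whole versus divided into three panes.
print(estimate_monthly_cost(10_000))
print(estimate_monthly_cost(10_000, images_per_report=3))
```

The same multiplier applies to latency if the partial images are sent one after another: at 3 to 6 seconds per request, three panes take roughly 9 to 18 seconds per report.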
  • 19. Appendix 1 – Sample Scanning 1
      Standard report with a footer annotation
      Scan Rate: 100%
      Scan Accuracy (without punctuation): 100%
      Scan Accuracy (with punctuation): 99%
      Source: Reinsurance Trend Report by SOMPO Japan
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 20. Appendix 1 – Sample Scanning 2
      Standard report within a single-column table
      Scan Rate: 99%
      Scan Accuracy (without punctuation): 100%
      Scan Accuracy (with punctuation): 99%
      Source: Overview on Japan Pension System by Ministry of Health, Labour, and Welfare
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 21. Appendix 1 – Sample Scanning 3
      Standard report within a single-column table and a standard paragraph
      Scan Rate: 100%
      Scan Accuracy (without punctuation): 99%
      Scan Accuracy (with punctuation): 99%
      Source: Reinsurance Trend Report by SOMPO Japan
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 22. Appendix 1 – Sample Scanning 4
      Standard report with a row-wide image
      Scan Rate: 100%
      Scan Accuracy (without punctuation): 100%
      Scan Accuracy (with punctuation): 99%
      Source: Overview on Japan Pension System by Ministry of Health, Labour, and Welfare
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 23. Appendix 1 – Sample Scanning 5
      Case study report in two columns with a row-wide image
      Scan Rate: 94%
      Scan Accuracy (without punctuation): 99%
      Scan Accuracy (with punctuation): 98%
      Source: IoT Case Study on Fujitsu i Network Systems by CISCO Solution
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 24. Appendix 1 – Sample Scanning 6
      Case study report in three columns with in-text images
      Scan Rate: 93%
      Scan Accuracy (without punctuation): 100%
      Scan Accuracy (with punctuation): 99%
      Source: IoT Case Study on Fujitsu i Network Systems by CISCO Solution
      (Figure legend: Correct / Incorrect / Not Scanned)
  • 25. Appendix 2 – Program Source Code (Python)
      #coding:utf-8
      import sys
      import json
      import base64
      import requests

      def process_image(image_path):
          GOOGLE_CLOUD_VISION_API_URL = "https://vision.googleapis.com/v1/images:annotate?key="
          GOOGLE_CLOUD_VISION_API_KEY = "you_need_a_real_API_key_here"
          REQUEST_HEADER = {'Content-Type': 'application/json'}

          # Load the image in binary mode and encode it as base64 text
          with open(image_path, 'rb') as image_file:
              image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

          # One request for one image, asking only for text detection
          request_json = {
              'requests': [
                  {
                      'image': {'content': image_base64},
                      'features': [
                          {'type': "TEXT_DETECTION", 'maxResults': 1}
                      ]
                  }
              ]
          }
  • 26. Appendix 2 – Program Source Code (Python) cont’d
          # prep & execution
          ocr_session = requests.Session()
          ocr_request = requests.Request("POST",
                                         GOOGLE_CLOUD_VISION_API_URL + GOOGLE_CLOUD_VISION_API_KEY,
                                         data=json.dumps(request_json),
                                         headers=REQUEST_HEADER)
          ocr_response = ocr_session.send(ocr_session.prepare_request(ocr_request),
                                          verify=True, timeout=60)

          # response
          if ocr_response.status_code == requests.codes.ok:
              print("Process Successful")
              # Dump the JSON response to a file for analysis (raw string keeps the backslash)
              with open(r"D:\ocr_result.json", 'w', encoding="utf-8") as json_file:
                  json.dump(ocr_response.json(), json_file, ensure_ascii=False, indent=4, sort_keys=True)
              return ocr_response.json()
          else:
              print("Process Failed")
              # Raises requests.HTTPError for 4xx/5xx responses
              ocr_response.raise_for_status()
              return "error"

      if __name__ == '__main__':
          # Execute process_image with the file name passed as a command line parameter
          print("File name:" + sys.argv[1])
          process_image(sys.argv[1])