The document discusses Google Cloud Platform machine learning capabilities for unstructured data like text, speech and images. It introduces the Cloud Vision, Speech and Translate APIs which provide pre-trained machine learning models through REST interfaces to understand unstructured data without requiring ML expertise. Examples are given of using the APIs for tasks like content moderation, sentiment analysis and extracting text/metadata from images.
3. Confidential & Proprietary - Google Cloud Platform
Your data spans Text, Speech and Images
> 2 Billion Images daily
> 2 Million Blog Posts daily
> 400 Million Social Media Posts
20% of Mobile Searches
4. What can we do with all this data?
Moderate Content
Understand Sentiment
Structured Metadata
6. Pre-Trained Machine Learning Models
Cloud Vision
Cloud Translate
Cloud Speech
Fully trained ML models from Google Cloud that allow a general developer to take advantage of rich machine learning capabilities with simple REST-based services.
Stay tuned...
7. Ready to use Machine Learning models
● Cloud Vision API
● Cloud Speech API
● Cloud Translate API
● Stay Tuned…
Use your own data to train models
● Cloud Machine Learning: Develop - Model - Test
● Google BigQuery
● Cloud Storage
● Cloud Datalab
9. Cloud Vision API
Call the API from anywhere, with support for embeddable images and Google Cloud Storage
● Faces: Faces, facial landmarks, emotions
● OCR: Read and extract text, with support for > 10 languages
● Label: Detect entities from furniture to transportation
● Logos: Identify product logos
● Landmarks & Image Properties: Detect landmarks & dominant color of image
● Safe Search: Detect explicit content - adult, violent, medical and spoof
10. API Usage: Detect Objects in an Image
Image -> Vision API -> Detected Items
1. Create a JSON request with the image or a pointer to an image
2. Call the REST API
3. Process the JSON response
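The first step, creating a JSON request with either the image itself or a pointer to it, can be sketched in plain Python. This is a minimal illustration, not the official client: the helper name `build_annotate_request` and the example bucket path are hypothetical, and the `gcsImageUri` field is assumed to be how the v1 REST API accepts a Cloud Storage pointer.

```python
import base64
import json


def build_annotate_request(image_bytes=None, gcs_uri=None,
                           feature_type="LABEL_DETECTION", max_results=5):
    """Build a JSON body for an image annotation call (hypothetical helper).

    Exactly one of image_bytes (embedded image) or gcs_uri (pointer to an
    image in Google Cloud Storage) must be given.
    """
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("Provide exactly one of image_bytes or gcs_uri")
    if image_bytes is not None:
        # Embedded images travel as base64 text inside the JSON body.
        image = {"content": base64.b64encode(image_bytes).decode("utf-8")}
    else:
        # Assumed field name for a Cloud Storage pointer in the v1 API.
        image = {"source": {"gcsImageUri": gcs_uri}}
    return {
        "requests": [{
            "image": image,
            "features": [{"type": feature_type, "maxResults": max_results}],
        }]
    }


if __name__ == "__main__":
    # Placeholder bucket and object names, for illustration only.
    body = build_annotate_request(gcs_uri="gs://example-bucket/photo.jpg")
    print(json.dumps(body, indent=2))
```

Passing a pointer keeps the request small when the image already lives in Cloud Storage; embedding the bytes avoids a separate upload for one-off calls.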
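11. Use the Vision API - Python example
import base64

# Assumes `service` is an already-authenticated Vision API client and
# `photo_file` is the path to a local image; the slide shows neither.
with open(photo_file, 'rb') as image:
    image_content = base64.b64encode(image.read())

# Set up the service request for an embedded image
service_request = service.images().annotate(body={
    'requests': [{
        'image': {
            'content': image_content.decode('UTF-8')
        },
        'features': [{
            'type': 'LABEL_DETECTION',
            'maxResults': 1
        }]
    }]
})
# Process the results: take the description of the top label
response = service_request.execute()
label = response['responses'][0]['labelAnnotations'][0]['description']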
12. Use Case: Image Content Moderation
Examples
● User manages a large set of crowd-sourced images.
● Identify potentially explicit content in uploaded images.
Enabling Technology
● Powered by Google SafeSearch, detects inappropriate content, from adult to violent content.
“As a company I must detect adult content, violent content, spoof images and medical content to protect my consumers and my brand.”
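A moderation check on top of a SafeSearch result might look like the sketch below. It assumes the response carries one likelihood string per category (`adult`, `spoof`, `medical`, `violence`), as in the v1 `safeSearchAnnotation`; the helper name, the threshold choice and the sample data are illustrative, not API output.

```python
# Likelihood values, ordered from least to most likely.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
               "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def flagged_categories(safe_search, threshold="LIKELY"):
    """Return the categories whose likelihood is at or above the threshold."""
    limit = LIKELIHOODS.index(threshold)
    return sorted(
        category for category, likelihood in safe_search.items()
        if likelihood in LIKELIHOODS and LIKELIHOODS.index(likelihood) >= limit
    )


if __name__ == "__main__":
    # Hand-written sample shaped like a safeSearchAnnotation.
    sample = {"adult": "VERY_UNLIKELY", "spoof": "POSSIBLE",
              "medical": "UNLIKELY", "violence": "LIKELY"}
    print(flagged_categories(sample))
```

A stricter site could lower the threshold to `POSSIBLE` and send the extra matches to human review rather than rejecting them outright.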
13. Use Case: Image Sentiment Analysis
Enabling technology
● Cloud-based API that provides the most advanced algorithms for face and logo detection.
● Ability to identify the emotional state of a face: joy, sorrow, anger.
● Ability to identify popular product brand logos within the image.
● Ability to draw a bounding polygon around an identified product.
“As a developer, applications I build should be able to detect faces and emotional facial attributes and detect objects and logos.”
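Reading the emotional state from a detected face can be sketched as follows. The field names `joyLikelihood`, `sorrowLikelihood` and `angerLikelihood` are assumed from the v1 face annotation format; the helper and the sample face are hypothetical.

```python
# Likelihood values, ordered from least to most likely.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
               "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: position for position, name in enumerate(LIKELIHOODS)}

# Emotion name -> assumed field name in a face annotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
}


def dominant_emotion(face, minimum="POSSIBLE"):
    """Return (emotion, likelihood) for the strongest emotion.

    Returns None when no emotion reaches the minimum likelihood.
    """
    best = None
    best_rank = RANK[minimum] - 1
    for emotion, field in EMOTION_FIELDS.items():
        likelihood = face.get(field, "UNKNOWN")
        rank = RANK.get(likelihood, 0)
        if rank > best_rank:
            best, best_rank = (emotion, likelihood), rank
    return best


if __name__ == "__main__":
    # Hand-written sample, not real API output.
    face = {"joyLikelihood": "VERY_LIKELY",
            "sorrowLikelihood": "VERY_UNLIKELY",
            "angerLikelihood": "UNLIKELY"}
    print(dominant_emotion(face))
```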
14. Use Case: Image Metadata
Enabling technology
● Powered by the same technologies behind Google Photos, detects thousands of everyday objects, from transportation to home interiors
● Detects thousands of man-made and natural landmarks
● Ability to draw a bounding polygon around an identified entity.
“As a developer, I want to understand the contents of the image from everyday entities, to logos and landmarks.”
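Turning detected entities into structured metadata could look like the sketch below. It assumes each annotation is a dict with a `description`, a `score` and, for located entities, a `boundingPoly` holding a list of `vertices`, and that a vertex may omit a coordinate whose value is zero. The helper names and the sample annotations are hypothetical.

```python
def bounding_box(bounding_poly):
    """Convert polygon vertices to (x_min, y_min, x_max, y_max), or None."""
    vertices = bounding_poly.get("vertices", [])
    if not vertices:
        return None
    # A missing coordinate is treated as 0 (assumption about the JSON format).
    xs = [vertex.get("x", 0) for vertex in vertices]
    ys = [vertex.get("y", 0) for vertex in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def summarize_entities(annotations, min_score=0.5):
    """Return (description, score, box) for annotations at or above min_score."""
    summary = []
    for annotation in annotations:
        score = annotation.get("score", 0.0)
        if score < min_score:
            continue
        box = bounding_box(annotation.get("boundingPoly", {}))
        summary.append((annotation.get("description", ""), score, box))
    return summary


if __name__ == "__main__":
    # Hand-written samples, not real API output.
    annotations = [
        {"description": "Eiffel Tower", "score": 0.9,
         "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 110, "y": 20},
                                       {"x": 110, "y": 220}, {"x": 10, "y": 220}]}},
        {"description": "sky", "score": 0.8},
        {"description": "cloud", "score": 0.3},
    ]
    for entry in summarize_entities(annotations):
        print(entry)
```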
15. Use Case: Extract Text
Enabling technology
● Read text from any image containing receipts, invoices or scanned documents
● Supports a variety of languages, from English to Chinese
● Granular text extraction, from individual words to a text summary
“As a developer, I want to extract text from receipts, invoices and images.”
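The two granularities of text extraction can be separated with a small helper. This sketch assumes the text detection result is a list in which the first entry holds the full extracted text and the remaining entries hold individual words; the helper name and the receipt sample are hypothetical.

```python
def split_text_annotations(text_annotations):
    """Return (full_text, words) from a list of text annotations.

    Assumes the first entry is the whole text block and the rest are
    individual words, each with a 'description' field.
    """
    if not text_annotations:
        return "", []
    full_text = text_annotations[0].get("description", "")
    words = [entry.get("description", "") for entry in text_annotations[1:]]
    return full_text, words


if __name__ == "__main__":
    # Hand-written sample shaped like a receipt scan, not real API output.
    annotations = [
        {"locale": "en", "description": "TOTAL 12.50\n"},
        {"description": "TOTAL"},
        {"description": "12.50"},
    ]
    full_text, words = split_text_annotations(annotations)
    print(repr(full_text))
    print(words)
```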
16. Customer testimonials
“We have drones that take thousands of photos per flight. We find that Google Cloud Vision API is the best way to turn those huge number of photos, automatically produced, into meaningful insight.”
Tomoaki Kobayakawa, General Manager, Sony - Aerosense Inc.
“We did the impossible: ML without knowing anything about ML.”
David Zuckerman, Head of Developer Experience, WIX.com
18. Cloud Vision API Summary
• Google Cloud Vision API provides the broadest set of vision scenarios from a single API
1. Label Detection
2. OCR
3. Explicit Content Detection
4. Facial Detection
5. Landmark Detection
6. Logo Detection
• Integrated: Vision API is integrated with other Google Cloud Platform products
• Easy-to-use API: Inline image with JSON response
• Pay-as-you-go model: Users pay only for what they use, with no upfront commitments
20. Cloud Speech API
Call the API from anywhere, with support for streaming audio and Google Cloud Storage
● Recognize Speech: Streaming recognition
● Transcribe Audio: Transcribe stored audio
● Global: Supports > 80 languages
21. API Usage: Understand Speech - Batch
Stored Audio -> Speech API -> Recognized text
1. Create a JSON request with the audio file and the language of the audio (default is en_US)
2. Call the REST API
3. Process the JSON response
22. API Usage: Understand Speech - Streaming
Streaming Audio -> Speech API (gRPC) -> Recognized Text
1. gRPC streaming request with initial context
2. Bi-directional: streams audio in while streaming text out
3. Real-time streaming results while speaking
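The request side of the streaming flow, an initial context message followed by audio sent in small pieces, can be sketched with a generator. This is an illustration only: the message shapes (`config` and `audio` keys) and the helper name are hypothetical stand-ins for the real gRPC message types, and no network call is made.

```python
import io


def streaming_requests(audio_stream, config, chunk_size=3200):
    """Yield one configuration message, then the audio in fixed-size chunks.

    chunk_size=3200 bytes is 100 ms of 16 kHz, 16-bit mono audio.
    """
    # First message: the initial context (encoding, sample rate, ...).
    yield {"config": config}
    # Following messages: raw audio, sent as it becomes available.
    while True:
        chunk = audio_stream.read(chunk_size)
        if not chunk:
            break
        yield {"audio": chunk}


if __name__ == "__main__":
    # 8000 bytes of silence stand in for microphone input.
    fake_audio = io.BytesIO(bytes(8000))
    config = {"encoding": "LINEAR16", "sampleRate": 16000}
    for message in streaming_requests(fake_audio, config):
        if "config" in message:
            print("config:", message["config"])
        else:
            print("audio chunk of", len(message["audio"]), "bytes")
```

In a real client, a generator like this would be handed to the gRPC streaming call, which consumes requests while yielding partial results in the other direction.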
23. Use the Speech API - Python Example
import base64
import json

# Set up the service request for an embedded audio file.
# `speech_file` (path to the audio) and get_speech_service() are assumed to be
# defined elsewhere; the slide does not show them.
with open(speech_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())
service = get_speech_service()
service_request = service.speech().recognize(
    body={
        'initialRequest': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000
        },
        'audioRequest': {
            'content': speech_content.decode('UTF-8')
        }
    })
# Call the API and print the JSON response
response = service_request.execute()
print(json.dumps(response))
24. Use Cases
● Voice-enabling chat / messaging apps: Use voice commands to dictate messages and retrieve information
● Voice-controlled games: Players control the game using select voice commands spoken into a microphone
● Home automation: Monitor and control the various devices in your home using your voice, the web or a variety of other interfaces
● Meeting analytics: Identify words, phrases and patterns that correlate with important customer actions, to drive business results
● Call-center analytics: Listen to your business' everyday interactions to improve your customer experience
Referenceable Customers
26. Cloud Speech API Summary
● Global footprint: Recognizes over 80 languages and variants
● Highest quality voice recognition: Neural network-based models are continuously trained to improve the API
● Fast: Streaming recognition returns partial recognition results as soon as they become available, rather than waiting for the user to stop speaking
● Accurate: Noisy audio handling transcribes audio from many environments without requiring additional noise cancellation on the developer’s side
● Both real-time and buffered audio: You can convert the audio from users dictating to an application’s microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Multiple audio file formats are supported, including FLAC, AMR, PCMU/u-Law and linear-16.