Your computer can recognize your voice and detect words in speech dictation, but can it truly understand the meaning of what you are saying? Can it analyze your intent and respond accordingly? You don’t need a PhD in artificial intelligence to integrate speech and natural language understanding into your projects. Microsoft Cognitive Services (aka “Project Oxford”) is a portfolio of cloud-based REST APIs and SDKs, powered by machine learning, that enable developers to write applications that understand the content within the rapidly growing set of multimedia data. The Cognitive Services APIs help you understand and interact with audio, text, image, and video. In this session, we’ll start with an overview of the available services for speech recognition and speech synthesis. Then we’ll explore, through live demos, how to leverage the Language Understanding Intelligent Service, which lets you determine intent, detect entities in user speech, and improve language understanding models to work more efficiently with user data. Lastly, we’ll leverage the Computer Vision APIs to detect human faces, analyze the content of images, and perform Optical Character Recognition (OCR) to detect and analyze words within a photo. Come learn how your apps can tap into the same active learning services behind the brain of Cortana, and get started writing smart applications that can understand what your users are saying.
Cognitive Services: Building Smart Apps with Speech, NLP & Vision
1. Nick Landry
Senior Technical Evangelist – Microsoft
nick.landry@microsoft.com
Blog: AgeofMobility.com
@ActiveNick | github.com/ActiveNick
Microsoft Cognitive Services:
Building Smart Applications
with Speech, NLP & Vision
6. Bringing it all together
The Seeing AI App
Computer Vision, Image, Speech Recognition, NLP,
and ML from Microsoft Cognitive Services
Watch Video Here
Read Blog Here
10. Computer Vision API: Distill actionable information from images
Video API: Analyze, edit, and process videos within your app
Face API: Detect, identify, analyze, organize, and tag faces in photos
Emotion API: Personalize experiences with emotion recognition
Vision
12. Updated Computer Vision API
Content of Image:
Categories v0: [{ "name": "animal", "score": 0.9765625 }]
V1: [{ "name": "grass", "confidence": 0.9999992847442627 },
{ "name": "outdoor", "confidence": 0.9999072551727295 },
{ "name": "cow", "confidence": 0.99954754114151 },
{ "name": "field", "confidence": 0.9976195693016052 },
{ "name": "brown", "confidence": 0.988935649394989 },
{ "name": "animal", "confidence": 0.97904372215271 },
{ "name": "standing", "confidence": 0.9632768630981445 },
{ "name": "mammal", "confidence": 0.9366017580032349,
"hint": "animal" },
{ "name": "wire", "confidence": 0.8946959376335144 },
{ "name": "green", "confidence": 0.8844101428985596 },
{ "name": "pasture", "confidence": 0.8332059383392334 },
{ "name": "bovine", "confidence": 0.5618471503257751,
"hint": "animal" },
{ "name": "grassy", "confidence": 0.48627158999443054 },
{ "name": "lush", "confidence": 0.1874018907546997 },
{ "name": "staring", "confidence": 0.165890634059906 }]
Describe
0.975 "a brown cow standing on top of a lush green field"
0.974 "a cow standing on top of a lush green field"
0.965 "a large brown cow standing on top of a lush green field"
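The sample output above lends itself to a small client-side filter. The following is only a sketch in Python: it parses an abridged copy of the tag list shown on the slide, keeps the tags above a confidence cutoff, and picks the highest-scoring caption. The function name `confident_tags` and the 0.8 threshold are assumptions made for illustration, not part of the Computer Vision API.

```python
import json

# Abridged copy of the tag list shown on the slide.
tags_json = """
[{ "name": "grass", "confidence": 0.9999992847442627 },
 { "name": "cow", "confidence": 0.99954754114151 },
 { "name": "mammal", "confidence": 0.9366017580032349, "hint": "animal" },
 { "name": "bovine", "confidence": 0.5618471503257751, "hint": "animal" },
 { "name": "lush", "confidence": 0.1874018907546997 }]
"""

# Caption candidates from the "Describe" output, as (confidence, text) pairs.
captions = [
    (0.975, "a brown cow standing on top of a lush green field"),
    (0.974, "a cow standing on top of a lush green field"),
    (0.965, "a large brown cow standing on top of a lush green field"),
]


def confident_tags(tags, threshold=0.8):
    """Return tag names whose confidence meets the (assumed) threshold."""
    return [tag["name"] for tag in tags if tag["confidence"] >= threshold]


tags = json.loads(tags_json)
strong_tags = confident_tags(tags)

# Pick the caption with the highest confidence score.
best_caption = max(captions, key=lambda pair: pair[0])[1]

print(strong_tags)
print(best_caption)
```

Low-confidence tags such as "lush" are dropped here, which is one way an app might avoid surfacing weak guesses to users; the right cutoff depends on the application.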
15. Speech
Bing Spell Check API: Detect and correct spelling mistakes within your app
Language Understanding Intelligent Service: Teach your apps to understand commands from your users
Web Language Model API: Leverage the power of language models trained on web-scale data
Linguistic Analysis API: Easily parse complex text with language analysis
Text Analytics API: Detect sentiment, key phrases, topics, and language from your text
Language
16. Reduce labeling effort with interactive featuring
Seamless integration to Speech API
Deploy using just a few examples with active learning
Supports 5 languages (English, Chinese, Italian, French, Spanish)
Language Understanding Models
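To make "determine intent, detect entities" concrete, here is a rough Python sketch of how an app might consume a language understanding result. The response shape (an `intents` list and an `entities` list with scores), the intent and entity names, and the 0.5 score cutoff are all assumptions made for illustration; check the service documentation for the actual schema. No network call is made.

```python
# Hypothetical result for one utterance; the field names are assumed.
sample_result = {
    "query": "book a table in montreal for friday",
    "intents": [
        {"intent": "BookTable", "score": 0.92},
        {"intent": "None", "score": 0.05},
    ],
    "entities": [
        {"entity": "montreal", "type": "City", "score": 0.88},
        {"entity": "friday", "type": "Date", "score": 0.81},
    ],
}


def top_intent(result, min_score=0.5):
    """Return the best-scoring intent name, or None if it is too uncertain."""
    intents = result.get("intents", [])
    if not intents:
        return None
    best = max(intents, key=lambda item: item["score"])
    return best["intent"] if best["score"] >= min_score else None


def entities_by_type(result):
    """Group detected entity strings by their entity type."""
    grouped = {}
    for item in result.get("entities", []):
        grouped.setdefault(item["type"], []).append(item["entity"])
    return grouped


intent = top_intent(sample_result)
slots = entities_by_type(sample_result)

print(intent)
print(slots)
```

Returning `None` for low-scoring intents gives the app a natural place to ask the user to rephrase instead of acting on a weak guess.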
33. Online Microsoft training delivered by experts
to help technologists continually learn
Hundreds of courses for developers, IT Pros,
students, entrepreneurs and enthusiasts
11 different languages
3M+ students registered
Build your own Learning Plan
All free!
http://mva.microsoft.com
34. • Universal Windows App Development
with Cortana and the Speech SDK
• Available for on-demand viewing now:
http://aka.ms/CortanaMVA
35. • Channel 9 Show
• Visual Studio Toolbox
with Robert Green
• New Voice Commands
• Integration with Cortana’s canvas
• Background Voice Commands
• Continuous dictation
• Poutine in Montreal!
https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/App-Development-with-Cortana
More Cortana Dev on Windows 10
36. Thank You!
Slides are on SlideShare. Demos are on GitHub.
Contact me and let me know what you build; I will be happy to help promote your apps.
Blog: AgeofMobility.com
Twitter: @ActiveNick
Email: nick.landry@microsoft.com
Apps: www.bigbaldapps.com
LinkedIn: linkedin.com/in/activenick
GitHub: github.com/ActiveNick
Slideshare: slideshare.net/ActiveNick