Introduction to Smart Speaker Technology
Victor Sue
Agenda
• What Is a Smart Speaker?
• MEMS Microphone Array Technologies
• Cloud Voice Service
What Is a Smart Speaker?
What Is a Smart Speaker? (cont.)
"Alexa, what's the weather?"
"Alexa, show my calendar."
"Alexa, show me my timers."
"Alexa, play some music."
"Alexa, read my notifications."
"Alexa, what's in the news?"
"Alexa, find me a nearby pizza restaurant."
"Alexa, ask Uber to request a ride."
Voice Portal
What Is a Smart Speaker? (cont.)
What Is a Smart Speaker? (cont.)
Key technologies: ASR, STT, TTS, hotword detection, MEMS mic, mic array, NLP, DNN, RNN, DOA, VAD, audio beamforming, SVM
VAD (Voice activity detection)
Hotword Detection
• Input: raw audio (16 kHz, 16-bit, mono)
• Feature extraction (MFCC): pre-emphasis → framing and windowing → FFT → mel filter bank → logarithm → DCT → MFCC features
• Deep neural network (DNN)
• Posterior handling
Other diagram labels: Precise Engine, Combine Score
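The feature extraction chain on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline of any particular hotword engine: the frame length (25 ms), hop (10 ms), FFT size, filter count, number of coefficients, and pre-emphasis factor are all assumed values, and the function name `mfcc` is hypothetical.

```python
import numpy as np


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix for a mono signal."""
    signal = np.asarray(signal, dtype=np.float64)

    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    # 2. Framing and windowing: overlapping frames, Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # 3. FFT, then power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 4. Mel filter bank: triangular filters evenly spaced on the mel scale.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # 5. Logarithm of the filter bank energies (small offset avoids log(0)).
    log_energy = np.log(power @ fbank.T + 1e-10)

    # 6. DCT-II decorrelates the log energies; keep the first n_ceps terms.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    return log_energy @ basis.T
```

In a hotword detector, a matrix like this would be the input to the DNN stage; the posterior handling step then smooths the per-frame network outputs before a decision is made.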
Microphone Application
Laptop PC
Mobile Phone
Digital Camera
Gaming
Voice Assistant
ECM Microphone
• An electret condenser microphone (ECM) is an electrostatic, capacitor-based microphone that uses a permanently charged material, which eliminates the need for a polarizing power supply.
• It has a built-in FET that serves as the amplifier.
• The condenser diaphragm and plastic chassis are heat sensitive, so take care when soldering.
Figures: ECM mic circuit, ECM mic appearance, ECM mic structure
MEMS Microphone
• Three types of MEMS microphone technology:
• Piezoelectric (low sensitivity, high system noise)
• Piezoresistive (low sensitivity, high system noise)
• Capacitive (high sensitivity, low power consumption, and low system noise)
• Capacitive MEMS microphones are the mainstream type on the market.
ECM Mic. vs MEMS Mic.
• MEMS mics can offer the same SNR as traditional ECM mics in a much smaller package.
• MEMS mics also provide a much more consistent response to sound across the operating temperature range.
Reference: EE Times, http://www.eetimes.com/document.asp?doc_id=1280170
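ECM Mic. vs MEMS Mic.
Reference: DIGITIMES
Item | MEMS microphone | ECM microphone
Component size | Smaller | Larger
Assembly | Automated SMT assembly | Mostly manual assembly
Operating temperature | Up to 200 °C or higher | Distorts above 85 °C
Shock and impact resistance | Good | Poor
EMI immunity | Good | Poor
RFI immunity | Good | Poor
Price | Higher | Lower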
MEMS Microphone Advantage
• MEMS Mic vs. Traditional Mic
• Increase performance
• Enhance manufacturability
• Reduce size
• Reduce costs
MEMS Microphone Suppliers
• Knowles
• InvenSense
• STMicroelectronics
• AAC Technologies
• Goertek
MEMS Microphone Market
MEMS Microphone Market (cont.)
• MEMS Microphone Applications
Microphone Array System diagram
Microphone Array Key Feature
• Higher Directivity
• Higher SNR
• Full Receiving Range
• Locate the Signal Source
• Smaller Size
• Acoustic Software Technologies
• Echo Cancellation
• Beamforming
• Sound Source localization
• Sensor Assistant
Beamforming Microphone Design
• Beamforming filter (delay-and-sum)
Beamforming Microphone Design (cont.)
• A beamformer is a spatial filter.
• It is based on two assumptions:
• Narrowband signal
• Far-field plane wave
• It selectively amplifies a sound source at a particular location.
• It takes advantage of how sound propagates through space.
• It uses delay-and-sum beamforming.
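The delay-and-sum idea can be sketched as follows, assuming a linear array, a far-field plane wave, and delays rounded to whole samples. The function names, the 343 m/s sound speed, and the angle convention (0° along the array axis) are assumptions made for this illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_x, angle_deg, sample_rate):
    """Integer arrival delays (in samples) of a plane wave at each microphone.

    mic_x: microphone positions along the array axis, in metres.
    angle_deg: arrival direction measured from the array axis (0 = endfire).
    """
    # A plane wave from angle theta reaches a mic at position x
    # earlier by x * cos(theta) / c than a mic at the origin.
    arrival = -np.asarray(mic_x, dtype=np.float64)
    arrival = arrival * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    arrival -= arrival.min()  # make the earliest microphone the reference
    return np.round(arrival * sample_rate).astype(int)


def delay_and_sum(channels, delays):
    """Skip each channel's arrival delay so the wavefronts line up, then average."""
    channels = np.asarray(channels, dtype=np.float64)
    n = channels.shape[1] - max(delays)
    aligned = [ch[d:d + n] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

Sound from the steered direction adds coherently after alignment, while sound from other directions is averaged with mismatched delays and is attenuated. That difference is the spatial filtering the slide refers to.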
Beamforming Microphone Design (cont.)
• Endfire microphone array
• Algorithm: sum the front microphone signal with the inverted, delayed rear microphone signal.
• Distance: the microphone spacing must match the sample rate so that the delay is a whole number of samples.
• Distance = sound speed × sample period × number of samples
• Example: 34300 cm/s × (1/48000) s × 3 samples = 2.14 cm
• Pattern: cardioid, with a null (no signal) at 180° for frequencies below the aliasing frequency
Diagram labels: A, B, SUM
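The spacing formula and the "front plus inverted, delayed rear" rule can be checked numerically. This is a small sketch under the slide's own numbers (34300 cm/s, 48 kHz, 3 samples); the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND_CM = 34300.0  # cm/s, the value used in the slide's example


def endfire_spacing_cm(sample_rate, n_samples):
    """Spacing whose acoustic travel time equals n_samples sample periods."""
    return SPEED_OF_SOUND_CM * (1.0 / sample_rate) * n_samples


def endfire_output(front, rear, n_samples):
    """Front signal plus the inverted rear signal delayed by n_samples."""
    front = np.asarray(front, dtype=np.float64)
    rear = np.asarray(rear, dtype=np.float64)
    delayed_rear = np.concatenate(
        [np.zeros(n_samples), rear[:len(rear) - n_samples]]
    )
    return front - delayed_rear
```

A wave arriving from the rear (180°) reaches the rear microphone first and the front microphone `n_samples` later, so the delayed rear signal matches the front signal and the two cancel. A wave from the front sees twice the delay between the two terms and is not cancelled, which gives the cardioid pattern.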
Beamforming Microphone Design (cont.)
• 2-microphone endfire array
Figure labels: 6 dB, 12 dB
Beamforming Microphone Placement Design
• 2-order endfire array × 6
• Sample rate: 22 kHz
• Distance: 42.5 mm (3-sample delay)
• Null frequency: 4.2 kHz
• Coverage: 360 degrees
• Cost: the central mic is reused
Figure labels: 42.5 mm spacing; microphones 1 to 6
MIC array ID consideration
Cloud Voice Service
• Speech recognition service (Speech to Text)
• NLP service (Natural Language Processing)
• TTS service (Text to Speech)
Cloud Voice Service (cont.)
Diagram labels: Voice, Text, Intent, Feedback, Activity; Device, Cloud
Cloud Voice Service (cont.)
• Amazon Alexa Voice Service
• Google Voice Service
• Baidu DuerOS
• Microsoft Voice Services
• IBM Watson Voice Service
• Nuance Communications
• Amazon AVS internals
Amazon Alexa Voice Service
• Alexa Voice Service (AVS)
• Voice recognition service
• Natural language understanding service
• For voice-enabled connected devices
• Alexa Skills Kit
• API for voice applications
• Includes the ability to play music, answer general questions, set an alarm or timer, and more.
Amazon Alexa Voice Service (cont.)
Amazon Alexa Voice Service (cont.)
• Application cases
Amazon Echo
Vehicle connectivity (BMW / Ford / Hyundai)
Google Voice Service
• Actions on Google
• Design the VUI (voice user interface)
• Works with the Google Assistant
• Supports Google Home
• Supports Dialogflow
• Supports Firebase
• Google Speech Recognition
• Google TTS
• Google Natural Language API
Google Voice Service (cont.)
• Application Case
Google Home
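Baidu DuerOS
• DuerOS Open Platform
• Launched by Baidu Duer in January 2017
• An operating system that combines an AI conversational system with smart home appliances
• DIDP (DuerOS Intelligent Devices Platform)
• Gives smart devices conversational ability. By integrating DuerOS smart hardware with open interfaces, it offers users the following experiences:
1. Controlling the device by voice to play music, check the weather and the latest news, get traffic conditions, and ask general-knowledge questions
2. Setting alarms and reminders by voice
3. Getting services by voice, such as hailing a ride or ordering takeout
4. Using, by voice, skills created by Baidu's third-party partners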
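Baidu DuerOS (cont.)
• Technical architecture
• Application layer (Xiaodu Smart Device Open Platform): scenario application reference designs, core access components, chip modules, development kits, SDK, microphone array, mechanical design, industrial design, acoustic design
• Core layer (Xiaodu Conversational Core System): conversational service (DuerOS Conversational Service), skill framework (DuerOS Bot Framework), speech recognition, speech output, screen display
• Capability layer (Xiaodu Skill Open Platform): native skills, third-party skills, skill development tools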
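Baidu DuerOS (cont.)
• Development flow
1. Developer certification
2. Select a scenario: phone, speaker, refrigerator, TV, story machine, or lightweight device (Android, Linux, mbedOS)
3. Service configuration: device name, basic configuration, OAuth configuration
4. Download the SDK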
Microsoft Voice Service
• Cognitive API with Microsoft Azure
• Speech API
• Speech to Text
• Text to Speech
• Speaker Recognition
• Speech Translation
• Language API
• Text Analytics
• Translator Text
• Bing Spell Check
• Language Understanding
• Content Moderator
Microsoft Voice Service (cont.)
• Microsoft Cortana
IBM Watson Voice Service
• Voice Agent with Watson
• Improves telephone-based customer service
• Based on IBM Voice Gateway
• Supports a Service Orchestration Engine
• Speech to Text API
• Text to Speech API
• Natural Language Classifier
• Interprets and classifies natural language with confidence
IBM Watson Voice Service (cont.)
• Application cases
IBM Watson-powered driverless electric bus
SoftBank Pepper robot
Nuance Communications
• The company provides the voice software technology used in Samsung's S Voice and Apple's Siri.
• Voice Search
• Intelligent Personal Assistance
• Knowledge Navigator
• Natural Language User Interface
• Dragon Software Developer Kit
• Dragon Mobile SDK
• PC Recognition Software
Amazon AVS Internals
Amazon AVS Internals (cont.)
Amazon Skill Store
Amazon AVS Internals (cont.)
• AVS flow
Amazon AVS Internals (cont.)
• AVS flow (cont.)
• AVS response
HTTP/1.1 200 OK
Content-Type: multipart/related; boundary={BOUNDARY TERM}

--{BOUNDARY TERM}
Content-Type: application/json; charset=UTF-8

{
  "messageHeader": {},
  "messageBody": {
    "directives": [
      {
        "namespace": "Control",
        "name": "TurnOnOffRequest",
        "action": "TURN_ON",
        "applianceID": "4704F880-6026-11E5",
        . . .
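As a rough sketch of how a device could act on the JSON part of such a response, the snippet below parses the directives and picks out the on/off requests. The sample payload is a hypothetical, completed version of the fragment on the slide: the elided fields are left out, treating `directives` as a list is an assumption, and `extract_actions` is an illustrative helper rather than part of any AVS SDK.

```python
import json

# Hypothetical, completed version of the JSON part shown on the slide;
# the elided fields (". . .") are simply left out here.
SAMPLE_PART = """
{
  "messageHeader": {},
  "messageBody": {
    "directives": [
      {
        "namespace": "Control",
        "name": "TurnOnOffRequest",
        "action": "TURN_ON",
        "applianceID": "4704F880-6026-11E5"
      }
    ]
  }
}
"""


def extract_actions(json_part):
    """Return (applianceID, action) pairs for TurnOnOffRequest directives."""
    body = json.loads(json_part).get("messageBody", {})
    actions = []
    for directive in body.get("directives", []):
        if directive.get("name") == "TurnOnOffRequest":
            actions.append((directive.get("applianceID"), directive.get("action")))
    return actions


print(extract_actions(SAMPLE_PART))
```

A real client would first split the multipart body on the boundary term and route each part by its Content-Type before reaching this step.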
Amazon AVS Internals (cont.)
• AVS flow (cont.)
Diagram labels: Voice, Intent
Amazon AVS Internals (cont.)
• AVS flow (cont.)
Alexa Connected Home (CoHo) Skills
WeMo Lighting Skills
Diagram labels: Intent, Activity
Amazon AVS Internals (cont.)
• Custom skill
• Control an LED (on/off) on a Realtek Ameba board
Diagram labels: Voice, Intent, Trigger Skill, MQTT
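The intent-to-MQTT step of such a custom skill might look like the sketch below. Everything specific here is an assumption for illustration: the topic name `ameba/led`, the payload format, the intent dictionary shape (which is not the Alexa Skills Kit request format), and the injected `publish` callable, which stands in for whatever MQTT client the skill backend actually uses.

```python
import json

LED_TOPIC = "ameba/led"  # hypothetical topic name


def handle_led_intent(intent, publish):
    """Translate a parsed voice intent into one MQTT publish call.

    intent: dict such as {"name": "LedIntent", "slots": {"state": "on"}}
            (a hypothetical shape used only for this sketch).
    publish: callable(topic, payload), e.g. a thin wrapper around an MQTT client.
    """
    state = intent.get("slots", {}).get("state", "").lower()
    if intent.get("name") != "LedIntent" or state not in ("on", "off"):
        return "Sorry, I did not understand that."
    # The board is assumed to subscribe to LED_TOPIC and switch the LED.
    publish(LED_TOPIC, json.dumps({"led": state}))
    return "Turning the LED " + state + "."
```

Passing `publish` in as a parameter keeps the handler testable without a broker; the returned string is what the skill would hand back for speech output.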
AIA Reading Group: Introduction to Smart Speaker Technology
