5. Naïve Bayes Algorithm
Transfer learning
Apriori Algorithm
Gaussian distribution
Random Forests
Logistic Regression
(Deep) Neural Networks
Decision Trees
Nearest Neighbour
Support Vector Machine
K-Means Algorithm
Linear Regression
Active learning
Domain adaptation
Semi-supervised learning
Reinforcement learning
Unsupervised learning
Supervised learning
9. 9
Emotion
Health Care
Education
Voice Recognition
Symptom diagnosis
Behavior Activity
Image Recognition
Medical
IBM Pathway Genomics
Detection of Diabetic Retinopathy in Retinal Fundus Photographs
customer behavior
Medical Imaging
Genomic Medicine
Cross-domain integration: everything related to people
10. What do I do? & What am I going to share?
12. 12
Seek a window into human mind and traits…
…through engineering approach
S. Narayanan and P. G. Georgiou, “Behavioral signal processing: Deriving human behavioral informatics
from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.
13. 13
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
• Help experts to do things they know in a more efficient manner at scale
• Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
33. QUANTITATIVE:
QUANTITATIVE EVIDENCE DIRECTLY FROM MEASURABLE SIGNALS
EFFICIENCY:
HELP EXPERTS DO WHAT THEY ALREADY KNOW HOW TO DO WELL, MORE EFFICIENTLY, CONSISTENTLY & AT SCALE
SUPPLEMENTARY:
COMPLEMENT THE GOLD STANDARD METHOD WHEN APPROPRIATE
POSSIBILITY:
TOOLS FOR NOVEL ACTIONABLE INSIGHT DISCOVERY
AIM:
COMPUTING BEHAVIORAL TRAITS & STATES FOR DECISION MAKING & ACTION
34. 34
Enablers of BSP . . . (one half of the puzzle)
• Text Processing
• Voice Activity Detection
• Alignment
• Transcription
• Keyword Spotting
• Prosody Modeling
• Voice Quality
• Diarization
• Speaker Identification
• Dialog Act Tagging
• Face Detection
• Expression Recognition
• Action Recognition
• Language Understanding
• Affective Computing
• Speaker State and Trait
• Joint Speech Visual Processing
• Interaction Modeling
• Sentiment Analysis
35. 35
Enabling Technologies (signal processing, machine learning)
• Low-level descriptors
• Acoustic features
• Motion features
• Text features
• Image features
• Speech recognition
• Face recognition
• Action recognition
• Dialog act tagging
• Keyword spotting
• Text processing
• Sentiment analysis
• Affect recognition
• Speaker states and traits
• Visual-speech processing
• Interaction modeling
Domain Experts' Knowledge
• Subjective assessment
• Internal state & construct
• Neuro-developmental disorder
• Evidence-based observational coding
• Intervention efficacy
• Coder variability control
• Development of coding manual
• Self-report measure validity
• Coding mechanism
Behaviors
• Social behavior
• Affective behavior
• Communicative behavior
• Dyadic behavior
Human signal processing
40. 40
Computational Methods that Model Human Behavior Signals
• Manifested in Overt and Covert Cues
• Processed and Used by Humans Explicitly or Implicitly
• Facilitate Human Analysis and Decision Making
Outcome of Behavioral Signal Processing
• Behavioral Analytics
QUANTIFYING HUMAN EXPRESSED BEHAVIOR AND HUMAN “FELT SENSE”
DERIVING INTERPRETABLE BEHAVIOR ANALYTICS FROM DATA FOR ACTIONABLE INSIGHTS
56. 56
What is Autism?
A social-communicative neurodevelopmental disorder
• Prevalence: 1 in 68 children (1 in 42 males) diagnosed [CDC2014]
• ASD: a “spectrum” disorder because of its extreme heterogeneity
• Intervention leads to improved outcomes
What is the role of BSP in autism?
57. 57
ROLE OF BSP?
Automatically analyze the social and interactive behavior of clinician and child as they interact during an ADOS diagnostic session
AIM?
• Analysis at scale
• Quantitative evidence from signals
• New findings beyond the current status quo in psychiatry (?)
60. 60
Can we?
Automatically measure spontaneous social (verbal/nonverbal) behavior between clinician and child to predict the child's rating on atypical amount of reciprocal social communication
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
From audio and video, develop measures of clinician-child social interaction behavior to analyze and predict the amount of reciprocal social communication
69. 69
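Where
• BIIC: a soundproof isolation booth
• Originally: classrooms at the national education academy, with no restrictions
• Compromise: a classroom kept as quiet as possible
When
• BIIC: every principal's exam during the training program
• Originally: none
• Compromise: record everything that available staffing allows
How
• BIIC: headset microphones, multi-track audio, face, body movement, Kinect, all synchronized
• Originally: no equipment; it must be doable with minimal staff
• Compromise: upper-body video with an external microphone for audio
Ensure the current system is not altered too much at the BEGINNING
At scale, ease of application is crucial
Strike a balance between ecological validity and quality control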
84. Autism Diagnostic Observation Schedule [Lord 2001]
• Subject interacts with a psychologist for ~45 minutes
• Current gold standard, research-level observational coding
• Psychologists are trained using stringent training protocol
• Semi-structured assessment that elicits socio-communicative behavior from children with ASD for diagnosis
• Multiple subpart events (14), with ratings on a wide range of socio-communicative behaviors (28)
91. A simple voice activity detector
• Speech signal per session
• Energy every frame
– frame = 25 ms
– standard deviation (normalizes out the D.C. offset)
• Threshold
– speech percentage in the wav
• Speech Segments
– Energy > Threshold Energy
Short-time energy formula:
$$E(n) = \sum_{m=n-N+1}^{n} x^{2}(m)$$
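The energy-threshold detector on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: it assumes non-overlapping 25 ms frames (the slide's E(n) is a sliding window), removes the D.C. offset by subtracting the mean, and derives the threshold from an assumed speech percentage. The names `short_time_energy`, `detect_speech`, and `speech_fraction` are hypothetical.

```python
def short_time_energy(samples, frame_len):
    """Sum of squared samples per non-overlapping frame (assumed framing)."""
    mean = sum(samples) / len(samples)            # remove the D.C. offset
    centered = [s - mean for s in samples]
    return [
        sum(x * x for x in centered[i:i + frame_len])
        for i in range(0, len(centered) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, sample_rate, frame_ms=25, speech_fraction=0.5):
    """Flag each frame as speech (True) when its energy exceeds the threshold.

    speech_fraction is an assumed prior on how much of the recording is
    speech; the threshold is the energy of the loudest frame that is still
    counted as non-speech.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = short_time_energy(samples, frame_len)
    n_quiet = len(energies) - round(len(energies) * speech_fraction)
    if n_quiet <= 0:
        return [True] * len(energies)
    threshold = sorted(energies)[n_quiet - 1]
    return [e > threshold for e in energies]      # Energy > Threshold Energy


# Toy signal: 3 silent frames followed by 3 loud frames at 8 kHz.
silence = [0.0] * 600
loud = [1.0 if i % 2 == 0 else -1.0 for i in range(600)]
flags = detect_speech(silence + loud, sample_rate=8000)
```

Consecutive `True` frames would then be merged into speech segments; real recordings usually also need smoothing so that short pauses do not split a segment.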
98. 98
Clustering and speaker change detection
1. Generate an i-vector for each ‘segment’
2. Compute the pairwise similarity between clusters
3. Merge the closest clusters
4. Update the distances from the remaining clusters to the new cluster
5. Iterate steps 2-4 until the stopping criterion is met
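The five steps above can be sketched as a small agglomerative clustering loop. This is only an illustration under stated assumptions: plain lists stand in for the i-vectors (no i-vector extractor is included), similarity is cosine similarity between cluster centroids, and the stopping criterion is a hypothetical `min_similarity` threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerative_cluster(segment_vectors, min_similarity=0.8):
    """Merge the most similar clusters until none is similar enough."""
    clusters = [[i] for i in range(len(segment_vectors))]  # one cluster per segment
    while len(clusters) > 1:
        best = None
        # Step 2: pairwise similarity between cluster centroids.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([segment_vectors[k] for k in clusters[i]])
                cj = centroid([segment_vectors[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best[0] < min_similarity:       # step 5: stopping criterion
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # step 3: merge closest pair
        del clusters[j]                    # step 4: centroids are recomputed next pass
    return clusters


# Four toy "segments" from two well-separated speakers.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
speakers = agglomerative_cluster(segments)
```

Each returned cluster lists the segment indices attributed to one speaker. Recomputing every centroid on each pass keeps the sketch short; a production system would cache and update distances instead.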
126. 126
Tools that are quick to get started with
openSMILE
• Versatile and fast audio feature extractor
• Open-source and cross-platform
• Abundant speech-related features: signal energy, loudness, Mel-spectra, MFCC, PLP-CC, pitch
• Audio I/O
• Supports many I/O formats: WEKA, HTK, LibSVM
Praat
• Direct visualization
• A bit easier to use
There are actually many more . . .
127. 127
Low-level signal descriptors
Encoding/Profile
Image characteristics
• Histogram of oriented gradients (HOG)
• Scale-invariant feature transform (SIFT)
• Local binary pattern (LBP)
• 3D SIFT
• HOG3D
texture, shape, keypoint, edge
More commonly used to describe an image (photo) frame: histogram of oriented gradients (HOG), local binary pattern (LBP)
137. 137
Term Weighting Method
A simplifying representation by term count
Term Frequency
How important (or informative) a word is in a document.
$$tf_{t,d} = \frac{n_{t,d}}{\sum_{k} n_{k,d}}$$
Inverse Document Frequency
How important (or informative) a word is in the corpus.
$$idf_{t,D} = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$
Term Frequency–Inverse Document Frequency (TF-IDF)
$$tfidf_{t,d,D} = tf_{t,d} \times idf_{t,D}$$
Sometimes this alone is already quite effective
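The term-weighting formulas on this slide translate directly into code. The sketch below assumes documents are already tokenized into lists of words; the function names and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: count of the term divided by all term counts in the document."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency with the slide's 1 + document-count denominator."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of a term in one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus of four tokenized documents.
docs = [
    ["john", "likes", "football"],
    ["mary", "likes", "movies"],
    ["john", "watches", "movies"],
    ["mary", "reads"],
]
weight = tf_idf("football", docs[0], docs)
```

With the `1 +` term in the denominator, a word that occurs in every document gets a slightly negative weight, which is one reason library implementations often use a different smoothing.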
138. 138
The unit does not have to be a single word . . .
N-gram
Turn unigram terms into bigram terms at the word tokenization step,
for instance,
John also likes to watch football games
[ 'John also' , 'also likes' , 'likes to' , 'to watch' , 'watch football' , 'football games' ]
[ 1 , 1 , 1 , 1 , 1 , 1 ]
These ideas can be extended indefinitely
Text example: "We also hope that, through this kind of approach, we can … improve our teachers' teaching"
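A minimal sketch of the bigram example on this slide, assuming simple whitespace tokenization; `ngrams` is a hypothetical helper name.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens into one n-gram term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "John also likes to watch football games"
bigrams = ngrams(sentence.split(), n=2)
counts = Counter(bigrams)   # term counts over the bigram vocabulary
```

Changing `n` gives trigrams and longer units; the resulting counts can be fed into the same term-weighting scheme as unigrams.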
139. 139
Distributed word representation
Represent a word (term) as a vector
• CBOW: predicts the word given its context
• Skip-gram: predicts the context given a word
The distributed representation encoded in the hidden layer of the neural network serves as the representation of words
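The difference between the two objectives is easiest to see in the training examples each one produces. The sketch below only builds those (input, target) pairs; it does not train a network, and the function names and `window` parameter are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: each center word predicts every word in its context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: the whole context window predicts the word in the middle."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        pairs.append((context, target))
    return pairs


tokens = ["likes", "to", "watch"]
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
```

In both cases the word vectors are the hidden-layer weights learned while fitting these pairs.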
164. 164
ADOS
Emotion Part
Multimodal Turn-taking Behavior
Coordination Time Series
Automatically generating a time series of multimodal behavior coordination measures across a session . . .
197. 197
Psychologists unconsciously alter their communicative social behavior strategy (cueing behavior?) depending on the ASD child's ability to carry out reciprocal communication during the interaction
200. 200
Descriptors included    Child prosody    Psych prosody    Child and psych prosody
Spearman’s ρ            0.64***          0.79***          0.67***
The psychologist's acoustics are at least as predictive of the child's ASD severity ratings
Similar to earlier findings on the English ADOS!
[1] Daniel Bone, Chi-Chun Lee, Matthew P. Black, Marian E. Williams, Pat Levitt, Sungbok Lee, and Shrikanth Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights from a Study of Spontaneous Prosody", Journal of Speech, Language, and Hearing Research, 2014, 57(4), 1162-1177.
Hard to obtain scientific insights without such behavioral analytics for domain experts
NEED MORE VERIFICATION
202. Is it Technical? Example Pitfall 1
Controlling for Channel Factors
• Interspeech 2013 Autism Challenge
• Baseline approach: black-box (works well)
– 2-class baseline: 92.8% UAR (chance is 50% UAR)
• Hypothesis: the model captures the channel, not the diagnosis
– ASD/SLI recorded at 2 clinics, TD recorded in classrooms
• A simple experiment showed channel differences
– Matched baseline
• Conclusion: reduce (or at least note) noise sources in data collection.
Daniel Bone, Theodora Chaspari, Kartik Audkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth
Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", InterSpeech, 2013.
11/11/2014
204. Is it Technical? Example Pitfall 2
Behavior Analysis & Modeling: Cross-validation
They do not perform speaker-separated cross-validation!
• Can we detect United States Senators’ party affiliations from speech features (with a black-box approach)?
• Performance increases as the number of samples per speaker increases
Conclusion: Always perform speaker-separated cross-validation!
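Speaker-separated cross-validation means that all samples from one speaker land in the same fold, so the model is never tested on a voice it has already seen. The sketch below is one simple way to build such folds, assuming each sample carries a speaker ID; the function name and the greedy balancing strategy are illustrative choices, not taken from the talk.

```python
from collections import defaultdict


def speaker_separated_folds(speaker_ids, n_folds=3):
    """Return (train_indices, test_indices) splits with no speaker in both."""
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    # Assign whole speakers to folds, largest speaker first, always filling
    # the currently smallest fold to keep fold sizes roughly balanced.
    folds = [[] for _ in range(n_folds)]
    for spk in sorted(by_speaker, key=lambda s: -len(by_speaker[s])):
        min(folds, key=len).extend(by_speaker[spk])

    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for m, fold in enumerate(folds) if m != k for i in fold)
        splits.append((train, test))
    return splits


# Eight samples from four hypothetical speakers.
speaker_ids = ["A", "A", "A", "B", "B", "C", "C", "D"]
splits = speaker_separated_folds(speaker_ids, n_folds=2)
```

scikit-learn's `GroupKFold` offers the same guarantee when speaker IDs are passed as the `groups` argument.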
212. 212
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
• Help experts to do things they know in a more efficient manner at scale
• Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Transformative effort . . .
213. 213
COMPUTING OF, FOR, AND BY HUMANS
• OF: human action and behavior data
• FOR: meaningful analysis, timely decision making & intervention (action)
• BY: collaborative integration of human expertise with automated processing
By Professor Shrikanth Narayanan
214. 214
Enabling Technologies (signal processing, machine learning), Domain Experts' Knowledge, and human signal processing (the diagram from slide 35, repeated)
Relatively new: RICH R&D OPPORTUNITIES (CHALLENGES)