Considering the subjectivity to rationalise evaluation approaches: the example of Spoken Dialogue Systems
Marianne Laurent, Philippe Bretier (Orange Labs), Ioannis Kanellos (Telecom Bretagne)
23 June 2010, QoMEX 2010, Trondheim, Norway
Spoken Dialogue Systems
[Diagram: a user request such as « I can't connect to the Internet! » passes through Automatic Speech Recognition and Spoken Language Understanding (speech understanding), then the Dialogue Manager (which queries the information system), then Spoken Language Generation and Text-to-Speech (system output). Where does evaluation fit?]
Evaluation is a complex task:
- Dynamic interactions: there is no ideal reference to compare against, so fidelity measures do not apply
- Diversity of evaluator profiles, individual perspectives and evaluation situations
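To make the component chain concrete, here is a minimal sketch of one user turn flowing through the five modules, with every intermediate output kept in a trace. It is an illustration under stated assumptions only: the stage functions (`asr`, `slu`, `dialogue_manager`, `slg`, `tts`) and the toy intent and slot names are hypothetical stand-ins, not the Orange Labs system.

```python
# Hypothetical toy stand-ins for the five SDS modules on the slide.
# Real modules would wrap an ASR engine, an NLU model, a TTS engine, etc.

def asr(audio: str) -> str:
    # Pretend the "audio" is already a transcript; just normalise case.
    return audio.lower()


def slu(text: str) -> dict:
    # Map the transcript to a toy semantic frame.
    intent = "internet_down" if "internet" in text else "unknown"
    return {"intent": intent}


def dialogue_manager(frame: dict) -> dict:
    # Choose the next system act (a real DM would also query the information system).
    if frame["intent"] == "internet_down":
        return {"act": "ask", "slot": "modem_lights"}
    return {"act": "reprompt"}


def slg(act: dict) -> str:
    # Turn the system act into a prompt.
    if act["act"] == "ask":
        return "Are the lights on your modem on?"
    return "Sorry, could you repeat that?"


def tts(text: str) -> bytes:
    # Stand-in for speech synthesis: just encode the prompt.
    return text.encode("utf-8")


STAGES = [("asr", asr), ("slu", slu), ("dm", dialogue_manager), ("slg", slg), ("tts", tts)]


def run_turn(user_audio: str) -> dict:
    """Run one user turn through the pipeline, keeping every module's output."""
    trace = {}
    value = user_audio
    for name, stage in STAGES:
        value = stage(value)
        trace[name] = value
    return trace
```

For example, `run_turn("I can't connect to the Internet!")["dm"]` gives `{'act': 'ask', 'slot': 'modem_lights'}`. Keeping each module's output is one way to let different evaluators judge the same interaction from their own point of view: a recogniser developer may inspect `trace["asr"]`, while a service designer looks only at the final prompt.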
Internal review of evaluation methods: ad hoc protocols that depend on the evaluator's profile…
Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
Internal review of evaluation methods: ad hoc protocols that depend on the evaluator's profile... and on the evaluation context!
http://www.slideshare.net/MarianneLo/lrecmlaurentposter
Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
Toward one-size-fits-all evaluation protocols?
« Research has exerted considerable effort and attention to devising evaluation metrics that allow for comparison of disparate systems with various tasks and domains. » (Paek, 2007)
« A critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents. » (Walker et al., 1997)
« We see a multitude of highly interesting – but virtually incomparable – evaluation exercises, which address different aspects of quality, and which rely on different aspects evaluation criteria. » (Möller, 2009)
Roadmap
1 Evaluation depends on both the context and the evaluator
2 The evaluator as a mediator: an anthropocentric framework
3 Software implementation and anticipated added value
1 Evaluation, a rationalising contribution to a decision process
[Figure: eye-movement recordings for the same picture viewed under different instructions:
- Free examination
- Estimate the material circumstances of the family
- Give the ages of the people
- Surmise what the family had been doing before the arrival of the unexpected visitor
- Remember the clothes worn by the people
- Remember the positions of people and objects in the room]
Yarbus, A. L. (1967). Eye Movements and Vision. Plenum, New York.
1 Evaluation, a goal-driven argumentative discourse
« The process through which one defines, obtains and delivers useful pieces of information to settle between the alternative possible decisions. »
Daniel Stufflebeam, L'évaluation en éducation et la prise de décision, 1980, Ottawa, Edition NHP.
2 V-model process to define an evaluation
Top-down trend: the situation is interpreted into evaluation needs and a procedure.
- Nature of the decision to take
- Identify the objectives
- Define criteria
- Deduce the indicators
- List the data to capture
Experimental set-up
Bottom-up trend: value judgment, in which the evaluator creates meaning.
- Capture the data
- Process data into indicators
- Note on a grid of criteria and compare
- Confront the results with the initial objectives: are the objectives met?
- Take the final decision
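The V shape can be written down as paired steps: each top-down planning step is answered by one bottom-up step after the experimental set-up. The sketch below is a hypothetical encoding; the names `V_MODEL`, `PIVOT`, `v_model_sequence` and `counterpart` are ours, and the pairing is our reading of the diagram's layout rather than a definition stated by the authors.

```python
# Each tuple pairs a top-down (planning) step with the bottom-up step that
# later answers it. The pairing is an assumed reading of the slide's V layout.
V_MODEL = [
    ("Nature of the decision to take", "Take the final decision"),
    ("Identify the objectives", "Confront the results with the initial objectives"),
    ("Define criteria", "Note on a grid of criteria"),
    ("Deduce the indicators", "Process data into indicators"),
    ("List the data to capture", "Capture the data"),
]
PIVOT = "Experimental set-up"  # bottom of the V


def v_model_sequence(pairs: list[tuple[str, str]], pivot: str) -> list[str]:
    """Walk down the left branch, through the pivot, then up the right branch."""
    top_down = [plan for plan, _ in pairs]
    bottom_up = [act for _, act in reversed(pairs)]
    return top_down + [pivot] + bottom_up


def counterpart(step: str, pairs: list[tuple[str, str]]) -> str:
    """Return the step on the opposite branch at the same level of the V."""
    for plan, act in pairs:
        if step == plan:
            return act
        if step == act:
            return plan
    raise KeyError(step)
```

For instance, `counterpart("Define criteria", V_MODEL)` returns `"Note on a grid of criteria"`: the criteria fixed while planning are the ones the evaluator later scores against, which is what lets the bottom-up judgment be traced back to the initial decision.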
2 A meta-model to define evaluations
- Analysis, from critical viewpoints: interaction performance, interaction quality, efficiency-related aspects, utility & usefulness, etc.
- Data processing: data-driven or goal-driven
- Capture techniques: log files, questionnaires, third-party annotation, physiometrics
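One way to read the meta-model is as a record with one choice per layer, so that two evaluation definitions can be compared layer by layer. This is a hypothetical Python encoding: the class and member names are ours, the enumerations only list the examples shown on the slide (its "etc." means the viewpoints are open-ended), and the comparability rule in `shares_viewpoint_with` is an illustrative assumption, not part of the authors' model.

```python
from dataclasses import dataclass
from enum import Enum


class Capture(Enum):
    LOG_FILES = "log files"
    QUESTIONNAIRES = "questionnaires"
    THIRD_PARTY_ANNOTATION = "third-party annotation"
    PHYSIOMETRICS = "physiometrics"


class Processing(Enum):
    DATA_DRIVEN = "data-driven"
    GOAL_DRIVEN = "goal-driven"


class Viewpoint(Enum):
    # Only the examples named on the slide; the real list is open-ended.
    INTERACTION_PERFORMANCE = "interaction performance"
    INTERACTION_QUALITY = "interaction quality"
    EFFICIENCY = "efficiency-related aspects"
    UTILITY_USEFULNESS = "utility & usefulness"


@dataclass(frozen=True)
class EvaluationDefinition:
    """One evaluation described as a choice on each layer of the meta-model."""
    name: str
    capture: frozenset[Capture]
    processing: Processing
    viewpoints: frozenset[Viewpoint]

    def shares_viewpoint_with(self, other: "EvaluationDefinition") -> bool:
        # Hypothetical rule: two evaluations are (partly) comparable when they
        # look at the system from at least one common critical viewpoint.
        return bool(self.viewpoints & other.viewpoints)
```

With this encoding, a questionnaire-based usability test focused on interaction quality and a data-driven log analysis focused on performance and efficiency share no viewpoint, which makes explicit why their results are hard to compare.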
2 A mediator within an “evaluation ecosystem”
- Resources: system of constraints
- Situation: demand system
- Community of practice: normative system
- Corpus of evaluations: rationalising system
3 Software implementation: MPOWERS (Multi Point Of vieW Evaluation Refine Studio)
Sources: log files, third-party annotations and user questionnaires, gathered in a datamart. Functions: define KPIs; retrieval of KPIs & reports; personalised dashboards.
- Data: as collected in evaluation campaigns
- Parameters: a descriptive view of the system (cf. ITU-T Rec. P.Supp.24: Parameters describing the interaction with SDS)
- KPIs: an analytical, statistical view of the system
- Dashboards: an ad hoc selection of KPIs, possibly with graphics
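The Data → Parameters → KPIs → Dashboards chain can be illustrated with a small sketch. It assumes a hypothetical per-dialogue log record; the field names, the three parameters and the KPI names are ours and only mimic the kind of interaction parameters catalogued in ITU-T P.Supp.24, and nothing here reflects how MPOWERS is actually implemented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DialogueLog:
    """Hypothetical per-dialogue log record, as it might sit in the datamart."""
    dialogue_id: str
    start_s: float
    end_s: float
    system_turns: int
    user_turns: int
    task_completed: bool


def parameters(log: DialogueLog) -> dict[str, float]:
    """Descriptive view: per-dialogue interaction parameters (illustrative names)."""
    return {
        "duration_s": log.end_s - log.start_s,
        "turns": log.system_turns + log.user_turns,
        "task_success": 1.0 if log.task_completed else 0.0,
    }


def kpis(logs: list[DialogueLog]) -> dict[str, float]:
    """Analytical view: aggregate the parameters over a whole campaign."""
    params = [parameters(log) for log in logs]
    return {
        "mean_duration_s": mean(p["duration_s"] for p in params),
        "mean_turns": mean(p["turns"] for p in params),
        "task_success_rate": mean(p["task_success"] for p in params),
    }


def dashboard(all_kpis: dict[str, float], selection: list[str]) -> dict[str, float]:
    """Personalised dashboard: an ad hoc selection of KPIs for one evaluator."""
    return {name: all_kpis[name] for name in selection}
```

With two logged dialogues of 60 s and 90 s, of 9 and 15 turns, one of them successful, `kpis` yields a mean duration of 75 s, a mean of 12 turns and a task success rate of 0.5; a product manager's dashboard might then keep only `task_success_rate`. Separating parameters from KPIs mirrors the slide's distinction between a descriptive and an analytical view.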
3 Added value: impact both for the individual and for the communities they belong to
- Contribution & involvement. COOPERATE: contribute, as a knowledge-farming cooperative.
- Evaluation definition & refinement. CONNECT: identify and get in touch with relevant people.
- Retrieval of evaluation results. COLLABORATE: discuss/negotiate to converge toward common practices.
- Feedback & inspiration: communities of practice, communities of interest.

Thank you! Questions? @warius