Methodology and Campaign Design for the Evaluation of Semantic Search Tools
Stuart N. Wrigley (1), Dorothee Reinhard (2), Khadija Elbedweihy (1), Abraham Bernstein (2), Fabio Ciravegna (1)
(1) University of Sheffield, UK
(2) University of Zurich, Switzerland
4/27/10
Outline
  • SEALS initiative
  • Evaluation design
  • Criteria
  • Two phase approach
  • API
  • Workflow
  • Data
  • Results and Analyses
  • Conclusions
SEALS Initiative
SEALS goals
  • Develop and diffuse best practices in evaluation of semantic technologies
  • Create a lasting reference infrastructure for semantic technology evaluation
      • This infrastructure will be the SEALS Platform
  • Organise two worldwide Evaluation Campaigns
      • One this summer
      • Next in late 2011 / early 2012
  • Facilitate the continuous evaluation of semantic technologies
  • Allow easy access to both:
      • evaluation results (for developers and researchers)
      • technology roadmaps (for non-technical adopters)
  • Transfer all infrastructure to the community
Targeted technologies
Five different types of semantic technologies:
  • Ontology Engineering tools
  • Ontology Storage and Reasoning Systems
  • Ontology Matching tools
  • Semantic Web Service tools
  • Semantic Search tools
What’s our general approach?
  • Low overhead to the participant
      • Automate as far as possible
      • We provide the compute
      • We initiate the actual evaluation run
      • We perform the analysis
  • Provide infrastructure for more than simply running high-profile evaluation campaigns
      • reuse existing evaluations for your personal testing
      • create new evaluations
      • store / publish / download test data sets
  • Encourage participation in evaluation campaign definitions and design
  • Open Source (Apache 2.0)
SEALS Platform
[Architecture diagram. Actors: Technology Developers, Evaluation Organisers, Technology Users. Components: SEALS Service Manager, Runtime Evaluation Service, Result Repository Service, Tool Repository Service, Test Data Repository Service, Evaluation Repository Service.]
Search Evaluation Design
What do we want to do?
  • Evaluate / benchmark semantic search tools with respect to their semantic peers
  • Allow as wide a range of interface styles as possible
  • Assess tools on the basis of a number of criteria, including usability
  • Automate (part of) it
Evaluation criteria
User-centred search methodologies will be evaluated according to the following criteria:
  • Query expressiveness
      • How complex can the queries be?
  • Usability (effectiveness, efficiency, satisfaction)
      • How easy is the tool to use?
      • How easy is it to formulate the queries?
      • How easy is it to work with the answers?
  • Scalability
      • Ability to cope with a large ontology
      • Ability to query a large repository in a reasonable time
      • Ability to cope with a large number of results returned
  • Quality of documentation
      • Is it easy to understand?
      • Is it well structured?
  • Performance
      • Resource consumption:
          • CPU load
          • memory required
Evaluation criteria
Each phase will address a different subset of criteria.
  • Automated evaluation: query expressiveness, scalability, performance, quality of documentation
  • User-in-the-loop: usability, query expressiveness
Running the Evaluation
Automated evaluation
  • Tools uploaded to platform. Includes:
      • wrapper implementing API
      • supporting libraries
  • Test data and questions stored on platform
  • Workflow specifies details of evaluation sequence
  • Evaluation executed offline in batch mode
  • Results stored on platform
  • Analyses performed and stored on platform
[Diagram labels: SEALS Platform, API, Runtime Evaluation Service, Search tool.]
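The batch sequence described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_batch`, the callable-per-tool convention and the dictionary used as a result store are all hypothetical, and the real SEALS workflow is defined on the platform rather than in code like this.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a wrapped tool is a callable taking a data set
# name and a question, and returning a list of result URIs.
Tool = Callable[[str, str], List[str]]
ResultKey = Tuple[str, str, str]  # (tool name, data set, question)


def run_batch(tools: Dict[str, Tool],
              datasets: List[str],
              questions: List[str]) -> Dict[ResultKey, List[str]]:
    """Run every tool on every data set and question with no human input.

    The returned dictionary stands in for the platform's result store.
    """
    results: Dict[ResultKey, List[str]] = {}
    for tool_name, tool in tools.items():
        for dataset in datasets:
            for question in questions:
                results[(tool_name, dataset, question)] = tool(dataset, question)
    return results
```

Analyses would then be computed over the stored results in a separate step, mirroring the split between "results stored" and "analyses performed" on the slide.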
User in the loop evaluation
  • Performed at tool provider site
  • All materials provided:
      • Controller software
      • Instructions (leader and subjects)
      • Questionnaires
  • Data downloaded from platform
  • Results uploaded to platform
[Diagram labels: Over the web, API, Tool provider machine, SEALS Platform, Search tool, Controller.]
API
  • A range of information needs to be acquired from the tool in both phases
  • In the automated phase, the tool has to be executed and interrogated with no human assistance
  • The interface between the SEALS platform and the tool must be formalised
API – common
  • Load ontology
      • success / failure informs the interoperability
  • Determine result type
      • ranked list or set?
  • Results ready?
      • used to determine execution time
  • Get results
      • list of URIs
      • number of results to be determined by developer
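As a rough illustration, the four common operations could be expressed as an abstract wrapper interface. Everything named below (`CommonSearchToolAPI`, `ResultType`, the method names, and the `ToyTool` stand-in with its `.owl` check and `run` method) is hypothetical and chosen only for this sketch; the slides do not give the actual SEALS signatures.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import List, Optional


class ResultType(Enum):
    """Whether the tool returns a ranked list or an unordered set."""
    RANKED_LIST = "ranked list"
    SET = "set"


class CommonSearchToolAPI(ABC):
    """Hypothetical wrapper interface mirroring the four operations above."""

    @abstractmethod
    def load_ontology(self, location: str) -> bool:
        """Return True on success and False on failure."""

    @abstractmethod
    def get_result_type(self) -> ResultType:
        """Report whether results are a ranked list or a set."""

    @abstractmethod
    def results_ready(self) -> bool:
        """Polled by the caller to determine execution time."""

    @abstractmethod
    def get_results(self) -> List[str]:
        """Return result URIs; how many is decided by the tool developer."""


class ToyTool(CommonSearchToolAPI):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self, max_results: int = 2) -> None:
        self._loaded = False
        self._max_results = max_results
        self._results: Optional[List[str]] = None

    def load_ontology(self, location: str) -> bool:
        # Assumption for the sketch: only ".owl" locations load successfully.
        self._loaded = location.endswith(".owl")
        return self._loaded

    def get_result_type(self) -> ResultType:
        return ResultType.RANKED_LIST

    def run(self) -> None:
        # Pretend a query has been answered; real tools would search here.
        if self._loaded:
            self._results = [
                "http://example.org/a",
                "http://example.org/b",
                "http://example.org/c",
            ]

    def results_ready(self) -> bool:
        return self._results is not None

    def get_results(self) -> List[str]:
        return (self._results or [])[: self._max_results]
```

Returning a plain success flag from `load_ontology` keeps the interoperability check separate from any query behaviour, which matches the slide's framing of load success or failure as its own signal.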
API – user in the loop
  • User query input complete?
      • used to determine input time
  • Get user query
      • String representation of user’s query
      • if NL interface, same as the text entered
  • Get internal query
      • String representation of the internal query
      • for use with…
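One way a controller might turn the "input complete?" check into an input time is to poll it and take the difference between two clock readings. The sketch below assumes polling; the function name `measure_input_time`, its parameters and the polling interval are all hypothetical and not taken from the SEALS controller software.

```python
import time
from typing import Callable


def measure_input_time(input_complete: Callable[[], bool],
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep,
                       poll_interval: float = 0.05) -> float:
    """Return the seconds elapsed until input_complete() first returns True.

    The clock and sleep functions are injectable so that the measurement
    can be exercised deterministically without waiting in real time.
    """
    start = clock()
    while not input_complete():
        sleep(poll_interval)
    return clock() - start
```

The measured value is only accurate to within one polling interval, so a real controller would need an interval that is small relative to typical query input times.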
API – automated
  • Execute query
      • mustn’t constrain tool type to particular format
      • tool provider given questions shortly before evaluation is executed
      • tool provider converts those questions into some form of ‘internal representation’ which can be serialised as a String
      • serialised internal representation passed to this method
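The serialise-then-execute idea can be illustrated with a toy example. JSON is used here purely as an assumed serialisation; the slide only requires that the internal representation can be serialised as a String, so each tool is free to choose its own format. `serialise_query`, `ToyAutomatedTool` and the predicate/object query shape are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def serialise_query(internal_query: Dict[str, str]) -> str:
    """Tool-provider side: turn an internal representation into a String."""
    return json.dumps(internal_query, sort_keys=True)


class ToyAutomatedTool:
    """Stand-in tool whose internal query is a predicate/object pattern."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = triples

    def execute_query(self, serialised_query: str) -> List[str]:
        """Platform side: only an opaque String is passed to the tool."""
        query = json.loads(serialised_query)
        return sorted(
            subject
            for (subject, predicate, obj) in self._triples
            if predicate == query["predicate"] and obj == query["object"]
        )
```

Because the platform only ever handles the String, it does not need to know whether a given tool works with a formal query language, keywords or some other internal form.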
Data
Data set – user in the loop
  • Mooney Natural Language Learning Data
      • used by previous semantic search evaluation
      • simple and well-known domain
      • using geography subset
          • 9 classes
          • 11 datatype properties
          • 17 object properties
          • 697 instances
      • 877 questions already available
Data set – automated
  • EvoOnt
      • set of object-oriented software source code ontologies
      • easy to create different ABox sizes given a TBox
      • 5 data set sizes: 1k, 10k, 100k, 1M, 10M triples
      • questions generated by software engineers
Results and Analyses
Questionnaires
3 questionnaires:
  • SUS questionnaire
  • Extended questionnaire
      • similar to SUS in terms of type of question but more detailed
  • Demographics questionnaire
System Usability Scale (SUS) score
  • SUS is a 10-item Likert-scale questionnaire
  • Each question has 5 levels (strongly disagree to strongly agree)
  • SUS scores range from 0 to 100
  • A score of around 60 or above is generally considered an indicator of good usability
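The slide does not spell out how the 0 to 100 score is obtained. The sketch below uses the standard SUS scoring rule from Brooke's original questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name `sus_score` and the list-of-integers input are assumptions made for illustration.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a SUS score from ten responses on a 1-5 scale.

    responses[0] is item 1, responses[1] is item 2, and so on.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5
```

For example, a respondent who answers 3 (neutral) to every item scores 50, and one who strongly agrees with every positive item and strongly disagrees with every negative item scores 100.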

Evaluating Semantic Search Tools

  • 1. Methodology and Campaign Design for the Evaluation of Semantic Search Tools Stuart N. Wrigley1, Dorothee Reinhard2, Khadija Elbedweihy1, Abraham Bernstein2, Fabio Ciravegna1 4/27/10 1 1University of Sheffield, UK 2University of Zurich, Switzerland
  • 2. Outline SEALS initiative Evaluation design Criteria Two phase approach API Workflow Data Results and Analyses Conclusions 4/27/10 2
  • 4. SEALS goals Develop and diffuse best practices in evaluation of semantic technologies Create a lasting reference infrastructure for semantic technology evaluation This infrastructure will be the SEALS Platform Organise two worldwide Evaluation Campaigns One this summer Next in late 2011 / early 2012 Facilitate the continuous evaluation of semantic technologies Allow easy access to both: evaluation results (for developers and researchers) technology roadmaps (for non-technical adopters) Transfer all infrastructure to the community 4/27/10 4
  • 5. Targeted technologies Five different types of semantic technologies: Ontology Engineering tools Ontology Storage and Reasoning Systems Ontology Matching tools Semantic Web Service tools Semantic Search tools 4/27/10 5
  • 6. What’s our general approach? Low overhead to the participant Automate as far as possible We provide the compute We initiate the actual evaluation run We perform the analysis Provide infrastructure for more than simply running high profile evaluation campaigns reuse existing evaluations for your personal testing create new ones evaluations store / publish / download test data sets Encourage participation in evaluation campaign definitions and design Open Source (Apache 2.0) 4/27/10 6
  • 7. SEALS Platform 4/27/10 7 Technology Developers Evaluation Organisers Technology Users SEALS Service Manager Runtime Evaluation Service Result Repository Service Tool Repository Service Test Data Repository Service Evaluation Repository Service
  • 9. What do we want to do? Evaluate / benchmark semantic search tools with respect to their semantic peers. Allow as wide a range of interface styles as possible Assess tools on basis of a number of criteria including usability Automate (part) of it 4/27/10 9
  • 11. How complex can the queries be?
  • 12. Is it easy to understand?
  • 13. Is it well structured?
  • 14. How easy is the tool to use?
  • 15. How easy is it to formulate the queries?
  • 16. How easy is it to work with the answers?
  • 17. Ability to cope with a large ontology
  • 18. Ability to query a large repository in a reasonable time
  • 22. Evaluation criteria: Each phase will address a different subset of criteria. Automated evaluation: query expressiveness, scalability, performance, quality of documentation. User-in-the-loop: usability, query expressiveness.
  • 24. Automated evaluation: Tools uploaded to platform; this includes a wrapper implementing the API and supporting libraries. Test data and questions stored on platform. Workflow specifies details of evaluation sequence. Evaluation executed offline in batch mode. Results stored on platform. Analyses performed and stored on platform. (Diagram labels: SEALS Platform, API, Runtime Evaluation Service, Search tool.)
  • 25. User in the loop evaluation: Performed at tool provider site. All materials provided: controller software; instructions (leader and subjects); questionnaires. Data downloaded from platform. Results uploaded to platform. (Diagram labels: Over the web, API, Tool provider machine, SEALS Platform, Search tool, Controller.)
  • 26. API: A range of information needs to be acquired from the tool in both phases. In the automated phase, the tool has to be executed and interrogated with no human assistance. The interface between the SEALS platform and the tool must be formalised.
  • 27. API – common: Load ontology (success / failure informs the interoperability). Determine result type (ranked list or set?). Results ready? (used to determine execution time). Get results (list of URIs; number of results to be determined by developer).
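The four common operations listed on this slide could be captured in an abstract interface along the following lines. This is only a minimal sketch: the slides do not give the actual SEALS method names or signatures, so `SearchToolWrapper`, `ResultType`, `DummyTool` and every method name below are hypothetical, as is the example URI.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import List


class ResultType(Enum):
    """The two result types named on the slide."""
    RANKED_LIST = "ranked list"
    SET = "set"


class SearchToolWrapper(ABC):
    """Hypothetical wrapper interface; names are illustrative, not the SEALS API."""

    @abstractmethod
    def load_ontology(self, ontology_path: str) -> bool:
        """Return True on success, False on failure (informs interoperability)."""

    @abstractmethod
    def result_type(self) -> ResultType:
        """Say whether results are a ranked list or an unordered set."""

    @abstractmethod
    def results_ready(self) -> bool:
        """Polled by the caller to determine execution time."""

    @abstractmethod
    def get_results(self) -> List[str]:
        """Return result URIs; how many is up to the tool developer."""


class DummyTool(SearchToolWrapper):
    """Toy in-memory tool used only to exercise the interface."""

    def __init__(self) -> None:
        self._loaded = False

    def load_ontology(self, ontology_path: str) -> bool:
        # Assumption for the toy: only files ending in .owl "load" successfully.
        self._loaded = ontology_path.endswith(".owl")
        return self._loaded

    def result_type(self) -> ResultType:
        return ResultType.SET

    def results_ready(self) -> bool:
        return self._loaded

    def get_results(self) -> List[str]:
        return ["http://example.org/geo#texas"] if self._loaded else []
```

Keeping the load result as a plain success / failure flag mirrors the slide's point that loading alone already says something about interoperability.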
  • 28. API – user in the loop: User query input complete? (used to determine input time). Get user query (String representation of user’s query; if NL interface, same as the text entered). Get internal query (String representation of the internal query, for use with…).
  • 29. API – automated: Execute query. Mustn’t constrain tool type to a particular format. Tool provider is given the questions shortly before the evaluation is executed. Tool provider converts those questions into some form of ‘internal representation’ which can be serialised as a String. The serialised internal representation is passed to this method.
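One way to picture the automated phase is a small harness that hands the serialised query string to the tool, polls "results ready?" and records the elapsed time. The sketch below is an assumption-laden illustration, not the SEALS Runtime Evaluation Service: `run_query`, `ToyTool`, the polling interval and the timeout are all invented here, and only the OK / FAIL status labels come from the slides.

```python
import time
from typing import List, Tuple


class ToyTool:
    """Hypothetical stand-in for a wrapped search tool."""

    def __init__(self) -> None:
        self._results = None

    def execute_query(self, serialised_query: str) -> None:
        # The tool is free to interpret the string however it likes.
        if not serialised_query:
            raise ValueError("empty query")
        self._results = ["http://example.org/code#" + serialised_query]

    def results_ready(self) -> bool:
        return self._results is not None

    def get_results(self) -> List[str]:
        return self._results


def run_query(tool, serialised_query: str,
              poll_interval: float = 0.01,
              timeout: float = 5.0) -> Tuple[str, float, List[str]]:
    """Execute one query with no human assistance; return (status, seconds, URIs)."""
    start = time.perf_counter()
    try:
        tool.execute_query(serialised_query)
        # Poll until the tool reports that results are ready, or give up.
        while not tool.results_ready():
            if time.perf_counter() - start > timeout:
                return "FAIL", time.perf_counter() - start, []
            time.sleep(poll_interval)
        results = tool.get_results()
    except Exception:
        return "FAIL", time.perf_counter() - start, []
    return "OK", time.perf_counter() - start, results
```

Passing an opaque string keeps the harness independent of the tool's query format, which is the constraint the slide highlights.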
  • 31. Data set – user in the loop: Mooney Natural Language Learning Data. Used by previous semantic search evaluation. Simple and well-known domain. Using geography subset: 9 classes, 11 datatype properties, 17 object properties and 697 instances. 877 questions already available.
  • 32. Data set – automated: EvoOnt, a set of object-oriented software source code ontologies. Easy to create different ABox sizes given a TBox. 5 data set sizes: 1k, 10k, 100k, 1M, 10M triples. Questions generated by software engineers.
  • 33. Results and Analyses
  • 34. Questionnaires: 3 questionnaires: SUS questionnaire; Extended questionnaire (similar to SUS in terms of type of question but more detailed); Demographics questionnaire.
  • 35. System Usability Scale (SUS) score: SUS is a 10-item Likert-scale questionnaire. Each question has 5 levels (strongly disagree to strongly agree). SUS scores have a range of 0 to 100. A score of around 60 and above is generally considered an indicator of good usability.
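The 0 to 100 range follows from the standard SUS scoring rule, which the slide does not spell out: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. A minimal sketch, with `sus_score` as an illustrative function name and responses coded 1 (strongly disagree) to 5 (strongly agree):

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a SUS score from ten responses, each in 1..5."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    # Each item contributes 0..4, so the sum is 0..40; scale to 0..100.
    return total * 2.5
```

For instance, answering 3 ("neutral") to every item gives a score of 50.0, which sits below the "around 60" threshold quoted on the slide.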
  • 36. Demographics: Age; Gender; Profession; Number of years in education; Highest qualification; Number of years in employment; Knowledge of informatics; Knowledge of linguistics; Knowledge of formal query languages; Knowledge of English; …
  • 37. Automated. Results: Execution success (OK / FAIL / PLATFORM ERROR); Triples returned; Time to execute each query; CPU load, memory usage. Analyses: Ability to load ontology and query (interoperability); Precision and Recall (search accuracy and query expressiveness); Tool robustness: ratio of all benchmarks executed to number of failed executions.
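Precision and recall for a single query can be computed directly from the URIs a tool returns and a set of expected answers. The sketch below uses the usual set-based definitions; the function name, the treatment of empty sets as 0.0, and the example URIs are assumptions, and the slides do not say how the campaign handles ranked lists or duplicates.

```python
from typing import Iterable, Tuple


def precision_recall(returned: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one query's result URIs."""
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = len(returned_set & relevant_set)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up example: 4 URIs returned, 3 expected, 2 in common.
returned_uris = ["ex:a", "ex:b", "ex:c", "ex:d"]
expected_uris = ["ex:a", "ex:b", "ex:e"]
p, r = precision_recall(returned_uris, expected_uris)
```

Here precision is 2/4 and recall is 2/3, since two of the four returned URIs are among the three expected ones.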
  • 38. User in the loop. Results (other than core results similar to automated phase): Query captured by the tool; Underlying query (e.g., SPARQL); Is answer in result set? (user may try a number of queries before being successful); Time required to obtain answer; Number of queries required to answer question. Analyses: Precision and Recall; Correlations between results and SUS scores, demographics, etc.
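The slides do not say which correlation measure is planned. As one plausible option, a Pearson coefficient between per-subject SUS scores and, say, time required to obtain an answer could be computed as sketched below. The `pearson` helper and the numbers are purely illustrative; the values are invented for the example and are not campaign results.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented per-subject values, for illustration only.
sus_scores = [82.5, 70.0, 55.0, 40.0]
seconds_to_answer = [35.0, 50.0, 80.0, 120.0]
r_value = pearson(sus_scores, seconds_to_answer)
```

With these made-up values the coefficient is negative: subjects who took longer gave lower usability scores.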
  • 39. Dissemination: Results browsable on the SEALS portal. Split into three areas: performance; usability; comparison between tools.
  • 43. Get involved! First Evaluation Campaign in all SEALS technology areas this Summer. Get involved: your input and participation is crucial. Workshop planned for ISWC 2010 after campaign. Find out more (and take part!) at http://www.seals-project.eu, or talk to me, or email me (s.wrigley@dcs.shef.ac.uk).
  • 44. Timeline: May 2010: Registration opens. May-June 2010: Evaluation materials and documentation are provided to participants. July 2010: Participants upload their tools. August 2010: Evaluation scenarios are executed. September 2010: Evaluation results are analysed. November 2010: Evaluation results are discussed at ISWC 2010 workshop (tbc).
  • 45. Best paper award: SEALS is proud to be sponsoring the best paper award here at SemSearch2010. Congratulations to the winning authors!