A Comprehensive Information
Retrieval Portal for Canadian
Scientific Researchers
              Research Proposal for CISTI
                              Andre Vellino
                              August 2006
Overview

   Context: CISTI Strategic Plan
   Proposal Statement
   System Architecture
   Proposal Components
   Partnerships
   Outcomes and Draft Workplan
   Andre’s Relevant Experience
Holy Grail

 “It’s easy to say what would be the ideal
 online resource for scholars and scientists: all
 papers in all fields, systematically
 interconnected, effortlessly accessible and
 rationally navigable from any researcher’s
 desk, worldwide, for free”

                                  Stevan Harnad, 1999
                         Professor of Cognitive Science
                            University of Southampton
Excerpts from CISTI Strategic Plan
   “Goal 1: Provide universal, seamless, and permanent
    access to information for Canadian research and
    innovation.”
   “Canadians look to CISTI to deliver distilled,
    aggregated, and validated information that is relevant
    to their research and innovation activities.”
   “Available at the client’s desktop, these services are
    provided through a technologically sophisticated
    infrastructure.”
   “[All users] will have electronic access at their desktop
    to a wealth of national and international STM
    information resources, supported by intelligent search
    and analysis tools and expert advice.”
Proposal Vision

 To develop a web-based information
 portal that offers universal, seamless
 access to highly relevant, distilled and
 aggregated STM information using
 intelligent search and analysis tools that
 support scientific innovation.
High Level Functional Architecture
   Diagram components:
       Commercial Science Publishers
       CISTI & University Libraries
       Content Aggregator / OpenURL Resolver
       LitMiner Content Analysis
       Web Application Server
       Personalization Engine
           User Agents
           Collaborative Filtering: Taste (open source)
       Personalized Scientific Literature Research Portal
Proposal Components

   User Needs
   Content Aggregation
   Collaborative Filtering
   Content Mining
   Results Visualization
   Partnerships
User Needs
   Customers of CISTI services and content are elite –
    highly educated and exacting in their requirements;
   Compared to mass-market or intranet commercial
    search-portals, the number of CISTI end-users is
    small (30,000 – 100,000);
   User needs are (likely) varied but focused: e.g.
    bibliographic literature searches / peer reviews /
    competitive analysis / historical research;
   Contribution to “innovation” can be measured (in the
    short term) by asking the user directly.
User Profiling

Enables
 Customized services
       Alerts / Notifications
   Higher precision search results
       Greater user satisfaction
   Item and User based recommender system
       Broadens scope of search to semantically
        cognate but otherwise disparate domains
Content Aggregation

   Most end users will (likely) not care where the
    information they seek resides;
   Results for a search should show that many
    sources are available and provide links to
    these sources (Open Access / Commercial /
    Academic / Government);
   Requires partnerships with content providers
    and search engines.
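
A minimal sketch of the aggregation idea above, assuming each content provider returns hits that carry a DOI. The `Hit` and `AggregatedResult` types, the provider names, and the access labels are hypothetical and used only for illustration; they are not part of any CISTI system.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    doi: str
    title: str
    source: str   # hypothetical provider name
    access: str   # e.g. "Open Access", "Commercial", "Academic", "Government"
    url: str


@dataclass
class AggregatedResult:
    doi: str
    title: str
    links: list = field(default_factory=list)  # (source, access, url) tuples


def aggregate(hit_lists):
    """Merge hits from several providers, grouping them by normalized DOI."""
    merged = {}
    for hits in hit_lists:
        for hit in hits:
            key = hit.doi.strip().lower()
            result = merged.setdefault(key, AggregatedResult(key, hit.title))
            result.links.append((hit.source, hit.access, hit.url))
    # Items available from the most sources come first; ties sort by title.
    return sorted(merged.values(), key=lambda r: (-len(r.links), r.title))


if __name__ == "__main__":
    # Made-up sample data: DOIs and URLs are placeholders.
    provider_a = [Hit("10.1000/X1", "Paper one", "ProviderA", "Commercial",
                      "https://example.org/a/1")]
    provider_b = [Hit("10.1000/x1", "Paper one", "ProviderB", "Open Access",
                      "https://example.org/b/1")]
    for item in aggregate([provider_a, provider_b]):
        print(item.title, [access for _, access, _ in item.links])
```

Grouping on the DOI lets the end user see one entry per article with a link to every source, without needing to know where each copy resides.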
Collaborative Filtering
   Monitors the user’s browsing behaviour (and / or explicit
    feedback) to build a profile of the user’s choices;
   Users with “similar” profiles can share (anonymously)
    their opinions (e.g. on the value or usefulness of an
    article or book) with one another, e.g. “People who
    ordered article X also ordered article Y”;
   Enables serendipitous recommendations (options
    that the “active user” might not have considered
    otherwise)
       May stimulate “innovation”;
       May complement citation indexing as a relevance criterion;
   Untested technology in the scientific information
    retrieval community;
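
The “people who ordered article X also ordered article Y” idea can be sketched with simple co-occurrence counts. This is an illustrative assumption about one way to do item-based filtering. It does not use the API of the Taste library named in the architecture, and the function names and sample order data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """orders maps an anonymous user id to the set of articles that user ordered."""
    counts = defaultdict(lambda: defaultdict(int))
    for articles in orders.values():
        # Count every pair of articles that appears in the same user's orders.
        for a, b in combinations(sorted(articles), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_ordered(counts, article, already_seen=(), top_n=3):
    """Return up to top_n articles most often ordered together with `article`."""
    candidates = {
        other: n
        for other, n in counts.get(article, {}).items()
        if other not in already_seen
    }
    # Highest co-occurrence first; ties are broken alphabetically.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up order history for three anonymous users.
    orders = {"u1": {"X", "Y", "Z"}, "u2": {"X", "Y"}, "u3": {"X", "W"}}
    counts = build_cooccurrence(orders)
    print(also_ordered(counts, "X"))
```

Only aggregate counts are kept, so recommendations can be made without exposing which user ordered which article.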
Content Mining

   Concept discovery using:
       Automatic Classification (Categorization)
       Named Entity Tagging
       Document meta-tagging w/ Concepts
   Value:
       Improved Precision in Search Results
       May add dimensions to meta-data about content
       Could support a feature like “Related Articles” in Google Scholar
       Enables novel visualization of results
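
As a rough illustration of document meta-tagging with concepts, the sketch below counts vocabulary terms per concept. The vocabulary, concept names, and function name are hypothetical. A production categorizer (such as the toolkits named in this proposal) would presumably rely on a curated taxonomy or a trained classifier rather than keyword counts.

```python
import re
from collections import Counter

# Hypothetical concept vocabulary, for illustration only.
CONCEPT_TERMS = {
    "cardiology": {"heart", "cardiac", "artery"},
    "oncology": {"tumour", "tumor", "cancer", "chemotherapy"},
}


def tag_concepts(text, vocabulary=CONCEPT_TERMS, min_hits=1):
    """Return (concept, hit_count) pairs for a document, most frequent first."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for concept, terms in vocabulary.items():
        hits = sum(tokens[term] for term in terms)
        if hits >= min_hits:
            scores[concept] = hits
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    abstract = "Cardiac output and heart rate after chemotherapy"
    print(tag_concepts(abstract))
```

The resulting tags could be stored as extra metadata and used to filter or group search results.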
Entrust Toolkit

   Inputs: DB, File System (rotated label, partly legible: “Document Con…”)
   Entrust Content Analysis Toolkit
       Categorizer → Categories
       Concepts → Concepts, Meta-Data
       Summarizer → Summaries, Ranked Phrases
       Search → Hits, Locations
Example: Healthcare Concept Tree
Results Visualization
   Content Analysis and Personalization
       May allow different display paradigms for
        “more documents like this” or “similar articles”
       Feedback on relevance of the query terms to the
        selected item.
   Figures:
       Interactive Visualization of Multiple Query Results (Battelle)
       Using Visualisation to Interpret Search Engine Results (Wolverhampton)
Partners
   Google (Books / Scholar)
     http://scholar.google.com/

   Online Computer Library Center - WorldCat
     http://www.worldcat.org/

   Public Library of Science
     http://www.plos.org

   Science.gov
     http://www.science.gov/

   International Association of STM Publishers
     http://www.stm-assoc.org/

   Annual Reviews
     http://www.annualreviews.org/

   BioMed Central (UK)
     http://www.biomedcentral.com/
Related Areas of Research
   Digital Archiving
       Mechanisms for preserving digital objects (multi-media)
   Valuation and payment models for Digital Objects
       To decide what to preserve / for how long / how much to
        charge
   Application of Metadata Standards
       Dublin Core / Semantic Web Ontologies (OWL)
   Digital Rights Management & Security
       Access control / Intellectual Property protection
Project Phases & Outcomes
   Project Phases
       Requirements / Research Phase
       Analysis / Design Phase
       Development / Test Phase
   Outcomes
       Develop prototype of content-aggregation search portal
        with collaborative filtering and content analysis engine
       Establish partnerships with content providers and search
        engine organizations
       Test user satisfaction and "return use" improvements on a
        sample population
       Publish results
Requirements / Research Phase
   User Requirements
       Find out what classes of users there are and what
        features users want in an information portal that
        would help them innovate;
   Technology Literature Review
       Content Aggregation
       Visualization
       Categorization
       Personalization / Collaborative Filtering
Analysis / Design Phase
   Use-Cases
       For each category of user, enumerate the
        use-cases (behavioural scenarios).
   User Interface Design
       Design the interface for query, query-refinement,
        results visualization and recommendations.
   Software Evaluation
       Portal web-application components
       Collaborative Filtering packages
       Categorization / LitMiner interfaces
Development / Test Phase
   Prototype Information Portal
       Develop Content Aggregator
       Personalization / Recommendation agents
   Integrate Content Analysis
       LitMiner or Categorization / Concept Tagging
        toolkits
   Test and Evaluate in a Pilot program.
       Experiments with test group to determine
           Measure of user acceptance
           Rates of Return Usage
Draft Work Plan
Andre Vellino – Relevant Experience
   Entrust
       Content Analysis Policy Architect - Concept extraction and automatic categorization.
   imGenie – startup
       Systems architect for a wireless, bi-modal (voice / text), personalized information
        retrieval and groupware application.
   National Research Council
       Research Scientist, IIT – Information Retrieval on small-format displays.
   Nortel Networks
       Senior Systems Architect, Disruptive Network Solutions - Personal Identity
        Management for intelligent mediation of content-delivery in the network.
   Carleton University
       Cognitive Science Ph.D. program, Adjunct Research Professor
   NCF Internet
       Server-side Web architect for new NCF web-portal – registration, payment,
        single sign-on to integrated applications.
   University of Georgia / Environmental Protection Agency
       Research Associate, Advanced Computational Methods Center - development of
        expert system for predicting chemical reactivity from chemical structure.

Vellino presentationtocisti

  • 1. A Comprehensive Information Retrieval Portal for Canadian Scientific Researchers Research Proposal for CISTI Andre Vellino August 2006
  • 2. Overview  Context: CISTI Strategic Plan  Proposal Statement  System Architecture  Proposal Components  Partnerships  Outcomes and Draft Workplan  Andre’s Relevant Experience
  • 3. Holy Grail “It’s easy to say what would be the ideal online resource for scholars and scientists: all papers in all fields, systematically interconnected, effortlessly accessible and rationally navigable from any researcher’s desk, worldwide, for free” Stevan Harnad, 1999 Professor of Cognitive Science University of Southampton
  • 4. Excerpts from CISTI Strategic Plan  “Goal 1: Provide universal, seamless, and permanent access to information for Canadian research and innovation.”  “Canadians look to CISTI to deliver distilled, aggregated, and validated information that is relevant to their research and innovation activities.”  “Available at the client’s desktop, these services are provided through a technologically sophisticated infrastructure.”  “[All users] will have electronic access at their desktop to a wealth of national and international STM information resources, supported by intelligent search and analysis tools and expert advice.”
  • 5. Proposal Vision To develop a web-based information portal that offers universal, seamless access to highly relevant, distilled and aggregated STM information using intelligent search and analysis tools that support scientific innovation.
  • 6. High Level Functional Architecture [Diagram; recoverable labels: LitMiner Content Analysis; Content Aggregator; OpenURL Resolver; Web Application Server; Personalized Scientific Literature Research Portal; Personalization Engine; User Agents; Collaborative Filtering; Taste (open source); Commercial Science Publishers; CISTI & University Libraries]
  • 7. Proposal Components  User Needs  Content Aggregation  Collaborative Filtering  Content Mining  Results Visualization  Partnerships
  • 8. User Needs  Customers of CISTI services and content are elite – highly educated and exacting in their requirements;  Compared to mass-market or intranet commercial search-portals, the number of CISTI end-users is small (30,000 – 100,000);  User needs are (likely) varied but focused: e.g. bibliographic literature searches / peer reviews / competitive analysis / historical research;  Contribution to “innovation” can be measured (in the short term) by asking the user directly.
  • 9. User Profiling Enables  Customized services  Alerts / Notifications  Higher precision search results  Greater user satisfaction  Item and User based recommender system  Broadens scope of search to semantically cognate but otherwise disparate domains
  • 10. Content Aggregation  Most end users will (likely) not care where the information they seek resides;  Results for a search should show that many sources are available and provide links to these sources (Open Access / Commercial / Academic / Government);  Requires partnerships with content providers and search engines.
  • 11. Collaborative Filtering  Monitors users’ browsing behaviour (and / or explicit feedback) to build a profile of each user’s choices;  Other users with “similar” profiles can share (anonymously) their opinions (e.g. on the value or usefulness of an article or book) with others (“People who ordered article X also ordered article Y”);  Enables serendipitous recommendations (options that the “active user” might not have considered otherwise);  May stimulate “innovation”;  May complement citation indexing as a relevance criterion;  Untested technology in the scientific information retrieval community.
  • 12. Content Mining  Concept discovery using:  Automatic Classification (Categorization)  Named Entity Tagging  Document meta-tagging w/ Concepts  Value:  Improved Precision in Search Results  May add dimensions to meta-data about content  “Related Articles” feature in Google Scholar  Enables novel visualization of results
  • 13. Entrust Toolkit [Diagram; recoverable labels: Categorizer; DB; Categories; Concepts; Meta-Data; Summarizer; Summaries; Ranked Phrases; File System; Document Content; Entrust Content Analysis Toolkit; Search Hits; Locations]
  • 15. Results Visualization  Content Analysis and Personalization  May allow different display paradigms for “more documents like this” or “similar articles”  Interactive Visualization of Multiple Query Results – Battelle  Feedback on relevance of the query terms to the selected item  Using Visualisation to Interpret Search Engine Results – Wolverhampton
  • 16. Partners  Google (Books / Scholar)  http://scholar.google.com/  Online Computer Library Center - WorldCat  http://www.worldcat.org/  Public Library of Science  http://www.plos.org  Science.gov  http://www.science.gov/  International Association of STM Publishers  http://www.stm-assoc.org/  Annual Reviews  http://www.annualreviews.org/  BioMed Central (UK)  http://www.biomedcentral.com/
  • 17. Related Areas of Research  Digital Archiving  Mechanisms for preserving digital objects (multi-media)  Valuation and payment models for Digital Objects  To decide what to preserve / for how long / how much to charge  Application of Metadata Standards  Dublin Core / Semantic Web Ontologies (OWL)  Digital Rights Management & Security  Access control / Intellectual Property protection
  • 18. Project Phases & Outcomes  Project Phases  Requirements / Research Phase  Analysis / Design Phase  Development / Test Phase  Outcomes  Develop prototype of content-aggregation search portal with collaborative filtering and content analysis engine  Establish partnerships with content providers and search engine organizations  Test user satisfaction and "return use" improvements on a sample population  Publish results
  • 19. Requirements /Research Phase  User Requirements  Find out what classes of users there are and what features users want in an information portal that would help them innovate;  Technology Literature Review  Content Aggregation  Visualization  Categorization  Personalization / Collaborative Filtering
  • 20. Analysis / Design Phase  Use-Cases  For each category of user, enumerate the use-cases (behavioural scenarios).  User Interface Design  Design the interface for query, query-refinement, results visualization and recommendations.  Software Evaluation  Portal web-application components  Collaborative Filtering packages  Categorization / LitMiner interfaces
  • 21. Development / Test Phase  Prototype Information Portal  Develop Content Aggregator  Personalization / Recommendation agents  Integrate Content Analysis  LitMiner or Categorization / Concept Tagging toolkits  Test and Evaluate in a Pilot program.  Experiments with test group to determine  Measure of user acceptance  Rates of Return Usage
  • 23. Andre Vellino – Relevant Experience  Entrust  Content Analysis Policy Architect - Concept extraction and automatic categorization.  imGenie – startup  Systems architect for a wireless, bi-modal (voice / text), personalized information retrieval and groupware application.  National Research Council  Research Scientist, IIT – Information Retrieval on small-format displays.  Nortel Networks  Senior Systems Architect, Disruptive Network Solutions - Personal Identity Management for intelligent mediation of content-delivery in the network.  Carleton University  Cognitive Science Ph.D. program, Adjunct Research Professor  NCF Internet  Server-side Web architect for new NCF web-portal – registration, payment, single sign-on to integrated applications.  University of Georgia / Environmental Protection Agency  Research Associate, Advanced Computational Methods Center - development of expert system for predicting chemical reactivity from chemical structure.
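
The “People who ordered article X also ordered article Y” idea on the Collaborative Filtering slide can be illustrated with a minimal item co-occurrence sketch. This is only an illustration under stated assumptions: the order histories, user ids and function names below are hypothetical, and the code is not the API of Taste or of any other package named in the proposal.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(orders):
    """Count how often each pair of articles was ordered by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for articles in orders.values():
        # Deduplicate so that a repeated order by one user counts once.
        for a, b in combinations(sorted(set(articles)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_ordered(counts, article, top_n=3):
    """Return up to top_n articles most often co-ordered with `article`."""
    related = counts.get(article, {})
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical, anonymized order histories (user id -> article ids).
orders = {
    "u1": ["X", "Y", "Z"],
    "u2": ["X", "Y"],
    "u3": ["X", "W"],
    "u4": ["Y", "Z"],
}
counts = co_occurrence(orders)
print(also_ordered(counts, "X"))  # prints ['Y', 'W', 'Z']
```

A production recommender would normally weight these counts (for example by user similarity or article popularity) rather than use raw co-occurrence; the raw count is used here only to keep the sketch short.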

Editor's Notes

  1. This quote is from Stevan Harnad, a professor of cognitive science and advocate of Open Access. He is an especially strong believer in self-archiving as a method for increasing the accessibility of scholarly work. This vision is similar, in several respects, to the one offered by the CISTI Strategic Plan (2005-2010).
  2. Excerpts from the CISTI 2005-2010 Strategic Plan
  3. In a sentence, my proposal is to develop a web-based information portal that offers universal, seamless access to highly relevant, distilled and aggregated information using intelligent search and analysis tools that support scientific innovation.
  4. This picture illustrates the overall functional architecture of this proposal.
  5. The proposal has 6 principal components.
  6. The specificity of scientific and technical researchers provides both a challenge and an opportunity. The challenge is that these users’ requirements are much more stringent; the opportunity is that their needs are much more focused than those of the typical Google user.
  7. If we know who the users are and we keep track of the users’ behaviours, we can provide them with value-added services (alerts / notifications), better quality search results and novel capabilities that stimulate scientific innovation (recommender services.)
  8. One objective of this proposal is to provide a single point of access for Canadians to access a variety of STM content sources.
  9. This is one of the core technology components: a recommendation service built on collaborative filtering technology.
  10. The other core technology component is content analysis (classification / named-entity tagging). This will facilitate a better user search experience and enable novel ways of visualizing results.
  11. One (commercial) candidate for content analysis is the Entrust Content Analysis Toolkit, which offers a mixed-paradigm method of analyzing text content.
  12. One example that I developed for Entrust is this concept-hierarchy for detecting medical concepts. For example the concept “ICD-9” contains several thousand scored search terms. In combination, they can detect the presence of medical information in an e-mail or text document.
  13. These are some possibilities for search-result visualization that may be considered for this project.
  14. Partnerships with content publishers, whether Open Access or commercial, will have to be developed to achieve the goal of “seamless” and “comprehensive” access to STM information. This is a partial list of some content / search engine providers with which CISTI could form partnerships.
  15. There are other valid areas of research, such as Digital Archiving and Metadata Standards which may contribute to the objectives of this proposed research, but in this proposal I focus on the work that would best suit me and to which I have the most to contribute.
  16. There are 3 principal project phases and 4 main outcomes of this work.
  17. User Requirements: To develop an effective IR portal, we need to find out what features scientists of various sorts would want in such a portal that would help them with their task. This phase would allow us to better understand the different categories of users and the varieties of tasks that they are attempting to achieve when using the services of an information portal. Technology Literature Review: This phase will review the computer science and cognitive science literature in the 4 major technologies that need to be integrated in this portal.
  18. Use Cases: From the user requirements, we can abstract out “Use Cases”: typical scenarios of usage that cover the range of user requirements. For example, one use-case might be that of an industrial chemist doing a search for “prior art” for a patent application. Design the UI: The use-cases enumerated in this phase define some of the constraints for the User Interface. If users typically just want to enter search terms “Google” style and then sort through the results and refine the search, that will dictate some aspects of the UI. If users typically know which sources of information they wish to search, that will call for a different UI paradigm. Software Evaluation: Existing off-the-shelf software (commercial and open source) will be assessed for its suitability in this project.
  19. Prototyping the portal will have several components: user authentication / login (required for personalization features such as “high precision search”, “recommendation” and “notification alerts” to be active) and personalization based on a one-time registration profile (profession, interests, and contact information for alerts). Which of the Content Analysis toolkits is integrated will depend in part on the application interfaces that are discovered in the software evaluation phase. How the whole application performs, from the point of view of user acceptance, will have to be determined experimentally in a pilot program.
  20. This is the Gantt chart of the work plan outlined in the accompanying paper.
  21. Extracted from my Curriculum Vitae, this is the strength and depth of experience I bring to this project.
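
Note 12 describes a concept made of scored search terms that, in combination, detect the presence of medical information in a text. A minimal sketch of that general idea follows. The term list, weights, threshold and function names are all hypothetical assumptions for illustration; this is not the Entrust Content Analysis Toolkit's method or API, only simple additive term scoring.

```python
import re

# Hypothetical scored terms for a toy "medical" concept. The weights are
# illustrative assumptions, not values from any real concept hierarchy.
MEDICAL_TERMS = {
    "diagnosis": 2.0,
    "patient": 1.0,
    "icd-9": 3.0,
    "prescription": 2.0,
}


def concept_score(text, scored_terms):
    """Sum the scores of every known term occurrence in the text."""
    # Lowercase and keep hyphenated tokens such as "icd-9" intact.
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    return sum(scored_terms.get(tok, 0.0) for tok in tokens)


def mentions_concept(text, scored_terms, threshold=3.0):
    """Flag the text when its combined term score reaches the threshold."""
    return concept_score(text, scored_terms) >= threshold


print(mentions_concept("The patient received a diagnosis.", MEDICAL_TERMS))
print(mentions_concept("Meeting moved to Friday.", MEDICAL_TERMS))
```

Summing scores means that several weak terms together can trigger the concept while a single weak term does not, which is one simple reading of "in combination" in the note.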