Vellino presentationtocisti

A Comprehensive Information
Retrieval Portal for Canadian
Scientific Researchers
Research Proposal for CISTI
Andre Vellino
August 2006

Overview

 Context: CISTI Strategic Plan
 Proposal Statement
 System Architecture
 Proposal Components
 Partnerships
 Outcomes and Draft Workplan
 Andre’s Relevant Experience

Holy Grail

“It’s easy to say what would be the ideal
online resource for scholars and scientists: all
papers in all fields, systematically
interconnected, effortlessly accessible and
rationally navigable from any researcher’s
desk, worldwide, for free”

Stevan Harnad, 1999
Professor of Cognitive Science
University of Southampton

Excerpts from CISTI Strategic Plan
 “Goal 1: Provide universal, seamless, and permanent
access to information for Canadian research and
innovation.”
 “Canadians look to CISTI to deliver distilled,
aggregated, and validated information that is relevant
to their research and innovation activities.”
 “Available at the client’s desktop, these services are
provided through a technologically sophisticated
infrastructure.”
 “[All users] will have electronic access at their desktop
to a wealth of national and international STM
information resources, supported by intelligent search
and analysis tools and expert advice.”

Proposal Vision

To develop a web-based information
portal that offers universal, seamless
access to highly relevant, distilled and
aggregated STM information using
intelligent search and analysis tools that
support scientific innovation.

High Level Functional Architecture

[Diagram: system components and content sources]
 Personalized Scientific Literature Research Portal
 Web Application Server
 Content Aggregator
 OpenURL Resolver
 LitMiner Content Analysis
 Personalization Engine: User Agents, Collaborative Filtering, Taste (open source)
 Content sources: Commercial Science Publishers, CISTI & University Libraries
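
The OpenURL Resolver in the architecture links a citation to a copy the user can access. A minimal sketch of building an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article follows. The resolver address, the `build_openurl` helper, the citation dictionary layout, and the citation values are all placeholders invented for illustration; they are not part of the proposal.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; a real deployment would supply its own.
RESOLVER_BASE = "https://resolver.example.org/openurl"

def build_openurl(citation, base=RESOLVER_BASE):
    """Encode a journal-article citation as an OpenURL 1.0 KEV query string."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
        "rft.atitle": citation["article_title"],
        "rft.jtitle": citation["journal_title"],
        "rft.date": citation["year"],
        "rft.volume": citation["volume"],
        "rft.spage": citation["start_page"],
    }
    return f"{base}?{urlencode(params)}"

# Placeholder citation, not a real reference.
citation = {
    "article_title": "An example article",
    "journal_title": "Journal of Examples",
    "year": "2006",
    "volume": "12",
    "start_page": "34",
}
url = build_openurl(citation)
print(url)
```

The portal would hand such a link to whichever resolver the user's institution operates, so the same search hit can lead to different copies for different users.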

Proposal Components

 User Needs
 Content Aggregation
 Collaborative Filtering
 Content Mining
 Results Visualization
 Partnerships

User Needs
 Customers of CISTI services and content are elite –
highly educated and exacting in their requirements;
 Compared to mass-market or intranet commercial
search-portals, the number of CISTI end-users is
small (30,000 – 100,000);
 User needs are (likely) varied but focused: e.g.
bibliographic literature searches / peer reviews /
competitive analysis / historical research;
 Contribution to “innovation” can be measured (in the
short term) by asking the user directly.

User Profiling

Enables
 Customized services
 Alerts / Notifications
 Higher precision search results
 Greater user satisfaction
 Item- and user-based recommender system
 Broadens scope of search to semantically
cognate but otherwise disparate domains

Content Aggregation

 Most end users will (likely) not care where the
information they seek resides;
 Results for a search should show that many
sources are available and provide links to
these sources (Open Access / Commercial /
Academic / Government);
 Requires partnerships with content providers
and search engines.
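
One way the aggregation described above could work is sketched below: hits for the same article from different providers are merged into a single record that keeps one link per source type. The hit layout, the `aggregate` function, the DOIs, and the URLs are assumptions made for illustration, and the sketch assumes every hit carries a DOI.

```python
from collections import defaultdict

def aggregate(hits):
    """Merge hits that share a DOI and keep one link per source type."""
    merged = defaultdict(lambda: {"title": None, "links": {}})
    for hit in hits:
        key = hit["doi"].strip().lower()  # DOIs are compared case-insensitively
        record = merged[key]
        record["title"] = record["title"] or hit["title"]
        # Keep the first link seen for each source type.
        record["links"].setdefault(hit["source_type"], hit["url"])
    return dict(merged)

# Hypothetical hits returned by three different providers.
hits = [
    {"doi": "10.0000/example.1", "title": "Example article",
     "source_type": "Open Access", "url": "https://oa.example.org/1"},
    {"doi": "10.0000/EXAMPLE.1", "title": "Example article",
     "source_type": "Commercial", "url": "https://publisher.example.com/1"},
    {"doi": "10.0000/example.2", "title": "Another article",
     "source_type": "Government", "url": "https://gov.example.org/2"},
]
records = aggregate(hits)
for doi, record in records.items():
    print(doi, record["title"], sorted(record["links"]))
```

The user sees one result per article with all available sources listed, without needing to know where each copy resides.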

Collaborative Filtering
 Monitors the user’s browsing behaviour (and / or explicit
feedback) to build a profile of the user’s choices;
 Users with “similar” profiles can share (anonymously)
their opinions (e.g. on the value or usefulness of an
article or book) with one another, e.g. “People
who ordered article X also ordered article Y”;
 Enables serendipitous recommendations (options
that the “active user” might not have considered
otherwise);
 May stimulate “innovation”;
 May complement citation indexing as a relevance criterion;
 Untested technology in the scientific information
retrieval community.
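
The “people who ordered article X also ordered article Y” idea can be sketched with simple co-occurrence counts over order histories. This is an illustrative toy, not the Taste library's API; the function names and the order data are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(orders):
    """For each article, count how often every other article shares a user's history."""
    co = defaultdict(Counter)
    for articles in orders.values():
        # sorted(set(...)) removes duplicates and keeps iteration deterministic.
        for a, b in permutations(sorted(set(articles)), 2):
            co[a][b] += 1
    return co

def recommend(co, article, already_seen=(), top_n=3):
    """Return up to top_n articles most often ordered together with `article`."""
    ranked = co.get(article, Counter()).most_common()
    return [a for a, _ in ranked if a not in already_seen][:top_n]

# Hypothetical anonymized order histories (user id -> article ids).
orders = {
    "u1": ["X", "Y", "Z"],
    "u2": ["X", "Y"],
    "u3": ["X", "W"],
    "u4": ["Y", "Z"],
}
co = build_cooccurrence(orders)
print(recommend(co, "X"))
```

Only aggregate counts are used at recommendation time, so individual users' histories need not be exposed to one another.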

Content Mining

 Concept discovery using:
 Automatic Classification (Categorization)
 Named Entity Tagging
 Document meta-tagging w/ Concepts
 Value:
 Improved Precision in Search Results
 May add dimensions to meta-data about content
 “Related Articles” feature in Google Scholar
 Enables novel visualization of results
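
A minimal sketch of document meta-tagging with concepts follows, using a hand-written term dictionary. The concept names, trigger terms, and the `tag_concepts` function are hypothetical; a production categorizer such as those named in this proposal would use far richer models than keyword lookup.

```python
import re

# Hypothetical concept dictionary: concept name -> trigger terms.
CONCEPTS = {
    "Cardiology": {"heart", "cardiac", "arrhythmia"},
    "Oncology": {"tumour", "tumor", "cancer"},
}

def tag_concepts(text, concepts=CONCEPTS):
    """Return the sorted list of concepts whose trigger terms occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, terms in concepts.items() if tokens & terms)

# Attach the discovered concepts to the document as extra metadata.
document = {"title": "Cardiac arrhythmia in cancer patients"}
document["concepts"] = tag_concepts(document["title"])
print(document["concepts"])
```

Once concepts are stored as metadata, a search can filter or group results by concept rather than by raw keyword match.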

Entrust Toolkit

[Diagram: Entrust Content Analysis Toolkit. Document content from a DB or file system passes through the toolkit components:]
 Categorizer: Categories
 Concepts: Concepts, Meta-Data
 Summarizer: Summaries, Ranked Phrases
 Search: Hits, Locations

Example: Healthcare Concept Tree

Results Visualization
 Content Analysis and Personalization
 May allow different display paradigms for
“more documents like this” or “similar articles”
 Feedback on relevance of the query terms to the
selected item.

[Figure: Interactive Visualization of Multiple Query Results – Battelle]
[Figure: Using Visualisation to Interpret Search Engine Results – Wolverhampton]

Partners
 Google (Books / Scholar)
 http://scholar.google.com/

 Online Computer Library Center - WorldCat
 http://www.worldcat.org/

 Public Library of Science
 http://www.plos.org

 Science.gov
 http://www.science.gov/

 International Association of STM Publishers
 http://www.stm-assoc.org/

 Annual Reviews
 http://www.annualreviews.org/

 BioMed Central (UK)
 http://www.biomedcentral.com/

Related Areas of Research
 Digital Archiving
 Mechanisms for preserving digital objects (multi-media)
 Valuation and payment models for Digital Objects
 To decide what to preserve / for how long / how much to
charge
 Application of Metadata Standards
 Dublin Core / Semantic Web Ontologies (OWL)
 Digital Rights Management & Security
 Access control / Intellectual Property protection

Project Phases & Outcomes
 Project Phases
 Requirements / Research Phase
 Analysis / Design Phase
 Development / Test Phase
 Outcomes
 Develop prototype of content-aggregation search portal
with collaborative filtering and content analysis engine
 Establish partnerships with content providers and search
engine organizations
 Test user satisfaction and "return use" improvements on a
sample population
 Publish results

Requirements / Research Phase
 User Requirements
 Find out what classes of users there are and what
features users want in an information portal that
would help them innovate;
 Technology Literature Review
 Content Aggregation
 Visualization
 Categorization
 Personalization / Collaborative Filtering

Analysis / Design Phase
 Use-Cases
 For each category of user, enumerate the use-
cases (behavioural scenarios).
 User Interface Design
 Design the interface for query, query-refinement,
results visualization and recommendations.
 Software Evaluation
 Portal web-application components
 Collaborative Filtering packages
 Categorization / LitMiner interfaces

Development / Test Phase
 Prototype Information Portal
 Develop Content Aggregator
 Personalization / Recommendation agents
 Integrate Content Analysis
 LitMiner or Categorization / Concept Tagging
toolkits
 Test and Evaluate in a Pilot program.
 Experiments with test group to determine
 Measure of user acceptance
 Rates of Return Usage
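
The pilot's return-usage measure could be computed from visit logs along the following lines. The definition used here (share of users who come back within 30 days of their first visit), the log format, and the sample data are assumptions for illustration; the proposal does not fix a definition.

```python
from collections import defaultdict
from datetime import date, timedelta

def return_usage_rate(visits, window_days=30):
    """Share of users with another visit within window_days of their first visit."""
    by_user = defaultdict(set)
    for user, day in visits:
        by_user[user].add(day)
    if not by_user:
        return 0.0
    window = timedelta(days=window_days)
    returning = sum(
        1
        for days in by_user.values()
        if any(min(days) < d <= min(days) + window for d in days)
    )
    return returning / len(by_user)

# Hypothetical pilot log: (anonymized user id, visit date).
visits = [
    ("a", date(2006, 9, 1)), ("a", date(2006, 9, 8)),
    ("b", date(2006, 9, 2)),
    ("c", date(2006, 9, 3)), ("c", date(2006, 11, 20)),
    ("d", date(2006, 9, 5)), ("d", date(2006, 9, 30)),
]
rate = return_usage_rate(visits)
print(f"Return usage rate: {rate:.0%}")
```

Comparing this rate between a test group with recommendations enabled and a control group without them would give one measure of user acceptance.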

Andre Vellino – Relevant Experience
 Entrust
 Content Analysis Policy Architect - Concept extraction and automatic categorization.
 imGenie – startup
 Systems architect for a wireless, bi-modal (voice / text), personalized information
retrieval and groupware application.
 National Research Council
 Research Scientist, IIT – Information Retrieval on small-format displays.
 Nortel Networks
 Senior Systems Architect, Disruptive Network Solutions - Personal Identity
Management for intelligent mediation of content-delivery in the network.
 Carleton University
 Cognitive Science Ph.D. program, Adjunct Research Professor
 NCF Internet
 Server-side Web architect for new NCF web-portal – registration, payment,
single sign-on to integrated applications.
 University of Georgia / Environmental Protection Agency
 Research Associate, Advanced Computational Methods Center - development of
expert system for predicting chemical reactivity from chemical structure.
