Linked Data Love: research representation, discovery, and assessment
#ALAAC15
The explosion of linked data platforms and data stores over the last five years has been profound – both in the quantity of data and in its potential impact. Research information systems such as VIVO (www.vivoweb.org) play a significant role in enabling this work. VIVO is an open source, Semantic Web-based application that provides an integrated, searchable view of the scholarly activities of an organization. The uniform semantic structure of VIVO-ISF data enables a new class of tools to advance science. This presentation will provide a brief introduction to and update on VIVO, and present ways that this semantically rich data can enable visualizations, reporting and assessment, next-generation collaboration and team building, and enhanced multi-site search. Libraries are uniquely positioned to facilitate the open representation of research information and its subsequent use to spur collaboration, discovery, and assessment. The talk will conclude with a description of ways librarians are engaged in this work – including visioning, metadata and ontology creation, policy creation, data curation and management, technical, and engagement activities.
Kristi Holmes, PhD
Director, Galter Health Sciences Library
Director of Evaluation, NUCATS
Associate Professor, Preventive Medicine-Health and Biomedical Informatics
Northwestern University Feinberg School of Medicine
1. Linked Data Love:
research representation, discovery, and assessment
Kristi Holmes, PhD
@kristiholmes
Linked Library Data Interest Group
#alaac15 - June 27, 2015
2. The Semantic Web: a value proposition
At its heart, the Semantic Web is really about extending standard Web technologies to better deal with data on the Web.
If the WWW is for people, the Semantic Web is for machines.
George Thomas and Jim Hendler, http://www.data.gov/communities/node/116/blogs/142
Data modeled as bidirectional relationships
A Web-based infrastructure of standards and technologies that allows for a distributed, machine-readable description of data, enabling stronger data linkages and smarter web applications
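To make "bidirectional relationships" concrete, here is a minimal sketch in Python (standard library only) of a triple store that records each link together with its inverse, so a relationship can be followed from either end. The `example.org` URIs are hypothetical, and the shortened `relates`/`relatedBy` names are assumed stand-ins for an inverse-property pair of the kind VIVO-ISF defines; a real deployment would use an RDF library and full ontology URIs.

```python
from collections import defaultdict

# Hypothetical namespace for individuals; not a real VIVO site.
EX = "http://example.org/individual/"

# Assumed inverse-property pairs (shortened names, for illustration only).
INVERSES = {"relates": "relatedBy", "relatedBy": "relates"}


class TripleStore:
    """A minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, s, p, o):
        """Assert a triple and, if p has a declared inverse, the reverse link."""
        self._insert(s, p, o)
        if p in INVERSES:
            self._insert(o, INVERSES[p], s)

    def _insert(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_subject[s].add((p, o))

    def outgoing(self, s):
        """All (predicate, object) pairs leaving s, sorted for stable output."""
        return sorted(self.by_subject[s])


# An authorship node linking a person and a paper, in the style of VIVO modeling.
store = TripleStore()
store.add(EX + "authorship1", "relates", EX + "person1")
store.add(EX + "authorship1", "relates", EX + "paper1")

# The person can now be navigated back to the authorship (and on to the paper).
print(store.outgoing(EX + "person1"))
```

Only two triples were asserted, but the store holds four: each `relates` link is mirrored by a `relatedBy` link.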
3. Let’s talk about the data…
The Semantic Web isn't just about putting data
on the web. It is about making links, so that a
person or machine can explore the web of data.
With linked data, when you have some of it, you
can find other related data.
http://www.w3.org/DesignIssues/LinkedData.html
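The "when you have some of it, you can find other related data" idea can be sketched as a breadth-first walk over triples: start from one URI, follow every object that is itself a URI, and stop after a fixed number of hops. This is an illustrative sketch over a small in-memory graph; the `example.org` URIs and predicate names are hypothetical, and on the live Web each step would instead dereference the URI over HTTP to fetch more triples.

```python
from collections import deque

# Hypothetical linked data as (subject, predicate, object). Objects that are
# URIs act as links to more data; plain strings are literal values.
TRIPLES = [
    ("http://example.org/person1", "label", "Jane Doe"),
    ("http://example.org/person1", "relatedBy", "http://example.org/authorship1"),
    ("http://example.org/authorship1", "relates", "http://example.org/paper1"),
    ("http://example.org/paper1", "label", "A study of linked data"),
    ("http://example.org/paper1", "hasSubjectArea", "http://example.org/topic/semweb"),
]


def is_uri(value):
    return value.startswith("http://") or value.startswith("https://")


def explore(start, triples, max_hops):
    """Breadth-first 'follow your nose' traversal from one URI.

    Returns {uri: hop_count} for every URI reachable within max_hops links.
    """
    links = {}
    for s, _, o in triples:
        if is_uri(o):  # only URIs are followed; literals are leaves
            links.setdefault(s, []).append(o)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # hop budget used up on this path
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


print(explore("http://example.org/person1", TRIPLES, max_hops=2))
```

With `max_hops=2` the walk reaches the person, the authorship, and the paper; raising it to 3 also reaches the subject-area URI.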
5. We face a number of challenges on our campuses!
Faculty
• Research is increasingly interdisciplinary
• How can you find collaborators, track competitors, and stay abreast of current research inside large institutions, at other institutions, and globally?
• How can you find others with shared interests or expertise?
• How can you build diverse teams? Find mentors? Be identified as a partner by community groups?
Support: facilities and personnel
• Library administrators and directors of core facilities want to align their strategic plans with the evolving research needs of their clientele.
• Identifying growth areas of research, through rising publication counts, focused research areas, and grant dollars, makes this planning more evidence-based.
Administrators
• Research institutions can be extremely large and diverse
• How can administrators showcase and monitor research activity, track competitors, and stay abreast of current research inside large institutions, at other institutions, and globally?
• How can you enhance visibility and present a unified picture of an institution?
6. Research networking can help.
Information about scholars is optimized using a Web-based infrastructure of standards and technologies that allows for a distributed, machine-readable description of data, enabling stronger data linkages and smarter web applications across many universities, agencies, and societies, both within the US and abroad.
Why is this important?
Linked data infrastructure allows for
• Visualizations, research and clinical data integration,
and deep semantic searching across multiple types
and sources of data
• By breaking data out of traditional database silos,
research networking platforms promote a network
effect within a single site and across multiple sites
– The value of the network increases with the amount of
linked data and applications that are available to
consume the linked data.
7. Let’s talk about research networking in the context of VIVO – what is it?
1. An open source semantic web application
2. An information model
3. An open community
8. What is VIVO?
1. An open source semantic
web application
2. An information model
3. An open community
VIVO is one research networking platform, although there are others. Organizations make decisions about adopting these tools based on many different features. The most important aspect isn’t the software; it is the data! More on that later…
9. VIVO
An open-source semantic web application that
enables the discovery of research and scholarship
across disciplines in an institution.
VIVO harvests data from verified sources and
offers detailed profiles of faculty and researchers.
Public, structured linked data about investigators’ interests, activities, and accomplishments, and tools to use that data to advance science.
VIVO enjoys a robust open community space to support implementation, adoption, and development efforts around the world.
See http://wiki.duraspace.org/display/VIVO
10. A VIVO profile allows you to:
• Showcase credentials, expertise, skills, and professional achievements for individuals and campus groups.
• Connect within focus areas and geographic expertise.
• Simplify reporting tasks and link data to external applications, e.g., to generate biosketches or CVs, or for other reporting purposes.
• Publish the URL or link the profile to other applications.
• Discover potential colleagues or campus resources by work area, authorship, and collaborations.
• Display visualizations of expertise areas or complex collaboration networks and relationships.
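Collaboration-network visualizations of this kind are commonly built from co-authorship edges. A minimal sketch, assuming publication records that are simply lists of author identifiers (the paper IDs and names below are hypothetical, and this is not VIVO's own visualization code), counts how many papers each pair of people shares:

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication records: paper id -> list of author identifiers.
PUBLICATIONS = {
    "paper1": ["alice", "bob", "carol"],
    "paper2": ["alice", "bob"],
    "paper3": ["carol", "dave"],
}


def coauthor_edges(publications):
    """Weighted co-authorship edges: (author_a, author_b) -> shared papers."""
    edges = Counter()
    for authors in publications.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


edges = coauthor_edges(PUBLICATIONS)
for (a, b), n in sorted(edges.items()):
    print(f"{a} -- {b}: {n}")
```

Each edge weight can then drive line thickness or layout in a network visualization; a paper with k authors contributes k(k-1)/2 pairs, which is why pair counts grow much faster than publication counts.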
12. What is VIVO?
1. An open source semantic
web application
2. An information model
3. An open community
13. CTSA: Recommendations and Best Practices for
Research Networking
The Research Networking Recommendations were approved by the CTSA
Consortium Executive and Steering Committee on October 25, 2011.
Recommendations for Research Networking:
• Recommendation: All CTSAs should encourage their institution(s) to implement
research networking tool(s) institution-wide that utilize RDF triples and an ontology
compatible with the VIVO ontology.
• Recommendation: Information in people profiles at institutions should be publicly
available as data as a general principle, specifically as Linked Open Data. To
ensure quality of information, authoritative electronic data sources versus manual
entry should be emphasized. Institutions will vary in the amount of information that
they will include and make publicly available but the value is enhanced by the
quality and quantity of information.
• Recommendation: Monitoring of the research networking landscape, technology,
and tools should continue to be overseen by experts from the CTSA consortium
(e.g., the Research Networking group of the Informatics KFC).
https://www.ctsacentral.org/recommendations-and-best-practices-research-networking
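Making profile information available "as Linked Open Data" means emitting RDF triples in a standard serialization. The sketch below writes a two-triple profile as N-Triples using only the standard library. `rdf:type`, `rdfs:label`, and `foaf:Person` are real vocabulary terms; the person URI is hypothetical, and a production system would use an RDF library and the full VIVO-ISF ontology rather than hand-built strings.

```python
# Real vocabulary URIs (RDF, RDFS, FOAF).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def escape_literal(text):
    """Escape backslash, double quote, LF, and CR as N-Triples requires."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_uri) tuples."""
    lines = []
    for s, p, o, o_is_uri in triples:
        obj = f"<{o}>" if o_is_uri else f'"{escape_literal(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical person URI for illustration.
person = "http://example.org/individual/n1234"
profile = [
    (person, RDF_TYPE, FOAF_PERSON, True),
    (person, RDFS_LABEL, 'Doe, Jane "JD"', False),
]
print(to_ntriples(profile))
```

Because every line is a complete triple, output from many institutions can be concatenated and loaded together, which is part of what makes a shared, compatible ontology valuable.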
14. Building a large web of data, greater than any one effort, greater than any one platform.
Data Creators, Data Aggregators, & Data Consumers
Repositories. Tools. Applications. Workflows.
17. Weill Cornell Medical College
http://libraryconnect.elsevier.com/articles/technology-content/2013-03/authoritative-researcher-metadata-one-place-vivo
18. WCMC CTSC’s VIVO data sources
http://libraryconnect.elsevier.com/articles/technology-content/2013-03/authoritative-researcher-metadata-one-place-vivo
20. Data, Tools and Scientists
http://vivosearch.org/
http://vivosearchlight.org/
http://nrn.cns.iu.edu/
21. VIVO search scenarios
• Multiple campuses of one university
• Regional connections
- e.g., Illinois ties with regional federal labs
• Consortia – 62+ CTSAs, USDA plus land-grant universities
• International
- 13 Netherlands universities and the National Library
- German Universities
- AgriVIVO – UN FAO
Searchlight, AgriVIVO, etc.
22. Concept Coverage
• Research networking systems queried: 57
- SPARQL endpoints queried: 9
- Sites crawled: 48
• Institutions indexed: 64
- CTSA institutions: 27
• Total person URIs: 4,933,757
- Unique individuals profiled: 140,949 - 300,239
• Total publications by those persons indexed as part of their profile: 8,396,744
• Total co-author pairs (two people on the same paper): 48,012,993
• Harvesting time is the time required to interrogate the respective SPARQL endpoints or crawl the respective servers and cache the results locally at Iowa.
CTSAsearch
http://research.icts.uiowa.edu/polyglot/
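Interrogating a site through its SPARQL endpoint amounts to sending queries over HTTP. A minimal sketch of one such request, following the query-via-GET form of the W3C SPARQL 1.1 Protocol, is below. The endpoint URL is hypothetical, the query is only an example, and this is not a description of how CTSAsearch itself is implemented; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute a real VIVO SPARQL endpoint URL.
ENDPOINT = "http://vivo.example.edu/sparql"

# Example query: count distinct people typed as foaf:Person.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (COUNT(DISTINCT ?person) AS ?n)
WHERE { ?person a foaf:Person . }
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), then json.load() on the body.
```

A harvester would repeat this per endpoint and cache the JSON results locally; sites without an endpoint have to be crawled page by page instead.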
23. What is VIVO?
1. An open source semantic
web application
2. An information model
3. An open community
24. VIVO Community
• DuraSpace wiki
• Calls and listservs
- Ontology
- Development
- Implementation
- Outreach
- Tools and Apps
• Social Media
- Facebook
- LinkedIn
- Twitter
• Events
- Annual conference
- Implementation Fest
- Workshops
- Hackathons
30. What roles can the library play?
Libraries:
• Are a trusted, neutral entity
• Have a tradition of service and support
• Strive to serve all missions of the institution
• Are technology centers and have IT and data expertise
Library Staff:
• Have skills: information organization, instruction, usability, subject expertise
• Have close relationships with their clients (buy-in)
• Understand user needs
• Understand the importance of collaboration and know how to bring people together
• Have knowledge of the institutional, research, education, and clinical landscape
31. What roles can the library play?
Librarians are successfully stepping up to the semantic
web plate in a variety of roles related to institutional
research networking platforms.
• Outreach and adoption activities
• Education and training on the use of the platform
• Ontology and controlled vocabulary expertise,
extending the model
• Negotiations with data providers
• Programming, technical support
• Workgroup representation
• …and more!
Research networking also provides an opportunity for libraries
to become familiar with many concepts around linked open
data and the semantic web.
33. Northwestern University Clinical and Translational
Sciences (NUCATS) Institute
Mission: Speeding transformative
research discoveries to patients
and the community
http://nucats.northwestern.edu/
35. Digital Projects led by Digital Systems and
Collection Services
Among other projects…
Symplectic Elements
- Back-end bibliometric aggregator
- Supports open access (OA) through repository integration
- Facilitates reports and reuse of clean aggregated data from a number of
diverse sources
Digital repository
- We’ll gain the ability to create, share, and preserve attractive, functional,
and citable digital collections and exhibits
- Promotes discovery and access of FSM scholarship, both traditional and
alternative outputs
- Better metrics
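Reusing "clean aggregated data from a number of diverse sources" depends on recognizing when two sources describe the same publication. One common approach, sketched here as an assumption rather than a description of how Symplectic Elements works, is to merge records on a normalized DOI (DOIs are case-insensitive) and set aside records without one for review. The record fields and source names are hypothetical.

```python
# Hypothetical harvested records from two sources; field names are illustrative.
RECORDS = [
    {"source": "source_a", "doi": "10.1000/XYZ123", "title": "Linked data in practice"},
    {"source": "source_b", "doi": "https://doi.org/10.1000/xyz123", "title": "Linked Data in Practice"},
    {"source": "source_b", "doi": None, "title": "An untitled preprint"},
]

# Common resolver/URI prefixes, all lower-case because matching runs after lower().
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Lower-case a DOI and strip a leading resolver or 'doi:' prefix."""
    d = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if d.startswith(prefix):
            return d[len(prefix):]
    return d


def merge_records(records):
    """Group records by normalized DOI; return (merged, records_without_doi)."""
    merged = {}
    unmatched = []
    for rec in records:
        if not rec.get("doi"):
            unmatched.append(rec)  # no DOI: leave for manual review
            continue
        key = normalize_doi(rec["doi"])
        # The first source seen supplies the title; later sources are just listed.
        entry = merged.setdefault(key, {"doi": key, "title": rec["title"], "sources": []})
        entry["sources"].append(rec["source"])
    return merged, unmatched


merged, unmatched = merge_records(RECORDS)
print(merged)
print(len(unmatched), "record(s) need review")
```

Keeping the list of contributing sources on each merged record preserves provenance, which helps when reports built on the aggregated data need to be checked.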
36. Symplectic Elements
• Tracking, evaluation, and reporting
• Digital Asset Management System (IR)
• Tasks (CVs and biosketches, etc.)
• Research Information Systems
The Symplectic Elements platform & data will help facilitate new avenues of support.
37.
Our shop is committed to open source
principles and we leverage semantic web
languages and architecture whenever
possible to support open science.
We want to optimize discoverability and dissemination of content and
enhance the impact of FSM, NUCATS, and our Northwestern Medicine
community.
38. Going beyond the counts to find evidence of meaningful impact
• Measurement instruments
• Continuing education materials
• Cost-effective intervention
• Consensus development conferences
• American Medical Association Current Procedural Terminology (CPT) codes
• Change in delivery of healthcare services
• Gray literature
• New experimental methods, databases or software tools
• New diagnostic criteria or standards of care
• Biologics
• Curriculum guidelines
• Clinical/practice guidelines
• Quality measure guidelines
https://becker.wustl.edu/impact-assessment
http://nucats.northwestern.edu/
Pathways
Advancement of Knowledge
Clinical Implementation
Legislation and Policy Enactment
Economic Benefit
Community Benefit
40. Hope to see you at the conference in August!
http://vivoconference.org/
41. Acknowledgements
Teams:
• The amazing team at Galter Library
• VIVO Colleagues worldwide
Support:
• Northwestern University Clinical and Translational
Sciences Institute, NIH award UL1TR000150
• VIVO, NIH award U24 RR029822
• VIVO/DuraSpace
Questions/Follow-up:
• kristi.holmes@northwestern.edu
• Twitter: @kristiholmes