@hvdsomp
VIVO Conference 2019, September 5 2019, Podgorica, Montenegro
Herbert Van de Sompel
DANS
@hvdsomp
https://orcid.org/0000-0002-0715-6126
Collecting the Organizational Scholarly Record
2013 - EgoSystem
James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014) EgoSystem: Where are our
Alumni? code{4}lib journal, issue 24. https://journal.code4lib.org/articles/9519
EgoSystem Team
• Los Alamos National Laboratory:
• James Powell
• Harihar Shankar
• Herbert Van de Sompel
• Aurelius:
• Marko Rodriguez
Motivation
• When postdocs leave LANL, the local information systems
maintain very little information about them
• But senior management is interested in engaging them after they
leave LANL as Ambassadors and Advocates
• They need answers to questions like:
• Who is currently working where?
• Who is involved in what areas of research?
• Who might serve as advocates for the Lab?
• Who knows someone who knows someone we need to
connect with?
2012 - Initial Approach: Set Up a VIVO Instance
• 2700+ records were
ingested from LANL
Postdoc Office data to
create initial user profiles
• 8 postdoc alumni were
contacted to complete
their profile
Doubts about the VIVO Instance
• Up-to-date information at all times is essential to meet the needs of
senior LANL management
• Some existing VIVO instances seemed to have been pre-populated
but then remained static after launch
• Would current and former postdocs be interested in
maintaining a professional profile on a VIVO instance
intended to help out LANL?
2013 - New Approach: Leverage Network-Level Information
• Leverage public, network-level information pertaining to LANL
Alumni
• Find their network presences - social portals, scientific
portals, homepages, etc.
• Recurrently collect information from those presences: current
employer, social network neighborhood, geo location, etc.
• Create applications based on that information
• Rationale: People have incentives to keep network-level
information up-to-date
• Goal: Devise a sustainable approach to gather and use
up-to-date information pertaining to LANL Alumni
Available information elements for postdocs:
• Z#
• Name
• Institutions:
o PhD University; LANL; Institution after
LANL
• Field of Study
• Discipline
Find network identities:
• Various queries based on information
elements in:
o Yahoo BOSS API; MS Academic
Search API
• Search for candidate identities:
o LinkedIn; MS Academic; Twitter;
Homepage; Blogger; SlideShare;
Wikipedia
• Rank and select candidate identities
o Reward when: same identities from
various searches; content matches
information elements
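The two rewards named in the ranking step (the same identity returned by several searches, and content that matches known information elements) can be sketched as a simple additive score. This is only an illustration, not the EgoSystem implementation: the `Candidate` shape, the weights, the threshold, and the example URLs are all assumptions made here. A high threshold means a postdoc gets no identity rather than a doubtful one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One search hit (hypothetical shape, not taken from EgoSystem)."""
    url: str      # candidate identity URL
    query: str    # label of the query that returned the hit
    snippet: str  # text returned alongside the hit


def score_candidates(candidates, known_elements,
                     agreement_weight=1.0, match_weight=1.0):
    """Score each candidate URL by query agreement and content matches."""
    queries_per_url = {}
    snippets_per_url = {}
    for c in candidates:
        queries_per_url.setdefault(c.url, set()).add(c.query)
        snippets_per_url.setdefault(c.url, []).append(c.snippet.lower())

    scores = {}
    for url, queries in queries_per_url.items():
        text = " ".join(snippets_per_url[url])
        # Reward: known information elements that appear in the hit text.
        matches = sum(1 for e in known_elements if e.lower() in text)
        # Reward: the same URL surfaced by several distinct queries.
        scores[url] = agreement_weight * len(queries) + match_weight * matches
    return scores


def select_identity(candidates, known_elements, threshold=3.0):
    """Return the best-scoring URL, or None if nothing clears the threshold."""
    scores = score_candidates(candidates, known_elements)
    if not scores:
        return None
    best_url = max(scores, key=scores.get)
    return best_url if scores[best_url] >= threshold else None


# Made-up example data.
hits = [
    Candidate("https://example.org/in/jdoe", "name+lanl",
              "Jane Doe, formerly Los Alamos National Laboratory"),
    Candidate("https://example.org/in/jdoe", "name+phd",
              "Jane Doe, PhD Physics"),
    Candidate("https://example.org/in/other", "name+lanl",
              "J. Doe, accountant"),
]
known = ["Jane Doe", "Los Alamos National Laboratory", "Physics"]
print(select_identity(hits, known))
```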
LinkedIn Identity
Twitter Identity
Network-derived information:
• Identities:
o LinkedIn; MS Academic; Twitter;
Homepage; Blogger; SlideShare;
Wikipedia
• Additional information elements:
o Current institution; geo location;
updated discipline
[Chart: Web Identities Discovered Per Postdoc. Categories: none, one, two, three, four, five; count axis from 0 to 1800]
[Chart: Resulting Identity Types per Postdoc. Categories: LANL, MS Academic, LinkedIn, Twitter; count axis from 0 to 3500]
Evaluation of the Discovery Algorithm
• Random set of 100 postdocs
• MS Academic
o 86 correct
- 71 correctly discovered identities
- 15 correctly labeled as not having an identity
o 14 incorrect
- 2 discovered identities did not match the postdoc
- 12 existing identities were not discovered
• Algorithms favored precision over recall
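The MS Academic counts can be turned into precision and recall figures, which shows what "favored precision over recall" means in numbers. The mapping of the four counts onto true/false positives and negatives is an interpretation made here, not something the slide states.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Interpretation (assumed) of the counts reported for the 100-postdoc sample.
true_pos = 71   # correctly discovered identities
true_neg = 15   # correctly labeled as not having an identity
false_pos = 2   # discovered identities that did not match the postdoc
false_neg = 12  # existing identities that were not discovered

precision, recall = precision_recall(true_pos, false_pos, false_neg)
accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.2f}")
# prints: precision=0.973 recall=0.855 accuracy=0.86
```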
Network-derived information:
• Network neighborhood:
o Social network ~ Twitter: followers,
followed
o Academic network ~ co-authors MS
Academic
o Affiliations ~ LinkedIn, homepage
• Artifacts: papers, slide decks
• Concepts
Property Graph Representation of Resulting Information
• Platonic vertices
o Persons
o Institutions
o Artifacts
o Concepts
• Affiliation vertices
o Different types
o Different time periods
• Graph extent, starting from 3,005 postdocs:
o Vertices: 9,015,844
o Edges: 19,399,683
Graph Database for Storage/Retrieval/Analysis
Titan Distributed Graph Database
http://titan.thinkaurelius.com/
EgoSystem Application
• Simple web query interface
• Shareable profile page for individuals
• Graph analytics (aggregate social networks, path analysis) and
graph visualization
• “Who’s where” search (for when the LANL Director travels)
• Capability to add a non-LANL person to the graph
o To find the closest path to that person via a LANL postdoc
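The "closest path to the person via a LANL postdoc" query is, in graph terms, a shortest-path search. The sketch below uses a plain breadth-first search over an in-memory adjacency mapping. It is a stand-in for illustration, not the Titan queries EgoSystem ran. The vertex labels and the assumption that every edge is listed in both directions are choices made here.

```python
from collections import deque


def closest_postdoc_path(graph, postdocs, target):
    """Breadth-first search outward from `target` until a postdoc is reached.

    `graph` maps a vertex to its neighbors and is assumed undirected (each
    edge listed in both directions). Returns the path from the nearest
    postdoc to the target, or None when no postdoc is reachable.
    """
    postdocs = set(postdocs)
    previous = {target: None}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if node in postdocs:
            # Walk the predecessor links back to the target.
            path = [node]
            while previous[path[-1]] is not None:
                path.append(previous[path[-1]])
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up toy graph; vertex names are hypothetical.
toy_graph = {
    "postdoc:alice": ["person:bob"],
    "person:bob": ["postdoc:alice", "person:carol"],
    "person:carol": ["person:bob", "postdoc:dave", "person:erin"],
    "postdoc:dave": ["person:carol"],
    "person:erin": ["person:carol"],
}
toy_postdocs = ["postdoc:alice", "postdoc:dave"]
print(closest_postdoc_path(toy_graph, toy_postdocs, "person:erin"))
```

Searching outward from the target rather than from each postdoc keeps this to a single traversal, which matters when the graph has millions of vertices.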
Success?
• At the end of the demo meeting, the director said (paraphrasing)
o “I didn’t know what I wanted when we first met but this looks
like what I want, what I need.”
• Project discontinued because of the inability to access LinkedIn
data in a legitimate manner
• Because the processes are heuristic-based, the database and query
results are not necessarily correct or complete. This made
EgoSystem an approximating application.
• Fantastic 2-month (~6 MM) project that did not yield a production
system but in which we learned an awful lot
2016 - Autoload
James Powell, Martin Klein, and Herbert Van de Sompel (2017) Autoload: a pipeline for expanding the holdings of
an Institutional Repository enabled by ResourceSync. code{4}lib journal, issue 36.
https://journal.code4lib.org/articles/12427
2018 – myresearch.institute
The Scholarly Orphans project
is funded by the Andrew W. Mellon Foundation
myresearch.institute Team
• Los Alamos National Laboratory:
• Lyudmila Balakireva
• Martin Klein
• James Powell
• Harihar Shankar
• Herbert Van de Sompel
• Old Dominion University:
• Sawood Alam
• Grant Atkins
• Shawn Jones
• Mat Kelly
• Michael L. Nelson
Research and Research Communication on the Web
• Consideration
• Researchers are increasingly using a variety of web platforms for
collaboration and communication
• Why?
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• Their institutions do not provide platforms that have global reach
• Collaboration, cf. GitHub ~ productivity
• Communication, cf. SlideShare ~ visibility
Research and Research Communication on the Web
• Consideration
• Researchers are increasingly using a variety of web platforms for
collaboration and communication
• Web Platforms:
• Dedicated to scholarship:
• Commercial: e.g., FigShare, Publons
• Not for profit: e.g., OSF, Zenodo
• General purpose:
• Commercial: e.g., GitHub, SlideShare
• Not for profit: e.g., Wikipedia, Wikidata
Emma Schymanski
https://orcid.org/0000-0001-6868-8145
https://github.com/schymane
https://www.slideshare.net/EmmaSchymanski
https://figshare.com/authors/Emma_Schymanski/5087039
https://publons.com/author/1538491/emma-schymanski#profile
https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
Shawn Jones
https://orcid.org/0000-0002-4372-870X
http://www.shawnmjones.org/
https://github.com/shawnmjones
https://www.slideshare.net/shawnmjones
https://en.wikipedia.org/wiki/User:Shawnmjones
https://www.blogger.com/profile/17827543974149663194
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo - The researchers’ institutions are in the dark
• Do not know about the existence of these artifacts
• Do not have a copy of these artifacts
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo – Uncertainty regarding long-term access
• Commercial: changing business model, no preservation commitment
• Not for profit: unpredictable funding stream
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo - Not systematically archived
• No frameworks like LOCKSS/Portico exist for these artifacts
• Researchers only selectively deposit artifacts in portals that
provide archival guarantees, to obtain a citable DOI
• Can’t expect researchers to (also) upload all artifacts to IRs
• Web archives only incidentally archive these artifacts, cf.
anecdotal & Hiberlink project evidence
Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE
https://doi.org/10.1371/journal.pone.0115253
Emma’s SlideShare Artifact: 0 Mementos
https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge
http://timetravel.mementoweb.org/
Shawn’s GitHub Artifact: 1 Memento
https://github.com/shawnmjones/mediawiki
https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki
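Memento counts like the ones on these two slides can be derived from a TimeMap, the list of archived captures that the Memento protocol defines for a URI. The sketch below counts the entries whose `rel` value includes `memento` in a TimeMap serialized as `application/link-format`. It is a deliberately simplified parser that only looks at `rel` attributes, and the sample TimeMap with its example.org and example.net URIs is made up for illustration.

```python
import re

# Matches the rel attribute of each link entry, e.g. ; rel="first memento"
REL_PATTERN = re.compile(r';\s*rel\s*=\s*"([^"]*)"')


def count_mementos(timemap_text):
    """Count link entries whose rel value includes the token 'memento'."""
    count = 0
    for rel_value in REL_PATTERN.findall(timemap_text):
        # rel holds space-separated tokens such as "first memento".
        if "memento" in rel_value.split():
            count += 1
    return count


# Made-up TimeMap for illustration only.
SAMPLE_TIMEMAP = """<http://example.org/page>; rel="original",
<http://archive.example.net/timemap/link/http://example.org/page>; rel="self"; type="application/link-format",
<http://archive.example.net/timegate/http://example.org/page>; rel="timegate",
<http://archive.example.net/web/20180901000000/http://example.org/page>; rel="first memento"; datetime="Sat, 01 Sep 2018 00:00:00 GMT",
<http://archive.example.net/web/20190301000000/http://example.org/page>; rel="last memento"; datetime="Fri, 01 Mar 2019 00:00:00 GMT"
"""

print(count_mementos(SAMPLE_TIMEMAP))
```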
Evidence from the Hiberlink Project
Web resources referenced in Elsevier corpus (1996-2012)
without representative Memento in public web archives
The Scholarly Orphans Project: How to Archive these Artifacts?
• Explores an institution-driven paradigm
• Academic institutions typically have a long shelf life
• A basic premise underlying e.g., LOCKSS, perma.cc
• An academic institution should be interested in capturing the
artifacts (intellectual property) its scholars deposit on the web
• Collecting and archiving such artifacts aligns with the
mission of academic libraries
An Institutional Perspective
The Scholarly Orphans Project: How to Archive these Artifacts?
• Explores a paradigm inspired by web archiving
• Scale of the problem
• Can’t expect researchers to upload all artifacts in an institutional
repository
• Bilateral agreements for archival purposes with most web
portals unlikely
A Web Archiving Perspective
myresearch.institute Prototype Pipeline
Tracking Artifacts
Tracking Artifacts - Description
• In order to track artifacts that were recently deposited by an
institutional researcher in a portal, one reasonably needs:
• The web identity of the researcher in the portal
• Algorithmic discovery, cf. EgoSystem
• Discovery via a registry, cf. ORCID paper
• Manual collection
• A portal API that supports:
• Access by web identity
• Access to contributions “since …” for the web identity
• Result of tracking:
• URI(s) of new artifact(s) discovered in the portal
Klein, M., and Van de Sompel, H. (2017) Discovering Scholarly Orphans Using ORCID. Proceedings of the 2017
ACM/IEEE Joint Conference on Digital Libraries https://arxiv.org/abs/1703.09343
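The tracking step described above (access by web identity, contributions "since" a point in time, and URIs as the result) can be sketched as a small function around a portal client. Everything named here is hypothetical: `fetch_contributions` stands for whatever call a given portal API offers, and `fake_portal` is an in-memory stub so that the example runs without network access.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Contribution:
    """One item a portal reports for a web identity (hypothetical shape)."""
    uri: str
    created: datetime


def track_new_artifacts(fetch_contributions, web_identity, since):
    """Return (URIs of contributions created after `since`, new checkpoint).

    `fetch_contributions` is an assumed callable that takes a web identity
    and returns Contribution objects. A portal with a native "since"
    parameter would make the filtering below unnecessary.
    """
    recent = [c for c in fetch_contributions(web_identity) if c.created > since]
    recent.sort(key=lambda c: c.created)
    checkpoint = recent[-1].created if recent else since
    return [c.uri for c in recent], checkpoint


def fake_portal(web_identity):
    """In-memory stand-in for a portal API; the data is made up."""
    data = {
        "jdoe": [
            Contribution("https://portal.example/jdoe/1",
                         datetime(2018, 7, 15, tzinfo=timezone.utc)),
            Contribution("https://portal.example/jdoe/2",
                         datetime(2018, 8, 3, tzinfo=timezone.utc)),
        ]
    }
    return data.get(web_identity, [])


start = datetime(2018, 8, 1, tzinfo=timezone.utc)
new_uris, checkpoint = track_new_artifacts(fake_portal, "jdoe", start)
print(new_uris, checkpoint.isoformat())
```

Returning the checkpoint lets the next tracking run ask only for contributions newer than the last one seen.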
Tracking Artifacts - Challenges
• Portal API access by web identity
• Broadly supported by general purpose portals
• Typically not supported by scholarly portals
• Some lack an API altogether
• Should add ORCID access to APIs
• OAI-PMH and ResourceSync need sets per web identity
• Professional versus personal contributions
Capturing Artifacts
Capturing Artifacts - Description
• The capture process takes as input the URI of a new artifact
discovered in a portal
• Its task is to create a representative institutional capture of the
artifact
• Result of capture:
• WARC file for new artifact in an institutional archive
Capturing Artifacts - Challenges
• Create a high-fidelity capture using an approach that scales for a
steady stream of new artifacts
• Handle dynamic content & interactive features of web pages
• Determine the web boundary of the artifact
• More than the input artifact URI
• The boundary is in the eye of the beholder
• We made a significant breakthrough with the Memento Tracer
framework
• Others (cf. webrecorder.io Autopilot, IA Brozzler) are working on
the same problem
Memento Tracer: http://tracer.mementoweb.org
Autopilot: https://blog.webrecorder.io/2019/08/14/autopilot
Brozzler: https://github.com/internetarchive/brozzler
Capturing Artifacts
Memento Tracer - Framework
http://tracer.mementoweb.org
Archiving Artifacts
Archiving Artifacts - Description
• The archiving process takes as input the URI of a WARC file
generated by the capture process
• Its task is to ingest the WARC file in a cross-institutional web archive
• This can be achieved using off-the-shelf web archiving software,
e.g., pywb, OpenWayback
• Result of archiving:
• Mementos pertaining to the newly discovered artifact in a
cross-institutional, Memento-compliant web archive
Archiving Artifacts - Challenges
• Attempted to use ipwb, a pywb version that uses IPFS
• Cross-institutional distributed file system with redundancy
• Ran out of time to get it operationally stable
Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive
https://doi.org/10.1145/2910896.2925467
myresearch.institute - Researchers
• Uniquely identified by ORCIDs
• Web identities in multiple portals
• Create various types of artifacts
myresearch.institute - Portals
• Tracking started August 27 2018
• Tracking artifacts created starting
August 1 2018
Scholarly Orphans – Pipeline
• 16,005 unique artifacts tracked, captured, and archived between
2018-08-01 and 2019-08-28
• 60MB event database
• 83GB of WARC files
• 3GB of web archive index
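A quick back-of-the-envelope calculation puts the pipeline figures in perspective. It assumes the period runs from 1 August 2018 to 28 August 2019 with the end date not counted, and that 1 GB = 1000 MB. The derived rates are averages only and say nothing about peaks.

```python
from datetime import date

artifacts = 16_005
warc_gb = 83

period_start = date(2018, 8, 1)
period_end = date(2019, 8, 28)
days = (period_end - period_start).days  # end date not counted (assumption)

artifacts_per_day = artifacts / days
mb_per_artifact = warc_gb * 1000 / artifacts  # assumes 1 GB = 1000 MB

print(f"{days} days, about {artifacts_per_day:.1f} artifacts per day, "
      f"about {mb_per_artifact:.1f} MB of WARC per artifact")
```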
Showtime: myresearch.institute Portal
https://myresearchinstitute.org
Success?
• “Interesting project! I’m happy to participate.”
“One more thing, is it possible to get a copy of the URI-Rs that
you guys detected so that I can feed them into an archive of my
choice?...”
• Prototype pipeline developed over 8 months (24 MM)
• Metrics of the prototype demonstrate that researchers generate
a lot of artifacts (that their institutions are typically not aware of)
• Metrics of the prototype suggest it should be possible to run a
production pipeline at the scale of an academic institution
• But would they …?
Some Final Thoughts
• For a number of reasons, applications that leverage network-level
information at scale (e.g. EgoSystem, myresearch.institute,
Autoload) tend not to be perfect. But they are automatic.
• Do institutions reserve sufficient resources for innovation and
failure? The alternative seems to be outsourcing and loss of
expertise.
• Ideas/visions are rarely fully realized when working on them. But
many times, the work does improve on the status quo. So keep
dreaming and working!
Herbert Van de Sompel
DANS
@hvdsomp
https://orcid.org/0000-0002-0715-6126
Collecting the Organizational Scholarly Record

Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.soniya singh
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...roncy bisnoi
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.soniya singh
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 

Recently uploaded (20)

(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft DatingDubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 

Collecting the organizational scholarly record

  • 1. @hvdsomp VIVO Conference 2019, September 5 2019, Podgorica, Montenegro Herbert Van de Sompel DANS @hvdsomp https://orcid.org/0000-0002-0715-6126 Collecting the Organizational Scholarly Record
  • 2. 2013 - EgoSystem. James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014) EgoSystem: Where are our Alumni? code{4}lib journal, issue 24. https://journal.code4lib.org/articles/9519
  • 3. EgoSystem Team • Los Alamos National Laboratory: • James Powell • Harihar Shankar • Herbert Van de Sompel • Aurelius: • Marko Rodriguez
  • 4. Motivation • When postdocs leave LANL, the local information systems maintain very little information about them • But senior management is interested in engaging them after they leave LANL as Ambassadors and Advocates • They need answers to questions like: • Who is currently working where? • Who is involved in what areas of research? • Who might serve as advocates for the Lab? • Who knows someone who knows someone we need to connect with?
  • 5. 2012 - Initial Approach: Set Up a VIVO Instance • 2700+ records were ingested from LANL Postdoc Office data to create initial user profiles • 8 postdoc alumni were contacted to complete their profile
  • 6. Doubts about the VIVO Instance • Up-to-date information at all times is essential to meet the needs of senior LANL management • Some existing VIVO instances seemed to have been pre-populated but then remained static after launch • Would current and former postdocs be interested in maintaining a professional profile on a VIVO instance intended to help out LANL?
  • 7. 2013 - New Approach: Leverage Network-Level Information • Leverage public, network-level information pertaining to LANL Alumni • Find their network presences - social portals, scientific portals, homepages, etc. • Recurrently collect information from those presences: current employer, social network neighborhood, geo location, etc. • Create applications based on that information • Rationale: People have incentives to keep network-layer information up-to-date • Goal: Devise a sustainable approach to gather and use up-to-date information pertaining to LANL Alumni
  • 8.
  • 9. Available information elements for PostDocs: • Z# • Name • Institutions: o PhD University; LANL; Institution after LANL • Field of Study • Discipline
  • 10. Find network identities: • Various queries based on information elements in: o Yahoo BOSS API; MS Academic Search API • Search for candidate identities: o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia • Rank and select candidate identities o Reward when: same identities from various searches; content matches information elements
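The ranking step on this slide can be illustrated with a minimal sketch. It is not the EgoSystem implementation: the function names, the additive scoring, and the `min_score` threshold are all assumptions made for illustration. It only shows the two rewards the slide names (the same identity returned by several searches, and page content matching known information elements).

```python
from collections import Counter


def rank_candidates(search_results, profile_terms, candidate_text):
    """Rank candidate identity URLs for one postdoc (hypothetical scoring).

    search_results: list of result lists, one per query.
    profile_terms:  known information elements (name, institutions, ...).
    candidate_text: dict mapping candidate URL -> text found at that URL.
    """
    # Reward 1: the same identity is returned by several independent searches.
    hits = Counter(url for results in search_results for url in set(results))
    scores = {}
    for url, n_searches in hits.items():
        text = candidate_text.get(url, "").lower()
        # Reward 2: the candidate's content matches known information elements.
        matches = sum(1 for term in profile_terms if term.lower() in text)
        scores[url] = n_searches + matches
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def select_identity(ranked, min_score=3):
    """Return the top candidate only if it clears an (assumed) threshold."""
    if ranked and ranked[0][1] >= min_score:
        return ranked[0][0]
    return None
```

Returning `None` for weak matches is one way to trade recall for precision, which is the bias the evaluation slide later reports.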
  • 11. LinkedIn Identity
  • 12. LinkedIn Identity
  • 13. LinkedIn Identity
  • 14. Twitter Identity
  • 15. Network-derived information: • Identities: o LinkedIn; MS Academic; Twitter; Homepage; Blogger; SlideShare; Wikipedia • Additional information elements: o Current institution; geo location; updated discipline
  • 16. Web Identities Discovered Per Postdoc [bar chart; categories: none, one, two, three, four, five; count axis 0 to 1800]
  • 17. Resulting Identity Types per Postdoc [bar chart; categories: LANL, MS Academic, LinkedIn, Twitter; count axis 0 to 3500]
  • 18. Evaluation of the Discovery Algorithm • Random set of 100 postdocs • MS Academic o 86 correct - 71 correctly discovered identities - 15 correctly labeled as not having identity o 14 incorrect - 2 discovered identities did not match the postdoc - 12 existing identities were not discovered • Algorithms favored precision over recall
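The claim that the algorithms favored precision over recall can be checked from the counts on this slide. The short calculation below assumes the usual reading of those counts: the 2 mismatched identities are false positives and the 12 undiscovered identities are false negatives.

```python
# Counts reported on the evaluation slide (MS Academic, 100 random postdocs).
true_positives = 71   # identities correctly discovered
true_negatives = 15   # correctly labeled as having no identity
false_positives = 2   # discovered identity did not match the postdoc
false_negatives = 12  # existing identity was not discovered

total = true_positives + true_negatives + false_positives + false_negatives
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
accuracy = (true_positives + true_negatives) / total

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.2f}")
# prints: precision=0.973 recall=0.855 accuracy=0.86
```

Under that reading, precision (about 0.97) is clearly higher than recall (about 0.86): the algorithm rarely attaches a wrong identity, at the cost of missing some that exist.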
  • 19. Network-derived information: • Network neighborhood: o Social network ~ Twitter: followers, followed o Academic network ~ co-authors MS Academic o Affiliations ~ LinkedIn, homepage • Artifacts: papers, slide decks • Concepts
  • 20. Property Graph Representation of Resulting Information • Platonic vertices o Persons o Institutions o Artifacts o Concepts • Affiliation vertices o Different types o Different time periods • Graph extent, started with 3,005 postdocs: o Vertices: 9,015,844 o Edges: 19,399,683
  • 21. Property Graph Representation of Resulting Information
  • 22. Graph Database for Storage/Retrieval/Analysis: Titan Distributed Graph Database http://titan.thinkaurelius.com/
  • 23. EgoSystem Application • Simple web query interface • Shareable profile page for individuals • Graph analytics (aggregate social networks, path analysis) and graph visualization • Who’s where (the LANL Director travels) search • Capability to add a non-LANL person to the graph o To find the closest path to the person via a LANL postdoc
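The "closest path to the person via a LANL postdoc" feature is, in graph terms, a shortest-path query from a set of postdoc vertices to a target vertex. The sketch below uses a plain breadth-first search over an in-memory adjacency dict. It is an illustration only: EgoSystem stored its graph in Titan, and the vertex names here are invented.

```python
from collections import deque


def closest_path(graph, sources, target):
    """Breadth-first search from several source vertices to one target.

    graph:   dict mapping a vertex to an iterable of neighboring vertices.
    sources: list of starting vertices (e.g. LANL postdocs).
    Returns the shortest path as a list of vertices, or None if unreachable.
    """
    queue = deque((source, [source]) for source in sources)
    seen = set(sources)
    while queue:
        vertex, path = queue.popleft()
        if vertex == target:
            return path
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None


# Hypothetical toy graph: two postdocs, one shared co-author, one target.
GRAPH = {
    "postdoc_a": ["coauthor_x"],
    "postdoc_b": ["university_y"],
    "university_y": ["coauthor_x"],
    "coauthor_x": ["target_person"],
}
print(closest_path(GRAPH, ["postdoc_a", "postdoc_b"], "target_person"))
# prints: ['postdoc_a', 'coauthor_x', 'target_person']
```

Because breadth-first search explores vertices in order of distance, the first path that reaches the target is a shortest one, which also identifies the postdoc best placed to make the introduction.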
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. Success? • At the end of the demo meeting, the director said (paraphrasing) o “I didn’t know what I wanted when we first met but this looks like what I want, what I need.” • Project discontinued because of the inability to access LinkedIn data in a legitimate manner • As a result of heuristic-based processes, the database and query results are not necessarily correct/complete. This made EgoSystem an approximating application. • Fantastic 2-month (~ 6 MM) project that did not yield a production system but in which we learned an awful lot
  • 34. 2016 - Autoload. James Powell, Martin Klein, and Herbert Van de Sompel (2017) Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync. code{4}lib journal, issue 36. https://journal.code4lib.org/articles/12427
  • 35. 2018 - myresearch.institute. The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation
  • 36. myresearch.institute Team • Los Alamos National Laboratory: • Lyudmila Balakireva • Martin Klein • James Powell • Harihar Shankar • Herbert Van de Sompel • Old Dominion University: • Sawood Alam • Grant Atkins • Shawn Jones • Mat Kelly • Michael L. Nelson
  • 37. Research and Research Communication on the Web • Consideration • Researchers are increasingly using a variety of web platforms for collaboration and communication • Why? • Many of these platforms have desirable characteristics • Versioning • Time stamping • Social embedding • Their institutions do not provide platforms that have global reach • Collaboration, cf. GitHub ~ productivity • Communication, cf. SlideShare ~ visibility
  • 38. Research and Research Communication on the Web • Consideration • Researchers are increasingly using a variety of web platforms for collaboration and communication • Web Platforms: • Dedicated to scholarship: • Commercial: e.g., FigShare, Publons • Not for profit: e.g., OSF, Zenodo • General purpose: • Commercial: e.g., GitHub, SlideShare • Not for profit: e.g., Wikipedia, Wikidata
  • 39. Emma Schymanski https://orcid.org/0000-0001-6868-8145 https://github.com/schymane https://www.slideshare.net/EmmaSchymanski https://figshare.com/authors/Emma_Schymanski/5087039 https://publons.com/author/1538491/emma-schymanski#profile https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
  • 40. Shawn Jones https://orcid.org/0000-0002-4372-870X http://www.shawnmjones.org/ https://github.com/shawnmjones https://www.slideshare.net/shawnmjones https://en.wikipedia.org/wiki/User:Shawnmjones https://www.blogger.com/profile/17827543974149663194
  • 41. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo - The researchers’ institutions are in the dark • Do not know about the existence of these artifacts • Do not have a copy of these artifacts
  • 42. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo - Uncertainty regarding long-term access • Commercial: changing business model, no preservation commitment • Not for profit: unpredictable funding stream
  • 43. Research and Research Communication on the Web • Consideration • Researchers deposit artifacts in web platforms • Status quo - Not systematically archived • No frameworks like LOCKSS/Portico exist for these artifacts • Researchers only selectively deposit artifacts in portals that provide archival guarantees, typically to obtain a citable DOI • Can’t expect researchers to (also) upload all artifacts in IRs • Web archives only incidentally archive these artifacts, cf. anecdotal & Hiberlink project evidence. Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  • 44. Emma’s SlideShare Artifact: 0 Mementos https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge http://timetravel.mementoweb.org/
  • 45. Shawn’s GitHub Artifact: 1 Memento https://github.com/shawnmjones/mediawiki https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki
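Memento counts like the ones on these two slides can be derived from a TimeMap, which the Memento protocol (RFC 7089) serializes in link-format: each archived copy appears as a link whose `rel` value contains the token `memento`. The sketch below counts such links in a TimeMap string. The sample TimeMap, its `example` hosts, and its datetime are invented for illustration and do not describe any real archive holdings.

```python
import re


def count_mementos(timemap_text):
    """Count links in a link-format TimeMap whose rel includes 'memento'."""
    # Datetime values contain commas, so match rel attributes directly
    # instead of splitting the TimeMap on commas.
    rel_values = re.findall(r'rel="([^"]*)"', timemap_text)
    return sum(1 for rel in rel_values if "memento" in rel.split())


# Invented sample TimeMap for a hypothetical artifact with a single capture.
SAMPLE_TIMEMAP = (
    '<https://portal.example/artifact/1>; rel="original",\n'
    '<https://archive.example/timemap/link/https://portal.example/artifact/1>'
    '; rel="self"; type="application/link-format",\n'
    '<https://archive.example/timegate/https://portal.example/artifact/1>'
    '; rel="timegate",\n'
    '<https://archive.example/20180101000000/https://portal.example/artifact/1>'
    '; rel="first last memento"; datetime="Mon, 01 Jan 2018 00:00:00 GMT"'
)
print(count_mementos(SAMPLE_TIMEMAP))
# prints: 1
```

Splitting the `rel` value into tokens matters: `timegate` and `timemap` relations must not be counted, while `first memento` and `last memento` must.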
  • 46. Evidence from the Hiberlink Project Web resources referenced in Elsevier corpus (1996-2012) without representative Memento in public web archives
  • 47. The Scholarly Orphans Project: How to Archive these Artifacts? • Explores an institution-driven paradigm • Academic institutions typically have a long shelf life • A basic premise underlying e.g., LOCKSS, perma.cc • An academic institution should be interested in capturing the artifacts (intellectual property) its scholars deposit on the web • Collecting and archiving such artifacts aligns with the mission of academic libraries
  • 48. An Institutional Perspective
  • 49. The Scholarly Orphans Project: How to Archive these Artifacts? • Explores a paradigm inspired by web archiving • Scale of the problem • Can’t expect researchers to upload all artifacts in an institutional repository • Bilateral agreements for archival purposes with most web portals unlikely
  • 50. A Web Archiving Perspective
  • 51. myresearch.institute Prototype Pipeline
  • 52. Tracking Artifacts
  • 53. Tracking Artifacts - Description • In order to track artifacts that were recently deposited by an institutional researcher in a portal, one reasonably needs: • The web identity of the researcher in the portal • Algorithmic discovery, cf. EgoSystem • Discovery via a registry, cf. ORCID paper • Manual collection • A portal API that supports: • Access by web identity • Access to contributions “since …” for the web identity • Result of tracking: • URI(s) of new artifact(s) discovered in the portal Klein, M., and Van de Sompel, H. (2017) Discovering Scholarly Orphans Using ORCID. Proceedings of the 2017 ACM/IEEE Joint Conference on Digital Libraries https://arxiv.org/abs/1703.09343
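The two API capabilities this slide asks for (access by web identity, and access to contributions "since …") reduce to a simple filter once a portal exposes them. The sketch below runs that filter over an in-memory list standing in for a portal response. The record layout, the identities, and the `portal.example` URIs are hypothetical; no real portal API is being described.

```python
from datetime import datetime, timezone


def track_new_artifacts(contributions, web_identity, since):
    """Return URIs of artifacts deposited by web_identity at or after `since`.

    contributions: list of dicts with 'identity', 'uri' and 'created' keys
                   (an assumed record layout for a portal response).
    """
    new_items = [
        item for item in contributions
        if item["identity"] == web_identity and item["created"] >= since
    ]
    new_items.sort(key=lambda item: item["created"])  # oldest first
    return [item["uri"] for item in new_items]


# Invented feed; the cut-off mirrors the slide deck's August 1 2018 start.
SINCE = datetime(2018, 8, 1, tzinfo=timezone.utc)
FEED = [
    {"identity": "researcher-a", "uri": "https://portal.example/a/1",
     "created": datetime(2018, 7, 15, tzinfo=timezone.utc)},
    {"identity": "researcher-a", "uri": "https://portal.example/a/2",
     "created": datetime(2018, 9, 3, tzinfo=timezone.utc)},
    {"identity": "researcher-b", "uri": "https://portal.example/b/1",
     "created": datetime(2018, 9, 4, tzinfo=timezone.utc)},
]
print(track_new_artifacts(FEED, "researcher-a", SINCE))
# prints: ['https://portal.example/a/2']
```

When a portal lacks either capability, the tracker has to fetch everything and filter locally, which is the scaling problem the challenges slide points to.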
  • 54. Tracking Artifacts - Challenges • Portal API access by web identity • Broadly supported by general purpose portals • Typically not supported by scholarly portals • Some lack an API altogether • Should add ORCID access to APIs • OAI-PMH and ResourceSync need sets per web identity • Professional versus personal contributions
  • 55. Capturing Artifacts
  • 56. Capturing Artifacts - Description • The capture process takes as input the URI of a new artifact discovered in a portal • Its task is to create a representative institutional capture of the artifact • Result of capture: • WARC file for new artifact in an institutional archive
  • 57. Capturing Artifacts - Challenges • Create a high-fidelity capture using an approach that scales for a steady stream of new artifacts • Handle dynamic content & interactive features of web pages • Determine the web boundary of the artifact • More than the input artifact URI • The boundary is in the eye of the beholder • We made a significant breakthrough with the Memento Tracer framework • Others (cf. webrecorder.io Autopilot, IA Brozzler) are working on the same problem Memento Tracer: http://tracer.mementoweb.org Autopilot: https://blog.webrecorder.io/2019/08/14/autopilot Brozzler: https://github.com/internetarchive/brozzler
  • 58. Capturing Artifacts
  • 59. Memento Tracer - Framework http://tracer.mementoweb.org
  • 60. Archiving Artifacts
  • 61. Archiving Artifacts - Description • The archiving process takes as input the URI of a WARC file generated by the capture process • Its task is to ingest the WARC file in a cross-institutional web archive • This can be achieved using off-the-shelf web archiving software, e.g., pywb, OpenWayback • Result of archiving: • Mementos pertaining to newly discovered artifact in a cross-institutional, Memento-compliant web archive
  • 62. Archiving Artifacts - Challenges • Attempted to use ipwb, a pywb version that uses IPFS • Cross-institutional distributed file system with redundancy • Ran out of time to get it operationally stable Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive https://doi.org/10.1145/2910896.2925467
  • 63. myresearch.institute - Researchers • Uniquely identified by ORCIDs • Web identities in multiple portals • Create various types of artifacts
  • 64. myresearch.institute - Portals • Tracking started August 27 2018 • Tracking artifacts created starting August 1 2018
  • 65. Scholarly Orphans – Pipeline • 16,005 unique artifacts tracked, captured, and archived between 20180801 and 20190828 • 60MB event database • 83GB of WARC files • 3GB of web archive index
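The pipeline figures on this slide can be turned into per-day and per-artifact averages with a little arithmetic. The calculation below is only a rough reading of the reported totals: it assumes the date range is inclusive and that GB and MB are decimal units, neither of which the slide states.

```python
from datetime import date

# Totals reported on the pipeline slide.
artifacts = 16_005
warc_gb = 83
index_gb = 3
start, end = date(2018, 8, 1), date(2019, 8, 28)

days = (end - start).days + 1               # inclusive range (assumption)
artifacts_per_day = artifacts / days
warc_mb_per_artifact = warc_gb * 1000 / artifacts   # decimal units (assumption)
index_kb_per_artifact = index_gb * 1_000_000 / artifacts

print(f"{days} days, {artifacts_per_day:.1f} artifacts/day")
print(f"{warc_mb_per_artifact:.1f} MB of WARC per artifact")
print(f"{index_kb_per_artifact:.0f} KB of index per artifact")
```

Under these assumptions the prototype handled roughly 41 artifacts per day at about 5 MB of WARC data each, which gives a sense of the storage growth a production pipeline would need to plan for.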
  • 66. Showtime: myresearch.institute Portal https://myresearchinstitute.org
  • 67. Success? • “Interesting project! I’m happy to participate.” “One more thing, is it possible to get a copy of the URI-Rs that you guys detected so that I can feed them into an archive of my choice?...” • Prototype pipeline developed over 8 months (24 MM) • Metrics of the prototype demonstrate that researchers generate a lot of artifacts (that their institutions are typically not aware of) • Metrics of the prototype suggest it should be possible to run a production pipeline at the scale of an academic institution • But would they …?
  • 68. Some Final Thoughts • For a number of reasons, applications that leverage network-level information at scale (e.g. EgoSystem, myresearch.institute, Autoload) tend not to be perfect. But they are automatic. • Do institutions reserve sufficient resources for innovation and failure? The alternative seems to be outsourcing and loss of expertise. • Ideas/visions are rarely fully realized when working on them. But many times, the work does improve on the status quo. So keep dreaming and working!
  • 69. Herbert Van de Sompel DANS @hvdsomp https://orcid.org/0000-0002-0715-6126 Collecting the Organizational Scholarly Record

Editor's Notes

  1. ~100k articles with links > 230k links total
  2. New paradigm for web archiving, found as part of this problem. Unexpected, yet the most important result/contribution of this effort. Let's imagine you need to frequently archive slide decks from SlideShare (we do). Understand that there are boundary and quality problems. Bring a human (curator) in the loop. Navigate to *one* SS presentation. Interact with that presentation in an attempt to show what the boundary is and make explicit what needs to be archived. A browser extension listens to browser events, intercepts them, and records them in an abstract way (not in terms of URLs, addresses in the DOM, XPath, CSS selectors). Result: the trace expresses in an abstract way the interactions the curator had with the slide deck. Abstract because the same info on how to interact with *this* presentation will apply to *all* presentations. Record one, share, re-use with a headless browser. Share in a repo; collectively create and curate traces; update with layout of pages.