LKG Editor Dev
1. Library Knowledge Graph
Editor Development
Simeon Warner (Cornell)
https://orcid.org/0000-0002-7970-7855
Reporting work from the LD4P2 project including contributions from: Steven
Folsom, Huda Khan, Lynette Rayle, Jason Kovari, Tim Worrall (Cornell), Astrid
Usong (Stanford), David Eichmann (Iowa), and others…
US2TS 2019, March 11-13, Duke University, Durham, NC
3. Library Cataloging Background
Many practices developed in the era of card catalogs
MARC format developed in the 1960s
Long history of linking entities, albeit with authorized
names rather than identifiers. Used for limited forms of
semantic browse
LD4 work and broader community moving from
MARC→RDF, from authorized names to URIs, and
toward better linking with the web
Henriette Avram 1919–2006,
American computer programmer
and systems analyst who
developed MARC
https://en.wikipedia.org/wiki/Henriette_Avram
4. Production Scale
Cornell catalog has ~9M records
(~8M physical, ~1M electronic)
Cataloging staff must keep up with
new acquisitions. RSI is a real
Rarely start from scratch: base on
vendor supplied, community records
or record for similar resource
Specialists covering many
languages
Library Technical Services space in
Olin Library, Cornell University
5. MARC → RDF
Past work on ontology development but current
focus around BIBFRAME model from Library of
Congress (LC), still evolving
Conversion yields ~100 triples from each MARC record
Cornell: 9M records → ~1 billion triples (cf. WorldCat
scale: 440M bib records, 2.7G holdings)
Community will still rely on centralized services, but
opens possibility for other models too, and ad-hoc
links
Key entity types in BIBFRAME
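Back-of-envelope check of the scale figures above. A minimal sketch, assuming the ~100 triples per record average holds uniformly; the WorldCat number is an extrapolation for illustration only, and `estimate_triples` is a hypothetical helper name.

```python
# Rough estimate only: assumes every MARC record converts to about
# 100 triples, the average quoted on the slide.
TRIPLES_PER_RECORD = 100


def estimate_triples(records: int, per_record: int = TRIPLES_PER_RECORD) -> int:
    """Return the approximate number of RDF triples for a record count."""
    return records * per_record


cornell = estimate_triples(9_000_000)     # 900 million, i.e. ~1 billion
worldcat = estimate_triples(440_000_000)  # extrapolation, bib records only

print(f"Cornell:  ~{cornell / 1e9:.1f} billion triples")
print(f"WorldCat: ~{worldcat / 1e9:.0f} billion triples")
```

Holdings are left out of the WorldCat estimate because the slide gives no triples-per-holding figure.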
6. Shapes
cf. Khan, Folsom, et al.,
poster at US2TS 2018
Want re-use and hence
interested in shared
shapes. Mechanics may
be mix of SHACL, ShEx,
schema
Currently no decoupling of
validation from forms, a
controlled environment
https://drive.google.com/file/d/1M_xhnG8qYL7M9akvIRSETfOgeSEfS9oh/view
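One way to picture decoupling validation from forms: a single shape definition read by both a form generator and a validator. A minimal Python sketch, not SHACL or ShEx and not how any LD4P2 editor works; `PropertyShape`, `form_fields`, `validate`, `WORK_SHAPE` and the cardinalities shown are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyShape:
    path: str                        # property the constraint applies to
    label: str                       # label a form would display
    min_count: int = 0               # 0 means optional
    max_count: Optional[int] = None  # None means unbounded


def form_fields(shape: List[PropertyShape]) -> List[dict]:
    """Derive form field descriptions from the shape."""
    return [
        {
            "name": p.path,
            "label": p.label,
            "required": p.min_count > 0,
            "repeatable": p.max_count is None or p.max_count > 1,
        }
        for p in shape
    ]


def validate(shape: List[PropertyShape], resource: Dict[str, list]) -> List[str]:
    """Check a resource (property -> list of values) against the same shape."""
    errors = []
    for p in shape:
        n = len(resource.get(p.path, []))
        if n < p.min_count:
            errors.append(f"{p.path}: expected at least {p.min_count}, found {n}")
        if p.max_count is not None and n > p.max_count:
            errors.append(f"{p.path}: expected at most {p.max_count}, found {n}")
    return errors


# Hypothetical shape: the cardinalities are illustrative, not from BIBFRAME.
WORK_SHAPE = [
    PropertyShape("bf:title", "Title", min_count=1, max_count=1),
    PropertyShape("bf:genreForm", "Genre/form"),
]

draft = {"bf:genreForm": ["Example genre term"]}
print(validate(WORK_SHAPE, draft))
```

Because the form and the validator read the same list, data created outside the form can still be checked against the shape.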
7. Linking Our Data - Focus on Lookups
Build UI and infrastructure around discovery of related entities. We know:
➔ Evolving community norms: appetite for a variety of linked datasets and
associated lookup services; how to link each well and efficiently; sensitivity to
inclusive descriptions
➔ Complexity in how to search (recall/precision -- relevancy tests)
➔ Need context -- labels and types are nowhere near sufficient, what else to
display to enable human verification/selection?
➔ Multiple sources for same entity type (e.g. person in LC NAF, ISNI, ORCID)
➔ If available, hubs likely most efficient
➔ Largely untackled: maintenance and updates (traditional authorities have
strong policies and practices which have benefit but can be stifling)
8. Lookup Usability Experiments
● Building on VitroLib designs and results
○ Context generally useful and navigation to authoritative sources
important
● Current LD4P2 usability work around Sinopia editor development
○ 6 participants across different institutions
○ Prototype based on LC BIBFRAME Editor (BFE)
○ Contextual information for persons and genre forms
○ Links to Wikipedia, ISNI, VIAF where available
○ Additional mockups
Slides from SWIB18 presentation; Folsom, Khan, et al.
9. A cataloger has a copy of a film
"Nowhere Boy" by "Sam Taylor", a
British director
12. A cataloger is trying to add genre to a
record, is "humorous" fiction the right term?
13. Lookup Usability: Preliminary Results
● Contextual information useful
○ Should also include related works, more identifying info
○ Identify source of information
● External sources such as university profiles, genre or type-specific
sites (e.g. Discogs)
● Vocabularies such as MeSH, AAT, Getty (depending on content)
● Links to Wikidata, ISNI, VIAF are useful to include
● Need consistent interface experience, use clearer icons
● Improve hierarchical navigation for subject areas/genre forms
14. Work Cycle I Data Flow Diagrams and Prototypes October 2018
Thanks to Astrid Usong, Stanford
15. Discogs -- External Source Data as Lookup
Recall - rarely start from scratch
Cataloging old 45s at Cornell
Exploring use of Discogs to generate
base record directly integrated with
the catalog editor tool
17. Community Scale Experiments & Challenges
➔ 15 organizations in LD4P2 cohort + project partners
➔ Test editor and lookup infrastructure in a number of cataloging projects
Caching needed because (most) authority sources don't provide sufficient and
stable infrastructure for lookups (also associated validation, cleaning,
transformation for non-LD sources)
Static vs dynamic
➔ caching for static but need live query if one expects catalogers to create new
entities in "real time" and then be able to see them
➔ e.g. Wikidata - try against SPARQL API
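The static vs dynamic split above can be sketched as a lookup wrapper that caches static sources and always queries dynamic ones live. A minimal sketch under stated assumptions: `LookupService`, the `fetch` callable and the source keys are hypothetical stand-ins for real authority requests. The URL helper only builds a request against the public Wikidata Query Service endpoint; it does not send it, and its label-match query is illustrative.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

Fetch = Callable[[str, str], List[str]]  # (source, term) -> candidate ids


class LookupService:
    """Cache results for static sources; query dynamic sources every time."""

    def __init__(self, fetch: Fetch, dynamic_sources: Iterable[str]) -> None:
        self._fetch = fetch
        self._dynamic = set(dynamic_sources)
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    def lookup(self, source: str, term: str) -> List[str]:
        if source in self._dynamic:
            # New entities may appear at any moment, so never serve stale data.
            return self._fetch(source, term)
        key = (source, term)
        if key not in self._cache:
            self._cache[key] = self._fetch(source, term)
        return self._cache[key]


def wikidata_label_query_url(label: str, lang: str = "en", limit: int = 10) -> str:
    """Build (but do not send) an exact-label SPARQL request URL."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        f'SELECT ?item WHERE {{ ?item rdfs:label "{escaped}"@{lang} }} '
        f"LIMIT {limit}"
    )
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


# Stub fetch that records calls instead of contacting any service.
calls: List[Tuple[str, str]] = []


def stub_fetch(source: str, term: str) -> List[str]:
    calls.append((source, term))
    return [f"{source}:{term}"]


service = LookupService(stub_fetch, dynamic_sources={"wikidata"})
service.lookup("lcnaf", "Avram, Henriette")
service.lookup("lcnaf", "Avram, Henriette")    # served from cache
service.lookup("wikidata", "Henriette Avram")
service.lookup("wikidata", "Henriette Avram")  # queried live again
print(len(calls))
```

A real cache would also need expiry and refresh, which ties into the maintenance and update questions raised earlier.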
18. Discovery Experiments
Primary purpose of library knowledge graph is to enable discovery of library
resources -- the benefits of linked data are so far unproven
➔ Parallels with ideas for lookups and linking
➔ Indexing -- already do some light inferencing from MARC into Solr (e.g.
broader terms, alternates). What other data inclusion or inference is useful?
➔ Individual libraries too small to develop search systems. Considerable effort
around a Solr/Ruby system called Blacklight where UI interactions are
studied and improved together. What is broadly reusable?
➔ Most linked data UIs are awful! What good examples might we learn from?
LD4 Discovery Affinity Group having open biweekly calls
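The light inferencing mentioned under Indexing (adding broader terms at index time) can be sketched as a walk up a broader-term map before a document goes to the index. A minimal sketch: the `BROADER` map, its terms, the `subject_broader` field name and `with_broader_terms` are hypothetical, and no Solr client is involved.

```python
from typing import Dict, List

# Toy broader-term map for illustration; not taken from a real vocabulary.
BROADER: Dict[str, List[str]] = {
    "Humorous fiction": ["Fiction"],
    "Fiction": ["Literature"],
}


def with_broader_terms(terms: List[str], broader: Dict[str, List[str]]) -> List[str]:
    """Return the terms plus all transitive broader terms, in first-seen order."""
    seen: List[str] = []
    stack = list(reversed(terms))
    while stack:
        term = stack.pop()
        if term in seen:  # also guards against cycles in the map
            continue
        seen.append(term)
        stack.extend(reversed(broader.get(term, [])))
    return seen


doc = {"id": "example-1", "subject": ["Humorous fiction"]}
expanded = with_broader_terms(doc["subject"], BROADER)
# Keep inferred terms in a separate field so they can be weighted differently.
doc["subject_broader"] = [t for t in expanded if t not in doc["subject"]]
print(doc["subject_broader"])
```

Keeping inferred terms apart from asserted ones leaves room to test whether the extra recall helps or hurts relevancy.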