This presentation was provided by David Kuilman of Elsevier during the NISO event "Content Presentation: Diversity of Formats." The webinar was held on February 10, 2021.
1. David Kuilman, Gina Donato, Dr. Rinke Hoekstra
A content standard for data-platform use cases: Content Profiles & Linked Documents
NISO Diversity of formats
February 10, 2021 11:00am
Working Group initiative to create a NISO standard for the interchange
of academic, research, and professional content, data, and semantics
3. (Early) access and visibility
Expedite shapes
Lineage
Provenance
Policy / license
Priority of content and authorship
Content is data
Content and data operate seamlessly
Content structure follows document entity structure
Rich HTML5 literals for UI/UX use cases
Role-based processing
Content typology
Granular
Context-based using process and purpose intelligence
Content is shared
All content can be leveraged throughout the platform by all contributor/consumer roles using a common vocabulary
Zero organisational boundaries
Policies for compliance
Continuous flow and hydration
Partial and complete resources
Extensible types and enrichments
Optimisation of formats
Machine learning
Human interaction
Agile, extensible and resilient
Fast services development
Nimble models
Extensible models
Arbitrary content (types)
Service level agreement
Handle exception flows gracefully and informed
Business requirement: from a content perspective
4. Anatomy of content entity processes on a data platform
Entity-driven workflow: Source data → Harvesting → Normalisation → Extraction → Matching → Linking → Curation → Publishing
Classic document-driven workflow: Manuscript → Internal format → Copyedit → Master copy → Product
mappings mappings
5. The Content Profiles & Linked Document standard (CP/LD) is the result of
adopting content platform principles to provide the flexibility, extensibility and
connectivity required on a
data platform for academic, research and professional content
Let's consider a few critical design considerations first…
Pipeline to cyclic
Human-in-the-loop
Merging data entities and content entities on demand
7. Key concept: think human-in-the-loop and machine learning
Sourcing
Harvesting
Normalizing
Extraction
Matching
Linking
Publishing
Gold set
Test sets
Human curation within content-centric workflows
Human curation within machine learning
Contributor
Consumer
Continuous improvement
Content operations
Platform operations
Continuous deployment
Model operations
Content artefacts
Enhanced content artefacts
Human supervised
Content usage metrics
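The human-in-the-loop cycle on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` type, the 0.8 confidence threshold, the example labels, and the idea that curated items are appended to a gold set for the next training round. It is not the presenters' implementation.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Candidate:
    text: str          # surface string found in the content
    label: str         # label proposed by the model
    confidence: float  # model score in [0, 1]

def route(candidates, threshold=0.8):
    """Split candidates into auto-accepted ones and ones needing human curation."""
    accepted = [c for c in candidates if c.confidence >= threshold]
    review = [c for c in candidates if c.confidence < threshold]
    return accepted, review

def curate(candidate, corrected_label):
    """Record a human decision; curated items are treated as certain."""
    return replace(candidate, label=corrected_label, confidence=1.0)

def run_cycle(candidates, human_labels, gold_set, threshold=0.8):
    """One pass of the cycle: route, curate what a human has labelled, grow the gold set."""
    accepted, review = route(candidates, threshold)
    curated = [curate(c, human_labels[c.text]) for c in review if c.text in human_labels]
    gold_set.extend(curated)  # hypothetical: feeds the next model training round
    return accepted + curated
```

In this sketch, a low-confidence candidate such as `Candidate("common cold", "Virus", 0.4)` is sent to review, and the human correction both enters the published output and extends the gold set.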
8. The CP/LD standard uses established standards to create the
format framework that supports data platform content
operations without compromise
Linked data and HTML5 unite syntax, structure and semantics
needed on the platform
9. HTML5 + JSON-LD
Structured narrative
Semantic data layer
XHTML dialect
Linked Data
Usage standard and guidelines
Independent of any particular use case
Content Profile standard & Linked Document
XML Schema
RDF Schema
SHACL
XML
Schema
RDF: Discovery
XML: consistency
JSON: messaging
JSON-LD: knowledge infusion
HTML5: representation
Business roles
10. This is a part of text that has a specific style (italic)
This is a paragraph
This paragraph is the abstract of the paper
This paragraph is the title of the paper
This is author Alba Grifoni
This is a citation of another paper
This is a result reported on in this paper
This is a mention of the “COVID-19” concept
This is a mention of the “SARS-CoV2” concept
This states that “SARS-CoV2” reactive “CD4+ T-cells” exist in ~40%-60% of unexposed individuals, suggesting cross-reactive T-cell recognition with “common cold”
doi:10.1126/sciimunol.aan5393
“55425663600”
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (COVID-19)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (T-CD4+)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2 (common cold)
reactive to
reactive to
The anatomy of a Linked Document
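A minimal sketch of how a narrative layer (HTML5) and a semantic layer (JSON-LD) might be paired in one Linked Document. The `ex:` vocabulary, the element ids, the placeholder text, and the base IRI behind the `hgraph` prefix are all assumptions; only the two `hgraph:id-…` identifiers are taken from the slide. The actual CP/LD vocabulary is not shown in this deck.

```python
import json

# Narrative layer (HTML5). Element ids and text are placeholders for illustration.
NARRATIVE = """<article id="doc">
  <h1 id="title">Title of the paper</h1>
  <p id="abstract">Abstract mentioning <span id="m1">COVID-19</span>
  and <span id="m2">SARS-CoV2</span>.</p>
</article>"""

# Semantic layer (JSON-LD). The "ex" vocabulary and the hgraph base IRI are
# assumptions; only the hgraph identifiers themselves come from the slide.
SEMANTICS = {
    "@context": {
        "ex": "https://example.org/cpld#",
        "hgraph": "https://example.org/hgraph/",
    },
    "@graph": [
        {"@id": "#title", "@type": "ex:Title"},
        {"@id": "#abstract", "@type": "ex:Abstract"},
        {"@id": "#m1", "@type": "ex:Mention",
         "ex:refersTo": {"@id": "hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd"}},
        {"@id": "#m2", "@type": "ex:Mention",
         "ex:refersTo": {"@id": "hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c"}},
    ],
}

def mentions(semantics):
    """Return {fragment id: concept id} for every mention node."""
    return {n["@id"]: n["ex:refersTo"]["@id"]
            for n in semantics["@graph"] if n.get("@type") == "ex:Mention"}

def dangling(narrative, semantics):
    """List semantic nodes whose fragment id has no anchor in the narrative."""
    return [n["@id"] for n in semantics["@graph"]
            if 'id="' + n["@id"].lstrip("#") + '"' not in narrative]

# The semantic layer serialises to plain JSON for messaging.
SERIALISED = json.dumps(SEMANTICS, indent=2)
```

Keeping the two layers separate but linked by fragment ids means the HTML can be rendered as-is while the JSON-LD can be loaded into a graph; `dangling` is a simple consistency check between them.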
13. Activating the platform: merge topics and create a product view
After merging the topics, the
finished view offers:
• A manuscript becomes a Document
• The position of an abstract and a conclusion
• A person has been identified as author
• The author string has been identified within the document
• The author has entity attributes
• The document assembly is a scientific article of type ‘Finished’ because it satisfies the above criteria
merge
Article Author
Author attributes
Abstract
Author String
Conclusion
Outside document
Inside document
HTML5 vocabulary
JSON-LD predicates
Relationships legend
A finished article
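The merge described on this slide can be sketched as follows, assuming each topic is a plain dictionary of facts about the same document. The key names, the `merge_topics` and `classify` helpers, and the ‘Unfinished’ label are hypothetical; the slide only names the ‘Finished’ type and the criteria it depends on.

```python
# Criteria from the slide, expressed as hypothetical keys in the merged view.
REQUIRED = ("abstract", "conclusion", "author", "author_string", "author_attributes")

def merge_topics(*topics):
    """Union topic dictionaries; the first topic to state a key wins."""
    view = {}
    for topic in topics:
        for key, value in topic.items():
            view.setdefault(key, value)
    return view

def classify(view):
    """Return ('Finished', []) when every criterion is met, else the missing keys."""
    missing = [key for key in REQUIRED if not view.get(key)]
    if missing:
        return "Unfinished", missing
    return "Finished", []

# Example topics: one from inside the document, one from outside it.
inside = {"abstract": "#abstract", "conclusion": "#conclusion", "author_string": "Alba Grifoni"}
outside = {"author": "55425663600", "author_attributes": {"name": "Alba Grifoni"}}
```

With these inputs, `classify(merge_topics(inside, outside))` yields `("Finished", [])`, while either topic on its own is reported as unfinished with the list of missing criteria.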
14. Key takeaways
• Content is data; treat it as data, not as documents
• Normalization is the great divider: from files to entities, items and assertions
• Entity-designed data and author-designed data become blended
• Machine learner and researcher forge an alliance
On standards & formats…
• RDF and XML Schema technology remain the backbone for information modelling
• JSON, JSON-LD and HTML5 serialisations are dominant for content standards
Further information:
Editor's Notes
XML DTD 5.6 (OPS), XOCS… Common Index Profile (CIP) -> structure & metadata
NLP: CM2, FPE, Leadmine, MedScan, Termite (SciBite) …
Linking: Parity, FPE, …