Tdwg 2015-nicolson-kew-mobilisation

This talk from TDWG 2015 presents a simple model for mobilizing biodiversity data from a data-rich, diverse organisation, based on open-source tools compatible with those taught in the Data Carpentry syllabus. It presents an open-source toolkit (https://github.com/RBGKew/Reconciliation-and-Matching-Framework ; http://data1.kew.org/reconciliation/) for configuring an Open Refine (http://openrefine.org/) compatible reconciliation service over any tabular file or structured database.

"Reconciliation" is the process of converting a text string representation of a thing into a usable identifier for that thing, e.g. converting the text string "Tahina spectabilis" to "http://ipni.org/urn:lsid:ipni.org:names:77086615-1". Although the toolkit was first developed for scientific name reconciliation, it can be configured to reconcile any entity type (people, specimens, etc.). Micro-components of the tool (for data transformations - https://github.com/RBGKew/String-Transformers) are available as drop-ins for the Open Refine data cleaning package.

This approach is an alternative to existing service development, which has largely been aimed at technical users. The guiding principle is to open data services to a wider range of users by lowering the barrier to entry, so that hands-on scientists and data curators - those who know their data best - can link it with external sources. Technical choices were made to fit with the approaches taught in the Software Carpentry and Data Carpentry initiatives (http://datacarpentry.org/). The toolkit aids progress towards Tim Berners-Lee’s Linked Open Data principle #4, "Refer to other things using their HTTP URI-based names when publishing data on the Web", and shows how we can build the foundations of the biodiversity knowledge graph.

A simple model for large-scale data
mobilization across a diverse
organisation
Nicky Nicolson, RBG Kew
@nickynicolson
Biodiversity Information Standards (TDWG) annual meeting
Nairobi, Kenya / 28th September – 1 October 2015

Linked Open Data
Principle #4: "Refer to other things using their
HTTP URI-based names when publishing data on
the Web"

Services philosophy (1)
... Put the services and tools in the hands of
researchers

Services philosophy (2)
... Use general tools, focus on unique problems

Connecting name data to other resources
Schinus longifolius var. paraguariensis
(Hassler) F. Barkley
Taxonomic status of 229196-2?
229196-2
Synonym

We’ve converted a name to an identifier
Schinus longifolius var. paraguariensis
(Hassler) F. Barkley
229196-2
Now we can use that identifier to add in
more data…

Different kinds of research…
http://bit.ly/plant-species-gendergap

Thanks to:
• Biodiversity Informatics team (Abigail Barker,
Matt Blissett, James Crowe, John Iacona, Rob
Turner, Alecs Gueder)
• Plant & fungal name curation team (Christine
Barker / Irina Belyaeva / Katherine Challis /
Rafael Govaerts / Paul Kirk / Heather Lindon /
Emma Williams)
• Data improvement team (Anna Lynch, Rachel
Witherow, Malin Rivers, Esther Wainwright-Deri)

@nickynicolson / n.nicolson@kew.org
http://bit.ly/k-names-service
http://github.com/RBGKew

Tdwg 2015-nicolson-kew-mobilisation

1. A simple model for large-scale data mobilization across a diverse organisation Nicky Nicolson, RBG Kew @nickynicolson Biodiversity Information Standards (TDWG) annual meeting Nairobi, Kenya / 28th September – 1 October 2015

20. “Biodiversity Knowledge Graph”

21. Linked Open Data Principle #4: "Refer to other things using their HTTP URI-based names when publishing data on the Web"

22. How?

23. Services philosophy (1) ... Put the services and tools in the hands of researchers

24. Services philosophy (2) ... Use general tools, focus on unique problems

27. http://bit.ly/k-names-service

28. Connecting name data to other resources Schinus longifolius var. paraguariensis (Hassler) F. Barkley Taxonomic status of 229196-2? 229196-2 Synonym

29. IPNI Reconciliation Service

30. IPNI Reconciliation Service

31. IPNI Reconciliation Service

36. We’ve converted a name to an identifier Schinus longifolius var. paraguariensis (Hassler) F. Barkley 229196-2 Now we can use that identifier to add in more data…

40. Connecting name data to other resources Schinus longifolius var. paraguariensis (Hassler) F. Barkley Taxonomic status of 229196-2? 229196-2 Synonym

42. Different kinds of research… http://bit.ly/plant-species-gendergap

44. Thanks to: • Biodiversity Informatics team (Abigail Barker, Matt Blissett, James Crowe, John Iacona, Rob Turner, Alecs Gueder) • Plant & fungal name curation team (Christine Barker / Irina Belyaeva / Katherine Challis / Rafael Govaerts / Paul Kirk / Heather Lindon / Emma Williams) • Data improvement team (Anna Lynch, Rachel Witherow, Malin Rivers, Esther Wainwright-Deri)

45. @nickynicolson / n.nicolson@kew.org http://bit.ly/k-names-service http://github.com/RBGKew Biodiversity Information Standards (TDWG) annual meeting Nairobi, Kenya / 28th September – 1 October 2015

Editor's Notes

This shows the kinds of data elements that Kew has collected and how they interlink to form a “knowledge graph”
Fieldwork
…carried out in a particular geographical region…
… collects physical material…
… accessioned into multiple specialist collections (e.g. herbarium, DNA bank, seed & living collections)
Duplicate specimens are shared with other organisations
Individual researchers, teams, and organisations are represented as agents
One key activity is for researchers to label specimens with determinations
A determination is a link between a specimen and the concept that it represents
Concepts fit into classifications
The core of the concept is the name, which has a special link to a specimen via type citation
Names and classifications are published in scientific literature, accessed via bibliographic citations
Concepts can be mapped to management classifications (for reporting purposes) and to phylogenies
Finally – once we have recognised species, we can assert facts about them – e.g. their physical characteristics, traits, distributions and uses
In summary: elements about the physical specimens
… Elements which use those physical specimens to define and name species
Assertions about species
Here, we show elements which are shared with other scientific and / or academic domains: geographic localities, people/teams/organisations and scholarly literature
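One way to picture the graph described in these notes is as a list of labelled edges between element types. The sketch below is only an illustration: the edge labels are paraphrased from the notes, the names `EDGES` and `connected` are hypothetical, and nothing here reflects how Kew actually stores its data.

```python
from collections import defaultdict, deque

# Element types and relationships paraphrased from the notes above.
# (subject, relationship, object) - labels are illustrative only.
EDGES = [
    ("Fieldwork", "carried out in", "Geographic region"),
    ("Fieldwork", "collects", "Specimen"),
    ("Specimen", "accessioned into", "Collection"),
    ("Determination", "labels", "Specimen"),
    ("Determination", "refers to", "Concept"),
    ("Concept", "fits into", "Classification"),
    ("Concept", "has core", "Name"),
    ("Name", "typified by", "Specimen"),
    ("Name", "published in", "Literature"),
    ("Fact", "asserted about", "Concept"),
]


def connected(edges, start, goal):
    """Return True if goal is reachable from start, ignoring edge direction."""
    adjacency = defaultdict(set)
    for subject, _label, obj in edges:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


# Fieldwork links to literature via specimens and the names they typify.
print(connected(EDGES, "Fieldwork", "Literature"))
```

The point of the sketch is that every element type can be reached from every other once the links exist, which is what makes identifiers for each node so valuable.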
If we want this rich graph of data, how do we build it?
Deb’s talk: what’s an API?
A walkthrough of matching a dataset containing names against some Kew data resources: match the names against IPNI, get an identifier, then ask other resources what they know about that identifier (i.e. name matching is isolated in one place).
Reconciliation service configured to run against an IPNI dataset. It can be configured to expose any tabular dataset or result set from a relational DB. Data are first transformed, using a set of rules defined in configuration, and then matched. Transformations handle things like gender agreement: "-us" on one side and "-a" on the other transform to the same form. Transformers can also handle authorship: "F. Barkley" and "F.A.Barkley" are probably the same.
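The transform-then-match idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the toolkit's actual rules: the function names (`normalise_epithet`, `normalise_name`, `normalise_author`) are hypothetical, the gender rule simply strips "-us", "-a" or "-um" endings, and the author rule keeps only the surname and first initial.

```python
import re


def normalise_epithet(word: str) -> str:
    """Strip Latin gender endings so -us, -a and -um forms compare equal."""
    return re.sub(r"(us|a|um)$", "", word.lower())


def normalise_name(name: str) -> str:
    """Apply the epithet rule to every whitespace-separated word of a name."""
    return " ".join(normalise_epithet(word) for word in name.split())


def normalise_author(author: str) -> str:
    """Reduce an author string to 'surname first-initial' (assumed rule)."""
    tokens = [t for t in re.split(r"[.\s]+", author.strip()) if t]
    if not tokens:
        return ""
    surname = tokens[-1].lower()
    initial = tokens[0][0].lower() if len(tokens) > 1 else ""
    return f"{surname} {initial}".strip()


# Both sides are transformed first, then compared for equality.
print(normalise_name("Schinus longifolius") == normalise_name("Schinus longifolia"))
print(normalise_author("F. Barkley") == normalise_author("F.A.Barkley"))
```

Transforming both the query and the reference data with the same rules keeps the matching step itself a plain equality test, which is easy to reason about and to configure.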
A user can explore the service using a web interface
This shows the results of the query.
Use Open Refine for large volumes of data: load the data into Open Refine, identify the column of scientific names that you want to “reconcile” (send to the service), then choose “Reconcile” > “Start Reconciling” on that column.
Select the IPNI service
The data are sent to the service (via JSON over HTTP), and IPNI ids (with hyperlinks to IPNI) are brought back.
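The JSON exchange can be sketched as follows, assuming the service follows the batch query convention used by Open Refine reconciliation services (a form-encoded `queries` parameter keyed `q0`, `q1`, ...). No request is actually sent here; the helper names are hypothetical and `SAMPLE_RESPONSE` is a hand-written illustration whose score is invented, not real output from the IPNI service.

```python
import json
from urllib.parse import urlencode


def build_queries(names):
    """Build the batch query object: one key (q0, q1, ...) per name."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def encode_request_body(names):
    """Form-encode the batch as the 'queries' parameter of a POST body."""
    return urlencode({"queries": json.dumps(build_queries(names))})


def best_ids(response_text):
    """Map each query key to the id of its first confident match, or None."""
    response = json.loads(response_text)
    ids = {}
    for key, value in response.items():
        matched = [c["id"] for c in value.get("result", []) if c.get("match")]
        ids[key] = matched[0] if matched else None
    return ids


# Illustrative response only: the id and name come from the slides,
# the score is a made-up placeholder.
SAMPLE_RESPONSE = json.dumps({
    "q0": {"result": [{
        "id": "229196-2",
        "name": "Schinus longifolius var. paraguariensis",
        "score": 100,
        "match": True,
    }]}
})

print(encode_request_body(["Schinus longifolius var. paraguariensis"]))
print(best_ids(SAMPLE_RESPONSE))
```

Open Refine performs this exchange itself; the sketch is only meant to show what travels over the wire in each direction.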
The ID can be extracted and held in its own column.
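Outside Open Refine, pulling the bare ID out of a longer identifier could look like the sketch below. It assumes IPNI IDs have the digits-hyphen-digits shape seen in this talk (e.g. "229196-2", "77086615-1"); the function name `extract_ipni_id` is hypothetical.

```python
import re

# Assumed shape of an IPNI name ID: digits, a hyphen, digits, at the end
# of the string (as in "229196-2" or "...names:77086615-1").
IPNI_ID = re.compile(r"(\d+-\d+)$")


def extract_ipni_id(identifier: str):
    """Return the trailing IPNI ID from a URL, LSID or bare ID, else None."""
    match = IPNI_ID.search(identifier.strip())
    return match.group(1) if match else None


print(extract_ipni_id("http://ipni.org/urn:lsid:ipni.org:names:77086615-1"))
print(extract_ipni_id("229196-2"))
```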
Choose “Add columns from TPL…”
TPL gives us a list of “properties” it knows names have. I’ve chosen “taxonomic status”, and there’s a preview on the right.
A minute later, an extra column has been added with the status from TPL. The fourth name from the top is a synonym, but this real dataset shouldn’t have had any synonyms in it.
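Conceptually, adding the column is a lookup keyed on the identifier, after which unexpected synonyms are easy to flag. The sketch below assumes rows are plain dictionaries with a hypothetical `ipni_id` field and that the statuses have already been fetched into a dictionary; the single status shown is the one from the slides, and the second row is an invented unmatched record.

```python
def add_status_column(rows, status_by_id):
    """Return copies of rows with 'taxonomic_status' looked up by 'ipni_id'."""
    return [
        {**row, "taxonomic_status": status_by_id.get(row.get("ipni_id"))}
        for row in rows
    ]


def find_synonyms(rows):
    """Return the rows whose looked-up status is 'Synonym'."""
    return [row for row in rows if row.get("taxonomic_status") == "Synonym"]


# Status taken from the slides; the lookup table itself is illustrative.
STATUS_BY_ID = {"229196-2": "Synonym"}

ROWS = [
    {"name": "Schinus longifolius var. paraguariensis", "ipni_id": "229196-2"},
    {"name": "Unmatched example name", "ipni_id": None},
]

enriched = add_status_column(ROWS, STATUS_BY_ID)
for row in find_synonyms(enriched):
    print(row["name"], row["ipni_id"])
```

Rows without a matched identifier simply receive no status, so they can be reviewed separately rather than being dropped.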
Users of the service include research & development staff at many institutes – largely without support from Kew (using Open Refine user support material)
Linking the data like this enables us to do different kinds of research
