10 years of global biodiversity databases: are we there yet?
Tony Rees
Independent data consultant,
Northern Rivers region, New South Wales,
Australia
previously: CSIRO Marine &
Atmospheric Research, Hobart, Tasmania
Global ocean bio-records in OBIS, 2015
The vision: “Biodiversity information on every desktop” [ / device]…
• A global taxon inventory
  – up-to-date species lists, synonymies, etc. (for all groups)
• Citations, links to primary literature
  – direct access to the primary taxonomic literature (for all described taxa), including full text (preferably…)
• “All” georeferenced records accessible, for all species
  – no need for individuals to do the data aggregation
  – map local / regional / global records, show details for any data item
• Indexes of taxon traits
  – e.g. to support sort / filter / group by…
• Predictive mapping / computed range maps for all taxa
  – fill sampling gaps via niche modelling, to produce comprehensive global species maps
• Plus more (phylogenies, illustrations, genetics, descriptions, keys…)
A standardised approach for this talk
 Rationale for each component/
activity (why do we care?)
 Size of the problem (or sub-problem)
 Who is addressing it (and what they are
doing)
 How far have they (“we”) got, and how
much is still to be done…
Some other points to consider:
• open vs. closed access to relevant content (who can access?)
• machine vs. human retrievability ( -> services, not just pages to view)
• degree of consolidation available (saves querying multiple resources)
• web query only, or are the base data available for export/user upload
A global taxon inventory
From presentation by Quentin Wheeler, International Institute for Species Exploration (IISE):
A global taxon inventory – why do we care?
 Useful to know with what organisms we share the planet
 History of life as its own study area, also key to understanding
present life forms and their relationships
 Ensure taxonomic names do not get accidentally re-used
 Construct list once, use many times (no need to re-create
from scratch)
 Reconcile old names / synonyms to current taxon concepts
(important for data integration)
 Provide “taxonomic backbone” underpinning other
biodiversity activities / projects.
How many species? Arthur Chapman asked the experts:
From Chapman’s summary document (2009 edition):
NB “Others” includes non-green
algae, Protista, prokaryotes and
viruses (refer document for details)
“Invertebrates” includes 1m insects,
360k others (incl. 102k arachnids, 85k
molluscs, 47k Crustacea)
“Estimated” total spp. for world is
11.3m i.e. only 17% of estimated
global biodiversity yet named (!)
…add another 200k-300k(?) for known fossil species, maybe
multiply 2x-3x to include synonyms…
…gives upwards of 5m species names to
catalogue/organise (+0.5m genera…)
+ new species descriptions (~20k/year) and higher taxa,
also new combinations (??/year)
5m+
names!
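The arithmetic behind the "5m+" figure can be retraced in a few lines. This is only a rough sketch: the count of named species is back-calculated from the 17% and 11.3m figures quoted above rather than read from Chapman's tables, and all variable names are illustrative.

```python
# Rough re-derivation of the "5m+ names" estimate from the figures on the slides.
estimated_total_species = 11.3e6   # Chapman (2009) estimate quoted above
fraction_named = 0.17              # "only 17% ... yet named"

# Assumption: back-calculated here, not taken from Chapman's tables.
named_extant = estimated_total_species * fraction_named

fossil_low, fossil_high = 200e3, 300e3   # "another 200k-300k(?)" fossil species
synonym_low, synonym_high = 2, 3         # "multiply 2x-3x to include synonyms"

names_low = (named_extant + fossil_low) * synonym_low
names_high = (named_extant + fossil_high) * synonym_high

print(f"named extant species: about {named_extant / 1e6:.2f}m")
print(f"species names to catalogue: {names_low / 1e6:.1f}m to {names_high / 1e6:.1f}m")
```

Under these assumptions the range comes out at roughly 4m to 7m names, which brackets the "upwards of 5m" on the slide.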
From group estimates to global lists
ION – Index to Organism Names
www.organismnames.com
2005: 1.8m names, all ranks (incl. synonyms), from 2.8m documents
2015: 5.2m names, all ranks (incl. synonyms), from 3.5m documents
- Animal names only (+ protists), cf. other resources for plants etc.
Likes:
• Comprehensive coverage (“most” zoological names held)
• Includes fossils as well as extant taxa
• Cites original publications for most post-1860 names
• Low latency (c. 6 months to name appearance in index)
• Some tax. hierarchy for all names
• ION ID minted for every name (usefulness varies)
Dislikes:
• Many more names than taxa (spelling + authority variations, synonyms, bad data) – needs deduplication before use
• Hard to work out which is “correct” name or what names are synonyms, etc.
• Some quirks in citations as given (including author spellings)
• Detailed publication and taxon info is behind paywall
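The "needs deduplication before use" point for ION can be illustrated with a deliberately crude sketch. It groups name strings that differ only in case, punctuation or authority spelling by keying on the first word plus the year. The input strings are made up for illustration (including the placeholder "Examplegenus"), the function names are hypothetical, and a real workflow would need proper fuzzy matching rather than this shortcut.

```python
import re
from collections import defaultdict

YEAR = re.compile(r"\b(1[7-9]\d\d|20\d\d)\b")

def name_key(raw):
    """Crude matching key: (lower-cased first word, year of publication)."""
    year = YEAR.search(raw)
    return raw.split()[0].lower(), (year.group(1) if year else "")

def group_variants(raw_names):
    """Group raw name strings that share the same crude key."""
    groups = defaultdict(list)
    for raw in raw_names:
        groups[name_key(raw)].append(raw)
    return dict(groups)

# Illustrative strings only, not an extract from ION.
variants = [
    "Physeter Linnaeus, 1758",
    "Physeter Linnaeus 1758",
    "PHYSETER Linn., 1758",
    "Examplegenus Author, 1900",
]
groups = group_variants(variants)
for key, members in groups.items():
    print(key, "<-", members)
```

Keying on name plus year ignores the authority spelling altogether, so it would wrongly merge homonyms published in the same year; it is meant only to show the shape of the problem.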
[Chart: newly published names (all ranks) in ION, Nov 2015 (total 2.01m); series shown: names from Index Animalium (1758-1850) and names from Zoological Record (1864-current)]
Catalogue of Life
www.catalogueoflife.org
2005: 530k valid species names (+ ?? synonyms), from 23 databases
2015: 1.6m valid species names (+ 1.3m synonyms), from 151 databases
- All taxonomic groups, extant only (a few fossils starting 2015)
Likes:
• Name quality high (all expert-supplied), synonymies explicit, no (few) duplicates
• Internally consistent taxonomic hierarchy, kingdom -> family
• Coverage increasing over time (claims currently 84% of all extant species)
Dislikes:
• Some groups not yet covered (also no fossils)
• Synonymies not always complete (some old names not listed)
• No author, synonym information at ranks above species
• No links to original literature (although these may be traceable via source databases)
• More latency than ION (takes a while for new names to appear)
• No stable IDs for names (cannot use for linking to current edition)
Partial ION listing – search for “Physeter”
Note: 1. ION often includes the same name in multiple variants (mix of “good” and
“bad” content), giving over-representation of number of “real” names
2. This is a list of names, not taxa (single taxon can have multiple names, e.g.
valid name plus synonyms – not distinguished in ION).
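The distinction between a list of names and a list of taxa is easy to show with a lookup from each name to the name currently accepted for its taxon. The sketch below is hand-made for illustration and is not an extract from ION or the Catalogue of Life; the synonym relationship shown is the one assumed here, and the function names are hypothetical.

```python
from typing import Iterable, Optional

# Each name points at the name treated as valid for its taxon.
ACCEPTED_NAME = {
    "Physeter macrocephalus": "Physeter macrocephalus",  # valid name
    "Physeter catodon": "Physeter macrocephalus",        # treated here as a synonym
}

def resolve(name: str) -> Optional[str]:
    """Return the accepted name for a name string, or None if unknown."""
    return ACCEPTED_NAME.get(name)

def count_taxa(names: Iterable[str]) -> int:
    """Count distinct taxa behind a list of names, ignoring unknown names."""
    accepted = {resolve(n) for n in names}
    accepted.discard(None)
    return len(accepted)

print(len(ACCEPTED_NAME), "names,", count_taxa(ACCEPTED_NAME), "taxon")
```

A names-only index can report the first number but not the second, which is why synonymy has to be made explicit before names are used for data integration.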
Equivalent Cat. of Life search result for “Physeter”:
Selected other names/taxon databases of note
PaleoBioDB (fossils) – formerly PaleoDB
• 2005: 60k names, all ranks
• 2015: 320k names, all ranks (incl. synonyms)
– Good coverage of many fossil taxa (most groups)
World Register of Marine Species (WoRMS)
• 2007: first release, 75k valid species + ?? syns
• 2015: 230k valid species + 96k synonyms
– Excellent coverage of marine taxa (almost all
groups), incl. some fossils
Interim Register of Marine and Nonmarine Genera (IRMNG)
– Tony Rees / OBIS project
• 2006: first release, 159k genus names incl. synonyms
(the latter partly known, part not)
• 2015: 488k genus names incl. synonyms (also 1.9m
species names incl. synonyms)
– Comprehensive genus level coverage of all groups,
extant + fossil, not all assigned to family as yet
For other groups (examples, NB completeness varies)
(etc., etc.)
Linking names to the literature
Linking names to the literature – why do we
care?
 Initial publication / description / designated type is “anchor” for
every taxonomic name and concept
 Use to verify “indexing” details (taxon name + author, year) are
correctly represented
 Included text details (title < abstract < full text) can be “mined”
to extract information useful for indexing (or just reading)
 Entry point to wider literature via refs. list, subsequent citations,
etc.
E.g. from this (taxon name + author, year)…
to this (article full text)
Genus #1 in IRMNG: example “minimal” citation styles
Genus | Authority | Microcitation
Aa | |
Aa | Baker, 1940 |
Aa | Baker, 1940 | Bull. Bishop Mus., 165, 107
Genus #1 in IRMNG: example “better” (=standard) citation style
Genus | Authority | Microcitation | Full citation
Aa | | |
Aa | Baker, 1940 | |
Aa | Baker, 1940 | Bull. Bishop Mus., 165, 107 |
Aa | Baker, 1940 | | Baker, H.B., 1940. Zonitid snails from Pacific Islands. Part 2.-Hawaiian genera of Microcystinae. Bulletin Bishop Museum Honolulu, 165: 105-201.
ION has a subset of these (article title, citation only)
Genus #1 in IRMNG: example “best” citation style with online links
Genus | Authority | Microcitation | Full citation | Online link (abstract) | Online link (full text)
Aa | | | | |
Aa | Baker, 1940 | | | |
Aa | Baker, 1940 | Bull. Bishop Mus., 165, 107 | | |
Aa | Baker, 1940 | | Baker, H.B., 1940. Zonitid snails from Pacific Islands. Part 2.-Hawaiian genera of Microcystinae. Bulletin Bishop Museum Honolulu, 165: 105-201. | |
Aa | Baker, 1940 | | Baker, H.B., 1940. (etc.) | http://... (or DOI) |
Aa | Baker, 1940 | | Baker, H.B., 1940. (etc.) | | http://... (or DOI)
ION has a subset of these (article title, citation only)
BioNames (R. Page project) has some of these
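One way to picture the three citation styles is as a single record whose optional fields are filled in progressively. The sketch below is an assumption about how such a record might be modelled, not the IRMNG schema; the class name, field names and the `level` labels are hypothetical, and the example values are those shown on the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NameCitation:
    genus: str
    authority: Optional[str] = None
    microcitation: Optional[str] = None
    full_citation: Optional[str] = None
    abstract_url: Optional[str] = None
    full_text_url: Optional[str] = None

    def level(self) -> str:
        """Classify the record using the slide labels: minimal, better, best."""
        if self.abstract_url or self.full_text_url:
            return "best"
        if self.full_citation:
            return "better"
        return "minimal"

aa = NameCitation(
    genus="Aa",
    authority="Baker, 1940",
    microcitation="Bull. Bishop Mus., 165, 107",
    full_citation=(
        "Baker, H.B., 1940. Zonitid snails from Pacific Islands. "
        "Part 2.-Hawaiian genera of Microcystinae. "
        "Bulletin Bishop Museum Honolulu, 165: 105-201."
    ),
)
print(aa.genus, aa.authority, "->", aa.level())
```

Because no link is recorded for this entry, it classifies as "better"; adding either URL would move it to "best".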
Online access to scientific literature – 1
Q.: How many articles in the “scientific literature”?
A.: Guesstimate might be 180m total “scholarly articles”, 120m in all
sciences, 20m in biology over past 250 years
 Google Scholar: ~160m citations (all disciplines)
 Web of Science: 90m items indexed (1900 onwards)
 PubMed: 24m records (mostly 1966 onwards)
 Biological Abstracts: 12m records, 1926 onwards (includes some
non-journal material)
 Ideally would like single master list, unique ID/hyperlink for each work
(article/chapter/book etc.)
 DOI (Digital Object Identifier) system / CrossRef introduced in 2000,
good for newly published work
 currently used for 114m “objects” (incl. some retrospective allocation;
NB not all are scientific literature)
 “Publishers use CrossRef's tools to convert citations from dumb
strings to useful links” (quote from R. Page discussion post)
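Turning a "dumb string" into a link usually means two steps: asking a resolver for candidate matches, then building a link from the DOI that comes back. The sketch below only constructs the URLs and makes no network calls. It assumes the public Crossref REST API `works` endpoint with its `query.bibliographic` and `rows` parameters and the standard `https://doi.org/` resolver prefix; the function names and the example DOI are made up for illustration.

```python
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_lookup_url(citation: str, rows: int = 3) -> str:
    """URL asking Crossref for the closest matches to a free-text citation."""
    query = urlencode({"query.bibliographic": citation, "rows": rows})
    return f"{CROSSREF_WORKS}?{query}"

def doi_link(doi: str) -> str:
    """Resolvable link for a DOI (slashes kept, other unsafe characters escaped)."""
    return "https://doi.org/" + quote(doi, safe="/")

print(crossref_lookup_url("Ready et al., Ecological Modelling, 2010"))
print(doi_link("10.1234/example"))  # placeholder DOI, not a real article
```

Matches returned by such a lookup still need checking by eye or by a score threshold, since older taxonomic literature often has no DOI at all.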
Online access to scientific literature – 2
 Zoological Record has indexed 3.5m works in zoology 1864-
current (increasing at 70k/year, 1.5k/week), but individual
records are behind paywall
Online access to scientific literature – 3
 Biodiversity Heritage Library (BHL) is scanning older literature (esp.
pre-1923) and placing online
 limited subset indexed by article title, otherwise (all) indexed by journal
and page no. (then has BHL page ID – can link to that)
 search can be initiated by journal title, volume + page (if already
known)
 can also search by taxon scientific name – but some instances will be
missed (BHL OCR [optical character recognition] is less than 100%
reliable)
 this author’s experience looking for initial publication instances of
older names – success in around 1/3 of cases (not too bad), however
requires manual search (time consuming)
 ideally, original description page links should be compiled somewhere
for others to re-use (not currently done on any scale)
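Compiling original-description links for re-use could be as simple as a table of name against BHL page ID. The sketch below assumes BHL page links take the form `https://www.biodiversitylibrary.org/page/<page ID>`; the page ID shown is a placeholder and not the real ID of the Megablattina description, and the function and variable names are hypothetical.

```python
BHL_PAGE_URL = "https://www.biodiversitylibrary.org/page/{page_id}"

def bhl_page_link(page_id: int) -> str:
    """Build a link to a single scanned page from its BHL page ID."""
    return BHL_PAGE_URL.format(page_id=page_id)

# name -> BHL page ID of the original description (placeholder ID only)
original_descriptions = {
    "Megablattina Sellards, 1903": 1234567,
}

links = {name: bhl_page_link(pid) for name, pid in original_descriptions.items()}
for name, url in links.items():
    print(name, "->", url)
```

A shared table of this kind would let each manual search be done once and then re-used, which is the gap the last bullet describes.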
BHL sample page: American Journal of Science s4 v15 (1903) p. 312
(original description of Megablattina Sellards, 1903, a cockroach)
Online access to scientific literature – 4
 More recent literature – mix of publisher websites and operations like
JSTOR, often behind paywalls (though abstracts typically not so) – but not
all yet available digitally (BHL also has some post-1922 content)
 Subscription/abstracting services (Zoological Record, Web of Science,
etc.) have better coverage, but are often not open access for viewing or
external linking purposes (although PubMed is)
 Some tools constructed around planned all-encompassing “Bibliography
of Life” project (from Europe, http://biblife.org/), but progress difficult to
gauge as yet (claims 215k references held); another European project:
GRIB (Global References Index to Biodiversity), however development
appears to have stopped…
In summary: online [open] access available to subsets of article titles
> abstracts > full text in decreasing proportions
 No single comprehensive source of online refs. available at
this time, users must “mix and match” sources as available
 Few direct links in current tax. databases to literature that is
online (some noteworthy exceptions)
 Over 95% of taxonomic literature pre-dates year 2000 starting
point for DOIs
 Most comprehensive indexes are currently commercial
products (behind paywalls), not much traction in “community
/ open access” equivalents as yet.
Taxon traits
Machine-readable sets of taxon traits – why do
we care?
 Powerful tools for automated subsetting / filtering out sets of
interest
 Useful for data quality assurance (e.g. flag suspect data, fix
logical inconsistencies)
 Can form the basis of auto-response “expert systems” / keys
e.g. as already available for specialised groups
 Need for standardised vocabularies/ semantics for indexing
terms, units used, etc.
 Operations like OBIS (Ocean Biogeographic Information
System) want to display only (e.g.) marine + extant taxa,
suppress others
 No “trait bank” systems existed at that time, IRMNG was
created to fill this need: flag taxa as extant/fossil,
marine/nonmarine
 IRMNG data & flags subsequently incorporated into other systems
e.g. WoRMS, ALA, OTOL, EOL, more… – IRMNG flags are ~70%
complete at genus level, 95%+ for species
 EOL (Encyclopedia of Life) is establishing “TraitBank” (2014 on)
to capture similar traits + more
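The kind of filtering OBIS needed can be sketched with two flags per genus. This is an illustrative model only, not the IRMNG data structure: the class and function names are hypothetical, the "Genus A/B/C" entries are placeholders, and `None` is used here to stand for a flag that has not yet been assessed.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GenusFlags:
    name: str
    is_marine: Optional[bool]   # None = not yet assessed
    is_extant: Optional[bool]   # None = not yet assessed

def marine_extant(records: List[GenusFlags]) -> List[GenusFlags]:
    """Keep only genera positively flagged as both marine and extant."""
    return [r for r in records if r.is_marine is True and r.is_extant is True]

def unflagged(records: List[GenusFlags]) -> List[GenusFlags]:
    """Genera where at least one flag is still missing."""
    return [r for r in records if r.is_marine is None or r.is_extant is None]

records = [
    GenusFlags("Physeter", is_marine=True, is_extant=True),
    GenusFlags("Genus A", is_marine=False, is_extant=True),   # placeholder
    GenusFlags("Genus B", is_marine=True, is_extant=False),   # placeholder
    GenusFlags("Genus C", is_marine=None, is_extant=None),    # placeholder
]
print([r.name for r in marine_extant(records)])
print([r.name for r in unflagged(records)])
```

Keeping "unknown" distinct from "false" matters when flags are only partly complete, because unassessed genera can then be listed for follow-up instead of being silently suppressed.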
EOL TraitBank most populous content (Oct 2015)
Note, EOL is an aggregator, not an original content generator (relies on
content supplied by third parties)
EOL traits recently added to Google search, Nov 2015
Room for further development in this area…
 e.g. TDWG (Taxonomic Databases Working Group) had active
interest in development of “SPM” (Species Profile Model)
around 2007-8, seems a bit quiet since
 character matrices stored in computer-based keys e.g. Lucid,
DELTA, etc. could presumably be leveraged in some cases
 some domains already well covered in standard manner (e.g.
FishBase for 33k fishes, SeaLifeBase for 71k non-fish marine
taxa)
 SeaLifeBase example shown in next slide…
Displayed range data is from AquaMaps holdings, see later in this presentation
Assembling georeferenced species data
Assembling georeferenced species data – why
do we care?
 “Where” is as important as “what” in biodiversity studies
 Central repository much easier point of access than
thousands/millions of distributed sources
 See gaps in existing data holdings / state of current data sampling,
digitisation and mobilisation
 Overlay spatial distributions with other layers e.g. country
boundaries, habitats, environmental variables – generate regional
lists, understand controlling factors
 Spot bad data (appearing in unlikely places on the map)
 Use for spatial analysis (geography as computable data).
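Treating geography as computable data can start with something as small as a bounding-box test. The sketch below splits records into those inside and outside a region of interest, which serves both for regional lists and for flagging points that need a second look. The box limits, record IDs and coordinates are made up for illustration, and the box logic does not handle regions that cross the 180° meridian.

```python
from typing import NamedTuple

class Box(NamedTuple):
    south: float
    north: float
    west: float
    east: float

def contains(box: Box, lat: float, lon: float) -> bool:
    """True if the point lies within the box (no antimeridian handling)."""
    return box.south <= lat <= box.north and box.west <= lon <= box.east

# Illustrative limits only, not an official definition of the region.
AUSTRALIAN_REGION = Box(south=-50.0, north=-5.0, west=105.0, east=165.0)

# (record id, latitude, longitude): hypothetical records
records = [
    ("rec-1", -33.9, 151.3),
    ("rec-2", 40.0, -70.0),
]

inside = [rid for rid, lat, lon in records if contains(AUSTRALIAN_REGION, lat, lon)]
outside = [rid for rid, lat, lon in records if not contains(AUSTRALIAN_REGION, lat, lon)]
print("inside:", inside, "outside:", outside)
```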
Distributed data networks
 First data networks in USA, late 1990s – VertNET, HerpNET, ORNIS –
connecting museum data (vertebrate specimen records) in participating
agencies (also in Australia: Australian Virtual Herbarium)
 OBIS (2002 on) and GBIF (2004 on) provide gateways to both specimen
and observation data from multiple agencies worldwide
 OBIS (marine species records only):
 2005: 5.6m records from 38 data sources (40,700 species)
 2015: 44.9m records from 1,916 data sources (147,000 species)
 GBIF (all habitats):
 2005: 45m records from 334 data sources (?? species)
 2015: 577m records from 15,196 data sources (?? species)
 OBIS data flows into GBIF (though with some issues), also into local
networks e.g. ALA (Atlas of Living Australia)
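The record counts above imply growth rates that are easy to compute. The sketch below uses only the 2005 and 2015 record figures quoted for OBIS and GBIF; the compound annual rate assumes smooth growth over ten years, which is a simplification, and the names are illustrative.

```python
YEARS = 10

# (records in 2005, records in 2015), as quoted on the slide
holdings = {
    "OBIS": (5.6e6, 44.9e6),
    "GBIF": (45e6, 577e6),
}

def growth(old, new, years=YEARS):
    """Return (overall growth factor, implied compound annual growth rate)."""
    factor = new / old
    annual = factor ** (1 / years) - 1
    return factor, annual

summary = {name: growth(*counts) for name, counts in holdings.items()}
for name, (factor, annual) in summary.items():
    print(f"{name}: x{factor:.1f} overall, about {annual:.0%} per year")
```

On these figures OBIS grew roughly eightfold and GBIF roughly thirteenfold over the decade.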
Building OBIS – 2002-5
(trying to make a working system, and provide a good user experience)
“OBIS v2” front page /
spatial search interface, 2005
Current OBIS sample map & data
OBIS records for Physeter macrocephalus (sperm whale) in Australian
region, Oct 2015 (51,756 global records)
GBIF sample map & data
GBIF records for Physeter macrocephalus in Australian region, Oct 2015
(34,436 global records)
ALA (Atlas of Living Australia) presentation of records for Physeter macrocephalus in
Australian region, Oct 2015
How complete are holdings of GBIF, OBIS, etc.?
 From Hill et al., 2012 paper: at least 1 bn – 2 bn specimens in
biological collections worldwide (not all currently digitised)
 Observations probably outnumber specimens by 100x - 1000x
 Gives maybe 500 bn potential records +/- ; GBIF has 0.5 bn to
date (0.1%)…
 Not all records are of equal importance for initial studies of
distributions (much redundancy), maybe OBIS/GBIF have <5% of
most useful records at this time…
 Existing holdings presently heavily skewed towards better
sampled/accessible areas, also regions where digitisation is more
advanced
 True “target numbers” difficult to assess (every individual of every
species, or what?)
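The completeness estimate is a chain of rough multiplications, retraced below. All inputs are the figures quoted on the slide; the low and high bounds simply combine the extremes of the two ranges, and the variable names are illustrative.

```python
specimens_low, specimens_high = 1e9, 2e9     # "at least 1 bn – 2 bn specimens"
obs_ratio_low, obs_ratio_high = 100, 1000    # observations per specimen

potential_low = specimens_low * obs_ratio_low
potential_high = specimens_high * obs_ratio_high
potential_slide = 500e9                      # "maybe 500 bn potential records"

gbif_records = 0.5e9                         # "GBIF has 0.5 bn to date"
share_slide = gbif_records / potential_slide

print(f"potential records: {potential_low / 1e9:.0f} bn to {potential_high / 1e9:.0f} bn")
print(f"GBIF share at 500 bn: {share_slide:.1%}")
print(f"GBIF share across the range: "
      f"{gbif_records / potential_high:.3%} to {gbif_records / potential_low:.1%}")
```

The slide's 500 bn sits inside the 100 bn to 2,000 bn range, and the 0.1% share follows directly from it; the uncertainty in the inputs spans more than an order of magnitude either way.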
Predicted distributions (environmental niche
modelling)
From AquaMaps poster for e-biosphere conference, London, 2009
Predicted distributions (environmental niche
modelling) – why do we care?
 Available georeferenced data are always incomplete, need a
mechanism to intelligently fill in data gaps, produce more
complete biodiversity maps & atlases
 Move from hand drawn maps / non-digital “expert knowledge”
to computable data
 Model potential spread of invasives into new areas (show
suitable habitat)
 Model potential changes in species range in response to
changing climate or other factors
 Facilitate better understanding of broad- (and fine-) scale
factors controlling species distributions.
Niche modelling concept
 Range of methodologies available including MAXENT, GARP, simple niche
models e.g. Relative Environmental Suitability (RES)
 Ready et al., 2010 (incl. Tony Rees) contend that simple methods work as
well as more complex ones:
Source: A. Guisan group web page, Université de Lausanne, Switzerland
http://www.unil.ch/idyst/en/home/menuinst/research-poles/geoinformatics-and-spatial-m/predictive-biogeography/advancing-the-science-of-eco.html
J. Ready et al., Ecological Modelling, 2010
Global niche modelling/mapping projects
Lifemapper: Kansas University, c. 2003 onwards
• Models terrestrial niches (?only)
• No. of maps unclear (claims >100,000 species with data,
perhaps only a subset with maps)
• Uses GARP modelling (computationally intensive, several
hours per species map?), no expert review
AquaMaps: Kiel Marine Lab (+ co-developers), 2006 onwards
• Models marine niches only (plus some freshwater)
• 22,000 species mapped by Nov 2015 (incl. ~600 FW), mainly fishes
• Uses RES modelling (6*/8** environmental variables, <2 mins per species map) plus geographic partitioning and expert review
* Marine variables: bottom depth, water temperature (SST/bottom), salinity, primary production, sea ice concentration, distance to land
** FW variables: elevation, surface temperature, net primary productivity, soil pH, soil moisture, soil organic carbon, precipitation, compound topographic index
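A simple envelope model of the RES type can be sketched in a few lines, which helps explain why it runs so quickly. The version below assumes a trapezoidal response per variable (zero outside the absolute limits, one inside the preferred range, linear in between) and multiplies the per-variable scores for each cell. It is a minimal illustration, not the AquaMaps code: the envelope values are invented, the function names are hypothetical, and geographic partitioning and expert review are not represented.

```python
def suitability(value, minimum, pref_min, pref_max, maximum):
    """Trapezoidal response: 0 outside [minimum, maximum], 1 in the preferred range."""
    if value < minimum or value > maximum:
        return 0.0
    if value < pref_min:
        return (value - minimum) / (pref_min - minimum)
    if value > pref_max:
        return (maximum - value) / (maximum - pref_max)
    return 1.0

def cell_probability(cell_env, envelopes):
    """Multiply the per-variable suitabilities for one grid cell."""
    probability = 1.0
    for variable, limits in envelopes.items():
        probability *= suitability(cell_env[variable], *limits)
    return probability

# (minimum, preferred minimum, preferred maximum, maximum): invented values
envelopes = {
    "depth_m": (0.0, 10.0, 200.0, 500.0),
    "sst_c": (5.0, 10.0, 20.0, 25.0),
}

warm_shelf_cell = {"depth_m": 100.0, "sst_c": 22.5}
deep_cell = {"depth_m": 600.0, "sst_c": 15.0}

print(cell_probability(warm_shelf_cell, envelopes))
print(cell_probability(deep_cell, envelopes))
```

Because each cell needs only a handful of comparisons and multiplications, a global half-degree grid can be scored in seconds, in line with the "<2 mins per species map" quoted above.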
Lifemapper team at KU (Kansas University, USA)
Lifemapper example map for milk or Spanish snail (Helix lactea, now = Otala lactea)
(yellow dots are data points, red is potential habitat)
Building AquaMaps – 2005
(trying to make a working system, and the models fit the data…)
AquaMaps example map for New Zealand sea lion (Phocarctos hookeri)
(without expert review): Data points used
AquaMaps example map for New Zealand sea lion (Phocarctos hookeri)
(without expert review): Computed AquaMap
AquaMaps example map for New Zealand sea lion (Phocarctos hookeri)
(without expert review): All suitable habitat
AquaMaps example map for New Zealand sea lion (Phocarctos hookeri)
(without expert review) : All suitable habitat (detail)
Detail (square size = 50 km nominal for global coverage)
AquaMaps example maps for New Zealand sea lion (Phocarctos hookeri)
(without expert review): Current vs. computed year 2100 range
2010 2100
From hand drawn map to computable data…
(this example: Physeter macrocephalus)
from http://us.whales.org/species-guide/sperm-whale
AquaMaps map (including probability of occurrence by 0.5 degree square)
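Making a range map computable starts with assigning every point to a grid cell. The sketch below maps a latitude and longitude to the south-west corner of its 0.5 degree cell and gives the approximate north-south size of a cell, using roughly 111 km per degree of latitude. It is a plain grid index for illustration, not the c-squares notation, and the function names are hypothetical.

```python
import math

CELL_SIZE_DEG = 0.5
KM_PER_DEGREE_LAT = 111.32  # approximate

def cell_origin(lat, lon, size=CELL_SIZE_DEG):
    """South-west corner of the grid cell containing the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

def cell_height_km(size=CELL_SIZE_DEG):
    """Approximate north-south extent of one cell in kilometres."""
    return size * KM_PER_DEGREE_LAT

print(cell_origin(-33.4, 151.6))
print(f"{cell_height_km():.1f} km north-south")
```

The east-west width of a cell shrinks towards the poles, which is why a cell size quoted in kilometres can only be nominal.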
Pooling all the range data
AquaMaps summary plot – all marine species overlaid (22,800 as at Nov 2015)
Click on the map to generate “what lives here” queries…
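Once every species has a probability per cell, a "what lives here" query is just a lookup on the clicked cell with a probability cut-off. The sketch below shows the idea with a hand-made index; the cell key, probabilities, threshold and the "Species X" entry are invented for illustration and are not AquaMaps output.

```python
def what_lives_here(cell, cell_index, threshold=0.5):
    """Species predicted in a cell at or above the threshold, most likely first."""
    species = cell_index.get(cell, {})
    hits = [(name, p) for name, p in species.items() if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# cell south-west corner -> {species: probability of occurrence}; invented values
cell_index = {
    (-33.5, 151.5): {
        "Physeter macrocephalus": 0.8,
        "Rhincodon typus": 0.6,
        "Species X": 0.2,
    },
}

for name, probability in what_lives_here((-33.5, 151.5), cell_index):
    print(f"{name}: {probability:.1f}")
```

Predicted lists of this kind still need checking against primary records, as the whale shark example on the following slides shows.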
AquaMaps spatial query result, November 2015:
Any primary data to back this up? E.g. search in Atlas of
Living Australia…
Yes, there ARE whale sharks on NSW central coast…
Niche modelling is a very current discussion topic: e.g. GBIF/AMNH symposium
4 November 2015 (no proceedings as yet)
So – “are we there yet?”
Component | 2005 | 2015 | Status (/5)
Global taxon inventory – all species names (with synonyms) | 25%? | 60%+ | nnn(n)
All names linked to the literature (original descriptions), at least minimally | 5%? | 10-20%? | n(n)
Taxon traits databased, in machine-addressable form | 0 | 10%+? | n(n)
Distribution data (specimens, observations) in online systems | <1% | 5%+? | n
Predicted distributions/global range maps for all taxa | 0? | 5%+? (fishes 60%+) | n(n)
A report card to date…
 Take home message: progress is definitely being made,
however plenty still to do:
 Complete master names lists, release as open data (also deal
with inflow of new names and taxonomic dynamism)
 Improve online access to tax. literature (plus embedded links
from relevant databases)
 More data into OBIS & GBIF (including datasets not yet
digitised)
 More progress on
predictive mapping
(algorithms, base data,
habitat factors, species
covered).
Thank you!
Tony Rees Tony.Rees@marinespecies.org
◦ CSIRO Marine Research applications developer 1998-2014 including CAAB
(Codes for Australian Aquatic Biota), c-squares and Taxamatch
◦ OBIS steering committees (various) / system developer 2002-2005
◦ AquaMaps project co-developer 2004-current
◦ IRMNG developer 2006-current
◦ OBIS Australia Node manager 2006-2014
◦ Global Names Project collaborator 2006-current
◦ WoRMS contributor 2007-current
◦ GBIF & Open Tree of Life collaborator 2010-current
◦ iPlant collaborator 2010-2013
◦ Atlas of Living Australia consultant 2010-2012
◦ Catalogue of Life global team member 2010-2012
◦ GBIF Ebbe Nielsen Prize (for excellence in Biodiversity Informatics) winner 2014.
This talk available at: www.slideshare.net/tony1212/presentations

 
Mla May 7
Mla May 7Mla May 7
Mla May 7
 
Biodiversity Heritage Library: A Conversation About A Collaborative Digitizin...
Biodiversity Heritage Library: A Conversation About A Collaborative Digitizin...Biodiversity Heritage Library: A Conversation About A Collaborative Digitizin...
Biodiversity Heritage Library: A Conversation About A Collaborative Digitizin...
 
The power of names smithsonian talk-2013-iczn_nomenclature&bioinformatics-v2
The power of names smithsonian talk-2013-iczn_nomenclature&bioinformatics-v2The power of names smithsonian talk-2013-iczn_nomenclature&bioinformatics-v2
The power of names smithsonian talk-2013-iczn_nomenclature&bioinformatics-v2
 
The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this...
The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this...The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this...
The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this...
 
IRMNG presentation March 2012
IRMNG presentation March 2012IRMNG presentation March 2012
IRMNG presentation March 2012
 
An International Cooperative Digital Library for Taxonomic Literature: The Bi...
An International Cooperative Digital Library for Taxonomic Literature: The Bi...An International Cooperative Digital Library for Taxonomic Literature: The Bi...
An International Cooperative Digital Library for Taxonomic Literature: The Bi...
 
The Biodiversity Heritage Library
The Biodiversity Heritage LibraryThe Biodiversity Heritage Library
The Biodiversity Heritage Library
 
Biodiversity Heritage Library : Development and Partnerhips
Biodiversity Heritage Library : Development and PartnerhipsBiodiversity Heritage Library : Development and Partnerhips
Biodiversity Heritage Library : Development and Partnerhips
 
Special Libraries Associatin
Special Libraries AssociatinSpecial Libraries Associatin
Special Libraries Associatin
 
Global Names Architecture - Remsen
Global Names Architecture - RemsenGlobal Names Architecture - Remsen
Global Names Architecture - Remsen
 
Taxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicosTaxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicos
 
Bi 2005 20
Bi 2005 20Bi 2005 20
Bi 2005 20
 
Botanists and annotations: use cases and their relevance for the larger scie...
Botanists and annotations:  use cases and their relevance for the larger scie...Botanists and annotations:  use cases and their relevance for the larger scie...
Botanists and annotations: use cases and their relevance for the larger scie...
 
Flora, Revision and Monograph
Flora, Revision and  MonographFlora, Revision and  Monograph
Flora, Revision and Monograph
 
Global Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage LibraryGlobal Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage Library
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 

10 Years of Global Biodiversity Databases: Are We There Yet

  • 1. 10 years of global biodiversity databases: are we there yet? Tony Rees Independent data consultant, Northern Rivers region, New South Wales, Australia previously: CSIRO Marine & Atmospheric Research, Hobart, Tasmania Global ocean bio-records in OBIS, 2015
  • 2. The vision: “Biodiversity information on every desktop” [ / device]…  A global taxon inventory  up-to-date species lists, synonymies, etc. (for all groups)  Citations, links to primary literature  direct access to the primary taxonomic literature (for all described taxa), including full text (preferably…)  “All” georeferenced records accessible, for all species  no need for individuals to do the data aggregation  map local / regional / global records, show details for any data item  Indexes of taxon traits  e.g. to support sort / filter / group by…  Predictive mapping / computed range maps for all taxa  fill sampling gaps via niche modelling, to produce comprehensive global species maps  Plus more (phylogenies, illustrations, genetics, descriptions, keys…)
  • 3. A standardised approach for this talk  Rationale for each component/ activity (why do we care?)  Size of the problem (or sub-problem)  Who is addressing it (and what they are doing)  How far have they (“we”) got, and how much is still to be done… Some other points to consider: • open vs. closed access to relevant content (who can access?) • machine vs. human retrievability ( -> services, not just pages to view) • degree of consolidation available (saves querying multiple resources) • web query only, or are the base data available for export/user upload
  • 4. A global taxon inventory From presentation by Quentin Wheeler, International Institute for Species Exploration (IISE):
  • 5. A global taxon inventory – why do we care?  Useful to know with what organisms we share the planet  History of life as its own study area, also key to understanding present life forms and their relationships  Ensure taxonomic names do not get accidentally re-used  Construct list once, use many times (no need to re-create from scratch)  Reconcile old names / synonyms to current taxon concepts (important for data integration)  Provide “taxonomic backbone” underpinning other biodiversity activities / projects.
  • 6. How many species? Arthur Chapman asked the experts:
  • 7. From Chapman’s summary document (2009 edition): NB “Others” includes non-green algae, Protista, prokaryotes and viruses (refer document for details) “Invertebrates” includes 1m insects, 360k others (incl. 102k arachnids, 85k molluscs, 47k Crustacea) “Estimated” total spp. for world is 11.3m i.e. only 17% of estimated global biodiversity yet named (!)
  • 8. From Chapman’s summary document (2009 edition):
  • 9. From Chapman’s summary document (2009 edition): …add another 200k-300k(?) for known fossil species, maybe multiply 2x-3x to include synonyms… …gives upwards of 5m species names to catalogue/organise (+0.5m genera…) + new species descriptions (~20k/year) and higher taxa, also new combinations (??/year) 5m+ names!
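The "5m+ names" figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted on the slides (11.3m estimated species, 17% named, 200k-300k fossil species, a 2x-3x synonym multiplier); the variable and function names are illustrative, not from the talk.

```python
# Back-of-envelope estimate of how many species names need cataloguing.
# All input figures are the ones quoted on the slides (Chapman, 2009 edition).
ESTIMATED_TOTAL_SPECIES = 11.3e6  # estimated species worldwide
FRACTION_NAMED = 0.17             # share of those already described

# Roughly 1.9m described extant species.
described = ESTIMATED_TOTAL_SPECIES * FRACTION_NAMED


def names_to_catalogue(fossil_species: float, synonym_multiplier: float) -> float:
    """Described extant + fossil species, scaled up to allow for synonyms."""
    return (described + fossil_species) * synonym_multiplier


low = names_to_catalogue(fossil_species=200_000, synonym_multiplier=2)
high = names_to_catalogue(fossil_species=300_000, synonym_multiplier=3)

print(f"described species: {described / 1e6:.1f}m")
print(f"names to catalogue: {low / 1e6:.1f}m to {high / 1e6:.1f}m")
```

The range brackets the "upwards of 5m" figure on the slide; new descriptions (~20k/year) and new combinations would come on top of this.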
  • 10. From group estimates to global lists
  • 11. ION – Index to Organism Names www.organismnames.com 2005: 1.8m names, all ranks (incl. synonyms), from 2.8m documents 2015: 5.2m names, all ranks (incl. synonyms), from 3.5m documents - Animal names only (+ protists), cf. other resources for plants etc. Likes: • Comprehensive coverage (“most” zoological names held) • Includes fossils as well as extant taxa • Cites original publications for most post-1860 names • Low latency (c. 6 months to name appearance in index) • Some tax. hierarchy for all names • ION ID minted for every name (usefulness varies) Dislikes: • Many more names than taxa (spelling + authority variations, synonyms, bad data) – needs deduplication before use • Hard to work out which is “correct” name or what names are synonyms, etc. • Some quirks in citations as given (including author spellings) • Detailed publication and taxon info is behind paywall
  • 12. Newly published names (all ranks) in ION, Nov 2015 (total 2.01m) names from Index Animalium (1758-1850) names from Zoological Record (1864-current)
  • 13. Catalogue of Life www.catalogueoflife.org 2005: 530k valid species names (+ ?? synonyms), from 23 databases 2015: 1.6m valid species names (+ 1.3m synonyms), from 151 databases - All taxonomic groups, extant only (a few fossils starting 2015) Likes: • Name quality high (all expert-supplied), synonymies explicit, no (few) duplicates • Internally consistent taxonomic hierarchy, kingdom -> family • Coverage increasing over time (claims currently 84% of all extant species) Dislikes: • Some groups not yet covered (also no fossils) • Synonymies not always complete (some old names not listed) • No author, synonym information at ranks above species • No links to original literature (although these may be traceable via source databases) • More latency than ION (takes a while for new names to appear) • No stable IDs for names (cannot use for linking to current edition)
  • 14. Partial ION listing – search for “Physeter”
  • 15. Partial ION listing – search for “Physeter” Note: 1. ION often includes the same name in multiple variants (mix of “good” and “bad” content), giving over-representation of number of “real” names 2. This is a list of names, not taxa (single taxon can have multiple names, e.g. valid name plus synonyms – not distinguished in ION).
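The two notes above (variant strings inflate the count; names are not taxa) can be illustrated with a small grouping sketch. It assumes a simple key of canonical name plus year, which is an illustrative choice and not how ION or any other index actually deduplicates; the sample strings are made up for the example and are not real ION output.

```python
import re
from collections import defaultdict


def name_key(raw: str) -> tuple:
    """Reduce a name string to (canonical name, year) so that spelling and
    authority variants of the same name fall into one group.

    Assumption: the canonical name is the leading capitalised word plus any
    lowercase epithets. Lowercase author particles ("de", "von") would be
    misread as epithets, so this is a rough sketch only.
    """
    text = " ".join(raw.split())  # collapse stray whitespace
    name = re.match(r"[A-Z][a-z]+(?: [a-z]+)*", text)
    year = re.search(r"\b(1[7-9]\d\d|20\d\d)\b", text)
    canonical = name.group(0).lower() if name else text.lower()
    return canonical, (year.group(1) if year else None)


def group_variants(names):
    """Group raw name strings by their normalised key."""
    groups = defaultdict(list)
    for raw in names:
        groups[name_key(raw)].append(raw)
    return dict(groups)


# Hypothetical variant strings, invented for illustration.
sample = [
    "Physeter macrocephalus Linnaeus, 1758",
    "Physeter  macrocephalus L. 1758",
    "Physeter catodon Linnaeus, 1758",
    "Physeter Linnaeus, 1758",
]
groups = group_variants(sample)
for key, variants in groups.items():
    print(key, len(variants))
```

Grouping collapses the first two strings into one name, but it leaves "Physeter catodon" separate: recognising that as a synonym of an existing taxon needs taxonomic judgement, which is the gap the second note describes.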
  • 16. Equivalent Cat. of Life search result for “Physeter”:
  • 17. Selected other names/taxon databases of note PaleoBioDB (fossils) – formerly PaleoDB • 2005: 60k names, all ranks • 2015: 320k names, all ranks (incl. synonyms) – Good coverage of many fossil taxa (most groups) World Register of Marine Species (WoRMS) • 2007: first release, 75k valid species + ?? syns • 2015: 230k valid species + 96k synonyms – Excellent coverage of marine taxa (almost all groups), incl. some fossils Interim Register of Marine and Nonmarine Genera (IRMNG) – Tony Rees / OBIS project • 2006: first release, 159k genus names incl. synonyms (the latter partly known, part not) • 2015: 488k genus names incl. synonyms (also 1.9m species names incl. synonyms) – Comprehensive genus level coverage of all groups, extant + fossil, not all assigned to family as yet
  • 18. For other groups (examples, NB completeness varies) (etc., etc.)
  • 19. Linking names to the literature
  • 20. Linking names to the literature – why do we care?  Initial publication / description / designated type is “anchor” for every taxonomic name and concept  Use to verify “indexing” details (taxon name + author, year) are correctly represented  Included text details (title < abstract < full text) can be “mined” to extract information useful for indexing (or just reading)  Entry point to wider literature via refs. list, subsequent citations, etc.
  • 21. E.g. from this… to this (taxon name + author, year) (article full text)
  • 22. Genus #1 in IRMNG: example “minimal” citation styles Genus Authority Microcitation Aa Aa Baker, 1940 Aa Baker, 1940 Bull. Bishop Mus., 165, 107
  • 23. Genus #1 in IRMNG: example “better” (=standard) citation style Genus Authority Microcitation Full citation Aa Aa Baker, 1940 Aa Baker, 1940 Bull. Bishop Mus., 165, 107 Aa Baker, 1940 Baker, H.B., 1940. Zonitid snails from Pacific Islands. Part 2.-Hawaiian genera of Microcystinae. Bulletin Bishop Museum Honolulu, 165: 105-201. ION has a subset of these (article title, citation only)
  • 24. Genus #1 in IRMNG: example “best” citation style with online links Genus Authority Microcitation Full citation Online link (abstract) Online link (full text) Aa Aa Baker, 1940 Aa Baker, 1940 Bull. Bishop Mus., 165, 107 Aa Baker, 1940 Baker, H.B., 1940. Zonitid snails from Pacific Islands. Part 2.-Hawaiian genera of Microcystinae. Bulletin Bishop Museum Honolulu, 165: 105-201. Aa Baker, 1940 Baker, H.B., 1940. (etc.) http://... (or DOI) Aa Baker, 1940 Baker, H.B., 1940. (etc.) http://... (or DOI) ION has a subset of these (article title, citation only) BioNames (R. Page project) has some of these
  • 25. Online access to scientific literature – 1 Q.: How many articles in the “scientific literature”? A.: Guesstimate might be 180m total “scholarly articles”, 120m in all sciences, 20m in biology over past 250 years  Google Scholar: ~160m citations (all disciplines)  Web of Science: 90m items indexed (1900 onwards)  PubMed: 24m records (mostly 1966 onwards)  Biological Abstracts: 12m records, 1926 onwards (includes some non-journal material)  Ideally would like single master list, unique ID/hyperlink for each work (article/chapter/book etc.)  DOI (Digital Object Identifier) system / CrossRef introduced in 2000, good for newly published work  currently used for 114m “objects” (incl. some retrospective allocation; NB not all are scientific literature)  “Publishers use CrossRef's tools to convert citations from dumb strings to useful links” (quote from R. Page discussion post)
  • 26. Online access to scientific literature – 2  Zoological Record has indexed 3.5m works in zoology 1864- current (increasing at 70k/year, 1.5k/week), but individual records are behind paywall
  • 27. Online access to scientific literature – 3  Biodiversity Heritage Library (BHL) is scanning older literature (esp. pre-1923) and placing online  limited subset indexed by article title, otherwise (all) indexed by journal and page no. (then has BHL page ID – can link to that)  search can be initiated by journal title, volume + page (if already known)  can also search by taxon scientific name – but some instances will be missed (BHL OCR [optical character recognition] is less than 100% reliable)  this author’s experience looking for initial publication instances of older names – success in around 1/3 of cases (not too bad), however requires manual search (time consuming)  ideally, original description page links should be compiled somewhere for others to re-use (not currently done on any scale)
  • 28. BHL sample page: American Journal of Science s4 v15 (1903) p. 312 (original description of Megablattina Sellards, 1903, a cockroach)
  • 29. BHL sample page: American Journal of Science s4 v15 (1903) p. 312 (original description of Megablattina Sellards, 1903, a cockroach)
  • 30. Online access to scientific literature – 4  More recent literature – mix of publisher websites and operations like JSTOR, often behind paywalls (though abstracts typically not so) – but not all yet available digitally (BHL also has some post-1922 content)  Subscription/abstracting services (Zoological Record, Web of Science, etc.) have better coverage, but are often not open access for viewing or external linking purposes (although PubMed is)  Some tools constructed around planned all-encompassing “Bibliography of Life” project (from Europe, http://biblife.org/), but progress difficult to gauge as yet (claims 215k references held); another European project: GRIB (Global References Index to Biodiversity), however development appears to have stopped…
  • 31. In summary: online [open] access available to subsets of article titles > abstracts > full text in decreasing proportions  No single comprehensive source of online refs. available at this time, users must “mix and match” sources as available  Few direct links in current tax. databases to literature that is online (some noteworthy exceptions)  Over 95% of taxonomic literature pre-dates year 2000 starting point for DOIs  Most comprehensive indexes are currently commercial products (behind paywalls), not much traction in “community / open access” equivalents as yet.
  • 33. Machine-readable sets of taxon traits – why do we care?  Powerful tools for automated subsetting / filtering out sets of interest  Useful for data quality assurance (e.g. flag suspect data, fix logical inconsistencies)  Can form the basis of auto-response “expert systems” / keys e.g. as already available for specialised groups  Need for standardised vocabularies/ semantics for indexing terms, units used, etc.
  • 34.  Operations like OBIS (Ocean Biogeographic Information System) want to display only (e.g.) marine + extant taxa, suppress others  No “trait bank” systems existed at that time, IRMNG was created to fill this need: flag taxa as extant/fossil, marine/nonmarine  IRMNG data & flags subsequently incorporated into other systems e.g. WoRMS, ALA, OTOL, EOL, more… – IRMNG flags are ~70% complete at genus level, 95%+ for species  EOL (Encyclopedia of Life) is establishing “TraitBank” (2014 on) to capture similar traits + more
  • 35. EOL TraitBank most populous content (Oct 2015) Note, EOL is an aggregator, not an original content generator (relies on content supplied by third parties)
  • 36. EOL TraitBank most populous content (Oct 2015) EOL traits recently added to Google search, Nov 2015
  • 37. Room for further development in this area…  e.g. TDWG (Taxonomic Databases Working Group) had active interest in development of “SPM” (Species Profile Model) around 2007-8, seems a bit quiet since  character matrices stored in computer-based keys e.g. Lucid, DELTA, etc. could presumably be leveraged in some cases  some domains already well covered in standard manner (e.g. FishBase for 33k fishes, SeaLifeBase for 71k non-fish marine taxa)  SeaLifeBase example shown in next slide…
  • 38.
  • 39. Displayed range data is from AquaMaps holdings, see later in this presentation
  • 41. Assembling georeferenced species data – why do we care?  “Where” is as important as “what” in biodiversity studies  Central repository much easier point of access than thousands/millions of distributed sources  See gaps in existing data holdings / state of current data sampling, digitisation and mobilisation  Overlay spatial distributions with other layers e.g. country boundaries, habitats, environmental variables – generate regional lists, understand controlling factors  Spot bad data (appearing in unlikely places on the map)  Use for spatial analysis (geography as computable data).
  • 42. Distributed data networks  First data networks in USA, late 1990s – VertNET, HerpNET, ORNIS – connecting museum data (vertebrate specimen records) in participating agencies (also in Australia: Australian Virtual Herbarium)  OBIS (2002 on) and GBIF (2004 on) provide gateways to both specimen and observation data from multiple agencies worldwide  OBIS (marine species records only):  2005: 5.6m records from 38 data sources (40,700 species)  2015: 44.9m records from 1,916 data sources (147,000 species)  GBIF (all habitats):  2005: 45m records from 334 data sources (?? species)  2015: 577m records from 15,196 data sources (?? species)  OBIS data flows into GBIF (though with some issues), also into local networks e.g. ALA (Atlas of Living Australia)
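The 2005 and 2015 record counts above imply a growth factor and an average annual growth rate for each network. The sketch below works these out from the slide figures alone; the helper name and the choice of a compound annual rate are illustrative assumptions.

```python
def growth(start: float, end: float, years: int = 10):
    """Return (overall growth factor, compound annual growth rate)."""
    factor = end / start
    annual_rate = factor ** (1 / years) - 1
    return factor, annual_rate


# Record counts as quoted on the slide (2005 -> 2015).
obis_factor, obis_cagr = growth(5.6e6, 44.9e6)
gbif_factor, gbif_cagr = growth(45e6, 577e6)

print(f"OBIS: x{obis_factor:.1f} overall, {obis_cagr:.0%} per year")
print(f"GBIF: x{gbif_factor:.1f} overall, {gbif_cagr:.0%} per year")
```

Both networks grew roughly an order of magnitude over the decade, with GBIF growing somewhat faster than OBIS.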
  • 43. Building OBIS – 2002-5 (trying to make a working system, and provide a good user experience) “OBIS v2” front page / spatial search interface, 2005
  • 44. Current OBIS sample map & data OBIS records for Physeter macrocephalus (sperm whale) in Australian region, Oct 2015 (51,756 global records)
  • 45. GBIF sample map & data GBIF records for Physeter macrocephalus in Australian region, Oct 2015 (34,436 global records)
  • 46. ALA (Atlas of Living Australia) presentation of records for Physeter macrocephalus in Australian region, Oct 2015
  • 47. ALA (Atlas of Living Australia) presentation of records for Physeter macrocephalus in Australian region, Oct 2015
  • 48. How complete are holdings of GBIF, OBIS, etc.?  From Hill et al., 2012 paper: at least 1 bn – 2 bn specimens in biological collections worldwide (not all currently digitised)  Observations probably outnumber specimens by 100x - 1000x  Gives maybe 500 bn potential records +/- ; GBIF has 0.5 bn to date (0.1%)…  Not all records are of equal importance for initial studies of distributions (much redundancy), maybe OBIS/GBIF have <5% of most useful records at this time…  Existing holdings presently heavily skewed towards better sampled/accessible areas, also regions where digitisation is more advanced  True “target numbers” difficult to assess (every individual of every species, or what?)
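The completeness estimate above is a chain of rough multiplications. This minimal sketch reproduces it from the slide's own figures (1-2 bn specimens, observations 100x-1000x more numerous, ~500 bn potential records, 0.5 bn held by GBIF); the variable names are illustrative.

```python
BILLION = 1e9

# Bounds quoted on the slide.
specimens_low, specimens_high = 1 * BILLION, 2 * BILLION
obs_ratio_low, obs_ratio_high = 100, 1000

# Potential records = specimens plus observations.
potential_low = specimens_low * (1 + obs_ratio_low)
potential_high = specimens_high * (1 + obs_ratio_high)

# The slide settles on a round figure inside that range.
potential_assumed = 500 * BILLION
gbif_records = 0.5 * BILLION

coverage = gbif_records / potential_assumed

print(f"potential records: {potential_low / BILLION:.0f} bn "
      f"to {potential_high / BILLION:.0f} bn")
print(f"GBIF coverage at 500 bn: {coverage:.1%}")
```

The spread between the bounds is a factor of about 20, which is why the slide treats the 0.1% figure as indicative only.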
  • 49. Predicted distributions (environmental niche modelling) From AquaMaps poster for e-biosphere conference, London, 2009
  • 50. Predicted distributions (environmental niche modelling) – why do we care?  Available georeferenced data are always incomplete, need a mechanism to intelligently fill in data gaps, produce more complete biodiversity maps & atlases  Move from hand drawn maps / non-digital “expert knowledge” to computable data  Model potential spread of invasives into new areas (show suitable habitat)  Model potential changes in species range in response to changing climate or other factors  Facilitate better understanding of broad- (and fine-) scale factors controlling species distributions.
  • 51. Niche modelling concept  Range of methodologies available including MAXENT, GARP, simple niche models e.g. Relative Environmental Suitability (RES)  Ready et al., 2010 (incl. Tony Rees) contend that simple methods work as well as more complex ones. Source: A. Guisan group web page, Université de Lausanne, Switzerland http://www.unil.ch/idyst/en/home/menuinst/research-poles/geoinformatics-and-spatial-m/predictive-biogeography/advancing-the-science-of-eco.html
  • 52. J. Ready et al., Ecological Modelling, 2010
  • 53. Global niche modelling/mapping projects Lifemapper: Kansas University, c. 2003 onwards • Models terrestrial niches (?only) • No. of maps unclear (claims >100,000 species with data, perhaps only a subset with maps) • Uses GARP modelling (computationally intensive, several hours per species map?), no expert review AquaMaps: Kiel Marine Lab (+ co-developers), 2006 onwards • Models marine niches only (plus some freshwater) • 22,000 species mapped by Nov 2015 (incl. ~600 FW), mainly fishes • Uses RES modelling (6*/8** environmental variables, <2 mins per species map) plus geographic partitioning and expert review * Marine variables: bottom depth, water temperature (SST/bottom), salinity, primary production, sea ice concentration, distance to land ** FW variables: elevation, surface temperature, net primary productivity, soil pH, soil moisture, soil organic carbon, precipitation, compound topographic index
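To make the "simple niche model" idea concrete, here is a minimal sketch of an envelope-style suitability score for one grid cell. The slides only state that RES uses a handful of environmental variables per species; the trapezoidal envelope shape, the multiplication of per-variable scores, and every number below are assumptions made for illustration, not the AquaMaps implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Trapezoidal tolerance range for one environmental variable
    (assumed shape: 0 outside min/max, 1 inside the preferred range,
    linear ramps in between)."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float

    def suitability(self, value: float) -> float:
        if value < self.minimum or value > self.maximum:
            return 0.0
        if value < self.preferred_min:
            return (value - self.minimum) / (self.preferred_min - self.minimum)
        if value > self.preferred_max:
            return (self.maximum - value) / (self.maximum - self.preferred_max)
        return 1.0


def cell_suitability(envelopes: dict, cell: dict) -> float:
    """Combine per-variable scores by multiplication (an assumption),
    so one unsuitable variable rules the cell out."""
    score = 1.0
    for variable, envelope in envelopes.items():
        score *= envelope.suitability(cell[variable])
    return score


# Hypothetical envelopes for an imaginary shelf species.
envelopes = {
    "depth_m": Envelope(0, 10, 200, 500),
    "sst_c": Envelope(5, 10, 18, 24),
}

core = cell_suitability(envelopes, {"depth_m": 100, "sst_c": 14})
marginal = cell_suitability(envelopes, {"depth_m": 350, "sst_c": 14})
unsuitable = cell_suitability(envelopes, {"depth_m": 100, "sst_c": 30})
print(core, marginal, unsuitable)
```

Because each cell needs only a few comparisons per variable, a model of this kind can score a global grid quickly, which is in keeping with the "<2 mins per species map" figure quoted for RES.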
  • 54. Lifemapper team at KU (Kansas University, USA)
  • 55. Lifemapper example map for milk or Spanish snail (Helix lactea, now = Otala lactea) (yellow dots are data points, red is potential habitat)
  • 56. Building AquaMaps – 2005 (trying to make a working system, and the models fit the data…)
  • 57. Building AquaMaps – 2005 (trying to make a working system, and the models fit the data…)
  • 58. AquaMaps example map for New Zealand sea lion (Phocarctos hookeri) (without expert review): Data points used
  • 59. AquaMaps example map for New Zealand sea lion (Phocarctos hookeri) (without expert review): Computed AquaMap
  • 60. AquaMaps example map for New Zealand sea lion (Phocarctos hookeri) (without expert review): All suitable habitat
  • 61. AquaMaps example map for New Zealand sea lion (Phocarctos hookeri) (without expert review): All suitable habitat (detail; square size = 50 km nominal for global coverage)
  • 62. AquaMaps example maps for New Zealand sea lion (Phocarctos hookeri) (without expert review): Current vs. computed year 2100 range 2010 2100
  • 63. From hand drawn map to computable data… (this example: Physeter macrocephalus) Hand drawn map from http://us.whales.org/species-guide/sperm-whale
  • 64. From hand drawn map to computable data… (this example: Physeter macrocephalus) Hand drawn map from http://us.whales.org/species-guide/sperm-whale; AquaMaps map (including probability of occurrence by 0.5 degree square)
  • 65. Pooling all the range data
  • 66. AquaMaps summary plot – all marine species overlaid (22,800 as at Nov 2015)
  • 67. AquaMaps summary plot – all marine species overlaid (22,800 as at Nov 2015) Click on the map to generate “what lives here” queries…
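A "what lives here" query and an all-species overlay can both be sketched from per-species grids of occurrence probability. This is a minimal sketch, assuming each species map is stored as a dictionary keyed by 0.5 degree cell, with rows counted north from -90 and columns counted east from -180. The function names, the storage layout and the probability values are hypothetical; they do not describe how AquaMaps itself stores or queries its maps.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
SpeciesMaps = Dict[str, Dict[Cell, float]]

CELL_SIZE = 0.5  # degrees, matching the 0.5 degree squares on the slides


def cell_index(lat: float, lon: float, size: float = CELL_SIZE) -> Cell:
    """Return the (row, col) of the grid cell containing a point."""
    n_rows = int(180 / size)
    n_cols = int(360 / size)
    row = min(int(math.floor((lat + 90.0) / size)), n_rows - 1)  # clamp lat = 90
    col = int(math.floor((lon + 180.0) / size)) % n_cols         # wrap lon = 180
    return row, col


def what_lives_here(maps: SpeciesMaps, lat: float, lon: float,
                    threshold: float = 0.0) -> List[Tuple[str, float]]:
    """List species whose predicted probability in the clicked cell exceeds threshold."""
    cell = cell_index(lat, lon)
    hits = [(name, grid[cell]) for name, grid in maps.items()
            if grid.get(cell, 0.0) > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def species_richness(maps: SpeciesMaps, threshold: float = 0.0) -> Counter:
    """Count, per cell, how many species maps exceed the threshold (the overlay)."""
    richness: Counter = Counter()
    for grid in maps.values():
        for cell, probability in grid.items():
            if probability > threshold:
                richness[cell] += 1
    return richness


if __name__ == "__main__":
    # Illustrative probabilities only; not real AquaMaps output.
    demo_maps: SpeciesMaps = {
        "Rhincodon typus": {(113, 662): 0.4},
        "Physeter macrocephalus": {(113, 662): 0.9, (100, 10): 0.7},
    }
    print(what_lives_here(demo_maps, -33.4, 151.4))
    print(species_richness(demo_maps)[(113, 662)])
```

Keying every species map on the same cell index is what makes both operations cheap: the spatial query is one lookup per species, and the summary plot is a single pass over all stored cells.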
  • 68. AquaMaps spatial query result, November 2015:
  • 69. AquaMaps spatial query result, November 2015:
  • 70. Any primary data to back this up? E.g. search in Atlas of Living Australia…
  • 71. Yes, there ARE whale sharks on NSW central coast…
  • 72. Niche modelling is a very current discussion topic: e.g. GBIF/AMNH symposium 4 November 2015 (no proceedings as yet)
  • 73. So – “are we there yet?”
  • 74. A report card to date…
      Component | 2005 | 2015 | Status (/5)
      Global taxon inventory – all species names (with synonyms) | 25%? | 60%+ | nnn(n)
      All names linked to the literature (original descriptions), at least minimally | 5%? | 10-20%? | n(n)
      Taxon traits databased, in machine-addressable form | 0 | 10%+? | n(n)
      Distribution data (specimens, observations) in online systems | <1% | 5%+? | n
      Predicted distributions/global range maps for all taxa | 0? | 5%+? (fishes 60%+) | n(n)
  • 75.  Take home message: progress is definitely being made, however plenty still to do:  Complete master names lists, release as open data (also deal with inflow of new names and taxonomic dynamism)  Improve online access to tax. literature (plus embedded links from relevant databases)  More data into OBIS & GBIF (including datasets not yet digitised)  More progress on predictive mapping (algorithms, base data, habitat factors, species covered).
  • 76.  Take home message: progress is definitely being made, however plenty still to do:  Complete master names lists, release as open data (also deal with inflow of new names and taxonomic dynamism)  Improve online access to tax. literature (plus embedded links from relevant databases)  More data into OBIS & GBIF (including datasets not yet digitised)  More progress on predictive mapping (algorithms, base data, habitat factors, species covered).
  • 77. Thank you! Tony Rees Tony.Rees@marinespecies.org ◦ CSIRO Marine Research applications developer 1998-2014 including CAAB (Codes for Australian Aquatic Biota), c-squares and Taxamatch ◦ OBIS steering committees (various) / system developer 2002-2005 ◦ AquaMaps project co-developer 2004-current ◦ IRMNG developer 2006-current ◦ OBIS Australia Node manager 2006-2014 ◦ Global Names Project collaborator 2006-current ◦ WoRMS contributor 2007-current ◦ GBIF & Open Tree of Life collaborator 2010-current ◦ iPlant collaborator 2010-2013 ◦ Atlas of Living Australia consultant 2010-2012 ◦ Catalogue of Life global team member 2010-2012 ◦ GBIF Ebbe Nielsen Prize (for excellence in Biodiversity Informatics) winner 2014. This talk available at: www.slideshare.net/tony1212/presentations