B2FIND - User training, Version 07, June 2017 | www.eudat.eu
B2FIND is EUDAT’s simple, user-friendly metadata catalogue allowing users to discover metadata from a wide range of scientific communities.
My name is …… and I work at ….. I’m here today to talk about the usage of B2FIND, or in other words how you can search, find and access data objects and collections using the metadata service B2FIND.
Metadata, such as keywords or the names of the data creators, describe data. B2FIND is the metadata service of EUDAT and comprises two components: it provides a powerful and user-friendly discovery service on metadata, covering a wide range of research communities, and it is based on a comprehensive joint metadata catalogue of research data collections that are stored in EUDAT data centres and other repositories.
B2FIND interacts with other EUDAT services in different ways.
For example, B2FIND harvests metadata from B2SHARE to provide access to data objects stored within the EUDAT CDI. B2FIND is also used in inter-service use cases, for example when staging EUDAT data onto a high performance computing (HPC) platform for processing. B2FIND is used in the first workflow step, to identify links to data collections. These references are then used by B2STAGE to transfer the data objects to HPC platforms. B2STAGE then deposits the results of the computation back into EUDAT. Depending on community arrangements, B2SAFE replicates these data into other EUDAT centres and B2HANDLE assigns persistent identifiers to them.
With B2FIND you can browse through the large amounts of data that EUDAT stores from a broad range of disciplines, and search the whole catalogue, which comprises collections of scientific data, irrespective of their origin, discipline or community. You can carry out filtered searches, also known as faceted searches, in different ways. For example, you can search for data that cover a specific geographic region or a given period in time. Or you can search on several textual properties, for example for all data created by a specific person or published by a given institution. B2FIND provides the metadata references to the underlying data sources, which give you access to the scientific data objects. In a few minutes I’ll give a detailed example of how you can use B2FIND.
EUDAT has a truly cross-community approach, and B2FIND is instrumental in this: metadata are harvested, that is, extracted, from a wide range of research fields, from climate research to social sciences, from biodiversity to linguistics, from archaeology to seismology, and numerous other research areas. To browse and search these fields, we need a common vocabulary for the whole catalogue. B2FIND transforms and homogenises the diverse metadata to achieve this.
This transformation process is part of the metadata ingestion and relates to the workflow step ‘Metadata mapping and validation’, as shown in the workflow diagram. In order for metadata to be harvested by B2FIND, the owner needs to make them available in a repository. The B2FIND service harvests these metadata and restructures them so that they are in a format suitable for faceted search and discovery. The one thing we want to stress here is that the content of the metadata is not changed during this workflow. The original metadata are only restructured, reformatted and indexed to allow discovery and faceted search. We do not go into more detail here; if you are interested in in-depth information about the individual steps of the ingestion workflow, please refer to the related presentation ‘B2FIND Integration’.
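As a rough illustration of "restructured, but not changed", the following Python sketch maps one community-specific record onto common field names. The field names, the mapping table `FIELD_MAP` and the function `map_record` are hypothetical and chosen for illustration only; they are not B2FIND's actual schema or mapping rules.

```python
# Hypothetical mapping from one community's field names to common facet names.
# These names are illustrative assumptions, not B2FIND's real schema.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "creator",
    "dc:publisher": "publisher",
    "dc:subject": "tags",
}


def map_record(original: dict) -> dict:
    """Return a restructured copy of `original` using the common field names.

    Values are copied unchanged; only the keys (the structure) differ.
    Fields without a mapping are kept under "extras" so nothing is lost.
    """
    mapped = {"extras": {}}
    for key, value in original.items():
        if key in FIELD_MAP:
            mapped[FIELD_MAP[key]] = value
        else:
            mapped["extras"][key] = value
    return mapped


record = {
    "dc:title": "Example dataset",
    "dc:creator": "Michael Example",
    "local:instrument": "sensor-7",
}
mapped = map_record(record)
```

In this sketch every value of the input record reappears unchanged in the output, either under a common field name or under `extras`, and the input record itself is left untouched.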
B2FIND originally indexed metadata harvested from some EUDAT core communities, for example ENES and CLARIN, and metadata for data stored through other EUDAT services, at the moment B2SHARE. EUDAT has extended, and continues to extend, the service to other external and reliable data and metadata providers that are interested in publishing their metadata in the international and cross-discipline scope of EUDAT. A snapshot of the list of communities indexed by B2FIND is shown here. The most up-to-date listing can be found on the B2FIND website (http://b2find.eudat.eu/group/).
[ Note to trainer: this slide and the Upcoming Improvements slide may get out of date quickly. Please ensure that these are up to date by contacting the B2FIND team with a few weeks’ notice: http://eudat.eu/support-request?service=B2FIND ]
More than [470,000] datasets from [17] sources (16 external communities plus B2SHARE) have been uploaded to the metadata catalogue and are available in the discovery portal.
The communities listed on the x axis demonstrate the wealth and variety of the research data available through B2FIND. They range from humanities and social sciences communities, such as CLARIN and CESSDA, through natural sciences communities, such as ENES and ALEPH, to aggregators that themselves provide cross-discipline metadata, such as DataCite. Note that B2SHARE (highlighted in colour in the histogram) is not a community, but a source within the EUDAT CDI whose metadata are indexed regularly by B2FIND. That means that each time a data object is uploaded to B2SHARE, the associated metadata are automatically indexed in the B2FIND catalogue.
Note the logarithmic scale here, which hides the high variance in the number of indexed artefacts. An example of a low number is the ENES community: fewer than a thousand metadata records are harvested from this data provider, but each of them refers to underlying data collections in the order of terabytes. CLARIN, on the other hand, contributes more than a hundred thousand metadata records, though these refer to small data objects in the order of kilobytes or megabytes.
As mentioned in the previous example, the strength of B2FIND is its ability to search and browse datasets. The B2FIND interface allows Google-like ‘free text’ keyword search: you enter words in the text fields and a search over the full body of the original harvested metadata records is executed. Results are displayed in an easy-to-read format and listed in order of relevance to your search.
In addition to the free text search, B2FIND provides ‘faceted’ search, through which you can explore the catalogue by applying multiple filters. For example, there are several ways to search for datasets that have a specific coverage in space or time: filtering by geospatial coverage is possible by selecting a bounding box in the world map widget, filtering by temporal coverage is applied by using the facet ‘Filter by time’, and you can also search for datasets published within a certain period using the facet ‘Publication year’. There are also several textual facets you can search on, for example: restrict the results to datasets of one of the communities, search for a tag (a keyword the dataset is tagged with), find out by whom the dataset was created or from which discipline the data originate, search for the data publisher, and so on. As you can see here for the facet ‘Creator’, you can use filtering and sorting functionalities, for example filter all names that include ‘Michael’ and order them, for instance, alphanumerically. (next click) In the resulting list in the right panel you can now click on a dataset title and the corresponding dataset is displayed. (next click) In the top left corner the geospatial extent is shown. On the right-hand side, the title and a summary or description are displayed first, then the associated tags are listed as clickable buttons in a separate segment, followed by a table of field-value pairs of the remaining textual facets.
Note the references to the data resources provided – here shown with the orange arrows.
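For those who prefer scripts to the web interface, the combination of free text search and facet filters described above can be sketched in Python as follows. This is a minimal sketch that only builds a search URL and does not contact any server. It assumes that the portal exposes a CKAN-style `package_search` endpoint with `q`, `fq` and `rows` parameters; the endpoint path, the facet field names `author` and `tags`, and the helper `build_search_url` are assumptions made for illustration and should be checked against the user documentation.

```python
from urllib.parse import urlencode

# Assumption: the portal exposes a CKAN-style search endpoint at this path.
BASE_URL = "http://b2find.eudat.eu/api/3/action/package_search"


def build_search_url(free_text: str, facets: dict, rows: int = 10) -> str:
    """Combine a free text query with facet filters into one search URL.

    Each facet becomes a `field:"value"` clause; clauses are joined with AND,
    so every added filter narrows the result list further.
    """
    clauses = [f'{field}:"{value}"' for field, value in facets.items()]
    params = {"q": free_text, "rows": rows}
    if clauses:
        params["fq"] = " AND ".join(clauses)
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical facet field names, used here only for illustration.
url = build_search_url("ocean temperature", {"author": "Michael", "tags": "climate"})
```

Joining the facet clauses with AND mirrors what happens in the portal when you select several facets one after another: each selection further restricts the list of results.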
[Note: animated slide!] Here is how you can get access to the underlying data objects through the references given in the metadata: [click!] Each record has the mandatory field ‘Source’ that links to the data resource itself. Whether you can open or download the data directly, or are led to another metadata view or landing page, depends on the policies and access permissions of the data centre where the data object is located. Where available, we also provide the PID and DOI of the object: [next click] here, for example, the DOI leads to another landing page (while, in this example, the PID contains the same link as the Source field). Finally, [click!] a link to the originally harvested metadata is provided.
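If you have copied a DOI or PID out of a record rather than clicking it, the following Python sketch shows one way to turn it into a link you can follow. It relies on the public resolvers https://doi.org/ and https://hdl.handle.net/. The helper `resolver_url`, the rule of treating identifiers that start with "10." as DOIs, and the example identifiers are illustrative assumptions, not part of B2FIND.

```python
def resolver_url(identifier: str) -> str:
    """Turn a DOI or Handle PID into a URL that a browser can follow.

    Rule of thumb used here (a simplification): identifiers starting
    with "10." are treated as DOIs, everything else as a Handle.
    """
    identifier = identifier.strip()
    if identifier.startswith(("http://", "https://")):
        return identifier  # already a resolvable link
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:]
    if identifier.startswith("10."):
        return f"https://doi.org/{identifier}"
    return f"https://hdl.handle.net/{identifier}"


# Made-up identifiers, for illustration only.
doi_link = resolver_url("doi:10.1234/example")
pid_link = resolver_url("11234/abc-123")
```

As noted above, where such a link leads (a direct download, another metadata view or a landing page) depends on the data centre that hosts the object.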
Several enhancements and developments are planned for the future. We will continue with the integration of communities and aggregators; you can see a list of metadata providers that we are looking to integrate with B2FIND soon. The functionality of the portal will be further improved: for instance, the implementation of an annotation function is planned, and taxonomies will be used to allow hierarchical searching and filtering. While B2FIND aims to homogenise to a common metadata scheme, we also want to address customisation of the service to specific requirements. Options to implement this are templates and extendable facets for specific community needs, the usage of particular vocabularies and ontologies, and individually adapted user interfaces. Improving the quality of metadata is one of the central and most challenging tasks. Here B2FIND can help through enhancement and further development of the mapping and the validation. We also want to improve the feedback mechanism between the communities and the B2FIND developers.
Additional information on B2FIND can be found at https://eudat.eu/services/b2find. Detailed user documentation is also available at http://eudat.eu/services/userdoc/b2find. Visit the B2FIND discovery portal at http://b2find.eudat.eu to find data objects and collections.
That brings me to the end of my presentation.
Thank you for your attention!
Find Research Data
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
How to find data objects and collections
using EUDAT’s B2FIND
This work is licensed under the Creative
Commons CC-BY 4.0 licence
What is B2FIND?
• is the metadata service of EUDAT
• is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other repositories
• provides a powerful and user-friendly discovery service on metadata covering a wide range of research communities
This image is licensed under the Creative
Commons CC0 Public Domain
Where is B2FIND in the EUDAT suite?
• harvests metadata from EUDAT services such as B2SHARE to provide access to data objects within the EUDAT CDI
• is used in inter-service use cases, e.g. to identify links to data collections, which will be transferred to HPC platforms
With B2FIND you can...
B2FIND – Find Research Data
• Browse through the large amounts of data that EUDAT stores from a broad range of disciplines
• Search the whole catalogue, which comprises collections of scientific data, irrespective of their origin, discipline or community
• Carry out faceted searches for
  • geospatial or temporal coverage
  • textual properties, such as ‘Creator’ or ‘Publisher’, and many other facets
• Find and access scientific data objects relevant for your work
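Conceptually, the geospatial and temporal facets listed above come down to overlap tests between the region or period you select and the coverage stated in each metadata record. The Python sketch below illustrates this under the assumption that a dataset matches when its coverage overlaps your selection; the portal's actual matching rule may differ. The type `BBox` and both functions are hypothetical helpers, not part of B2FIND.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Geographic bounding box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share at least one point.

    Simplification: boxes crossing the 180 degree meridian are not handled.
    """
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def periods_overlap(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if two inclusive year ranges share at least one year."""
    return start_a <= end_b and start_b <= end_a


# Approximate, made-up boxes for illustration.
europe = BBox(west=-10.0, south=35.0, east=30.0, north=70.0)
north_sea = BBox(west=-4.0, south=51.0, east=9.0, north=61.0)
pacific = BBox(west=150.0, south=-20.0, east=170.0, north=10.0)
```

With these example boxes, a dataset covering the North Sea would match a selection drawn around Europe, while one covering the Pacific box would not.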
Data from a great selection of subjects
• B2FIND has a truly cross-community approach
• Metadata are harvested from a wide range of research areas
  • From Climate Research to Social Sciences
  • From Biodiversity to Linguistics
  • From Archaeology to Seismology
• This necessitates the transformation and homogenisation of the diverse metadata to achieve the usage of a common vocabulary for the whole catalogue
This image is licensed under the Creative Commons Attribution-NoDerivs 2.0 Generic (CC BY-ND 2.0), taken from
Metadata Ingestion Workflow
[Workflow diagram; legible step labels: ‘MD Mapping and Validation’, ‘MD Uploading and …’]
• The transformation process is part of the metadata ingestion (MD Mapping and Validation) shown in the workflow diagram
• The individual workflow steps are described in detail in the related presentation ‘B2FIND Integration’
• Note: While the original metadata are restructured, re-formatted and indexed to allow discovery and faceted search, there is no change to the content of the metadata
• B2FIND initially indexed metadata
  • harvested from EUDAT core communities (such as ENES and CLARIN) and
  • stored through the B2SHARE service
• EUDAT extended and is extending the service to other external and reliable data and metadata providers
• The list of currently integrated communities is available at http://b2find.eudat.eu/group/
B2FIND MD Catalogue
• > 470000 records
• 17 communities
• (16 external + B2SHARE)
B2FIND Discovery Portal
Search and browse datasets
• Search and browse all data sets via free text search
• Results displayed in easy to read format and listed in order of relevance to your search
B2FIND provides ‘faceted’ search
B2FIND Discovery Portal
Dataset view provides
• Display of metadata
  • Geo spatial extent
  • Title and abstract
  • Table of field-value pairs
• Links to data resources
  • Resolved link to
  • Link to (another landing page of) the data object
  • View of originally harvested metadata record
• Address more communities and aggregators
• Improve functionality of portal, e.g.
  • Include annotating function
• Templates and extendable facets for specific community needs
• Usage of vocabularies and ontologies
• Individually adapted user interfaces
• Improve quality of the metadata by
  • Enhancement of the mapping and validation
• Continued exchange and feedback between the communities and the B2FIND developers
More training material
User training presentation
In depth use instructions
Integration presentation: How to publish metadata
Motivation and requirements to integrate with B2FIND
Community decision-makers and facility managers
Hands-on training material
Usage and deployment tutorials
Modules for different kinds of users
For more info: https://eudat.eu/services/b2find
User documentation: http://eudat.eu/services/userdoc/b2find
Heinrich Widmann, DKRZ Hannes Thiemann, DKRZ