The EUDAT datacentres store and replicate large amounts of data for the communities. But what about processing these data? And how do these data get into the EUDAT datacentres in the first place?
B2STAGE is a reliable, efficient, light-weight and easy-to-use service to transfer research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces.
B2STAGE is one of the services offered by EUDAT, the pan-European data infrastructure. EUDAT offers common data services to multiple research communities as well as to individuals, through a geographically distributed, resilient network connecting general-purpose data centres and community-specific data repositories. EUDAT aims to enable European researchers from any discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.
The services offered by EUDAT are community-driven: they are designed, built and implemented based on user community requirements. This means that the communities have a direct influence on these services and contribute to their development. Services are defined not just by researchers in the EUDAT collaboration; we also elicit requirements from other communities to ensure our services are as generic as possible and applicable to as wide an audience as possible.
The EUDAT service suite is an integrated set of services that helps researchers manage their data throughout the data lifecycle. As your data moves through that lifecycle, EUDAT services help you manage it using best practices followed by some of the world’s largest communities. The services cover a wide range of functionality. B2SAFE enables communities to replicate and safely store their large-scale data in robust, reliable datacentres operated by the EUDAT partners. B2HANDLE registers all data on EUDAT with a unique persistent identifier (PID) that can be resolved globally through the standard Handle System. B2DROP allows EUDAT users to easily exchange working data, while B2SHARE lets users deposit and disseminate final research data at a smaller scale, but more easily than with B2SAFE. B2FIND enables searches over the EUDAT metadata and is one of the key enablers of multi-disciplinary research on EUDAT. B2ACCESS is EUDAT’s simple and secure authentication and authorisation platform, providing single sign-on to EUDAT’s public and internal services. B2STAGE, the subject of this talk, offers communities an entry point for ingesting large volumes of data into EUDAT and replicating them. Data ingested through B2STAGE are registered with a persistent identifier using the mechanism adopted by B2SAFE.
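Because B2HANDLE PIDs are ordinary handles, any handle-aware client can resolve them. As an illustration only, the sketch below builds the lookup URL for the public Handle.net proxy's REST interface (`https://hdl.handle.net/api/handles/<handle>`) and extracts the `URL` value from a JSON response of the shape that interface returns. The handle `12345/example-0001` and the sample response are made-up placeholders, and the network call itself is deliberately left out so the example runs offline.

```python
import json
from typing import Optional
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def resolver_url(handle: str) -> str:
    """Build the REST lookup URL for a handle of the form 'prefix/suffix'."""
    # Keep the '/' separating prefix and suffix; percent-encode anything unusual.
    return HANDLE_PROXY + quote(handle, safe="/")


def target_url(response_json: str) -> Optional[str]:
    """Return the first URL-typed value from a handle REST response, if any."""
    record = json.loads(response_json)
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Hypothetical handle and response, for illustration only.
sample = json.dumps({
    "responseCode": 1,
    "handle": "12345/example-0001",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://data.example.org/obj/0001"}},
    ],
})

print(resolver_url("12345/example-0001"))
print(target_url(sample))
```

Real handle records typically also hold administrative values such as `HS_ADMIN`, which is why the sketch filters on `type` rather than taking the first value.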
EUDAT offers the B2STAGE service, which allows large research data sets to move efficiently between storage and computation. The service also takes care of depositing computation output from the HPC facilities into EUDAT, and it can be used to deposit community data into the EUDAT facilities. B2STAGE uses the established GridFTP protocol to ensure high-speed transfer between sites. Data transfer is reliable and requires very little user interaction. B2STAGE also assigns PIDs to any computational output that the user elects to inject back into the EUDAT datacentres.
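Much of GridFTP's throughput comes from moving a single file over several parallel TCP streams; in its extended block mode (MODE E) every block carries its own offset, so blocks can arrive in any order and still be written into place. The sketch below only illustrates that idea and is not B2STAGE code: it cuts a file of a given size into (offset, length) blocks, deals them round-robin across streams, and checks that the blocks cover the file exactly once. The 64 MiB block size and four streams are arbitrary assumptions, not B2STAGE defaults.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (offset in bytes, length in bytes)


def plan_blocks(file_size: int, block_size: int, streams: int) -> Dict[int, List[Block]]:
    """Split a file into offset-tagged blocks and assign them round-robin to streams."""
    if block_size <= 0 or streams <= 0:
        raise ValueError("block_size and streams must be positive")
    plan: Dict[int, List[Block]] = {s: [] for s in range(streams)}
    for i, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        plan[i % streams].append((offset, length))
    return plan


def reassemble_ok(plan: Dict[int, List[Block]], file_size: int) -> bool:
    """Check that the blocks, sorted by offset, cover the file with no gaps or overlaps."""
    blocks = sorted(b for stream in plan.values() for b in stream)
    expected = 0
    for offset, length in blocks:
        if offset != expected:
            return False
        expected += length
    return expected == file_size


MiB = 1024 * 1024
plan = plan_blocks(file_size=300 * MiB, block_size=64 * MiB, streams=4)
for stream, blocks in plan.items():
    print(stream, [(off // MiB, ln // MiB) for off, ln in blocks])
print(reassemble_ok(plan, 300 * MiB))
```

Because each block is self-describing, the receiver does not need the streams to stay in lockstep, which is what lets parallel streams fill a long, high-bandwidth link.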
B2STAGE was conceived to deal with modern-day research challenges. As hardware and research software improve, the scope of research is broadening. Communities now pursue large-scale simulations, for example climate models that encompass the whole Earth rather than isolated regions. Scientists simulate not only individual organs in the human body, but also their interactions. Similarly, earthquake data are now collected and processed for areas as large as entire continents. The common requirement of these research challenges is that they generate and process ever-increasing volumes of data, and typical workflows must process the data in a distributed fashion to keep pace with data generation. For this to be possible, data need to be transferred efficiently to high-performance or high-throughput computing resources, and this is where B2STAGE comes in.
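A back-of-the-envelope calculation shows why transfer efficiency matters at these volumes. The helper below estimates wall-clock time for a data set of a given size over a link of a given nominal bandwidth, with an efficiency factor for protocol and storage overheads. The 12 TB figure matches the VPH deposit mentioned later in this talk; the link speeds and the 80% efficiency are illustrative assumptions.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate transfer time in hours.

    size_tb    -- data volume in decimal terabytes (1 TB = 10**12 bytes)
    link_gbps  -- nominal link bandwidth in gigabits per second (10**9 bit/s)
    efficiency -- fraction of the nominal bandwidth actually achieved (0 < e <= 1)
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for gbps in (1, 10):
    print(f"12 TB over {gbps} Gbit/s at 80% efficiency: "
          f"{transfer_hours(12, gbps, 0.8):.1f} h")
```

Even on a sustained 10 Gbit/s link, a deposit of this size takes hours rather than minutes, so restartable transfers that need little user interaction are a practical necessity rather than a convenience.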
B2STAGE was developed to address specific user requirements. The fundamental use case is to allow data already ingested into EUDAT to move to HPC facilities for processing. This matters not only when the community that deposited the data processes them as originally intended, but also in more advanced scenarios of inter-disciplinary research on open data. In that case B2STAGE moves heterogeneous data for processing, enabling data combinations that were not previously thought of. B2STAGE also allows users to push the results of the computation safely back into EUDAT, where they may be preserved and/or further replicated according to the community policies. B2STAGE is built on the GridFTP protocol, which makes data transfers reliable and efficient. To ease use, EUDAT has developed the companion Data Staging Script, a client-side tool that simplifies issuing data transfer commands and handles the PIDs of the data resources involved.
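The Data Staging Script's role of tying transfers to PIDs can be pictured as a small planning step: resolve each input PID to its current storage location, then pair it with a destination path in the HPC workspace. The sketch below is hypothetical throughout: the resolver is a plain dictionary standing in for a handle lookup, and the PIDs, `gsiftp://` endpoints and workspace path are placeholders, not the script's real interface.

```python
from dataclasses import dataclass
from posixpath import basename, join
from typing import Dict, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class StagingItem:
    pid: str          # persistent identifier of the input object
    source: str       # current location in EUDAT, as resolved from the PID
    destination: str  # target location in the HPC workspace


def plan_stage_out(pids: List[str], resolve: Dict[str, str],
                   hpc_endpoint: str, workspace: str) -> List[StagingItem]:
    """Map each PID to a (current location -> HPC workspace) transfer pair."""
    items = []
    for pid in pids:
        if pid not in resolve:
            raise KeyError(f"PID {pid} could not be resolved")
        source = resolve[pid]
        filename = basename(urlparse(source).path)
        destination = hpc_endpoint + join(workspace, filename)
        items.append(StagingItem(pid, source, destination))
    return items


# Hypothetical PIDs, EUDAT locations and HPC endpoint, for illustration only.
resolver = {
    "12345/abc-001": "gsiftp://eudat.example.org:2811/zone/vph/scan_001.nii",
    "12345/abc-002": "gsiftp://eudat.example.org:2811/zone/vph/scan_002.nii",
}
staging_plan = plan_stage_out(list(resolver), resolver,
                              hpc_endpoint="gsiftp://hpc.example.org:2811",
                              workspace="/scratch/project42/input")
for item in staging_plan:
    print(item.pid, item.source, "->", item.destination)
```

Keeping the PID alongside every transfer pair means the provenance of each staged file stays traceable after it lands in the workspace.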
The main end users are EUDAT researchers, who can transfer their data between storage and computation as part of their day-to-day workflows. Using B2STAGE to inject community data into EUDAT is generally the task of Community Managers.
As per slide.
The B2STAGE service is deployed on EUDAT datacentres and on many HPC nodes. Access to EUDAT nodes is automatic for all registered EUDAT users, although users need to arrange access to HPC nodes separately. Users can run clients on their desktop or on other login servers they have access to. Globus Online provides both a GUI and a command-line interface; alternatively, users may prefer the native GridFTP command-line client (globus-url-copy) or other GridFTP clients such as UberFTP. We spoke about the EUDAT Data Staging Script earlier. EUDAT is also working on an HTTP interface. In all cases, users initiate transfers with their client of choice, either between B2STAGE instances on EUDAT and HPC centres, or between their desktop and nodes enabled with B2STAGE.
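For users of the native GridFTP client, a transfer between a B2STAGE endpoint and an HPC site is a single `globus-url-copy` invocation. The helper below only assembles the argument list and does not run anything. The host names and paths are placeholders, port 2811 is GridFTP's conventional default, and the option set (`-vb` for progress output, `-cd` to create missing destination directories, `-p` for parallel streams, `-r` for recursive copies) is an assumption to be checked against the client version installed at your site. A valid grid proxy credential is assumed to be in place already.

```python
import shlex
from typing import List


def gridftp_copy_args(source: str, destination: str, streams: int = 4,
                      recursive: bool = False) -> List[str]:
    """Build a globus-url-copy command line (not executed here)."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    args = ["globus-url-copy", "-vb", "-cd", "-p", str(streams)]
    if recursive:
        args.append("-r")
        # For recursive copies, globus-url-copy expects directory URLs ending in '/'.
        source = source if source.endswith("/") else source + "/"
        destination = destination if destination.endswith("/") else destination + "/"
    return args + [source, destination]


# Placeholder endpoints: an EUDAT B2STAGE server and an HPC GridFTP server.
cmd = gridftp_copy_args(
    "gsiftp://b2stage.example.org:2811/zone/home/alice/run01",
    "gsiftp://gridftp.hpc.example.org:2811/scratch/alice/run01",
    streams=8, recursive=True)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting surprises if it is later passed to `subprocess.run`.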
(This continues from the last sentence of the previous slide.) This is better depicted in this figure. The user employs the GridFTP client of their choice, which interacts with the B2STAGE instances on the sites involved in the transfer. Under the B2STAGE hood is a GridFTP server, enriched with the EUDAT Data Storage Interface component. When data arrive at an EUDAT node to be deposited, the B2SAFE service ensures that B2HANDLE generates a PID for each artefact, and the PID is recorded in the EPIC PID register. The iRODS server also handles any replication required for these artefacts, according to the community policies that apply to the user who initiated the transfer. If the user works with the EUDAT Data Staging Script (DSS), any PIDs generated are returned to them; as before, whether PIDs are generated depends on the iRODS server configuration and the community agreement.
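The deposit path in the figure can be modelled in a few lines to make the sequence of responsibilities explicit: each arriving artefact gets a PID, the PID is recorded in a registry, replicas are queued according to the depositing community's policy, and the PIDs are handed back to the client. Everything here is a hypothetical in-memory stand-in: the prefix `12345`, the policy format and the paths are invented for illustration, and the `PSNC` target merely echoes the VPH example from this talk. It does not reflect how B2SAFE, B2HANDLE or iRODS are actually configured.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DepositService:
    prefix: str                       # hypothetical handle prefix
    policies: Dict[str, List[str]]    # community -> replica target sites
    new_suffix: Callable[[], str] = lambda: str(uuid.uuid4())
    registry: Dict[str, str] = field(default_factory=dict)  # PID -> location
    replication_queue: List[Tuple[str, str, str]] = field(default_factory=list)

    def deposit(self, community: str, paths: List[str]) -> Dict[str, str]:
        """Register each artefact, queue its replicas, and return path -> PID."""
        assigned = {}
        for path in paths:
            pid = f"{self.prefix}/{self.new_suffix()}"
            self.registry[pid] = path  # stands in for the PID register
            # Queue one replication job per target site in the community policy.
            for site in self.policies.get(community, []):
                self.replication_queue.append((pid, path, site))
            assigned[path] = pid
        return assigned


# Deterministic suffixes so the example output is reproducible.
counter = iter(range(1, 100))
svc = DepositService(prefix="12345",
                     policies={"vph": ["PSNC"]},
                     new_suffix=lambda: f"obj-{next(counter):04d}")
pids = svc.deposit("vph", ["/zone/vph/results/out_a.h5",
                           "/zone/vph/results/out_b.h5"])
print(pids)
print(svc.replication_queue)
```

The returned mapping plays the role of the PIDs handed back to a DSS user; a community with no policy entry simply gets PIDs without queued replicas.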
The situation is similar when the user transfers data out of an EUDAT centre.
B2STAGE is used by EUDAT research communities. For example, the Virtual Physiological Human (VPH) community is ingesting data into EUDAT using B2STAGE: it is set to deposit approximately 12 TB of data at RZG, which B2SAFE will replicate to PSNC. B2STAGE will also be instrumental in establishing the forthcoming collaboration with the EGI and PRACE research infrastructures. The aim of these collaborations is to create the framework for, and foster, cross-infrastructure usage, with B2STAGE as the software bridging these infrastructures. In addition, the new projects starting in early 2016 as part of the two EUDAT Calls for Collaboration will use B2STAGE to transfer their data into EUDAT.
As per bullets
As per bullet
For more info please visit: http://eudat.eu/services/b2stage. The user documentation can be found at: http://eudat.eu/services/userdoc/b2stage
B2STAGE: how to shift large amounts of data | www.eudat.eu
Get Data to Computation
How to shift large amounts of data
This work is licensed under the Creative Commons CC-BY 4.0 licence.
Attribution: EUDAT – www.eudat.eu
a reliable, efficient, light-weight and easy-to-use service to transfer research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces
A truly pan-European Infrastructure
EUDAT offers common data services to both research communities and individuals through a network of 35
EUDAT wants to enable European researchers from any discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.
EUDAT services are designed, built and implemented based on user community requirements.
Facilitating communities to:
move large amounts of data between data stores and high-performance computing facilities
re-ingest computational results back into EUDAT
deposit large data sets into EUDAT resources for long-term preservation
reliable and light-weight
manages persistent identifiers (PIDs)
Why use B2STAGE?
Research challenges are getting larger and
E.g. full-Earth climate simulation, coupled simulations of multiple organs in the human body, seismic analyses of earthquakes at continental scale
Researcher data and compute demands are rising fast
Efficient transfer of data to high-performance computing (HPC) workspaces is essential, especially in distributed computing, where resources are geographically dispersed
Why use B2STAGE?
Facilitates transfer of large data collections from EUDAT storage resources to HPC facilities.
Provides the means to re-ingest computational results back into the EUDAT infrastructure.
Ingests data sets into EUDAT resources for long-term preservation.
Offers reliable, efficient, easy-to-use tools to manage data.
The Data Staging Script is the only tool handling data transfer using PIDs.
Who can use B2STAGE?
Researchers can transfer large data collections from EUDAT storage resources to HPC facilities for processing.
Community Managers can replicate community data through a lightweight service and ingest data sets to EUDAT storage resources for long-term preservation.
How can you use B2STAGE?
EUDAT offers B2STAGE to all registered researchers and interested communities, enabling them to make use of the service to stage data out of EUDAT and ingest computational results back.
Access to remote HPC facilities should be negotiated and arranged by individual users in parallel.
To help researchers use the B2STAGE service, EUDAT offers documentation, training material and a service
For more information please email:
How does B2STAGE work?
How does B2STAGE work?
B2STAGE user communities
VPH community ingesting data onto EUDAT resources
Approximately 12 TB will be ingested through this service
VPH data also replicated between RZG and PSNC sites
B2STAGE will foster the collaboration with EGI and PRACE to develop cross-infrastructure usage: B2STAGE will be the main service to enable the interoperability of these infrastructures.
Numerous new communities will adopt it as part of the 2015 and 2016 Calls for Collaboration
data staging functionalities to easily and efficiently transfer data from EUDAT storage resources to HPC facilities
a powerful mechanism to ingest data into EUDAT
a script to facilitate the staging, ingest and retrieval of PID information of transferred data
B2STAGE is unique in handling PIDs for the data
The Data Staging Script will be replaced by a modular and extensible Python library which will furnish users with a programmable interface towards most of the
For more info: http://eudat.eu/services/b2stage
User documentation: http://eudat.eu/services/userdoc/b2stage