DOIs and Supercomputing
DataCite Summer 2013 Meeting
Terry Jones, Sudharshan Vazhkudai, Doug Fuller
Oak Ridge National Laboratory
DataCite Summer 2013 / Washington DC
Why Supercomputers!?
Because Innovation Drives The Economy…
• Over the last 5 years, 38% of the international innovation “R&D 100” awards went
to US National Labs
[Chart: vertical axis from 0 to 50, plotted by year for 2009–2013]
• This was done with YOUR tax
money
• Ideas shape the course of
history – John Maynard
Keynes
• The central goal of economic
policy should be to spur higher
productivity through greater
innovation – Joseph
Schumpeter's Innovation
Economics
Why Supercomputers!? (part 2)
…And in 2013, Supercomputers Drive Innovation
Computers have changed the way we conduct
experiments. Given enough computer power, we
can perform accurate experiments more
quickly, more cheaply, and often with greater
control.
The New Laboratory:
High-Performance Computing yields breakthroughs
$$H = -\sum_{i=1}^{n} \frac{\hbar^2}{2m_i}\nabla_i^2 - \sum_{i \neq j}^{n} \frac{e_i e_j}{r_{ij}}$$
Big Problems Require Big Solutions
Energy
Healthcare
Competitiveness
OLCF resources are available to
academia and industry through
open, peer-reviewed allocation
mechanisms.
• High Performance Production Computing for the
Office of Science
• Characterized by a large number of projects (over 400) and users
(over 4,800)
• Leadership Computing for Open Science
• Characterized by a small number of projects (about 50) and
users (about 800) with computationally intensive projects
• Linking it together – ESnet
• Investing in the future – R&E Prototypes
[Figure: facilities linked by ESnet, with rankings as of June 2013: Titan at ORNL (#2), Mira at ANL (#5), Hopper at LBNL (#24)]
DOE Office of Science HPC User Facilities
With Big Computations Comes Big Data
• DOE HPC User Facilities produce enormous volumes of data
• Each User Facility has tertiary (archival) storage, often HPSS
– statistics for one such computer center pictured here
• In addition, each center provides secondary storage
– for example: a 10PB Lustre parallel file system
• Part of a Collaborative DOE Office of
Science program at ORNL and ANL
• Mission: Provide the computational
and data resources required to solve
the most challenging problems.
• Access to the most powerful computer
in the world for open access computing
(Titan)
• Highly competitive user allocation
programs (INCITE, ALCC).
• Projects receive 10x to 100x more
resources than at other generally
available centers.
• OLCF centers partner with users to
enable science & engineering
breakthroughs (Liaisons, Catalysts).
Oak Ridge Leadership Computing Facility (OLCF)
-- A Leading DOE User Facility
We have increased our system capability
by 10,000 times since 2004
• Strong partnerships with supercomputer vendors.
• LCF users employ large portions of the machine for large fractions of time.
• Strong partnerships with our users to scale codes and algorithms.
OLCF Future (Based On Extrapolation)
2009: Jaguar, 2.3 PF, leadership system for science
2012: Titan (OLCF-3), 10–20 PF, leadership system
2016: OLCF-4, 100–250 PF
2019: OLCF-5, 1 EF
• Computer system performance increases through parallelism
– Clock speed trend flat to slower over coming years
– In the last 28 years, systems have scaled from 64 cores to ~300,000
– Applications must utilize all inherent parallelism
• Our compute and data resources have grown 10,000X over the
decade, are in high demand, and are effectively used.
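The growth figures on this slide imply a doubling time, which a few lines of arithmetic can check. The sketch below is a back-of-the-envelope calculation only: the helper name `doubling_time_years` is hypothetical, and it assumes smooth exponential growth, which real procurement cycles do not follow.

```python
import math

def doubling_time_years(factor: float, years: float) -> float:
    """Years per doubling, assuming smooth exponential growth by `factor` over `years`."""
    return years * math.log(2) / math.log(factor)

# Core counts: 64 cores to roughly 300,000 cores in 28 years (figures from the slide).
cores = doubling_time_years(300_000 / 64, 28)

# Capability: 10,000x over a decade (figure from the slide).
capability = doubling_time_years(10_000, 10)

print(f"core count doubles about every {cores:.1f} years")
print(f"capability doubles about every {capability:.2f} years")
```

Under these assumptions, capability has doubled far faster than core count alone, which is consistent with the slide's point that gains come from several sources of parallelism at once.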
The Data Deluge
2013 4PB disk & 34PB tape [Titan]
2017 64PB disk & 600PB tape [Coral]
2021 1EB disk & 10EB tape (?)
• Key Challenge: Make Sense of So Much Data
• We'll Need Better Tools
• If “many hands make light work,” how can we
enable more people to make sense of the data?
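The storage projections above can be turned into a compound annual growth factor. This is a rough sketch rather than a forecast: the sizes are copied from the slide, 1 EB is treated as 1000 PB (an assumption), and `annual_growth` is a hypothetical helper name.

```python
# Projected holdings in petabytes, taken from the slide (1 EB treated as 1000 PB).
holdings_pb = {
    2013: {"disk": 4, "tape": 34},
    2017: {"disk": 64, "tape": 600},
    2021: {"disk": 1000, "tape": 10_000},
}

def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth factor between two sizes."""
    return (end / start) ** (1 / years)

years = sorted(holdings_pb)
for earlier, later in zip(years, years[1:]):
    for tier in ("disk", "tape"):
        rate = annual_growth(
            holdings_pb[earlier][tier], holdings_pb[later][tier], later - earlier
        )
        print(f"{tier} {earlier}-{later}: x{rate:.2f} per year")
```

With these numbers, both disk and tape holdings roughly double every year.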
What Breakthroughs Are We Missing?
• HPC will remain important to Scientific Discovery
– Important for Climate, Material Science, Energy Security
• Today, the state-of-the-art is (still!) bibliographic
publications
• But The Gains From Bibliographic Sharing Are
Limited
– Constraints in paper length
– Limited Focus of paper
– Limited ability to convey with graphs, figures, tables
• Urgently Needed: A Quick Way To 'Enable' Data
New External Drivers for Supercomputing Centers
• The push is on to squeeze more results from High-Performance Computing
– Scientists have difficulty replicating (or even understanding) others' results
– Taxpayers want more openness
– The Holdren memo
Our Response: Make Supercomputer Produced
Data As Widely Available As Possible
• DOIs provide the necessary mechanism & implementation
• Makes sense for OLCF (uniquely qualified for 100TB datasets)
• Will benefit from DataCite's integration with the Thomson Reuters Data Citation Index and
other services.
• Already successful for sensor-driven research like NASA
• As research goes forward, the project Principal Investigator stores “appropriate data”
– Presumably, if data can support a bibliographic result (graph, figure, data), the data is worth
curation.
• After curation, the data is available to the entire scientific community
✔ Helps OLCF with 'research tracking'
✔ Helps OLCF with 'reporting to sponsors'
✔ Helps OLCF resolve data disposition questions
✔ All The Traditional Benefits To Researchers
DOI Benefits
From User's Perspective, DOIs can:
• Identify and cite key data products of interest and value, and annotate them.
• Safely share data with collaborators even before publishing the result in a scientific communication.
• Let future data analyses easily feed off the data products, fostering a highly dynamic and collaborative environment.
• Preserve data products for the longer term, well beyond the expiration of their projects at the centers.
• Satisfy requirements from funding agencies on data management plans in terms of long-term preservation, sharing, and dissemination of research results.
From Center's Perspective, DOIs can:
• Help with research tracking and identifying the major results coming out of a project allocation on the center's resources.
• Aid in reporting to sponsors.
• Help the center answer questions on the disposition of the data, and search and discover them, since the DOIs also capture some basic metadata along with the index.
• Provide a tool to cull the data holdings, and provide tangible policies to users for long-term data preservation.
• Evolve to support “data-only” users through data science tools such as DOIs.
• Provide an opportunity for our center to distinguish itself from other centers (they have the best data tools).
From Sponsor's Perspective, DOIs can:
• Enable more value for the dollar spent. In addition to software tools, research artifacts, and papers, there is now a new entity, the citable data product.
• Enable better utilization of HPC center resources.
• Add the benefit of seeing data sharing flourish within the community, and more data analyses spawned from the data products.
• Give both users and centers that the sponsor funds rich tools for data management.
Workflow for DOI Creation
1. User creates data
2. User requests DOI
3. ORNL requests DOI
4. OSTI provides DOI
5. DOI stored at data portal
6. Request Permanent Data Copy
7. Data Migrated to Archive
8. Archive success response
9. DOI success response
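The nine steps above can be sketched as a small in-memory simulation. Everything here is an illustrative assumption rather than the OLCF or OSTI implementation: the class names `FakeRegistrar` and `DataCenter`, the placeholder prefix `10.0000`, and the file paths are all hypothetical, and the archive copy is simulated with a dictionary entry.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class FakeRegistrar:
    """Stand-in for the DOI provider (OSTI in the slide); mints placeholder DOIs."""
    prefix: str = "10.0000"  # placeholder prefix, not a real registrant code
    _serial: count = field(default_factory=lambda: count(1))

    def mint(self, title: str) -> str:
        return f"{self.prefix}/example.{next(self._serial)}"

@dataclass
class DataCenter:
    """Stand-in for the center-side services: portal catalog plus archive."""
    registrar: FakeRegistrar
    portal: dict = field(default_factory=dict)   # DOI -> metadata
    archive: dict = field(default_factory=dict)  # DOI -> archived location

    def handle_doi_request(self, path: str, title: str) -> str:
        doi = self.registrar.mint(title)                      # steps 3-4
        self.portal[doi] = {"title": title, "source": path}   # step 5
        self.archive[doi] = f"/archive/{doi}"                 # steps 6-7 (simulated copy)
        if doi not in self.archive:                           # step 8
            raise RuntimeError("archive copy failed")
        return doi                                            # step 9

# Steps 1-2: the user has produced a file and asks for a DOI.
center = DataCenter(FakeRegistrar())
doi = center.handle_doi_request("/scratch/run42/output.h5", "Example simulation output")
print(doi)
```

The design point the sketch tries to capture is ordering: the DOI is reported back to the user only after the archive confirms the permanent copy, so a DOI never points at data that was not preserved.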
Workflow for DOI Data Retrieval
1. User provides search criteria
2. Matches found via Metadata
3. User identifies needed data
4. Request Data Subset
5. Data Migrated for Upload
6. User retrieves data
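A minimal sketch of the retrieval steps above, assuming a tiny in-memory catalog. The `Record` type, the catalog entries, the placeholder DOIs, and the `/staging/` paths are all hypothetical; a real system would query a metadata service and stage files from tape.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    doi: str
    title: str
    keywords: tuple
    files: tuple

CATALOG = [
    Record("10.0000/example.1", "Supernova run A",
           ("astrophysics", "supernova"), ("a_000.h5", "a_001.h5")),
    Record("10.0000/example.2", "Aquifer model B",
           ("groundwater",), ("b_000.nc",)),
]

def search(catalog, keyword):
    """Steps 1-2: match the search criteria against metadata only."""
    kw = keyword.lower()
    return [r for r in catalog if kw in r.keywords or kw in r.title.lower()]

def request_subset(record, wanted):
    """Steps 4-5: pick the requested files; staging from the archive is simulated."""
    missing = set(wanted) - set(record.files)
    if missing:
        raise KeyError(f"not part of {record.doi}: {sorted(missing)}")
    return [f"/staging/{record.doi}/{name}" for name in wanted]

matches = search(CATALOG, "supernova")          # steps 1-2
chosen = matches[0]                             # step 3: user identifies needed data
staged = request_subset(chosen, ["a_001.h5"])   # steps 4-5
print(staged)                                   # step 6: user retrieves from staging
```

Searching touches only metadata, so the expensive migration from the archive happens once, and only for the subset the user actually asked for.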
Some Challenges Are Expected
• How will permanent data storage be funded?
– Projects last 3 years.
• Researchers are affiliated with institutions that have their own data policies.
– For example, the Princeton Plasma Physics Lab may have policies affecting how we can support
its fusion projects.
• Some fields will require effort to make their data “portable” for a wide audience.
– Astrophysics has a standard file format, Fusion does not.
• Developing good metadata is a human intensive effort
– Getting PIs to provide the metadata
– Looking to OSTI & DataCite for some help with DOI Q&A
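One way to reduce the human effort in metadata collection is to check a PI's draft record for gaps before anyone reviews it by hand. The sketch below is an assumption-laden illustration: the field names are hypothetical and only loosely modeled on core descriptive properties (identifier, creator, title, publisher, publication year), not the exact DataCite schema.

```python
# Hypothetical required fields; a real deployment would follow the registry's schema.
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher", "publication_year")

def missing_fields(metadata: dict) -> list:
    """Return required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]

draft = {
    "identifier": "10.0000/example.1",   # placeholder DOI
    "creators": ["A. Researcher"],
    "title": "Example fusion simulation output",
    "publisher": "",                      # left blank by the PI
}
print(missing_fields(draft))
```

A check like this catches empty and absent fields early, leaving curators to focus on the descriptive quality that cannot be automated.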
…More Challenges
• What about Authenticated access to data?
Or malicious users in general...
• What about the long-term QA aspects of
maintaining data?
• What about the logistics of very large data?
– Staging
– Retrieving huge files (can't be on disk)
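Both the long-term QA question and the large-file logistics point toward streaming checks that never load a whole file at once. The following is a generic fixity-check sketch using Python's standard `hashlib`, not a description of any OLCF tool; the function name `sha256_in_chunks` and the small demonstration file are assumptions.

```python
import hashlib
import os
import tempfile

def sha256_in_chunks(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file piece by piece so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Small demonstration file standing in for a much larger dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"simulation output" * 1000)
    demo_path = tmp.name

recorded = sha256_in_chunks(demo_path)        # stored at archive time
later = sha256_in_chunks(demo_path, 4096)     # recomputed at a later audit
print("intact" if recorded == later else "corrupted")
os.remove(demo_path)
```

The digest does not depend on the chunk size, so the same recorded value can be verified on systems with very different memory limits.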
Where’s The
Data?
Current Project Status
• Provided a DOI recommendation for the Center
– Pros and Cons
– Long term implications
• Designed the Workflow
• Created infrastructure to support the workflow
– Frontend infrastructure for storing & DOI association
– Backend infrastructure for search & retrieval
• Having conversations with a few selected HPC user communities
1. Astrophysics
2. Groundwater Simulation
3. Climate
4. Turbulence
5. Fusion
Summary
• High Performance Computing & Data are integral to scientific
discovery
• Bibliographic publications cannot contain the wealth of insight
available in the raw data
• ORNL is leading an effort to make HPC data available to all
with DOIs
• In the future, “Publish” to
a scientist will probably
refer to obtaining a DOI
for a supercomputer
dataset
Acknowledgements
• OLCF DOI Team
– Sudharshan Vazhkudai
– Doug Fuller
– Terry Jones
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak
Ridge National Laboratory, which is supported by the Office of Science of the U.S.
Department of Energy under Contract No. DE-AC05-00OR22725.
• OSTI Support
– Mark Martin
– Jannean Elliott
• ORNL Support
– Jack Wells
– Giri Palanisamy
– John Cobb
– Stan White
Questions?
trj@ornl.gov
Extra Viewgraphs
How Does The OLCF Compare With Other Centers?
Four of Six SC13 Gordon Bell Finalists Used Titan
• High-Temperature Superconductivity: “Taking a Quantum Leap in Time to Solution for Simulations of High-TC Superconductors” (Peter Staar, ETH Zurich; ran on Titan)
• Biofluidic Systems: “19 Petaflops Simulation of Protein Suspensions in Crowding Conditions” (Massimo Bernaschi, ICNR-IAC Rome; ran on Titan)
• Plasma Physics: “Radiative Signatures of the Relativistic Kelvin-Helmholtz Instability” (Michael Bussmann, HZDR - Dresden; ran on Titan)
• Cosmology: “HACC: Extreme Scaling and Performance Across Diverse Architectures” (Salman Habib, ANL; ran on Sequoia, Mira, Titan)
The New Laboratory (continued):
High-Performance Computing is widely applicable

More Related Content

What's hot

5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance Qubole
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at YorkMing Li
 
Data storage in Cloud computing
Data storage in Cloud computingData storage in Cloud computing
Data storage in Cloud computingDong Yuan
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014iedadata
 
Big Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & InnovationBig Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & InnovationPhilip Bourne
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Vivien Bonazzi
 
Big data ppt
Big data pptBig data ppt
Big data pptYash Raj
 
Science20brussels osimo april2013
Science20brussels osimo april2013Science20brussels osimo april2013
Science20brussels osimo april2013osimod
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWNellore Harilakshmi
 
Australia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityAustralia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityTERN Australia
 
Facilitating good research data management practice as part of scholarly publ...
Facilitating good research data management practice as part of scholarly publ...Facilitating good research data management practice as part of scholarly publ...
Facilitating good research data management practice as part of scholarly publ...Varsha Khodiyar
 
2013 bio it world
2013 bio it world2013 bio it world
2013 bio it worldChris Dwan
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data ScienceKenny Daniel
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big DataIndu Khemchandani
 

What's hot (20)

5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance
 
Zucca "Technology & Systems"
Zucca "Technology & Systems"Zucca "Technology & Systems"
Zucca "Technology & Systems"
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Oracle openworld-presentation
Oracle openworld-presentationOracle openworld-presentation
Oracle openworld-presentation
 
Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at York
 
Data storage in Cloud computing
Data storage in Cloud computingData storage in Cloud computing
Data storage in Cloud computing
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
Big Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & InnovationBig Data as a Catalyst for Collaboration & Innovation
Big Data as a Catalyst for Collaboration & Innovation
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Science20brussels osimo april2013
Science20brussels osimo april2013Science20brussels osimo april2013
Science20brussels osimo april2013
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
 
Australia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityAustralia's Environmental Predictive Capability
Australia's Environmental Predictive Capability
 
Facilitating good research data management practice as part of scholarly publ...
Facilitating good research data management practice as part of scholarly publ...Facilitating good research data management practice as part of scholarly publ...
Facilitating good research data management practice as part of scholarly publ...
 
2013 bio it world
2013 bio it world2013 bio it world
2013 bio it world
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 

Similar to 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Revolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchRevolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchAmye Kenall
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...SEAD
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013SALCTG
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdfAyele40
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsMartin Donnelly
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesJason Hattrick-Simpers
 

Similar to 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory) (20)

Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Revolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchRevolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational Research
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Baker - Evolution of Data Products and Designated Audiences
Baker - Evolution of Data Products and Designated AudiencesBaker - Evolution of Data Products and Designated Audiences
Baker - Evolution of Data Products and Designated Audiences
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 

More from datacite

ODIN Final Event - Publishing and citing, and the role of persistent identifiers
ODIN Final Event - Publishing and citing, and the role of persistent identifiersODIN Final Event - Publishing and citing, and the role of persistent identifiers
ODIN Final Event - Publishing and citing, and the role of persistent identifiersdatacite
 
ODIN Final Event - Submission to datacentres
ODIN Final Event - Submission to datacentresODIN Final Event - Submission to datacentres
ODIN Final Event - Submission to datacentresdatacite
 
ODIN Final Event - Supporting the research lifecycle: Discovery and Analysis
ODIN Final Event - Supporting the research lifecycle: Discovery and AnalysisODIN Final Event - Supporting the research lifecycle: Discovery and Analysis
ODIN Final Event - Supporting the research lifecycle: Discovery and Analysisdatacite
 
ODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific DataODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific Datadatacite
 
Creating Incentives
Creating IncentivesCreating Incentives
Creating Incentivesdatacite
 
DataCite overview 2014
DataCite overview 2014DataCite overview 2014
DataCite overview 2014datacite
 
2013 DataCite Summer Meeting - Thomson Reuters Data citation index cooperatio...
2013 DataCite Summer Meeting - Thomson Reuters Data citation index cooperatio...2013 DataCite Summer Meeting - Thomson Reuters Data citation index cooperatio...
2013 DataCite Summer Meeting - Thomson Reuters Data citation index cooperatio...datacite
 
2013 DataCite Summer Meeting - Data-Planet (Matt Dunie - Data-Planet)
2013 DataCite Summer Meeting - Data-Planet (Matt Dunie - Data-Planet)2013 DataCite Summer Meeting - Data-Planet (Matt Dunie - Data-Planet)
2013 DataCite Summer Meeting - Data-Planet (Matt Dunie - Data-Planet)datacite
 
2013 DataCite Summer Meeting - Figshare (Mark Hahnel - Figshare)
2013 DataCite Summer Meeting - Figshare (Mark Hahnel - Figshare)2013 DataCite Summer Meeting - Figshare (Mark Hahnel - Figshare)
2013 DataCite Summer Meeting - Figshare (Mark Hahnel - Figshare)datacite
 
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...
2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement...datacite
 
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...datacite
 
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

  • 1. DOIs and Supercomputing DataCite Summer 2013 Meeting Terry Jones, Sudharshan Vazhkudai, Doug Fuller Oak Ridge National Laboratory
  • 2. DataCite Summer 2013 / Washington DC Why Supercomputers!? Because Innovation Drives The Economy… • Over the last 5 years, 38% of the international innovation “R&D 100” awards went to US National Labs [Chart: yearly values for 2009 to 2013, vertical axis 0 to 50] • This was done with YOUR tax money • Ideas shape the course of history – John Maynard Keynes • The central goal of economic policy should be to spur higher productivity through greater innovation – Joseph Schumpeter’s Innovation Economics
  • 3. Why Supercomputers!? (part 2) …And in 2013, Supercomputers Drive Innovation Computers have changed the way we conduct experiments. Given enough computer power, we can perform accurate experiments more quickly, more cheaply, and often with greater control.
  • 4. The New Laboratory: High-Performance Computing yields breakthroughs $H = -\sum_{i=1}^{n} \frac{\hbar^2}{2 m_i} \nabla_i^2 - \sum_{i \neq j}^{n} \frac{e_i e_j}{r_{ij}}$
  • 5. Big Problems Require Big Solutions Energy Healthcare Competitiveness OLCF resources are available to academia and industry through open, peer-reviewed allocation mechanisms.
  • 6. DOE Office of Science HPC User Facilities (June 2013) • High Performance Production Computing for the Office of Science • Characterized by a large number of projects (over 400) and users (over 4800) • Leadership Computing for Open Science • Characterized by a small number of projects (about 50) and users (about 800) with computationally intensive projects • Linking it together – ESnet • Investing in the future – R&E Prototypes [Figure labels: ESnet; Titan at ORNL (#2); Mira at ANL (#5); Hopper at LBNL (#24)]
  • 7. DOE Office of Science HPC User Facilities
  • 8. With Big Computations Comes Big Data • DOE HPC User Facilities produce enormous volumes of data • Each User Facility has tertiary (archival) storage, often HPSS – statistics for one such computer center pictured here • In addition, each center provides secondary storage – for example: a 10PB Lustre parallel file system
  • 9. Oak Ridge Leadership Computing Facility (OLCF): A Leading DOE User Facility • Part of a Collaborative DOE Office of Science program at ORNL and ANL • Mission: Provide the computational and data resources required to solve the most challenging problems. • Access to the most powerful computer in the world for open access computing (Titan) • Highly competitive user allocation programs (INCITE, ALCC). • Projects receive 10x to 100x more resource than at other generally available centers. • OLCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).
  • 10. We have increased our system capability by 10,000 times since 2004 • Strong partnerships with supercomputer vendors. • LCF users employ large portions of the machine for large fractions of time. • Strong partnerships with our users to scale codes and algorithms.
  • 11. OLCF Future (Based On Extrapolation) Timeline: 2009 Jaguar: 2.3 PF, leadership system for science; 2012 Titan (OLCF-3): 10–20 PF, leadership system; 2016 OLCF-4: 100–250 PF; 2019 OLCF-5: 1 EF • Computer system performance increases through parallelism – Clock speed trend flat to slower over coming years – In the last 28 years, systems have scaled from 64 cores to ~300,000 – Applications must utilize all inherent parallelism • Our compute and data resources have grown 10,000X over the decade, are in high demand, and are effectively used.
  • 12. The Data Deluge 2013: 4PB disk & 34PB tape [Titan]; 2017: 64PB disk & 600PB tape [Coral]; 2021: 1EB disk & 10EB tape (?) • Key Challenge: Make Sense of So Much Data • We’ll Need Better Tools • If “many hands make light work,” how can we enable more people to make sense of the data?
  • 13. What Breakthroughs Are We Missing? • HPC will remain important to Scientific Discovery – Important for Climate, Material Science, Energy Security • Today, the state-of-the-art is (still!) bibliographic publications • But The Gains From Bibliographic Sharing Are Limited – Constraints in paper length – Limited Focus of paper – Limited ability to convey with graphs, figures, tables • Urgently Needed: A Quick Way To ‘Enable’ Data
  • 14. New External Drivers for Supercomputing Centers • The push is on to squeeze more results from High-Performance Computing – Scientists have difficulty in replicating (or even understanding) others’ results – Taxpayers want more openness – The Holdren memo
  • 15. Our Response: Make Supercomputer Produced Data As Widely Available As Possible • DOIs provide the necessary mechanism & implementation • Makes sense for OLCF (uniquely qualified for 100TB datasets) • Will benefit from DataCite’s integration with Thomson Reuters’ data citation index and other services. • Already successful for sensor-driven research like NASA • As research goes forward, the project Principal Investigator stores “appropriate data” – Presumably, if data can support a bibliographic result (graph, figure, data), the data is worth curation. • After curation, the data is available to the entire scientific community ✔ Helps OLCF with ‘research tracking’ ✔ Helps OLCF with ‘reporting to sponsors’ ✔ Helps OLCF resolve data disposition questions ✔ All The Traditional Benefits To Researchers
  • 16. DOI Benefits From User’s Perspective, DOIs can: • Identify & cite key data products of interest and value, and annotate them. • Safely share data with their collaborators even before publishing the result in a scientific communication. • Future data analyses can easily feed off of the data products, fostering a highly dynamic and collaborative environment. • Preserve data products for the longer term, well beyond the expiration of their projects at the centers. • Satisfy requirements from funding agencies on data management plans in terms of long-term preservation, sharing and dissemination of research results. From Center’s Perspective, DOIs can: • Help with research tracking and identifying the major results coming out of a project allocation on the center’s resources. • Aid in reporting to sponsors. • Since the DOIs also capture some basic metadata along with the index, they can help the center answer questions on the disposition of the data, and search and discover them. • Better utilization of HPC center resources. • Provide a tool to cull the data holdings. Provide tangible policies to users for long-term data preservation. • Evolve to support “data-only” users through data science tools such as DOIs. • Provide an opportunity for our center to distinguish itself from other centers (they have the best data tools) From Sponsor’s Perspective, DOIs can: • Added benefit of seeing data sharing flourish within the community, and more data analyses spawned from the data products. • Both users and centers that the sponsor funds now have rich tools for data management. • DOIs enable more value for the dollar spent. In addition to software tools, research artifacts, and papers, there is now a new entity, the citable data product.
  • 17. Workflow for DOI Creation 1. User creates data 2. User requests DOI 3. ORNL requests DOI 4. OSTI provides DOI 5. DOI stored at data portal 6. Request Permanent Data Copy 7. Data Migrated to Archive 8. Archive success response 9. DOI success response
  • 18. Workflow for DOI Data Retrieval 1. User provides search criteria 2. Matches found via Metadata 3. User identifies needed data 4. Request Data Subset 5. Data Migrated for Upload 6. User retrieves data
  • 19. Some Challenges Are Expected • How will permanent data storage be funded? – Projects last 3 years. • Researchers are affiliated with institutions that have their own data policies. – For example, the Princeton Plasma Physics Lab may have policies affecting how we can support its fusion projects. • Some fields will require effort to make their data “portable” for a wide audience. – Astrophysics has a standard file format, Fusion does not. • Developing good metadata is a human-intensive effort – Getting PIs to provide the metadata – Looking to OSTI & DataCite for some help with DOI Q&A
  • 20. …More Challenges • What about Authenticated access to data? Or malicious users in general... • What about the long-term QA aspects of maintaining data? • What about the logistics of very large data? – Staging – Retrieving huge files (can’t be on disk) Where’s The Data?
  • 21. Current Project Status • Provided a DOI recommendation for the Center – Pros and Cons – Long term implications • Designed the Workflow • Created infrastructure to support the workflow – Frontend infrastructure for storing & DOI association – Backend infrastructure for search & retrieval • Having conversations with a few selected HPC user communities 1. Astrophysics 2. Groundwater Simulation 3. Climate 4. Turbulence 5. Fusion
  • 22. Summary • High Performance Computing & Data are integral to scientific discovery • Bibliographic publications cannot contain the wealth of insight available in the raw data • ORNL is leading an effort to make HPC data available to all with DOIs • In the future, “Publish” to a scientist will probably refer to obtaining a DOI for a supercomputer dataset
  • 23. Acknowledgements • OLCF DOI Team – Sudharshan Vazhkudai – Doug Fuller – Terry Jones • OSTI Support – Mark Martin – Jannean Elliott • ORNL Support – Jack Wells – Giri Palanisamy – John Cobb – Stan White This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
  • 24. Questions? trj@ornl.gov
  • 25. Extra Viewgraphs
  • 26. How Does The OLCF Compare With Other Centers? Four of Six SC13 Gordon Bell Finalists Used Titan • High-Temperature Superconductivity: “Taking a Quantum Leap in Time to Solution for Simulations of High-TC Superconductors” (Peter Staar, ETH Zurich; ran on Titan) • Biofluidic Systems: “19 Petaflops Simulation of Protein Suspensions in Crowding Conditions” (Massimo Bernaschi, ICNR-IAC Rome; ran on Titan) • Plasma Physics: “Radiative Signatures of the Relativistic Kelvin-Helmholtz Instability” (Michael Bussmann, HZDR - Dresden; ran on Titan) • Cosmology: “HACC: Extreme Scaling and Performance Across Diverse Architectures” (Salman Habib, ANL; ran on Sequoia, Mira, Titan)
  • 27. The New Laboratory (continued): High-Performance Computing is widely applicable
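
The DOI creation workflow on slide 17 can be sketched in code. The following is only an illustrative Python sketch of the nine steps, not the OLCF or OSTI implementation: the class names (Registrar, DataPortal, Archive), the function create_doi, the DOI prefix 10.99999, and the file path are all hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dataset:
    """Step 1: data a user has created on the center's file system."""
    path: str
    owner: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Registrar:
    """Hypothetical stand-in for the DOI provider (OSTI in the slides)."""

    def __init__(self, prefix: str = "10.99999") -> None:
        self.prefix = prefix  # made-up prefix, for illustration only
        self.issued = 0

    def mint(self, metadata: Dict[str, str]) -> str:
        self.issued += 1
        return f"{self.prefix}/dataset.{self.issued}"


class DataPortal:
    """Hypothetical portal that records which dataset a DOI points to."""

    def __init__(self) -> None:
        self.records: Dict[str, Dataset] = {}

    def store(self, doi: str, dataset: Dataset) -> None:
        self.records[doi] = dataset


class Archive:
    """Hypothetical archival store holding the permanent copy."""

    def __init__(self) -> None:
        self.holdings: Dict[str, str] = {}

    def migrate(self, doi: str, dataset: Dataset) -> bool:
        self.holdings[doi] = dataset.path
        return True


def create_doi(dataset: Dataset, registrar: Registrar,
               portal: DataPortal, archive: Archive) -> Tuple[str, List[str]]:
    """Walk steps 2 to 9 of the slide and return the DOI plus a step log."""
    log = ["2. user requests DOI", "3. center requests DOI from registrar"]
    doi = registrar.mint(dataset.metadata)
    log.append(f"4. registrar provides DOI {doi}")
    portal.store(doi, dataset)
    log.append("5. DOI stored at data portal")
    log.append("6. permanent data copy requested")
    if not archive.migrate(doi, dataset):
        raise RuntimeError("archive migration failed; DOI not confirmed")
    log.append("7. data migrated to archive")
    log.append("8. archive success response")
    log.append("9. DOI success response")
    return doi, log


if __name__ == "__main__":
    data = Dataset("/scratch/example/run42", "pi_user", {"title": "Example run"})
    doi, steps = create_doi(data, Registrar(), DataPortal(), Archive())
    print(doi)
    print("\n".join(steps))
```

In this sketch the DOI is reported as successful only after the archive confirms the permanent copy, which mirrors the ordering of steps 8 and 9 on the slide.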

Editor's Notes

  1. Scientific breakthroughs change our lives: * Explained photosynthesis. Ever wonder how plants turn sunlight into energy? A National Lab scientist determined the path of carbon through photosynthesis, a scientific milestone that illuminated one of life’s most important processes. Today, this work allows scientists to explore how to derive sustainable energy sources from the sun. * Made refrigerators cool. Next-generation refrigerators will likely put the freeze on harmful chemical coolants in favor of an environmentally friendly alloy, thanks to National Lab scientists. * Brought safe water to millions. Removing arsenic from drinking water is a global priority. A long-lasting particle engineered at a National Lab can now do exactly that, making contaminated water safe to drink. Another technology developed at a National Lab uses ultraviolet light to kill microbes that cause water-borne diseases such as dysentery. This process has reduced child mortality in the developing world. * Put the digital in DVDs. The optical digital recording technology behind music, video, and data storage originated at a National Lab nearly 40 years ago. * Tamed hydrogen with nanoparticles. To replace gasoline, hydrogen must be safely stored and easy to use, but this has proved elusive. National Lab researchers have now designed a new pliable material using nanoparticles that can rapidly absorb and release hydrogen without ill effects, a major step in making fuel-cell powered cars a commercial reality.
  2. Exabyte comes after Petabyte, then Zettabyte, then Yottabyte.
  3. In May, an OMB memo and an Executive Order were released in support of the Holdren memo
  4. Opens the door to other vast communities (as evidenced by the wide-ranging audience at this meeting)
  5. Previously, users did not have a tool to identify what is important to them, which resulted in indiscriminately storing all intermediate snapshot data from scratch storage into archival storage. However, with DOIs, there is now a means to identify datasets of value, which may change this user behavior, resulting in manageable data sizes. This has ramifications for the provisioning of center storage resources.
  6. Tie-in to DataCite attendees: one thing we liked about the DataCite philosophy is the landing page, which will help us (anyone can go to the landing page). Some data could be embargoed (but available to others later).
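
The storage figures on slide 12 (and the unit ladder in note 2) imply a growth rate that is easy to work out. The Python sketch below is an illustration only: it assumes decimal units (1 EB = 1000 PB) and a constant year-over-year factor between data points, and the 2021 figures are marked with a question mark in the slides, so they are speculative. The names PROJECTIONS and annual_growth are made up for this example.

```python
PB = 1.0            # work in petabytes
EB = 1000.0 * PB    # assumption: decimal units, 1 EB = 1000 PB

# Capacities quoted on slide 12 (the 2021 row is a guess in the original).
PROJECTIONS = {
    2013: {"disk": 4 * PB, "tape": 34 * PB},
    2017: {"disk": 64 * PB, "tape": 600 * PB},
    2021: {"disk": 1 * EB, "tape": 10 * EB},
}


def annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly factor g such that start * g**years == end."""
    return (end / start) ** (1.0 / years)


if __name__ == "__main__":
    years = sorted(PROJECTIONS)
    for first, second in zip(years, years[1:]):
        for tier in ("disk", "tape"):
            g = annual_growth(PROJECTIONS[first][tier],
                              PROJECTIONS[second][tier],
                              second - first)
            print(f"{tier} {first}->{second}: x{g:.2f} per year")
```

Under these assumptions every interval works out to roughly a doubling of capacity each year, for both disk and tape.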