Foundations
for the future
of science
Ian Foster, Rachana Ananthakrishnan, Kyle Chard, Vas Vasiliadis
GlobusWorld - May 12, 2021
Serial Synchrotron Crystallography of SARS-CoV-2 proteins
The COVID-19 data pipeline:
HPC, ML, people developing machine readable datasets for small molecule libraries
CHEMICAL
LIBRARY DATABASE
AND MORE
known
molecules
4B
COMPUTING
RESOURCES
CANONICALIZATION COMPUTE FEATURES DEEP LEARNING
FILTERING
FINGERPRINTING SIMILARITY SEARCH
GENERATE IMAGES CNN FILTERING
Yadu Babuji, Ben Blaiszik, Kyle Chard, Ryan Chard, Ian Foster, Logan Ward, Tom Brettin et al
A National Pandemic
Observatory
Ingest
Annotate, Assemble,
Align, Interpolate,
Normalize
Introspect
Correct, Calibrate
Characterize, Detect
Anomalies
FAIR Data
Commons
Intelligent Edge
Adaptive Sampling & Edge Computing
Data Sources
Data Product
Create, Publish
Catalog, Version,
ShareDOI
Experiment
Engine
Active
Data
Path
Continuous
Reanalysis
Active
Learning
Simulation
Scientists, Public,
Decision Makers
The next frontier? “AI for science”
“Most of the modeling and prediction necessary to
produce the next generation of breakthroughs in
science, energy, medicine, and national security will
come not from applying traditional theory, but from
employing data-driven methods at extreme scale
tightly coupled to experiments and scientific user
facilities.”
— US Department of Energy
FY 2021 Congressional Budget Justification
Why am I excited about “AI for science”?
Push
• Step changes in AI/ML
methods, notably deep
neural networks
• Major advances in areas like
machine translation, speech
recognition, image
processing
• New hardware specialized
for deep neural networks
Pull
• Exploding volumes of data due
to new sensors and
instrumentation exceed
human capabilities
• End of Moore’s Law puts hard
problems out of reach
• Growing complexity of science
and engineering problems
slowing rate of discovery
Why are we excited about “AI for science”?
Push
• Step changes in AI/ML
methods, notably deep neural
networks
• Major advances in areas like
machine translation, speech
recognition, image processing
• New hardware specialized for
deep neural networks
Pull
• Exploding volumes of data due
to new sensors and
instrumentation exceed
human capabilities
• End of Moore’s Law puts hard
problems out of reach
• Growing complexity of science
and engineering problems
slowing rate of discovery
AI Enabled
Experimental Workflows
(how to make it)
…materials, polymers, organisms…
…self-driving labs, synthesis search…
• data Sets
• literature
• science “news”
• strategy
Cleaned
Updated
Annotated
Aggregated
Interpreted
AI Enabled
Scientific Comprehension
(what it means)
AI-Enabled
Design Workflows
(what to make)
Insight?
AI Science Applications: One per Planet
Augmented
Simulations
Design Control
Science and Math
Comprehension
Generative
Models
Inverse
Problems
Multimodal
Learning
Decision-Making
Materials
Biology
Chemistry
Devices
Batteries
Drugs
Waveforms
Text
Images
Structured
Graphs
Time-series
Image2Phase
Spectra2Structures
Waveform2Source
Detector
Simulations
Cosmology
Biodesign
Experiments
Accelerators
Reactors
Mobility
Simulation
Energy
Landscape
Search
Surrogates
Optimize
Mathematics
Physics
Biochemistry
Risk
Assessment
Research
Priorities
The
Next
Problem
AI for Science: AI Building Blocks (examples)
Protein
engineering
Liquid-handling
robot
SAXS, SA-XPCS:
8-ID-I Beamline
Digital twin +
AI components
Robotic
pendant drop
Screen ~10^8
conditions for
LLPS
Screen ~10^4 combos for
LLPS (turbidity, confocal
microscope imaging)
Screen ~10^2 combos at
various temperatures
Selected matrixes
(e.g., salt, pH, PEG)
Stock proteins
(different periods,
repeats)
X-ray
Info transfer and control,
demonstrated
Information transfer and control,
not yet demonstrated
Material transfer, not yet
demonstrated
Change sample Measure sample
HPC simulation
Compute ~10^5 properties
ALCF
APCF APS
Arvind Ramanathan et al.
Example: Rational design of intrinsically disordered
polypeptides
AI for science means rethinking infrastructure
Infrastructure for AI-enabled Science
Scientific instruments
Major user facilities
Laboratories
Automated labs
…
Sensors
Environmental
Laboratories
Mobile
…
Simulation codes
Computational results
Function memoization
…
Databases
Reference data
Experimental data
Computed properties
Scientific literature
…
Scientists, engineers
Expert input
Goal setting
…
Industry, academia
New methods
Open source codes
AI accelerators
…
Data
ingest
Inference
HPO
Data
enhancement
Data
QA/QC
Feature
selection
Model
training
UQ
Model
reduction
Active/
reinforcement
learning
Artificial Intelligence
Methods
Data
Models
Accelerators
Compute
Agile
Infrastructure
Surrogates
System Software
Data
mgmt
Operating
system
Portability
Compilers
Runtime
system
Workflow
Automation
Prog.
envs.
Languages
Model
creation
Libraries
Resource
mgmt
Authen/Access
Diverse
impacts
across the
globe
Understanding SARS-CoV-2 Protein Structure
“These data services have taken the time
to solve a structure from weeks to days
and now to hours”
Darren Sherrell, SBC beamline scientist
APS Sector 19
Data Management at Cryo-EM Facilities
Case Western Reserve – Cryo-EM Core
Credit: https://case.edu/medicine/research/som-core-facilities/cryo-electron-microscopy-core
Credit: https://pncc.labworks.org/about-us
Pacific Northwest Cryo-EM Processing Center
(PNNL and Oregon Health Sciences University)
Globus for
– automated data
sync as new data is
collected
– provisioning of data
access for
researchers
– reliable, secure data
access for users
– monitoring and
management via
console
The Bioinformatics Core of the
Lineberger Comprehensive Cancer Center
at the University of North Carolina
Global data distribution at bioinformatics core
– Multiple research projects use Globus for data sharing with external
collaborators
– Support a wide variety of projects: different locations, sources, sizes,
cancer types, institution types, storage systems, and identities
Digital agriculture – University of Winnipeg
• Increasing crop yields using
machine learning models
• Building training data sets
– 40K images per day, tagged
with metadata
– Move data from diverse
sources to campus storage,
then onto Compute Canada
HPC to run models
• Orchestrate data transfer
using Globus CLI
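The transfer orchestration step above could be scripted roughly as follows. This is a minimal sketch, assuming the Globus CLI is installed and the user is already logged in; the endpoint IDs, paths, and label are hypothetical placeholders, not values from the Winnipeg project. The sketch only builds and prints the command, and the actual submission is left in a separate function.

```python
import shlex
import subprocess


def build_transfer_command(src_endpoint, src_path, dst_endpoint, dst_path, label):
    """Build a `globus transfer` command for a recursive directory copy."""
    return [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",
        f"{dst_endpoint}:{dst_path}",
        "--recursive",
        "--sync-level", "checksum",  # only copy files whose checksums differ
        "--label", label,
    ]


def submit_transfer(command):
    """Run the command; needs an installed, logged-in Globus CLI."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout


# Hypothetical endpoint IDs and paths, for illustration only.
cmd = build_transfer_command(
    "SRC-ENDPOINT-UUID", "/field_images/2021-05-12/",
    "DST-ENDPOINT-UUID", "/project/training_data/2021-05-12/",
    "daily image sync",
)
print(shlex.join(cmd))
```

Calling `submit_transfer(cmd)` would hand the task to Globus, which then manages the transfer on its own.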
Credit: Dilbarjot and Michael Beck,
Physics and Applied Computer Science, University of Winnipeg
Dark Energy Science Collaboration
• Preparation for the arrival
of the Rubin Observatory
• Data Challenge 2:
extreme-scale simulation
of 300 sq degree patch of
the sky over five years
– 5 TB of data
– ~90M core hours at ALCF
and NERSC
• Data Portal based on
Globus makes data
accessible to collaborators
Federated Research Data Repository
• National Research Data
Management platform, where
data can be
– Ingested, curated, and preserved
– Discovered, cited, and shared
• Globus Services
– Authentication
– Transfer to repository service
– Search as the metadata catalog for
data discovery (includes metadata
from 70 other repositories)
Rebuilding A Kidney
GPCR
GUDMAP
Synapse
FaceBase
● DERIVA is an asset management
platform for science used in various
biomedical data repositories
● Globus Auth for authentication with
external identities
● Globus groups for roles (e.g., curator,
viewer, administrator)
● Globus Auth for desktop GUI and CLI
DERIVA
Data Provider Models / Functions
API layer
API layer
Data Publishers Model Publishers
Consumers
Science!
Increasing Data Interoperability & Reusability
from foundry import Foundry
f = Foundry()
X, y = f.load("dataset1", v="1.0")
y_pred = f.run("model1", X, v="1.0")
f.data.publish("./"
    "dataset1", v="1.1")
f.model.publish("./"
    "model1", v="1.1")
• Models run locally or on distributed endpoints
• Capabilities to pull datasets to desired location
or move compute to desired location
Dataset Function
CH MaD
• Radically reduce the energy barrier to access curated
ML datasets and ML models
• Facilitate reuse, meta-studies, benchmarking, and more
• Long term implications for education
NSF CSSI Started Oct. 2019
(Dane Morgan, Paul Voyles, Michael Ferris, Marcus Schwarting, Ben Blaiszik)
National cyberinfrastructure adoption
Enabled by the Globus data platform
Researcher initiates
transfer request; or
requested automatically
by script, science
gateway
1
Instrument
Compute Facility
Globus transfers files
reliably, securely
2
Globus controls
access to shared
files on existing
storage; no need
to move files to
cloud storage!
4
Researcher
selects files to
share, selects
user or group,
and sets access
permissions
3
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
5
Automating research
workflows and
ensuring those that
need access to the
data have it.
8
Personal Computer
Transfer
Share
• Use a Web browser or
platform services
• Access any storage
• Use an existing identity
Build
The Globus
Command Line
Interface, API sets,
and Python SDK
provide a platform…
6
… for building
science gateways,
portals and
publication services.
7
Enabling
the next
wave
Globus Search
• Scalable, secure search for
research data
• Features:
– Metadata store with fine-grained
visibility controls
– Schema agnostic
– Free text and faceted search
– Integrated with Globus
research platform (Auth,
Groups)
Input form Extract
Metadata
Ingest metadata, set
visibility policies
Discovery
POST /index/123
{ "filters": [
    { "field_name": "record_year",
      "values": ["2020"],
      "type": "match_all" },
    { "field_name": "temp_farenheit",
      "values": [{"from": 90, "to": "*"}],
      "type": "range" } ] }
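The filter query above can also be assembled programmatically. The following is a minimal sketch that builds the same request body as a Python dictionary and serializes it to JSON; the helper name `build_search_request` is hypothetical, and the field names are taken from the slide as written. Sending the request is deliberately left out, since the index ID and credentials are deployment-specific.

```python
import json


def build_search_request(year, min_temp_f):
    """Return a filter-only search body mirroring the slide's example."""
    return {
        "filters": [
            {
                "type": "match_all",
                "field_name": "record_year",
                "values": [str(year)],
            },
            {
                "type": "range",
                "field_name": "temp_farenheit",  # spelled as in the slide
                "values": [{"from": min_temp_f, "to": "*"}],  # "*" = open upper bound
            },
        ]
    }


body = build_search_request(2020, 90)
print(json.dumps(body, indent=2))
# Assumption: with the Globus Python SDK, a body like this could be passed to
# SearchClient.post_search(index_id, body) for an index the caller can query.
```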
Query
Bulk ingest
Example: Cosmology
Globus Search
• Documentation: docs.globus.org/api/search
• SDK: globus-sdk-python.readthedocs.io
• CLI: pypi.org/project/globus-search-cli
• Sample code and walkthrough:
docs.globus.org/api/search/guides/searchable_files
Globus automation services
Managed, secure, and reliable task
orchestration
across heterogeneous resources,
with a declarative language for composition,
extensible with plug-in custom actions,
supporting an event-driven execution model,
for automation at scale
Create and deploy flows
• Define the flow and
deploy to Flows service
• Uses declarative
language (JSON or
YAML)
• Set policy: visibility,
runnable by
Action 1 Action 2 Action 3 Action 4
Action 1
Action 2
Choice
Action 4 Action 5
Action 3
Start and manage runs
• An instance of Flow
execution
– Provide input parameter
– Check status
– Cancel
• Set policy: monitor,
manager
• Triggers to start flows
Build action providers
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit
action-provider-tools.readthedocs.io/en/latest
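To make the action provider operations listed above concrete, here is a toy in-memory stand-in written in plain Python. It is a sketch of the run / status / cancel / release life cycle only, not the Action Provider Toolkit's API: the class name, the `complete` helper, and the status strings are assumptions chosen for illustration, and resume is omitted.

```python
import uuid


class ToyActionProvider:
    """In-memory stand-in for the run/status/cancel/release operations."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its identifier."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"status": "ACTIVE", "request": body, "details": None}
        return action_id

    def complete(self, action_id, details):
        """Hypothetical helper: a real provider marks completion itself."""
        self._actions[action_id].update(status="SUCCEEDED", details=details)

    def status(self, action_id):
        action = self._actions[action_id]
        return {"action_id": action_id, "status": action["status"], "details": action["details"]}

    def cancel(self, action_id):
        """Stop an action that is still running; finished actions are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
        return self.status(action_id)

    def release(self, action_id):
        """Return the final state and forget the action."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still running")
        final = self.status(action_id)
        del self._actions[action_id]
        return final


provider = ToyActionProvider()
action_id = provider.run({"source": "/raw", "destination": "/processed"})
print(provider.status(action_id)["status"])
provider.complete(action_id, {"files_moved": 3})
print(provider.release(action_id)["status"])
```

A flow engine would call these operations over HTTP; keeping state per action ID is what lets status be polled independently of the original request.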
Search
Transfer
Notification
ACLs Identifier
Delete
Ingest
User
Form
Describe Xtract
funcX Web
Form
Custom built
Globus Provided
Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
POST /provider_url/action_id/cancel
POST /provider_url/action_id/release
Create Action
Providers
Define and
deploy flows
{ "StartAt": "ToProject",
  "States": {
    "ToProject": { … },
    "SetPermission": { … },
    "ProcessData": { … } … }}
Run flows
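The flow skeleton above names three states but elides their bodies. The sketch below fills them with placeholder content to show how such a definition chains states together. Only `StartAt`, `States`, and the three state names come from the slide; the `Type`, `ActionUrl`, `Parameters`, `Next`, and `End` fields and the example.org URLs are assumptions modeled on state-machine style definitions, so check the Flows documentation for the exact schema.

```python
# Hypothetical flow definition: URLs and parameters are placeholders.
flow_definition = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/transfer",
            "Parameters": {"source": "/instrument/run42", "destination": "/project/run42"},
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/set_permission",
            "Parameters": {"path": "/project/run42", "group": "collaborators"},
            "Next": "ProcessData",
        },
        "ProcessData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/process",
            "Parameters": {"path": "/project/run42"},
            "End": True,
        },
    },
}


def execution_order(definition):
    """Follow Next links from StartAt until a state marked End."""
    order = []
    name = definition["StartAt"]
    while True:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        state = definition["States"][name]
        order.append(name)
        if state.get("End"):
            return order
        name = state["Next"]


print(execution_order(flow_definition))
# prints ['ToProject', 'SetPermission', 'ProcessData']
```

A check like `execution_order` is a cheap way to catch a missing `Next` or a dangling state name before deploying a definition.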
Example: CFDE
Data
Coordinating
Centers
User Data Portal
Deposit
metadata
Index for
discovery
Powered by Globus
Auth, Groups & Flows
Common Fund
Data Ecosystem
Example: High-Performance Ptychography Workflows
Funding Sources: ASCR, BES
Automation services
• Documentation:
docs.globus.org/globus-automation-services
• CLI: globus-automate-client.readthedocs.io
• Python SDK: globus-automate-client.readthedocs.io
• Sample flows visible to all users
(Re)laying the foundation: GCSv5
Globus Connect Server v5
• Feature parity with v4
• Custom DNS names (e.g. data.university.edu)
• Multi-factor authentication policy
• Enhanced sharing policy
• Containerized deployment
Fire-and-forget transfers
Data sharing with collaborators
Partnership with the community
to develop new connectors
Community Connector Program
Easy egress and ingress of data
Data sharing with collaborators
Publish data
POSIX Staging Connector
• For POSIX file systems that cache from tertiary storage
• Custom plug-in for staging files
• Example:
– IBM Spectrum Scale plugin, Brock Palen at University of
Michigan - github.com/brockpalen/ltfsee-globus
Current connector landscape
Globus Groups
• Groups platform in production
• Administrators can add users directly, in addition to inviting them
• Membership policies simplified
groups.api.globus.org/redoc
Transfer and Sharing
• Skip files with not found errors
– List of skipped files once task is completed
• Fail tasks with quota errors
• Scheduled and replicated transfers
– Manage scheduled/repeated transfer and sync tasks
– pypi.org/project/globus-timer-cli
Leveraging the Globus data platform…
APS XPCS: secure data discovery
Globus Auth
Globus Groups
Globus Search
APS XPCS: data access & preview
Globus Transfer
HTTPS access
APS XPCS: automated processing & indexing
Globus Flows:
Transfer,
analysis, and
ingest to
search index
The
(product)
road
ahead
Globus Connect
• Tools to migrate from v4 to v5
– Migration in phases (Q2 – Q3)
– Goal: not require end user intervention
• IPv6 support
• Connectors
– Azure Blob
– Intel DAOS
IAM and Data platform
• Support use cases that need higher task throughput
• Enhancements to data permissions management
• Improvements to consent management
• Integration with NIH Researcher Auth Service
• Search service for high assurance tier
• Leverage Search for Globus resources
Automation platform
• Lower the barrier for adoption
– Web interfaces
– Supporting tools/libraries
– Action Providers for all Globus functionality
• Exemplar flows for common use cases
– Instrument data management
– Data publication
• Supported in high assurance tier
Clients
• Streamline SDK/CLI across services
• Web App
– Updated management console
– Accessibility standards
• Enhancements to sample portal
– Open source, for customization and deployment
– Flask, Django
Looking
to the
future
Building the compute foundation for Globus
Requirements for reliable, scalable, remote
computing
Researcher needs to
run a computation on
a remote PC, cloud,
supercomputer
1. Compute
Compute Facility
Collaborator
wants to run their
colleague’s
computation on
another system
closer to their data
3. Share
Instrument
5. Build
Gateway and application
developers want to add remote
computation to their code
2. Specialize
Researcher
needs to move it
to a new system
or architecture to
improve
performance
4. Community
Access
Collaborators
want to share
access to a
single allocation
to run compute
tasks
Function as a Service (FaaS)
Developers work in terms of
programming functions
1. Pick a runtime (e.g.,
Python)
2. Register function code
3. Run (and scale)
Low latency, on-demand,
elastic scaling, easy to
deploy and update
def compute(input_args):
    # do something
    return results
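The three FaaS steps above (pick a runtime, register function code, run and scale) can be mimicked locally. The following is a toy sketch using only the Python standard library: `ToyFaaS` is a hypothetical class, not part of any FaaS product, and a thread pool stands in for elastic scaling.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyFaaS:
    """Local stand-in: register functions by ID, run them on a worker pool."""

    def __init__(self, max_workers=4):
        self._functions = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, func):
        """Step 2: store the function and hand back an identifier."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def run(self, function_id, *args, **kwargs):
        """Step 3: schedule an invocation and return a future for its result."""
        return self._pool.submit(self._functions[function_id], *args, **kwargs)


def compute(input_args):
    # Placeholder workload: add up the inputs.
    return sum(input_args)


faas = ToyFaaS()
function_id = faas.register(compute)
futures = [faas.run(function_id, [n, n + 1]) for n in range(3)]
print([future.result() for future in futures])
# prints [1, 3, 5]
```

Callers hold only the function ID and a future, which is the property that lets a hosted service move execution to another machine without changing client code.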
funcX: managed and federated FaaS
• Cloud-hosted service for managing compute
• Register and share compute endpoints
• Register and share Python functions
• Reliably, scalably, and securely execute functions on
remote endpoints
• Integrated with Globus Auth and data ecosystem
Try funcX on Binder
https://funcx.org
Transform laptops, clusters, clouds into function
serving endpoints
• Python-based agent, pip-installable
locally or in Conda
• Elastically provisions resources
from local, cluster, or cloud system
• Manages concurrent execution on
provisioned resources
• Optionally manages execution in
Docker, Singularity, Shifter
containers
• Share endpoints with collaborators
$ pip install funcx-endpoint
$ funcx-endpoint configure myep
$ funcx-endpoint start myep
Register and share functions
Create funcX client (and authn)
def compute(input_args):
    # do something
    return results
Define and register Python function
Execute tasks on any accessible endpoint
Select: function ID, endpoint ID, and
input arguments
Retrieve results asynchronously
(funcX stores results in the cloud)
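Asynchronous result retrieval, as described above, usually means submitting a task and then polling for its result. The sketch below shows that pattern against a duck-typed client. The `run(..., endpoint_id=..., function_id=...)` and `get_result(task_id)` method names are modeled on the funcX SDK of the time, but treat the signatures and the raise-while-pending behavior as assumptions; `FakeClient` is a local stand-in so the example runs without the funcX service.

```python
import time


def wait_for_result(client, task_id, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll until the task's result is available or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            return client.get_result(task_id)
        except Exception:
            # Assumed: the client raises while the task is pending. A real
            # script should distinguish "pending" from genuine failures.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"task {task_id} did not finish in time")
            time.sleep(poll_seconds)


class FakeClient:
    """Stand-in client: the result becomes ready after a few polls."""

    def __init__(self, ready_after):
        self._ready_after = ready_after
        self._polls = 0
        self._args = ()

    def run(self, *args, endpoint_id, function_id):
        self._args = args
        return "task-1"

    def get_result(self, task_id):
        self._polls += 1
        if self._polls < self._ready_after:
            raise Exception("task pending")
        return sum(self._args)


client = FakeClient(ready_after=3)
task_id = client.run(1, 2, 3, endpoint_id="ENDPOINT-UUID", function_id="FUNCTION-UUID")
print(wait_for_result(client, task_id, poll_seconds=0.01))
# prints 6
```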
F(ep1,1)
F(ep1, 2)
F(ep1, 3)
F(ep1, 4)
F(ep1, 5)
F(ep1, 6)
F(ep2, 7)
https://funcx.org
https://mybinder.org/v2/gh/funcx-faas/examples/HEAD
Canonical research automation flow for instruments
Data Capture Data Analysis /
Model in the Loop
Publication
Data Staging
Metadata Extraction
And Data Cataloging
Data Staging
Catalog
Feedback
Data
Generation
Examples
• Serial X-ray
Crystallography
• X-ray Photon
Correlation
Spectroscopy
• High energy
diffraction
microscopy
• High throughput
ptychography
• High energy X-ray
diffraction
Applying the Globus
platform to science at
the APS
Advanced
Photon
Source
Key: funcX agent
Globus Connect
Theta
Bebop
Cluster
Argonne
Leadership
Computing
Facility
Laboratory
Computing
Research Center
Petrel store
APS
Computing
Orthros Cluster
APS DM system
Portal
server
Portal
server
Cooley
Action 1 Action 2 Action 3 Action 4
Example: Rapid Training of Deep Neural Networks
using Remote Resources
• DNN at the edge for fast
processing, filtering, QC
• Requires tight coupling
with simulation and
training with real-time data
• Globus Flow:
Data Source HPC/DCAI Edge(Host)
Globus,
Automate
Commands
Status
Data Model
User
Request
Status
Commands
Status
C/S
Zhengchun Liu, Jana Thayer, et al.
– Globus to rapidly move data for training
– funcX for simulation and model training
– Globus to move models to the edge
– (Future) funcX for inference at the edge
Making
this
possible
Our Mission
Increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
Active endpoints in
over 70 countries
Adoption among R1 Institutions: 126 of 130 use Globus
So, how are we doing?
Adoption among U.S. national laboratories
So, how are we doing?
Notables…
BIG Movers
66.3 PB
2 Share
1,593
💛 Frequent Movers
887,000
Thank you, funders...
U.S. DEPARTMENT OF
ENERGY
Thank you to our Platinum sponsor!
Thank you Gold sponsors!
Thank you Patron sponsors!
A word from our Platinum Sponsor
Jordan Winkelman, Field Solutions CTO
Join us in Gather.Town
• Get answers at the Globus Genius Bar
• Visit the Sponsor Showcase
• Join the scavenger hunt in The Garden
• Play a game
bit.ly/globustown
(passcode: globus)
#globusworld @globus

Big Data Big Data
Big Data
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNA
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
 

More from Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 

More from Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Foundations for the Future of Science

  • 1. Foundations for the future of science Ian Foster, Rachana Ananthakrishnan, Kyle Chard, Vas Vasiliadis GlobusWorld - May 12, 2021
  • 2.
  • 3.
  • 4. Serial Synchrotron Crystallography of SARS-CoV-2 proteins
  • 5. The COVID-19 data pipeline: HPC, ML, people developing machine-readable datasets for small molecule libraries CHEMICAL LIBRARY DATABASE AND MORE known molecules 4B COMPUTING RESOURCES CANONICALIZATION COMPUTE FEATURES DEEP LEARNING FILTERING FINGERPRINTING SIMILARITY SEARCH GENERATE IMAGES CNN FILTERING Yadu Babuji, Ben Blaiszik, Kyle Chard, Ryan Chard, Ian Foster, Logan Ward, Tom Brettin et al.
  • 6.
  • 8. Ingest Annotate, Assemble, Align, Interpolate, Normalize Introspect Correct, Calibrate Characterize, Detect Anomalies FAIR Data Commons Intelligent Edge Adaptive Sampling & Edge Computing Data Sources Data Product Create, Publish Catalog, Version, Share DOI Experiment Engine Active Data Path Continuous Reanalysis Active Learning Simulation Scientists, Public, Decision Makers
  • 9. The next frontier? “AI for science” “Most of the modeling and prediction necessary to produce the next generation of breakthroughs in science, energy, medicine, and national security will come not from applying traditional theory, but from employing data-driven methods at extreme scale tightly coupled to experiments and scientific user facilities.” — US Department of Energy FY 2021 Congressional Budget Justification
  • 10. Why am I excited about “AI for science”? Push • Step changes in AI/ML methods, notably deep neural networks • Major advances in areas like machine translation, speech recognition, image processing • New hardware specialized for deep neural networks Pull • Exploding volumes of data due to new sensors and instrumentation exceed human capabilities • End of Moore’s Law puts hard problems out of reach • Growing complexity of science and engineering problems slowing rate of discovery
  • 11. AI Enabled Experimental Workflows (how to make it) …materials, polymers, organisms… …self-driving labs, synthesis search… • data Sets • literature • science “news” • strategy Cleaned Updated Annotated Aggregated Interpreted AI Enabled Scientific Comprehension (what it means) AI-Enabled Design Workflows (what to make) Insight? AI Science Applications: One per Planet
  • 12. Augmented Simulations Design Control Science and Math Comprehension Generative Models Inverse Problems Multimodal Learning Decision-Making Materials Biology Chemistry Devices Batteries Drugs Waveforms Text Images Structured Graphs Time-series Image2Phase Spectra 2 Structures Waveform 2 Source Detector Simulations Cosmology Biodesign Experiments Accelerators Reactors Mobility Simulation Energy Landscape Search Surrogates Optimize Mathematics Physics Biochemistry Risk Assessment Research Priorities The Next Problem AI for Science: AI Building Blocks (examples)
  • 13.
  • 14. Protein engineering Liquid-handling robot SAXS, SA-XPCS: 8-ID-I Beamline Digital twin + AI components Robotic pendant drop Screen ~10^8 conditions for LLPS Screen ~10^4 combos for LLPS (turbidity, confocal microscope imaging) Screen ~10^2 combos at various temperatures Selected matrixes (e.g., salt, pH, PEG) Stock proteins (different periods, repeats) X-ray Info transfer and control, demonstrated Information transfer and control, not yet demonstrated Material transfer, not yet demonstrated Change sample Measure sample HPC simulation Compute ~10^5 properties ALCF APCF APS Arvind Ramanathan et al. Example: Rational design of intrinsically disordered polypeptides
  • 15. AI for science means rethinking infrastructure Infrastructure for AI-enabled Science Scientific instruments Major user facilities Laboratories Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memoization … Databases Reference data Experimental data Computed properties Scientific literature … Scientists, engineers Expert input Goal setting … Industry, academia New methods Open source codes AI accelerators … Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Artificial Intelligence Methods Data Models Accelerators Compute Agile Infrastructure Surrogates System Software Data mgmt Operating system Portability Compilers Runtime system Workflow Automation Prog. envs. Languages Model creation Libraries Resource mgmt Authen/Access
  • 17. Understanding SARS-CoV-2 Protein Structure “These data services have taken the time to solve a structure from weeks to days and now to hours” Darren Sherrell, SBC beamline scientist, APS Sector 19
  • 18. Data Management at Cryo-EM Facilities Case Western Reserve – Cryo-EM Core Credit: https://case.edu/medicine/research/som-core-facilities/cryo-electron-microscopy-core Credit: https://pncc.labworks.org/about-us Pacific Northwest Cryo-EM Processing Center (PNNL and Oregon Health Sciences University) Globus for – automated data sync as new data is collected – provisioning of data access for researchers – reliable, secure data access for users – monitoring and management via console
  • 19. The Bioinformatics Core of the Lineberger Comprehensive Cancer Center at the University of North Carolina Global data distribution at bioinformatics core – Multiple research projects use Globus for data sharing with external collaborators – Support wide variety of projects: different locations, sources, sizes, cancer types, institution type, storage systems, and identities
  • 20. Digital agriculture – University of Winnipeg • Increasing crop yields using machine learning models • Building training data sets – 40K images per day, tagged with metadata – Move data from diverse sources to campus storage, then onto Compute Canada HPC to run models • Orchestrate data transfer using Globus CLI Credit: Dilbarjot and Michael Beck, Physics and Applied Computer Science, University of Winnipeg
  • 21. Dark Energy Science Collaboration • Preparation for the arrival of the Rubin Observatory • Data Challenge 2: extreme-scale simulation of 300 sq degree patch of the sky over five years – 5 TB of data – ~90M core hours at ALCF and NERSC • Data Portal based on Globus makes data accessible to collaborators
  • 22. Federated Research Data Repository • National Research Data Management platform, where data can be – Ingested, curated, and preserved – Discovered, cited, and shared • Globus Services – Authentication – Transfer to repository service – Search for metadata catalog for data discovery (includes metadata from 70 other repositories)
  • 23. Rebuilding A Kidney GPCR GUDMAP Synapse FaceBase ● DERIVA is an asset management platform for science used in various biomedical data repositories ● Globus Auth for authentication with external identities ● Globus groups for roles (e.g., curator, viewer, administrator) ● Globus Auth for desktop GUI and CLI DERIVA
  • 24. Data Provider Models / Functions API layer API layer Data Publishers Model Publishers Consumers Science! Increasing Data Interoperability & Reusability from foundry import Foundry; f = Foundry(); X, y = f.load("dataset1", v="1.0"); y_pred = f.run("model1", X, v="1.0"); f.data.publish("./", "dataset1", v="1.1"); f.model.publish("./", "model1", v="1.1") • Models run locally or on distributed endpoints • Capabilities to pull datasets to desired location or move compute to desired location Dataset Function CH MaD • Radically reduce the energy barrier to access curated ML datasets and ML models • Facilitate reuse, meta-studies, benchmarking, and more • Long term implications for education NSF CSSI Started Oct. 2019 (Dane Morgan, Paul Voyles, Michael Ferris, Marcus Schwarting, Ben Blaiszik)
  • 26. Enabled by the Globus data platform • Use a Web browser or platform services • Access any storage • Use an existing identity Transfer: (1) Researcher initiates transfer request; or requested automatically by script, science gateway (2) Globus transfers files reliably, securely Share: (3) Researcher selects files to share, selects user or group, and sets access permissions (4) Globus controls access to shared files on existing storage; no need to move files to cloud storage! (5) Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus Build: (6) The Globus Command Line Interface, API sets, and Python SDK provide a platform… (7) … for building science gateways, portals and publication services (8) Automating research workflows and ensuring those that need access to the data have it Diagram labels: Instrument, Compute Facility, Personal Computer
  • 28. Globus Search • Scalable, secure search for research data • Features: – Metadata store with fine-grained visibility controls – Schema agnostic – Free text and faceted search – Integrated with Globus research platform (Auth, Groups) Input form Extract Metadata Ingest metadata, set visibility policies Discovery POST /index/123 { "filters": [ { "field_name": "record_year", "values": ["2020"], "type": "match_all" }, { "field_name": "temp_farenheit", "values": [{"from": 90, "to": "*"}], "type": "range" } ] } Query Bulk ingest
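
The filter query on the slide above can be assembled programmatically. The following is a minimal sketch, not the Globus SDK: the helper names (match_all_filter, range_filter, build_query) are hypothetical, the field names are copied from the slide as written, and the sketch only builds and prints the JSON body without sending it anywhere.

```python
import json


def match_all_filter(field_name, values):
    """Build a filter that requires the field to match every listed value."""
    return {"type": "match_all", "field_name": field_name, "values": list(values)}


def range_filter(field_name, low="*", high="*"):
    """Build a range filter; "*" leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": low, "to": high}],
    }


def build_query(filters):
    """Wrap a list of filters in the top-level query document."""
    return {"filters": list(filters)}


# Records from 2020 with a temperature of at least 90 (field names from the slide).
query = build_query([
    match_all_filter("record_year", ["2020"]),
    range_filter("temp_farenheit", low=90),
])

# Serialize to the JSON body that would accompany the POST shown on the slide.
body = json.dumps(query, indent=2)
print(body)
```

Keeping each filter in its own small function makes it harder to drop a comma or brace, which is easy to do when writing the JSON by hand.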
  • 30. Globus Search • Documentation: docs.globus.org/api/search • SDK: globus-sdk-python.readthedocs.io • CLI: pypi.org/project/globus-search-cli • Sample code and walkthrough: docs.globus.org/api/search/guides/searchable_files
  • 31. Globus automation services Managed, secure and reliable task orchestration across heterogeneous resources, with declarative language for composition, and extensible to plug in custom actions, supporting an event-driven execution model, for automation at scale
  • 32. Create and deploy flows • Define the flow and deploy to Flows service • Uses declarative language (JSON or YAML) • Set policy: visibility, runnable by Action 1 Action 2 Action 3 Action 4 Action 1 Action 2 Choice Action 4 Action 5 Action 3
  • 33. Start and manage runs • An instance of Flow execution – Provide input parameter – Check status – Cancel • Set policy: monitor, manager • Triggers to start flows
  • 34. Build action providers • Action Provider is a service endpoint – Run – Status – Cancel – Release – Resume • Action Provider Toolkit action-provider-tools.readthedocs.io/en/latest Search Transfer Notification ACLs Identifier Delete Ingest User Form Describe Xtract funcX Web Form Custom built Globus Provided
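
To make the run / status / cancel / release lifecycle listed on the slide above concrete, here is a toy in-memory sketch. It is not the Action Provider Toolkit and does not speak HTTP. The class name, the complete() helper that stands in for real work finishing, and the status strings are all assumptions chosen for illustration, and Resume is left out.

```python
import uuid


class ToyActionProvider:
    """In-memory stand-in for a service exposing run/status/cancel/release."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start a new action and return its initial status record."""
        action_id = uuid.uuid4().hex
        self._actions[action_id] = {"action_id": action_id, "status": "ACTIVE", "body": body}
        return self.status(action_id)

    def status(self, action_id):
        """Report the current state of a known action (KeyError if unknown)."""
        action = self._actions[action_id]
        return {"action_id": action["action_id"], "status": action["status"]}

    def complete(self, action_id):
        """Hypothetical hook: mark a running action as finished successfully."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "SUCCEEDED"
        return self.status(action_id)

    def cancel(self, action_id):
        """Stop a running action; actions that already finished are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
        return self.status(action_id)

    def release(self, action_id):
        """Forget a finished action so the provider no longer tracks it."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still running")
        del self._actions[action_id]
        return {"action_id": action_id, "status": action["status"]}


provider = ToyActionProvider()
started = provider.run({"message": "hello"})
finished = provider.complete(started["action_id"])
released = provider.release(started["action_id"])
print(started["status"], finished["status"], released["status"])
```

In this sketch, release is what lets the provider drop its record of a finished action, which is why calling status afterwards fails.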
  • 35. Automation services ecosystem GET /provider_url/ POST /provider_url/run GET /provider_url/action_id/status GET /provider_url/action_id/cancel GET /provider_url/action_id/status Create Action Providers Define and deploy flows { "StartAt": "ToProject", "States": { "ToProject": { … }, "SetPermission": { … }, "ProcessData": { … } … } } Run flows
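
The flow definition on the slide above names a start state and a set of states but elides their contents. The sketch below fills them with placeholder values purely for illustration: the "Type", "ActionUrl", "Next", and "End" keys, the example.org URLs, and the ordering of the three states are assumptions, not the slide's actual flow. The execution_order helper is hypothetical and only checks that a linear definition is wired together consistently.

```python
import json

# Placeholder flow: state names come from the slide, everything else is assumed.
flow_definition = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {
            "Type": "Action",
            "ActionUrl": "https://example.org/transfer",
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://example.org/permissions",
            "Next": "ProcessData",
        },
        "ProcessData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/process",
            "End": True,
        },
    },
}


def execution_order(definition):
    """Follow Next links from StartAt and return the visited state names."""
    states = definition["States"]
    name = definition["StartAt"]
    order = []
    while True:
        if name not in states:
            raise ValueError(f"unknown state: {name}")
        if name in order:
            raise ValueError(f"cycle detected at state: {name}")
        order.append(name)
        state = states[name]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state {name} has neither Next nor End")
        name = state["Next"]


order = execution_order(flow_definition)
print(" -> ".join(order))
print(json.dumps(flow_definition, indent=2))
```

A check like this catches a misspelled state name before the definition is deployed, which is cheaper than discovering it in a failed run.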
  • 36. Example: CFDE Data Coordinating Centers User Data Portal Deposit metadata Index for discovery Powered by Globus Auth, Groups & Flows Common Fund Data Ecosystem
  • 37. Example: High-Performance Ptychography Workflows Funding Sources: ASCR, BES
  • 38. Automation services • Documentation: docs.globus.org/globus-automation-services • CLI: globus-automate-client.readthedocs.io • Python SDK: globus-automate-client.readthedocs.io • Sample flows visible to all users
  • 40. Globus Connect Server v5 • Feature parity with v4 • Custom DNS names (e.g. data.university.edu) • Multi-factor authentication policy • Enhanced sharing policy • Containerized deployment
  • 41. Fire-and-forget transfers Data sharing with collaborators
  • 42. Partnership with the community to develop new connectors Community Connector Program
  • 43. Easy egress and ingress of data • Data sharing with collaborators • Publish data
  • 44. POSIX Staging Connector • For POSIX file systems that cache from tertiary storage • Custom plug-in for staging files • Example: – IBM Spectrum Scale plugin by Brock Palen at the University of Michigan: github.com/brockpalen/ltfsee-globus
  • 46. Globus Groups • Groups platform in production • Administrators can add users directly, in addition to inviting them • Membership policies simplified • groups.api.globus.org/redoc
  • 47. Transfer and Sharing • Skip files with "not found" errors – List of skipped files available once the task is completed • Fail tasks with quota errors • Scheduled and replicated transfers – Manage scheduled/repeated transfer and sync tasks – pypi.org/project/globus-timer-cli
  • 48. Leveraging the Globus data platform…
  • 49. APS XPCS: secure data discovery • Globus Auth • Globus Groups • Globus Search
  • 50. APS XPCS: data access & preview • Globus Transfer • HTTPS access
  • 51. APS XPCS: automated processing & indexing • Globus Flows: transfer, analysis, and ingest to search index
  • 53. Globus Connect • Tools to migrate from v4 to v5 – Migration in phases (Q2 – Q3) – Goal: not require end user intervention • IPv6 support • Connectors – Azure Blob – Intel DAOS
  • 54. IAM and Data platform • Support use cases that need higher task throughput • Enhancements to data permissions management • Improvements to consent management • Integration with NIH Researcher Auth Service • Search service for high assurance tier • Leverage Search for Globus resources
  • 55. Automation platform • Lower the barrier for adoption – Web interfaces – Supporting tools/libraries – Action Providers for all Globus functionality • Exemplar flows for common use cases – Instrument data management – Data publication • Supported in high assurance tier
  • 56. Clients • Streamline SDK/CLI across services • Web App – Updated management console – Accessibility standards • Enhancements to sample portal – Open source, for customization and deployment – Flask, Django
  • 58. Building the compute foundation for Globus
  • 59. Requirements for reliable, scalable, remote computing • 1. Compute: a researcher needs to run a computation on a remote PC, cloud, or supercomputer • 2. Specialize: a researcher needs to move the computation to a new system or architecture to improve performance • 3. Share: a collaborator wants to run their colleague's computation on another system closer to their data • 4. Community Access: collaborators want to share access to a single allocation to run compute tasks • 5. Build: gateway and application developers want to add remote computation to their code [Diagram labels: Compute Facility, Instrument]
  • 60. Function as a Service (FaaS) • Developers work in terms of programming functions: 1. Pick a runtime (e.g., Python) 2. Register function code 3. Run (and scale) • Low latency, on-demand, elastic scaling, easy to deploy and update • Example function:
        def compute(input_args):
            # do something
            return results
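The three steps on this slide (pick a runtime, register function code, run and scale) can be imitated inside a single Python process. The sketch below is a toy: `ToyFunctionService` and its methods are hypothetical names, a local thread pool stands in for elastic remote workers, and none of this is the funcX API.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyFunctionService:
    """Illustrative in-process stand-in for a function-serving platform."""

    def __init__(self, max_workers=4):
        self._functions = {}  # function_id -> callable
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, func):
        """Step 2: store the function and hand back an identifier."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def run(self, function_id, *args, **kwargs):
        """Step 3: submit one invocation and return a future for its result."""
        return self._pool.submit(self._functions[function_id], *args, **kwargs)


def compute(input_args):
    # Placeholder body: add up the inputs.
    return sum(input_args)


service = ToyFunctionService()
compute_id = service.register(compute)
futures = [service.run(compute_id, [n, n + 1]) for n in range(3)]
results = [future.result() for future in futures]
print(results)
```

Callers only ever hold an identifier and a future, so the place where the function actually runs can change without changing the calling code.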
  • 61. funcX: managed and federated FaaS • Cloud-hosted service for managing compute • Register and share compute endpoints • Register and share Python functions • Reliably, scalably, and securely execute functions on remote endpoints • Integrated with Globus Auth and data ecosystem • Try funcX on Binder: https://funcx.org
  • 62. Transform laptops, clusters, clouds into function serving endpoints • Python-based agent, pip-installable locally or in Conda • Elastically provisions resources from local, cluster, or cloud system • Manages concurrent execution on provisioned resources • Optionally manages execution in Docker, Singularity, Shifter containers • Share endpoints with collaborators
        $ pip install funcx-endpoint
        $ funcx-endpoint configure myep
        $ funcx-endpoint start myep
  • 63. Register and share functions • Create funcX client (and authn) • Define and register Python function:
        def compute(input_args):
            # do something
            return results
  • 64. Execute tasks on any accessible endpoint • Select: function ID, endpoint ID, and input arguments • Retrieve results asynchronously (funcX stores results in the cloud) [Diagram: F(ep1, 1) F(ep1, 2) F(ep1, 3) F(ep1, 4) F(ep1, 5) F(ep1, 6) F(ep2, 7)]
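The pattern on this slide (choose a function, an endpoint, and arguments, then collect results later by task ID) can be sketched locally. Everything here is an assumption for illustration: the endpoints are plain thread pools, the `pending` dictionary stands in for the cloud-side result store, and `submit` and `get_result` are hypothetical helpers rather than funcX SDK calls.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


# Hypothetical endpoints, each modeled as a local thread pool.
endpoints = {
    "ep1": ThreadPoolExecutor(max_workers=2),
    "ep2": ThreadPoolExecutor(max_workers=1),
}
functions = {"F": square}
pending = {}  # task_id -> Future; stands in for the result store


def submit(function_id, endpoint_id, argument):
    """Send one task to one endpoint and return a task ID immediately."""
    task_id = str(uuid.uuid4())
    pool = endpoints[endpoint_id]
    pending[task_id] = pool.submit(functions[function_id], argument)
    return task_id


def get_result(task_id, timeout=5.0):
    """Fetch a result later; waits up to `timeout` seconds if not ready."""
    return pending[task_id].result(timeout=timeout)


# Mirror the slide: F(ep1, 1) through F(ep1, 6), then F(ep2, 7).
plan = [("ep1", n) for n in range(1, 7)] + [("ep2", 7)]
task_ids = [submit("F", endpoint_id, n) for endpoint_id, n in plan]
results = [get_result(task_id) for task_id in task_ids]
print(results)

for pool in endpoints.values():
    pool.shutdown()
```

Because submission returns before the work is done, the caller can queue all seven tasks first and only then start waiting on results.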
  • 66. Canonical research automation flow for instruments [Diagram labels: Data Capture, Data Analysis / Model in the Loop, Publication, Data Staging, Metadata Extraction and Data Cataloging, Data Staging, Catalog, Feedback, Data Generation] Examples • Serial X-ray crystallography • X-ray photon correlation spectroscopy • High-energy diffraction microscopy • High-throughput ptychography • High-energy X-ray diffraction
  • 67. Applying the Globus platform to science at the APS [Diagram labels: Advanced Photon Source; Theta; Bebop Cluster; Argonne Leadership Computing Facility; Laboratory Computing Resource Center; Petrel store; APS Computing; Orthros Cluster; APS DM system; Portal server (two); Cooley; Action 1 through Action 4; key: funcX agent, Globus Connect]
  • 68. Example: Rapid Training of Deep Neural Networks using Remote Resources • DNN at the edge for fast processing, filtering, QC • Requires tight coupling with simulation and training with real-time data • Globus Flow: – Globus to rapidly move data for training – funcX for simulation and model training – Globus to move models to the edge – (Future) funcX for inference at the edge [Diagram labels: Data Source, HPC/DCAI, Edge (Host), Globus, Automate, User; arrows: Commands, Status, Data, Model, Request, C/S] Credit: Zhengchun Liu, Jana Thayer, et al.
  • 70. Our Mission Increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software
  • 72. So, how are we doing? Adoption among R1 institutions: 126 of 130 use Globus
  • 73. So, how are we doing? Adoption among U.S. national laboratories
  • 74. Notables… • BIG Movers: 66.3 PB • 2 Share: 1,593 💛 • Frequent Movers: 887,000
  • 75. Thank you, funders... U.S. Department of Energy
  • 77. Thank you to our Platinum sponsor!
  • 78. Thank you Gold sponsors!
  • 84. Thank you Patron sponsors!
  • 85. A word from our Platinum Sponsor: Jordan Winkelman, Field Solutions CTO
  • 86. Join us in Gather.Town • Get answers at the Globus Genius Bar • Visit the Sponsor Showcase • Join the scavenger hunt in The Garden • Play a game bit.ly/globustown (passcode: globus)