We review experiences with the deployment of a cloud-hosted IPython Notebook service to serve as a collaborative platform for earth observation (EO) data analysis and processing.
OPTIRAD (OPTImisation environment for joint retrieval of multi-sensor RADiances) is an ESA-funded project addressing the challenge of producing consistent EO land surface information products from heterogeneous EO data inputs. The project poses a number of challenges from an infrastructure provisioning perspective. First, the need was identified to provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users. Secondly, any hosting platform needs sufficient compute, memory and storage capacity to support processing at high spatial and temporal resolutions with computationally expensive algorithms. Finally, the system would need to support the execution and development of existing Python code and the provision of interactive tutorials for new users. To this end, a solution has been developed based on the IPython Notebook hosted on the private cloud provided by the JASMIN/CEMS data analysis facility at STFC Rutherford Appleton Laboratory in the UK.
The IPython Notebook has been gaining traction in recent years as a collaborative tool for scientific computing and data analysis. It provides an interactive Python shell hosted in an intuitive, user-friendly interface, together with the ability to save and share sessions. As a web-based application it is readily amenable to cloud hosting, enabling the scaling of resources, especially, in this context, the compute capability and memory at the disposal of each user. JASMIN/CEMS uses IPython's JupyterHub to provide multi-user support, and each user session has access to IPython.parallel, which effectively wraps parallel compute capability behind a simple Python interface. This platform therefore provides a customisable training and processing environment with compute resources beyond the scale available to desktop users.
Further work is underway to enhance the existing system and broaden its capabilities. The JASMIN/CEMS deployment is being trialled to run in Docker containers, building on recent work done by the IPython community. This will facilitate greater portability between cloud providers. Combined with systems for provenance capture, the use of containers can contribute towards replicable science, with any given algorithm annotated with provenance metadata and its runtime environment effectively encapsulated within a given container.
The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
1. The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
ESA EO Open Science 2.0 Conference, 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans, Philip Lewis (UCL), Nicola Pounder, Jon Styles (Assimila Ltd.)
JASMIN (STFC/Stephen Kill)
2. Introduction
• OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
– Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
– Funded by ESA
• Overview of technical solution
– Introduction to IPython (Jupyter) Notebook
– Deployment on JASMIN-CEMS science cloud
• Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0
3. OPTIRAD Goals
Address the challenge of producing consistent EO land surface information products from heterogeneous EO data inputs:
• Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
• Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
• Usability and access: easy execution and development of existing Python code and the provision of interactive tutorials for new users.
4. IPython Notebook
• Provides Python kernels accessible via a web browser
• Sessions can be saved and shared
• Trivial access to parallel processing capabilities – IPython.parallel (ipyparallel)
• IPython → Jupyter Notebook
• Support for other languages such as R
• New JupyterHub allows multi-user management of notebooks
• Gained traction as a teaching and collaborative tool
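The "trivial access to parallel processing" bullet refers to ipyparallel's map-style interface, where a function is scheduled across remote engines exactly as it would be mapped over a list locally. A minimal sketch of that pattern, using the standard-library `concurrent.futures` as a locally runnable stand-in (in ipyparallel the equivalent would be `Client().load_balanced_view().map_sync(...)` against a running cluster); `retrieve_pixel` is a hypothetical placeholder for an expensive per-pixel retrieval:

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_pixel(radiance):
    # Hypothetical stand-in for a computationally expensive
    # per-pixel retrieval algorithm.
    return radiance ** 0.5


def run_retrievals(radiances, workers=4):
    # Same map-over-inputs pattern that ipyparallel exposes;
    # ipyparallel schedules the calls onto remote engines
    # instead of local worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(retrieve_pixel, radiances))


if __name__ == "__main__":
    print(run_retrievals([1.0, 4.0, 9.0, 16.0]))  # [1.0, 2.0, 3.0, 4.0]
```

The appeal for the Notebook user is that the parallel and serial forms look alike: swapping the executor (or the ipyparallel view) changes where the work runs, not how the code reads.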
5. IPython Notebook + Cloud
• Cloud's characteristics:
– Broad network access, resource pooling, elasticity, scale – compute and storage
– Good fit for Big Data science applications
• Cloud-hosted Notebook - a model already demonstrated with public cloud services e.g.
– Wakari, Azure, Rackspace
• Central hosting allows central management of software packages
– No installation steps needed for the user
• Algorithm prototyping environment next to Big Data
– Acts as a precursor to operational processing services
6. Notebook: a user – application perspective
Support a spectrum of usage models
Different classes of user
Long tail of science users →
7. Design and development considerations
• Host on JASMIN-CEMS
– Data analysis facility and science cloud at Rutherford Appleton Lab, UK
– Advantage of proximity to locally hosted EO and climate science datasets
– Integration with environmental sciences community
• Lightweight development and deployment philosophy
– Build on Open Source and community efforts to use what's already available
• How to meet the multi-user support requirement?
– Buy off-the-shelf: run Wakari on the JASMIN-CEMS platform, or
– Try JupyterHub: multi-user IPython Notebook solution, or
– Roll our own solution
• How to integrate parallel processing?
– IPython.parallel (ipyparallel) Python API accessed via the Notebook
8. Deployment Architecture
[Diagram: OPTIRAD JASMIN Cloud Tenancy, behind a firewall with browser access from outside]
• JupyterHub VM manages users and provision of notebooks
• Notebooks and kernels run in Docker containers (IPython Notebook + kernel per container)
• Docker Swarm manages allocation of containers for notebooks across a pool of VMs (Swarm pool)
• Parallel Controller and Parallel Engine nodes for parallel processing
• Shared services VM: NFS and LDAP
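The hub side of this architecture is driven by a JupyterHub configuration file. A minimal sketch of the pattern described above (container-per-user notebooks, LDAP authentication); the image name and server address are hypothetical, and spawner option names vary between JupyterHub/DockerSpawner versions:

```python
# jupyterhub_config.py -- illustrative sketch only, not the OPTIRAD config.
c = get_config()  # noqa: F821 -- injected by JupyterHub at startup

# Spawn each user's notebook server in its own Docker container,
# as in the deployment diagram above.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'optirad/notebook:latest'  # hypothetical image name

# Authenticate users against the shared-services LDAP instance.
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.server_address = 'ldap.example.ac.uk'  # hypothetical
```

Keeping user environments in containers is what makes the "central management of software packages" bullet on slide 5 work: updating the image updates every user's environment at the next spawn.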
9. Conclusions + Next Steps
• Experiences from project delivery
– Off-the-shelf solution using JupyterHub paid off
– JupyterHub and Swarm were new, but
– Installation straightforward + operationally robust
• Challenges and future development
– Extend use of containers for parallel compute
– Challenge: managing cloud elasticity with both containers and host VMs
– Provide object storage – CEPH likely to be adopted
– Expand from OPTIRAD pilot to wider user community
– Deploy with toolboxes e.g. Sentinels or CIS.
10. Demo…
• A tutorial on EO data assimilation
– Notebook blurs the traditional separation between tutorial documentation and using the target system
– The two are one self-contained interactive unit ☺
11. Further information
• OPTIRAD:
– Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS'14), http://dx.doi.org/10.2788/1823
• JASMIN paper (Sept 2013):
– http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/LawEA13_Jasmin.pdf
– Cloud paper to follow soon
• Cloud-hosted JupyterHub with Docker for teaching:
– https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
• JASMIN and CEDA:
– http://jasmin.ac.uk/
– http://www.ceda.ac.uk
• @PhilipJKershaw