The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing

We review experiences with the deployment of a cloud-hosted IPython Notebook service to serve as a collaborative platform for earth observation (EO) data analysis and processing.

OPTIRAD (OPTImisation environment for joint retrieval of multi-sensor RADiances) is an ESA-funded project addressing the challenge of producing consistent EO land surface information products from heterogeneous EO data inputs. The project poses a number of challenges from an infrastructure provisioning perspective. First, a collaborative research environment was needed as a means to engender closer working between algorithm specialists, modellers and end users. Secondly, any hosting platform needs sufficient compute, memory and storage capacity to support processing at high spatial and temporal resolutions with computationally expensive algorithms. Finally, the system would need to support the execution and development of existing Python code and the provision of interactive tutorials for new users. To this end, a solution has been developed based on the IPython Notebook hosted on the private cloud provided by the JASMIN/CEMS data analysis facility at STFC Rutherford Appleton Laboratory in the UK.

The IPython Notebook has gained traction in recent years as a collaborative tool for scientific computing and data analysis. It provides an interactive Python shell hosted in an intuitive, user-friendly interface, together with the ability to save and share sessions. As a web-based application it is readily amenable to cloud hosting, enabling the scaling of resources, especially, in this context, the compute capability and memory at the disposal of each user. JASMIN/CEMS uses IPython's JupyterHub to provide multi-user support, and each user session has access to IPython.parallel, which effectively wraps parallel compute capability behind a simple Python interface. This platform therefore provides a customisable training and processing environment with compute resources beyond the scale available to desktop users.
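The "simple Python interface" referred to above is essentially a parallel map. A minimal sketch of the pattern, assuming a hypothetical per-pixel function and adding a serial fallback for when no cluster is reachable (the science function and its workload are invented for illustration):

```python
def retrieval_cost(pixel):
    """Stand-in for a computationally expensive per-pixel retrieval."""
    return pixel * pixel

try:
    # "from IPython.parallel import Client" in 2015-era releases
    from ipyparallel import Client
    view = Client().load_balanced_view()  # connect to the session's controller
    mapper = view.map_sync                # fan calls out to the parallel engines
except Exception:
    # No cluster available: fall back to a serial map so the sketch still runs
    mapper = lambda f, seq: list(map(f, seq))

results = list(mapper(retrieval_cost, range(10)))
```

The notebook user only ever sees the `map`-style call; whether the work runs on one core or across the Swarm pool is decided by the platform.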

Further work is underway to broaden and extend the system's capabilities. The JASMIN/CEMS deployment is being trialled in Docker containers, building on recent work by the IPython community; this will facilitate greater portability between cloud providers. Combined with systems for provenance capture, the use of containers can contribute towards replicable science, with any given algorithm annotated with provenance metadata and its runtime environment effectively encapsulated within a given container.
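One way to picture the provenance annotation described above is a small machine-readable record attached to each algorithm run, pairing the algorithm identity with the digest of the container that executed it. The field names, values and truncated digest below are illustrative only, not the project's actual schema:

```python
import datetime
import json

# Illustrative provenance record for one algorithm run; every field name and
# value here is an assumption made for the sketch, not the OPTIRAD schema.
provenance = {
    "algorithm": "optirad-retrieval",
    "version": "0.1",
    "container_image": "sha256:...",  # digest of the encapsulated runtime environment
    "executed_at": datetime.datetime(2015, 10, 12).isoformat(),
    "inputs": ["multi-sensor radiances"],
}
record = json.dumps(provenance, indent=2)
```

Because the record is plain JSON, it can travel alongside the result files and be re-resolved later to the exact container image that produced them.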

1. The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
   ESA EO Open Science 2.0 Conference, 12-14 October 2015
   Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans, Philip Lewis (UCL), Nicola Pounder, Jon Styles (Assimila Ltd.)
   JASMIN (STFC/Stephen Kill)
2. Introduction
   • OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
     – Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
     – Funded by ESA
   • Overview of technical solution
     – Introduction to IPython (Jupyter) Notebook
     – Deployment on JASMIN-CEMS science cloud
   • Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0
3. OPTIRAD Goals
   Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:
   • Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
   • Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
   • Usability and access: easy execution and development of existing Python code and the provision of interactive tutorials for new users.
4. IPython Notebook
   • Provides Python kernels accessible via a web browser
   • Sessions can be saved and shared
   • Trivial access to parallel processing capabilities – IPython.parallel (ipyparallel)
   • IPython → Jupyter Notebook
   • Support for other languages such as R
   • New JupyterHub allows multi-user management of notebooks
   • Gained traction as a teaching and collaborative tool
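The save-and-share property in the slide follows from the file format: a .ipynb file is just JSON holding a list of cells, so it can be versioned and exchanged like any text file. A minimal hand-built skeleton (real notebooks carry further metadata such as the kernel spec and cell outputs):

```python
import json

# Minimal skeleton of the notebook (.ipynb) on-disk format: top-level JSON
# with a list of cells. The cell contents are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": "# EO data assimilation tutorial"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "print('hello OPTIRAD')"},
    ],
}
serialised = json.dumps(notebook)
```

Mixing markdown and code cells in one file is what lets a notebook double as both tutorial documentation and a live session, as the demo slide later points out.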
5. IPython Notebook + Cloud
   • Cloud's characteristics:
     – Broad network access, resource pooling, elasticity, scale – compute and storage
     – Good fit for Big Data science applications
   • Cloud-hosted Notebook – a model already demonstrated with public cloud services, e.g. Wakari, Azure, Rackspace
   • Central hosting allows central management of software packages – no installation steps needed for the user
   • Algorithm prototyping environment next to Big Data – acts as a precursor to operational processing services
6. Notebook: a user–application perspective
   Support a spectrum of usage models
   Different classes of user
   Long tail of science users →
7. Design and development considerations
   • Host on JASMIN-CEMS
     – Data analysis facility and science cloud at Rutherford Appleton Lab, UK
     – Advantage of proximity to locally hosted EO and climate science datasets
     – Integration with environmental sciences community
   • Lightweight development and deployment philosophy
     – Build on Open Source and community efforts to use what's already available
   • How to meet the multi-user support requirement?
     – Buy off-the-shelf: run Wakari on the JASMIN-CEMS platform, or
     – Try JupyterHub: multi-user IPython Notebook solution, or
     – Roll our own solution
   • How to integrate parallel processing?
     – IPython.parallel (ipyparallel) Python API accessed via the Notebook
8. OPTIRAD JASMIN Cloud Tenancy – Deployment Architecture
   [Architecture diagram] Browser access reaches JupyterHub through a firewall; JupyterHub manages users and the provisioning of notebooks. Notebooks and kernels run in Docker containers on VMs in a Swarm pool, with Swarm managing the allocation of containers for notebooks. Parallel controllers dispatch work to parallel engines on dedicated processing nodes, and a shared-services VM provides NFS and LDAP.
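The JupyterHub-plus-containers arrangement in the diagram is typically wired together through JupyterHub's Python configuration file. A hedged sketch of the relevant settings, where the image name is invented and the spawner/authenticator choices are illustrative rather than the OPTIRAD deployment's actual values:

```python
# jupyterhub_config.py -- illustrative fragment only; the image name and the
# specific classes below are assumptions, not the project's recorded settings.
c = get_config()  # provided by JupyterHub when it loads this file

# Spawn each user's notebook server in its own Docker container
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'optirad/notebook:latest'  # hypothetical image name

# Authenticate users against the tenancy's LDAP service (per the diagram)
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
```

Keeping the per-user environment in a container image is also what gives the portability and encapsulation benefits discussed in the abstract.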
9. Conclusions + Next Steps
   • Experiences from project delivery
     – Off-the-shelf solution using JupyterHub paid off
     – JupyterHub and Swarm were new, but installation was straightforward and operationally robust
   • Challenges and future development
     – Extend use of containers for parallel compute
     – Challenge: managing cloud elasticity with both containers and host VMs
     – Provide object storage – CEPH likely to be adopted
     – Expand from the OPTIRAD pilot to a wider user community
     – Deploy with toolboxes, e.g. Sentinels or CIS
10. Demo . . .
    • A tutorial on EO data assimilation
      – The Notebook blurs the traditional separation between tutorial documentation and using the target system
      – The two are one self-contained interactive unit
11. Further information
    • OPTIRAD:
      – Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS'14), http://dx.doi.org/10.2788/1823
    • JASMIN paper (Sept 2013)
      – http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/LawEA13_Jasmin.pdf
      – Cloud paper to follow soon
    • Cloud-hosted JupyterHub with Docker for teaching:
      – https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
    • JASMIN and CEDA:
      – http://jasmin.ac.uk/
      – http://www.ceda.ac.uk
    • @PhilipJKershaw
