1
A tool for data science at scale.
Matthias Bussonnier
(UC Berkeley mbussonnier@berkeley.edu)
Slides examples on GitHub:
https://github.com/Carreau/talks/tree/master/labtech-2015
Jupyter
A bit of History
The Notebook Application
The document format & Publication
Multi-User & scaling
The Ecosystem
2
The Lifecycle of a Scientific Idea
1. Individual exploratory work
2. Collaborative development
3. Parallel production runs (HPC, cloud, …)
4. Publication & communication (reproducibly!)
5. Education
6. Goto 1
3
The Lifecycle of a Scientific Idea
1. Individual exploratory work – Matlab command line
2. Collaborative development – email scripts back and forth?
3. Parallel production runs (HPC, cloud, …) – rewrite in Fortran/MPI
4. Publication & communication (reproducibly!) – copy-paste into PPT
5. Education – Specific tools
6. Goto 1
4
Can we have a single tool that covers the whole lifecycle of
a scientific idea, from data collection to publication?
5
A bit of History
Fernando Perez, 2001, CU Boulder (instead of
writing a physics dissertation):
Python can replace the collection of bash,
Perl, and C/C++ scripts. But the Python REPL
can be better.
6
NOVEMBER 2001: "JUST AN AFTERNOON HACK"
259-line Python script (https://gist.github.com/fperez/1579699).
sys.ps1 -> In [N].
sys.displayhook -> Out[N], caches results.
Plotting, Numeric, etc.
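The two hooks named above can be sketched in a few lines. This is only an illustration, not the original 259-line script: the names `NumberedPrompt`, `CachingDisplayHook` and `install` are hypothetical, and advancing the counter only when a result is displayed is a simplification.

```python
import builtins
import io
import sys


class NumberedPrompt:
    """Hypothetical prompt object. The interactive interpreter calls str()
    on sys.ps1 each time it shows the prompt, so the number can change."""

    def __init__(self):
        self.count = 1

    def __str__(self):
        return f"In [{self.count}]: "


class CachingDisplayHook:
    """Hypothetical sys.displayhook replacement that numbers and caches results."""

    def __init__(self, prompt, stream=None):
        self.prompt = prompt
        self.stream = stream      # None means "use sys.stdout at call time"
        self.cache = {}           # maps N to the value shown as Out[N]

    def __call__(self, value):
        if value is None:         # like the default hook, ignore None
            return
        n = self.prompt.count
        self.cache[n] = value
        builtins._ = value        # keep the usual "_" shortcut working
        out = self.stream if self.stream is not None else sys.stdout
        out.write(f"Out[{n}]: {value!r}\n")
        self.prompt.count += 1    # simplification: advance only on displayed results


def install():
    """Activate both hooks in an interactive session."""
    prompt = NumberedPrompt()
    sys.ps1 = prompt
    sys.displayhook = CachingDisplayHook(prompt)
    return sys.displayhook


# Non-interactive demonstration that writes to a buffer instead of the terminal.
demo_buffer = io.StringIO()
demo_prompt = NumberedPrompt()
demo_hook = CachingDisplayHook(demo_prompt, stream=demo_buffer)
demo_hook(6 * 7)
```

Calling `install()` at a plain `python` prompt should switch the prompt to `In [1]:` and keep earlier results available through the hook's `cache` dictionary.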
2014 (OPENHUB STATS)
19,279 commits
442 contributors
Total Lines: 187,326
Number of Languages : 7 (JS, CSS, HTML, …)
7
Improve over the terminal
❖ The REPL as a network protocol
❖ Kernels
❖ execute code
❖ Clients
❖ Read input
❖ Present output
Simple abstractions enable rich,
sophisticated clients
8
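The kernel/client split can be sketched with plain JSON messages. This is a toy under stated assumptions: `ToyKernel`, `make_execute_request` and `run_on_kernel` are hypothetical names, the message carries only a few of the fields of the real Jupyter message layout, and captured output is returned inside the reply for brevity, whereas the real protocol publishes it as separate messages over ZMQ.

```python
import contextlib
import io
import json
import uuid


def make_execute_request(code):
    """Client side: build a simplified 'execute_request' message."""
    return {
        "header": {"msg_id": uuid.uuid4().hex, "msg_type": "execute_request"},
        "parent_header": {},
        "metadata": {},
        "content": {"code": code, "silent": False},
    }


class ToyKernel:
    """Kernel side: executes code and keeps state between requests."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def handle(self, raw_message):
        request = json.loads(raw_message)
        self.execution_count += 1
        captured = io.StringIO()
        content = {"status": "ok", "execution_count": self.execution_count}
        try:
            with contextlib.redirect_stdout(captured):
                exec(request["content"]["code"], self.namespace)
        except Exception as exc:
            content.update(status="error", ename=type(exc).__name__, evalue=str(exc))
        # Simplification: stdout travels inside the reply itself.
        content["stdout"] = captured.getvalue()
        reply = {
            "header": {"msg_id": uuid.uuid4().hex, "msg_type": "execute_reply"},
            "parent_header": request["header"],  # links the reply to its request
            "metadata": {},
            "content": content,
        }
        return json.dumps(reply)


def run_on_kernel(kernel, code):
    """Client side: send code as serialized JSON, return the reply content."""
    raw_reply = kernel.handle(json.dumps(make_execute_request(code)))
    return json.loads(raw_reply)["content"]


kernel = ToyKernel()
first = run_on_kernel(kernel, "x = 2\nprint(x * 21)")
second = run_on_kernel(kernel, "print(x + 1)")   # state persists in the kernel
failed = run_on_kernel(kernel, "1 / 0")
```

Because the client only sees serialized messages, any program that can produce and read them (a console, a Qt application, a web page) can drive the same kernel.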
❖ Rich web client
❖ Text & math
❖ Code
❖ Results
❖ Share, reproduce.
2011: The IPython Notebook
9
The Team
(people that spend a noticeable amount of time on the project, subjective of course)
Fernando Perez (UC Berkeley LBL)
Brian Granger (Cal Poly)
Oberon Lopez (summer student)
Cameron Oelsen (summer student)
Simon Vurens (summer student)
Ryan Morshed (summer student)
Min Ragan-Kelley (Simula)
Thomas Kluyver (UK)
Matthias Bussonnier (UC Berkeley)
Jon Frederic (Cal Poly)
Jess Hamrick (UC Berkeley)
Kyle Kelley (Rackspace)
Jason Grout (Bloomberg)
Sylvain Corlay (Bloomberg)
Kester Tong (Google)
Nicholas Bollweg
Will Whitney (MIT)
Damián Avila (Continuum)
Steven Silvester (Continuum)
Chris Colbert (Continuum)
David Willmer (Continuum)
Peter Parente (IBM)
Dan Gisolfi (IBM)
Gino Bustelo (IBM)
All 400+ GitHub contributors.
Bold: working full time on IPython/Jupyter; underline: contributing to Jupyter/IPython under a corporate agreement
10
Funding
11
Jupyter vs IPython
Network protocol for interactive
computing
Clients for protocol
Console
Qt Console
Notebook
Notebook file format & tools
(nbconvert…)
JupyterHub
Nbviewer
NbGrader
Tmpnb
…
Interactive Python shell at the
terminal
Kernel for this protocol in Python
Tools for Cross-Language
integration
Tools for Interactive Parallel
computing
The “reference” kernel for
Jupyter
12
Why?
Don’t reinvent the wheel: reimplement one piece, get the rest for
free.
If you don’t like the frontend, write a new one for Python and get
50+ languages that work with it out of the box
(https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages).
If you don’t like a language, write your own kernel and get all the
IDEs and conversion tools.
Etc.
13
The Notebook
Try it on https://try.jupyter.org
Demo
The Notebook app also has a terminal, a text editor, an increasing
number of plugins, and of course supports 50 languages.
14
The notebook
A web application that allows code to produce
rich web representations (images, sound, video,
math, …).
The browser, server, and kernel(s) can be on
separate machines.
The default application for editing `.ipynb` files.
`.ipynb` files are JSON-based files embedding
input and output, so they can be read and
converted without a running kernel.
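Because the file is plain JSON, the standard library is enough to pull inputs and outputs back out. The sketch below builds a minimal notebook by hand, following the version 4 layout, and reads it back; the helper names `as_text` and `code_cells_with_output` are hypothetical, and only `stream` outputs are handled.

```python
import json

# A hand-written minimal notebook (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# A tiny notebook"},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print(6 * 7)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "42\n"}
            ],
        },
    ],
}


def as_text(field):
    """Multi-line strings may be stored as a list of lines; join them."""
    return "".join(field) if isinstance(field, list) else field


def code_cells_with_output(raw_json):
    """Return (source, printed text) pairs for every code cell."""
    nb = json.loads(raw_json)
    pairs = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        printed = "".join(
            as_text(output["text"])
            for output in cell["outputs"]
            if output["output_type"] == "stream"
        )
        pairs.append((as_text(cell["source"]), printed))
    return pairs


raw = json.dumps(notebook, indent=1)   # what would be written to a .ipynb file
pairs = code_cells_with_output(raw)    # no kernel involved at any point
```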
15
The Notebook Fileformat (`.ipynb`)
16
NbViewer
Zero-install reading of
notebooks
Just share a URL
nbviewer.org
Under the hood: get raw URL
and convert to HTML on the fly.
Sharing:
git push, or dropbox sync.
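The "convert to HTML on the fly" step can be illustrated with a deliberately tiny converter. This is not how nbviewer or `nbconvert` are implemented: `notebook_to_html` is a hypothetical helper, it handles only cell sources and `stream` outputs, and the download of the raw URL is left out so the example runs offline.

```python
import html
import json


def _joined(field):
    """Notebook text fields may be a string or a list of lines."""
    return "".join(field) if isinstance(field, list) else field


def notebook_to_html(raw_json):
    """Render a notebook JSON string as a very plain HTML page."""
    nb = json.loads(raw_json)
    parts = []
    for cell in nb.get("cells", []):
        css_class = "code" if cell.get("cell_type") == "code" else "text"
        source = html.escape(_joined(cell.get("source", "")))
        parts.append(f'<pre class="{css_class}">{source}</pre>')
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                text = html.escape(_joined(output.get("text", "")))
                parts.append(f'<pre class="output">{text}</pre>')
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print(1 < 2)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "True\n"}
            ],
        }
    ],
})
page = notebook_to_html(sample)
```

Escaping every string before it reaches the page matters here, since a viewer of this kind renders notebooks written by strangers.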
17
Nbviewer on GitHub
Since May GitHub renders
Notebooks
Powered by `nbconvert`,
the library that deals with
`.ipynb` -> *
Over 200,000 notebooks
on GitHub
18
What content works as a notebook?
http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html
http://qiime.org/home_static/nih-cloud-apr2012
Papers with code as AMI/VMs
19
Blogs
Jake VanderPlas @ UW
http://blogs.scientificamerican.com/sa-visual/2014/09/16/visualizing-4-dimensional-asteroids
20
Courses, MOOCs
21
Books
By Cameron Davidson-Pilon
By Matthew Russell
By José Unpingco
You can download and execute the books locally.
22
Check The Gallery
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
23
Replicating, simpler for readers
http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261
What if you didn’t
have to install anything?
Docker Container
Just for you.
24
Replicating, simpler for authors
1. Fixed set of notebooks/dependencies
1. Tmpnb, (https://lambdaops.com/ipythonjupyter-tmpnb-debuts/)
2. CodeNeuro, (http://codeneuro.org/)
2. Build on demand
1. Everware, (https://github.com/everware)
2. Binder, (MyBinder.org)
25
https://github.com/binder-project
by Jeremy Freeman
(Demo mybinder.org )
26
The networking architecture (single user)
[Diagram; connection labels: HTTPS/WebSocket, ZMQ]
27
MULTI-USER
Jupyter Notebook is single-user by design.
Multi-user support is enabled through JupyterHub:
- Allows better scalability
- Per-user resource monitoring
- Per-user configuration/version of IPython
- Better integration with existing infrastructure
28
[Diagram: Hub; labels: HTTPS/WebSocket proxy, Auth & Security]
29
Everything* is a plugin
• Auth:
  • Unix PAM (default), OAuth, LDAP… (i.e., not yet another thing to manage)
• Spawner: starts a single-user server for each user.
  • Localhost, Rackspace, EC2, Docker
• Meant for sysadmins:
  • Deployments are relatively involved
  • Recent software stack (Node.js/Python 3)
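The plugin idea can be sketched as two small interfaces that a hub depends on. Everything below is a toy with hypothetical names (`ToyHub`, `DictAuthenticator`, `FakeLocalSpawner`); JupyterHub's real `Authenticator` and `Spawner` classes have richer interfaces, and this spawner only invents a URL instead of starting a process.

```python
from abc import ABC, abstractmethod


class Authenticator(ABC):
    """Plugin point: decide who a user is."""

    @abstractmethod
    def authenticate(self, username, password):
        """Return the username on success, None on failure."""


class Spawner(ABC):
    """Plugin point: start one single-user server per user."""

    @abstractmethod
    def start(self, username):
        """Start the server and return its URL."""


class DictAuthenticator(Authenticator):
    """Toy stand-in for PAM, OAuth or LDAP: checks an in-memory table."""

    def __init__(self, passwords):
        self.passwords = passwords

    def authenticate(self, username, password):
        return username if self.passwords.get(username) == password else None


class FakeLocalSpawner(Spawner):
    """Toy stand-in for a localhost or Docker spawner: hands out ports."""

    def __init__(self, first_port=9000):
        self.next_port = first_port

    def start(self, username):
        url = f"http://127.0.0.1:{self.next_port}/user/{username}/"
        self.next_port += 1
        return url


class ToyHub:
    """Knows nothing about how users are checked or where servers run."""

    def __init__(self, authenticator, spawner):
        self.authenticator = authenticator
        self.spawner = spawner
        self.servers = {}          # username -> URL of the running server

    def login(self, username, password):
        user = self.authenticator.authenticate(username, password)
        if user is None:
            raise PermissionError(f"login refused for {username!r}")
        if user not in self.servers:   # spawn once, then reuse
            self.servers[user] = self.spawner.start(user)
        return self.servers[user]


hub = ToyHub(DictAuthenticator({"ada": "s3cret"}), FakeLocalSpawner())
ada_url = hub.login("ada", "s3cret")
```

Swapping in a different authenticator or spawner requires no change to `ToyHub`, which is the property the slide's "everything is a plugin" refers to.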
30
Quick Demo
31
JupyterHub in education
Jess Hamrick @ Cal
K. Kelley (Rackspace)
M. Ragan-Kelley (Simula)
B. Granger (Cal Poly)
https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
❖ Computationally intensive course, ~220 students
❖ Fully hosted environment, zero-install
❖ Integration with autograding.
32
Deploying at a larger scale this fall at UC Berkeley
- Data Science 101
- Everyone with a CalNet account.
New Jupyter in Education Mailing List:
https://groups.google.com/forum/#!forum/jupyter-education
33
Ecosystem
34
Non-Notebook
projects
IDE/Frontends:
Atom Hydrogen
EIN
VIM IPython
Rodeo
PyCharm
Microsoft Visual Studio
35
Non-Notebook
projects
RISE (interactive slideshow)
runipy (notebooks are report templates)
ipymd (store notebook as markdown)
NbGrader (grade assignments)
Jupyter-Drive (store notebooks on Google Drive)
pgcontents (store notebooks in PostgreSQL)
urth (declarative widgets, plus dashboards from notebooks)
36
Google CoLaboratory
Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google
Matt Turk @ NCSA/UIUC
Currently being merged
into Jupyter itself.
37
O’Reilly: authoring and delivering executable books
Atlas, ipymd and Thebe
beta.oreilly.com
38
The Future
39
40
Future work
Interactive Computing
Notebooks as interactive applications
Modular, reusable UI/UX
Software engineering with notebooks
Computational Narratives
nbconvert
Element filtering
Documentation
Collaboration
Real time collaboration
JupyterHub
Sustainability
People
Events
41
Future work
Component and tiled layouts
are oft-requested features.
Collaboration with
Continuum Analytics.
Plan on adding panels for
Text editor, output, variable
inspectors, debuggers, …
Discussions with the Microsoft
PTVS team about a “debugger
protocol”
42
43
Hiring
At UC Berkeley
Two new postdocs
Project manager
Web developer, tech writer (short contracts)
One administrative assistant.
At Cal Poly
Three software engineers (one already hired)
One designer
One administrative assistant.
44
Time for questions?
Thanks
45

Jupyter, A Platform for Data Science at Scale
