5. Jupyter Notebooks Support the ML Lifecycle
1. Collect Data
Retrieve Files
Query SQL Databases
Call Web Services
“Scrape” Web Pages
2. Prepare Data
Explore Data
Validate Data
Clean Data
Features / Data
3. Train Model
Prepare Training Set
Experiment
Test Model
Visualize
4. Evaluate Model
Test Performance
Compare Models
Validate Model
Visualize
5. Deploy Model
Export Model File
Prepare Job
Deploy Container
Re-package Model
Execute code blocks:
- Python, R… code
- SQL queries
- Shell commands
Write Documentation:
- Markdown language
Visualize Data
- Viz tools…
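The five lifecycle stages on this slide can be walked through in a handful of notebook cells. The sketch below is illustrative only and uses just the Python standard library. The inline CSV, the column names (`hours`, `passed`), and the threshold "model" are all assumptions made up for this example; a real notebook would typically pull data from files, SQL, or web services and train with an ML library.

```python
import csv
import io
import json
import statistics

# 1. Collect: a tiny inline CSV stands in for a file, SQL query, or web call.
#    (Hypothetical data, invented for this sketch.)
RAW_CSV = """hours,passed
1.0,0
2.0,0
,1
3.0,0
4.0,1
5.0,1
6.0,1
"""

def collect(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

# 2. Prepare: drop rows with missing values and convert strings to numbers.
def prepare(rows):
    clean = []
    for row in rows:
        if row["hours"] and row["passed"]:
            clean.append((float(row["hours"]), int(row["passed"])))
    return clean

# 3. Train: a toy "model" whose only parameter is a threshold placed
#    halfway between the mean of each class.
def train(samples):
    neg = [x for x, y in samples if y == 0]
    pos = [x for x, y in samples if y == 1]
    return {"threshold": (statistics.mean(neg) + statistics.mean(pos)) / 2}

def predict(model, x):
    return int(x >= model["threshold"])

# 4. Evaluate: accuracy on the given samples. For brevity this reuses the
#    training data; a real workflow would hold out a separate test set.
def evaluate(model, samples):
    hits = sum(predict(model, x) == y for x, y in samples)
    return hits / len(samples)

# 5. Deploy: export the model parameters as JSON, standing in for
#    "Export Model File".
def export(model):
    return json.dumps(model)

rows = collect(RAW_CSV)
data = prepare(rows)
model = train(data)
accuracy = evaluate(model, data)
exported = export(model)
print(model, accuracy, exported)
```

In a notebook, each numbered step would usually live in its own cell, with Markdown cells in between documenting what was done and why.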
9. Mathematica evolved…
Jupyter Notebook
Market leader
Started for single-user use
Academic community
GitHub integration
Added JupyterHub for collaboration
Zeppelin Notebook
Started for collaboration
Enterprise
Security
Vendor Notebook
Databricks for Apache Spark
Jupyter-like, but proprietary format
@lynnlangit
10. Running Notebooks
Desktop
Install and run
Local Server
Can use JupyterHub for groups
Cloud
Large number of options
Docker
Start a container
11. Extending, Refactoring Open Notebooks
• Write functions in one notebook
• Link to another notebook
• Write extensions (nbextensions.com)
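One way to reuse functions written in another notebook is to read that notebook's JSON and execute only the cells that define functions. The sketch below is a minimal illustration under stated assumptions: the in-memory notebook, the `double` function, and the helper names `code_cells` and `load_definitions` are hypothetical, and the notebook follows the nbformat 4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import json

# A minimal in-memory notebook in the nbformat 4 JSON layout.
# In practice this text would come from reading an .ipynb file.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Helpers\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["def double(x):\n", "    return 2 * x\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["print(double(21))\n"]},
    ],
})

def code_cells(notebook_text):
    """Return the source text of every code cell in the notebook."""
    nb = json.loads(notebook_text)
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # "source" may be a single string or a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources

def load_definitions(notebook_text):
    """Execute only cells that start with 'def ' and return their namespace."""
    namespace = {}
    for src in code_cells(notebook_text):
        if src.lstrip().startswith("def "):
            exec(src, namespace)
    return namespace

helpers = load_definitions(NOTEBOOK_JSON)
result = helpers["double"](21)
print(result)
```

Filtering on `def ` is a deliberately crude rule that skips cells with side effects such as plotting or printing. Only run this against notebooks you trust, since `exec` runs arbitrary code.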
14. What is Blastn?
The Basic Local Alignment Search Tool (BLAST) finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. blastn is the BLAST program for nucleotide-to-nucleotide searches.
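To give a feel for what "finding regions of similarity" means, here is a toy seed-and-extend search over two short nucleotide strings. This is not BLAST: it uses exact k-mer seeds and ungapped exact-match extension only, with no scoring matrix, no gaps, and no statistical significance. The sequences and the function names (`kmer_index`, `find_seeds`, `extend_seed`) are made up for this sketch.

```python
from collections import defaultdict

def kmer_index(sequence, k):
    """Map every length-k substring of the sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index

def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-mer matches exactly."""
    index = kmer_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds

def extend_seed(query, subject, q, s, k):
    """Extend an exact seed left and right while the bases keep matching."""
    start_q, start_s = q, s
    while (start_q > 0 and start_s > 0
           and query[start_q - 1] == subject[start_s - 1]):
        start_q -= 1
        start_s -= 1
    end_q, end_s = q + k, s + k
    while (end_q < len(query) and end_s < len(subject)
           and query[end_q] == subject[end_s]):
        end_q += 1
        end_s += 1
    return query[start_q:end_q], start_q, start_s

# Hypothetical sequences chosen so they share one six-base region.
query = "TTGACCGTA"
subject = "AAGACCGTCC"

seeds = find_seeds(query, subject, k=4)
# Keep the longest extended match: (matched text, query start, subject start).
best = max(
    (extend_seed(query, subject, q, s, 4) for q, s in seeds),
    key=lambda hit: len(hit[0]),
)
print(seeds)
print(best)
```

The real tool scores extensions with match and mismatch penalties, allows gaps, and reports an E-value for each hit, which is the "statistical significance" the slide refers to.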
16. Cloud-based Jupyter
PaaS
• AWS SageMaker
• Azure Notebooks
• Google Colab
18. Tools for Jupyter
• Binder for GitHub
• Point it to your GitHub repo containing:
- Jupyter Notebooks
- requirements.txt
• It builds a Docker image
• You can run your Notebooks
21. Future of Jupyter for Research
Academic Institutions and Research Labs
UC Berkeley, Davis, San Diego
Cal Poly San Luis Obispo
Clemson University
CU Boulder
U of Illinois, Minnesota, Missouri, Rochester, Texas
MIT
Michigan State U
Texas A&M
Editor's Notes
History talk from Cristian Prieto (NDC Oslo 2016) -- https://vimeo.com/223984769
http://blog.fperez.org/2012/01/ipython-notebook-historical.html
Local install
pip install "ipython[all]" -OR- use Anaconda, which installs Jupyter Notebook by default
pip install jupyter; R support comes from the IRkernel package, which is installed from within R rather than with pip
You can use Docker (the 2.1 GB image contains all libraries), or you can use Azure Notebooks or AWS SageMaker notebooks
Only Python 2 is installed by default; you can install other runtimes (kernels)
Start and run in a local browser (no database; notebooks are stored as local JSON files)
IPython notebook -> localhost:8888/tree
Uses GitHub-flavored Markdown (by default)
https://dwhsys.com/2017/03/25/apache-zeppelin-vs-jupyter-notebook/
https://medium.com/@lynnlangit/aws-sagemaker-for-bioinformatics-b8e8a96479d8
Jupyter on GCE VM -- https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52
https://mybinder.org/ - allows you to run notebooks stored in GitHub -ALSO-
https://nbviewer.jupyter.org/ - renders notebooks stored in GitHub as static pages (view only)