Computable Content:
Notebooks, containers, and data-centric organizational learning
Domino Data Science Popup
2017-02-22
Paco Nathan, @pacoid
Dir, Learning Group @ O’Reilly Media
Project Jupyter
3
Project Jupyter is the evolution of IPython notebooks,
applied to a range of different programming languages
and environments
https://jupyter.org/
https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Some history…
4
Download Anaconda:
continuum.io/downloads
Activate the environment needed:
source activate py3k
Launch Jupyter:
jupyter notebook
An example notebook (requires installs; see notes):
github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
Installation and launch using Anaconda
5
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact."
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''
from textblob import TextBlob

# parse the text, then print its part-of-speech tags and noun phrases
blob = TextBlob(text)
print(blob.tags)
print(blob.noun_phrases)
Installation and launch using Anaconda
6
7
At its core, one can think of Jupyter as a suite of network protocols:
Jupyter is to the remote semantics of a REPL
as…
HTTP is to the remote semantics of a file share
A suite of network protocols
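To make the protocol analogy concrete, here is a minimal sketch of the general shape of a Jupyter `execute_request` message, the request a front end sends when it asks a kernel to run code. The helper name `make_execute_request`, the default username, and the protocol version string are assumptions for illustration. The sketch only builds the message as a dictionary; on the wire, Jupyter sends messages as signed ZeroMQ frames, which is not shown here.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build the general shape of a Jupyter `execute_request` message.

    Hypothetical helper for illustration; it does not sign or send anything.
    """
    header = {
        "msg_id": uuid.uuid4().hex,          # unique id for this message
        "session": session_id,               # id shared by one client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.0",                    # protocol version, assumed here
    }
    content = {
        "code": code,                        # source the kernel should run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},                 # empty: this message starts a new exchange
        "metadata": {},
        "content": content,
    }


if __name__ == "__main__":
    msg = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)
    print(json.dumps(msg, indent=2))
```

The point of the sketch is that the "REPL" is just structured messages: any client able to produce this shape, in any language, can drive any kernel that understands it.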
8
An excellent team
9
JupyterHub
github.com/jupyterhub/jupyterhub
Jupyter in Education
groups.google.com/forum/#!forum/jupyter-education
JupyterLab (alpha preview)
github.com/jupyterlab/jupyterlab
Jupyter Kernels
github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Projects:
10
documentation
jupyter.readthedocs.io/en/latest/index.html
discussions
groups.google.com/forum/#!forum/jupyter
gitter.im/jupyter/jupyter
events
calendar.google.com/calendar/embed?src=p51j0ac1iccmj44tae12hq4dk0%40group.calendar.google.com
Resources:
11
speaking of upcoming events, stay tuned for …
Resources:
Computable Content
14
An observation…
15
Jupyter @ O’Reilly Media
Embracing Jupyter Notebooks at O'Reilly

oreilly.com/ideas/jupyter-at-oreilly
Learn alongside innovators, thought-by-thought, in context

oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
Oriole Online Tutorials

safaribooksonline.com/oriole/
How Do You Learn?
oreilly.com/learning/how-do-you-learn
16
For example…
• A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
• Purely browser-based UX; zero installation required
• Substantially higher engagement metrics
• Opens the door for live coding in assessments
• GitHub lists over 300K public Jupyter notebooks
Regex Golf by Peter Norvig
oreilly.com/learning/regex-golf-with-peter-norvig
17
Motivations
O’Reilly needed a way for authors to use Jupyter notebooks to create
professional publications. We also wanted to integrate video narration
into the UX. The result is a unique new medium called Oriole:
• Jupyter notebooks are used in the middleware
• each viewer gets a 100% HTML experience (no download/install needed)
• context as a “unit of thought”
• the code and video are sync’ed together
• each web session has a Docker container running in the cloud
18
Motivations
Innovators in programming, data science, dev ops, design, etc., tend to
be really busy people. Tutorials are now much quicker to publish than
“traditional” books and videos. The audience gets direct, hands-on,
contextualized experience across a wide variety of programming
environments.
19
Literate Programming, Don Knuth

literateprogramming.com/
Paraphrased:
Instead of telling computers what to do, tell other
people what you want the computers to do
Some history
20
Wolfram Research introduced notebooks in 1988 for working with Mathematica…
Some history
21
PyCon 2016 Keynote, Lorena Barba
youtu.be/ckW1xuGVpug?t=35m11s (video)
figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here
More recently
Notebook Practice
23
• focus on a concise “unit of thought”
• invest the time and editorial effort to create a good intro
• keep your narrative simple and reasonably linear
• “chunk” the text and code into understandable parts
• alternate between text, code, output, further links, etc.
• use markdown for interesting links: background, deep-dive, etc.
• code cells shouldn’t be long (< 10 lines), must show output
• load data+libraries from the container, not the network
• clear all output then “Run All” – or it didn’t happen
• video narratives: there’s text, and there’s subtext...
• pause after each “beat” – smile, breathe, let people follow you
Tips learned by teaching with Jupyter
For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)

BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!

(apologies for shouting)
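Two of the tips above (keep code cells under 10 lines, and make sure each one shows output) can be checked mechanically. The following is a rough sketch, assuming the notebook has already been loaded as a dictionary in the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `source` and `outputs`). The function names and the sample notebook are hypothetical, not part of any Jupyter tool.

```python
MAX_LINES = 10  # threshold taken from the "< 10 lines" tip


def cell_source(cell):
    """Return a cell's source as one string (nbformat allows a list of lines)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def check_notebook(nb, max_lines=MAX_LINES):
    """Return (cell_index, problem) pairs for code cells that break the tips."""
    problems = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # count only non-blank lines of code
        lines = [ln for ln in cell_source(cell).splitlines() if ln.strip()]
        if len(lines) >= max_lines:
            problems.append((index, f"{len(lines)} lines of code"))
        if lines and not cell.get("outputs"):
            problems.append((index, "no output"))
    return problems


if __name__ == "__main__":
    # tiny hand-written notebook, for illustration only
    sample = {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
            {"cell_type": "code", "metadata": {}, "execution_count": 1,
             "source": ["print('hello')"],
             "outputs": [{"output_type": "stream", "name": "stdout",
                          "text": ["hello\n"]}]},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "source": ["x = 1\n"], "outputs": []},
        ],
    }
    for index, problem in check_notebook(sample):
        print(f"cell {index}: {problem}")
```

A real notebook could be loaded with `json.load` and passed to `check_notebook` after a clean "Run All". A cell that only assigns a variable would be flagged for having no output, which matches the spirit of the tip but may be stricter than some authors want.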
24
Jupyter notebooks + Git repos provide a low-cost,
pragmatic way toward the practice of repeatable
science – in this case, repeatable Data Science
• executable documents
• code + params + results + descriptions
• shareable insights
Notebooks: a cure for silos
25
In data science, we see the benefits to teams for shared
insights, storytelling, etc.
Meanwhile domain expertise is generally more important than
knowledge about tools
There’s value in developers using notebooks in lieu of IDEs
in some cases – what are those cases?
GitHub now renders notebooks, so they can be used for
documentation, reporting, etc.
Digital Object Identifiers (DOI) can be assigned through
Zenodo, making notebooks citable for academic publication
“Sharing is caring”
Authoring & Scale-Out
27
Launchbot.io
28
Launchbot allows a notebook author to build a
container that includes the required Jupyter kernel,
installed libraries, datasets, etc.
You need to have Docker installed on your laptop
The backend uses Git and DockerHub to manage
containers
For scale, deploy to DC/OS
Achieving scale
29
A notebook, a container, and ~20 minutes of
informal video walk into a bar...
O’Reilly Media conferences + training:
NLP in Python
repeated live online courses
Strata
SJ Mar 13-16
Deep Learning sessions, 2-day training
Artificial Intelligence
NY Jun 26-29, SF Sep 17-20
SF CFP is open, follow @OreillyAI for updates
speaker:
periodic newsletter for updates, events, conf summaries, etc.:
liber118.com/pxn/
@pacoid
A modest proposal
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?

Pigging Solutions Piggable Sweeping Elbows
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Computable content: Notebooks, containers, and data-centric organizational learning

  • 1. Computable Content: Notebooks, containers, and data-centric organizational learning
    Domino Data Science Popup, 2017-02-22
    Paco Nathan, @pacoid
    Dir, Learning Group @ O’Reilly Media
  • 3. Some history…
    Project Jupyter is the evolution of IPython notebooks, applied to a range of different programming languages and environments
    https://jupyter.org/
    https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 4. Installation and launch using Anaconda
    Download Anaconda: continuum.io/downloads
    Activate the environment needed: source activate py3k
    Launch Jupyter: jupyter notebook
    An example notebook (requires installs; see notes): github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
  • 5. Installation and launch using Anaconda
    text = '''
    The titular threat of The Blob has always struck me as the ultimate movie
    monster: an insatiably hungry, amoeba-like mass able to penetrate
    virtually any safeguard, capable of--as a doomed doctor chillingly
    describes it--"assimilating flesh on contact.
    Snide comparisons to gelatin be damned, it's a concept with the most
    devastating of potential consequences, not unlike the grey goo scenario
    proposed by technological theorists fearful of
    artificial intelligence run rampant.
    '''
    from textblob import TextBlob
    blob = TextBlob(text)
    print(blob.tags)
    print(blob.noun_phrases)
  • 7. A suite of network protocols
    At its core, one can think of Jupyter as a suite of network protocols:
    Jupyter is to the remote semantics of a REPL as HTTP is to the remote semantics of a file share
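To make the protocol analogy concrete, here is a minimal sketch of what a client sends when it asks a kernel to run code. It builds only the logical `execute_request` message as a Python dict; the ZeroMQ framing and message signing that a real client also performs are left out. The helper name `make_execute_request`, the username `reader`, and the protocol version string are assumptions chosen for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str) -> dict:
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper for illustration; it does not sign or send anything.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,       # unique per message
            "session": session,               # unique per client session
            "username": "reader",             # assumed placeholder value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",                 # assumed protocol version
        },
        "parent_header": {},                  # empty: this message starts an exchange
        "metadata": {},
        "content": {
            "code": code,                     # the source the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


request = make_execute_request("print(1 + 1)", session=uuid.uuid4().hex)
wire_form = json.dumps(request)  # the parts are serialized as JSON before sending
print(wire_form)
```

The kernel answers with further messages whose `parent_header` copies this request's header, which is how a front end matches outputs to the cell that produced them.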
  • 9. Projects:
    JupyterHub: github.com/jupyterhub/jupyterhub
    Jupyter in Education: groups.google.com/forum/#!forum/jupyter-education
    JupyterLab (alpha preview): github.com/jupyterlab/jupyterlab
    Jupyter Kernels: github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 11. Resources: speaking of upcoming events, stay tuned for …
  • 15. Jupyter @ O’Reilly Media
    Embracing Jupyter Notebooks at O'Reilly: oreilly.com/ideas/jupyter-at-oreilly
    Learn alongside innovators, thought-by-thought, in context: oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
    Oriole Online Tutorials: safaribooksonline.com/oriole/
    How Do You Learn? oreilly.com/learning/how-do-you-learn
  • 16. For example…
      • A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
      • Purely browser-based UX; zero installation required
      • Substantially higher engagement metrics
      • Opens the door for live coding in assessments
      • GitHub lists over 300K public Jupyter notebooks
    Regex Golf by Peter Norvig: oreilly.com/learning/regex-golf-with-peter-norvig
  • 17. Motivations
    O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
      • Jupyter notebooks are used in the middleware
      • each viewer gets a 100% HTML experience (no download/install needed)
      • context as a “unit of thought”
      • the code and video are sync’ed together
      • each web session has a Docker container running in the cloud
  • 18. Motivations
    Innovators in programming, data science, dev ops, design, etc., tend to be really busy people. Tutorials are now much quicker to publish than “traditional” books and videos. The audience gets direct, hands-on, contextualized experience across a wide variety of programming environments.
  • 19. Some history
    Literate Programming, Don Knuth: literateprogramming.com/
    Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do
  • 20. Some history
    Wolfram Research introduced notebooks in 1988 for working with Mathematica…
  • 21. More recently
    PyCon 2016 Keynote, Lorena Barba
    youtu.be/ckW1xuGVpug?t=35m11s (video)
    figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
    Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here
  • 23. Tips learned by teaching with Jupyter
      • focus on a concise “unit of thought”
      • invest the time and editorial effort to create a good intro
      • keep your narrative simple and reasonably linear
      • “chunk” the text and code into understandable parts
      • alternate between text, code, output, further links, etc.
      • use markdown for interesting links: background, deep-dive, etc.
      • code cells shouldn’t be long (< 10 lines), must show output
      • load data+libraries from the container, not the network
      • clear all output then “Run All” – or it didn’t happen
      • video narratives: there’s text, and there’s subtext...
      • pause after each “beat” – smile, breathe, let people follow you
    For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)
    BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
    (apologies for shouting)
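Two of the tips above (short code cells that show output, and clearing all output before a fresh “Run All”) can be checked mechanically, since a notebook file is plain JSON. The sketch below is one possible way to do that with only the standard library; it is not an O’Reilly tool. The function names, the `MAX_CODE_LINES` constant, and the in-memory `demo` notebook are all hypothetical, and a real `.ipynb` file would first be read with `json.load`.

```python
import copy

MAX_CODE_LINES = 10  # assumed threshold, taken from the "< 10 lines" tip


def source_lines(cell: dict) -> list:
    """Return a cell's source as lines (the notebook format allows str or list)."""
    src = cell.get("source", "")
    text = src if isinstance(src, str) else "".join(src)
    return text.splitlines()


def review_notebook(nb: dict) -> list:
    """List code cells that are too long or that show no output."""
    problems = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        n_lines = len(source_lines(cell))
        if n_lines >= MAX_CODE_LINES:
            problems.append(f"cell {index}: {n_lines} lines of code")
        if not cell.get("outputs"):
            problems.append(f"cell {index}: no output shown")
    return problems


def clear_outputs(nb: dict) -> dict:
    """Return a copy with outputs and execution counts reset, ready for Run All."""
    cleaned = copy.deepcopy(nb)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical three-cell notebook: an intro, a short cell, and an overlong cell.
demo = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Intro"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print(1 + 1)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "source": "\n".join(f"x{i} = {i}" for i in range(12)),
            "outputs": [],
        },
    ],
}

problems = review_notebook(demo)
cleaned = clear_outputs(demo)
print(problems)
```

Checking a cleaned notebook will flag every code cell as lacking output, so the review step only makes sense after the notebook has been re-run from top to bottom.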
  • 24. Notebooks: a cure for silos
    Jupyter notebooks + Git repos provide a low-cost, pragmatic way toward the practice of repeatable science – in this case, repeatable Data Science
      • executable documents
      • code + params + results + descriptions
      • shareable insights
  • 25. “Sharing is caring”
    In data science, we see the benefits to teams for shared insights, storytelling, etc. Meanwhile, domain expertise is generally more important than knowledge about tools.
    There’s a value for developers to use notebooks in lieu of IDEs in some cases – what are those cases?
    GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
    Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication.
  • 28. Achieving scale
    Launchbot allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
    You need to have Docker installed on your laptop.
    The backend uses Git and DockerHub to manage containers.
    For scale, deploy to DC/OS.
  • 29. A notebook, a container, and ~20 minutes of informal video walk into a bar...
  • 30. O’Reilly Media conferences + training:
    NLP in Python: repeated live online courses
    Strata: SJ Mar 13-16; Deep Learning sessions, 2-day training
    Artificial Intelligence: NY Jun 26-29, SF Sep 17-20; SF CFP is open, follow @OreillyAI for updates
  • 32. speaker:
    periodic newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/
    @pacoid
    A modest proposal; Just Enough Math; Building Data Science Teams; Hylbert-Speys; How Do You Learn?