SlideShare a Scribd company logo
1 of 29
Reproducible Science with Python
Andreas Schreiber
Department for Intelligent and Distributed Systems
German Aerospace Center (DLR), Cologne/Berlin
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 1
Simulations, experiments, data analytics, …
Science
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 2
SIMULATION FAILED
Reproducing results relies on
• Open Source codes
• Code reviews
• Code repositories
• Publications with code
• Computational environment
captured (Docker etc.)
• Workflows
• Open Data formats
• Data management
• (Electronics) laboratory notebooks
• Provenance
Reproducible Science
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 3
Provenance is information about entities, activities, and
people involved in producing a piece of data or thing,
which can be used to form assessments about its
quality, reliability or trustworthiness.
PROV W3C Working Group
https://www.w3.org/TR/prov-overview
Provenance
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 4
W3C Provenance Working Group:
https://www.w3.org/2011/prov
PROV
• The goal of PROV is to enable the wide publication and
interchange of provenance on the Web and other information
systems
• PROV enables one to represent and interchange provenance
information using widely available formats such as RDF and XML
PROV
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 5
Entities
• Physical, digital, conceptual, or other kinds of things
• For example, documents, web sites, graphics, or data sets
Activities
• Activities generate new entities or
make use of existing entities
• Activities could be actions or processes
Agents
• Agents takes a role in an activity and
have the responsibility for the activity
• For example, persons, pieces of software, or organizations
Overview of PROV
Key Concepts
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 6
PROV Data Model
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 7
Activity
Entity
Agent
wasGeneratedBy
used
wasDerivedFrom
wasAttributedTo
wasAssociatedWith
Example Provenance
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 8
https://provenance.ecs.soton.ac.uk/store/documents/113794/
Storing Provenance
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 9
Provenance
Store
Provenance recording
during runtime
Applications
/ Workflow
Data (results)
Some Storage Technologies
• Relational databases and SQL
• XML and Xpath
• RDF and SPARQL
• Graph databases and Gremlin/Cypher
Services
• REST APIs
• ProvStore (University of Southampton)
Storing and Retrieving Provenance
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 10
ProvStore
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 11
https://provenance.ecs.soton.ac.uk/store/
Provenance is a directed acyclic graph (DAG)
Graphs
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 12
A
B
E
F
G
D
C
Naturally, graph databases are a good technology for storing
(Provenance) graphs
Many graph databases are available
• Neo4J
• Titan
• ArangoDB
• ...
Query languages
• Cypher
• Gremlin (TinkerPop)
• GraphQL
Graph Databases
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 13
• Open-Source
• Implemented in Java
• Stores property graphs
(key-value-based, directed)
http://neo4j.com
Neo4j
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 14
Depends on your application (tools, languages, etc.)
Libraries for Python
• prov
• provneo4j
• NoWorkflow
• …
Tools
• Git2PROV
• Prov-sty for LaTeX
• …
Gathering Provenance
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 15
Python Library prov
https://github.com/trungdong/prov
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 16
from prov.model import ProvDocument
# Create a new provenance document
d1 = ProvDocument()
# Entity: now:employment-article-v1.html
e1 = d1.entity('now:employment-article-v1.html')
# Agent: nowpeople:Bob
d1.agent('nowpeople:Bob')
# Attributing the article to the agent
d1.wasAttributedTo(e1, 'nowpeople:Bob')
d1.entity('govftp:oesm11st.zip',
{'prov:label': 'employment-stats-2011',
'prov:type': 'void:Dataset'})
d1.wasDerivedFrom('now:employment-article-v1.html',
'govftp:oesm11st.zip')
# Adding an activity
d1.activity('is:writeArticle')
d1.used('is:writeArticle', 'govftp:oesm11st.zip')
d1.wasGeneratedBy('now:employment-article-v1.html', 'is:writeArticle')
Python Library prov
https://github.com/trungdong/prov
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 17
Example
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 18
http://localhost:8888/notebooks/WeightCompanion PROV.ipynb
provneo4j – Storing PROV Documents in Neo4j
https://github.com/DLR-SC/provneo4j
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 19
import provneo4j.api
provneo4j_api = provneo4j.api.Api(
base_url="http://localhost:7474/db/data",
username="neo4j", password="python")
provneo4j_api.document.create(prov_doc, name=”MyProv”)
provneo4j – Storing PROV Documents in Neo4j
https://github.com/DLR-SC/provneo4j
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 20
NoWorkflow – Provenance of Scripts
https://github.com/gems-uff/noworkflow
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 21
Project
experiment.py
p12.dat
p13.dat
precipitation.py
p14.dat
out.png
$ now run -e Tracker experiment.py
Git2PROV
http://git2prov.org
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 22
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 23
LaTeX Annotations
prov-sty for LaTeX
https://github.com/prov-suite/prov-sty
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 24
begin{document}
provAuthor{Andreas Schreiber}{http://orcid.org/0000-0001-5750-5649}
provOrganization{German Aerospace Center (DLR)}{http://www.dlr.de}
provTitle{A Provenance Model for Quantified Self Data}
provProject
{PROV-SPEC (FS12016)}
{http://www.dlr.de/sc/desktopdefault.aspx/tabid-8073/}
{http://www.bmwi.de/}
Electronic Laboratory Notebook
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 25
Query „Who worked on experiment X?“
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 26
$experiment = g.key($_g, 'identifier', X)
$user = $experiment/inE/inV/outE[@label = controlled_by]
Practical Use Case – Archive all Data of a Paper
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 27
https://github.com/DLR-SC/DataFinder
New PROV Library for Python in development
• https://github.com/DLR-SC/prov-db-connector
• Connectors for Neo4j implemented, ArangoDB planned
• APIs in REST, ZeroMQ, MQTT
Trusted Provenance
• Storing Provenance in Blockchains
Provenance for people
• New approaches for visualization
• For example, PROV Comics
Current Research and Development
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 28
> PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 29
Thank You!
Questions?
Andreas.Schreiber@dlr.de
www.DLR.de/sc | @onyame

More Related Content

What's hot

What's hot (20)

Applying Machine Learning using H2O
Applying Machine Learning using H2OApplying Machine Learning using H2O
Applying Machine Learning using H2O
 
MySQL performance monitoring using Statsd and Graphite (PLUK2013)
MySQL performance monitoring using Statsd and Graphite (PLUK2013)MySQL performance monitoring using Statsd and Graphite (PLUK2013)
MySQL performance monitoring using Statsd and Graphite (PLUK2013)
 
Statsd introduction
Statsd introductionStatsd introduction
Statsd introduction
 
What's new in Spark 2.0?
What's new in Spark 2.0?What's new in Spark 2.0?
What's new in Spark 2.0?
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
 
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael VaroquauxPyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
Pyparis2017 / Scikit-learn - an incomplete yearly review, by Gael Varoquaux
 
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
 
Write Graph Algorithms Like a Boss Andrew Ray
Write Graph Algorithms Like a Boss Andrew RayWrite Graph Algorithms Like a Boss Andrew Ray
Write Graph Algorithms Like a Boss Andrew Ray
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Case Studies in advanced analytics with R
Case Studies in advanced analytics with RCase Studies in advanced analytics with R
Case Studies in advanced analytics with R
 
Seventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status UpdateSeventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status Update
 
Big Data with Modern R & Spark
Big Data with Modern R & SparkBig Data with Modern R & Spark
Big Data with Modern R & Spark
 
Patching Mr Robot: Mitigating IoT-Related Cyber-social Disasters by getting F...
Patching Mr Robot: Mitigating IoT-Related Cyber-social Disasters by getting F...Patching Mr Robot: Mitigating IoT-Related Cyber-social Disasters by getting F...
Patching Mr Robot: Mitigating IoT-Related Cyber-social Disasters by getting F...
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
 
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By  Kabul KurniawanKnowledge Graph for Cybersecurity: An Introduction By  Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
 
Virtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log AnalysisVirtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log Analysis
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
HNSciCloud Status Update
HNSciCloud Status UpdateHNSciCloud Status Update
HNSciCloud Status Update
 
WSO2 Virtual Hackathon Big Data in the Cloud Case Study
WSO2 Virtual Hackathon Big Data in the Cloud Case StudyWSO2 Virtual Hackathon Big Data in the Cloud Case Study
WSO2 Virtual Hackathon Big Data in the Cloud Case Study
 

Viewers also liked

Human Capital in de 21e eeuw
Human Capital in de 21e eeuwHuman Capital in de 21e eeuw
Human Capital in de 21e eeuw
han mesters
 
Incident Command: The far side of the edge
Incident Command: The far side of the edgeIncident Command: The far side of the edge
Incident Command: The far side of the edge
Fastly
 
vanEngelen 360 Inspiratieborrel - Trends Update 2014
vanEngelen 360 Inspiratieborrel - Trends Update 2014vanEngelen 360 Inspiratieborrel - Trends Update 2014
vanEngelen 360 Inspiratieborrel - Trends Update 2014
Van Engelen
 
Cedar Ridge Weekly Report
Cedar Ridge Weekly ReportCedar Ridge Weekly Report
Cedar Ridge Weekly Report
clstutts
 

Viewers also liked (20)

Can you handle The TRUTH ,..? Missing page history of JESUS and Hidden TRUTH
Can you handle The TRUTH ,..?  Missing page history of JESUS and Hidden TRUTHCan you handle The TRUTH ,..?  Missing page history of JESUS and Hidden TRUTH
Can you handle The TRUTH ,..? Missing page history of JESUS and Hidden TRUTH
 
Application Development on Metapod
Application Development on MetapodApplication Development on Metapod
Application Development on Metapod
 
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
 
Say no to var_dump
Say no to var_dumpSay no to var_dump
Say no to var_dump
 
Human Capital in de 21e eeuw
Human Capital in de 21e eeuwHuman Capital in de 21e eeuw
Human Capital in de 21e eeuw
 
Service Orchestrierung mit Apache Mesos
Service Orchestrierung mit Apache MesosService Orchestrierung mit Apache Mesos
Service Orchestrierung mit Apache Mesos
 
Building mental models
Building mental modelsBuilding mental models
Building mental models
 
Incident Command: The far side of the edge
Incident Command: The far side of the edgeIncident Command: The far side of the edge
Incident Command: The far side of the edge
 
De tabernakel
De tabernakelDe tabernakel
De tabernakel
 
Using NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion housesUsing NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion houses
 
Respond to and troubleshoot production incidents like an sa
Respond to and troubleshoot production incidents like an saRespond to and troubleshoot production incidents like an sa
Respond to and troubleshoot production incidents like an sa
 
Cloud adoption patterns
Cloud adoption patternsCloud adoption patterns
Cloud adoption patterns
 
Linux Malware Analysis
Linux Malware Analysis	Linux Malware Analysis
Linux Malware Analysis
 
vanEngelen 360 Inspiratieborrel - Trends Update 2014
vanEngelen 360 Inspiratieborrel - Trends Update 2014vanEngelen 360 Inspiratieborrel - Trends Update 2014
vanEngelen 360 Inspiratieborrel - Trends Update 2014
 
Build Stuff 2015 program
Build Stuff 2015 programBuild Stuff 2015 program
Build Stuff 2015 program
 
Next-gen Network Telemetry is Within Your Packets: In-band OAM
Next-gen Network Telemetry is Within Your Packets: In-band OAMNext-gen Network Telemetry is Within Your Packets: In-band OAM
Next-gen Network Telemetry is Within Your Packets: In-band OAM
 
Comparison: VNS3 and Openswan
Comparison: VNS3 and OpenswanComparison: VNS3 and Openswan
Comparison: VNS3 and Openswan
 
Interact Differently: Get More From Your Tools Through Exposed APIs
Interact Differently: Get More From Your Tools Through Exposed APIsInteract Differently: Get More From Your Tools Through Exposed APIs
Interact Differently: Get More From Your Tools Through Exposed APIs
 
Cedar Ridge Weekly Report
Cedar Ridge Weekly ReportCedar Ridge Weekly Report
Cedar Ridge Weekly Report
 
Dashboards: Using data to find out what's really going on
Dashboards: Using data to find out what's really going onDashboards: Using data to find out what's really going on
Dashboards: Using data to find out what's really going on
 

Similar to Reproducible Science with Python

20130409 1 developing apps for android with kivy
20130409 1 developing apps for android with kivy20130409 1 developing apps for android with kivy
20130409 1 developing apps for android with kivy
Droidcon Berlin
 
Scientific Software: Sustainability, Skills & Sociology
Scientific Software: Sustainability, Skills & SociologyScientific Software: Sustainability, Skills & Sociology
Scientific Software: Sustainability, Skills & Sociology
Neil Chue Hong
 

Similar to Reproducible Science with Python (20)

Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructure
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data Science
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Reveal's Advanced Analytics: Using R & Python
Reveal's Advanced Analytics: Using R & PythonReveal's Advanced Analytics: Using R & Python
Reveal's Advanced Analytics: Using R & Python
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
James Coplien: Trygve - Oct 17, 2016
James Coplien: Trygve - Oct 17, 2016James Coplien: Trygve - Oct 17, 2016
James Coplien: Trygve - Oct 17, 2016
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Is there a free lunch for cloud-based evolutionary algorithms?
Is there a free lunch for cloud-based evolutionary algorithms?Is there a free lunch for cloud-based evolutionary algorithms?
Is there a free lunch for cloud-based evolutionary algorithms?
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
20130409 1 developing apps for android with kivy
20130409 1 developing apps for android with kivy20130409 1 developing apps for android with kivy
20130409 1 developing apps for android with kivy
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Scientific Software: Sustainability, Skills & Sociology
Scientific Software: Sustainability, Skills & SociologyScientific Software: Sustainability, Skills & Sociology
Scientific Software: Sustainability, Skills & Sociology
 
Session 2
Session 2Session 2
Session 2
 
Introduction to interactive data visualisation using R Shiny
Introduction to interactive data visualisation using R ShinyIntroduction to interactive data visualisation using R Shiny
Introduction to interactive data visualisation using R Shiny
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
R Programming Overview
R Programming Overview R Programming Overview
R Programming Overview
 

More from Andreas Schreiber

More from Andreas Schreiber (20)

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented Reality
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace Center
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket Scientists
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality Headsets
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using Comics
 
Quantified Self Comics
Quantified Self ComicsQuantified Self Comics
Quantified Self Comics
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self Data
 
Open Source im DLR
Open Source im DLROpen Source im DLR
Open Source im DLR
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The Rest
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris Data
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & Exposition
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermann
 
Big Python
Big PythonBig Python
Big Python
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-Sensoren
 
Example Blood Pressure Report of BloodPressureCompanion
Example Blood Pressure Report of BloodPressureCompanionExample Blood Pressure Report of BloodPressureCompanion
Example Blood Pressure Report of BloodPressureCompanion
 
Beispiel-Blutdruckbericht des BlutdruckBegleiter
Beispiel-Blutdruckbericht des BlutdruckBegleiterBeispiel-Blutdruckbericht des BlutdruckBegleiter
Beispiel-Blutdruckbericht des BlutdruckBegleiter
 
Informatik für die Welt von Morgen
Informatik für die Welt von MorgenInformatik für die Welt von Morgen
Informatik für die Welt von Morgen
 
Python for High Performance and Scientific Computing
Python for High Performance and Scientific ComputingPython for High Performance and Scientific Computing
Python for High Performance and Scientific Computing
 

Recently uploaded

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 

Recently uploaded (20)

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 

Reproducible Science with Python

  • 1. Reproducible Science with Python Andreas Schreiber Department for Intelligent and Distributed Systems German Aerospace Center (DLR), Cologne/Berlin > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 1
  • 2. Simulations, experiments, data analytics, … Science > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 2 SIMULATION FAILED
  • 3. Reproducing results relies on • Open Source codes • Code reviews • Code repositories • Publications with code • Computational environment captured (Docker etc.) • Workflows • Open Data formats • Data management • (Electronics) laboratory notebooks • Provenance Reproducible Science > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 3
  • 4. Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV W3C Working Group https://www.w3.org/TR/prov-overview Provenance > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 4
  • 5. W3C Provenance Working Group: https://www.w3.org/2011/prov PROV • The goal of PROV is to enable the wide publication and interchange of provenance on the Web and other information systems • PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML PROV > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 5
  • 6. Entities • Physical, digital, conceptual, or other kinds of things • For example, documents, web sites, graphics, or data sets Activities • Activities generate new entities or make use of existing entities • Activities could be actions or processes Agents • Agents takes a role in an activity and have the responsibility for the activity • For example, persons, pieces of software, or organizations Overview of PROV Key Concepts > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 6
  • 7. PROV Data Model > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 7 Activity Entity Agent wasGeneratedBy used wasDerivedFrom wasAttributedTo wasAssociatedWith
  • 8. Example Provenance > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 8 https://provenance.ecs.soton.ac.uk/store/documents/113794/
  • 9. Storing Provenance > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 9 Provenance Store Provenance recording during runtime Applications / Workflow Data (results)
  • 10. Some Storage Technologies • Relational databases and SQL • XML and Xpath • RDF and SPARQL • Graph databases and Gremlin/Cypher Services • REST APIs • ProvStore (University of Southampton) Storing and Retrieving Provenance > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 10
  • 11. ProvStore > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 11 https://provenance.ecs.soton.ac.uk/store/
  • 12. Provenance is a directed acyclic graph (DAG) Graphs > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 12 A B E F G D C
  • 13. Naturally, graph databases are a good technology for storing (Provenance) graphs Many graph databases are available • Neo4J • Titan • ArangoDB • ... Query languages • Cypher • Gremlin (TinkerPop) • GraphQL Graph Databases > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 13
  • 14. • Open-Source • Implemented in Java • Stores property graphs (key-value-based, directed) http://neo4j.com Neo4j > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 14
  • 15. Depends on your application (tools, languages, etc.) Libraries for Python • prov • provneo4j • NoWorkflow • … Tools • Git2PROV • Prov-sty for LaTeX • … Gathering Provenance > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 15
  • 16. Python Library prov https://github.com/trungdong/prov > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 16 from prov.model import ProvDocument # Create a new provenance document d1 = ProvDocument() # Entity: now:employment-article-v1.html e1 = d1.entity('now:employment-article-v1.html') # Agent: nowpeople:Bob d1.agent('nowpeople:Bob') # Attributing the article to the agent d1.wasAttributedTo(e1, 'nowpeople:Bob') d1.entity('govftp:oesm11st.zip', {'prov:label': 'employment-stats-2011', 'prov:type': 'void:Dataset'}) d1.wasDerivedFrom('now:employment-article-v1.html', 'govftp:oesm11st.zip') # Adding an activity d1.activity('is:writeArticle') d1.used('is:writeArticle', 'govftp:oesm11st.zip') d1.wasGeneratedBy('now:employment-article-v1.html', 'is:writeArticle')
  • 17. Python Library prov https://github.com/trungdong/prov > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 17
  • 18. Example > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 18 http://localhost:8888/notebooks/WeightCompanion PROV.ipynb
  • 19. provneo4j – Storing PROV Documents in Neo4j https://github.com/DLR-SC/provneo4j > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 19 import provneo4j.api provneo4j_api = provneo4j.api.Api( base_url="http://localhost:7474/db/data", username="neo4j", password="python") provneo4j_api.document.create(prov_doc, name=”MyProv”)
  • 20. provneo4j – Storing PROV Documents in Neo4j https://github.com/DLR-SC/provneo4j > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 20
  • 21. NoWorkflow – Provenance of Scripts https://github.com/gems-uff/noworkflow > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 21 Project experiment.py p12.dat p13.dat precipitation.py p14.dat out.png $ now run -e Tracker experiment.py
  • 22. Git2PROV http://git2prov.org > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 22
  • 23. > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 23
  • 24. LaTeX Annotations prov-sty for LaTeX https://github.com/prov-suite/prov-sty > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 24 begin{document} provAuthor{Andreas Schreiber}{http://orcid.org/0000-0001-5750-5649} provOrganization{German Aerospace Center (DLR)}{http://www.dlr.de} provTitle{A Provenance Model for Quantified Self Data} provProject {PROV-SPEC (FS12016)} {http://www.dlr.de/sc/desktopdefault.aspx/tabid-8073/} {http://www.bmwi.de/}
  • 25. Electronic Laboratory Notebook > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 25
  • 26. Query „Who worked on experiment X?“ > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 26 $experiment = g.key($_g, 'identifier', X) $user = $experiment/inE/inV/outE[@label = controlled_by]
  • 27. Practical Use Case – Archive all Data of a Paper > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 27 https://github.com/DLR-SC/DataFinder
  • 28. New PROV Library for Python in development • https://github.com/DLR-SC/prov-db-connector • Connectors for Neo4j implemented, ArangoDB planned • APIs in REST, ZeroMQ, MQTT Trusted Provenance • Storing Provenance in Blockchains Provenance for people • New approaches for visualization • For example, PROV Comics Current Research and Development > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 28
  • 29. > PyCon DE 2016 > Andreas Schreiber • Reproducible Science with Python > 29.10.2016DLR.de • Chart 29 Thank You! Questions? Andreas.Schreiber@dlr.de www.DLR.de/sc | @onyame