PyDriller: Python Framework for Mining Software Repositories
Davide Spadini, Mauricio Aniche, Alberto Bacchelli
ishepard @DavideSpadini
What?
A framework to analyse Git (and soon Mercurial) repositories
Why?
• There are already many frameworks for Git
• Generally, one for each programming language
• Java -> JGit
• Python -> GitPython
• JavaScript -> nodegit
• etc.
So, why?
How many commands does Git have?
• > 20?
• > 50?
• > 100?
• > 150?
154!!
PyDriller
• Aim: to ease the extraction of information from Git repositories
• What is supported:
• analysing the history of a project
• retrieving commit information (date, message, authors, etc.)
• retrieving file information (diff, source code)
• What is not supported:
• writing to the repository (git pull, git push, git add, git commit, etc.)
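To make "retrieving commit information" concrete, here is a minimal, stdlib-only sketch of the kind of plumbing a framework like PyDriller hides from you: ask `git log` for a machine-readable format, then parse each record. This is not PyDriller's implementation or API; the names `CommitInfo`, `parse_git_log` and `read_commits` are hypothetical, and `read_commits` assumes `git` is on the PATH.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List

# git's %xNN placeholders emit raw bytes: 0x1f (unit separator) between fields,
# 0x1e (record separator) after each commit. Both rarely occur in commit messages.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
# %H = full hash, %an/%ae = author name/email,
# %aI = author date (strict ISO 8601), %B = raw commit message
GIT_LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%B%x1e"


@dataclass
class CommitInfo:
    sha: str
    author_name: str
    author_email: str
    author_date: datetime
    message: str


def parse_git_log(output: str) -> Iterator[CommitInfo]:
    """Lazily yield one CommitInfo per record of `git log` output in GIT_LOG_FORMAT."""
    for record in output.split(RECORD_SEP):
        record = record.strip("\n")  # git puts a newline between entries
        if not record:
            continue
        sha, name, email, date, message = record.split(FIELD_SEP, 4)
        yield CommitInfo(sha, name, email, datetime.fromisoformat(date), message.strip())


def read_commits(repo_path: str) -> List[CommitInfo]:
    """Run `git log` in repo_path (requires git on PATH) and parse its output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{GIT_LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return list(parse_git_log(result.stdout))
```

The parser is a generator, so records are only turned into objects as they are iterated; `read_commits` wraps it in `list()` purely for convenience.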
Demo
Statistics
• Everything is lazily evaluated, so you only “pay” for what you use:
1. Only commit information: immediate (as fast as git log)
2. Commit and file information: ~60 commits/s (1240 commits in 22 s)
3. Commit, file and metrics information: ~4 commits/s (1240 commits in ~5 min)
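The "pay for what you use" idea can be illustrated with Python's `functools.cached_property` (Python 3.8+): cheap metadata is stored up front, while expensive file-level data is computed on first access and then cached. This is a sketch of the general lazy-evaluation technique, not PyDriller's actual code; `LazyCommit` and `expensive_loader` are hypothetical stand-ins.

```python
from functools import cached_property
from typing import Callable, List


class LazyCommit:
    """Hypothetical commit object: cheap metadata is stored up front,
    expensive file-level data is computed only when first accessed."""

    def __init__(self, sha: str, message: str,
                 load_files: Callable[[str], List[str]]) -> None:
        self.sha = sha                  # cheap: already known from the log
        self.message = message          # cheap: already known from the log
        self._load_files = load_files   # expensive: e.g. computing a diff

    @cached_property
    def modified_files(self) -> List[str]:
        # Runs the loader at most once per commit, and only if someone asks.
        return self._load_files(self.sha)


# Records every loader call so the laziness can be observed.
loader_calls: List[str] = []


def expensive_loader(sha: str) -> List[str]:
    """Stand-in for a costly diff computation (hypothetical)."""
    loader_calls.append(sha)
    return [f"file_changed_in_{sha}.py"]


commits = [LazyCommit(f"c{i}", f"message {i}", expensive_loader) for i in range(3)]
messages = [c.message for c in commits]   # metadata only: no loader call
files = commits[1].modified_files         # first access: loader runs for "c1"
files_again = commits[1].modified_files   # cached: loader is not called again
```

The quoted figures are consistent with each other: 1240 commits / 22 s ≈ 56 commits/s, and 1240 commits at 4 commits/s ≈ 310 s, a little over five minutes.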
Thank you for your support!
• Some numbers:
1. Downloaded approximately 4000 times
2. 100 times in the last 2 weeks alone
• Community-driven
• University of Zurich, TU Delft and University of Catania teach PyDriller in their MSR (Mining Software Repositories) courses
• SIG uses PyDriller in their quality assessments
What’s next?
• A company asked me to implement RepositoryMining().traverse_files()
• Mercurial support
• Ideas? Talk to me or submit a PR :)
PyDriller
• Source code: https://github.com/ishepard/pydriller
• Doc: https://pydriller.readthedocs.io/en/latest/
• Feel free to leave a star! :)
