SlideShare a Scribd company logo
1 of 22
Download to read offline
Mining Python
Software
Sarah Mount - @snim2
What do you want to know today?
What do we know about software?
● How to make it correct
● How long it will take to write
● Expected bugs per kloc
Er … yeah.
Health warning...
This is a work in progress,
don’t take the numbers and
charts too seriously just yet...
Options for mining Python software
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>success</status>
<result>
<project>
<id>1</id>
<name>Subversion</name>
<created_at>2006-10-10T15:51:31Z</created_at>
<updated_at>2007-08-22T17:31:17Z</updated_at>
<homepage_url>http://subversion.tigris.org/</homepage_url>
<download_url>http://subversion.tigris.org/...
</download_url>
<updated_at>2007-07-12T12:21:11Z</updated_at>
<logged_at>2007-07-12T12:18:54Z</logged_at>
<min_month>2001-08-01T00:00:00Z</min_month>
<max_month>2007-07-01T00:00:00Z</max_month>
...
{
"repository":{
"url":"https://github.com/igrigorik/spdy",
"has_downloads":false,
"created_at":"2012/01/19 14:15:34 -0800",
"has_issues":true,
"description":"SPDY is an experiment with protocols for the web",
"forks":10,
"fork":false,
"has_wiki":false,
"homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
"size":420,
"private":false,
"name":"spdy",
"owner":"igrigorik",
"open_issues":4,
"watchers":206,
"pushed_at":"2012/01/11 10:38:16 -0700",
"language":"Ruby"
},
"created_at":"2012/02/11 10:38:16 -0700",
"public":true,
"actor":"igrigorik",
"payload":{
"head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
"size":1,
...
],
"ref":"refs/heads/master"
},
"url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
"type":"PushEvent"
}
Google bigquery interface
/* top 100 repos for Ruby by number of pushes */
SELECT repository_name, count(repository_name) as pushes, repository_description,
repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
AND repository_language="Ruby"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
Some preliminary work
Code clones
Type 1: Identical code, copy & pasted
Type 2: Identical code modulo names, layout,
comments, etc.
Type 3: Type 2 plus further modifications such
as changes in statements
Type 4: Different code, same semantics
Roy & Cordy (2007)
Sentiment (in comments)
Some ideas for mining projects
Mining ideas
● How do programming idioms develop and
spread?
● How do projects reach a critical mass of
developers and become “popular”?
● Are metrics like cyclomatic complexity, fan
out and Halstead’s complexity measure
useful, or are they all just proportional to
kLOCs?
Thank you.

More Related Content

Similar to Mining python-software-pyconuk13

Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesLessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesVMware Tanzu
 
Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Abhishek Mishra
 
Exadata cell update
Exadata cell updateExadata cell update
Exadata cell updatepat2001
 
Changing Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentChanging Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentMatt Graham
 
Web TCard - Speed optimization
Web TCard - Speed optimizationWeb TCard - Speed optimization
Web TCard - Speed optimizationEric Guo
 
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizAdvanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizSuthep Sangvirotjanaphat
 
Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Scott Keck-Warren
 
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxCisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxclarebernice
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
 
Is your Magento fast enough?
Is your Magento fast enough?Is your Magento fast enough?
Is your Magento fast enough?Giannis Economou
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekrantav
 
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаМы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаNikita Prokopov
 
Continuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployContinuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployPeter Gfader
 
Continuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnContinuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnAndreas Grabner
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfOracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfAlex446314
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsMike Brittain
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)DevOps Indonesia
 

Similar to Mining python-software-pyconuk13 (20)

Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesLessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
 
Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010
 
Exadata cell update
Exadata cell updateExadata cell update
Exadata cell update
 
Changing Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentChanging Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous Deployment
 
Web TCard - Speed optimization
Web TCard - Speed optimizationWeb TCard - Speed optimization
Web TCard - Speed optimization
 
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizAdvanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
 
Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023
 
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxCisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
 
Shell_Rec
Shell_RecShell_Rec
Shell_Rec
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Is your Magento fast enough?
Is your Magento fast enough?Is your Magento fast enough?
Is your Magento fast enough?
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk week
 
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаМы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
 
Continuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployContinuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeploy
 
Continuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnContinuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptn
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfOracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdf
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)
 

Recently uploaded

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Mining python-software-pyconuk13