Mining the Modern Code Review Repositories:
A Dataset of People, Process and Product
Xin Yang Raula G. Kula Norihiro Yoshida Hajimu Iida
May 14–15, 2016. Austin, Texas
MSR 2016 data showcase
Osaka University
Japan
Nagoya University
Japan
NAIST
Japan
NAIST
Japan
An Overview of the Code Review Dataset
● Code Review
● Source Code
● Human / Social
Why we made this dataset?
Our previous work (Hamasaki et al., MSR '13)*: our JSON-based dataset
Some feedback:
“Hard to query...”
“Hard to convert...”
“Unable to access the source code...”
Script
*Hamasaki et al., “Who does what during a code review? Datasets of OSS peer review repositories”. MSR '13
Typical Modern Code Review Process
You can mine from three different aspects
● Process
● Product
● People
Dataset Statistics (updated to May 2015)
               OpenStack  LibreOffice  AOSP     Qt       Eclipse
Time           4 years    3 years      7 years  4 years  3 years
Repositories   611        20           567      111      189
Patches        173,749    13,597       63,610   110,172  9,168
Participants   5,091      437          3,334    1,437    759
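As a rough illustration of how these statistics can be compared across projects, the sketch below derives the number of patches per participant. The figures are copied from the slide; the assignment of each column to a project is an assumption based on the order in which the editor's notes list the projects (OpenStack, LibreOffice, AOSP, Qt, Eclipse), and the `STATS` structure and function name are hypothetical.

```python
# Figures copied from the statistics slide. Mapping each column to a project
# is an assumption based on the project order given in the editor's notes.
STATS = {
    "OpenStack":   {"repositories": 611, "patches": 173_749, "participants": 5_091},
    "LibreOffice": {"repositories": 20,  "patches": 13_597,  "participants": 437},
    "AOSP":        {"repositories": 567, "patches": 63_610,  "participants": 3_334},
    "Qt":          {"repositories": 111, "patches": 110_172, "participants": 1_437},
    "Eclipse":     {"repositories": 189, "patches": 9_168,   "participants": 759},
}


def patches_per_participant(stats):
    """Return the average number of patches per participant for each project."""
    return {
        project: row["patches"] / row["participants"]
        for project, row in stats.items()
    }


for project, ratio in patches_per_participant(STATS).items():
    print(f"{project}: {ratio:.1f} patches per participant")
```

Such a ratio is only a coarse indicator, since participants include both authors and reviewers.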
Download the Dataset
goo.gl/Wi4UoJ


Editor's Notes

  1. The dataset contains code review data from 5 successful OSS projects, source code from Git, and human and social information (anonymized usernames and email addresses).
  2. Why we made this dataset: our previous work at MSR 2013 provided a dataset in JSON format and a refined dataset in CSV format. Over the past 3 years we have received a lot of feedback from users of the dataset. Some users complained that: ……. We therefore improved the dataset by converting the JSON to a MySQL database and providing shell scripts to access the source code...
  3. This is a typical MCR process: authors create and update their patches (changes); reviewers perform code reviews on the changes and send feedback to the authors; Continuous Integration (CI) tools build and test the changes. After several rounds of revision, the changes pass review and are integrated into the code repositories.
  4. Our dataset tries to capture data from three different aspects of the code review process. First, how developers, reviewers and CI tools collaborate (see People). Second, the life cycle of a change from initial commit to final decision (see Process). Finally, the product of the code review (see Product).
  5. Some basic statistics about our dataset. We retrieved data from 5 large-scale, successful OSS projects: OpenStack, LibreOffice, AOSP, Qt and Eclipse. Time: how long the project has used Gerrit code review (since it adopted Gerrit). Repositories: how many repositories are involved. Patches: how many changes have been created. Participants: how many people have participated.
  6. You can download our dataset here and now!
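
The move from nested JSON to a relational database mentioned in the notes can be sketched roughly as follows. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the record fields, table names and column names are hypothetical, not the schema of the published dataset.

```python
import json
import sqlite3

# Hypothetical review records; field names are illustrative and are NOT
# taken from the published dataset.
RAW_JSON = """
[
  {"change_id": 1, "project": "demo", "owner": "dev_a", "status": "MERGED",
   "comments": [{"author": "rev_x", "score": 1},
                {"author": "ci_bot", "score": 1}]},
  {"change_id": 2, "project": "demo", "owner": "dev_b", "status": "ABANDONED",
   "comments": [{"author": "rev_x", "score": -1}]}
]
"""


def load_reviews(conn, raw_json):
    """Flatten nested JSON review records into two relational tables."""
    conn.execute(
        "CREATE TABLE changes ("
        "change_id INTEGER PRIMARY KEY, project TEXT, owner TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE comments (change_id INTEGER, author TEXT, score INTEGER)"
    )
    for change in json.loads(raw_json):
        conn.execute(
            "INSERT INTO changes VALUES (?, ?, ?, ?)",
            (change["change_id"], change["project"],
             change["owner"], change["status"]),
        )
        conn.executemany(
            "INSERT INTO comments VALUES (?, ?, ?)",
            [(change["change_id"], c["author"], c["score"])
             for c in change["comments"]],
        )
    conn.commit()


def comments_per_reviewer(conn):
    """Count review comments per author with a single SQL query."""
    rows = conn.execute(
        "SELECT author, COUNT(*) FROM comments GROUP BY author ORDER BY author"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
load_reviews(conn, RAW_JSON)
print(comments_per_reviewer(conn))
```

Once the records are in tables, a question such as "how many comments did each reviewer write?" becomes a single `GROUP BY` query instead of a custom traversal of nested JSON, which is the kind of "hard to query" feedback the notes describe.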