
Learning from 6,000 projects: mining specifications in the large

Models, abstract and simple descriptions of some artifact, are the backbone of all software engineering activities. While writing models is hard, existing code can serve as a source for abstract descriptions of how software behaves. To infer correct usage, however, code analysis needs usage examples: the more, the better.
We have built a lightweight parser that efficiently extracts API usage models from source code, models that can then be used to detect anomalies. Applied to the 200 million lines of code of the Gentoo Linux distribution, it extracted more than 15 million API constraints. On the website checkmycode.org, anyone can check their code against the “wisdom of Linux”.

Published in: Technology

  1. Learning from 6,000 Projects: Mining Models in the Large. Andreas Zeller, Saarland University
  2. Saarbrücken
  3. Saarbrücken
  4. Saarbrücken
  5. Saarbrücken
  6. Saarbrücken
  7. Saarbrücken
  8. Saarbrücken ® Visual Computing Institute
  9. Saarbrücken
  10. Some numbers
  11. Some numbers • ~70 PhD advisors in computer science
  12. Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science
  13. Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year
  14. Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year • ~60 new MSc graduates per year
  15. Some numbers • ~70 PhD advisors in computer science • ≥ 300 PhD students in computer science • ~60 new PhD graduates per year • ~60 new MSc graduates per year • 800–1400 € per month as a PhD stipend (+ laptop & office) • starting right after BSc • all courses in English
  16. Two Graduates: Michael Backes (TR35 in 2009), Andrej Rybalchenko (TR35 in 2010)
  17. Michael Backes Andrej Rybalchenko
  18. secure protocols Andrej Rybalchenko
  19. secure protocols loop termination
  20. secure protocols loop termination hard to verify
  21. secure protocols loop termination hard to verify
  22. information flow secure protocols loop termination hard to verify
  23. information flow liveness secure protocols loop termination hard to verify
  24. buffer overflow information flow liveness secure protocols loop termination hard to verify
  25. buffer overflow resource leaks information flow liveness secure protocols loop termination hard to verify
  26. buffer overflow resource leaks information flow liveness secure protocols loop termination easy to specify hard to verify
  27. hard to specify
  28. sorting hard to specify
  29. $\forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i+1]$; $|x| = |x'|$; $\forall i \in \{0, \dots, |x|\} : \exists i' \in \{0, \dots, |x'|\} : x[i] = x'[i']$; $\forall i' \in \{0, \dots, |x'|\} : \exists i \in \{0, \dots, |x|\} : x'[i'] = x[i]$ (hard to specify)
  30. $\forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i+1]$; $|x| = |x'|$; $\forall i \in \{0, \dots, |x|\} : \exists i' \in \{0, \dots, |x'|\} : x[i] = x'[i']$; $\forall i' \in \{0, \dots, |x'|\} : \exists i \in \{0, \dots, |x|\} : x'[i'] = x[i]$ (easy to verify, hard to specify)
  31. $\text{is-sorted}(x') \land \text{is-permutation}(x, x')$ (still hard to specify)
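The "easy to verify" side of the sorting slides can be sketched in a few lines of Python. This is only an illustration: the function names mirror the slide's is-sorted and is-permutation predicates, but the non-strict comparison (`<=`) and the multiset comparison via `Counter` are assumptions that go slightly beyond the formulas shown, which use `<` and do not count duplicates.

```python
from collections import Counter


def is_sorted(xs):
    """True if every element is <= its successor (assumed non-strict order)."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs, ys):
    """True if ys holds exactly the elements of xs, counting duplicates."""
    return Counter(xs) == Counter(ys)


def satisfies_sort_spec(x, x_prime):
    """Check the postcondition is-sorted(x') and is-permutation(x, x')."""
    return is_sorted(x_prime) and is_permutation(x, x_prime)
```

Checking a given output against the specification takes a few lines; writing the specification down in the first place is the part the slides call hard.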
  32. microsoft word
  33. microsoft word travel booking
  34. microsoft word travel booking airplane control
  35. microsoft word mobile phones travel booking airplane control
  36. microsoft word mobile phones travel booking operating systems airplane control
  37. microsoft word mobile phones travel booking operating systems airplane control banking systems
  38. microsoft word mobile phones travel booking operating systems airplane control banking systems hard to specify
  39. microsoft word mobile phones travel booking operating systems airplane control banking systems easy to verify hard to specify
  40. hard to specify
  41. hard to specify new language • duplicate effort • can’t abstract from details
  42. specification crisis
  43. mine specifications
  44. mine specifications
  45. mine specifications from 6,000 projects
  46. Specifications: pre- and postconditions. $\forall i \in \{0, \dots, |x'|\} : x'[i] < x'[i+1]$; $|x| = |x'|$; $\forall i \in \{0, \dots, |x|\} : \exists i' \in \{0, \dots, |x'|\} : x[i] = x'[i']$; $\forall i' \in \{0, \dots, |x'|\} : \exists i \in \{0, \dots, |x|\} : x'[i'] = x[i]$
  47. Specifications: finite state models. States: (socket: null, state: NOT_CON), (socket: ¬null, state: PLAIN), (socket: ¬null, state: AUTH). Transition labels: <init>(), openPort(), auth(), auth()!, quit()
  48. OP-Miner
  49. OP-Miner Program
  50. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next ()
  51. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next
  52. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next; Patterns: hasNext ≺ next, hasNext ≺ hasNext
  53. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next; Patterns: hasNext ≺ next, hasNext ≺ hasNext; Anomalies: ✓ hasNext ≺ next, hasNext ≺ hasNext; ✗ hasNext ≺ hasNext
  54. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next; Patterns: hasNext ≺ next, hasNext ≺ hasNext; Anomalies: ✓ hasNext ≺ next, hasNext ≺ hasNext; ✗ hasNext ≺ hasNext
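The step from a usage model to temporal properties can be sketched as follows. This is a minimal illustration, not OP-Miner's implementation: the model is assumed to be a list of `(source_state, call, target_state)` transitions, and `a ≺ b` is assumed to mean that some path through the model performs call `a` and later call `b`. The two-state `ITERATOR_MODEL` is a hypothetical encoding of the `hasNext`/`next` loop from the slides.

```python
def reachable(start, transitions):
    """Return all states reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for src, _call, dst in transitions:
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def temporal_properties(transitions):
    """Return all pairs (a, b) such that call a can be followed by call b."""
    props = set()
    for _src, first, dst in transitions:
        later_states = reachable(dst, transitions)
        for src2, second, _dst2 in transitions:
            if src2 in later_states:
                props.add((first, second))
    return props


# Hypothetical model of a loop: state 0 --hasNext--> state 1 --next--> state 0.
ITERATOR_MODEL = [(0, "hasNext", 1), (1, "next", 0)]
```

For this model, `temporal_properties(ITERATOR_MODEL)` yields the four pairs listed on the slide: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, and next ≺ next.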
  55. public Stack createStack () {
          Random r = new Random ();
          int n = r.nextInt ();
          Stack s = new Stack ();
          int i = 0;
          while (i < n) {
              s.push (rand (r));
              i++;
          }
          s.push (-1);
          return s;
      }
  56. (same code as slide 55)
  57. Code of slide 55, with the statements extracted so far: Random r = new Random ();
  58. Code of slide 55, with the statements extracted so far: Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0;
  59. Code of slide 55, with the statements extracted so far: Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n; i++; s.push (rand (r));
  60. Code of slide 55, with the statements extracted so far: Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n; i < n; i++; s.push (-1); s.push (rand (r));
  61. Code of slide 55, with the statements extracted so far: Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n; i < n; i++; s.push (-1); s.push (rand (r));
  62. Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n i < n i++; s.push (-1); s.push (rand (r));
  63. Stack s = new Stack (); s.push (-1); s.push (rand (r));
  64. s.<init>() s.push (_) s.push (_)
  65. Random r = new Random (); int n = r.nextInt (); Stack s = new Stack (); int i = 0; i < n i < n i++; s.push (-1); s.push (rand (r));
  66. Random r = new Random (); int n = r.nextInt (); s.push (rand (r));
  67. r.<init> () r.nextInt () Utils.rand (r)
  68. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next; Patterns: hasNext ≺ next, hasNext ≺ hasNext; Anomalies: ✓ hasNext ≺ next, hasNext ≺ hasNext; ✗ hasNext ≺ hasNext
  69. OP-Miner. Program; Usage Models: iter.hasNext (), iter.next (); Temporal Properties: hasNext ≺ next, hasNext ≺ hasNext, next ≺ hasNext, next ≺ next; Patterns: hasNext ≺ next, hasNext ≺ hasNext; Anomalies: ✓ hasNext ≺ next, hasNext ≺ hasNext; ✗ hasNext ≺ hasNext
  70. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods:
  71. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get()
  72. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open()
  73. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello()
  74. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  75. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  76. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  77. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Highlighted: Pattern
  78. Methods vs. Properties. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Highlighted: Pattern, Support
  79. Discovering Anomalies. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse()
  80. Discovering Anomalies. Temporal Properties: start ≺ stop, lock ≺ unlock, eof ≺ close. Methods: get(), open(), hello(), parse(). Highlighted: Anomaly (✘)
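The idea of patterns, support, and anomalies in a methods-versus-properties table can be sketched as below. This is a deliberately simplified stand-in, not the miner's actual algorithm: it only considers rules of the form "methods with property P usually also have property Q". The table contents, the `find_anomalies` name, and both thresholds are hypothetical; the slides do not show which cells of the matrix are filled.

```python
from itertools import permutations


def find_anomalies(methods, min_support=3, min_confidence=0.75):
    """Report methods that have `premise` but lack a commonly paired `conclusion`.

    `methods` maps a method name to the set of temporal properties it exhibits.
    Returns tuples (method, premise, conclusion, support).
    """
    all_props = set().union(*methods.values())
    anomalies = []
    for premise, conclusion in permutations(sorted(all_props), 2):
        having_premise = [m for m, props in methods.items() if premise in props]
        support = [m for m in having_premise if conclusion in methods[m]]
        violators = [m for m in having_premise if conclusion not in methods[m]]
        if len(support) >= min_support and violators:
            confidence = len(support) / len(having_premise)
            if confidence >= min_confidence:
                for method in violators:
                    anomalies.append((method, premise, conclusion, len(support)))
    return anomalies


# Hypothetical table: three methods share a pattern, get() deviates from it.
EXAMPLE_METHODS = {
    "open": {"start < stop", "lock < unlock"},
    "hello": {"start < stop", "lock < unlock"},
    "parse": {"start < stop", "lock < unlock", "eof < close"},
    "get": {"start < stop"},
}
```

With this made-up table, `find_anomalies(EXAMPLE_METHODS)` flags `get` as lacking `lock < unlock`, a property that the three other methods with `start < stop` all have.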
  81. AspectJ
  82. for (Iterator iter = itdFields.iterator(); iter.hasNext();) {
          ...
          for (Iterator iter2 = worthRetrying.iterator(); iter.hasNext();) {
              ...
          }
      }
  83. Code of slide 82, annotated: in the inner loop condition, iter should be iter2
  84. public void visitNEWARRAY (NEWARRAY o) {
          byte t = o.getTypecode ();
          if (!((t == Constants.T_BOOLEAN) ||
                (t == Constants.T_CHAR) ||
                ...
                (t == Constants.T_LONG))) {
              constraintViolated (o, "(...) '+t+' (...)");
          }
      }
  85. Code of slide 84, annotated: the single quotes in "(...) '+t+' (...)" should be double quotes
  86. Name internalNewName (String[] identifiers)
          ...
          for (int i = 1; i < count; i++) {
              SimpleName name = new SimpleName(this);
              name.internalSetIdentifier(identifiers[i]);
              ...
          }
          ...
      }
  87. Code of slide 86, annotated: should stay as is
  88. public String getRetentionPolicy () {
          ...
          for (Iterator it = ...; it.hasNext();) {
              ... = it.next();
              ...
              return retentionPolicy;
          }
          ...
      }
  89. Code of slide 88, annotated: should be fixed
  90. 44% of violations are defects or code smells
  91. mine specifications
  92. mine specifications across thousands of projects
  93. Wisdom of the crowds. Francis Galton. (Caption, translated from German: "No, not on the left either")
  94. Wisdom of the crowds. Francis Galton. (Caption, translated from German: "No, not on the left either")
  95. lightweight parsing
  96. Target Languages Java C++ C PHP Javascript
  97. Target Languages Java C++ C PHP Javascript Similar syntax {...} ; foo()
  98. Target Languages Java C++ C PHP Javascript Similar syntax {...} ; foo() Similar keywords while if switch return
  99. Lightweight Parser. Stages: Source Code, Abstract Representation, Temporal Properties
  100. Lightweight Parser. Stages: Source Code, Abstract Representation, Temporal Properties (language-independent lightweight parsing)
  101. Stages: Source Code, Abstract Representation, Temporal Properties
  102. Source Code:
          int j;
          int fA;
          int fB = open("newFile");
          fA = open("myFile");
          j = 7;
          while (j > 3) {
              read(fA);
              write(fB, "Hello");
              j--;
          }
          close(fA);
          close(fB);
  103. Source Code of slide 102 next to its Abstract Representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB)
  104. Abstract Representation: fB: open(CONST); fA: open(CONST); Loop: read(fA), write(fB, CONST); close(fA); close(fB)
  105. Abstract Representation of slide 104, with the calls on fA: fA: open(CONST); read(fA); close(fA)
  106. Abstract Representation of slide 104, with the calls on fA: fA: open(CONST); read(fA); close(fA); and the calls on fB: fB: open(CONST); write(fB, CONST); close(fB)
  107. Calls on fA and fB as on slide 106. Temporal Properties for fA: open() < read()
  108. Calls on fA and fB as on slide 106. Temporal Properties for fA: open() < read(), open() < close()
  109. Calls on fA and fB as on slide 106. Temporal Properties for fA: open() < read(), open() < close(), read() < read()
  110. Calls on fA and fB as on slide 106. Temporal Properties for fA: open() < read(), open() < close(), read() < read(), read() < close()
  111. Calls on fA and fB as on slide 106. Temporal Properties for fA: open() < read(), open() < close(), read() < read(), read() < close(); for fB: open() < write(), open() < close(), write() < write(), write() < close()
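How "f < g" properties might be derived from the abstract representation can be sketched as follows. The tuple encoding, the function names, and the choice to unroll each loop body twice (so that repeated calls such as read() < read() become visible) are assumptions made for this illustration; the slides do not state how the actual parser treats loops.

```python
def unroll(items):
    """Flatten the representation into call events, unrolling each loop twice."""
    events = []
    for item in items:
        if item[0] == "loop":
            body = unroll(item[1])
            events.extend(body + body)
        else:
            events.append(item)
    return events


def properties_for(var, items):
    """Return all pairs (f, g) where f is called on `var` before g."""
    calls = [name for _kind, name, operands in unroll(items) if var in operands]
    return {(f, g) for i, f in enumerate(calls) for g in calls[i + 1:]}


# Hypothetical encoding of the abstract representation from the slides.
# Operands list the variables and constants a call involves, including
# the variable that receives the result of open().
REPRESENTATION = [
    ("call", "open", ("fB", "CONST")),
    ("call", "open", ("fA", "CONST")),
    ("loop", [
        ("call", "read", ("fA",)),
        ("call", "write", ("fB", "CONST")),
    ]),
    ("call", "close", ("fA",)),
    ("call", "close", ("fB",)),
]
```

Under these assumptions, `properties_for("fA", REPRESENTATION)` gives open < read, open < close, read < read, and read < close, and `properties_for("fB", REPRESENTATION)` gives the corresponding four properties for write, matching the lists on the slides.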
  112. thousands of projects
  113. [Bar chart: number of C projects, axis from 0 to 8,000]
  114. [Bar chart: 6,097 C projects, axis from 0 to 8,000]
  115. [Bar charts: lines of code, axis from 0 to 200,000,000; 6,097 C projects, axis from 0 to 8,000]
  116. [Bar charts: 201,321,237 lines of code, axis from 0 to 200,000,000; 6,097 C projects, axis from 0 to 8,000]
  117. 6,097 C projects
  118. 201,321,237 lines of code
  119. 5,985,193 functions
  120. 15,803,766 properties (“f < g”)
  121. 6 GB database
  122. 18 hours analysis time single core
  123. 11 million lines of code per hour
  124. 11 seconds per project
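The throughput figures follow from the totals on the preceding slides. The short calculation below only uses numbers stated in the deck; the constant and function names are illustrative, and the properties-per-function ratio is a derived figure that does not appear on the slides.

```python
LINES_OF_CODE = 201_321_237
PROJECTS = 6_097
FUNCTIONS = 5_985_193
PROPERTIES = 15_803_766
ANALYSIS_HOURS = 18


def lines_per_hour():
    """Average number of lines analyzed per hour on a single core."""
    return LINES_OF_CODE / ANALYSIS_HOURS


def seconds_per_project():
    """Average analysis time per project, in seconds."""
    return ANALYSIS_HOURS * 3600 / PROJECTS


def properties_per_function():
    """Average number of mined 'f < g' properties per function."""
    return PROPERTIES / FUNCTIONS


if __name__ == "__main__":
    print(f"{lines_per_hour() / 1e6:.1f} million lines per hour")
    print(f"{seconds_per_project():.1f} seconds per project")
    print(f"{properties_per_function():.1f} properties per function")
```

The results are about 11.2 million lines per hour and about 10.6 seconds per project, which the slides round to 11 in both cases.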
  125. static int dcc_listen_init (…) {
           dcc->sok = socket(…);
           if (…) {
               while (…) {
                   … = bind (dcc->sok, …);
               }
               /* with a small port range, reUseAddr is needed */
               setsockopt (dcc->sok, …, SO_REUSEADDR, …);
           }
           listen (dcc->sok, …);
       }
  126. Code of slide 125, annotated: setsockopt() should be called before bind()
  127. static int find_file (…) {
           DIR *dirp;
           struct dirent *dirinfo;
           …
           dirp = opendir(".");
           if (dirp == NULL) { … }
           while ((dirinfo = readdir(dirp)) != NULL) { … }
           rewinddir(dirp);
           return 1;
       }
  128. Code of slide 127, annotated at rewinddir(dirp): should call closedir() instead
  129. Platform
  130. Check my Code • Check your code against the wisdom of Linux • Builds on millions of mined specifications • Detects problems no other tool can detect • www.checkmycode.org
  131. Check my Code • Check your code against the wisdom of Linux • Builds on millions of mined specifications • Detects problems no other tool can detect • www.checkmycode.org (Database available for download)
  132. specification crisis
  133. specification crisis
  134. microsoft word mobile phones travel booking operating systems airplane control banking systems
  135. microsoft word mobile phones travel booking operating systems airplane control banking systems easy to mine
  136. Challenges
  137. Challenges • Mining complete specifications
  138. Challenges • Mining complete specifications • Finding relevant abstractions
  139. Challenges • Mining complete specifications • Finding relevant abstractions • Producing readable specifications
  140. Challenges • Mining complete specifications • Finding relevant abstractions • Producing readable specifications • Integrating specification mining and programming
  141. Andrzej Wasylkowski Christian Lindig Natalie Gruska
  142. Summary
  143. Summary
  144. Summary
  145. Summary
  146. Summary
  147. Summary
