Models—abstract and simple descriptions of some artifact—are the backbone of all software engineering activities. While writing models is hard, existing code can serve as a source for abstract descriptions of how software behaves. To infer correct usage, code analysis needs usage examples, though; the more, the better.
We have built a lightweight parser that efficiently extracts API usage models from source code; these models can then be used to detect anomalies. Applied to the 200 million lines of code of the Gentoo Linux distribution, it yields more than 15 million API constraints. On the web site checkmycode.org, anyone can check their code against the “wisdom of Linux”.
18. Some numbers
• ~70 PhD advisors in computer science
• ≥ 300 PhD students in computer science
• ~60 new PhD graduates per year
• ~60 new MSc graduates per year
• 800–1400 € per month as a PhD stipend
(+ laptop & office • starting right after BSc • all courses in English)
58. OP-Miner
Program: iter.hasNext () iter.next ()
Usage Models
Temporal Properties:
hasNext ≺ next
hasNext ≺ hasNext
next ≺ hasNext
next ≺ next
Patterns:
hasNext ≺ next
hasNext ≺ hasNext
Anomalies:
✓ hasNext ≺ next, hasNext ≺ hasNext
✗ hasNext ≺ next, hasNext ≺ hasNext
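The step from usage models to temporal properties can be sketched as follows. This is only an illustration, not OP-Miner's implementation: it assumes a usage model has been flattened to one linear sequence of calls on a single object (the real models are graphs), and the class and method names are hypothetical. The relation a ≺ b is taken to mean that a call to a can be followed, directly or later, by a call to b.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TemporalProperties {
    // Hypothetical helper: collects every pair "a precedes b" such that
    // a occurs somewhere before b in the given call sequence.
    static Set<String> precedes(List<String> calls) {
        Set<String> properties = new LinkedHashSet<>();
        for (int i = 0; i < calls.size(); i++) {
            for (int j = i + 1; j < calls.size(); j++) {
                // "\u227A" is the "precedes" sign used on the slide.
                properties.add(calls.get(i) + " \u227A " + calls.get(j));
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Assumed input: two iterations of "while (iter.hasNext()) iter.next();".
        List<String> calls = List.of("hasNext", "next", "hasNext", "next");
        for (String property : precedes(calls)) {
            System.out.println(property);
        }
    }
}
```

With two loop iterations as input, the sketch produces the same four properties that the slide lists for the iterator example.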
60.
public Stack createStack () {
    Random r = new Random ();
    int n = r.nextInt ();
    Stack s = new Stack ();
    int i = 0;
    while (i < n) {
        s.push (rand (r));
        i++;
    }
    s.push (-1);
    return s;
}
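67.
[Figure: createStack () as a graph with these transition labels]
Random r = new Random ();
int n = r.nextInt ();
Stack s = new Stack ();
int i = 0;
i < n, then s.push (rand (r)); and i++; (loop)
i < n, then s.push (-1); (exit)
68.
[Figure: the same graph, reduced to the transitions that involve s]
Stack s = new Stack ();
s.push (rand (r));
s.push (-1);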
88.
for (Iterator iter = itdFields.iterator();
     iter.hasNext();) {
    ...
    for (Iterator iter2 = worthRetrying.iterator();
         iter.hasNext();) {    // should be iter2
        ...
    }
}
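For comparison, here is a minimal sketch of the corrected loop structure, in which the inner loop tests its own iterator. The element type, the loop bodies, and the `countPairs` wrapper are assumptions made only to obtain something self-contained; the original loop bodies are elided on the slide.

```java
import java.util.Iterator;
import java.util.List;

class NestedIterators {
    // Hypothetical wrapper: counts how often the inner loop body runs.
    static int countPairs(List<String> itdFields, List<String> worthRetrying) {
        int pairs = 0;
        for (Iterator<String> iter = itdFields.iterator(); iter.hasNext();) {
            iter.next();
            // The inner loop condition now queries iter2, not iter.
            for (Iterator<String> iter2 = worthRetrying.iterator(); iter2.hasNext();) {
                iter2.next();
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(List.of("a", "b"), List.of("x", "y", "z")));
    }
}
```

With the original condition, the inner loop would be governed by the state of the outer iterator, so it could call `iter2.next()` on an exhausted iterator or never terminate normally.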
90.
public void visitNEWARRAY (NEWARRAY o) {
    byte t = o.getTypecode ();
    if (!((t == Constants.T_BOOLEAN) ||
          (t == Constants.T_CHAR) ||
          ...
          (t == Constants.T_LONG))) {
        constraintViolated (o, "(...) '+t+' (...)");    // should be double quotes
    }
}
92.
Name internalNewName (String[] identifiers)
{
    ...
    for (int i = 1; i < count; i++) {
        SimpleName name = new SimpleName(this);
        name.internalSetIdentifier(identifiers[i]);
        ...
    }    // should stay as is
    ...
}
94.
public String getRetentionPolicy ()
{
    ...
    for (Iterator it = ...; it.hasNext();)
    {
        ... = it.next();
        ...
        return retentionPolicy;
    }    // should be fixed
    ...
}
135.
static int dcc_listen_init (…) {
    dcc->sok = socket(…);
    if (…) {
        while (…) {
            … = bind (dcc->sok, …);
        }
        /* with a small port range, reUseAddr is needed */
        setsockopt (dcc->sok, …, SO_REUSEADDR, …);    /* should be called before bind() */
    }
    listen (dcc->sok, …);
}
137.
static int find_file (…)
{
    DIR *dirp;
    struct dirent *dirinfo;
    …
    dirp = opendir(".");
    if (dirp == NULL)
    {
        …
    }
    while ((dirinfo = readdir(dirp)) != NULL)
    {
        …
    }
    rewinddir(dirp);    /* should call closedir() instead */
    return 1;
}
143. Check my Code
• Check your code against the wisdom of Linux
• Builds on millions of mined specifications
• Detects problems no other tool can detect
• Database available for download
www.checkmycode.org
You talk to these people, and you immediately realize they’re smart. They’re really smart: Michael got an MSc in maths and CS at the age of 21, got his PhD at 24, and became a professor at the age of 27. Today, he’s the best-paid professor in Germany.
They chose to do these other things, not because they are easy, but because they are hard. Hard to verify, this is. Many things are of that kind. However… notice that all these problems can be stated in very simple terms.
What do I mean by “easy to specify”? Here’s something that’s hard to verify: sorting.
Tell story of first NORA talk
\begin{align*}
\forall i \in \{0, \dots, |x'|\} &: x'[i] < x'[i + 1] \\
|x| = |x'| \\
\forall i \in \{0, \dots, |x|\} &: \iota i' \in \{0, \dots, |x'|\} : x[i] = x'[i'] \\
\forall i' \in \{0, \dots, |x'|\} &: \iota i \in \{0, \dots, |x|\} : x'[i'] = x[i]
\end{align*}
We can introduce a vocabulary, and do things incrementally, but the burden remains.
\text{is-sorted}(x') \land \text{is-permutation}(x, x')
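To make the two-predicate vocabulary concrete, here is a minimal executable sketch of it. The class and method names are hypothetical. Two assumptions differ from the formulas in the notes: adjacent elements are compared with ≤ rather than <, so that duplicates are allowed, and the permutation check counts multiplicities instead of searching for a unique matching index.

```java
import java.util.HashMap;
import java.util.Map;

class SortSpec {
    // is-sorted(x'): every element is <= its successor (assumption: non-strict order).
    static boolean isSorted(int[] xs) {
        for (int i = 0; i + 1 < xs.length; i++) {
            if (xs[i] > xs[i + 1]) {
                return false;
            }
        }
        return true;
    }

    // is-permutation(x, x'): both arrays hold the same values with the same multiplicities.
    static boolean isPermutation(int[] x, int[] xPrime) {
        if (x.length != xPrime.length) {
            return false;
        }
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : x) {
            counts.merge(v, 1, Integer::sum);
        }
        for (int v : xPrime) {
            Integer c = counts.get(v);
            if (c == null) {
                return false;
            }
            if (c == 1) {
                counts.remove(v);
            } else {
                counts.put(v, c - 1);
            }
        }
        return counts.isEmpty();
    }

    // The combined specification: x' is a sorted permutation of x.
    static boolean satisfiesSpec(int[] x, int[] xPrime) {
        return isSorted(xPrime) && isPermutation(x, xPrime);
    }

    public static void main(String[] args) {
        System.out.println(satisfiesSpec(new int[] {3, 1, 2}, new int[] {1, 2, 3}));
    }
}
```

Even in this small form, the specification is about as long as a simple sorting routine, which is the burden the note refers to.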
It’s nice to know that MS Word won’t dereference null pointers, but will it print my text?
Full of functional properties
Why is it that things are hard to specify?
⇒ New language, ⇒ Effort duplicated, ⇒ Can’t abstract from details
and leverage the knowledge of 50 years of programming!
This is what my talk today is about. In fact, it’s about mining specifications from 6,000 projects – the largest such attempt ever.
Dynamic invariants – mined from executions
Work by Michael Ernst – my big inspiration
Describe what should hold – but not how to get there
API usage – as mined from executions
Describe what holds – and how to achieve it!
This would be a pattern, if it were not for the missing element
We can detect such gaps by looking at overlapping patterns (concepts)
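A much simplified sketch of the gap idea: given one pattern (a set of temporal properties) and the property sets of several methods, report the methods that satisfy all but exactly one element of the pattern. This is an illustration under stated assumptions, not the tool's algorithm; it checks a single, given pattern rather than overlapping concepts, and all names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class GapDetector {
    // "\u227A" is the "precedes" sign used on the slides.
    static final String HASNEXT_NEXT = "hasNext \u227A next";
    static final String HASNEXT_HASNEXT = "hasNext \u227A hasNext";

    // Maps each method that misses exactly one property of the pattern
    // to the property it misses. Patterns with fewer than two properties
    // are ignored, since "all but one" would then mean "none".
    static Map<String, String> gaps(Map<String, Set<String>> methods, Set<String> pattern) {
        Map<String, String> result = new TreeMap<>();
        if (pattern.size() < 2) {
            return result;
        }
        for (Map.Entry<String, Set<String>> entry : methods.entrySet()) {
            Set<String> missing = new TreeSet<>(pattern);
            missing.removeAll(entry.getValue());
            if (missing.size() == 1) {
                result.put(entry.getKey(), missing.iterator().next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> pattern = Set.of(HASNEXT_NEXT, HASNEXT_HASNEXT);
        // Assumed input: methods "a" and "b" follow the pattern, "c" has a gap.
        Map<String, Set<String>> methods = Map.of(
            "a", Set.of(HASNEXT_NEXT, HASNEXT_HASNEXT),
            "b", Set.of(HASNEXT_NEXT, HASNEXT_HASNEXT),
            "c", Set.of(HASNEXT_HASNEXT));
        System.out.println(gaps(methods, pattern));
    }
}
```

A real detector would also weigh how many methods support the pattern against how many violate it before reporting anything; that ranking is left out here.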
Produced in 8 minutes on this machine
On encountering a wrong typecode,
<visitNEWARRAY()> should report the typecode to the user. However,
it fails to do so, as it uses <'+t+'> instead of <"+t+"> when
constructing the second parameter to the <constraintViolated()>
method. The string <'+t+'> is therefore taken verbatim: the
message contains <'+t+'> rather than the typecode in <t>.
OP-Miner reports this as an OP violation: the second parameter of
<constraintViolated()> should be the result of a
<StringBuffer.toString()> method call, i.e. a constructed string
rather than a constant string. The rationale for using a constructed
string is to include some information about the violation.
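The effect of the quoting mistake can be reproduced in isolation. In this sketch the message text and both method names are made up, since the slide elides the real message; only the `'+t+'` versus `"+t+"` difference is taken from the original.

```java
class QuoteBug {
    // Single quotes inside a string literal are ordinary characters,
    // so '+t+' stays in the text and the parameter t is never used.
    static String buggyMessage(byte t) {
        return "Illegal type code '+t+' for NEWARRAY";
    }

    // Closing and reopening the literal with double quotes concatenates t.
    static String fixedMessage(byte t) {
        return "Illegal type code '" + t + "' for NEWARRAY";
    }

    public static void main(String[] args) {
        byte typecode = 7;
        System.out.println(buggyMessage(typecode));
        System.out.println(fixedMessage(typecode));
    }
}
```

The buggy variant compiles without complaint because it is a perfectly valid constant string, which is why only a usage-based check flags it.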
In 48 cases: argument comes from String() constructor;
only in 3 cases: from array
Code smell → does not result in errors, but may cause maintainability problems
Defects → reported & verified
44% holds for AspectJ; same for other projects
Lots of subtle defects in production code
Unclear whether these would be found by other means
and leverage the knowledge of 50 years of programming!
This is what my talk today is about
The opening story tells of Francis Galton's surprise that visitors to a livestock exhibition, taking part in a prize contest, estimated the slaughter weight of an ox accurately when the mean of all estimates was taken as the group's estimate. (The group's estimate was even better than that of any individual participant, among them several butchers.)
First thing we needed was a lightweight parser
We therefore need to be able to analyze large amounts of code, ideally source code.
Next thing we needed was thousands of projects
We have 6097 projects in our reference database. Their size ranges from 7 SLOC (for openssl-blacklist_0.4.2 and openvpn-blacklist_0.3) to 5,491,951 SLOC (for linux-2.6.29), as counted by David A. Wheeler's 'SLOCCount' on .c files only. Some other statistics:
 [first quartile]: 1093
 [third quartile]: 16160
 [median]: 4162
 [mean]: 33020
Defect in Conspire 0.20
Defect in cksfv-1.3.13
As a special treat to SCAM attendees, we’re making our entire database available – today!
Coming back to the beginning of my talk – are we facing a specification crisis? Yes.
But we can alleviate it
by reusing and abstracting from all the code that’s around.
But still, we just scratch the surface of the knowledge that’s in there. Plenty of work lies ahead of us.
But with these future challenges, let’s not forget past challenges.
My students faced these challenges not because they were easy, but because they were hard. And I am very grateful for the wonderful results they achieved.