7. Acknowledgements
• 4 slides borrowed from Dr. Tudor Gîrba
• 9 slides borrowed from Marco D’Ambros
• 11 slides borrowed from Richard Wettel
• 12 slides borrowed from Romain Robbes
• Thanks to Prof. Zeller for having me here
9. What is Software?
“a program enables a
computer to perform a
specific task”
“A computer program is
a collection of
instructions that
describe a task, or set of
tasks, to be carried out
by a computer”
10. Some Facts About Software
• Society increasingly relies on software
• ...but it is unreliable and of low quality
• Software is regarded as a classical engineering
product
• ...but it is more complex than any other human artifact
• Maintenance is treated as a lowly activity
• ...but 75% - 95% of cost is spent on maintenance
• Software evolves due to business & technology
drivers: systems that do not change are dead
• Software evolution is crucial
11. How Large is Software?
[Chart: lines of code (millions) over time, 1990 - 2000]
• Windows XP: > 45 M
• Windows 2000: 40 M
• Red Hat 7.1: 30 M
• Windows 98: 18 M
• Red Hat 6.2: 17 M
• Windows 95: 15 M
• Solaris 7: 12 M
• Windows NT: 4 M
• Windows 3.1: 3 M
• Unix V7: 10,000
• Linux: 10,000
12. How Much Software is There?
• The total volume of software is estimated at
7’000’000’000 function points (FP)
• 1 FP ~ 128 lines of C or 107 lines of COBOL
• This means ca. 1 TLOC (1’000’000’000’000 lines)
• Printed on paper we can wrap the planet 10 times
• In what shape is it?
• On average ca. 5 bugs / FP
• This means ca. 35’000’000’000 bugs (6 per person)
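The arithmetic behind these estimates can be checked in a few lines. This is only a sketch: the constant names are illustrative, and every input is the slide's own figure.

```python
# Back-of-the-envelope check; all inputs are the estimates quoted on the slide.
FUNCTION_POINTS = 7_000_000_000   # estimated total volume of software, in FP
LOC_PER_FP_C = 128                # lines of C per function point
BUGS_PER_FP = 5                   # average bugs per function point
BUGS_PER_PERSON = 6               # the slide's "6 per person"

total_loc = FUNCTION_POINTS * LOC_PER_FP_C            # close to 1 TLOC
total_bugs = FUNCTION_POINTS * BUGS_PER_FP
implied_population = total_bugs / BUGS_PER_PERSON     # people implied by "6 per person"

print(f"{total_loc:.2e} lines of C")
print(f"{total_bugs:,} bugs")
print(f"{implied_population:.2e} people")
```

Counting everything in C gives about 0.9 x 10^12 lines, which the slide rounds to 1 TLOC.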
13. How Reliable is Software?
• Empirically
• 1 error / 20 lines
• Safety-critical Systems
• 1 error / 100 lines
• Wishful Thinking
• 1 error / 1000 lines
• The software that flies
a Jumbo Jet
• 8’000’000 lines: you do
the math...
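"Doing the math" for the three error rates above, as a minimal sketch. The dictionary keys are just labels for the slide's three cases.

```python
# Expected number of errors in 8,000,000 lines under each error rate from the slide.
LINES = 8_000_000
lines_per_error = {
    "empirical": 20,
    "safety-critical": 100,
    "wishful thinking": 1000,
}

expected_errors = {case: LINES // rate for case, rate in lines_per_error.items()}

for case, errors in expected_errors.items():
    print(f"{case}: {errors:,} errors")
```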
15. ...and it evolves!
Mozilla:
3 MLOC, more than 1
million changes
performed by hundreds
of developers over more
than 6 years
16. What is Evolution?
“the accumulation of
changes through
succeeding generations
of organisms that results
in the emergence of new
species”
17. Maintenance vs Evolution
XP sez: a system is *always* in evolution, there is no “maintenance” phase
[Figure: “Software Evolution” timeline (t) with versions 1.0, 1.1, 1.1a, 2.0; labels: “Software Development”, “Software Maintenance”, forward engineering steps, maintenance activity]
18. Why Analyze Software Evolution?
“Nevertheless, the industrial track record raises the
question, why, despite so many advances, [...]
• satisfactory functionality, performance and quality is
only achieved over a lengthy evolutionary
process
• software maintenance never ceases until a system
is scrapped
• software is still generally regarded as the weakest
link in the development of computer-based
systems”
Lehman et al., 1997
19. Software Entropy
• Lehman’s “Laws of Software Evolution”
• “Continuing Change”
• “Increasing Entropy/Complexity”
• “Increasing Size”
• Maintenance increases “Software Entropy”
• Erosion of architecture, design, modularization
• Increase of interdependencies between parts (“Coupling”)
• Decrease of separation of concerns (“Cohesion”)
21. Software Evolution Analysis
• Goal: Investigate the evolution of a software system
to identify potential shortcomings in its architecture or
logical structure
• Structural shortcomings can then be subjected to
reengineering or restructuring
• Prerequisite: Reverse Engineering
23. Reverse Engineering in Reality
• In 1944, during WW2, three B-29
bombers had to land in Russia
• The B-29, the main US bomber, provided the
strategic advantage of reaching
over the Pacific
• Tremendously valuable and unknown
to the Russians; building one from
scratch would have taken 5 years
• Approach: disassemble, test, run
• One was disassembled, one was
used, one was a training model
24. Software Reverse Engineering
• “The process of analysing a subject system to
• identify the system’s components and their
interrelationships, and
• create representations of the system in another form or
at a higher level of abstraction”
[Chikofsky & Cross, 1990]
• Why? To understand other people’s code
(newcomers in the team, code reviewing, developers
that left, etc.)
• Generating UML diagrams is not reverse
engineering...but it is a valuable support tool
25. Development: Hidden Chaos
[Figure: forward engineering vs. actual development]
26. Reengineering: Regaining control
[Figure: reverse engineering, program transformation, forward engineering]
27. Creating high level views: reverse engineering
[Figure: reverse engineering]
28. Coming back to Software Evolution Analysis
• Software systems are not “just there”, they are
evolved over time
• “If you want to know who somebody is, you have to
ask where he comes from”
• Evolution information is the key to a holistic
understanding of software
• The major goals of software evolution analysis are to
• Understand the evolutionary process
• Predict the future evolution
• This is done by mining software repositories
30. Mining Software Repositories?
• Software evolution research relies on software
repositories (think “CVS” or “Subversion”)
• To answer the question “Who did what and when?”
• ...but much more than that:
[Figure: Code, Effort, Bugs, Tests, Changes, e-Mails, Navigation, Web Sites, Traces, Chats, People, Specs, ...]
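A minimal sketch of answering “Who did what and when?” from versioning data. The `Commit` record and the sample commits are made up for illustration; a real tool would extract them from a CVS or Subversion log.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: who changed which files, and when."""
    author: str
    day: date
    files: tuple


# Made-up sample data standing in for a parsed repository log.
COMMITS = [
    Commit("bob", date(2006, 8, 16), ("Foo.java",)),
    Commit("alice", date(2006, 8, 15), ("Foo.java", "Bar.java")),
    Commit("alice", date(2006, 8, 20), ("Baz.java",)),
]


def who_did_what_and_when(commits):
    """Map each file to its chronological list of (day, author) pairs."""
    history = defaultdict(list)
    for commit in sorted(commits, key=lambda c: c.day):
        for path in commit.files:
            history[path].append((commit.day, commit.author))
    return dict(history)


history = who_did_what_and_when(COMMITS)
for path, events in history.items():
    print(path, events)
```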
31. Mining: Tools & Models
• Tools to create Models
• Tools to reason on the created Models
[Figure: Models built from Effort, Tests, People, Documentation, Code, Traces, e-Mails, Bugs, ...]
32. Models & Meta-Models
• A Model is just that: a representation of a system
• A Model always relies on a meta-model
• One challenge: unify and connect meta-models
[Figure: a structural meta-model with the entities Package, Namespace, Class, Inheritance, Method, Attribute, Invocation and Access, linked by packagedIn, belongsTo, superclass, subclass, invokedBy, candidate, accessedIn and accesses; the entities are shown with a History attached]
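A fragment of such a meta-model can be sketched as plain data classes. The entity and relation names follow the figure labels (packagedIn, belongsTo, superclass, invokedBy); the attribute spellings, the `invoked_by` helper and the sample model are assumptions made for illustration, not the actual tool's implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Package:
    name: str


@dataclass(eq=False)
class Class:
    name: str
    packaged_in: Package               # "packagedIn" in the figure
    superclass: "Class | None" = None  # one end of an Inheritance


@dataclass(eq=False)
class Method:
    name: str
    belongs_to: Class                  # "belongsTo" in the figure
    invokes: list = field(default_factory=list)  # outgoing Invocations


def invoked_by(target, methods):
    """Return the methods that invoke `target` (the "invokedBy" relation)."""
    return [m for m in methods if target in m.invokes]


# A tiny made-up model that conforms to the meta-model above.
geometry = Package("geometry")
shape = Class("Shape", geometry)
circle = Class("Circle", geometry, superclass=shape)
area = Method("area", circle)
draw = Method("draw", circle, invokes=[area])

callers = invoked_by(area, [area, draw])
print([m.name for m in callers])
```

The model (Shape, Circle, area, draw) is an instance; the classes `Package`, `Class` and `Method` are the meta-model it relies on.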
35. Software Visualization
“The use of the crafts of typography, graphic design,
animation, and cinematography with modern
human-computer interaction and computer graphics
technology to facilitate both the human
understanding and effective use of computer
software”
Stasko et al., 1998
45. The city metaphor
software                representation
classes                 buildings
packages                districts
system                  city

class metric            building property
number of methods       height
number of attributes    width
number of attributes    length

package metric          district property
nesting level           color saturation
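The mapping in the tables above can be written down directly. This is a sketch under assumptions: the type names are invented, the metric values are used unscaled, and the linear saturation scale (deeper nesting gives higher saturation) is a guess, since the slide only names the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassMetrics:
    name: str
    number_of_methods: int
    number_of_attributes: int


@dataclass(frozen=True)
class Building:
    name: str
    height: int
    width: int
    length: int


def to_building(metrics):
    """Apply the class-metric to building-property mapping from the slide."""
    return Building(
        name=metrics.name,
        height=metrics.number_of_methods,    # number of methods -> height
        width=metrics.number_of_attributes,  # number of attributes -> width
        length=metrics.number_of_attributes, # number of attributes -> length
    )


def district_saturation(nesting_level, max_nesting_level):
    """Map a package's nesting level to a saturation in [0, 1] (assumed linear)."""
    if max_nesting_level <= 0:
        return 0.0
    return min(nesting_level, max_nesting_level) / max_nesting_level


# A class with 146 attributes and no methods becomes a flat, wide building.
building = to_building(ClassMetrics("JavaTokenTypes", 0, 146))
print(building, district_saturation(2, 4))
```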
50. What about Software Evolution?
[Figure: history meta-model in three layers]
HISTORY LAYER       VERSION LAYER     SNAPSHOT LAYER
ModelHistory        EntityVersion     MooseModel
PackageHistory      EntityVersion     FAMIXPackage
ClassHistory        ClassVersion      FAMIXClass
MethodHistory       MethodVersion     FAMIXMethod
AttributeHistory    EntityVersion     FAMIXAttribute
• Each history (1) has versions (*); each version refers to its history and to a versionEntity in the snapshot layer
• Histories also carry a referenceVersion and a referenceHistory
• Histories nest like the entities they track: packageHistories, classHistories, methodHistories, attributeHistories, with containingPackageHistory and containingClassHistory pointing back
• Snapshot entities are linked by packagedIn, belongsTo, definedClasses, extendedClasses, extendedInPackages, methods, attributes and mooseModel
• Figure annotation: “from here... to here!”
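The three layers can be sketched as follows. The class names and the two derived measures are assumptions for illustration: in particular, "age" is taken here to be the number of sampled versions in which the entity exists, which the slides do not define.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Snapshot layer: an entity as it appears in one model of the system."""
    name: str
    number_of_methods: int


@dataclass
class Version:
    """Version layer: a snapshot placed in time."""
    timestamp: str
    snapshot: Snapshot


@dataclass
class History:
    """History layer: the ordered versions of one entity."""
    name: str
    versions: list = field(default_factory=list)

    def add(self, timestamp, snapshot):
        self.versions.append(Version(timestamp, snapshot))

    def age(self):
        # Assumed definition: number of sampled versions of the entity.
        return len(self.versions)

    def growth_in_methods(self):
        # Difference in number of methods between the last and first version.
        first, last = self.versions[0], self.versions[-1]
        return last.snapshot.number_of_methods - first.snapshot.number_of_methods


foo = History("Foo")
foo.add("2006-06-01", Snapshot("Foo", 3))
foo.add("2006-07-01", Snapshot("Foo", 5))
foo.add("2006-08-01", Snapshot("Foo", 8))
print(foo.age(), foo.growth_in_methods())
```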
51. Same problem, more data
System             ArgoUML     JHotDraw    Jmol
Packages           144         72          105
Classes            2’542       998         1’032
Lines of code      137’000     30’000      85’000
Sampling start     Oct 2002    Oct 2000    Jan 2000
Sampling end       Feb 2007    Apr 2005    Aug 2007
Sampling period    variable    1 week      8 weeks
Samples            9           57          50
Revisions          13’535      267         8’065
53. ArgoUML Age map
[Figure: ArgoUML age map; NOA = number of attributes, NOM = number of methods]
Packages labelled: org.argouml.language.cpp, org.argouml.language.php, org.argouml.language.csharp, org.argouml.language.java, org.argouml.model, org.argouml.uml.reveng.java
Classes labelled:
• STDCTokenTypes: NOA 152, NOM 0, AGE 4
• FacadeMDRImpl: NOA 3, NOM 351, AGE 4
• Facade: NOA 1, NOM 339, AGE 5
• CPPParser: NOA 85, NOM 204, AGE 4
• JavaRecognizer: NOA 24, NOM 91, AGE 9
• JavaTokenTypes: NOA 146, NOM 0, AGE 9
• JavaTokenTypes: NOA 175, NOM 0, AGE 9
• JavaRecognizer: NOA 79, NOM 176, AGE 9
59. An ideal bug’s life cycle
[Figure: states Unconfirmed, New, Assigned, Resolved, Verified, Closed]
60. A less ideal bug’s life cycle
[Figure: states Unconfirmed, New, Assigned, Resolved, Reopened, Verified, Closed]
61. A real bug’s life cycle
[Figure: states Unconfirmed, New, Assigned, Resolved, Reopened, Verified, Closed]
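A life cycle like these can be treated as a small state machine and checked against real status sequences. The states come from the slides; the transitions in `IDEAL` are an assumption (a straight path through the ideal life cycle), because the arrows of the figures are not part of the text.

```python
# Assumed transitions of the "ideal" life cycle: one straight path.
IDEAL = {
    "Unconfirmed": {"New"},
    "New": {"Assigned"},
    "Assigned": {"Resolved"},
    "Resolved": {"Verified"},
    "Verified": {"Closed"},
    "Closed": set(),
}


def unexpected_transitions(statuses, allowed=IDEAL):
    """Return the (from, to) steps of a status sequence that the model does not allow."""
    return [
        (current, following)
        for current, following in zip(statuses, statuses[1:])
        if following not in allowed.get(current, set())
    ]


ideal_bug = ["Unconfirmed", "New", "Assigned", "Resolved", "Verified", "Closed"]
reopened_bug = ["New", "Assigned", "Resolved", "Reopened", "Assigned", "Resolved"]

print(unexpected_transitions(ideal_bug))
print(unexpected_transitions(reopened_bug))
```

Passing a richer transition table as `allowed` would model the less ideal life cycles, for example one that includes Reopened.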
62. Bug history from activities
[Figure: two versions of a Bug connected by an Activity; the Bug history is the sequence of such versions]
Each Bug version has:
• Problem: id, description, product, component
• Criticality: severity, priority
• Involved people: assignedTo, reporter, qa
• State: Status, Resolution
The example Activity concerns AssignedTo (steve, john)
64. The System Radiography View
“Where (in the system and in its history) are the
open bugs located?”
Visualization principle
• System decomposition on the y axis: Product :: Component (e.g. Product A with Component 1 and Component 2, Product B with Component)
• (x,y) : (time, component), where x is a time interval
• Color: # open bugs
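The data behind such a view is a matrix of open-bug counts per (component, time interval) cell. A minimal sketch, assuming made-up bug records and monthly intervals; a bug counts as open in an interval if it was opened before the interval ends and not closed before it starts.

```python
from collections import Counter
from datetime import date

# Hypothetical bug records: (product, component, opened, closed or None if still open).
BUGS = [
    ("Product A", "Component 1", date(2002, 11, 5), date(2003, 1, 20)),
    ("Product A", "Component 1", date(2002, 12, 1), None),
    ("Product A", "Component 2", date(2003, 1, 10), date(2003, 2, 2)),
]

# Time intervals on the x axis as (start, end) pairs, end exclusive.
INTERVALS = [
    (date(2002, 11, 1), date(2002, 12, 1)),
    (date(2002, 12, 1), date(2003, 1, 1)),
    (date(2003, 1, 1), date(2003, 2, 1)),
]


def open_bug_counts(bugs, intervals):
    """Count open bugs per ("Product :: Component", interval index) cell."""
    counts = Counter()
    for product, component, opened, closed in bugs:
        for column, (start, end) in enumerate(intervals):
            if opened < end and (closed is None or closed >= start):
                counts[(f"{product} :: {component}", column)] += 1
    return counts


counts = open_bug_counts(BUGS, INTERVALS)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Each count would then be mapped to the color of the cell at (time interval, component).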
65. Mozilla example [Sep ‘98 - Apr ‘03]
Browser
Mailnews
66. The Bug Watch View
“How are bugs characterized with respect to their history?”
Visualization principle
• 3 Layers
• Status
• Activity
• Severity
• Time runs from Beginning: 10/19/1999 to End: 10/16/2001

Status      From        To
Assigned    10/19/99    12/21/99
Resolved    12/21/99    1/31/00
Reopened    1/31/00     2/6/00
New         2/6/00      6/5/00
...         ...         ...
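The status layer is built from intervals like the ones in the table. As a sketch, the time spent in each status can be computed from the four rows shown; the dates are read as month/day/year and the function name is illustrative.

```python
from datetime import date

# The four status intervals listed on the slide (month/day/year).
STATUS_INTERVALS = [
    ("Assigned", date(1999, 10, 19), date(1999, 12, 21)),
    ("Resolved", date(1999, 12, 21), date(2000, 1, 31)),
    ("Reopened", date(2000, 1, 31), date(2000, 2, 6)),
    ("New", date(2000, 2, 6), date(2000, 6, 5)),
]


def days_per_status(intervals):
    """Sum the number of days a bug spent in each status."""
    totals = {}
    for status, start, end in intervals:
        totals[status] = totals.get(status, 0) + (end - start).days
    return totals


durations = days_per_status(STATUS_INTERVALS)
print(durations)
```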
67. Examples from Mozilla
Browser :: Networking [Nov ‘02- Apr ‘03]
• We also found bugs that go from resolved to new or unconfirmed without passing through reopened
• Reopened 4 times
• Developer in charge to fix it changed 6 times
• One status only (new) but many activities
• Activities: all additions of CC
• Many people added to the CC
• Popular bug
73. Date
[Figure labels: Date (15/08/2006 17h17:29), Class Change (Added class Foo), Method Change (method m2), Refactorings, Number of Entities (Unique Number), Additions, Modifications, Removals, Duration]
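A development session recorded as individual changes can be summarized by the measures named in the figure. This is a sketch with an invented `Change` record; only the first timestamp and the names Foo and m2 come from the slide, the other values are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """Hypothetical change record: when, what kind, and on which entity."""
    when: datetime
    kind: str     # "addition", "modification" or "removal"
    entity: str


SESSION = [
    Change(datetime(2006, 8, 15, 17, 17, 29), "addition", "class Foo"),
    Change(datetime(2006, 8, 15, 17, 20, 0), "addition", "method m2"),      # made-up time
    Change(datetime(2006, 8, 15, 17, 31, 0), "modification", "method m2"),  # made-up time
]


def summarize(session):
    """Return (changes per kind, number of unique entities, session duration)."""
    kinds = Counter(change.kind for change in session)
    unique_entities = len({change.entity for change in session})
    times = [change.when for change in session]
    return kinds, unique_entities, max(times) - min(times)


kinds, unique_entities, duration = summarize(SESSION)
print(dict(kinds), unique_entities, duration)
```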
74. There is a shift of focus among classes
[Figure labels: ChangePerformerTest, AdditionOperation, MethodNode, ScopeNode, TreeNode, ChangePerformer, DeletionOperation, Method, BlockNode, Argument, Temporary, Entity, Removal]
Let’s look at S ,
a painting and decoration session
75. Added arguments and temporaries to the parse tree
ScopeNode
TreeNode AddEntityChild:
Argument isArgument
Temporary isTemporary
Entity
76. Added arguments and temporaries to Method.
Changed how to handle the children of a method
addArgument:
Method addTemporary:
ChangePerformer allChildren:
allLocalVariables:
77. Changed how source code is generated
MethodNode printTempsOn:
BlockNode printArgumentsOn:
ScopeNode printSourceCodeOn:
83. Mining Repositories to Control Evolution
• “Reverse Engineering with Logical Coupling”, Marco
D’Ambros, Michele Lanza, In Proceedings of WCRE 2006
(13th Working Conference on Reverse Engineering), pp.
189 - 198, IEEE CS Press, 2006
• “Software Bugs and Evolution: A Visual Approach to
Uncover their Relationship”, Marco D’Ambros, Michele
Lanza, In Proceedings of CSMR 2006 (10th European
Conference on Software Maintenance and
Reengineering), pp. 227 - 236, IEEE CS Press, 2006
• “A Bug’s Life: Visualizing a Bug Database”, Marco
D’Ambros, Michele Lanza, Martin Pinzger, In Proceedings
of VISSOFT 2007 (4th IEEE International Workshop on
Visualizing Software For Understanding and Analysis), pp.
113 - 120, IEEE CS Press, 2007
• “The Evolution Radar: Integrating Fine-grained and
Coarse-grained Logical Coupling Information”, Marco
D’Ambros, Michele Lanza, Mircea Lungu, In Proceedings
of MSR 2006 (3rd International Workshop on Mining
Software Repositories), pp. 26 - 32, 2006
• “Fractal Figures: Visualizing Development Effort for CVS
Entities”, Marco D'Ambros, Michele Lanza, Harald Gall, In
Proceedings of VISSOFT 2005 (3rd IEEE International
Workshop on Visualizing Software For Understanding and
Analysis), pp. 46 - 51, IEEE CS Press, 2005
84. Immersive Software Analysis
• “Program Comprehension through Software Habitability”,
Richard Wettel, Michele Lanza, In Proceedings of ICPC
2007 (15th International Conference on Program
Comprehension), pp. 231 - 240, IEEE CS Press, 2007
• “Visualizing Software Systems as Cities”, Richard Wettel,
Michele Lanza, In Proceedings of VISSOFT 2007 (4th
International Workshop on Visualizing Software for
Understanding and Analysis), pp. 92 - 99, IEEE CS Press,
2007
85. Change-based Software Evolution
• “A Change-based Approach to Software Evolution”,
Romain Robbes, Michele Lanza, In ENTCS, vol. 166, pp.
93 - 109, Jan 2007, Elsevier Science Direct
• “Characterizing and Understanding Development
Sessions”, Romain Robbes, Michele Lanza, In
Proceedings of ICPC 2007 (15th International Conference
on Program Comprehension), pp. 155 - 164, IEEE CS
Press, 2007
• “An Approach to Software Evolution Based on Semantic
Change”, Romain Robbes, Michele Lanza, Mircea Lungu,
In Proceedings of FASE 2007 (10th ETAPS Conference on
Fundamental Approaches to Software Engineering), pp.
27 - 41, Springer LNCS, 2007
• “Mining a Change-based Repository”, Romain Robbes, In
Proceedings of MSR 2007 (4th International Workshop on
Mining Software Repositories), IEEE CS Press, 2007
• “Change-based Software Evolution”, Romain Robbes,
Michele Lanza, In Proceedings of EVOL 2006 (1st
International Workshop on Software Evolution), pp. 159 -
164, 2006
• “Versioning Systems for Evolution Research”, Romain
Robbes, Michele Lanza, In Proceedings of IWPSE 2005
(8th International Workshop on Principles of Software
Evolution), pp. 155 - 164, IEEE CS Press, 2005
86. Visual Architecture Reconstruction
• “Reverse Engineering Super-Repositories”, Mircea Lungu,
Michele Lanza, Tudor Gîrba, Reinout Heeck, In Proceedings
of WCRE 2007 (14th Working Conference on Reverse
Engineering), to be published, IEEE CS Press, 2007
• “Exploring Inter-Module Relationships in Evolving Software
Systems”, Mircea Lungu, Michele Lanza, In Proceedings of
CSMR 2007 (11th European Conference on Software
Maintenance and Reengineering), pp. 91 - 100, IEEE CS
Press, 2007
• “Package Patterns for Visual Architecture Recovery”, Mircea
Lungu, Michele Lanza, Tudor Gîrba, In Proceedings of CSMR
2006 (10th European Conference on Software Maintenance
and Reengineering), pp. 227 - 236, IEEE CS Press, 2006
• “Interactive Exploration of Semantic Clusters”, Mircea Lungu,
Adrian Kuhn, Tudor Gîrba, Michele Lanza, In Proceedings of
VISSOFT 2005 (3rd International Workshop on Visualizing
Software for Understanding and Analysis), pp. 95 - 100, IEEE
CS Press, 2005
• “A Small Observatory for Super-Repositories”, Mircea Lungu,
Tudor Gîrba, In Proceedings of IWPSE 2007 (10th
International Workshop on Principles of Software Evolution),
pp. 106 - 109, IEEE CS Press, 2007
• “Softwarenaut: Cutting the Edge in Software Visualization”,
Mircea Lungu, Michele Lanza, In Proceedings of Softvis 2006
(3rd International Symposium on Software Visualization), pp.
179 - 180, ACM Press, 2006
87. Take-away
• Software is not written, it’s being evolved (by people)
• Fully understanding software is only possible if one
takes into account evolutionary information
• Evolution analysis = mining software repositories
• Mining software repositories = tools & models
• A myriad of approaches exist
• Long-term goal: holistic understanding of software
• And: assisting the developer
• Last but not least: an exciting, still under-researched
field of software engineering