2. Systematic determination of the merit, worth, and significance of a product
Criteria based on a set of standards
Degree of achievement
objectives vs. results
Tailored to its context
Assessment of a product's quality
Activities
CAPTURE – collecting data
ANALYSIS – interpreting data to identify problems
CRITIQUE – suggesting solutions or improvements to mitigate problems
3. A language is a means of communication
The user interface is a realization of a language
A language is a model that describes the allowed
terms and how to compose them into valid sentences
4. General Purpose (programming) Languages (GPLs)
User has to master programming concepts
User has to master domain concepts
Domain Specific (modeling) Languages (DSLs)
Meant to close gap between PROBLEM DOMAIN and
SOLUTION DOMAIN
Reduce the use of computation domain concepts
Focus on the domain concepts
5. Verification
Did I build the thing right?
Is the right product functionality provided? (from the language engineer's understanding)
Focus is on the language
Validation
Did I build the right thing?
Is the end user satisfied with the product?
Focus is NOT on the language's users
Should this be the other way around?
6. Increasingly popular
Raise the abstraction level (closer to the domain)
Narrow the design space
Several benefits claimed, in well-defined domains
Productivity gains
Better time to market
Avoid error-prone mappings between domain and software
development concepts
Leverage the expertise of domain experts
8. The capability of a software product to enable specified users to achieve specified goals with: effectiveness, productivity, safety and satisfaction in specified contexts of use
[ISO IEC 25010]
9. Dynamic, structured information space that includes the following entities
a model of the User
Different knowledge sets
Characteristics chosen are dependent on the application domain
the hardware-software Platform
the set of computing, sensing, communication, and interaction resources
e.g. operating systems, memory size, network bandwidth, input and output interaction devices
the social and physical Environment
Where the interaction is actually taking place
Different languages may have different contexts of use
Their users are likely to have different knowledge sets
A minimum set of ontological concepts is required to use the language
10. The user's view of the Quality of a product
Measured in terms of the result of using the
product, rather than its properties
11. Formal evaluation
Models and simulations to predict measures of usability
Some can be used before a prototype is available
Automatic evaluation
Automated conformance checking against guidelines and standards
Requires at least a prototype, or an initial version of the full implementation
Empirical evaluation
Possible at any development stage
Requires users
Formative methods (e.g. think aloud) vs. Summative methods (using metrics)
Heuristic evaluation
Evaluation conducted by experts (often before users are involved)
Without scenarios: reviews, inspections
With scenarios (task based): walkthroughs
14. • To evaluate, or not to evaluate (aka “Should we?”)
• Facts are facts, even when portrayed by statistics (aka “Do we?”)
• How do language engineers evaluate languages? (aka “How can we?”)
• Language evaluation forensics (aka “Life in the trenches”)
Barišić, Amaral, Goulão, and Barroca: ‘Evaluating the Usability of Domain-Specific Languages’ (IGI Global, 2012)
15. DSL development is hard
Requires domain and language development expertise
Many DSL development techniques
which should we use?
Costly
No systematic approach
No awareness of the Software Language Engineering process
Challenges
Development of training materials
Support
Standardization
Maintenance
16. Evaluating candidate DSL
• Building/adopting DSL
• Developing evaluation and training materials
• Training/Evaluation
• Establishing a baseline for comparing performance with the DSL
Not evaluating candidate DSL
• Inability to estimate return on investment in the adoption of the DSL
• What is the break-even point?
• What is the DSL’s impact on the process quality?
• What is the DSL’s impact on the product quality?
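The break-even question above can be sketched as a simple calculation: how many development tasks must the DSL be used for before its productivity gain repays the adoption cost? All figures below are hypothetical illustration values, not data from these slides.

```python
import math

# Hypothetical cost figures (person-hours); none of these come from the talk.
dsl_adoption_cost = 200.0   # building/adopting the DSL, training, evaluation materials
hours_per_task_gpl = 10.0   # baseline cost of one task without the DSL
hours_per_task_dsl = 6.0    # cost of the same task with the DSL

# Each task done with the DSL saves this many hours over the baseline.
saving_per_task = hours_per_task_gpl - hours_per_task_dsl

# Number of tasks needed before accumulated savings cover the adoption cost.
break_even_tasks = math.ceil(dsl_adoption_cost / saving_per_task)
print(break_even_tasks)  # → 50
```

With these made-up numbers the DSL pays off after 50 tasks; without an evaluation, none of the three inputs can be estimated, which is exactly the "inability to estimate return on investment" the slide warns about.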
17. Simply NOT true…
e.g. Language Level has been around, and widely used, since 1996
Language evaluation has been a concern for many decades. For
instance,
“…the tools we are trying to use and the language or notation we are
using to express or record our thoughts are the major factors
determining what we can think or express at all! The analysis of the
influence that programming languages have on the thinking habits of
their users … give[s] us a new collection of yardsticks for comparing
the relative merits of various programming languages.”
[Dijkstra 1972]
18. “Is Perl better than Python?”
“No... no... no. Quicker, easier, more seductive.”
“But how will I know why Python is better than Perl?”
“You will know. When your code you try to read six months from now.”
“But Code! Yes. A programmer’s strength flows from code maintainability. But beware of Perl. Terse syntax... more than one way to do it... default variables. The dark side of code maintainability are they. Easily they flow, quick to join you when code you write. If once you start down the dark path, forever will it dominate your destiny, consume you it will.”
http://www.netfunny.com/rhf/jokes/99/Nov/perl.html
19. Language Qualities
Clarity, simplicity, and unity of language concept
Clarity of program syntax
Naturalness for the application
Support for data abstraction
Ease of program verification
Programming environment
Portability of programs
Cost of program execution
Cost of program translation
Cost of program creation, testing, and use
Cost of program maintenance
[Pratt 1984]
20. Language and its documentation qualities
Completeness of definition
Independence from hardware
Modularization and support for abstraction
Smallness of size
Conciseness and clarity of description
Implementation qualities
Reliability
Compilation speed
Efficiency of code
Predictability of execution cost
Compactness of compiled code
Simple and effective interface to environment
[Wirth 1984]
21. Language design and implementation criteria
Is the language formally defined?
Is the language unambiguous?
Human factors criteria
Do programmers easily write correct, understandable code in
the language?
How easy is the language to learn?
Software Engineering criteria
Support for quality attributes such as portability, reliability,
maintainability...
Availability of good tools and experienced programmers
Application domain criteria
How well does the language support programming for specific
applications?
[Howatt 1995]
22. Project-specific criteria
Even within a domain, specific projects will have specific requirements
Criteria should be defined within projects
Criteria should have an evaluation richer than just yes/no,
e.g. (criterion, satisfaction score, importance score)
Relevance
External constraints are also relevant, e.g.
Legacy code
Use what everybody else is using (should be good, right?)
Language availability
Contractual obligations
[Howatt 1995]
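The (criterion, satisfaction score, importance score) triples above suggest a straightforward aggregation: an importance-weighted satisfaction score per candidate language. A minimal sketch follows; the criterion names, scales, and scores are hypothetical illustration values, not from Howatt's paper.

```python
# Sketch: importance-weighted satisfaction score for one candidate language.
# All criterion names and score values below are invented for illustration.

def weighted_score(criteria):
    """criteria: list of (name, satisfaction 0-5, importance 0-5) tuples."""
    total_importance = sum(importance for _, _, importance in criteria)
    return sum(sat * importance for _, sat, importance in criteria) / total_importance

candidate = [
    ("formally defined",  4, 5),
    ("easy to learn",     2, 3),
    ("tool availability", 5, 4),
]
print(round(weighted_score(candidate), 2))  # → 3.83
```

Computing the same score for each candidate language makes the comparison explicit, while the per-criterion triples keep the rationale inspectable, which is richer than a yes/no checklist.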
23. In general, software language engineers do not evaluate their languages with respect to their impact on the software development process in which the DSLs will be integrated
Or, if they do, they are extremely shy about it…
[Gabriel 2010]
24. Is there a concrete and detailed evaluation model to
measure DSLs Usability?
Is the DSL community concerned about experimental
evaluation as a mechanism to prevent future
problems emerging from the proposed DSLs?
To what extent does the DSL community present evidence that the developed DSLs are easy to use and correspond to end-users’ needs?
[Gabriel 2010]
25. RQ1: Does the paper report the development of a DSL?
RQ2: Does the paper report the DSL development process
with some detail?
RQ3: Does the paper report any experimentation
conducted for the assessment of the DSL?
RQ4: Does the paper report the inclusion of end-users in
the assessment of a DSL?
RQ5: Does the paper report any sort of usability
evaluation?
[Gabriel 2010]
27. Few papers (14%) report any sort of evaluation
Even those provide too few details
Too much tacit knowledge: virtually impossible to replicate evaluations and perform meta-analysis
Predominance of toy examples
Unsubstantiated claims about the merits of DSLs
Poor characterization of subjects involved in validation
How representative are they of real DSL users?
34. • Introduce DSLs’ Usability evaluation during DSLs’ life-cycle iterations
• Design an effective experimental evaluation of DSLs that will provide qualitative and quantitative feedback for DSL developers
• Produce user-centered design of DSLs
• Foresee the Quality of a DSL while in an iterative evolution step
• Merge the Software Language development process with the Usability Engineering process
[Barisic, 2011a]
35. Barišić, Monteiro, Amaral, Goulão, Monteiro: ‘Patterns for Evaluating Usability of Domain-Specific Languages’, in Proceedings of the 19th Conference on Pattern Languages of Programs (PLoP), SPLASH 2012, Tucson, Arizona, USA, October 2012
42. Two types of physicists (graduate students) involved
Informed programmers (Inf) – regular users of programming languages who are used to programming with the present analysis framework
Uninformed programmers (non-Inf) – regular users of programming languages who are not used to programming with the present analysis framework
[Barisic2011b]
43. Features we wanted to have evaluated:
query steps in Pheasant vs. C++/BEE
expressing a decay
specification of filtering conditions
vertexing and the usage of user-defined functions
aggregation
path expressions (navigation queries)
expressing the result set
the expressiveness of user-defined functions
[Barisic2011b]
44. Our evaluation technique was tested with two individuals (two physics experts) in order to verify it and to test the teaching materials and questionnaires
As time constraints and equipment turned out to be adequate, there was no need to change the prepared materials
[Barisic2011b]
45. RQ1: Is querying with Pheasant more effective than with C++/BEE?
RQ2: Is querying with Pheasant more efficient than with C++/BEE?
RQ3: Are participants querying with Pheasant more confident in their performance than with C++/BEE?
Our goal is to:
analyze the performance of Pheasant programmers
for the purpose of comparing it with a baseline alternative (C++/BEE)
with respect to the efficiency, effectiveness and confidence of defining queries in Pheasant
from the point of view of a researcher trying to assess the Pheasant DSL,
in the context of a case study on selected queries
[Barisic2011b]
46. H1null Using Pheasant vs. C++/BEE has no impact on the
effectiveness of querying the analysis framework
H1alt Using Pheasant vs. C++/BEE has a significant impact on
the effectiveness of querying the analysis framework
H2null Using Pheasant vs. C++/BEE has no impact on the
efficiency of querying the analysis framework
H2alt Using Pheasant vs. C++/BEE has a significant impact on
the efficiency of querying the analysis framework
H3null Using Pheasant vs. C++/BEE has no impact on the
confidence of querying the analysis framework
H3alt Using Pheasant vs. C++/BEE has a significant impact on
the confidence of querying the analysis framework
[Barisic2011b]
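Hypothesis pairs like H1null/H1alt are typically decided with a statistical test over per-subject scores. The slides do not say which test the study used, so the sketch below simply computes a Mann-Whitney U statistic, a common choice for small two-group comparisons, on made-up effectiveness scores; all data here is hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: count of pairwise wins over sample_b, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical per-subject effectiveness scores (fraction of correct queries);
# these are illustration values, not the study's results.
pheasant = [0.9, 0.8, 1.0, 0.7]
cpp_bee  = [0.5, 0.6, 0.4, 0.7]

u = mann_whitney_u(pheasant, cpp_bee)
# Under H1null, E[U] = len(pheasant) * len(cpp_bee) / 2 = 8;
# a U far from 8 (here 15.5 of a maximum 16) is evidence against H1null.
print(u)  # → 15.5
```

In practice one would look up (or compute) the p-value for the observed U before rejecting H1null; the same machinery applies to H2 (efficiency, e.g. completion times) and H3 (confidence ratings).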
47. We focus on presenting six examples, each focusing on some of the features we chose to evaluate
Participants are asked to give themselves a mark for their feeling of correctness of their trial
Sessions take the time needed for each group to understand the examples
[Barisic2011b]
48. Every participant has four queries, specified in English, to be rewritten in the previously learned language
Each subject makes a self-assessment of his reply, rating his feeling of correctness
Example:
Build the decay of a D0 particle to a Kaon and a Pion
[Barisic2011b]
49. Query solution in Pheasant
Query solution in C++/BEE
(pseudo code based on real code)
50. The participants were asked to judge the intuitiveness, suitability and effectiveness of the query language. The goal was to evaluate:
Overall reactions
Query language constructs
Affect towards the query language was rated by:
Query language constructs
Participants’ comments
[Barisic2011b]
51. Results obtained with Pheasant were clearly better than those with C++/BEE
Pheasant allowed non-programmers to correctly define their queries.
The evaluation also showed a considerable speedup in query definition by all the groups of users that were using Pheasant
The feedback obtained from the users was that it is more comfortable to use Pheasant than the alternative.
[Barisic2011b]
58. Literature
[Mernik2005] Mernik, M., Heering, J., and Sloane, A. M.: ‘When and how to develop domain-specific languages’, ACM Computing Surveys, 2005
[Gabriel2010] Gabriel, P., Goulão, M., and Amaral, V.: ‘Do Software Languages Engineers Evaluate their Languages?’, XIII Congreso Iberoamericano en "Software Engineering" (CIbSE'2010), 2010
[Barisic2011a] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Quality in Use of DSLs: Current Evaluation Methods’, Proc. 3rd INForum – Simpósio de Informática (INForum 2011), Coimbra, Portugal, September 2011
[Barisic2011b] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Quality in Use of Domain Specific Languages: a Case Study’, Proc. Evaluation and Usability of Programming Languages and Tools (PLATEAU), Portland, USA, October 2011
[Barisic2011c] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘How to reach a usable DSL? Moving toward a Systematic Evaluation’, Electronic Communications of the EASST, 2011
[Barisic2012] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Evaluating the Usability of Domain-Specific Languages’, in Mernik, M. (Ed.): ‘Formal and Practical Aspects of Domain-Specific Languages: Recent Developments’ (IGI Global, 2012)
[Barisic2013] Barišić, A.: ‘Evaluating the Usability of Domain-Specific Languages’, in Mernik, M. (Ed.): ‘Formal and Practical Aspects of Domain-Specific Languages: Recent Developments’ (IGI Global, 2012)