Testing survey

001 Test generation through programming in UDITA

Milos Gligoric, Tihomir Gvero, Vilas Jagannath, Sarfraz Khurshid, Viktor Kuncak, Darko Marinov.
Test generation through programming in UDITA. Proceedings of the 32nd ACM/IEEE
International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South
Africa, 1-8 May 2010.

Summary

Background

Systematic test generation based on specifications, symbolic execution, and hybrids of
symbolic execution with concrete execution.

systematic test generation based on specifications [5, 24]
symbolic execution [8, 26]
hybrids with concrete executions [6, 9, 13, 20, 25, 34, 35, 38]

Keywords

test generation

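Problem

Modern symbolic execution techniques are limited by the size of the unit under test (in practice
they handle units far smaller than a hundred thousand lines), and the inputs they generate are far
simpler than Java program source code (the implication is that they cannot automatically generate
test inputs for a compiler).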

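Solution

The paper proposes describing tests with nondeterministic test generation programs. UDITA is a
language based on nondeterministic choice operators and an interface for generating linked
structures. The paper also describes a new algorithm for generating concrete tests that
efficiently explores the space of all executions of a nondeterministic UDITA program.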
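A minimal Python sketch of what a nondeterministic test generation program looks like. It is not UDITA syntax, and it uses plain eager backtracking rather than the paper's own exploration algorithm. The names Explorer, choose, assume, and sorted_list are hypothetical.

```python
class Backtrack(Exception):
    """Raised by assume() to abandon the current execution."""


class Explorer:
    """Re-runs a generator program once for every sequence of choices."""

    def __init__(self):
        self.trail = []  # one [current value, upper bound] pair per choice point
        self.pos = 0     # index of the next choice point in the current run

    def choose(self, lo, hi):
        """Nondeterministically pick an integer in [lo, hi]."""
        if self.pos == len(self.trail):
            self.trail.append([lo, hi])
        value = self.trail[self.pos][0]
        self.pos += 1
        return value

    def assume(self, cond):
        """Filter: discard the current execution when cond is false."""
        if not cond:
            raise Backtrack

    def run_all(self, program):
        results = []
        while True:
            self.pos = 0
            try:
                results.append(program(self))
            except Backtrack:
                pass
            # Move to the next choice sequence: drop exhausted trailing choices.
            while self.trail and self.trail[-1][0] == self.trail[-1][1]:
                self.trail.pop()
            if not self.trail:
                return results
            self.trail[-1][0] += 1


def sorted_list(ex, max_len=2, max_val=2):
    """Generator program: every sorted list within the given bounds."""
    n = ex.choose(0, max_len)
    xs = []
    for _ in range(n):
        x = ex.choose(0, max_val)
        ex.assume(not xs or xs[-1] <= x)  # keep the list sorted
        xs.append(x)
    return xs


lists = Explorer().run_all(sorted_list)
```

Each element of lists is one concrete test input; the bounds max_len and max_val make the generation bounded-exhaustive.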

Contributions

1. A new language for describing tests
2. A new test generation algorithm
3. An implementation
4. An evaluation

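Evaluation

UDITA is evaluated mainly as a black-box testing technique.
The experiments are:
1. Data structures: a comparison with bounded-exhaustive test generation in base JPF on six
subjects:
a) DAG
b) HeapArray
c) NQueens
d) RBTree
e) SearchTree
f) SortedList
2. Refactoring engines: a comparison with ASTGen [11].
3. JPF: UDITA is used to test parts of the UDITA implementation itself.
4. A comparison of UDITA with Pex [38].

Using UDITA, the authors also found bugs in Eclipse, NetBeans, Sun javac, and JPF.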

Excerpts from the paper

Recent techniques aim to reduce the burden of manual testing using systematic test generation
based on specifications [5, 24] or on symbolic execution [8, 26] and its hybrids with concrete
executions [6, 9, 13, 20, 25, 34, 35, 38]. Modern (hybrid) symbolic execution techniques can
handle advanced constructs of object-oriented programs, but practical application of these
techniques were largely limited to testing units of code much smaller than hundred thousand lines,
or generating input values much simpler than representations of Java programs.

Automatically handling programs of the complexity of a compiler remains challenging for current
systematic approaches.

We performed several experiments to evaluate UDITA. UDITA is most applicable for black-box
testing. The first set of experiments, on six data structures, compares delayed choice with base JPF
for bounded-exhaustive test generation. The second set of experiments, on testing refactoring
engines, compares UDITA with ASTGen [11]. The third set of experiments uses UDITA to test
parts of the UDITA implementation itself. UDITA can be also used for white-box testing. The
fourth set of experiments compares UDITA with symbolic execution in Pex [38]. We ran the
experiments on an AMD Turion 1.80GHz laptop with Sun JVM 1.6.0_12, Eclipse 3.3.2, NetBeans 6.5,
and Pex 0.19.41110.1.

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 22 September 2010

Question: Method/Means | Result: Technique | Validation: Analysis, Experience

002 Automatic system testing of programs without test oracles

Murphy, C., Shen, K., and Kaiser, G. 2009. Automatic system testing of programs without test
oracles. In Proceedings of the Eighteenth international Symposium on Software Testing and
Analysis (Chicago, IL, USA, July 19 - 23, 2009). ISSTA '09. ACM, New York, NY, 189-200.
DOI= http://doi.acm.org/10.1145/1572272.1572295

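Summary

Background

Metamorphic testing is a simple and effective way to assure software quality without a test
oracle, which matters especially for arbitrary inputs whose expected output is unknown. In
metamorphic testing, the input of an existing test case is modified to produce a new test case,
and the new expected output is derived from the original output. For example, an original input x
produces the output f(x); we then create a new input x' and predict f(x') from f(x).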
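The idea can be illustrated with a short Python sketch. The function under test (mean) and its two metamorphic properties, permutation invariance and linear scaling, are chosen here for illustration and are not taken from the paper.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def check_metamorphic(f, xs, trials=100, seed=0):
    """Check two metamorphic properties of f without knowing the expected f(xs)."""
    rng = random.Random(seed)
    base = f(xs)
    for _ in range(trials):
        # Property 1: permuting the input must not change the output.
        shuffled = xs[:]
        rng.shuffle(shuffled)
        if abs(f(shuffled) - base) > 1e-9:
            return False
        # Property 2: multiplying every element by k multiplies the output by k.
        k = rng.randint(2, 10)
        if abs(f([k * x for x in xs]) - k * base) > 1e-9:
            return False
    return True


data = [3, 1, 4, 1, 5, 9, 2, 6]
ok = check_metamorphic(mean, data)
# A faulty mean that silently drops the last element violates property 1.
buggy_ok = check_metamorphic(lambda xs: sum(xs[:-1]) / len(xs), data)
```

Neither check needs the correct value of mean(data); the relation between two outputs serves as the oracle.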

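Problem

Metamorphic testing can often strengthen an existing set of test cases. For large input data sets,
however, transforming the test inputs is laborious, and for formats that are not human-readable,
transforming them by hand is impossible. Comparing large outputs is also error-prone, because
large outputs may not be deterministic.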

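Solution

The paper proposes Automated Metamorphic System Testing, which automates metamorphic testing at
the system level by checking the application's metamorphic properties after an execution. Testers
can set up and configure metamorphic tests easily, without manual intervention, and testing can
run continuously with minimal user involvement. The paper also introduces Heuristic Metamorphic
Testing, which reduces false positives and addresses some cases of nondeterminism. Finally, it
describes an implementation framework, Amsterdam, and presents empirical results showing that the
techniques are effective on real applications that have no test oracle.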

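Contributions

1. Automated Metamorphic System Testing, which automates metamorphic testing at the system
level by checking the application's metamorphic properties after an execution.
2. An implementation framework, Amsterdam.
3. Heuristic Metamorphic Testing, which reduces false positives and addresses some cases of
nondeterminism.
4. An empirical study on non-testable programs from the machine learning domain.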

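Evaluation

An empirical study on non-testable programs from the machine learning domain:
1. The Support Vector Machines (SVM) classification algorithm [35], as implemented in Weka [38].
2. C4.5 [31], a classification algorithm that uses decision trees.
3. MartiRank [13], developed at Columbia University's Center for Computational Learning
Systems (CCLS).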

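Excerpts from the paper

Comments

Pages: 11 | Importance: important | Understanding: average | Last read: 22 September 2010

Question: Characterization | Result: Technique | Validation: Analysis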

003 Detecting Atomic-Set Serializability Violations in Multithreaded Programs through Active
Randomized Testing

Lai, Z., Cheung, S. C., and Chan, W. K. 2010. Detecting atomic-set serializability violations in
multithreaded programs through active randomized testing. In Proceedings of the 32nd
ACM/IEEE international Conference on Software Engineering - Volume 1 (Cape Town, South
Africa, May 01 - 08, 2010). ICSE '10. ACM, New York, NY, 235-244. DOI=
http://doi.acm.org/10.1145/1806799.1806836

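Summary

Background

Bugs in concurrent programs are hard to detect because the number of possible thread
interleavings is huge and only a very small fraction of them expose a bug.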

Problem

Atomic-set serializability characterizes a large class of concurrency bugs, including data races
and atomicity violations.

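Solution

The paper develops a two-phase testing technique that detects atomic-set serializability
violations effectively.
Phase 1: the technique infers potential violations that do not appear in the observed concrete
execution, and prunes violation-free thread interleavings.
Phase 2: the technique actively controls the thread scheduler to enumerate the potential
scenarios flagged in phase 1 and check whether they are real violations.
The authors implemented a prototype, ASSETFUZZER, which outperforms the earlier RACEFUZZER and
ATOMFUZZER.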
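To make the notion of a non-serializable interleaving concrete, here is a small Python sketch. It is not ASSETFUZZER: instead of inferring candidates and steering a real scheduler, it exhaustively enumerates the interleavings of two tiny simulated threads and flags those whose result matches no serial order. All names are hypothetical.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]


def run(schedule):
    """Execute (thread, op) steps against one shared counter."""
    shared, local = 0, {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared        # copy the counter into a thread-local
        else:
            shared = local[thread] + 1    # "write": store the incremented copy
    return shared


# Each thread performs a non-atomic increment: read, then write.
T1 = [("t1", "read"), ("t1", "write")]
T2 = [("t2", "read"), ("t2", "write")]

serial_results = {run(T1 + T2), run(T2 + T1)}
all_schedules = list(interleavings(T1, T2))
violations = [s for s in all_schedules if run(s) not in serial_results]
```

Only the schedules in violations lose an update; a directed scheduler aims to reach such schedules without enumerating the rest.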

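Contributions

1. A dynamic analysis technique that infers potential atomic-set serializability violations.
2. An active randomized testing technique that tracks down behaviour violating atomic-set
serializability during program execution.
3. Experiments on multithreaded Java programs.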

Evaluation

Experiments on 13 multithreaded Java programs:
StringBuffer ArrayList LinkedList HashSet TreeSet LinkedHashSet moldyn raytracer montecarlo
cache4j hedc weblech jigsaw

ASSETFUZZER (AsF) was compared with the following techniques:
1. Normal
2. RM, a runtime monitoring technique [9] [28]
3. RACEFUZZER (RF) [27]
4. ATOMFUZZER (AtF) [22]

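Excerpts from the paper

Comments

Pages: 10 | Importance: not important | Understanding: average | Last read: 22 September 2010

Question: Method/Means | Result: Technique | Validation: Analysis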

004 From behaviour preservation to behaviour modification: constraint-based mutant generation

Steimann, F. and Thies, A. 2010. From behaviour preservation to behaviour modification:
constraint-based mutant generation. In Proceedings of the 32nd ACM/IEEE international
Conference on Software Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010).
ICSE '10. ACM, New York, NY, 425-434. DOI= http://doi.acm.org/10.1145/1806799.1806862

Summary

Background

The effectiveness of mutation analysis depends heavily on its ability to mutate programs in such
a way that the mutants still execute yet exhibit deviating behaviour.

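Problem

The former requirement (mutants that still compile and execute) demands knowledge of the
programming language's syntax and static semantics; deviating behaviour additionally demands some
understanding of its dynamic semantics, for example how expressions are evaluated.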

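Solution

The paper proposes a knowledgeable approach that generates mutants that are syntactically and
semantically correct and that exhibit different behaviour. The approach builds on the authors'
earlier constraint-based refactoring tools and works by negating behaviour-preserving
constraints. As a proof of concept, the paper extends the Access Modifier Change operator, an
operator that otherwise tends to produce mutants that fail to compile or leave behaviour
unchanged.
Although the paper cannot guarantee that the generated mutants are non-equivalent, it shows that
a considerable reduction in the number of useless mutants yields substantial time savings.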

Evaluation

Applied to several open-source programs:
1. JUnit
2. JHotDraw
3. Draw2D
4. Jaxen
5. HTMLParser

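Excerpts from the paper

Comments

Pages: 10 | Importance: important | Understanding: average | Last read: 22 September 2010

Question: Method/Means | Result: Technique | Validation: Analysis, Persuasion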

005 Is operator-based mutant selection superior to random mutant selection?

Zhang, L., Hou, S., Hu, J., Xie, T., and Mei, H. 2010. Is operator-based mutant selection superior
to random mutant selection?. In Proceedings of the 32nd ACM/IEEE international Conference on
Software Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010). ICSE '10. ACM,
New York, NY, 435-444.

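Summary

Background

Because compiling and executing a large number of mutants is costly, mutation testing and
analysis often need to select a subset of the mutants.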

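Problem

Most research on mutant selection focuses on operator-based selection, that is, determining a
sufficient set of mutation operators and selecting only the mutants those operators generate.
Recently, researchers have begun to leverage statistical analysis of mutant execution information
to determine sufficient mutation operators. However, whether mutants selected with such
sophisticated techniques are superior to randomly selected mutants remains an open question.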

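Solution

The paper investigates this open question empirically, comparing three representative
operator-based mutant-selection techniques with two random techniques.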

Contributions

1. An evaluation of three recent operator-based mutant-selection techniques:
a) Offutt et al. [31]
b) Barbosa et al. [4]
c) Siami Namin et al. [37]
2. The first empirical study of the stability of operator-based mutant selection.
3. A study of a new two-round random technique.
4. A subject set larger than in earlier studies of random mutant selection; because
mutant-selection experiments are expensive, the Siemens programs are already the largest subject
set used so far.

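Evaluation

The results show that operator-based mutant selection is not superior to random mutant selection.
They suggest that random selection can be the better choice and that selection based on
individual mutants deserves further investigation. The experiments use the Siemens programs.

Excerpts from the paper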

The subjects used in our study are the Siemens programs. The Siemens programs include seven C
programs whose numbers of net lines of code (not counting whitespace or commenting lines)
range from 137 to 513. Hutchins et al. [20] first introduced the Siemens programs in 1994, and
since then many researchers (e.g., Rothermel et al. [34, 35], Elbaum et al. [14], Li et al. [24], Jones
et al. [22], and Andrews et al. [3, 37]) used the Siemens programs as subjects in testing
experiments. In particular, a recent study on mutant selection by Siami Namin et al. [37] used only
the Siemens programs as subjects.
Similar to Siami Namin et al. [37], we considered the following three reasons when choosing our
subjects. First, the Siemens programs contain typical structures that also appear in various large
programs in C. Second, there is a large test pool for each of the Siemens programs. As measuring
the effectiveness of selected mutants relies on the use of different test suites (see Section 2.5 for
the details of measurement in our study), a large test pool allows us to construct a large number of
test suites containing different test cases. Third, as Proteum generates a large number of mutants
for even a small program, using programs significantly larger than the Siemens programs as
subjects may result in huge computational cost.

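Comments

The paper also gives a detailed and useful overview of how mutant selection has been evaluated
experimentally, in Section 2.4, Subject Programs.
The overview is a good way to find out which researchers are working on mutant selection.

Pages: 10 | Importance: important | Understanding: average | Last read: 22 September 2010

Question: Characterization | Result: Analytic Model | Validation: Analysis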

006 Using symbolic evaluation to understand behavior in configurable software systems.

Reisner, E., Song, C., Ma, K., Foster, J. S., and Porter, A. 2010. Using symbolic evaluation to
understand behavior in configurable software systems. In Proceedings of the 32nd ACM/IEEE
international Conference on Software Engineering - Volume 1 (Cape Town, South Africa, May 01
- 08, 2010). ICSE '10. ACM, New York, NY, 445-454. DOI=
http://doi.acm.org/10.1145/1806799.1806864

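Summary

Background

Many modern software systems are designed to be highly configurable, which increases
flexibility.

Problem

High configurability, however, makes programs hard to test, analyze, and understand.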

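Solution

The paper presents an initial empirical study of how configuration options affect program
behaviour. It conjectures that, at certain levels of abstraction, configuration spaces are far
smaller than the worst case, in which every configuration is distinct.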

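Evaluation

The paper evaluates this conjecture on three configurable software systems: vsftpd, ngIRCd, and
grep. The experiments use symbolic evaluation to discover how configuration options affect line,
basic block, edge, and condition coverage.
The results show that, for these subject programs, test suites, and configuration options, and
under all four coverage criteria, the configuration space is much smaller than combinatorics
would suggest, and that it is effectively made up of many small, self-contained groups of
options.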
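The effect can be illustrated on a toy example in Python. This sketch enumerates every configuration by brute force, whereas the paper uses symbolic evaluation; the option names and the toy program are invented for illustration.

```python
from itertools import product


def covered_lines(cfg):
    """Toy program: return the set of 'line' labels reached under cfg."""
    lines = {"start"}
    if cfg["ssl"]:
        lines.add("ssl_init")
        if cfg["verify"]:          # only matters when ssl is enabled
            lines.add("ssl_verify")
    if cfg["log"]:
        lines.add("log_open")
    # 'color' is never read, so it cannot affect coverage.
    return frozenset(lines)


OPTIONS = ["ssl", "verify", "log", "color"]
configs = [dict(zip(OPTIONS, bits))
           for bits in product([False, True], repeat=len(OPTIONS))]

# Configurations that reach the same lines are indistinguishable at this level.
distinct = {covered_lines(c) for c in configs}
```

Sixteen configurations collapse into six distinct line-coverage behaviours here, because ssl/verify and log form two independent groups and color is irrelevant.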

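Excerpts from the paper

Comments

Pages: 10 | Importance: not important | Understanding: average | Last read: 21 September 2010

Question: Characterization | Result: Analytic Model | Validation: Analysis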

007 Maintaining and evolving GUI-directed test scripts.

Grechanik, M.; Qing Xie; Chen Fu; , "Maintaining and evolving GUI-directed test scripts,"
Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on , vol., no.,
pp.408-418, 16-24 May 2009

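Summary

Background

Testing GUI-based applications (GAPs) by hand is tedious and laborious.
Test engineers automate GUI testing with scripts that operate on GUI objects and can be run
repeatedly.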

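Problem

When a new release of a GAP modifies its GUI, the existing test scripts break.

Solution

The paper proposes a new approach for maintaining and evolving test scripts so that they can test
new versions of GAPs.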

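Evaluation

Case study: 45 professional programmers and test engineers evaluated the system.
The results are statistically significant. With manual testing as the baseline, and compared
with a leading industry tool, users of the authors' tool found errors more easily and produced
fewer false reports.
The tool is lightweight: it analyzes about 1 KLOC of test scripts in under 8 seconds.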

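Excerpts from the paper

Comments

Question

Does the approach generate the updated test scripts automatically?

Pages: 10 | Importance: average | Understanding: average | Last read: 21 September 2010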


008 MINTS: A general framework and tool for supporting test-suite minimization.

Hwa-You Hsu; Orso, A.; , "MINTS: A general framework and tool for supporting test-suite
minimization," Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on ,
vol., no., pp.419-429, 16-24 May 2009

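Summary

Background

Test-suite minimization techniques aim to eliminate redundant test cases. The elimination is
usually based on some criterion, such as coverage or fault-detection capability.

Problem

Most existing techniques have two limitations: they consider only a single criterion, and they
produce suboptimal results.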

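Solution

The paper proposes a test-suite minimization framework that overcomes these limitations.
It lets testers:
1. easily encode a wide spectrum of test-suite minimization problems;
2. handle any number of criteria;
3. compute optimal solutions by leveraging modern integer linear programming (ILP) solvers.
The authors implemented the framework in a tool, MINTS, which is freely available and can
interface with a number of modern solvers.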
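A small Python sketch of the underlying optimization problem with two criteria (suite size first, then total run time). It replaces the ILP solver with exhaustive search over subsets, which is only feasible for tiny suites; the test names, coverage sets, and run times are invented.

```python
from itertools import combinations


def minimize(tests, requirements):
    """tests: name -> (covered requirements, run time).

    Returns a smallest subset that covers every requirement; among subsets of
    that size, the one with the lowest total run time.
    """
    names = sorted(tests)
    for k in range(1, len(names) + 1):
        best = None
        for subset in combinations(names, k):
            covered = set().union(*(tests[t][0] for t in subset))
            if requirements <= covered:
                cost = sum(tests[t][1] for t in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
        if best is not None:
            return best[1]
    return None  # the full suite does not cover the requirements


TESTS = {
    "t1": ({"a", "b"}, 5),
    "t2": ({"b", "c"}, 2),
    "t3": ({"c", "d"}, 4),
    "t4": ({"a", "d"}, 1),
    "t5": ({"a", "b", "c"}, 9),
}
REQUIREMENTS = {"a", "b", "c", "d"}

selected = minimize(TESTS, REQUIREMENTS)
```

In an ILP encoding each test becomes a 0/1 variable, each requirement becomes a coverage constraint, and the criteria are combined in the objective function.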

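Contributions

1. A general test-suite minimization framework that handles any number of criteria.
2. A prototype tool that implements the framework and interfaces seamlessly with ILP solvers.
3. An empirical study covering a wide range of programs, test cases, minimization problems, and
solvers.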

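Evaluation

The empirical evaluation shows that MINTS finds optimal solutions to test-suite minimization
problems efficiently.
It addresses three questions:
1. How often can MINTS find an optimal solution in a reasonable time?
2. How does MINTS compare with a heuristic approach?
3. To what extent does the choice of solver affect the performance of MINTS?
Subjects:
1. The seven programs of the Siemens suite
2. Three real programs with their test suites: flex, LogicBlox, and Eclipse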

Excerpts from the paper

Comments

Pages: 11 | Importance: important | Understanding: fairly good | Last read: 21 September 2010


009 WISE: Automated test generation for worst-case complexity

Burnim, J.; Juvekar, S.; Sen, K.; , "WISE: Automated test generation for worst-case complexity,"
Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on , vol., no.,
pp.463-473, 16-24 May 2009

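Summary

Background

Program analysis and automated test generation have primarily been used to find bugs that affect
correctness.

Problem

Can testing of other software properties, such as complexity, be automated as well?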

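Solution

The paper proposes complexity testing, a novel test generation technique for finding performance
bugs. The authors' complexity testing algorithm, WISE (Worst-case Inputs from Symbolic
Execution), targets programs that accept inputs of arbitrary size. For each input size, WISE
tries to construct an input whose execution exhibits the worst-case computational complexity.
WISE applies test generation to small input sizes and turns the resulting execution data into an
input generator. The generator is then used to produce larger inputs that trigger the worst
case.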
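The small-to-large workflow can be sketched in Python as follows. This is not WISE itself: the exhaustive step uses brute force over permutations instead of symbolic execution, and the generalization to a generator is written by hand. Insertion sort and all function names are illustrative choices.

```python
from itertools import permutations


def insertion_sort_steps(xs):
    """Sort a copy of xs and return how many element swaps were needed."""
    xs, steps = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            steps += 1
            j -= 1
    return steps


def worst_small_input(n):
    """Exhaustively find the costliest input among all permutations of 0..n-1."""
    return max(permutations(range(n)), key=insertion_sort_steps)


def descending(n):
    """Hand-written generalization of the pattern seen on small inputs."""
    return list(range(n - 1, -1, -1))


small_worst = worst_small_input(4)                      # exhaustive, small size
large_cost = insertion_sort_steps(descending(50))       # generator, larger size
```

The exhaustive step is affordable only for small n; the generator is what scales the worst case to larger sizes.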

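Evaluation

Experiments on standard data structures and algorithms; worst-case inputs were obtained for only
some of them.
The benchmark (taken from the JDK or from third parties) contains:
1. Sorted Linked-List insert
2. Heap insert
3. Red-Black Tree search
4. Quicksort
5. Binary Search Tree
6. Mergesort
7. Bellman-Ford
8. Dijkstra's
9. Traveling Salesman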

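Excerpts from the paper

Comments

The Introduction includes a classification of the test generation literature.

Pages: 11 | Importance: important | Understanding: average | Last read: 22 September 2010

Question: Characterization | Result: Technique | Validation: Analysis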

010 Taint-based directed whitebox fuzzing

Ganesh, V.; Leek, T.; Rinard, M.; , "Taint-based directed whitebox fuzzing," Software Engineering,
2009. ICSE 2009. IEEE 31st International Conference on , vol., no., pp.474-484, 16-24 May 2009

Summary

Background

White-box fuzzing techniques.

Problem

Standard fuzzing techniques randomly mutate parts of a program's input without any knowledge of
the input's syntactic structure.

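Solution

The paper proposes a new white-box fuzzing technique, BuzzFuzz. BuzzFuzz uses dynamic taint
tracing to automatically locate the regions of the original input that influence key points in
the program. By fuzzing only those regions, BuzzFuzz automatically generates new fuzzed test
input files. Because the technique preserves the underlying syntactic structure of the input, the
generated inputs are more likely to get past the initial input parsing components and so exercise
deeper, more central parts of the software.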
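A minimal Python sketch of the directed step. It is not BuzzFuzz: real dynamic taint tracing instruments the program, whereas this toy parser simply reports by hand which input offsets influenced the value of interest. The file format, parse, and directed_fuzz are hypothetical.

```python
import random


def parse(data):
    """Toy parser: 2-byte magic, then a 1-byte length field.

    Returns (length, offsets of the input bytes that influenced it).
    """
    if data[:2] != b"BZ":
        raise ValueError("bad magic")
    return data[2], {2}  # only byte 2 taints the length value


def directed_fuzz(seed_input, tainted, rounds=20, seed=0):
    """Mutate only the tainted offsets, leaving the rest of the input intact."""
    rng = random.Random(seed)
    fuzzed = []
    for _ in range(rounds):
        data = bytearray(seed_input)
        for off in tainted:
            data[off] = rng.choice([0x00, 0x7F, 0x80, 0xFF])  # boundary values
        fuzzed.append(bytes(data))
    return fuzzed


SEED_FILE = b"BZ\x04abcd"
_, tainted_offsets = parse(SEED_FILE)
inputs = directed_fuzz(SEED_FILE, tainted_offsets)
```

Every generated input still starts with the valid magic bytes, so it passes the parser and reaches the code that consumes the length field.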

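Contributions

Technique: a new taint-based directed fuzzing technique.
Results: experimental results from applying the technique to two open-source applications.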

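Evaluation

The authors used BuzzFuzz to automatically find errors in two open-source C programs, Swfdec and
MuPDF.
The Flash and PDF input files came from:
Pedram Amini: Adobe Flash files
Adam Kiezun of MIT: Adobe PDF files

The results show that BuzzFuzz is better at exposing errors deep inside large programs and is
better suited to applications with complex input formats.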

Excerpts from the paper

Comments

Pages: 11 | Importance: important | Understanding: average | Last read: 22 September 2010


011 Testing pervasive software in the presence of context inconsistency resolution services

Lu, H., Chan, W., and Tse, T. 2008. Testing pervasive software in the presence of context
inconsistency resolution services. In Proceedings of the 30th international Conference on Software
Engineering (Leipzig, Germany, May 10 - 18, 2008).

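Summary

Background

Pervasive computing software adapts its behaviour to its context. Context data are often noisy,
so context inconsistency resolution (CIR) is needed.

Problem

Faults in pervasive software, that is, incorrect results produced in the course of resolving
contexts.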

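Solution

The paper studies how faults in pervasive software are affected by CIR services. The authors
model continuous context inconsistency resolution, develop a set of data flow equations to
analyse its potential effects, and formalize test adequacy criteria for this kind of software.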

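Contributions

1. A data flow equation framework for studying CIR services together with context-aware
applications.
2. New test adequacy criteria suited to applications that use contexts and CIR services.
3. The first experiment evaluating this line of research.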

Evaluation

Compared with random testing.
Cabot [23, 24] served as the testbed.

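Excerpts from the paper

Comments

Pages: 10 | Importance: average | Understanding: poor | Last read: 22 September 2010

Question: Method/Means | Result: Technique, Analytic Model | Validation: Analysis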

012 ARTOO: adaptive random testing for object-oriented software

Ciupa, I., Leitner, A., Oriol, M., and Meyer, B. 2008. ARTOO: adaptive random testing for
object-oriented software. In Proceedings of the 30th international Conference on Software
Engineering (Leipzig, Germany, May 10 - 18, 2008). ICSE '08.

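Summary

Background

Criteria for evaluating a testing strategy include how many faults the tests find and how
quickly. Random testing becomes more effective when the generated inputs are spread evenly over
the input domain.
Adaptive Random Testing (ART) suits inputs such as numbers, for which a distance between inputs
can be computed, so that randomly generated inputs can be spread evenly over the input domain.

Problem

Can the idea of ART be extended to testing object-oriented software?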

Solution

The paper proposes ARTOO, a method based on a notion of distance between objects.
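A minimal Python sketch of distance-based selection. The distance function below is a deliberately simplified stand-in, not the object distance defined in the paper, and the Account class and art_select are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    balance: int
    owner: str


def distance(a, b):
    """Simplified object distance: scaled balance gap plus an owner-mismatch penalty."""
    return abs(a.balance - b.balance) / 1000 + (0.0 if a.owner == b.owner else 1.0)


def art_select(pool, used):
    """Pick the candidate whose average distance to the already-used inputs is largest."""
    return max(pool, key=lambda c: sum(distance(c, u) for u in used) / len(used))


used = [Account(0, "a")]
pool = [Account(10, "a"), Account(900, "b"), Account(500, "a")]
chosen = art_select(pool, used)
```

Plain random testing would pick any pool element with equal probability; the selection above prefers the object least similar to what has been tried.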

Contributions

1. An implementation of ARTOO in the AutoTest tool.
2. An experimental comparison with plain random testing.
3. Ideas for further uses of the object distance, such as object clustering and integration.

评价

与朴实的随机方法 RAND 比较，ARTOO 减少了第一个错误被发现之前的测试次数，一些例
子中较少了两个数量级。同时，ARTOO 比朴实随机方法 RAND 发现了更多错误。
实验集为 EiffelBase 库[2]中的类：ACTION_SEQUENCE，ARRAY，ARRAYED_LIST，
BOUNDED_STACK，FIXED_TREE，HASH_TABLE，LINKED_LIST，STRING。

Excerpts from the paper

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 22 September 2010


013 Time will tell: fault localization using time spectra

Yilmaz, C., Paradkar, A., and Williams, C. 2008. Time will tell: fault localization using time
spectra. In Proceedings of the 30th international Conference on Software Engineering (Leipzig,
Germany, May 10 - 18, 2008). ICSE '08. ACM, New York, NY, 81-90.

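Summary

Background

Debugging and fault localization techniques.

Problem

Time spectra have been used for performance debugging. Can they also be used to debug functional
correctness, for example by checking whether a code fragment takes a "suspicious" amount of time?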

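Solution

Collect time spectra from failing and passing runs, model the program's behaviour from the passing
runs, and use the failing runs to assign suspiciousness to candidate fault locations.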
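A rough Python sketch of the idea. The behaviour model used here (per-method mean and standard deviation, with a z-score threshold) is an assumption made for illustration and is not the model from the paper; the method names and timings are invented.

```python
from statistics import mean, stdev


def build_model(passing_runs):
    """passing_runs: list of {method: seconds}. Returns {method: (mean, stdev)}."""
    model = {}
    for method in passing_runs[0]:
        times = [run[method] for run in passing_runs]
        model[method] = (mean(times), stdev(times))
    return model


def rank_suspects(model, failing_run, threshold=3.0):
    """Return methods whose failing-run time deviates strongly, most suspicious first."""
    scores = {}
    for method, seconds in failing_run.items():
        mu, sigma = model[method]
        z = abs(seconds - mu) / sigma if sigma > 0 else 0.0
        if z > threshold:
            scores[method] = z
    return sorted(scores, key=scores.get, reverse=True)


PASSING = [
    {"parse": 1.0, "emit": 2.0},
    {"parse": 1.2, "emit": 2.1},
    {"parse": 0.8, "emit": 1.9},
]
FAILING = {"parse": 1.1, "emit": 5.0}

suspects = rank_suspects(build_model(PASSING), FAILING)
```

Only emit is reported, because its time in the failing run lies far outside what the passing runs exhibit.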

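Evaluation

Experiments on three real projects (nanoXML, XMLsecurity, and ant) showed an effective reduction
in the number of suspect locations.
The evaluation metrics are those listed in the excerpt.

Excerpts from the paper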

The columns of this table depict the subject application used, the average runtime overhead of
collecting and persisting time spectrum for a run, the average number of method invocations
encountered in passing and failing runs, the average time needed to sample data tables, the
average time needed to create observed behavior model for a method, and the average time needed
to diagnose a failure, respectively. Note that model creation times are given per method whereas
the diagnosing times are given per failure.

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 23 September 2010


014 Precise memory leak detection for java software using container profiling

Xu, G. and Rountev, A. 2008. Precise memory leak detection for java software using container
profiling. In Proceedings of the 30th international Conference on Software Engineering (Leipzig,
Germany, May 10 - 18, 2008). ICSE '08. ACM, New York, NY, 151-160.

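Summary

Background

A memory leak in a Java program occurs when references to objects that are no longer used are
kept unnecessarily, so the objects stay in memory.

Problem

Static analysis cannot precisely identify unnecessary references; dynamic analysis produces
reports that are hard to interpret and has limited precision.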

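Solution

The authors propose a container-based heap-tracking technique, based on the observation that
memory leaks in Java programs are often caused by containers that keep references to objects.
The technique has two parts: 1. it tracks only containers and directly identifies the source of
the leak, rather than analysing unused references; 2. it computes a confidence value for each
container from a combination of its memory consumption and the staleness of its elements.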
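The ranking step can be sketched in Python as below. The way the two factors are combined (a plain product of values normalised to [0, 1]) is an assumption for illustration only, not the confidence formula from the paper; the container names and numbers are invented.

```python
def leak_confidence(memory_share, staleness):
    """Combine two factors in [0, 1]; the product is an assumed combination."""
    return memory_share * staleness


def rank_containers(stats):
    """stats: {name: (memory_share, staleness)} -> names, most suspicious first."""
    return sorted(stats, key=lambda name: leak_confidence(*stats[name]), reverse=True)


CONTAINERS = {
    "cache": (0.6, 0.9),  # large and rarely read: likely leak
    "queue": (0.3, 0.1),  # medium size but elements are retrieved quickly
    "index": (0.1, 0.8),  # stale but small
}

ranking = rank_containers(CONTAINERS)
```

A container must score on both factors to rank high, which filters out large but actively used containers as well as stale but tiny ones.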

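Contributions

1. A dynamic analysis that computes a confidence value for each container and uses it to flag
and rank containers.
2. A confidence-based memory leak detection technique for Java.
3. An experimental study of leak identification and run-time performance.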

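Evaluation

Two bugs reported by Sun and one well-known bug in SPECjbb.
The first two bugs come from the Sun bug database [23] and are Java AWT/Swing bugs; about half of
the bugs in the JDK come from AWT and Swing. Their bug IDs are #6209673 and #6559589.

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 23 September 2010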


015 The effect of program and model structure on mc/dc test adequacy coverage

Rajan, A., Whalen, M. W., and Heimdahl, M. P. 2008. The effect of program and model structure
on mc/dc test adequacy coverage. In Proceedings of the 30th international Conference on Software
Engineering (Leipzig, Germany, May 10 - 18, 2008). ICSE '08. ACM, New York, NY, 161-170.

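Summary

Background

In avionics and other critical domains, the adequacy of test suites can be measured with the
MC/DC metric.
MC/DC is measured over the source code or over the models used in model-driven development.

Problem

The MC/DC metric depends on the structure of the implementation, so it may be a misleading test
adequacy criterion.

Solution

The paper empirically investigates this conjecture about the effect of program structure.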

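Evaluation

The authors used six real systems from the civil avionics domain and two toy programs.
For each system they used two versions, with and without inlining. To estimate the sensitivity of
MC/DC, they first generated test suites that satisfy MC/DC on the non-inlined implementation and
then ran those suites on the inlined implementation. MC/DC coverage on the inlined versions was
29.5% lower, at the 5% statistical significance level.

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 23 September 2010

Question: Evaluation | Result: Analytic Model | Validation: Analysis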

016 Static detection of cross-site scripting vulnerabilities

Wassermann, G. and Su, Z. 2008. Static detection of cross-site scripting vulnerabilities. In
Proceedings of the 30th international Conference on Software Engineering (Leipzig, Germany,
May 10 - 18, 2008). ICSE '08. ACM, New York, NY, 171-180.

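Summary

Background

Web applications suffer from many vulnerabilities, such as cross-site scripting (XSS), in which
an attacker exploits the trust the browser places in the server to make the browser execute
injected script. Since 2006, XSS has been the most prevalent kind of attack.

Problem

XSS vulnerabilities result from insufficient input validation. The most common way to scan for
XSS vulnerabilities is taint-based; it often misses real vulnerabilities and also produces many
false positives.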

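Solution

The paper proposes a static analysis for finding XSS vulnerabilities that directly pinpoints weak
or insufficient input validation. The approach combines tainted information flow with string
analysis. Correct input validation is hard because there are many ways to invoke the JavaScript
interpreter. Based on the W3C recommendation, the Firefox source code, and online tutorials about
closed-source browsers, the authors formalize a policy for identifying vulnerabilities.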

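Contributions

1. An approach to finding XSS vulnerabilities that detects insufficient input validation.
2. An algorithm, based on the behaviour of layout engines, for detecting untrusted script in HTML
documents.
3. An evaluation on real PHP web applications showing that the tool scales to large web
applications and finds both known and previously unknown validation errors.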

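Evaluation

The authors first pose three questions, concerning scalability to large web applications,
checking of validation code, and how common input validation errors are.
They then experiment on seven open-source projects: Claroline, FishCart, GecBBLite, PhPetition,
PhPoll, Warp, and Yapig. The experiments record the analysis time and whether errors were found.

Excerpts from the paper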

In evaluating our implementation, we sought to answer the following questions:
1. How well does it scale on large, real-world web applications?
2. How well does it check manually written input validation code?
3. How common are manual input validation errors?

Comments

Pages: 10 | Importance: average | Understanding: average | Last read: 23 September 2010


017 Korat: automated testing based on Java predicates

Boyapati, C., Khurshid, S., and Marinov, D. 2002. Korat: automated testing based on Java
predicates. In Proceedings of the 2002 ACM SIGSOFT international Symposium on Software
Testing and Analysis (Roma, Italy, July 22 - 24, 2002). ISSTA '02. ACM, New York, NY, 123-133.

Summary

Keywords

test generation

Background

Test generation based on specifications.

Problem

Can testing be automated on the basis of specifications?

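Solution

The paper proposes a framework that generates test cases from a method's precondition, uses the
postcondition to check the outputs, and executes the test cases automatically.
The core of the technique is test generation: given the precondition and a bound on the inputs,
it generates the inputs that satisfy the precondition.
For efficiency, the technique also monitors executions and uses what it observes to prune the
search space.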
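The generate, filter, and check loop can be sketched in Python as follows. This is not Korat: it enumerates every candidate within the bounds and filters with the precondition afterwards, without the monitoring-based pruning or isomorphism breaking. The method under test (a binary search) and all names are illustrative.

```python
from itertools import product


def pre(xs):
    """Precondition: the input sequence is sorted."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def contains(xs, key):
    """Method under test: binary search over a sorted sequence."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == key:
            return True
        if xs[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def post(xs, key, result):
    """Postcondition used as the oracle."""
    return result == (key in xs)


def run_tests(max_len=3, values=range(3)):
    """Run the method on every in-bounds input that satisfies the precondition."""
    executed, failures = 0, []
    for n in range(max_len + 1):
        for xs in product(values, repeat=n):
            if not pre(xs):
                continue  # candidate rejected by the precondition
            for key in values:
                executed += 1
                if not post(xs, key, contains(xs, key)):
                    failures.append((xs, key))
    return executed, failures


executed, failures = run_tests()
```

Of the 40 candidate sequences within these bounds only 20 are sorted; pruning aims to avoid constructing most of the rejected ones in the first place.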

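Contributions

Generation of non-isomorphic test inputs, using backtracking search with execution monitoring and
pruning.

Evaluation

The experiments use several data structures, some of them from the Java Collections Framework:
BinaryTree, HeapArray, LinkedList, TreeMap, HashSet, AVTree. Korat is also compared with another
tool, the Alloy Analyzer. Korat generates test cases faster. (Oddly, only the speed of test
generation is evaluated here; are there other ways to evaluate the generated test cases?)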

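Comments

Pages: 11 | Importance: important | Understanding: fairly good | Last read: 23 September 2010

Question: Method/Means, Evaluation, Characterization | Result: Technique, Analytic Model |
Validation: Analysis, Persuasion, Experience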
