Automated Software Transplantation


Automated Software Transplantation

QRS 2016 Talk by Mark Harman

PhD work by Alexandru Marginean

CREST, University College London

Collaborators: Earl Barr, Yue Jia, Justyna Petke


Why Autotransplantation?

Video Player: why does it not handle H.264?
Start from scratch? Or check open source repositories: ~100 players already exist.

Why Autotransplantation?

Kate: want a C call graph? C indentation?
Start from scratch? Or check open source repositories.

Recognition for Our Tool μScalpel

BBC World Service interview
WIRED article
Many shares on social media
ACM Distinguished Paper Award at ISSTA '15

Related Work

Autotransplantation draws on: Manual Code Transplants, In-Situ Code Reuse, Automatic Error Fixing.

Related areas: Clone Detection, Code Migration, Dependence Analysis, Feature Location, Code Salvaging, Feature Extraction.

Related Work: Manual Code Transplants

Synchronising manual transplants; automatic replay; copy-paste.

Related Work: In-Situ Code Reuse

A debugger extracts a binary organ from a running program.
Miles et al.: In situ reuse of logically extracted functional components.

Related Work: Automatic Error Fixing

Code is transferred from multiple donors into a host to eliminate errors.
Sidiroglou-Douskos et al.: Automatic Error Elimination by Multi-Application Code Transfer.


Human Organ Transplantation

Automated Software Transplantation

Host and Donor: an Organ is extracted from the Donor, starting at its entry point (ENTRY), and implanted into the Host at the Implantation Point. The Vein is the donor code connecting the entry to the organ; the Organ Test Suite exercises the transplanted functionality.

Manual Work:
Organ Entry
Organ's Test Suite
Implantation Point

μTrans

Inputs: the Host (with its Implantation Point) and the Donor (with its Organ Entry and Organ Test Suite).

Stage 1: Static Analysis
Stage 2: Genetic Programming
Stage 3: Organ Implantation

Output: the Host Beneficiary.
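As a structural outline of this pipeline in Python (a hedged sketch: the function names below only mirror the slide's three stages and are not μScalpel's real interface):

# Outline of the three muTrans stages named above; all names are
# illustrative stand-ins, not muScalpel's actual API.
def static_analysis(donor, organ_entry, host, implantation_point):
    """Stage 1: slice the donor from the organ entry, build the
    dependency graph and the donor-to-host variable matching table."""
    ...

def genetic_programming(organ, matching_table, organ_tests):
    """Stage 2: evolve statement selections and variable mappings until
    the organ compiles in the host's context and passes its tests."""
    ...

def implant(host, implantation_point, organ):
    """Stage 3: splice the evolved organ into the host at the
    implantation point, yielding the host beneficiary."""
    ...

def mu_trans(host, implantation_point, donor, organ_entry, organ_tests):
    organ, table = static_analysis(donor, organ_entry, host, implantation_point)
    organ = genetic_programming(organ, table, organ_tests)
    return implant(host, implantation_point, organ)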

Stage 1 — Static Analysis

In the Donor: from the Organ Entry (OE/ENTRY), identify the Vein and the Organ, and build the Dependency Graph.
Against the Host's Implantation Point, build the Matching Table.

Example: the statement x = 10; implies the declaration int x; so donor variable int X can match any of the host's int variables A, B, C.
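A minimal sketch of how such a matching table could be built by type compatibility, following the slide's example (donor int X may match host int A, B, or C); the representation is an assumption for illustration:

# Toy Stage 1 matching table: each donor variable may map to any
# in-scope host variable of a compatible type. Names are illustrative.
def build_matching_table(donor_vars, host_vars):
    """donor_vars/host_vars: dicts of variable name -> C type string."""
    table = {}
    for d_name, d_type in donor_vars.items():
        # Every type-compatible host variable is a candidate match.
        table[d_name] = {h for h, h_type in host_vars.items() if h_type == d_type}
    return table

donor = {"X": "int"}
host = {"A": "int", "B": "int", "C": "int", "s": "char *"}
print(build_matching_table(donor, host))  # {'X': {'A', 'B', 'C'}}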

Stage 2 — GP

An individual selects donor statements from the pool S1, S2, S3, S4, S5, …, Sn and carries a variable mapping drawn from the matching table.

Matching Table:
Donor Variable ID -> Host Variable ID (set)
V1D -> {V1H, V2H}
V2D -> {V3H, V4H, V5H}

Individual:
Variable matchings: M1: V1D -> V1H; M2: V2D -> V4H
Statements: S1, S7, S73
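A toy encoding of this individual representation (statement selection plus variable mapping); the field names and sampling policy are illustrative assumptions:

import random
from dataclasses import dataclass

@dataclass
class Individual:
    statements: list   # indices into the donor statement pool, e.g. [1, 7, 73]
    mapping: dict      # donor variable -> chosen host variable, e.g. {"V1D": "V1H"}

def random_individual(pool_size, matching_table, rng=random):
    # Pick a handful of donor statements (here 3) and one host match
    # per donor variable, drawn from its candidate set.
    stmts = sorted(rng.sample(range(pool_size), k=3))
    mapping = {d: rng.choice(sorted(hosts)) for d, hosts in matching_table.items()}
    return Individual(stmts, mapping)

table = {"V1D": {"V1H", "V2H"}, "V2D": {"V3H", "V4H", "V5H"}}
print(random_individual(100, table))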

Fitness Function:
Compilation
Weak proxies
Strong proxies
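A hedged sketch of such a staged fitness: failing to compile gates everything, weak proxies add a cheap signal, and strong proxies (passing the organ's tests) dominate. The weights and predicates here are assumptions, not μScalpel's actual fitness:

def fitness(individual, compile_fn, run_tests_fn):
    if not compile_fn(individual):           # gate 1: must compile
        return 0.0
    score = 1.0                              # compiled: baseline credit
    weak, strong = run_tests_fn(individual)  # proxy fractions in [0, 1]
    score += weak                            # weak proxies: cheap signals
    score += 10.0 * strong                   # strong proxies: tests passed
    return score

print(fitness("dummy", lambda i: True, lambda i: (0.5, 0.8)))  # 9.5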

Stage 2 — GP Operators: Replace Mapping

Individual: matchings M1: V1D -> V1H, M2: V2D -> V4H; statements S1, S7, S73.

Replace Mapping redraws one variable matching from the matching table: here V1D's match is replaced by another member of its host candidate set, so V1D -> V1H becomes V1D -> V2H.

Stage 2 — GP Operators: Replace Statement

Replace Statement swaps one of the individual's selected statements for another from the donor pool S1, S2, S3, S4, S5, …, Sn: here S7 is replaced by S3.

Stage 2 — GP Operators: Remove Statement

Remove Statement drops one of the individual's selected statements from the pool S1, S2, S3, S4, S5, …, Sn.

Stage 2 — GP Operators: Add Statement

Add Statement inserts a further donor statement from the pool S1, S2, S3, S4, S5, …, Sn into the individual: here S3.
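Toy versions of the four mutation operators just shown, acting on the Individual encoding sketched earlier; all illustrative, not μScalpel's implementation:

import random

def replace_mapping(ind, table, rng=random):
    # Redraw one donor variable's host match from the matching table.
    d = rng.choice(sorted(ind.mapping))
    ind.mapping[d] = rng.choice(sorted(table[d]))

def replace_statement(ind, pool_size, rng=random):
    # Swap one selected statement for another from the donor pool (e.g. S7 -> S3).
    ind.statements[rng.randrange(len(ind.statements))] = rng.randrange(pool_size)

def remove_statement(ind, rng=random):
    # Drop one selected statement.
    if ind.statements:
        ind.statements.pop(rng.randrange(len(ind.statements)))

def add_statement(ind, pool_size, rng=random):
    # Insert another donor statement at a random position (e.g. S3).
    ind.statements.insert(rng.randrange(len(ind.statements) + 1),
                          rng.randrange(pool_size))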

Stage 2 — GP Operators: Crossover

Individual 1: statements S1, S7; mappings M1, M2.
Individual 2: statements S3, S9; mappings M3, M4.

Offspring recombine the parents' statement selections and variable mappings, with Random Mapping Selection deciding which parent's mapping each part inherits: e.g. Offspring 1 pairs S1 with M1 and S9 with M4, Offspring 2 pairs S3 with M3 and S7 with M2, and Offspring 3 combines S1, S7 with S3, S9 under mappings M3 and M2.
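A toy crossover in the spirit of this slide, reusing the Individual encoding from above; the one-point statement recombination and the per-variable Random Mapping Selection are assumptions about the details:

import random

def crossover(p1, p2, rng=random):
    # One-point crossover on the statement selections...
    cut1 = rng.randrange(len(p1.statements) + 1)
    cut2 = rng.randrange(len(p2.statements) + 1)
    child_stmts = p1.statements[:cut1] + p2.statements[cut2:]
    # ...and Random Mapping Selection: each donor variable inherits
    # either parent's host match.
    child_map = {d: rng.choice([p1.mapping[d], p2.mapping[d]])
                 for d in p1.mapping}
    return type(p1)(child_stmts, child_map)  # same Individual class as parents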

Stage 3 — Organ Implantation: the organ extracted from the Donor is implanted into the Host.

Research Questions

RQ1: Do we break the initial functionality? (Regression Tests)
RQ2: Have we really added new functionality? (Acceptance Tests)
RQ3: How about the computational effort?
RQ4: Is autotransplantation useful?

Empirical Study

15 transplantations, 300 runs, 5 donors, 3 hosts.

Case Studies:
H.264 encoding transplantation;
Kate: call graph generation & C indentation.

Validation (of the Host Beneficiary)

Regression Tests (RQ1.1)
Augmented Regression Tests (RQ1.2)
Donor Acceptance Tests / Acceptance Tests (RQ2)
Manual Validation
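Sketch of this validation protocol as a harness; the suite contents and the test runner are assumptions:

def validate(beneficiary, regression, regression_pp, acceptance, run):
    """run(program, test) -> bool. One verdict per research question."""
    return {
        "RQ1.1 (regression)":   all(run(beneficiary, t) for t in regression),
        "RQ1.2 (regression++)": all(run(beneficiary, t) for t in regression_pp),
        "RQ2 (acceptance)":     all(run(beneficiary, t) for t in acceptance),
    }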

Subjects

Minimum size: 0.4 KLOC; maximum size: 422 KLOC.
Average donor: 16 KLOC; average host: 213 KLOC.

[Embedded page image: Yatoh, Sakamoto, Ishikawa and Honiden, "Feedback-Controlled Random Test Generation", ISSTA 2015, p. 316; slide artwork, not part of this talk.]

Subject     Type   Size (KLOC)   Reg. Tests   Organ Tests
Idct        Donor  2.3           -            3-5
Mytar       Donor  0.4           -            4
Cflow       Donor  25            -            6-20
Webserver   Donor  1.7           -            3
TuxCrypt    Donor  2.7           -            4-5
Pidgin      Host   363           88           -
Cflow       Host   25            21           -
SoX         Host   43            157          -

Case Study:
VLC         Host   422           27           -
Kate        Host   50            238          -
x264        Donor  63            -            5
Cflow       Donor  22            -            13
Indent      Donor  26            -            7

Experimental Methodology and Setup

μSCALPEL takes the Host (Implantation Point) and the Donor (Organ Entry, Organ Test Suite) and produces the Host Beneficiary, with the Organ at the Implantation Point.

Each transplantation is run 20 times.
Timing: GNU Time.
Counting LOC: CLOC.
Coverage information: Gcov.
Validation test suites.
Machine: 64-bit Ubuntu 14.10, 16 GB RAM, 8 threads.
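A sketch of the measurement loop implied by this setup: 20 repetitions per transplantation, timed with GNU time. Only /usr/bin/time is a real tool here; the mu-scalpel command and its flags are hypothetical:

import subprocess

def timed_runs(cmd, runs=20):
    """Repeat one transplantation `runs` times under GNU time,
    whose -v flag reports elapsed wall-clock time and peak memory."""
    for _ in range(runs):
        subprocess.run(["/usr/bin/time", "-v"] + cmd)

# Hypothetical command line; muScalpel's real flags are not shown on the slide.
timed_runs(["./mu-scalpel", "--donor", "idct", "--host", "pidgin"])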


Empirical Study: RQ1, RQ2 (out of 20 runs per transplantation)

Donor   Host     All Passed   Regression   Regression++   Acceptance
Idct    Pidgin   16           20           17             16
Mytar   Pidgin   16           20           18             20
Web     Pidgin   0            20           0              18
Cflow   Pidgin   15           20           15             16
Tux     Pidgin   15           20           17             16
Idct    Cflow    16           17           16             16
Mytar   Cflow    17           17           17             20
Web     Cflow    0            0            0              17
Cflow   Cflow    20           20           20             20
Tux     Cflow    14           15           14             16
Idct    SoX      15           18           17             16
Mytar   SoX      17           17           17             20
Web     SoX      0            0            0              17
Cflow   SoX      14           16           15             14
Tux     SoX      13           13           13             14
TOTAL            188/300      233/300      196/300        256/300
                              (RQ1.1)      (RQ1.2)        (RQ2)


All Passed: 188/300
Regression: 233/300 (RQ1.1)
Regression++: 196/300 (RQ1.2)
Acceptance: 256/300 (RQ2)

Empirical Study: RQ3

Execution Time (minutes)

Donor   Host     Average   Std. Dev.   Total
Idct    Pidgin   5         7           97
Mytar   Pidgin   3         1           65
Web     Pidgin   8         5           160
Cflow   Pidgin   58        16          1151
Tux     Pidgin   29        10          574
Idct    Cflow    3         5           59
Mytar   Cflow    3         1           53
Web     Cflow    5         2           102
Cflow   Cflow    44        9           872
Tux     Cflow    31        11          623
Idct    SoX      12        17          233
Mytar   SoX      3         1           60
Web     SoX      7         3           132
Cflow   SoX      89        53          74
Tux     SoX      34        13          94
Total            334 (min) 10 (avg)    72 (hours)


Case Study: VLC

Transplant Time & Test Suites

Transplant   Time (hours)   Regression   Regression++   Acceptance
H.264        26             1            1              1



Case Study: Kate (out of 20 runs per transplantation)

Donor    Host   All Passed   Organ Test Suite   Regression   Regression++   Acceptance
Cflow    Kate   16           18                 20           17             18
Indent   Kate   18           19                 20           18             19
TOTAL           34/40        37/40              40/40        35/40          37/40

All Passed: 34/40
Organ Test Suite: 37/40
Regression: 40/40 (RQ1.1)
Regression++: 35/40 (RQ1.2)
Acceptance: 37/40 (RQ2)

Execution Time (minutes)

Donor    Host   Average (min)   Std. Dev. (min)   Total (hours)
Cflow    Kate   101             31                33
Indent   Kate   31              6                 11
Total           132             18.5              44