On the Extent and Nature of Software Reuse in Open Source Java Projects. Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, Maximilian Irlbeck. Technische Universität München. ICSR 2011, Pohang, Korea.


Page 1: On the Extent and Nature of Software Reuse in Open Source Java

On the Extent and Nature of Software Reuse in Open Source Java Projects

Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, Maximilian Irlbeck

Technische Universität München

ICSR 2011, Pohang, Korea

Page 2: On the Extent and Nature of Software Reuse in Open Source Java

Software Reuse

• Reuse of existing artifacts for constructing new software

• Proven benefits

• Increased productivity

• Reduced time to market

• Improved quality

Page 3: On the Extent and Nature of Software Reuse in Open Source Java

Software Reuse

• Tremendous reuse opportunities

• Class libraries (e.g. Apache Commons)

• Frameworks (e.g. Eclipse: 40 MLOC)

• Open source code (Google Code Search: several GLOC)

• Internet serves as reuse repository

Page 4: On the Extent and Nature of Software Reuse in Open Source Java

Research Problem

• Unclear how software projects make use of available reuse opportunities

• Lack of data on amount of reuse in software projects

• Assessing success of software reuse difficult

Page 5: On the Extent and Nature of Software Reuse in Open Source Java

Contribution

• Empirical knowledge about extent and nature of software reuse in OSS

• Quantitative data on software reuse in 20 open source projects

• Substantiates discussion of success/failure of software reuse

• Provides practitioners with a benchmark

Page 6: On the Extent and Nature of Software Reuse in Open Source Java

Terms

• Software reuse: Using code developed by third parties (excluding OS/platform)

• White-box reuse: Code incorporated in source form (internals exposed, potentially modified)

• Black-box reuse: Code incorporated in binary form (internals hidden, no modifications)

Page 7: On the Extent and Nature of Software Reuse in Open Source Java

Study Design (GQM)

We analyze open source projects for the purpose of understanding the state of the practice in software reuse with respect to its extent and nature from the viewpoint of the developers and maintainers in the context of Java open source software.

Page 8: On the Extent and Nature of Software Reuse in Open Source Java

Study Design (GQM)

• RQ 1: Do open source projects reuse software? Metric: existence of software reuse

• RQ 2: How much white-box reuse occurs? Metric: white-box reuse rate

• RQ 3: How much black-box reuse occurs? Metric: black-box reuse rate

Page 9: On the Extent and Nature of Software Reuse in Open Source Java

Reuse Rate

White-box reuse rate = Reused source code [LOC] / Overall source code [LOC]

Black-box reuse rate = Reused binary code [bytes] / Overall binary code [bytes]

Overall code of software system = Reused code + Project's own code
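The reuse rate defined on this slide can be sketched in a few lines. This is a minimal illustration, assuming the overall code is the sum of reused code and the project's own code; the function name `reuse_rate` and the example figures are hypothetical and not taken from the study.

```python
def reuse_rate(reused: float, own: float) -> float:
    """Fraction of the overall system made up of reused code.

    `reused` and `own` must use the same unit: LOC for white-box reuse,
    bytes of bytecode for black-box reuse.
    """
    overall = reused + own  # overall code = reused code + project's own code
    if overall == 0:
        raise ValueError("system contains no code")
    return reused / overall


# Hypothetical figures, for illustration only (not taken from the study).
white_box = reuse_rate(reused=5_000, own=95_000)          # LOC
black_box = reuse_rate(reused=3_000_000, own=1_000_000)   # bytes
print(f"white-box: {white_box:.0%}, black-box: {black_box:.0%}")
```

The same function serves both metrics because only the unit of measurement differs between white-box and black-box reuse.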

Page 10: On the Extent and Nature of Software Reuse in Open Source Java

Study Objects

• 20 Java projects from

• Criteria: Production/Stable, standalone app, pure Java, Java SE platform, source download available

• All among 50 most downloaded

• Source code size: 0.4 to 790 kLOC, bytecode size: 17 to 22,761 KB

• Test code excluded with heuristics (e.g. folders named test/tests)
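A folder-name heuristic for excluding test code might look like the sketch below. The slide only names "test/tests" as an example, so the set `TEST_DIR_NAMES`, the helper names, and the file layout are assumptions for illustration, not the study's actual rules.

```python
from pathlib import PurePosixPath

# Assumed folder names; the slides only give "test/tests" as an example.
TEST_DIR_NAMES = {"test", "tests"}


def is_test_code(path: str) -> bool:
    """True if any directory component of `path` is named like a test folder."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return any(part.lower() in TEST_DIR_NAMES for part in directories)


def production_files(paths: list[str]) -> list[str]:
    """Keep only Java files that do not live under a test folder."""
    return [p for p in paths if p.endswith(".java") and not is_test_code(p)]


# Hypothetical file layout.
files = [
    "src/org/example/Parser.java",
    "src/test/org/example/ParserTest.java",
    "tests/Helper.java",
    "src/org/example/TestDataFactory.java",  # kept: only folder names are checked
]
print(production_files(files))
```

Checking folder names only is deliberately coarse: a test class stored outside a test folder would still be counted as production code.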

Page 11: On the Extent and Nature of Software Reuse in Open Source Java

Study Implementation

a) Detecting white-box reuse

• White-box reuse = copied code

• Can be detected automatically by clone detectors

• Clone detection against 22 commonly used Java libraries (~6 MLOC)

• Detection of reuse of statement sequences with > 15 statements
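The idea of finding copied statement sequences can be illustrated with a much simplified sketch. It is not the clone detector used in the study: it treats every non-empty line as one statement, compares exact text after whitespace removal, and reads "> 15 statements" as a minimum length of 16. All function names are hypothetical.

```python
def normalize(source: str) -> list[str]:
    """Rough stand-in for statement extraction: one statement per non-empty
    line, whitespace removed. A real clone detector works on tokens."""
    statements = []
    for line in source.splitlines():
        stripped = "".join(line.split())
        if stripped and not stripped.startswith("//"):
            statements.append(stripped)
    return statements


def window_index(statements: list[str], size: int) -> set[tuple[str, ...]]:
    """All contiguous statement sequences of length `size`."""
    return {tuple(statements[i:i + size]) for i in range(len(statements) - size + 1)}


def copied_statements(project_src: str, library_src: str, min_len: int = 16) -> int:
    """Count project statements covered by a sequence of at least `min_len`
    statements that also occurs in the library."""
    project = normalize(project_src)
    library_windows = window_index(normalize(library_src), min_len)
    covered = set()
    for i in range(len(project) - min_len + 1):
        if tuple(project[i:i + min_len]) in library_windows:
            covered.update(range(i, i + min_len))  # overlapping windows merge
    return len(covered)


# Hypothetical library and project sources.
library = "\n".join(f"int v{i} = compute({i});" for i in range(30))
body = "\n".join(f"int v{i} = compute({i});" for i in range(20))
project = "void run() {\n" + body + "\n}"
print(copied_statements(project, library))
```

Sequences shorter than the threshold are ignored, which trades some missed reuse for fewer accidental matches.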

Page 12: On the Extent and Nature of Software Reuse in Open Source Java

Study Implementation

a) Detecting white-box reuse

• In addition: manual inspection of source directory tree

• Clues: file/package names

• Source of files identified via header comments/web search

• Detection of reuse of whole files/directories, not limited to fixed set of libraries

Page 13: On the Extent and Nature of Software Reuse in Open Source Java

Study Implementation

b) Detecting black-box reuse

• Bytecode-based static analysis

• Aggregates bytecode size of all library types referenced by project's source code

• Traverses type dependency graph using Java Constant Pool (type usages and method calls)

• Includes transitive dependencies
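The aggregation over the type dependency graph can be sketched as a breadth-first traversal. This is an illustration under assumptions: the graph and the per-type sizes are passed in as plain dictionaries (the study derives them from class files and their constant pools), and all names and numbers are hypothetical.

```python
from collections import deque


def reused_bytecode_size(
    project_types: set[str],
    references: dict[str, set[str]],
    bytecode_size: dict[str, int],
) -> int:
    """Sum the bytecode size of all library types reachable from the project.

    references: type name -> types it references (assumed input format).
    bytecode_size: type name -> size of its class file in bytes.
    """
    visited = set(project_types)
    queue = deque(project_types)
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in visited:
                visited.add(target)  # each type is counted once, cycles are safe
                queue.append(target)
    library_types = visited - project_types  # includes transitive dependencies
    return sum(bytecode_size[t] for t in library_types)


# Hypothetical dependency graph and class file sizes.
references = {
    "app.Main": {"app.Util", "lib.Json"},
    "app.Util": {"lib.Strings"},
    "lib.Json": {"lib.Strings", "lib.Reader"},
    "lib.Reader": {"lib.Json"},      # cycle between library types
    "lib.Unused": {"lib.Strings"},   # never reached from the project
}
sizes = {
    "app.Main": 900, "app.Util": 400, "lib.Json": 5000,
    "lib.Strings": 2000, "lib.Reader": 1500, "lib.Unused": 7000,
}
print(reused_bytecode_size({"app.Main", "app.Util"}, references, sizes))
```

Types that ship with a library but are never reached from the project, such as `lib.Unused` here, do not count toward the reused size.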

Page 14: On the Extent and Nature of Software Reuse in Open Source Java

Study Implementation

b) Detecting black-box reuse

• Although not covered by the reuse definition, potential variations in the use of the Java API are interesting

• Black-box reuse baseline of empty Java program: 5 MB (2,082 types)

• Object → Class → ClassLoader ... (Reflection API / Collections API)

Page 15: On the Extent and Nature of Software Reuse in Open Source Java

Results RQ 1

Do open source projects reuse software?

• 18 of the 20 projects (90%) reuse software from third parties

• Exceptions: HSQLDB (relational database engine), YouTube Downloader (video download utility)

Page 16: On the Extent and Nature of Software Reuse in Open Source Java

Results RQ 2

How much white-box reuse occurs?

• Clone detection found 791 clones, 11,701 copied LOC in 7 study objects

• Clones found: complete files with minor modifications (e.g. different version)

• Manual inspection additionally found whole copied libraries in 4 study objects

• Overall: white-box reuse found for 9 of 20 projects

• Reuse rates: 0% - 10%

Page 17: On the Extent and Nature of Software Reuse in Open Source Java

Results RQ 3

How much black-box reuse occurs?

[Figure: Absolute bytecode size distribution (MB) per project, axis from 0 to 70. Series: own, 3rd party, Java API, Java API baseline. Projects: iReport-Designer, soapUI, RODIN, SQuirreL SQL Client, Azureus/Vuze, OpenProj, TV-Browser, DrJava, Sweet Home 3D, JabRef, Mobile Atlas Creator, jEdit, Buddi, DavMail, FreeMind, HSQLDB, PDF Split and Merge, Mediathek View, subsonic, YouTube Downloader]

3rd party: 0 - 42 MB

Java API: 13 - 17 MB

Page 18: On the Extent and Nature of Software Reuse in Open Source Java

Results RQ 3

How much black-box reuse occurs?

[Figure: Relative bytecode size distribution (%) per project, axis from 0 to 100. Series: Java API, 3rd party, own. Projects: PDF Split and Merge, YouTube Downloader, DavMail, Mediathek View, Buddi, Mobile Atlas Creator, subsonic, HSQLDB, FreeMind, OpenProj, Sweet Home 3D, iReport-Designer, JabRef, soapUI, RODIN, jEdit, TV-Browser, DrJava, SQuirreL SQL Client, Azureus/Vuze]

3rd party: 0 - 62%

Java API: 23 - 99%

Combined: 41 - 99%

Page 19: On the Extent and Nature of Software Reuse in Open Source Java

Results RQ 3

How much black-box reuse occurs?

[Figure: Relative bytecode size distribution (%) without Java API per project, axis from 0 to 100. Series: 3rd party, own. Projects: PDF Split and Merge, iReport-Designer, DavMail, Buddi, soapUI, OpenProj, RODIN, Mobile Atlas Creator, SQuirreL SQL Client, DrJava, Sweet Home 3D, TV-Browser, JabRef, FreeMind, Mediathek View, jEdit, subsonic, Azureus/Vuze, HSQLDB, YouTube Downloader]
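How the relative distribution changes when the Java API is left out can be sketched as follows. The sizes are invented for a hypothetical project and are not study data; `relative_distribution` is an illustrative helper name.

```python
def relative_distribution(sizes: dict[str, float]) -> dict[str, float]:
    """Convert absolute bytecode sizes per category into percentages."""
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("no bytecode to distribute")
    return {category: 100 * size / total for category, size in sizes.items()}


# Hypothetical project (sizes in MB), not one of the study objects.
absolute = {"own": 2.0, "3rd party": 6.0, "Java API": 12.0}

with_api = relative_distribution(absolute)
without_api = relative_distribution(
    {k: v for k, v in absolute.items() if k != "Java API"}
)
print(with_api)
print(without_api)
```

Because the Java API share is of similar size for every project, including it pushes the combined reuse rate of small projects toward 100%, which is why the two views are reported separately.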

Page 20: On the Extent and Nature of Software Reuse in Open Source Java

Discussion

a) Extent of reuse

• Software reuse common among Java OSS

• On average: high black-box reuse rates

• Expected to have significant impact on development effort

• Black-box reuse rates vary considerably

Page 21: On the Extent and Nature of Software Reuse in Open Source Java

Discussion

b) Influence of project size on reuse rate

• Lee & Litecky found a negative influence of project size on reuse rate (survey of 500 Ada professionals)

• Without Java API: Spearman correlation of 0.05 (two-tailed p-value 0.83)

• With Java API: Spearman -0.93 (p-value < 0.0001) → significant and strong negative correlation
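Spearman's rank correlation, as used for the size/reuse-rate relationship, can be computed by hand as the Pearson correlation of the rank vectors. The sketch below is a plain-Python illustration with invented data; it does not compute the p-value, and the function names are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # fails if one sample is constant


# Hypothetical project sizes (kLOC) and reuse rates, not study data.
sizes_kloc = [10, 50, 200, 800]
reuse_rates = [0.9, 0.7, 0.4, 0.1]
print(spearman(sizes_kloc, reuse_rates))
```

A value near -1 means that larger projects consistently have lower reuse rates, while a value near 0, as reported without the Java API, indicates no monotonic relationship.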

Page 22: On the Extent and Nature of Software Reuse in Open Source Java

Discussion

c) Types of reused functionality

• Categorization of reused libraries (e.g. networking, text/XML, rich client platforms)

• No predominant category found

• Nearly all projects reuse software from more than one category

• No significant insights, except that reuse is diverse w.r.t. types of functionality

Page 23: On the Extent and Nature of Software Reuse in Open Source Java

Threats to internal validity

a) Overestimation of reuse

• False positives from clone detection

• Mitigated by manual inspection of results

• Unclear if code was copied into study objects or from them

• Mitigated by manual inspection

• Black-box analysis considers a whole class as the element of reuse

Page 24: On the Extent and Nature of Software Reuse in Open Source Java

Threats to internal validity

b) Underestimation of reuse

• Fixed set of libraries in clone detection

• False negatives in clone detection

• Manual inspection for copied code inherently incomplete

• Black-box analysis misses calls via reflection, boundaries by Java interfaces

• Other forms of component interaction

Page 25: On the Extent and Nature of Software Reuse in Open Source Java

Threats to external validity

• Unclear how representative study objects are for all Java OSS

• Transferability to other programming languages (PL) or commercial development unclear

• Impact of PL is expected to be high

• Availability of reusable code depends on PL (e.g. Java vs. COBOL)

Page 26: On the Extent and Nature of Software Reuse in Open Source Java

Conclusions

• Early visions of development by plugging reusable components not realistic

• But: Reuse in form of libraries common in Java OSS

• High black-box reuse rates (9 of 20 projects > 50%)

• Availability of reusable functionality well-established for Java platform

Page 27: On the Extent and Nature of Software Reuse in Open Source Java

Future Work

• Other programming ecosystems

• Legacy programming languages, e.g. COBOL

• Scripting languages, e.g. Python

• Commercial software development environments

Page 28: On the Extent and Nature of Software Reuse in Open Source Java

Thank you. Questions?