Extracting a Unified Directory Tree to Compare Similar Software Products Yusuke Sakaguchi , Takashi Ishio, Tetsuya Kanda, Katsuro Inoue Department of Computer Science, Graduate School of Information Science and Technology, Osaka University


Post on 18-Jan-2016



Extracting a Unified Directory Tree to Compare Similar Software Products

Yusuke Sakaguchi, Takashi Ishio,

Tetsuya Kanda, Katsuro Inoue

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University


Background: Similar software products

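• Clone-and-own approach
– Copying and modifying an existing product
– Importing libraries

[Figure: timeline (t) of products A (Ver. 1.0, Ver. 1.1), B (Ver. 1.0, Ver. 1.1), and C (Ver. 1.0) related by clone-and-own; when a bug is fixed in one product, the cloned products must fix the bug as well]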


Software Product Line Engineering

• Promising for managing derived software products

• To apply the approach to existing products, developers need to understand their commonalities and variabilities.
– Duszynski et al. [1] proposed comparing source code in corresponding directories among similar products.
• Developers must know the corresponding directories among products (which directories should be compared).

[1] S. Duszynski, J. Knodel, and M. Becker, “Analyzing the source code of multiple software variants for reuse potential,” Reverse Engineering, 2011


The correspondence of directories

• Some existing techniques [2], [3] can extract corresponding directories between two similar products.

• They cannot analyze more than two products at a time.

[2] K. Yoshimura, D. Ganesan, and D. Muthig, “Assessing merge potential of existing engine control systems into a product line,” Software Engineering for Automotive Systems, 2006

[3] D. Holten and J. J. van Wijk, “Visual comparison of hierarchically organized data,” Joint Eurographics / IEEE - VGTC Conference on Visualization, 2008


The proposed method

• Automatically extract a unified directory tree representing corresponding directories among multiple products

• Key idea

– Corresponding directories have similar source files


Our prototype

• Input
– Source code of some products

• Output
– A unified directory tree

[Figure: three stages for products S1, S2, and S3: directory trees, then a directory graph with weighted arcs (3, 3, 2, 2, 1), then a spanning tree]


Node of directory graph

• A node is a set of directories that contain similar source files

• Two directories d1 and d2 are represented by a single node if sim(d1, d2) ≥ th

• sim(d1, d2) : a content similarity metric for two directories in different products

• th : a predetermined threshold
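
The node rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the condition is sim(d1, d2) >= th and that similar directories are merged transitively with a union-find structure, which the slides do not state. The function name `group_directories`, the `(product, path)` pairs, and the `sim` callback are hypothetical.

```python
from itertools import combinations


def group_directories(dirs, sim, th):
    """Group (product, path) pairs into nodes of a directory graph.

    Assumption: two directories from different products share a node
    when sim(d1, d2) >= th, and merging is transitive.
    """
    parent = {d: d for d in dirs}

    def find(d):
        # Follow parent links to the representative, halving the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for d1, d2 in combinations(dirs, 2):
        if d1[0] == d2[0]:
            continue  # sim is defined for directories in different products
        if sim(d1, d2) >= th:
            parent[find(d1)] = find(d2)

    nodes = {}
    for d in dirs:
        nodes.setdefault(find(d), set()).add(d)
    return list(nodes.values())
```

With a toy `sim` that returns 1.0 for equal paths and 0.0 otherwise, `src/a` in S1 and S2 end up in one node, while a directory that exists only in S3 forms a node of its own.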


Similarity

• sim is the Jaccard similarity coefficient

– It is computed over the set of lines extracted from all non-binary files directly contained in a directory d
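
A small sketch of this metric is given below. The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The helper names `directory_lines` and `jaccard` are hypothetical. Treating files that fail UTF-8 decoding as binary, and returning 0.0 for two empty sets, are assumptions made for illustration; the slides do not say how the tool handles either case.

```python
from pathlib import Path


def directory_lines(directory):
    """Collect the set of lines of all text files directly in a directory."""
    lines = set()
    for entry in Path(directory).iterdir():
        if not entry.is_file():
            continue  # only files directly contained in the directory
        try:
            text = entry.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # assumption: undecodable files are treated as binary
        lines.update(text.splitlines())
    return lines


def jaccard(lines1, lines2):
    """Jaccard similarity coefficient of two collections of lines."""
    set1, set2 = set(lines1), set(lines2)
    union = set1 | set2
    if not union:
        return 0.0  # assumption: two empty line sets are not similar
    return len(set1 & set2) / len(union)
```

For two directories the similarity would then be `jaccard(directory_lines(d1), directory_lines(d2))`.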


Example

[Figure: directory trees of S1, S2, and S3 with colored nodes]

• Nodes of the same color are similar.
• White nodes have no files.


Example

[Figure: directories a, b, and c of the products grouped into nodes]


Special Treatment

• Directories without files
A single node represents such directories if their subdirectories are represented by the same node.

• Root directories
A root node represents all the root directories of products, irrespective of product similarity.


Example

[Figure: directory trees of S1, S2, and S3]


Example

[Figure: directories src, a, b, and c of S1, S2, and S3 grouped into nodes]


Connecting nodes

• Weighted arcs

– The weight of an arc is the number of parent-subdirectory relationships between two nodes.

[Figure: directory graph for S1, S2, and S3 with arc weights 3, 3, 2, 2, and 1, each counting parent-subdirectory relationships]
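
Counting the arc weights can be sketched as below. This is an illustration under assumptions: `node_of` is a hypothetical mapping from a `(product, path)` pair to the label of the node that contains it, paths use `/` as separator, and the example data in the usage note is invented rather than taken from the figure.

```python
from collections import Counter
from posixpath import dirname


def arc_weights(node_of):
    """Count parent-subdirectory relationships between nodes.

    node_of maps (product, path) to a node label (hypothetical layout).
    Returns a Counter keyed by (parent_node, child_node).
    """
    weights = Counter()
    for (product, path), child_node in node_of.items():
        parent = (product, dirname(path))
        # Skip directories whose parent is not part of the graph.
        if parent in node_of and parent != (product, path):
            weights[(node_of[parent], child_node)] += 1
    return weights
```

If `src/a` sits under `src` in two products, the arc from node `src` to node `a` gets weight 2. An arc that appears in only one product gets weight 1.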


Greedy tree extraction

1. Initialize a spanning tree
– represents a set of vertices
– represents a set of selected arcs

2. Select the arc having the maximum weight among
– If the weights are identical, select that which is closest to

3. Update

4. Repeat Steps 2 and 3 until includes all nodes.
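
The symbols on this slide did not survive in the transcript, so the sketch below fills them with labeled assumptions rather than the authors' definitions. It assumes a Prim-style procedure that starts from the root node, that the candidate arcs in Step 2 go from a node already in the tree to a node not yet in it, and that ties are broken in favor of the arc whose source is closest to the root. The names `greedy_tree`, `weights`, and `depth` are hypothetical.

```python
def greedy_tree(root, weights):
    """Greedily extract a spanning tree from a weighted directory graph.

    weights maps (parent_node, child_node) to the arc weight.
    Assumptions: candidates are arcs leaving the current tree, and ties
    go to the arc whose source node is closest to the root.
    """
    depth = {root: 0}  # nodes already in the tree, with their depth
    selected = []      # arcs chosen so far, in selection order
    all_nodes = {root} | {n for arc in weights for n in arc}
    while len(depth) < len(all_nodes):
        candidates = [
            (w, -depth[u], u, v)
            for (u, v), w in weights.items()
            if u in depth and v not in depth
        ]
        if not candidates:
            break  # remaining nodes are unreachable from the root
        # Highest weight first, then the shallowest source node.
        _, _, u, v = max(candidates, key=lambda c: (c[0], c[1]))
        selected.append((u, v))
        depth[v] = depth[u] + 1
    return selected
```

Each iteration adds exactly one node, so the loop ends once every reachable node has a single parent arc. The selected arcs then form the unified directory tree.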


Greedy loop

[Figure: greedy arc selection on the directory graph of S1, S2, and S3 with arc weights 3, 3, 2, 2, and 1]


Directory Viewer

Tree View
• Shows the unified directory tree.
• If a node contains multiple directories having different source code, the node is colored in blue.

[Figure: Tree View screenshot; callout: different code included]


Directory Viewer

File List View
• Shows a table of file names and their MD5 hash values contained in a selected node.
• A black hash value indicates that the file content is different from another product.

[Figure: File List View screenshot; callout: different contents]
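
The comparison behind such a table can be sketched with Python's standard `hashlib` module. This is only an illustration of hashing file contents per product and flagging names whose hashes disagree. The `node_files` layout and the function names are hypothetical, and the viewer's actual data model is not described in the slides.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's content."""
    return hashlib.md5(content).hexdigest()


def file_list(node_files):
    """Build a file list for one node.

    node_files maps file_name -> {product: content bytes} (hypothetical).
    Returns file_name -> (hashes_by_product, differs), where differs is
    True when at least two products have different content.
    """
    table = {}
    for name, by_product in node_files.items():
        hashes = {product: md5_of(content) for product, content in by_product.items()}
        differs = len(set(hashes.values())) > 1
        table[name] = (hashes, differs)
    return table
```

A row whose `differs` flag is set corresponds to a file that the viewer would highlight as having different contents.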


Directory Viewer

File Matrix View
• Shows the similarity of files having the same file name selected in a file list view.
• A darker (yellow/red) color indicates a lower similarity.


Case study

• Input products
– Android OS : 4.2
– CPU : Qualcomm MSM8974

Product   Vendor    Mobile Network Operator   Release   #Dirs
FJL22     Fujitsu   au                        2013/11   7,683
301F      Fujitsu   SoftBank                  2013/12   7,708
F-01F     Fujitsu   NTT DOCOMO                2013/10   7,582
SO-01F    Sony      NTT DOCOMO                2013/10   5,840


Output

• A predetermined similarity threshold th = 0.8

• Execution time : 42 minutes
– Two Intel Xeon (2.27 GHz, 4 cores)
– 24 GB RAM

• The resultant unified directory tree comprises 9,037 nodes.

– 673 nodes contain different contents from other products.


Example: kernel node

[Figure: File List View for the kernel node with columns grouped by vendor (Fujitsu, Sony); rows shown include Makefile and AndroidKernel.mk]

Developer-specific options for the kernel build are mainly written in this file.


Example: Product-specific directory

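• Input
[Figure: the external directories of the Fujitsu and Sony products, each containing a chromium directory]

• Output
[Figure: under a single external node, the (Fujitsu) chromium and (Sony) chromium nodes are different; below them are subdirectories common to Fujitsu/Sony and subdirectories unique to Sony]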


Conclusion and Future Work

• Our tool visualizes a unified directory tree.
• We conducted a case study using four Android products.
– The tool enabled us to quickly focus on directories having different contents.
• Future work: Controlled experiment to evaluate
– The effectiveness for source code comparison tasks.
