extracting a unified directory tree to compare similar software products yusuke sakaguchi, takashi...
TRANSCRIPT
Extracting a Unified Directory Treeto Compare Similar Software Products
Yusuke Sakaguchi, Takashi Ishio,
Tetsuya Kanda, Katsuro Inoue
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
2Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Background: Similar software products
• Clone-and-own approach– Copying and modifying an existing product– Importing libraries
AVer.1.0
Fix a bug
Clone-and-own
t
AVer.1.1
BVer.1.0
BVer.1.1
CVer.1.0
Clone-and-own
Must fix the bug
3Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Software Product Line Engineering
• Promising for managing derived software products
• To apply the approach to existing products, developers need to understandthe commonalities and variabilities of them.– Duszynski et al. [1] proposed to compare source code
in corresponding directories among similar products.• Developers must know corresponding directories among
products (which directories should be compared).
[1] S. Duszynski, J. Knodel, and M. Becker“Analyzing the source code of multiple software variants for reuse potential,” Reverse Engineering, 2011
4Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
The correspondence of directories
• Some existing techniques [2], [3] can extract corresponding directories between two similar products.
• Cannot analyze more than two productsat a time.
[2] K. Yoshimura, D. Ganesan, and D. Muthig“Assessing merge potential of existing engine control systems into a product line” Software Engineering for Automotive Systems, 2006[3] D. Holten and J. J. van Wijk“Visual comparison of hierarchically organized data”Joint Eurographics / IEEE - VGTC Conference on Visualization, 2008
5Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
The proposed method
• Automatically extract a unified directory tree representing corresponding directories among multiple products
• Key idea
– Corresponding directories have similar source files
6Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
S1 S2 S3
src src src
a a a
b b b
c c
Our prototype• Input
– Source code of some products
• Output– a unified directory tree
b
a c
src
S1 S2 S3
src src
a b ca b
S1 S2 S3
src src src
a a a
b b b
c c
3
3 2
2
1
Directory trees
Directory graph
Spanning tree
7Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Node of directory graph
• A node is a set of directories that contain similar source files
• Two directories are represented by a single node
if • sim(d1, d2) : a content similarity metric for two
directories in different products
• th : a predetermined threshold
8Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Similarity
• the Jaccard similarity coefficient
– a set of lines extracted from all non-binary files directly contained in a directory d
9Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example
b
a c
src
S1 S2 S3
src src
a b ca b
• The same color nodes are similar.• White nodes have no files.
10Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example
a a a
b b b
c c
11Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Special Treatment
• Directories without files
A single node represents such directories
if their subdirectories are represented
by the same node
• Root directoriesA root node represents all the root directories
of products, irrespective of product similarity.
12Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example
b
a c
src
S1 S2 S3
src src
a b ca b
13Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example
S1 S2 S3
src src src
a a a
b b b
c c
14Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Connecting nodes
• Weighted arcs
– The number of parent-subdirectory relationship
among two nodes. S1 S2 S3
src src src
a a a
b b b
c c
S1 S2 S3
src src src
a a a
b b b
c c
3
3 2
2
1
parent-subdirectory relationship
15Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Greedy tree extraction
1. Initialize a spanning tree – represents a set of vertices– represents a set of selected arcs
2. Select the arc having the maximum weight among – If the weights are identical,
select that which is closest to
3. Update
4. Repeat Steps 2 and 3 until includes all nodes.
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Greedy loop
19
S1 S2 S3
src src src
a a a
b b b
cc
3
3 2
2
1
17Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Directory Viewer
Tree View• shows the unified directory tree.• If a node contains multiple directories
having different source code,the node is colored in blue.
Different codeincluded
18Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Directory Viewer
File List View• shows a table of file names and their MD5
hash values contained in a selected node.• A black hash value indicates that the file
content is different from another product.
Differentcontents
19Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Directory Viewer
File Matrix View• shows the similarity of files having the
same file name selected in a file list view.• A darker (yellow/red) color indicates a
lower similarity.
20Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Case study• Input products
– Android OS : 4.2– CPU : Qualcomm MSM8974
Product Vendor MobileNetwork Operator Release #Dirs
FJL22 Fujitsu au 2013/11 7,683
301F Fujitsu SoftBank 2013/12 7,708
F-01F Fujitsu NTT DOCOMO
2013/10 7,582
SO-01F Sony NTT DOCOMO
2013/10 5,840
21Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Output
• A predetermined similarity threshold th = 0.8
• Execution time : 42 minutes– Two Intel Xeon (2.27Ghz, 4cores)
– 24GB RAM
• The resultant unified directory tree comprises 9,037 nodes.
– 673 nodes contain different contents from other products.
22Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example: kernel node --------------Fujitsu-------------- --Sony--
Makefile
AndroidKernel.mk
Developer-specific options for the kernel build is mainly written in this file.
23Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Example: Product-specific directory
(Fujitsu)external
(Sony)external
chronium chronium
external(Fujitsu)chronium
(Sony)chronium
(Fujitsu/Sony)Common
subdirectories
(Sony)Unique
subdirectories
Different
Common
Unique to Sony
• Input• Output
24Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Conclusion and Future Work
• Our tool visualizes a unified directory tree.• We conducted a case study using four
Android products.– The tool enabled us to quickly focus on
directories having different contents.• Future work: Controlled experiment to
evaluate – The effectiveness for source code comparison
tasks.
25Department of Computer Science, Graduate School of Information Science and Technology, Osaka University