1
Overview and Evaluation ofJava Component Search Syste
m SPARS-J
Reishi Yokomori **, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto **Katsuro Inoue**
*Japan Science and Technology Agency**Osaka University
2Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
SPARS-J (Outline)Ranking methodSystem architecture
Experimental evaluation for SPARS-JConclusion and Future work
3Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
MotivationA library of software is a fount of wisdom.
Reuse of software components improves productivity and quality.Example of components: source code, document …..
Maintenance activity is more easier with the library.However, a collection of software is not utilized effectively.
A developer doesn’t know an existence of desirable components.Although there are a lot of components, these components are not organized.
We need a system to manage components and to search suitable component.
4Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Research aimWe build a system which have functions as follows
searches component, which is suitable for user’s requestmanages the component information
TargetsIntranet
Closed software development environment inside a company
InternetSource code from a lot of open-source-software community
– Source Forge, Jakarta Project. etc.
5Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
SPARS-J (Outline)Ranking methodSystem architecture
Experimental evaluation for SPARS-JConclusion and Future work
6Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
SPARS-J(Software Product Archive , analysis and Retrieval System for Java)
SPARS-J is Java Source Code Search System analyzes and extracts components automatically.
Component: a source code of class or interfacebuilds a database based on the analysis.
Use-Relation, Similar Components, Metrics, .....provides keyword-search.
Three ranking methods: KR, CR, KR+CRAnalysis information
– Components using (used by) the component– Package hierarchy
7Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Ranking method1. Component used repeatedly (by important
component)– Ranking based on use relation between components
2. Component suited to a user request– Frequency of word appearance (arranged TF-IDF)– A class-name, a method-name, ..., have special
importance
3. Integrated Ranking– Components prized both in KR and CR are very important – Integration by Borda Count method
Ranking search results
Keyword Rank (KR)
Component Rank (CR)
KR+CR Rank ( KR+CR)
8Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
System architecture of SPARS-J (Building a Database)
Component analysis • extracts components• indexes each appeared word• extracts use-relation• clustering similar components • calculates Component rank
Database
Component retrieval
User interface
Library(Java source files)
store
provide
• Component Information• Indexes • Use-Relation• Clustered Component Graph • Component Rank
9Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
System architecture of SPARS-J (Searching Components)Component analysis
Component retrieval• searches components from Indexes• sorts components by CR, KR, KR+CR
User
User interfaceQuery
Result
•analyzes query•Analysis condition •Keywords•displays search results• Additional Information Source Code Use Relation Similar Components Metrics etc.........
RequestComponents List
Database• Component Information• Indexes • Use-Relation• Clustered Component Graph • Component Rank
Query
Information
10Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Top page)
11Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Search results)
12Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Source code)
13Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Similar components)
14Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Using the component)
15Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Used by the component)
16Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Screenshot (Package browsing)
17Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
SPARS-J (Outline)Ranking methodSystem architecture
Experimental evaluation for SPARS-J Conclusion and Future work
18Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experimental Evaluation1. Comparison of each ranking method in SPARS-J
We investigate the best ranking methodCR vs. KR vs. CR+KR
2. Comparison with other search enginesWe verify SPARS-J’s effectiveness as a software component search engine.vs. Google, Namazu
3. Application of SPARS-J in actual development environment
We confirm that SPARS-J is useful to management and understanding of software.
19Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment 1: Comparison of ranking method in SPARS-J
Purpose of ExperimentWe investigate the best method among 3 ranking method in SPARS-J.
1. CR (Based on Use-relation)2. KR (Based on TF-IDF)3. CR+KR ( Integrating 1 & 2)
Preparation Database from Java source codes publicly available
About 140,000 files from JDK, SourceForge, etc.....Keywords
10 queries assumed development of simple system
20Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment 1: Comparison of ranking method in SPARS-J
Criterion of EvaluationPrecision of components in the top 10 Result :
The percentage of suitable components– User tends to look at only a higher ranked results.– High precision means that there are many useful components in ran
ge of user’s visibility.
Ndpm :The percentage of the component pair which differs rank order between two ranking methods.– We define user‘s ideal ranking in advance, and calculate ndpm.
» The quantitative indicator which shows a distance from ideal– Ndpm considers all the components in a search result.
» Its distance becomes large when required components are ranked low.
21Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Result (Experiment 1)Keywor
d CR KR CR+KR CR KR CR+K
RA 1 1 1 0.036 0.048 0.037B 1 1 1 0.194 0.261 0.221C 0.5 0.5 0.5 0.133 0.117 0.092D 0.4 0.9 0.8 0.123 0.200 0.189E 0.4 0.4 0.4 0.208 0.192 0.194F 0.2 0.2 0.2 0.184 0.184 0.160G 0.9 1 1 0.081 0.103 0.080H 1 0.8 1 0.047 0.109 0.052I 0.6 0.7 0.7 0.210 0.324 0.267J 0.5 0.7 0.7 0.219 0.243 0.114
Ave. 0.65 0.72 0.73 0.143 0.178 0.141
Precision Ndpm
22Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Consideration (Experiment 1)
By Paired-Difference T-Test, we have confirmed that following difference are significant at the 5% level.
Precision: KR,CR+KR ≫ CRNdpm: CR,CR+KR ≫ KR
Characteristic of each method CR
CR generally ranks components in desirable order. Higher ranked components are important but often have no relevance to keyword.
KRKR generally appreciates components which have strong relevance. In required component, keyword doesn’t always appear with high frequency.
CR+KRCR+KR has good result at both precision and ndpm.CR+KR has the best of both ranking
We use CR+KR as a default ranking method.
23Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment 2:Comparison with other search engines
Purpose of ExperimentWe verify SPARS-J’s effectiveness as a software component search engine.
1. SPARS-JDatabase from 140,000 files (Same as Experiment 1)We use CR+KR as ranking method.
2. GoogleFamous web search Engine Input queries to www.google.co.jp
3. NamazuFull-text search system for documents.Namazu uses TF-IDF to rank documents.Database from 140,000 files (Same files as SPARS-J)
Preparation Keywords: 10 queries (Same as Experiment 1)Criterion of Evaluation: Precision of the top 10 Result
24Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Result (Experiment 2)keyword SPARS-J Google NamazuA 1 0.7 0.9B 1 0.4 0.6C 0.5 0.3 0.4D 0.8 0.3 0.6E 0.4 0.1 0.3F 0.2 0 0.1G 1 0.3 0.4H 1 0.1 0.2I 0.7 0.4 0.4J 0.7 0.4 0.7Ave. 0.73 0.3 0.46
Precision of the top 10 result
25Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Consideration (Experiment 2)
By Paired-Difference T-Test, we have confirmed that following difference are significant at the 5% level.
PrecisionSPARS-J≫ Namazu ≫ Google(*) SPARS-J (CR, KR, CR+KR) ≫ Namazu
Consideration of ResultsGoogle
In the result, there are many pages other than an explanation of Java source code.Performance depends on how much description there are.
NamazuSince the datasets consists of only source codes, the result is better than Google.Without characteristics of Java programs, we cannot get good results.
For searching software components, SPARS-J is more useful than other search engines.
26Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment 3: Application of SPARS-J in actual development environment
Purpose of ExperimentWe confirm that SPARS-J is useful to management and understanding of software resource.
Criterion of EvaluationQualitative evaluation about SPARS-J
Preparation We set up SPARS-J to a company.
7 employees use SPARS-J for two weeks.They are all engaged in the software development and the maintenance activity.
We carry out a questionnaire survey about SPARS-J
27Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Result (Experiment 3)
Questionnaire Item \ examinee A B C D E F G Mode
Package Browser 4 5 5 5 4 3 3 5Similar components 4 5 5 2 4 3 5,4Components used by the class 5 5 5 5 5 5 5Components using the class 5 1 5 5 5 5 5Metrics of the class 1 4 1 2 4 5 4,1Download of the class 1 3 5 5 2 5 5Contribution to reduction of time cost 3 5 5 3 4 1 5,3Improvement for software quality 5 3 3 3 4 1 3Understanding of software resource 3 1 5 3 5 2 1 5,3,1View-ability of the component-list view 4 4 5 5 3 3 5 5View-ability of the highlighted source
code 3 5 5 5 5 5 5 5
( [Useful or Used repeatedly] 5 4 3 2 1 [Useless or seldom Used] )
28Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Consideration (Experiment 3)
Highly rated questionnaire itemsReference by package browserReference by similar componentsReference by components using (used by) the classView-ability of the component list view and source code
Activities realized by using SPARS-JListing of applications which uses certain componentImpact analysis at reediting components
29Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Consideration (Experiment 3)
Other commentResponse speed is very quick, and we have felt no stress. Since it is not necessary to install in a client, sharing of software components is easy.
SPARS-J can support maintenance work effectively.
Easier grasp of software components
30Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Conclusion and Future worksConclusion
We construct software component search system SPARS-J.
Search engine for Java source codeRanking components with consideration of characteristics. Provision of useful relevant information.
We verified the validity of SPARS-J based on experimental evaluation.
SPARS-J is useful to search software components. SPARS-J is very helpful to grasp and manage components.
Future worksThe quantitative evaluation other than ranking performanceSupport for other software component
31Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
32Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
33Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
34Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
OutlineSystem architectureRanking methodEach part
Analysis partRetrieval partUser Interface
ExperimentConclusion and Future work
35Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Component analysis part
Extract component and its information from a Java source fileThe process
Extract a componentIndex the componentExtract use relationsClustering similar componentsRank components based on use relations (CR method)
36Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Extract and index a component
Extracting componentFind class or interface block in a java source file
Location information in the file (start line number, end line number)
IndexingExtract index key from the component
Index key : a word and the kind of itNo reserved words are extracted
Count frequency in use of the word
word kindSort Class namequicksort Commentquicksort Method
namepivot Variable
namequicksort Method call
: :Index key
public final class Sort { /* quicksort */ private static void quicksort(…) { int pivot; : quicksort(…); quicksort(…); }}
11112:
frequency
37Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Extract use relationsExtract use relations among components using semantic analysisMake component graph from use relations
Node: componentEdge: use relation Inheritance
Interfaceimplementati
onVariable type
Instance creation
Field accessMethod callThe kind of use relation
public class Test extend Data{ : public static void main(…) { : Sort.quicksort(super.array); : }}
Sort
Data
TestComponent graph
InheritanceField access
Method call
38Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Similar componentSimilar component is copied component or minor modified componentWe merge similar components into single componentMerged component have use relations that all component before merging have
C
B F
A D
G
EComponent graph
BF
AD E
C G
Clustered component graph
C
B F
A D
G
E
39Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Clustering componentsWe measure characteristics metrics to merge componentsThe difference ratio of each component metrics
Metricscomplexity
– The number of methods, cyclomatic, etc. – represent a structural characteristic
Token-composition– The number of appearances of each token– represent a surface characteristic
40Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Ranking based on use relation
Component Rank (CR)Reusable component have many use relation
The example of use is muchGeneral purpose componentSophisticated component
We measure use relation quantitatively, and rank components
The component used by many components is importantThe component used by important component is also important
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.
41Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Propagating weights
A B
C
0.34 0.33
0.33
0.17
0.17
0.330.33
Ad-hoc weights are assigned to each node
42Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Propagating weights
A B
C
0.33 0.17
0.5
0.175
0.175
0.170.5
The node weights are re-defined by the incoming edge weights
43Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Propagating weights
A B
C
0.5 0.175
0.345
0.25
0.25
0.1750.345
We get new node weights
44Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Propagating weights
A B
C
0.4 0.2
0.4
0.2
0.2
0.20.4
• We get stable weight assignment next-step weights are the same as previous ones
• Component Rank : order of nodes sorted by the weight
45Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
OutlineSystem architectureRanking methodEach part
Analysis partRetrieval partUser Interface
ExperimentConclusion and Future work
46Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Component retrieval part
Search components from database, rank components The process
Search componentsRanking suited to a user requestAggregate two ranks (CR and KR)
47Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Search componentsSearch query
Words a user inputThe kind of an index word, package name
Components contain given query are searched from Database
48Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Ranking suited to a user request
Keyword Rank (KR)Components which contain words given by a user are searchedRank components using the value calculated from index word weight Index word weight
– Many frequency in use of a component– A word contained particular components– A word represent the component function such as Class
nameSort the sum of all given word weightTF-IDF weighting using full-text search engine
49Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Calculation of KR valueCalculate weight Wct with component c word t
TFi : The frequency with which a kind i of word t occurs in component c IDF : the total number of components / the number of components containing word tkwi : Weight of a kind i
KR value is the sum of all word Wct
kindall
iict IDFTFkww ) (
the kind of a word
weight
Class name 200Interface name 50Method name 200Package name 50
Import 30Method call 10Field access 10Variable type 10
Instance creation
10
Local var access 1Comment 30
Doc comment 50Line comment 10
String 1
50Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Aggregate two ranksAggregate two ranks KR and CRAggregation method
Borda Count method known a voting systemUse for single or multiple-seat electionsThis form of voting is extremely popular in determining awards
SPARS-JRank components both KR and CRUsing KR and CR, the component that be suitable user’s request, reusable and sophisticated
51Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Borda Count methodThere are 10 voters and 5 candidates (from A to E) Each voter rank candidates1 point for last place, 2 points for second from last place …, and N points for first place1st=5points , 2nd=4points ,…
A : 15+3+6+4=28pointsB : 38pointsC : 38pointsD : 22pointsE : 26points
1st
2nd
3rd
4th
5th
3 A B C D E
3 E B C D A
2 C B A E D
2 C D B A E
1st
1st
3rd
4th
5th
B C A D E
Aggregation
52Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
OutlineSystem architectureRanking methodEach part
Analysis partRetrieval partUser Interface
ExperimentConclusion and Future work
53Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
User interfaceReceive a user’s query and provide the search results through Web browser
Microsoft Internet Explore, Mozilla, etc.
The processParse query word and the search conditionShow rank ordered resultsShow analyzed information of the component
Used by/Using the componentMetrics
54Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Analyzed informationA component information are as follows
MetricsThe number of method, variableLOC, cyclomaticEtc. (measurable metrics in the component itself)
Components used by/using the componentShow lists of nodes followed use relation
Components that are similar to the component
Show lists of similar components
55Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Package browsingThe naming structure for Java packages is hierarchical
A user can search lists of components in same package of a component easily
56Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
OutlineMotivation and research aimSPARS-J
OutlineSystem architectureRanking methodEach part
Analysis partRetrieval partUser Interface
ExperimentConclusion and Future work
57Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment(1/2)Comparison with Google
Register about 130,000 components get from InternetQuery words ‘calculator applet’ and ‘chat server client’
Calculate relevance ratio of 10 rank higherRelevance: The component is reusable source code
Google is a web search engine…Add ‘java source’ term to the query wordsFollow one link from the result web page
58Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experiment(2/2)Example 1 :
”calculator applet”SPARS-J
9 hits7 suited components
Example 2 :”chat server client”SPARS-J
69 hits57 suited components
Using SPARS-J, suited component is high order
orderrank componentrelevant ofnumber Theratio relevance
SAPRS-J Google SPARS-J Googleorder
Relevance
Ratio Relevance
Ratio Relevance
Ratio Relevance
ratio
1 ○ 1 ○ 1 ○ 1 × 0
2 ○ 1 × 0.5 ○ 1 × 0
3 ○ 1 ○ 0.67
○ 1 × 0
4 ○ 1 × 0.5 ○ 1 × 0
5 ○ 1 ○ 0.6 ○ 1 × 0
6 × 0.83
○ 0.67
○ 1 × 0
7 ○ 0.86
× 0.57
○ 1 ○ 0.14
8 × 0.75
○ 0.63
○ 1 × 0.13
9 ○ 0.78
× 0.56
○ 1 ○ 0.22
10 - - × 0.5 ○ 1 ○ 0.3
Example1 Example2
59Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Conclusion and Future work
We developed component search engine SPARS-JUsing SPARS-J, retrieval of components used well is enabled easily.
Future workMorphological analysis of Index keywordCollaborative filteringInvestigate best ranking method
The value of weightAggregation ranks
Evaluation of SPARS-JUsability
60Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
End
61Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Component graph
A B
C
ED
F
G
IH
System X System Y
componentuse relation
62Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Weight of nodes
A B
C
ED
F
G
IH
System X System Y
sum of all node weights = 1 ... (1)weight of node represents significance of node
0.10.1
0.2
0.1 0.1
0.1
0.2
0.050.05
1 w(x) 0
63Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Weights of edges
A0.2
0.05
0.05
0.05
0.05
B
0.2
0.05
0.15
0.4
d=1/4
d=1/4d=1/4d=1/4
d: distribution ratio• Node weight is distributed to each outgoing edge• Edge weights are collected at the destination node
sum of all outgoing edge weights = origin node weight ... (2)sum of all incoming edge weights = destination node weight ... (3)
64Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Definition of weightsUnder constraints (1)~(3), we have a simultaneous equation
)(
)()(
2
1
nvw
vwvw
)(
)()(
2
1
nvw
vwvw
t
ddd
dddddd
nnnn
n
n
21
22221
11211
= .
Dt: transposed matrix of distribution ratios
W: node weight vectorThis simultaneous equation can be solved by propagating node weight through edges in the graph
65Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Pseudo use relation
A B C
• Weight computation does not always converge
• Add a pseudo edge from a node to another, if there is no 'real' edge
• Distribution ratios: pseudo edges << real edges
66Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Markov model
• Component rank model can be considered as a Markov Chain of user's focus
• User's focus moves from one component to another along a use relation at a fixed time duration
• Node weight represents the existence probability of the user's focus at infinite future
0.01
0.02 0.01
0.030.05
0.001 0.1
67Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Related WorksMarkov models of documentation traversal
Influence Weight: impact factor of journal publication thought incoming referencesPage Rank: weight of HTML in the Internet through incoming web links
Explicit use relationsNo clustering (important for software products)
Measurement reusability of components or interfaces
Use various characteristic metrics Indirect indicator of reusability Our approach directly reflects usage of components
68Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
1. 利用関係に基づく順位付けComponent Rank ( CR )法
利用関係から部品の利用実績を評価し,順位付けする多くの部品から利用されている部品は重要重要な部品から利用されている部品もまた重要
多く利用される部品や,重要な箇所で利用される部品に大きな評価値が与えられる 使用例が多く,汎用的な部品が上位に来る
69Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
2. キーワード出現頻度に基づく順位付けKeyword Rank ( KR )法
文書検索システムで用いられる TF-IDF 法を改変部品を特徴付ける索引キーに適当な重みを与える
部品に繰り返しあらわれる索引キー希少で,特定の部品に偏ってあらわれる索引キークラス定義名など部品を象徴するトークン種類の索引キー重みの総和を評価値として順位付け
クエリと適合する部品が上位に来る
70Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
3.CR と KR を統合した順位付け 各順位付けの観点1. 部品の使用例の多さ,汎用さ
– CR 法2. クエリと部品内容の適合度
– KR 法順位を統合して,両面で優れている部品を検索結果の上位に表示する
Borda の手法部品検索部で行うため高速な方法が望ましい
71Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
部品群グラフをもとにした繰り返し計算計算手順1. 各頂点に適当な重みを与える
– 重みの総和は 1 2. 各有向辺の重みを求める
– 頂点の重みを,出ていく辺で分配する3. 各頂点の重みを再計算
– 頂点に入ってくる辺の重みの総和を,その頂点の重みとして再定義する4. 頂点の重みが収束するまで, 2.3. を繰り返し計算する5. 収束した頂点の重みを,その頂点に対応する部品群の CR 値とする
– 部品の評価値は属する部品群の CR 値とする
C10.334
C20.333
C30.333
C10.334
C20.333
C30.333
v1×50%
v1×50%v2×100%v3×100%
C1 C2
C3
0.167
0.1670.3330.333
C10.333
C20.167
C30.500
C1 C2
C3
0.1665
0.16650.1670.500
C10.500
C20.1665
C30.3335
C10.400
C20.200
C30.400
0.200
0.2000.2000.400
CR 値の計算
72Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
研究の目的ソフトウェア部品の効率的な検索により,
既存部品を他のシステム開発で用いる再利用支援部品の関連の掌握によるプログラム理解や保守支援
SPARS-J に対して実験的評価を行い,システムの有効性を検証する
73Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
評価にあたっての問題点検索システムの順位付け性能評価
評価用のテストコレクションを用いれば容易ソフトウェア部品検索システムを対象としたテストコレクションは存在していない
言語,部品の単位等の違いにより,テストコレクションの構築が非常に困難
独自の評価実験を適用する
74Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験の内容1. 他の検索システムとの比較
– 一般的な検索システムとの比較により, SPARS-Jのソフトウェア部品検索システムとしての有効性を検証する2. SPARS-J の各順位付け手法の比較
– 各順位付け手法を比較し,最も妥当な順位付け手法を調査する3. 実際の開発環境における SPARS-J の適用実験
– 企業において実際に利用してもらうことで,ソフトウェアの管理や理解に有用であることを確認する
75Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験1 . 他の検索システムとの比較実験の目的
一般的な検索システムとの比較により, SPARS-J のソフトウェア部品検索システムとしての有効性を検証する比較対象Google : Web ページ検索システムで,様々な目的での検索に利用されるNamazu :信頼性の高い日本語全文検索システム評価尺度適合率:検索結果のうちの適合部品数の割合
割合が高いほど,検索結果に利用できる部品が多く含まれる
76Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
適合率の対象検索結果全てを対象として適合率を求めても意味がない
適合部品数が同じであれば,上位にあっても下位にあっても同じ評価となってしまう検索結果上位の部品について適合率を求めるべきである
Web ページ検索では,検索結果の最初の1ページ( 10 件)目に該当文書が見つからない場合,2ページ目を検索するよりは検索キーワードを変更する傾向がある†検索結果の上位 10 件の部品に対する適合率を求める†Amanda Spink, B. J. Jansen, D. Wolfram, T. Saracevic:”From E-Sex to E-Commerce: Web Search Changes” IEEE Computer,Vol.35,No.3,pp.107-109,Mar(2002).
77Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験1 . 他の検索システムとの比較準備データベース
SPARS-J と Namazu に関しては, JDK およびWeb 上で公開されているソースコード(約 14万個のソースファイル)で構築Google は公開されている検索エンジンをそのまま用いる検索キーワード簡単なシステムの開発を想定したクエリを 10個用意する手順
各検索システムの検索結果上位 10 件の部品を対象として,適合率を求める
78Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験1の結果keyword SPARS-J Google NamazuA 1 0.7 0.9B 1 0.4 0.6C 0.5 0.3 0.4D 0.8 0.3 0.6E 0.4 0.1 0.3F 0.2 0 0.1G 1 0.3 0.4H 1 0.1 0.2I 0.7 0.4 0.4J 0.7 0.4 0.7Ave. 0.73 0.3 0.46
各検索システムの適合率の比較
79Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験1の評価対応のある平均値の差の検定有意水準5%で以下の有意差が見られた
適合率SPARS-J ≫ Namazu ≫ Google
80Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験1の考察Google
Web ページ検索システムであり Java ソフトウェア部品に関係するページ以外のものも多く含まれているので,検索結果を絞りきれなかったNamazu
SPARS-J とデータベースは同じGoogle の結果と比較して,検索結果を絞ることができたと考えられる日本語全文検索システムでありソースコードの構文解析を行っておらず, Java ソースコードの特性を考慮していない
SPARS-J は他の検索システムより,ソフトウェア部品を検索するときに有用である
データベースによる差だと考えられる
順位付け性能による差だと考えられる
81Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2 . SPARS-J の各順位付け手法の比較
実験目的各順位付け手法を比較し,最も妥当な順位付け手法を調査する
SPARS-J では3種類の評価手法による順位付け機能を実現1.利用関係に基づく順位付け手法2.キーワードの出現頻度に基づく順位付け手法3.1.2. の手法を統合した順位付け手法
評価尺度
82Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
1. 利用関係に基づく順位付けComponent Rank ( CR )法
利用関係から部品重要度を評価し,順位付けする多くの部品から利用されている部品は重要重要な部品から利用されている部品もまた重要
多く利用される部品や,重要な箇所で利用される部品に大きな評価値が与えられる 使用例が多く,汎用的な部品が上位に来る
83Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
2. キーワード出現頻度に基づく順位付けKeyword Rank ( KR )法
文書検索システムで用いられる TF-IDF 法を改変部品を特徴付ける索引キーに適当な重みを与える
部品に繰り返しあらわれる索引キー希少で,特定の部品に偏ってあらわれる索引キークラス定義名など部品を象徴するトークン種類の索引キー重みの総和を評価値として順位付け
クエリと適合する部品が上位に来る
84Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
3.CR と KR を統合した順位付け各順位付けの観点
1. 部品の使用例の多さ,汎用さ– CR 法
2. クエリと部品内容の適合度– KR 法
順位を統合して,両面で優れている部品を検索結果の上位に表示する部品検索部で行うため高速な方法が望ましい
85Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
3.CR と KR を統合した順位付けBorda の手法順位に対して評価点を割り当て,その合計点をもとに昇順にソート例 ) 部品群 = { A, B, C, D, E }
以降, CR と KR を統合した順位付けを CR+KR と呼ぶことにする
CR
KR
1位 A D
2位 E C
3位 C A
4位 B B
5位 D E
CR
KR 合計点
A 1 3 4B 4 4 8C 3 2 5D 5 1 6E 2 5 7
統合順位1位 A
2位 C
3位 D
4位 E
5位 B
CR と KR の順位 各部品群の評価点 統合された順位
86Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2 . SPARS-J の各順位付け手法の比較
実験目的各順位付け手法を比較し,最も妥当な順位付け手法を調査する
SPARS-J では3種類の評価手法による順位付け機能を実現1.利用関係に基づく順位付け手法( CR )2.キーワードの出現頻度に基づく順位付け手法( KR )3.1.2. の手法を統合した順位付け手法( CR+KR )
評価尺度適合率:検索結果のうちの適合部品数の割合
割合が高いほど,検索結果に利用できる部品が多く含まれる
ndpm 値:ユーザの順位付けとシステムの順位付けの違い値が小さいほど,理想的な順位付けと言える
87Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
ndpm 値の計算方法検索文書集合の全てのペア( d,d’ )における,システムとユーザで順位付けが異なるペア数( m )の割合
文書集合の要素数を n とすると
321 ,, dddD 321 ddd ss
333.02/)13(3
1
ndpm
231 ddd uu
2Cmndpm
n
計算例
88Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2 . SPARS-J の各順位付け手法の比較
準備データベース
(実験1と同じ) 14 万個のソースコード群から構築検索キーワード
(実験1と同じ) 10 個のクエリを用いる
手順各順位付け手法の検索結果上位 10 件の部品を対象として適合率を求める各順位付け手法による検索結果の全ての部品を対象として,ユーザの想定する理想的な順位付けとの ndpm 値求める
89Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2の結果keywor
d CR KR CR+KR CR KR CR+K
RA 1 1 1 0.036 0.048 0.037B 1 1 1 0.194 0.261 0.221C 0.5 0.5 0.5 0.133 0.117 0.092D 0.4 0.9 0.8 0.123 0.200 0.189E 0.4 0.4 0.4 0.208 0.192 0.194F 0.2 0.2 0.2 0.184 0.184 0.160G 0.9 1 1 0.081 0.103 0.080H 1 0.8 1 0.047 0.109 0.052I 0.6 0.7 0.7 0.210 0.324 0.267J 0.5 0.7 0.7 0.219 0.243 0.114
Ave. 0.65 0.72 0.73 0.143
0.178 0.141
適合率 ndpm値
90Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2の評価対応のある平均値の差の検定有意水準5%で以下の有意差が見られた
適合率KR,CR+KR ≫ CR
ndpm 値CR,CR+KR ≫ KR
91Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験2の考察CR 法
全体的にユーザの想定した順位で並んでいる傾向がある
KR 法上位には適合部品が多く存在しやすい
CR+KR 法適合率と ndpm 値の両方で優れた順位付けを行うことができたCR 法と KR 法の両方で高順位になったものが上位に順位付けされていると考えられる
92Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験3 . 企業内のソフトウェアに対する適用実験実験目的企業において実際に利用してもらうことで,ソフトウェアの管理や理解に有用であることを確認する
実験内容SPARS-J ついての定性的な評価
企業内のソフトウェア開発・保守を行っている従業員7名に対して, SPARS-J についてのアンケートを実施
93Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
アンケートの結果A B C D E F G 最頻値
パッケージブラウザの利用 4 5 5 5 4 3 3 5同グループのクラスの参照 4 5 5 2 4 3 5,4検出されたクラスを利用しているクラスの参照 5 5 5 5 5 5 5
検出されたクラスが利用しているクラスの参照 5 1 5 5 5 5 5
クラスのメトリクス値 1 4 1 2 4 5 4,1ソースコードがダウンロード機能 1 3 5 5 2 5 5時間的コストの削減 3 5 5 3 4 1 5,3ソフトウェア品質の向上 5 3 3 3 4 1 3企業内のソフトウェア把握 3 1 5 3 5 2 1 5,3,1検索結果一覧表示の見やすさ 4 4 5 5 3 3 5 5ハイライト表示の見やすさ 3 5 5 5 5 5 5 5
(良 5 4 3 2 1 悪)
94Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
アンケートの評価特に評価の高かった点パッケージブラウザの利用同グループの参照利用・被利用クラスの参照検索結果一覧・ハイライト表示の見やすさ
SPARS-J を用いることで可能になること部品を利用しているアプリの把握部品改訂時の影響範囲の調査
95Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
実験3の考察ソフトウェア部品の把握が容易となるということを意味しており,保守作業の支援に繋がっていると言えるその他の感想
検索速度が速く,ストレスを感じない前もって索引語を抽出しているためクライアントにインストールする必要がなく,ソフトウェア部品を共有できる共有のデータベースを構築するためSPARS-J はソフトウェア部品の検索に有用であり,また部品の把握や管理にも非常に役に立つシステムである
96Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
まとめと今後の課題ソフトウェア部品検索システム SPARS-J の実験的評価他の検索システムとの比較
Google と Namazu より優れているSPARS-J の各順位付け手法の比較
CR・ KR それぞれ特徴があり,それらを統合することで性能が向上した企業内のソフトウェアに SPARS-J を適用した保守を行う時に非常に有効であった
今後の課題再現率の調査順位付け性能以外の観点からの定量的評価他のソフトウェア部品への対応
97Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
END
The End