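ICSE 2014 Participation Report (SE Study Group, 6/12)

SE Study Group

National Institute of Informatics

June 12, 2014

Kazunori Sakamoto, Assistant Professor, National Institute of Informatics

Takayuki Suzuki, second-year master's student, The University of Tokyo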

Self-introduction

• Kazunori Sakamoto

– Interests: testing, program analysis, programming languages

– Sessions attended

• Day 1) Testing 1, Social Aspects of Software Engineering, Prediction

• Day 2) Testing 2, Search & APIs, Build and Package Management

• Takayuki Suzuki

– Interests: repository mining

– Sessions attended

• Day 1) Perspectives on Software Engineering, Repair, Prediction

• Day 2) Panel2: Analyzing Software Data, Search and APIs, Mining

• Day 3) Modeling and Interfaces, Refactoring and Reverse Engineering

Day 1) Testing 1, Social Aspects of Software Engineering, Prediction

Day 2) Testing 2, Search & APIs, Build and Package Management

2014/6/13

Code Coverage for Suite Evaluation by Developers

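• Rahul Gopinath et al. (Oregon State University, USA)

• Studies comparing coverage criteria exist, but they cover only a few projects

• Statement, Block, Branch, and Path (AIMP) coverage, plus mutation analysis

• Coverage: Cobertura, Emma, CodeCover, JMockit; Mutation: PIT

• Two experiments: one with the test suites already present in each project, one with test suites generated by Randoop

• Measurement succeeded for about 250 projects per metric; the correlation between coverage and mutation score was computed

• Statement coverage (!) showed the highest correlation (R², Kendall correlations)

• Impression 1: PIT is OSS released in 2013, and its effectiveness has not yet been verified in academia

• Impression 2: the test oracle is not evaluated, so the validity of the results is questionable*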

*) Staats et al. Programs, tests, and oracles: the foundations of testing revisited, ICSE, pp. 391-400, 2011.
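The correlation step above can be illustrated with a small sketch. It computes Kendall's tau between per-project coverage and mutation score. The tau-a variant is used here only for simplicity; these slides do not say which variant the paper uses. The class name `KendallTau` and the sample numbers are made up for illustration and are not data from the paper.

```java
class KendallTau {
    /**
     * Kendall's tau-a: (concordant pairs - discordant pairs) / (n * (n - 1) / 2).
     * Tied pairs count as neither concordant nor discordant.
     */
    static double tauA(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        long concordant = 0;
        long discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // The product of signs is positive when both values move in the same direction.
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Hypothetical per-project values, not taken from the paper.
        double[] statementCoverage = {0.2, 0.5, 0.7, 0.9};
        double[] mutationScore = {0.1, 0.4, 0.3, 0.8};
        System.out.println(tauA(statementCoverage, mutationScore));
    }
}
```

A value near 1 means projects with higher coverage also tend to have higher mutation scores; a value near 0 means the ranking by coverage says little about the ranking by mutation score.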

Table 7 on page 79

Preprint available here

http://dl.acm.org/citation.cfm?id=2568278

http://research.engr.oregonstate.edu/hci/sites/research.engr.oregonstate.edu.hci/files/papers/gopinath2014code_1.pdf

Coverage Is Not Strongly Correlated with Test Suite Effectiveness (Distinguished Paper)

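• Laura Inozemtseva et al. (University of Waterloo, Canada)

• Five OSS projects studied (Apache POI, Closure, HSQLDB, JFreeChart, Joda Time)

• 1000 test suites created with Randoop: 3, 10, 30, 100 methods

• Statement, Decision, and Modified Condition coverage measured with CodeCover

• Mutation analysis performed with PIT (measuring the fault detection rate)

• Every coverage metric showed only a weak correlation with the number of faults!

• Impression: as with the previous paper, the test oracle is not evaluated at all!

Figure 3 on page 441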



http://www.linozemtseva.com/research/2014/icse/coverage/coverage_paper.pdf

Time Pressure: A Controlled Experiment of Test-Case Development and Requirements Review

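Summary: under time pressure, simple tasks (requirements specification review and test-case development) were completed at lower cost without a drop in quality, and no other drawbacks were observed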





http://www.soberit.hut.fi/mmantyla/Mantyla_ICSE2014_Time_pressure_pre_print.pdf

Software Engineering at the Speed of Light: How Developers Stay Current using Twitter

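• Grounded theory-based study: an exploratory survey of 271 developers active on GitHub, interviews with 27, and a validation survey of 1,207

• RQ 1: How do developers get information about people, trends, and practices from Twitter? (*) 77%/9%: follow thought leaders; 62%/17%: useful for keeping up with projects and technology developments

– a) other developers, b) projects, c) news curators, d) thought leaders

• RQ 2: How does Twitter extend software development knowledge? 62%/14%: become aware of technologies worth learning; 32%/38%: can become a better developer

– Following experts and asking them questions; ongoing career-related research and serendipitous learning

• RQ 3: How does Twitter foster relationships among developers? 44%/28%: can build a community; 69%/14%: can discover interesting developers; 28%/42%: can look for jobs

– Managing one's public image and reputation, discovering interesting developers, forming good relationships

• RQ 4: What challenges do developers face on Twitter, and how do they cope? 72%/11%: choose carefully whom to follow; 65%/17%: follow and unfollow on a trial basis

– Continuously maintaining and curating the network; coping with information overload through filtering and similar means

• RQ 5: Why do some developers not use Twitter?

– Too noisy; dislike of the 140-character limit (it easily causes misunderstandings); harder to hold a conversation than on other channels

*) xx%/yy%: in the validation survey, xx% agreed and yy% disagreed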



http://leif.me/papers/Singer2014.pdf

Micro Execution

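• Patrice Godefroid (Microsoft Research)

• Proposes the concept of micro execution (the ability to execute any code fragment without a test driver or input data)

• Developed a prototype VM that handles x86 binaries

• Input data generation methods

– Zero mode, Random mode, File mode, Process-dump mode, SAGE mode (inputs generated with a previously developed symbolic execution tool so as to cover as many paths as possible)

• Use cases

– Automated API fuzzing: call the functions a DLL provides and look for crashes

– Packet parser isolation and fuzzing: previously handled through laborious system testing; now the packet parser and the code that processes its results are tested individually

– Targeted fuzzing: integration-testing the complex parsers in applications such as Excel is hard, so sub-parsers are tested individually

– Unit verification: likewise, complex components are tested individually

– Malware detection: actually run the code and detect suspicious behavior (sending packets, etc.)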


http://dl.acm.org/citation.cfm?id=2568225.2568273

http://research.microsoft.com/en-us/um/people/pg/public_psfiles/icse2014.pdf

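Unit Test Virtualization with VMVM (Distinguished Paper)

• Jonathan Bell et al. (Columbia University)

• With JUnit, the entire JVM is reinitialized for each test case

• The proposal reinitializes only the classes that had side effects

– Note that TestNG and NUnit do not reinitialize in the first place

• Greatly reduces test-case execution time

• A tool usable from Ant and Maven has been released!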
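The idea of resetting only what a test modified can be sketched in a few lines. This is a simplified illustration, not how VMVM itself works; the class `SelectiveReset` and the `TrackedCounter` type are hypothetical stand-ins for classes with static state.

```java
import java.util.ArrayList;
import java.util.List;

class SelectiveReset {
    /** Stand-in for a class's static state; records whether a test wrote to it. */
    static final class TrackedCounter {
        private int value;
        private boolean touched;

        int increment() {
            touched = true;
            return ++value;
        }

        int get() {
            return value;
        }

        /** Restores the initial value only if a write happened; reports whether it did. */
        boolean resetIfTouched() {
            if (!touched) {
                return false;
            }
            value = 0;
            touched = false;
            return true;
        }
    }

    static final TrackedCounter A = new TrackedCounter();
    static final TrackedCounter B = new TrackedCounter();

    /** Runs all tests in one process, resetting after each test only the state it modified. */
    static int runAll(List<Runnable> tests) {
        int resets = 0;
        for (Runnable test : tests) {
            test.run();
            if (A.resetIfTouched()) resets++;
            if (B.resetIfTouched()) resets++;
        }
        return resets;
    }

    public static void main(String[] args) {
        List<Runnable> tests = new ArrayList<>();
        tests.add(() -> A.increment());  // writes A only
        tests.add(() -> B.increment());  // writes B only
        tests.add(() -> A.get());        // reads only, so nothing needs resetting
        System.out.println("resets performed: " + runAll(tests));
    }
}
```

Compared with restarting the whole process per test, the cost here scales with the amount of state actually touched rather than with the number of tests.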


Table 4 on page 557



http://jonbell.net/publications/vmvm

CodeHint: Dynamic and Interactive Synthesis of Code Snippets (Awarded as Prof. R. Narasimhan Lecture)

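• Joel Galenson et al. (University of California at Berkeley, USA)

• Conventional approaches complete code using static analysis (type information)

• The proposed approach completes (generates) code using dynamic analysis

1. Actually run the program and stop it with a debugger and a breakpoint

2. Enumerate the variables available in the stopped context

3. Build expressions that satisfy the condition using method calls and operators

– The condition can be anything expressible in the host language (Java)

• e.g. o’ instanceof JMenuBar or o’.toString().contains("Alice")

• Video: http://www.cs.berkeley.edu/~joel/codehint/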
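Steps 2 and 3 can be sketched with plain reflection. This is a much-reduced illustration of the idea, not CodeHint's algorithm: it tries only the variables themselves and public zero-argument method calls on them, and it ignores side effects of the calls it tries. The class `SnippetSearch` and the variable map standing in for the debugger context are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class SnippetSearch {
    /**
     * Tries each variable and each public zero-argument method call on it, and keeps
     * the expressions whose runtime value satisfies the condition.
     */
    static List<String> search(Map<String, Object> vars, Predicate<Object> condition) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Object> entry : vars.entrySet()) {
            String name = entry.getKey();
            Object value = entry.getValue();
            if (value == null) {
                continue;
            }
            if (condition.test(value)) {
                hits.add(name);
            }
            for (Method m : value.getClass().getMethods()) {
                if (m.getParameterCount() != 0 || m.getReturnType() == void.class) {
                    continue;
                }
                try {
                    Object result = m.invoke(value);
                    if (result != null && condition.test(result)) {
                        hits.add(name + "." + m.getName() + "()");
                    }
                } catch (ReflectiveOperationException | RuntimeException ignored) {
                    // Calls that fail in this context are simply not candidates.
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for the variables visible at a breakpoint.
        Map<String, Object> vars = new LinkedHashMap<>();
        vars.put("s", "  Alice ");
        System.out.println(search(vars, o -> o.toString().contains("Alice")));
    }
}
```

Because candidates are evaluated against real runtime values, the condition can test behavior rather than just types, which is the contrast with static completion drawn above.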

Figure 3 on page 661

Figure 2 on page 661



http://www.cs.berkeley.edu/~joel/codehint/

http://www.cs.berkeley.edu/~joel/papers/icse2014.pdf

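Other papers in brief

• Two's Company, Three's a Crowd: A Case Study of Crowdsourcing Software Development (Preprint, link broken?)

– Crowdsourcing at TechPlatform Inc (using TopCoder) turned out to be much harder than expected! (It costs money and the quality is poor)

• Comparing Static Bug Finders and Statistical Prediction (Preprint)

– Carries out a comparison of static-analysis bug finders and defect prediction methods

– It is effective to go through modules starting from those the prediction method ranks as most defect-prone, checking the locations flagged by static-analysis warnings

• How Do API Documentation and Static Typing Affect API Usability? (Preprint)

– Experimentally examines how the presence or absence of documentation and static versus dynamic typing affect API usability (earlier work covered only the no-documentation case)

– Either way, statically typed languages did better than dynamically typed ones

– With documentation, the statically typed language had an even larger advantage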
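For the "Comparing Static Bug Finders and Statistical Prediction" item above, the suggested inspection order can be sketched as a sort. This is only an illustration under assumed inputs: the `Warning` type, the risk map, and the module names are hypothetical and not from the paper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WarningTriage {
    /** A static-analysis warning attached to a module (hypothetical shape). */
    static final class Warning {
        final String module;
        final String message;

        Warning(String module, String message) {
            this.module = module;
            this.message = message;
        }
    }

    /** Orders warnings so that those in modules with the highest predicted risk come first. */
    static List<Warning> inspectionOrder(List<Warning> warnings, Map<String, Double> predictedRisk) {
        List<Warning> ordered = new ArrayList<>(warnings);
        // Modules missing from the prediction are treated as risk 0.0 (an assumption of this sketch).
        ordered.sort(Comparator.comparingDouble(
                (Warning w) -> predictedRisk.getOrDefault(w.module, 0.0)).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        List<Warning> warnings = new ArrayList<>();
        warnings.add(new Warning("util", "unused variable"));
        warnings.add(new Warning("parser", "possible null dereference"));

        Map<String, Double> risk = new HashMap<>();
        risk.put("util", 0.1);
        risk.put("parser", 0.8);

        for (Warning w : inspectionOrder(warnings, risk)) {
            System.out.println(w.module + ": " + w.message);
        }
    }
}
```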


http://staff.lero.ie/stol/files/2014/03/stol_fitzgerald_icse2014_crowdsourcing_preprint.pdf


http://macbeth.cs.ucdavis.edu/bodes-main.pdf


http://users.dcc.uchile.cl/~rrobbes/p/ICSE2014-docstypes.pdf

Day 1) Perspectives on Software Engineering, Repair, Prediction

Day 2) Panel2: Analyzing Software Data, Search and APIs, Mining

Day 3) Modeling and Interfaces, Refactoring and Reverse Engineering


A Study and Toolkit for Asynchronous Programming in C# (Distinguished Paper)

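Investigates misuse of the async/await keywords used for asynchronous processing in C#

Misuse pattern                         Share
Fire-and-forget                        19%
Unnecessary async                      14%
Potentially long-running operations    5%
Unnecessary context capture            74%

☞ Developed tools that fix these misuse patterns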



https://ideals.illinois.edu/bitstream/handle/2142/45837/okur-2014-icse.pdf?sequence=3

Effects of Using Examples on Structural Model Comprehension: A Controlled Experiment

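Shows that concrete examples are effective for conveying domain knowledge

Task: create an object diagram for a point-card (loyalty card) model under several requirements

Control: overview of the class diagram only

EDM: overview of the class diagram + concrete examples of other point cards

No preprint available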


Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features

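A study of how new Java language features are used

1. Are new features used before their release?

Yes, but this depends heavily on compiler support

2. How much are new features used?

Annotations, the enhanced for loop, and generics are used heavily; the others are not

3. How do developers adopt them?

Most code that uses new features is written by a small number of developers

4. Is there code that should use new features but does not?

Yes, a lot

5. Is old code being replaced with new features?

It is gradually being replaced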
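Questions 4 and 5 are about code that could use a newer feature but is still written the old way. The pair below is an illustrative example of that kind of "potential usage" for generics and the enhanced for loop; it is not taken from the paper, and the class `LoopStyles` is made up.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LoopStyles {
    /** Pre-Java 5 style: raw types, an explicit iterator, and a cast. */
    @SuppressWarnings("rawtypes")
    static int totalLengthOld(List names) {
        int total = 0;
        for (Iterator it = names.iterator(); it.hasNext();) {
            String name = (String) it.next();
            total += name.length();
        }
        return total;
    }

    /** The same logic written with generics and the enhanced for loop. */
    static int totalLengthNew(List<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob");
        System.out.println(totalLengthOld(names) == totalLengthNew(names));
    }
}
```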



http://design.cs.iastate.edu/papers/ICSE-14/icse14.pdf

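Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development?

Investigates how practices differ between game development and other software development

Interviews with 14 developers who have experience in both game development and other software development; survey of developers at Microsoft

• Manual testing is used more often than automated unit tests

– The state space is too large, so writing tests is hard

• Agile development is often used in game development

– ...although in practice "agile" often just means having no fixed process

• A wider range of skills is required than in other software development

☞ Conventional wisdom from non-game development does not carry over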



http://research.microsoft.com/pubs/210047/murphyhill-icse-2014.pdf

Mining Fine-Grained Code Changes to Detect Unknown Change Patterns

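Mines change patterns from a change history recorded at a finer granularity than commits

Extracted patterns

Null check

Before:

    void addPerson(Person p) {
        …
    }

After:

    void addPerson(Person p) {
        if (p == null) {
            return;
        }
        …
    }

Adding an enum element

Before:

    switch (e) {
    case START:
    }

After:

    switch (e) {
    case START:
        …
    case STOP:
        …
    }

Before:

    if (e.isStart()) …

After:

    if (e.isStart()) …
    if (e.isStop()) …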



http://cope.eecs.oregonstate.edu/papers/NegaraICSE2014.pdf
