Presentation 4
Cross Language Clone Analysis
Team 2
October 13, 2010
Agenda
• Current Tasks
• Spike – GOLD Parser
• Demo
• Project Layout
• Team Collaboration
• Path Forward
Our Team
• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Brian Bentley
• Ashley Chafin
Current Tasks
What we are tackling…
Current Tasks (Review)
Current tasks created for the first user story, “Source Code Load & Translate”:
◦ Load & parse C# source code.
◦ Load & parse Java source code.
◦ Load & parse C++ source code.
◦ Translate the parsed C# source code to CodeDOM.
◦ Translate the parsed Java source code to CodeDOM.
◦ Translate the parsed C++ source code to CodeDOM.
◦ Associate the CodeDOM with the original source code.
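The last task, associating the translated model with the original source, can be pictured with a minimal sketch. It is written in Python purely for illustration (the project itself targets CodeDOM), and every name in it (SourceSpan, ModelNode, source_text) is a hypothetical stand-in, not part of CodeDOM or of the team's design. The assumption is only that each model node remembers the file and line range it came from.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SourceSpan:
    """Hypothetical record of where a model node came from."""
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


@dataclass
class ModelNode:
    """Hypothetical language-neutral node (a stand-in for a CodeDOM element)."""
    kind: str
    name: str
    span: SourceSpan
    children: List["ModelNode"] = field(default_factory=list)


def source_text(node: ModelNode, files: Dict[str, str]) -> str:
    """Return the original source lines that a model node was built from."""
    lines = files[node.span.path].splitlines()
    return "\n".join(lines[node.span.start_line - 1:node.span.end_line])


# Illustrative input only; file name and contents are made up.
files = {"Foo.java": "class Foo {\n  int bar() {\n    return 1;\n  }\n}\n"}
method = ModelNode("method", "bar", SourceSpan("Foo.java", 2, 4))
cls = ModelNode("class", "Foo", SourceSpan("Foo.java", 1, 5), [method])

print(source_text(method, files))
```

Keeping the span on the node means a clone reported on the model can be shown to the user in terms of the original file and lines.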
UML Model – Load & Parse
UML Model – Translate
UML Model – Associate
GOLD Parsing System
Spike
Topics To Discuss
• What is it?
• How does it work?
• What can we use it for?
• How can we extend it?
What Is GOLD?
• GOLD is a free parsing system that you can use to develop your own programming languages, scripting languages and interpreters. It strives to be a development tool that can be used with numerous programming languages and on multiple platforms. – www.devincook.com/goldparser
How It Works (Block Structure)
Diagram: Grammar → Builder → Compiled Grammar Table (*.cgt) → Engine; Source Code → Engine → Parsed Data
How It Works (Components)
Three Major Components
1. Builder – Reads a source grammar to construct a Compiled Grammar Table
2. Compiled Grammar Table – Stores LALR and DFA parse tables
3. Engine – Performs actual parsing
How It Works (Process)
Step 1
• Write the grammar for the language being implemented (GOLD Meta-Language).
◦ Rules: Backus-Naur Form
◦ Terminals: Regular Expressions
◦ Character sets: Set Notation
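To make the three ingredients of Step 1 concrete, here is a minimal sketch in Python. It does not use the GOLD Meta-Language syntax; it only mirrors the same split into character sets, terminals defined by regular expressions, and BNF-style rules. The toy grammar and all names (DIGIT, TERMINALS, RULES, tokenize, undefined_symbols) are assumptions made for this example.

```python
import re

# Character sets, written here as Python sets (GOLD has its own set notation).
DIGIT = set("0123456789")
LETTER = set("abcdefghijklmnopqrstuvwxyz")


def char_class(chars):
    """Turn a character set into a regular-expression character class."""
    return "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"


# Terminals, defined by regular expressions over the character sets.
TERMINALS = {
    "Integer": char_class(DIGIT) + "+",
    "Id": char_class(LETTER) + "+",
    "+": re.escape("+"),
}

# Rules in Backus-Naur Form: nonterminal -> list of alternatives.
RULES = {
    "<Expr>": [["<Expr>", "+", "<Term>"], ["<Term>"]],
    "<Term>": [["Integer"], ["Id"]],
}


def undefined_symbols():
    """Symbols used in rules that are neither a rule nor a terminal."""
    used = {sym for alts in RULES.values() for alt in alts for sym in alt}
    return used - set(RULES) - set(TERMINALS)


def tokenize(text):
    """Split text into (terminal, lexeme) pairs, preferring the longest match."""
    compiled = {name: re.compile(pattern) for name, pattern in TERMINALS.items()}
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, rx in compiled.items():
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no terminal matches at position {pos}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens


print(tokenize("x + 42"))
```

A check like undefined_symbols is the kind of analysis a grammar builder has to perform before it can produce parse tables.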
How It Works (Process)
Step 2
• Analyze the grammar.
• Construct LALR and DFA parse tables, which are saved in a Compiled Grammar Table file.
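As an illustration of what a DFA table is and why it can be saved to a file, here is a minimal Python sketch. The table is written by hand for two toy terminals rather than generated from a grammar, and the JSON round trip is only a stand-in: it is not the real *.cgt file format. All names (DFA_EDGES, ACCEPTING, longest_token, dump_tables, load_tables) are hypothetical.

```python
import json


def char_kind(ch):
    """Classify a character into the input classes used by the table."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return "other"


# Transition table: state -> {input class: next state}.
DFA_EDGES = {
    0: {"digit": 1, "letter": 2},
    1: {"digit": 1},
    2: {"letter": 2, "digit": 2},
}
# Accepting states and the terminal each one recognizes.
ACCEPTING = {1: "Integer", 2: "Id"}


def longest_token(text, pos, edges=DFA_EDGES, accepting=ACCEPTING):
    """Run the DFA from pos and return the longest (terminal, lexeme), or None."""
    state, last_accept, i = 0, None, pos
    while i < len(text):
        nxt = edges[state].get(char_kind(text[i]))
        if nxt is None:
            break
        state = nxt
        i += 1
        if state in accepting:
            last_accept = (accepting[state], text[pos:i])
    return last_accept


def dump_tables():
    """Serialize the tables (a stand-in for writing a compiled table file)."""
    return json.dumps({"edges": DFA_EDGES, "accepting": ACCEPTING})


def load_tables(blob):
    """Read the tables back; JSON turns integer keys into strings, so convert."""
    raw = json.loads(blob)
    edges = {int(s): t for s, t in raw["edges"].items()}
    accepting = {int(s): name for s, name in raw["accepting"].items()}
    return edges, accepting


edges, accepting = load_tables(dump_tables())
print(longest_token("count42 = 7", 0, edges, accepting))
```

Because the tables are plain data, the code that reads them can stay small and can be rewritten in another language without touching the grammar.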
How It Works (Process)
Step 3
• Analyze the source text with the parser engine and construct a parse tree.
• The Engine can be implemented in any number of programming languages.
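The idea of a table-driven engine can be sketched in a few lines of Python. This is not the GOLD engine or its API; it is a generic shift-reduce loop driven by a hand-built table for a toy grammar (E → E + n | n), and the names LR_RULES, ACTION, GOTO, and parse are assumptions for this example. It shows how an engine that knows nothing about the language can still build a parse tree from tables alone.

```python
# Toy grammar (hypothetical):  rule 1: E -> E + n     rule 2: E -> n
# rule number -> (left-hand side, number of symbols on the right-hand side)
LR_RULES = {1: ("E", 3), 2: ("E", 1)}

# Parse table, written by hand for the toy grammar above.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Shift-reduce parse; returns a tree of (nonterminal, children) tuples."""
    tokens = list(tokens) + ["$"]  # "$" marks end of input
    states, nodes, pos = [0], [], 0
    while True:
        look = tokens[pos]
        action = ACTION.get((states[-1], look))
        if action is None:
            raise SyntaxError(f"unexpected {look!r} at token {pos}")
        kind, arg = action
        if kind == "shift":
            states.append(arg)
            nodes.append(look)
            pos += 1
        elif kind == "reduce":
            lhs, size = LR_RULES[arg]
            children = nodes[-size:]
            del nodes[-size:]
            del states[-size:]
            nodes.append((lhs, children))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return nodes[0]


print(parse(["n", "+", "n"]))
```

Swapping in a different ACTION/GOTO table changes the language being parsed without changing the loop, which is why an engine can be ported to many programming languages.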
Usage within CloneDigger
Diagram: Source Code → Engine (using the Compiled Grammar Table, *.cgt) → Parsed Data → CodeDOM Conversion → AST
CodeDOM Conversion
• We need to write a routine to move data from the parsed tree to CodeDOM.
• Parsed data trees from the parser are stored in a consistent data structure, but their contents are based on the rules defined within each grammar.
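One way to picture the conversion routine is a small Python sketch in which the tree structure is the same for every language, but the rule names differ per grammar, so each grammar gets its own mapping onto common model kinds. This is an assumption about how such a routine could look, not the team's implementation, and it does not use the real CodeDOM classes. ParseNode, ModelNode, the rule names, and convert are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParseNode:
    """Generic parse-tree node: same shape for every grammar."""
    rule: str
    text: str = ""
    children: List["ParseNode"] = field(default_factory=list)


@dataclass
class ModelNode:
    """Hypothetical language-neutral node (a stand-in for a CodeDOM element)."""
    kind: str
    name: str = ""
    children: List["ModelNode"] = field(default_factory=list)


# Per-grammar mapping from rule names to model kinds (all names made up).
JAVA_RULES = {"ClassDeclaration": "type", "MethodDeclaration": "method"}
CSHARP_RULES = {"Class Decl": "type", "Method Dec": "method"}


def convert(node: ParseNode, rule_map: Dict[str, str]) -> List[ModelNode]:
    """Convert a parse node to model nodes; unmapped rules pass children up."""
    converted = [m for child in node.children for m in convert(child, rule_map)]
    kind = rule_map.get(node.rule)
    if kind is None:
        return converted
    return [ModelNode(kind, node.text, converted)]


java_tree = ParseNode("ClassDeclaration", "Foo", [
    ParseNode("ClassBody", "", [ParseNode("MethodDeclaration", "bar")]),
])
csharp_tree = ParseNode("Class Decl", "Foo", [ParseNode("Method Dec", "bar")])

print(convert(java_tree, JAVA_RULES) == convert(csharp_tree, CSHARP_RULES))
```

The two trees differ in rule names and nesting, yet both reduce to the same model, which is the property clone detection across languages depends on.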
Task Understanding: Three Step Process
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization
Diagram: Source Files → Translator → Common Model → Inspector → Detected Clones → UI → Clone Visualization
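The three steps can be chained as plain functions, as in the Python sketch below. Each stage is a deliberately simple stand-in: the "common model" is just whitespace-normalized lines, and the "inspector" only reports runs of identical lines. Neither is the team's algorithm; the function names (translate, inspect_model, visualize) and the sample files are assumptions made to show how data flows from one step to the next.

```python
from collections import defaultdict


def translate(files):
    """Step 1 (stand-in): map each file to (line number, normalized line) pairs."""
    model = {}
    for path, text in files.items():
        model[path] = [
            (n, " ".join(line.split()))
            for n, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
    return model


def inspect_model(model, window=2):
    """Step 2 (stand-in): group locations where `window` lines are identical."""
    seen = defaultdict(list)
    for path, lines in model.items():
        for i in range(len(lines) - window + 1):
            key = tuple(text for _, text in lines[i:i + window])
            seen[key].append((path, lines[i][0]))
    return [locations for locations in seen.values() if len(locations) > 1]


def visualize(clones):
    """Step 3 (stand-in): render each clone group as one line of text."""
    return ["clone: " + ", ".join(f"{p}:{n}" for p, n in group) for group in clones]


# Illustrative input only.
files = {
    "a.txt": "x = 1\ny = x + 1\nprint(y)\n",
    "b.txt": "z = 0\nx  =  1\ny = x + 1\n",
}
for line in visualize(inspect_model(translate(files))):
    print(line)
```

Because each step only consumes the previous step's output, any stage can be replaced (for example, a real translator per language) without touching the others.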
Extension and Enhancements
• Enhance grammars
◦ Update Java
◦ Update C#
◦ Define C++
• Share with classmates who have similar interests
• Share with the greater community
Grammars
• What is a grammar?
◦ A set of rules of a specific kind, for forming strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
GOLD Parser Grammars
• GOLD Parser uses context-free grammars that can be parsed with Look-Ahead LR (LALR) parsing.
• LALR-compliant grammars that we already have:
◦ C#
◦ Java
◦ Visual Basic .NET
Grammar Example
C++ Grammar Issue
• Currently no LALR-compliant C++ grammar exists, due to the overall complexity of the language.
• Other C++ parsers exist, but their output format differs from that of the GOLD Parser grammars we already have for the other languages.
• We are still searching for C++ parsing solutions.
GOLD Parser Conclusion
• We plan to use the GOLD Parsing System.
• Tasks we have to complete:
◦ Update Java grammar
◦ Update C# grammar
◦ Research “Define C++ grammar”
◦ Create a CodeDOM conversion to move data from the parsed tree to CodeDOM
Demonstrations
GOLD Parsing System
Project Layout
Key Points, Architecture, & Unit Test
Key Architecture Points
• Multilanguage support
• Configurable for different platforms
◦ Stand-alone application
◦ Plug-in
◦ Backend service
• Extendable
Architecture
Diagram: User Interface; Core (Code Model, Clone Detection Algorithms, API, Language Service Interface); Communication Layer; language services (C# Service, Java Service, C++ Service)
Visual Studio Solution
Core Unit
• Code Model
◦ Stores the code in a common format
• Application Programming Interface
◦ Used to embed clone detection in applications
• Language Service Interface
◦ Communication layer between the core and the specific language services
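The role of the Language Service Interface can be illustrated with a short Python sketch: the core depends only on an abstract contract, and each language service plugs in behind it. The class and method names (LanguageService, Core, JavaService, register, load, translate) and the dictionary returned as a "model" are hypothetical; the actual project is a Visual Studio solution and its interfaces may look quite different.

```python
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical contract between the core and one language service."""

    extensions = ()  # file extensions this service handles, without the dot

    @abstractmethod
    def translate(self, source: str) -> dict:
        """Parse source text and return it in the common code model."""


class Core:
    """Hypothetical core: routes files to services, knows no language details."""

    def __init__(self):
        self._services = {}

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._services[ext] = service

    def load(self, filename: str, source: str) -> dict:
        ext = filename.rsplit(".", 1)[-1].lower()
        service = self._services.get(ext)
        if service is None:
            raise LookupError(f"no language service for .{ext}")
        return service.translate(source)


class JavaService(LanguageService):
    """Placeholder service; a real one would run a parser engine."""

    extensions = ("java",)

    def translate(self, source: str) -> dict:
        return {"language": "Java", "lines": len(source.splitlines())}


core = Core()
core.register(JavaService())
print(core.load("Foo.java", "class Foo {}\n"))
```

Adding C# or C++ support then means registering another service; the core and the clone detection code stay unchanged, which is what makes the design extendable.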
Core
Core - API
Language Service
Language Service
Language Service
App Configuration
Unit Testing
Team Collaboration
Team 2 & Team 4
Team Collaboration
• Due to Team 4’s team size, we have taken responsibility for gathering & sharing grammars.
• Both teams will…
◦ Use the same grammars & engines. We will both have limitations based on this. Ex: the Java grammar is based on Java 1.4, so we are limited to using Java 1.4.
◦ Test the same grammars & engines. We will have two test beds.
Team Collaboration
• Method of collaboration:
◦ Google Code project site: http://code.google.com/p/uah-studio-2010-2011/ (Team 4 team members have access to this site.)
◦ Meetings
◦ Email
• What does our Google Code project contain?
◦ Source control for grammars & engines
◦ Bugs/Issues (Team 4 will have the ability to document new bugs.)
◦ Documents/Artifacts
Path Forward
Next Iteration & Schedule
Path Forward
• Finalize Iteration 1
• Iteration 2 Planning/Elaboration
Schedule