distributed data mining system in java group member 王春笙,林俊甫,王慧芬


Upload: agnes-stewart

Post on 28-Dec-2015


Page 1: Distributed Data Mining System in Java

Group Member

王春笙,林俊甫,王慧芬

Page 2: Overview of Project

• Project participants
  – 王春笙, 林俊甫, 王慧芬

Page 3: Project Programming Tasks

• D92725002 林俊甫
  – Multicast polling and reply between client and server
  – Client/server socket programming
  – Client dynamic join and leave mechanism
  – Multi-thread programming
  – Synchronization mechanism
  – Data chunk maintenance and dispatching mechanism
  – Client/server communication link control

Page 4: Project Programming Tasks (cont’d)

  – Client failure handling
    • Reassign the backup server if the failed client is the backup
    • Restore the failed client's work (with 王春笙)
  – Server failure handling
    • Backup server designation mechanism and logic design
  – RMI mechanism (with 王春笙)
  – Basic GUI

Page 5: System Infrastructure

• System diagram

(Diagram: a Server/Coordinator and several Clients connected over a LAN. Mining data chunks and mining results are exchanged between the server and the clients.)

Page 6: Basic Operation

(Sequence diagram with two timelines, Server and Client. Messages in time order:)

1. Polling on port 4444, group 230.0.0.1, "@": who is the server?
2. Servername: I am the server
3. Connect to <servername, port 4445>
4. Client do: filechunk#
5. ok
6. Client do: next filechunk#
7. ... 8. ...

Server side
  – Listens on the multicast group for queries and replies
  – Forks a thread to handle each client connection
  – Waits for the client's processed result, then orders the client to get another file chunk

Client side
  – Server found; connects to the server
  – Receives the server's instruction and invokes RMI to get the file chunk

Page 7: Port Assignment

• Port 4444: for multicast

• Port 4445: for TCP/IP socket connection

• Port 4446: for RMI services

Page 8: Finding a Server

• Once a client starts up, it queries every 3 seconds over multicast group 230.0.0.1, port 4444, by sending the 1-byte string "@" to locate the server host.

• Once a server starts up, it forks a thread to deal with these queries.

(Cycle diagram, client side:)

1. Client query: who is the server now?
2. Listen for the server's response
3. Connect to the server on port 4445
4. Use RMI to get a file chunk from the server
5. Run data mining and return the result to the server
6. Server failure detected: if I am the backup, go to the backup server procedure; otherwise go to step 1
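The discovery query described above can be sketched in Java as follows. This is a minimal sketch, not the project's code: the class name `DiscoveryQuery` and its methods are hypothetical, and only the group address, the port, the 3-second interval, and the "@" payload come from the slides.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the "who is the server?" query; names are illustrative. */
final class DiscoveryQuery {
    static final String GROUP = "230.0.0.1";    // multicast group from the slides
    static final int MULTICAST_PORT = 4444;      // multicast port from the slides
    static final long POLL_INTERVAL_MS = 3000;   // clients query every 3 seconds
    private static final byte[] QUERY = "@".getBytes(StandardCharsets.US_ASCII);

    private DiscoveryQuery() {}

    /** Builds the 1-byte "@" datagram addressed to the multicast group. */
    static DatagramPacket buildQuery() {
        try {
            // A literal IP address is parsed locally; no name lookup is performed.
            InetAddress group = InetAddress.getByName(GROUP);
            return new DatagramPacket(QUERY.clone(), QUERY.length, group, MULTICAST_PORT);
        } catch (UnknownHostException e) {
            throw new IllegalStateException("invalid multicast group literal", e);
        }
    }

    /** Returns true if a received datagram is exactly the 1-byte "@" query. */
    static boolean isQuery(DatagramPacket packet) {
        return packet.getLength() == 1
                && packet.getData()[packet.getOffset()] == QUERY[0];
    }
}
```

A client could send the packet from `buildQuery()` through a `DatagramSocket` every `POLL_INTERVAL_MS` milliseconds, while the server's listener thread joins the group with a `MulticastSocket` and answers each datagram for which `isQuery` returns true.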

Page 9: File Dispatching

• The server maintains a file chunk pool.

• The server finds an available file chunk for a client, sets its state to 1, and orders the client to get that chunk via RMI. The chunk's state is updated to 2 when the client returns its result.

• Recovery: when the server detects that a client's link is broken, it resets the file chunk allocated to that client to 0.

• The file chunk class is declared Serializable so that it can be passed to the backup server via RMI.

• The file chunk class uses synchronization for concurrency control.

FileChunks state values: -1: empty, 0: available, 1: in use, 2: used
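The state transitions listed above can be illustrated with a small synchronized pool. This is a sketch under assumptions: the class `FileChunkPool`, its method names, and the array representation are hypothetical; only the four state codes and the transitions (0 to 1 on dispatch, 1 to 2 on result, 1 to 0 on a broken link) come from the slide.

```java
import java.util.Arrays;

/** Hypothetical chunk pool; the state codes follow the slide. */
final class FileChunkPool {
    static final int EMPTY = -1, AVAILABLE = 0, IN_USE = 1, USED = 2;

    private final int[] state;

    /** Creates a pool with the given capacity whose first loadedChunks slots hold data. */
    FileChunkPool(int capacity, int loadedChunks) {
        if (loadedChunks < 0 || loadedChunks > capacity) {
            throw new IllegalArgumentException("loadedChunks out of range");
        }
        state = new int[capacity];
        Arrays.fill(state, EMPTY);
        Arrays.fill(state, 0, loadedChunks, AVAILABLE);
    }

    /** Dispatch: marks the first available chunk as in use; returns its index or -1. */
    synchronized int acquire() {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == AVAILABLE) {
                state[i] = IN_USE;
                return i;
            }
        }
        return -1;
    }

    /** Called when the client returns its mining result for the chunk. */
    synchronized void complete(int chunk) {
        if (state[chunk] != IN_USE) {
            throw new IllegalStateException("chunk " + chunk + " is not in use");
        }
        state[chunk] = USED;
    }

    /** Recovery: called when the link to the client holding the chunk breaks. */
    synchronized void release(int chunk) {
        if (state[chunk] == IN_USE) {
            state[chunk] = AVAILABLE;
        }
    }

    synchronized int stateOf(int chunk) {
        return state[chunk];
    }
}
```

Making every method `synchronized` lets the per-client handler threads share one pool without two clients ever being handed the same chunk.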

Page 10: Backup Server Selection

• The server maintains and assigns a unique id to each client.

• The unique id is incremented as a serial number.

• The client with the smallest id is assigned as the backup server.

• When a client fails, the server checks whether it was the backup server; if so, it restarts the selection process.
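The selection rule above (serial ids, smallest id becomes backup, reselect when the backup fails) can be sketched as follows. The class `BackupSelector` and its methods are hypothetical, and starting the serial numbers at 1 is an assumption.

```java
import java.util.OptionalInt;
import java.util.TreeSet;

/** Hypothetical sketch of backup selection by smallest client id. */
final class BackupSelector {
    private int nextId = 1; // assumption: serial numbers start at 1
    private final TreeSet<Integer> connected = new TreeSet<>();
    private OptionalInt backupId = OptionalInt.empty();

    /** Registers a newly connected client and returns its unique id. */
    synchronized int register() {
        int id = nextId++;
        connected.add(id);
        if (!backupId.isPresent()) {
            select();
        }
        return id;
    }

    /** Removes a failed client; returns true if it was the backup and a reselection ran. */
    synchronized boolean clientFailed(int id) {
        connected.remove(id);
        boolean wasBackup = backupId.isPresent() && backupId.getAsInt() == id;
        if (wasBackup) {
            select();
        }
        return wasBackup;
    }

    /** Id of the current backup server, or empty if no client is connected. */
    synchronized OptionalInt backup() {
        return backupId;
    }

    // The connected client with the smallest id becomes the backup.
    private void select() {
        backupId = connected.isEmpty()
                ? OptionalInt.empty()
                : OptionalInt.of(connected.first());
    }
}
```

Because ids only grow, the smallest id always belongs to the longest-connected client, so a newly joining client never displaces the current backup.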

Page 11: Nodes Maintenance

• The server keeps the records of connected clients in an ArrayList.

• The ArrayList is composed of Nodes objects, each of which records one client's details.

(Diagram: ArrayList ht of Nodes entries, labeled Key and Value, with the fields Id, Address, Port, Work on, Status.)

Page 12: RMI Services

• The RMI services are written as an independent program because both the server and the client that acts as backup server use them.

• The RMI services provide:
  – Backing up server data to the backup server
  – Getting a file chunk from the server
  – Returning mining results to the server
  – Receiving node information from the server
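The four services listed above could map onto a Java RMI remote interface along these lines. This is only a sketch: the interface name `MiningRemote`, every method name, and the parameter and return types are assumptions, since the slides do not show the project's actual signatures.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

/** Hypothetical remote interface; all names and types are illustrative. */
interface MiningRemote extends Remote {
    /** Backup server pulls a snapshot of the server's data. */
    Serializable fetchBackupSnapshot() throws RemoteException;

    /** Client fetches the contents of the file chunk it was told to process. */
    byte[] getFileChunk(int chunkId) throws RemoteException;

    /** Client returns the mining result for a chunk. */
    void returnResult(int chunkId, Serializable result) throws RemoteException;

    /** Backup server receives the server's node records (one string per node here). */
    List<String> getNodes() throws RemoteException;
}
```

Every method declares `RemoteException`, as RMI requires of remote methods, and all argument and return types are serializable so they can cross the wire.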

Page 13: Client Failure

• Actions taken by the server:
  – Recovery
  – Reassignment
  – Redo the backup server selection if the failed node is the backup

• Actions taken by the other clients:
  – None, unless a client is told by the server to act as backup

Page 14: Server Failure

(Sequence diagram with three timelines: Server S, Client A, Client B.)

Server S
  – Runs the backup selection and chooses A as backup (message to A: "A: Do backup")
  – Sends "Client do #" instructions and receives "do reply" messages
  – Server crash (X)

Client A (backup)
1. A is told by S that it is the backup. A invokes RMI to get all server data (RMI get file / RMI reply).
2. A periodically gets server services and file chunk data.
3. The broken communication link is detected; A starts the ServerAction class.
4. A creates a server socket on port 4445 and forks a thread to listen for queries and wait for connections. A replies to queries: "I am the server".

Client B
1. B receives instructions as discussed before ("Client do #" / "do reply").
2. The broken communication link is detected; B multicasts the polling query "@": who is the server now?
3. B learns that A is the backup and reconnects to A (connect to A:4445).

Page 15: Server/Client Life Cycle

(State diagram: Server and Client states. A Client can evolve into a Server, and each can end in normal or abnormal termination.)

Page 16: Project Programming Tasks

• D91725001 王春笙
  – Web log file preprocessing and separating
  – Web page traversal sequence parsing
  – Page item transferring and mapping
  – Web page sequential pattern mining
  – Mining results maintenance
  – RMI mining results transfer
  – Mining results lookup and display

Page 17: Project Programming Tasks (cont’d)

  – Backup mechanism
    • A separate thread backs up server files and in-memory data
    • Restore the failed client's work (with 林俊甫)
  – RMI mechanism (with 林俊甫)
  – GUI global state refresh
  – System integration
    • Testing and debugging

Page 18: Web Log File Format

• User IP

• Date

• Time

• Web pages URL

Page 19: Web File Preprocessing

• Select *.htm and *.html pages

• First sort by user ID

• Second sort by time

• Page sequences are separated by time
  – a gap of more than 30 seconds starts a new sequence
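The filtering and splitting steps above can be sketched for a single user's requests as follows. The class `Sessionizer` and its members are hypothetical. The sketch assumes the requests are already sorted by time, that times are given in seconds, and that the 30-second gap is measured between consecutive kept page requests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting one user's requests into page sequences. */
final class Sessionizer {
    static final long GAP_SECONDS = 30; // threshold from the slide

    /** One request of a single user. */
    static final class Hit {
        final long timeSeconds;
        final String url;

        Hit(long timeSeconds, String url) {
            this.timeSeconds = timeSeconds;
            this.url = url;
        }
    }

    private Sessionizer() {}

    /** Keeps only *.htm and *.html pages. */
    static boolean isHtmlPage(String url) {
        String u = url.toLowerCase();
        return u.endsWith(".htm") || u.endsWith(".html");
    }

    /** Splits time-sorted hits of one user wherever the gap exceeds GAP_SECONDS. */
    static List<List<String>> split(List<Hit> hits) {
        List<List<String>> sequences = new ArrayList<>();
        List<String> current = null;
        long last = 0;
        for (Hit h : hits) {
            if (!isHtmlPage(h.url)) {
                continue; // skip images and other non-page requests
            }
            if (current == null || h.timeSeconds - last > GAP_SECONDS) {
                current = new ArrayList<>();
                sequences.add(current);
            }
            current.add(h.url);
            last = h.timeSeconds;
        }
        return sequences;
    }
}
```

A gap of exactly 30 seconds keeps two pages in the same sequence here, which is one reading of "more than 30 seconds".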

Page 20: Chunk Data Files

• Part*.ppp

    6023 2 1 1 2 8
    6024 1 1 206
    6025 7 1 1 1 1 1 1 1 2 5 17 18 19 20 11
    6026 3 1 1 1 144 145 338
    6027 2 1 1 2 9
    6028 3 1 1 1 2 8 3

• Items.ppp

    /~visualdep/htm/p5b.htm 168
    /~businessdep/student/picture.html 169
    /~comedu/inde.htm 170
    /~account/91tuition.htm 171
    /~stuaffair/life/procedure-17.htm 172
    /~stuaffair/life/procedure-25.htm 173

Page 21: Apriori Algorithm

• 1: find all L1

• 2: generate C2 from L1

• 3: count C2 and find all L2

• 4: k = 3

• 5: generate and prune Ck from L(k-1)

• 6: count Ck and find all Lk

• 7: if Lk is not empty then k++, go to 5

Page 22: Apriori Algorithm (cont’d)

• Join phase: s1 joins s2 if s1 (first item dropped) = s2 (last item dropped)
  – Example: s1 = {a, b}, s2 = {b, a}; s1 join s2 => {a, b, a}

• Prune phase: delete a k-candidate if any of its (k-1)-subsequences is not large

• C and L are stored in hash data structures
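The join and prune phases above can be sketched over sequences of page IDs. The class `SequenceCandidates` is hypothetical, and the sketch makes one assumption the slide does not settle: the prune phase checks every subsequence obtained by deleting a single item, not only the contiguous ones. Hash sets stand in for the "hash data structure" the slide mentions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of candidate generation for sequential patterns. */
final class SequenceCandidates {
    private SequenceCandidates() {}

    /** Join phase: s1 joins s2 if s1 without its first item equals s2 without its last item. */
    static Set<List<Integer>> join(Set<List<Integer>> largePrev) {
        Set<List<Integer>> joined = new HashSet<>();
        for (List<Integer> s1 : largePrev) {
            for (List<Integer> s2 : largePrev) {
                if (s1.subList(1, s1.size()).equals(s2.subList(0, s2.size() - 1))) {
                    List<Integer> candidate = new ArrayList<>(s1);
                    candidate.add(s2.get(s2.size() - 1)); // append the last item of s2
                    joined.add(candidate);
                }
            }
        }
        return joined;
    }

    /** Prune phase: keep a candidate only if all of its (k-1)-subsequences are large. */
    static Set<List<Integer>> prune(Set<List<Integer>> candidates, Set<List<Integer>> largePrev) {
        Set<List<Integer>> kept = new HashSet<>();
        for (List<Integer> candidate : candidates) {
            boolean allLarge = true;
            for (int i = 0; i < candidate.size() && allLarge; i++) {
                List<Integer> sub = new ArrayList<>(candidate);
                sub.remove(i); // int argument: removes the item at index i
                allLarge = largePrev.contains(sub);
            }
            if (allLarge) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

With a = 1 and b = 2, joining {a, b} with {b, a} yields {a, b, a}, matching the slide's example. Under the assumption stated above, that candidate survives pruning only if {a, a} is also large.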

Page 23: Mining Result Display

• Client frequent patterns
  – Web page ID
  – Support
  – Saved as *.pppl files

• Client frequent patterns
  – Web page ID
  – Support
  – Web page name

Page 24: Backup Mechanism

• When a backup server is selected, that client starts a backup thread

• The backup thread loops every 0.5 second

• RMI data transfer
  – Chunk data files (part*.ppp, items.ppp)
  – Client information
  – File chunk information
    • determine MaxID and set "in use" to "available"
  – Frequent patterns information
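The step "determine MaxID and set 'in use' to 'available'" can be sketched as two small helpers that the backup could apply to its replicated copy of the data. This is a sketch under assumptions: the class `TakeoverHelper` and its methods are hypothetical, the state codes 0 and 1 are those from the File Dispatching slide, and reading MaxID as the largest client id in the replicated records is a guess, since the slide does not define it.

```java
import java.util.List;

/** Hypothetical helpers applied to the backup's replicated data. */
final class TakeoverHelper {
    static final int AVAILABLE = 0, IN_USE = 1; // codes from the File Dispatching slide

    private TakeoverHelper() {}

    /** Resets every "in use" chunk to "available" in place; returns how many were reset. */
    static int releaseInUse(int[] chunkStates) {
        int reset = 0;
        for (int i = 0; i < chunkStates.length; i++) {
            if (chunkStates[i] == IN_USE) {
                chunkStates[i] = AVAILABLE;
                reset++;
            }
        }
        return reset;
    }

    /** Assumption: MaxID is the largest client id in the replicated records (0 if none). */
    static int maxId(List<Integer> clientIds) {
        int max = 0;
        for (int id : clientIds) {
            max = Math.max(max, id);
        }
        return max;
    }
}
```

Resetting in-use chunks means that work with no confirmed result is dispatched again after a takeover, at the cost of possibly repeating a chunk that a client had almost finished.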

Page 25: System Integration

• Java class integration
  – Server component
  – Client component
  – Data mining component
  – GUI component

• Testing

• Debugging

Page 26: Project Programming Tasks

• D92725001 王慧芬
  – Graphical User Interface
    • Because the system performs data mining in a distributed way, its GUI provides four panels:
      – A system console
      – A result window
      – A connection table
      – A graphical network configuration

Page 27: GUI

• The system console shows how the system proceeds

Page 28: GUI (cont’d)

• The result window displays the progress and results of data mining

Page 29: GUI (cont’d)

• A connection table lists the connection information of all online clients

Page 30: GUI (cont’d)

• A connection table consists of 5 fields
  – NO: client-server connection id
  – IP address: client's IP address
  – Port: client's port number
  – Status: connection status, which can be
    • 0: offline
    • 1: online
    • 2: file transfer from server to client
    • 3: client is doing data mining
    • 4: client returns its result to the server once data mining has finished
    • 5: client is doing backup and data mining at the same time
  – # chunk works on: during data mining and backup, the number of the chunk that the connection is working on
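The six status codes above map naturally onto a Java enum. The enum name `ConnectionStatus`, its constant names, and the lookup method are hypothetical; only the numeric codes and their meanings come from the slide.

```java
/** Hypothetical enum for the Status field; codes follow the slide. */
enum ConnectionStatus {
    OFFLINE(0),
    ONLINE(1),
    FILE_TRANSFER(2),      // file transfer from server to client
    MINING(3),             // client is doing data mining
    RETURNING_RESULT(4),   // client returns its result to the server
    BACKUP_AND_MINING(5);  // client backs up and mines at the same time

    private final int code;

    ConnectionStatus(int code) {
        this.code = code;
    }

    /** Numeric code as stored in the connection table. */
    int code() {
        return code;
    }

    /** Looks up a status by its numeric code; rejects unknown codes. */
    static ConnectionStatus fromCode(int code) {
        for (ConnectionStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }
}
```

Storing the code explicitly, instead of relying on `ordinal()`, keeps the table values stable if constants are ever reordered.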

Page 31: GUI (cont’d)

• A graphical network configuration follows the connection table to depict the dynamic network configuration

Page 32: GUI (cont’d)

• In the dynamic network configuration, we use different client GIFs to express the status:
  – Offline
  – On-line
  – Data mining
  – Backup and mining

Page 33: GUI Interface

• mw.showMsg()
  – provided by the GUI for the server/client module to show console messages

• mw.showResultString()
  – provided by the GUI for the server/client module to show the results of data mining

• Connection table
  – modified by the server/client module to record connection information
  – read by the GUI every 0.01 second to depict the dynamic network configuration

Page 34: GUI Design

• Java Swing is used to generate labels, text, scrollbars, tables, etc.

• Java AWT 2D painting is used to animate the connection lines in the dynamic configuration panel

• ‘Photo Impact’ and ‘GIF animator’ are used to generate the node icons

• EasyRGB is used to tune the color harmonies.

Page 35: GUI Design (cont’d)

• A new thread is forked from the GUI task to animate the connection lines in the dynamic configuration panel
  – it reads the table every 0.03 second and shows the connection status with a moving ball

(Diagram: the GUI generates the connection table, the result panel, the system console, and the connection table animation.)

Page 36: Installation

• Example: running one server and two clients
  – Create three folders: Ser (Server), Cli (Client1), and Cli2 (Client2)
  – Extract the attached archive into the Ser folder; also download weblog10.zip into this folder and extract it
  – Extract the attached archive into the empty Cli and Cli2 folders
  – Open two DOS windows (windows 1 and 2) and change to the Ser folder
  – Open three DOS windows (windows 3, 4, and 5); in windows 3 and 4 change to the Cli folder, and in window 5 change to the Cli2 folder
  – In window 1, run compile.bat, then run rmi.bat
  – In window 2, run server.bat
  – In window 3, run compile.bat, then run rmi.bat
  – In window 4, run client.bat
  – In window 5, run compile.bat, then run client.bat