Data Mining. Nguyễn Nhật Quang, quangnn-fi[email protected]. Hanoi University of Science and Technology, School of Information and Communication Technology. Academic year 2011-2012

Upload: nguyendongnam

Post on 11-Nov-2015


L2: Introduction to WEKA (source file: L2-Gioi_thieu_WEKA.ppt)

WEKA Explorer: Association rules

WEKA Explorer: Data visualization

WEKA Explorer: Attribute selection

Course contents:
- Introduction to data mining
- Introduction to the WEKA tool
- Data preprocessing
- Association rule discovery
- Classification and prediction techniques
- Clustering techniques


WEKA is a software tool written in Java for machine learning and data mining.

Main features:
- A collection of data preprocessing tools, machine learning and data mining algorithms, and experimental evaluation methods
- A graphical user interface (including data visualization)
- An environment for comparing machine learning and data mining algorithms

It can be downloaded from: http://www.cs.waikato.ac.nz/ml/weka/

WEKA: Main environments

- Simple CLI: a simple command-line interface (similar to MS-DOS)
- Explorer (the environment we will mainly use): gives access to all of WEKA's capabilities for exploring data
- Experimenter: for running experiments and performing statistical tests between machine learning models
- KnowledgeFlow: a drag-and-drop graphical interface for designing the steps (components) of an experiment

WEKA: The Explorer environment


- Preprocess: select and modify (process) the working data
- Classify: train and test machine learning models (classification, or regression/prediction)
- Cluster: learn groups from the data (clustering)
- Associate: discover association rules from the data
- Select attributes: identify and select the most relevant (important) attributes of the data
- Visualize: view interactive 2-D plots of the data

WEKA Explorer: Data preprocessing


WEKA: Data set format

WEKA works only with text files in the ARFF format. An example data set:

    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny,85,85,FALSE,no
    overcast,83,86,FALSE,yes

- @relation gives the name of the data set
- outlook and windy are nominal attributes
- temperature and humidity are numeric attributes
- play is the class attribute (by default, the last attribute)
- The lines after @data are the examples (instances)
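To make the structure of the ARFF example above concrete, here is a minimal Python sketch that reads that small subset of the format. It is an illustration only, not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes unquoted names, no missing values, and only nominal or numeric attributes.

```python
def parse_arff(text):
    """Parse a tiny ARFF subset: @relation, @attribute (nominal or numeric), @data."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and ARFF comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # numeric attributes become floats, everything else stays a string
            rows.append({
                name: float(value) if kind == "numeric" else value
                for (name, kind), value in zip(attributes, values)
            })
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: keep the list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("real", "numeric", "integer"):
                kind = "numeric"
            else:
                kind = "other"                 # assumption: left unparsed in this sketch
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


WEATHER = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(WEATHER)
class_name = attributes[-1][0]   # by default, the last attribute is the class
print(relation, class_name, rows[0])
```

Taking the last attribute as the class follows the default convention stated above.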

Data can be imported from a file in ARFF or CSV format. Data can also be read from a URL, or from a database through JDBC.

WEKA's data preprocessing tools are called filters:
- Discretization
- Normalization
- Re-sampling
- Attribute selection
- Transforming and combining attributes

See the WEKA Explorer interface...

WEKA Explorer: Classifiers

Classifiers in WEKA are models that predict nominal quantities (classification) or numeric quantities (regression/prediction).

Classification techniques supported by WEKA:
- Naïve Bayes classifier and Bayesian networks
- Decision trees
- Instance-based classifiers
- Support vector machines
- Neural networks

See the WEKA Explorer interface...

Select a classifier.

Select the test options:
- Use training set: the learned classifier is evaluated on the training set
- Supplied test set: a separate data set (different from the training set) is used for evaluation
- Cross-validation: the data set is divided into k folds of approximately equal size; each fold is used once for testing while the classifier is trained on the remaining folds
- Percentage split: specifies the percentage of the data set used for training; the rest is held out for evaluation

More options...
- Output model: shows the learned classifier
- Output per-class stats: shows precision/recall statistics for each class
- Output entropy evaluation measures: shows entropy-based (impurity) evaluation measures
- Output confusion matrix: shows the confusion matrix of the learned classifier
- Store predictions for visualization: keeps the classifier's predictions in memory so that they can be displayed later
- Output predictions: shows the detailed predictions on the test set
- Cost-sensitive evaluation: errors of the classifier are evaluated with respect to a specified cost matrix
- Random seed for XVal / % Split: specifies the random seed used when randomly selecting examples for the test set

Classifier output shows the important information:
- Run information: the options of the learning model, the name of the data set, the number of examples, the attributes, and the test mode
- Classifier model (full training set): a textual representation of the learned classifier
- Predictions on test data: detailed information about the classifier's predictions on the test set
- Summary: statistics on the accuracy of the classifier under the chosen test mode
- Detailed Accuracy By Class: detailed information about the accuracy of the classifier for each class
- Confusion Matrix: the entries of this matrix show the number of test instances that were classified correctly and incorrectly
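The cross-validation option, the confusion matrix, and the per-class precision/recall described above can be sketched in a few lines of Python. This is a hedged illustration, not WEKA's implementation: all function names are hypothetical, the labels are made-up toy data, and the "classifier" is a trivial baseline that always predicts the most frequent class of its training folds.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=1):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def confusion_matrix(actual, predicted, labels):
    """matrix[a][p] counts instances whose true class is a and predicted class is p."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def precision_recall(matrix, label):
    """Per-class precision and recall read off the confusion matrix."""
    true_pos = matrix[label][label]
    predicted_as_label = sum(row[label] for row in matrix.values())
    actually_label = sum(matrix[label].values())
    precision = true_pos / predicted_as_label if predicted_as_label else 0.0
    recall = true_pos / actually_label if actually_label else 0.0
    return precision, recall


def cross_validate_majority(labels, k=3, seed=1):
    """k-fold cross-validation of a baseline that always predicts the
    most frequent class seen in the training folds."""
    actual, predicted = [], []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        guess = Counter(train).most_common(1)[0][0]
        for i in test_idx:
            actual.append(labels[i])
            predicted.append(guess)
    return actual, predicted


# Toy class labels (made up for illustration): 10 "yes" and 4 "no".
labels = ["yes"] * 10 + ["no"] * 4
actual, predicted = cross_validate_majority(labels, k=3)
matrix = confusion_matrix(actual, predicted, ["yes", "no"])
for a in ("yes", "no"):
    print(a, matrix[a], precision_recall(matrix, a))
```

With these toy labels every training split still contains more "yes" than "no", so the baseline predicts "yes" everywhere: all 4 "no" instances land in the off-diagonal cell, and recall for "no" is 0 even though overall accuracy is 10/14. This is the kind of per-class detail the confusion matrix exposes and a single accuracy figure hides.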


Result list provides several useful functions:
- Save model: saves the model of the learned classifier to a binary file
- Load model: loads a previously learned model from a binary file
- Re-evaluate model on current test set: evaluates a previously learned model (classifier) on the current test set
- Visualize classifier errors: opens a plot window showing the classification results. Correctly classified examples are drawn as crosses (x), and misclassified examples as squares (□)

WEKA Explorer: Clusterers


Clusterers (cluster builders) in WEKA are models that find groups of similar examples in a data set.

Clustering techniques supported by WEKA:
- Expectation maximization (EM)
- k-Means

Clustering results can be displayed and compared with the actual clusters (classes).

See the WEKA Explorer interface...

Select a clusterer (cluster builder).

Select the cluster mode:
- Use training set: the learned clusters are tested on the training set
- Supplied test set: a separate data set is used to test the learned clusters
- Percentage split: specifies the proportion in which the original data set is split to build the test set
- Classes to clusters evaluation: compares how well the learned clusters match the specified classes

Store clusters for visualization: keeps the learned clusters in memory so that they can be displayed later.

Ignore attributes: selects the attributes that are excluded from the clustering process.

Association rules: select a model (algorithm) for discovering association rules.

Associator output shows the important information:
- Run information: the options of the association rule model, the name of the data set, the number of examples, and the attributes
- Associator model (full training set): a textual representation of the discovered set of association rules, including:
  - Minimum support
  - Minimum confidence
  - Sizes of the large (frequent) itemsets
  - The list of association rules found

See the WEKA Explorer interface...

Attribute selection: determines which attributes are the most important. In WEKA, an attribute selection method consists of two parts:
- Attribute Evaluator: specifies how the relevance of the attributes is evaluated, e.g., correlation-based, wrapper, information gain, chi-squared, ...
- Search Method: specifies the method (order) in which attributes are considered, e.g., best-first, random, exhaustive, ranking, ...

See the WEKA Explorer interface...
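As an illustration of the two-part scheme above, the following Python sketch pairs an information-gain evaluator with a simple ranking search. It is a sketch under stated assumptions, not WEKA's code: the function names and the four-row data set are made up, and only nominal attributes are handled.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, class_attribute):
    """Evaluator: class entropy minus the weighted entropy after splitting on attribute."""
    labels = [r[class_attribute] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[class_attribute])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, class_attribute):
    """Search: score every non-class attribute and sort from best to worst."""
    candidates = [a for a in rows[0] if a != class_attribute]
    scored = [(information_gain(rows, a, class_attribute), a) for a in candidates]
    return sorted(scored, reverse=True)


# Made-up toy data: outlook fully determines play, windy tells nothing.
rows = [
    {"outlook": "sunny", "windy": "FALSE", "play": "no"},
    {"outlook": "sunny", "windy": "TRUE", "play": "no"},
    {"outlook": "overcast", "windy": "FALSE", "play": "yes"},
    {"outlook": "overcast", "windy": "TRUE", "play": "yes"},
]
for score, name in rank_attributes(rows, "play"):
    print(f"{name}: {score:.3f}")
```

Keeping the evaluator (`information_gain`) separate from the search (`rank_attributes`) mirrors the split described above: either part could be swapped for another without touching the other.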
Visualizing data is essential in practice: it helps to gauge how difficult the learning problem is.

WEKA can display:
- Each individual attribute (1-D visualization)
- A pair of attributes (2-D visualization)

Different class values (labels) are shown in different colors.

The Jitter slider makes the display clearer when too many examples (points) are concentrated around the same position in the plot.

Zooming in and out is done by increasing or decreasing the values of PlotSize and PointSize.

See the WEKA Explorer interface...
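The idea behind jitter can be sketched as adding a small random offset to each point before drawing, so that overlapping points become distinguishable. The slides do not say how WEKA computes its jitter; the uniform noise, the `amount` parameter, and the function name below are assumptions made for illustration.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of the 2-D points, each shifted by uniform noise in
    [-amount, +amount] on both axes. Only the plotted positions change;
    the underlying data values are left untouched."""
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Five examples that fall on exactly the same spot in a 2-D plot.
points = [(1.0, 2.0)] * 5
moved = jitter(points, amount=0.1)
print(len(set(points)), "distinct position before,", len(set(moved)), "after")
```

A larger `amount` spreads dense clumps further apart at the cost of positional accuracy, which is why it is exposed as an adjustable slider rather than a fixed value.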
