data mining

Upload: kuri

Post on 24-Feb-2016

TRANSCRIPT

Page 1: Data  Mining

Data Mining

Page 2: Data  Mining

SAS Enterprise Miner

User: sasdemo1, sasdemo2, … , sasdemo24

Password: aboi0rajee

Server: asas2

Data: use your SGH login as the project name

Page 3: Data  Mining

Process flow diagram

The data mining process is driven by a process flow diagram. You create it by dragging nodes from a toolbar, which is organized by SEMMA categories (Sample, Explore, Modify, Model, Assess), and dropping them onto a diagram workspace.

Page 4: Data  Mining

The SAS EM Graphical User Interface

1. Toolbar Shortcut Buttons
2. Project Panel
3. Properties Panel
4. Property Help Panel
5. Toolbar
6. Diagram Workspace
7. Diagram Navigation Toolbar

Page 5: Data  Mining

The SAS EM Graphical User Interface

Toolbar Shortcut Buttons: perform common computer functions and frequently used SAS Enterprise Miner operations. Move the mouse pointer over any shortcut button to see its text name. Click a shortcut button to use it.

Project Panel: manage and view data sources, diagrams, results, and project users.

Properties Panel: view and edit the settings of data sources, diagrams, nodes, and users.

Property Help Panel: displays a short description of any property that you select in the Properties Panel. Extended help is available from the Help main menu.

Page 6: Data  Mining

The SAS EM Graphical User Interface

Toolbar: a graphic set of node icons that you use to build process flow diagrams in the Diagram Workspace. Drag a node icon into the Diagram Workspace to use it. The icon remains in place in the Toolbar, and the node in the Diagram Workspace is ready to be connected and configured for use in the process flow diagram.

Diagram Workspace: build, edit, run, and save process flow diagrams. In this workspace, you graphically build, order, sequence, and connect the nodes that you use to mine your data and generate reports.

Diagram Navigation Toolbar: organize and navigate the process flow diagram.

http://support.sas.com/documentation/onlinedoc/miner/

Page 7: Data  Mining

ROC Curves

[Figure: ROC curve. Horizontal axis: F(S|Y=0) = 1 - Specificity, from 0.00 to 1.00. Vertical axis: F(S|Y=1) = Sensitivity, from 0.00 to 1.00. Each point on the curve is (F(S|Y=0); F(S|Y=1)), built from the empirical conditional distributions Fn(X|Y=0) and Fn(X|Y=1). A tangent to the ROC curve is marked.]

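$$AUROC = \int_0^1 f_{ROC}(x)\,dx$$

$$AUROC = \frac{1}{2}\sum_{i=0}^{k}\left[F_n(X_{i+1}\mid Y=0) - F_n(X_i\mid Y=0)\right]\left[F_n(X_{i+1}\mid Y=1) + F_n(X_i\mid Y=1)\right]$$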

The ROC curve visualizes the "separation" of the conditional distributions of the variable. The area under the ROC curve can be treated as a measure of stochastic dependence.
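The area under the ROC curve can be approximated by summing trapezoids between consecutive ROC points. The following is a minimal Python sketch of that idea, not part of the original slides. The function names `roc_points` and `auroc` and the toy scores and labels are hypothetical, and the sketch assumes that a higher score indicates Y=1.

```python
def roc_points(scores, labels):
    """Return ROC points (1 - specificity, sensitivity), one per distinct score.

    Assumption: a higher score means the case is more likely to have Y=1.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort by decreasing score so that lowering the cutoff adds cases one group at a time.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Cases with tied scores cross the cutoff together.
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Trapezoidal area under consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y1 + y0)
    return area


# Toy data for illustration only (not taken from the German credit data set).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auroc(pts))
```

For this toy data the result equals the share of (Y=1, Y=0) pairs in which the Y=1 case has the higher score, which is one way to read the area as a measure of separation between the two conditional distributions.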

Page 8: Data  Mining

Classification error

Page 9: Data  Mining

ROC Curve

Page 10: Data  Mining

Classification Errors

Page 11: Data  Mining

Classification Errors

Page 12: Data  Mining

SAS/BASE & SAS/STAT

Page 13: Data  Mining

PROC step

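/* General form: libname data "path"; where path is the folder that holds the data sets. */
libname data "C:\Users\Andrzej\Desktop";

/* DESC reverses the response ordering so that, for a 0/1 target, the event modeled is default=1. */
/* OUTROC= saves the ROC points to the data set roc. */
proc logistic data=data.German_credit desc;
  model default=duration credit_amt instalment age / outroc=roc;
run;

/* Plot sensitivity against 1 - specificity, joining the points with a line. */
proc gplot data=roc;
  title "ROC Curve";
  symbol i=join;
  plot _sensit_ * _1mspec_;
run;
quit;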
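Each point of the plotted ROC curve corresponds to one probability cutoff: cases are classified as events or nonevents at that cutoff, and sensitivity and 1 - specificity follow from the resulting counts. The Python sketch below illustrates that calculation for a single cutoff. It is an illustration under assumptions, not the procedure's own code: the function name `classification_row`, the rule "predict an event when the probability is at least the cutoff", and the toy data are all hypothetical.

```python
def classification_row(probs, events, cutoff):
    """Classification counts and rates at one probability cutoff.

    Assumption: a case is predicted to be an event when its predicted
    probability is greater than or equal to the cutoff.
    """
    pos = neg = falpos = falneg = 0
    for p, y in zip(probs, events):
        predicted_event = p >= cutoff
        if predicted_event and y == 1:
            pos += 1        # correctly predicted event
        elif predicted_event and y == 0:
            falpos += 1     # nonevent predicted as event
        elif not predicted_event and y == 1:
            falneg += 1     # event predicted as nonevent
        else:
            neg += 1        # correctly predicted nonevent
    return {
        "pos": pos,
        "neg": neg,
        "falpos": falpos,
        "falneg": falneg,
        "sensit": pos / (pos + falneg),
        "one_minus_spec": falpos / (falpos + neg),
    }


# Toy predicted probabilities and outcomes, for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
events = [1, 1, 0, 1, 0, 1, 0, 0]
row = classification_row(probs, events, cutoff=0.5)
print(row)
```

Repeating the calculation over a grid of cutoffs and plotting `sensit` against `one_minus_spec` traces out the ROC curve.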

Page 14: Data  Mining

Dimension Reduction – PROC VARCLUS

The VARCLUS procedure divides a set of numeric variables into disjoint or hierarchical clusters. Associated with each cluster is a linear combination of the variables in the cluster. This linear combination can be either the first principal component (the default) or the centroid component (if you specify the CENTROID option).

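/* Cluster the numeric variables; OUTTREE= saves the cluster tree for PROC TREE. */
proc varclus data=data.German_credit outtree=Tree maxclusters=10 noprint;
  var duration credit_amt instalment age;
run;

/* Default tree plot. */
proc tree data=Tree;
run;

/* Line-printer (text) version of the tree. */
proc tree data=Tree lineprinter;
run;

/* Horizontal tree whose height axis shows the proportion of variance explained. */
axis1 order=(0 to 1 by 0.2);
proc tree data=Tree horizontal haxis=axis1;
  height _PROPOR_;
run;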
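To make "first principal component of a cluster" and "proportion of variance explained" more concrete, here is a small Python sketch for a single cluster of variables. It is not how PROC VARCLUS is implemented (the procedure also splits clusters and reassigns variables iteratively); it only computes the largest eigenvalue of the cluster's correlation matrix by power iteration. The function names and the toy columns are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        standardized.append([(v - mean) / sd for v in col])
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def first_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration.

    Limitation: the equal-weight starting vector must not be orthogonal to the
    leading eigenvector (it is not for positively correlated variables).
    """
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, vec


# Toy cluster of three positively correlated variables (illustration only).
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
c = [1, 3, 2, 5, 4]
corr = correlation_matrix([a, b, c])
eigenvalue, weights = first_eigenpair(corr)
# With standardized variables the total variance equals the number of variables.
proportion = eigenvalue / len(corr)
print(eigenvalue, proportion, weights)
```

The vector `weights` holds the coefficients of the linear combination of standardized variables, and `proportion` is the share of the cluster's variance captured by that single component.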

Page 15: Data  Mining

Variables in the OUTTREE= data set:

_NAME_: the name of the cluster
_PARENT_: the parent of the cluster
_NCL_: the number of clusters
_VAREXP_: the amount of variance explained by the cluster
_PROPOR_: the proportion of variance explained by the clusters at the current level of the tree diagram
_MINPRO_: the minimum proportion of variance explained by a cluster
_MAXEIGEN_: the maximum second eigenvalue of a cluster