(1) SQL Server 2005 Data Mining: Enterprise Application Scope and Methodology


Upload: philipzheng

Post on 27-Jul-2015


Data Mining --- SQL Server 2005 [email protected] WWW.CDMS.ORG.TW 08/21/09

Technology Review 2002

(Data mining) (Biometrics) (Microphotonics) (Untangling code) (Microfluidics)

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, 1996)


Fayyad et al. 1996

Data Mining

Data Mining and related fields: Database Theory, Artificial Intelligence, Data Warehousing, Statistics, Machine Learning

Data Mining and the Customer Life Cycle

Stages: Not customer, New Customer, Matured Customer, Churn

Acquisition: Customer profiling, Target marketing, Segmentation, Market basket analysis, Cross-selling
Maintenance (I): Cross-selling, Segmentation, Risk Management
Maintenance (II): Customer Loyalty & Retention, Life-time Value, Profitability

Data Mining Model Patterns Relations


Classification, Regression, Time Series, Clustering, Association, Sequence

Data Mining, Business Knowledge, SQL Server 2005

Reports ( & Ad hoc), Reports ( ), OLAP, Data Mining

Easy / Difficult

5 algorithms, 12 viewer

BI, Web & Office

Source: Microsoft Taiwan

Decision Trees

Clustering

Time Series

Sequence Clustering

Association

Naive Bayes

Logistic Regression

Neural Net

Linear Regression

DMX

Define a model: CREATE MINING MODEL ...
Train a model: INSERT INTO dmm ...
Prediction using a model: SELECT ... FROM dmm PREDICTION JOIN ...

Diagram labels: Training Data, Data Mining Management System (DMMS), Mining Model, Prediction Input Data
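As a rough illustration of the define and train steps named on this slide, the following DMX sketch creates and trains a decision tree model. Only the CREATE MINING MODEL and INSERT INTO statement forms come from the slide. The model name [CustomerChurn], its columns, the data source [CustomerDW], and the table dbo.TrainingCustomers are hypothetical names chosen for the example.

```sql
-- DMX sketch; every object name below is hypothetical.

-- Step 1. Define a model: declare the case key, the input columns,
-- and the column to predict, then choose an algorithm.
CREATE MINING MODEL [CustomerChurn]
(
    [CustomerKey] LONG KEY,
    [Age]         LONG CONTINUOUS,
    [Gender]      TEXT DISCRETE,
    [Churned]     LONG DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

-- Step 2. Train the model (run as a separate statement):
-- feed it rows from a relational source through OPENQUERY.
INSERT INTO [CustomerChurn]
    ([CustomerKey], [Age], [Gender], [Churned])
OPENQUERY([CustomerDW],
    'SELECT CustomerKey, Age, Gender, Churned FROM dbo.TrainingCustomers')
```

Swapping the USING clause (for example to Microsoft_Clustering or Microsoft_Naive_Bayes) would train a different algorithm on the same column definitions, subject to the content types each algorithm accepts.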

Data Mining and T-SQL JOIN

SELECT ... FROM dmm PREDICTION JOIN ... ON ... WHERE ...
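The resemblance to a T-SQL JOIN can be seen in a small prediction query. This is a sketch under assumptions: it presumes a trained model named [CustomerChurn] with input columns [Age] and [Gender] and a predictable column [Churned], plus a data source [CustomerDW] and a table dbo.NewCustomers. None of these names appear in the slides.

```sql
-- DMX sketch; model, column, data source, and table names are hypothetical.
SELECT
    t.[CustomerKey],
    Predict([Churned])               AS [PredictedChurn],
    PredictProbability([Churned], 1) AS [ChurnProbability]
FROM [CustomerChurn]                       -- the mining model plays the role of a table
PREDICTION JOIN
    OPENQUERY([CustomerDW],
        'SELECT CustomerKey, Age, Gender FROM dbo.NewCustomers') AS t
ON  [CustomerChurn].[Age]    = t.[Age]     -- map model inputs to source columns
AND [CustomerChurn].[Gender] = t.[Gender]
WHERE PredictProbability([Churned], 1) > 0.5
```

The ON clause maps model columns to input columns rather than matching rows as a relational join does, and the WHERE clause filters on the prediction itself.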

DMX

PredictProbability, PredictTimeSeries, PredictAssociation, PredictSequence, Cluster

PredictHistogram

TopCount, TopSum, TopPercent

PMML
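Two of the functions listed here, PredictHistogram and TopCount, are often combined. The sketch below assumes a trained model named [CustomerChurn] with inputs [Age] and [Gender] and a predictable column [Churned]; these names and the input values are hypothetical and only serve the example.

```sql
-- DMX sketch; the model, its columns, and the input values are hypothetical.

-- Full distribution of the predicted column for a single input case.
SELECT FLATTENED
    PredictHistogram([Churned])
FROM [CustomerChurn]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 'M' AS [Gender]) AS t

-- Only the most probable state (run as a separate statement):
-- TopCount ranks the histogram rows by $Probability and keeps one.
SELECT FLATTENED
    TopCount(PredictHistogram([Churned]), $Probability, 1)
FROM [CustomerChurn]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 'M' AS [Gender]) AS t
```

NATURAL PREDICTION JOIN matches input columns to model columns by name, which keeps a single-case query short.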

SQL 2005


DMX, VBA/EXCEL

plug-in API

UDF (User-defined function), MDX, OLAP

Text Mining

90%

Text Mining: SQL 2005 SSIS/AS, IBM Intelligent Miner for Text, SAS Enterprise Miner for Text, SPSS Clementine for Text

Term Extract, Term Lookup

Fuzzy Lookup, Fuzzy Grouping, Error-Tolerant Index

[Diagram: text mining process. Labels: Task Definition and Goal; Document Repository; Text Database; Data Gathering, Extracting, Cleansing, Transferring, Organizing, Loading; Preprocess Data; Preprocessed Data; Lexical Analysis; Semantic Analysis; Semantic Evaluation; Feature Extraction; Language Categorization; Document Clustering; Selection; Mining; Visualization; Browsing; Knowledge; Knowledge Based; Database Tools]

SQL Server 2005 Data Mining

Data Mining


[Diagram labels: Extract, Transform, Load; Data Source; MIS; Metadata; Templates; Meta Data; Data Mining; OLAP; Decision Making; CRM; Marketing Campaign]

Panel

ETL Data warehousing

Data Mining OLAP Statistics

CRM


Data mining Logistic Regression


(Cluster Analysis)


Logistic Regression: the outcome takes the values 1 or 0, and the Logistic Curve is S-shaped, running from 0 to 1.

π(x) = E(y|x)
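For reference, the conditional mean E(y|x) on this slide is usually written out as the logistic function. The form below is the standard textbook one for a single predictor; the symbols π, β0, and β1 are assumptions of this sketch, since the slide text preserves only "(x) = E(y|x)".

```latex
\[
\pi(x) = E(y \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},
\qquad 0 < \pi(x) < 1
\]
\[
\ln\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x
\]
```

The first line gives the S-shaped curve bounded by 0 and 1; the second shows that the log-odds (logit) is linear in x, which is what the regression actually fits.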

CHAID


SQL Server Management Studio

Training Data, Testing Data, SQL Server Integration Services, Visual Studio

Data Mining


9 Data Mining


[Diagram: Cluster, with nodes Variable1, Variable2, Variable3 and scale 1 2 3 4 5]

[Diagram: Cluster, with nodes Variable1 to Variable6 and the condition Variable1=1]

=1, =2, =1

1=1 59.59%, 1=0 40.40%, =1, =2, =1

2 0-1 Miles 3 0-1 Miles 74%

value1, value2, value3, value4, value5, missing

2: value1 0.4%, value2 25.7%, value3 13.6%, value4 32%, value5 28.3%

Variables

Age 47.7~95 2 100 24~47.7 1 100


0.5%


[Bar chart, axis 40% to 56%. Values: 45.89%, 46.80%, 54.12%, 53.20%]

[Bar chart, axis 0% to 25%, categories 20, 20-29, 30-39, 40-49, 50-59, 60. Values as extracted: 17.97, 17.66, 22.34, 22.37, 18.23, 19.37, 19.32, 18.58, 11.84, 11.53, 10.30, 10.49]

[Bar chart, axis 0% to 45%. Values: 31.58%, 32.64%, 27.18%, 26.95%, 41.24%, 40.41%]

[Bar chart, axis 0% to 80%. Values: 78.22%, 77.78%, 9.52%, 10.40%, 10.30%, 9.92%, 1.96%, 1.90%]

[Bar chart, axis 0% to 30%. Values as extracted: 18.28, 25.40, 26.64, 12.82, 17.66, 13.69, 14.35, 13.41, 7.73, 2.32, 2.58, 13.42, 12.14, 5.40, 5.12, 0.34, 0.37, 8.34]

[Bar chart, axis 0% to 12%. Values as extracted: 11.65%, 11.42%, 8.01%, 8.38%, 6.50%, 6.86%, 1.24%, 1.29%, 0.76%, 1.03%, 2.82%, 2.86%, 2.44%, 2.17%, 2.44%, 2.76%]

[Bar chart, axis 0% to 8%. Values: 4.12%, 2.77%, 2.84%, 3.51%, 4.90%, 5.79%, 6.08%, 6.37%, 7.87%, 7.99%]

[Bar chart, axis 0% to 9%. Values: 4.76%, 4.27%, 3.95%, 4.28%, 7.25%, 6.36%, 6.19%, 5.62%, 8.48%, 7.98%]

[Bar chart, axis 0% to 1.8%. Values: 0.22%, 0.20%, 1.71%, 1.56%, 1.41%, 1.29%]

Kmeans F

111.801   94.828  122.101  161.522  164.152  5.918  3910.058  5.144  102.354  162.667  195.884   90.124
 45.413   56.357  212.466  252.438  142.466  5.228  2550.775  1.928   47.872   17.568   31.362  562.990
 70.408   71.916  187.742  238.219  181.745  3.848  1951.107  1.622   73.517  103.174  155.297  305.317
 67.826   68.875  170.479  194.914  155.755  7.156  2825.233  6.138   65.956   56.036   83.205  237.374
 61.356   78.983  179.111  160.409  137.811  4.949  1715.410  3.342   58.504   72.058  112.220  237.767
 56.862   62.052  122.012  148.782  112.530  5.124  1484.845  7.986   47.268   74.160  128.002  309.723

        Percent     1     2     3     4     5
1        100.00   200     0     0     0     0
2         97.93     0   284     4     2     0
3         91.42     0    11   213     9     0
4         99.78     0     0     1   462     0
5         98.50     0     0     0     2   131
Total     97.80   200   295   218   475   131
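The Percent figures in this classification table can be reproduced from the counts. The worked arithmetic below assumes that each percentage is the number of correctly classified cases in a group divided by that group's size, and that the overall figure is the sum of the correctly classified counts divided by all 1319 cases; the slide's own labels did not survive, so this reading is an inference from the numbers.

```latex
\[
\text{Group 2: } \frac{284}{284 + 4 + 2} = \frac{284}{290} \approx 97.93\%
\qquad
\text{Group 3: } \frac{213}{11 + 213 + 9} = \frac{213}{233} \approx 91.42\%
\]
\[
\text{Overall: } \frac{200 + 284 + 213 + 462 + 131}{1319} = \frac{1290}{1319} \approx 97.80\%
\]
```

The group sizes obtained this way (200, 290, 233, 463, 133) are the same totals that appear in the cross-tabulations on the following slides.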

1       105  52.50%     95  47.50%     200
2       179  61.72%    111  38.28%     290
3       133  57.08%    100  42.92%     233
4       231  49.89%    232  50.11%     463
5        95  71.43%     38  28.57%     133
Total   743            576            1319

        20            20-29         30-39         40-49         50-59         60            Total
1        13  6.50%     80 40.00%     64 32.00%     38 19.00%      4  2.00%      1  0.50%     200
2        18  6.21%     85 29.31%    115 39.66%     52 17.93%     16  5.52%      4  1.38%     290
3        34 14.59%     77 33.05%     61 26.18%     42 18.03%     13  5.58%      6  2.58%     233
4        36  7.78%    142 30.67%    149 32.18%    109 23.54%     25  5.40%      2  0.43%     463
5         9  6.77%     42 31.58%     44 33.08%     33 24.81%      3  2.26%      2  1.50%     133
Total   110           426           433           274            61            15           1319

1        20  10.00%     57  28.50%    123  61.50%     200
2        48  16.55%    122  42.07%    120  41.38%     290
3        50  21.46%     98  42.06%     85  36.48%     233
4        49  10.58%    186  40.17%    228  49.24%     463
5        13   9.77%     41  30.83%     79  59.40%     133
Total   180            504            635            1319

1        35  17.50%    151  75.50%     13   6.50%      1   0.50%     200
2        24   8.28%    236  81.38%     30  10.34%      0   0.00%     290
3        12   5.15%    186  79.83%     35  15.02%      0   0.00%     233
4        44   9.50%    331  71.49%     79  17.06%      9   1.94%     463
5        22  16.54%     91  68.42%     15  11.28%      5   3.76%     133
Total   137            995            172             15            1319

1        24 12.00%    14  7.00%    17  8.50%    64 32.00%     0  0.00%    51 25.50%    24 12.00%     5  2.50%     1  0.50%     200
2        15  5.17%    24  8.28%    18  6.21%    81 27.93%     0  0.00%   102 35.17%    32 11.03%     7  2.41%    11  3.79%     290
3        44 18.88%    22  9.44%    16  6.87%    41 17.60%     0  0.00%    75 32.19%    29 12.45%     4  1.72%     2  0.86%     233
4        43  9.29%    53 11.45%    62 13.39%   110 23.76%     3  0.65%    77 16.63%    84 18.14%    24  5.18%     7  1.51%     463
5         7  5.26%     2  1.50%    10  7.52%    38 28.57%     7  5.26%    37 27.82%    19 14.29%     2  1.50%    11  8.27%     133
Total   133          115          123          334           10          342          188           42           32           1319

1        86  43.00%     54  27.00%     59  29.50%      1   0.50%     200
2       130  44.83%     50  17.24%    107  36.90%      3   1.03%     290
3        79  33.91%     65  27.90%     78  33.48%     11   4.72%     233
4       209  45.14%    111  23.97%    137  29.59%      6   1.30%     463
5        70  52.63%     21  15.79%     38  28.57%      4   3.01%     133
Total   574            301            419             25            1319

63 82

47 20~29 38 29

WAP

50 50

33 36 WAP 20 46 50~59 33 100 20~29 44 40~49 92

57 39 42 30 39 48 40~49 100 30~39 60

79 50 65 74 64 36 54 92 88 44 20 100

43 56 50 56

WAP

(ERP)

(CRM)

(ROI)

(EC)

(Customer Profile Analysis)

(Structure Analysis)

(Customer Pyramid Migration)

(Pareto Analysis)

(Product Position Analysis)


(Business Performance Analysis)


CRM

3G

Data Mining: Useful Analysis Tool

= + Data Mining Find the MODEL


Data Mining: It's New! It's Hot!

What are you waiting for?

Your issue. Our solution.