(1) SQL Server 2005 Data Mining: Enterprise Application Scope and Methodology of Data Mining
TRANSCRIPT
Data Mining: SQL Server 2005
WWW.CDMS.ORG.TW
08/21/09
Technology Review 2002 1
Data mining, Biometrics, Microphotonics, Untangling code, Microfluidics
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad et al., 1996)
Data Mining
Data Mining: Database Theory, Artificial Intelligence, Data Warehousing, Statistics, Machine Learning
Data Mining: Customer Life Cycle
Stages: Not customer, New Customer, Matured Customer, Churn
Acquisition: Customer profiling, Target marketing, Segmentation, Market basket analysis, Cross-selling
Maintenance (I): Cross-selling, Segmentation, Risk Management
Maintenance (II): Customer Loyalty & Retention, Life-time Value, Profitability
Data Mining Model Patterns Relations
Classification, Regression, Time Series, Clustering, Association, Sequence
Data Mining Business Knowledge
SQL Server 2005
Reports ( & Ad hoc) Reports ( ) OLAP
Data Mining
Easy
Difficult
5 algorithms 12 viewer
BI Web & Office
Source: Microsoft Taiwan
Decision Trees
Clustering
Time Series
Sequence Clustering
Association
Naive Bayes
Logistic Regression
Neural Net
Linear Regression
DMX
Define a model: CREATE MINING MODEL
Train a model: INSERT INTO dmm
Prediction using a model: SELECT FROM dmm PREDICTION JOIN
Data Mining Management System (DMMS): Training Data, Mining Model, Prediction Input Data
Source: Microsoft Taiwan
Data Mining T-SQL JOIN SELECT FROM dmm PREDICTION JOIN ON WHERE
Source: Microsoft Taiwan
DMX
PredictProbability, PredictTimeSeries, PredictAssociation, PredictSequence, Cluster
PredictHistogram
TopCount, TopSum, TopPercent
PMML
Source: Microsoft Taiwan
SQL 2005
Source: Microsoft Taiwan
DMX VBA/EXCEL
plug-in API
UDF (User-defined function) MDX OLAP
Source: Microsoft Taiwan
Text Mining 90%
Text Mining: SQL 2005 SSIS/AS, IBM Intelligent Miner for Text, SAS Enterprise Miner for Text, SPSS Clementine for Text
Source: Microsoft Taiwan
Term Extract, Term Lookup
Fuzzy Lookup, Fuzzy Grouping, Error-Tolerant Index
Source: Microsoft Taiwan
[Diagram labels: Preprocess Data; Task Definition and Goal; Knowledge; Document Clustering/Categorization; Language; Feature Extraction; Document Repository; Preprocessed Data; Lexical Analysis; Semantic Evaluation; Semantic Analysis; Text Database; Knowledge Based]
[Diagram labels: Data Extracting, Gathering, Cleansing; Data Transferring, Organizing; Data Loading; Selection; Mining; Visualization; Browsing; Data; Database; Tools]
SQL Server 2005 Data Mining
Data Mining
Data Mining
[Diagram labels: Extract, Transform, Load; Data Source; MIS; Metadata; Templates; Metadata; Data Mining; OLAP; Decision Making; CRM; Marketing Campaign]
Panel
ETL Data warehousing
Data Mining OLAP Statistics
CRM
Data mining Logistic Regression
(Cluster Analysis)
Logistic Regression: the response takes the values 1 and 0; the Logistic Curve is S-shaped and bounded between 0 and 1.
$\pi(x) = E(y \mid x)$, where $\pi(x)$ is the conditional mean of $y$ given $x$.
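The S-shaped curve bounded by 0 and 1 can be sketched numerically. This is a minimal illustration, assuming a single predictor x and the usual logistic form for the conditional mean E(y|x); the function names and the coefficients `beta0` and `beta1` are hypothetical and do not come from the slides.

```python
import math


def logistic(z: float) -> float:
    """Map any real-valued score z to a value strictly between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x: float, beta0: float, beta1: float) -> float:
    """Conditional mean E(y|x) under a one-predictor logistic model.

    beta0 and beta1 are illustrative coefficients (assumptions), not
    values taken from the presentation.
    """
    return logistic(beta0 + beta1 * x)


if __name__ == "__main__":
    # Scores far below zero approach 0, scores far above zero approach 1.
    for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
        print(x, predict_probability(x, beta0=0.0, beta1=1.0))
```

Because the output always stays between 0 and 1, it can be read as the probability that the binary response equals 1.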
CHAID
SQL Server Management Studio
Training Data, Testing Data, SQL Server Integration Services, Visual Studio
Data Mining
9 Data Mining
Data Mining
Data Mining
[Diagram: Cluster. Labels: Variable1, Variable2, Variable3; clusters 1 2 3 4 5]
[Diagram: Cluster. Labels: Variable1=1; Variable1, Variable2, Variable3, Variable4, Variable5, Variable6]
=1, =2, =1
1=1 59.59% 1=0 40.40% =1 =2 =1
2 0-1 Miles 3 0-1 Miles 74%
value1 value2 value3 value4 value5 missing
2: value1 0.4%, value2 25.7%, value3 13.6%, value4 32%, value5 28.3%
Variables
Age 47.7~95 2 100 24~47.7 1 100
0.5%
[Bar chart, axis 40.00% to 56.00%. Data labels: 45.89%, 46.80%, 54.12%, 53.20%]
[Bar chart by age group (under 20, 20-29, 30-39, 40-49, 50-59, 60 and over), axis 0.00% to 25.00%. Data labels: 17.97, 17.66, 22.34, 22.37, 18.23, 19.37, 19.32, 18.58, 11.84, 11.53, 10.30, 10.49]
[Bar chart, axis 0.00% to 45.00%. Data labels: 31.58%, 32.64%, 27.18%, 26.95%, 41.24%, 40.41%]
[Bar chart, axis 0.00% to 80.00%. Data labels: 78.22%, 77.78%, 9.52%, 10.40%, 10.30%, 9.92%, 1.96%, 1.90%]
[Bar chart, axis 0.00% to 30.00%. Data labels: 18.28, 25.40, 26.64, 12.82, 17.66, 13.69, 14.35, 13.41, 7.73, 2.32, 2.58, 13.42, 12.14, 5.40, 5.12, 0.34, 0.37, 8.34]
[Bar chart, axis 0.00% to 12.00%. Data labels: 11.65%, 11.42%, 8.01%, 8.38%, 6.50%, 6.86%, 1.24%, 1.29%, 0.76%, 1.03%, 2.82%, 2.86%, 2.44%, 2.17%, 2.44%, 2.76%]
[Bar chart, axis 0.00% to 8.00%. Data labels: 4.12%, 2.77%, 2.84%, 3.51%, 4.90%, 5.79%, 6.08%, 6.37%, 7.87%, 7.99%]
[Bar chart, axis 0.00% to 9.00%. Data labels: 4.76%, 4.27%, 3.95%, 4.28%, 7.25%, 6.36%, 6.19%, 5.62%, 8.48%, 7.98%]
[Bar chart, axis 0.00% to 1.80%. Data labels: 0.22%, 0.20%, 1.71%, 1.56%, 1.41%, 1.29%]
Kmeans F
111.801  94.828  122.101  161.522  164.152  5.918  3910.058  5.144  102.354  162.667  195.884   90.124
 45.413  56.357  212.466  252.438  142.466  5.228  2550.775  1.928   47.872   17.568   31.362  562.990
 70.408  71.916  187.742  238.219  181.745  3.848  1951.107  1.622   73.517  103.174  155.297  305.317
 67.826  68.875  170.479  194.914  155.755  7.156  2825.233  6.138   65.956   56.036   83.205  237.374
 61.356  78.983  179.111  160.409  137.811  4.949  1715.410  3.342   58.504   72.058  112.220  237.767
 56.862  62.052  122.012  148.782  112.530  5.124  1484.845  7.986   47.268   74.160  128.002  309.723
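An F value of this kind is commonly obtained by running a one-way analysis of variance on each variable, with the k-means cluster as the grouping factor; a larger F means the variable separates the clusters more strongly. The sketch below assumes that reading of the table. The function name `anova_f` and the toy data are illustrative only and are not taken from the slides.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one variable grouped by cluster label.

    F = (between-cluster mean square) / (within-cluster mean square).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)   # number of observations
    k = len(groups)   # number of clusters
    grand_mean = mean(values)

    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of observations around their own cluster mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy data: two well-separated clusters on a single variable.
    values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    labels = [0, 0, 0, 1, 1, 1]
    print(anova_f(values, labels))
```

These F values describe how much each variable contributes to the separation; they are not a formal significance test, because the clusters were chosen to maximize exactly that separation.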
                                                  Total
            200       0       0       0       0     200
              0     284      11       0       0     295
              0       4     213       1       0     218
              0       2       9     462       2     475
              0       0       0       0     131     131
Percent  100.00   97.93   91.42   99.78   98.50   97.80
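The Percent row can be reproduced from the counts: each diagonal cell divided by its column total gives the per-group figure, and the sum of the diagonal divided by the grand total gives the final 97.80. The sketch below assumes the matrix is a cross-classification of two groupings of the same 1319 cases; which axis holds which grouping is not preserved in the transcript, and the function names are illustrative.

```python
# Counts copied from the classification table (row totals omitted).
MATRIX = [
    [200, 0, 0, 0, 0],
    [0, 284, 11, 0, 0],
    [0, 4, 213, 1, 0],
    [0, 2, 9, 462, 2],
    [0, 0, 0, 0, 131],
]


def column_percent_correct(matrix):
    """Diagonal cell as a percentage of its column total, per column."""
    size = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [100.0 * matrix[c][c] / column_totals[c] for c in range(size)]


def overall_percent_correct(matrix):
    """Sum of the diagonal as a percentage of all cases."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * diagonal / total


if __name__ == "__main__":
    print([round(p, 2) for p in column_percent_correct(MATRIX)])
    print(round(overall_percent_correct(MATRIX), 2))
```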
105 (52.50%)    95 (47.50%)    200
179 (61.72%)   111 (38.28%)    290
133 (57.08%)   100 (42.92%)    233
231 (49.89%)   232 (50.11%)    463
 95 (71.43%)    38 (28.57%)    133
743            576            1319
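Each percentage in these cluster profile tables is a count divided by its row total (the size of the cluster). A minimal sketch of that calculation follows, using the counts from the two-category table; the dictionary keys 1 to 5 are hypothetical cluster labels, since the original labels are not preserved.

```python
# Counts per cluster for the two categories; keys are hypothetical labels.
COUNTS = {
    1: (105, 95),
    2: (179, 111),
    3: (133, 100),
    4: (231, 232),
    5: (95, 38),
}


def row_percentages(row):
    """Share of each category within one cluster, in percent, 2 decimals."""
    total = sum(row)
    return [round(100.0 * count / total, 2) for count in row]


if __name__ == "__main__":
    for cluster, row in COUNTS.items():
        print(cluster, sum(row), row_percentages(row))
```

Comparing these row percentages across clusters is what makes the profile readable: a category whose share differs sharply from one cluster to the next helps characterize that cluster.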
Under 20      20-29          30-39          40-49          50-59        60 and over   Total
 13 (6.50%)    80 (40.00%)    64 (32.00%)    38 (19.00%)    4 (2.00%)    1 (0.50%)     200
 18 (6.21%)    85 (29.31%)   115 (39.66%)    52 (17.93%)   16 (5.52%)    4 (1.38%)     290
 34 (14.59%)   77 (33.05%)    61 (26.18%)    42 (18.03%)   13 (5.58%)    6 (2.58%)     233
 36 (7.78%)   142 (30.67%)   149 (32.18%)   109 (23.54%)   25 (5.40%)    2 (0.43%)     463
  9 (6.77%)    42 (31.58%)    44 (33.08%)    33 (24.81%)    3 (2.26%)    2 (1.50%)     133
110           426            433            274            61           15            1319
 20 (10.00%)    57 (28.50%)   123 (61.50%)    200
 48 (16.55%)   122 (42.07%)   120 (41.38%)    290
 50 (21.46%)    98 (42.06%)    85 (36.48%)    233
 49 (10.58%)   186 (40.17%)   228 (49.24%)    463
 13 (9.77%)     41 (30.83%)    79 (59.40%)    133
180            504            635            1319
 35 (17.50%)   151 (75.50%)    13 (6.50%)    1 (0.50%)    200
 24 (8.28%)    236 (81.38%)    30 (10.34%)   0 (0.00%)    290
 12 (5.15%)    186 (79.83%)    35 (15.02%)   0 (0.00%)    233
 44 (9.50%)    331 (71.49%)    79 (17.06%)   9 (1.94%)    463
 22 (16.54%)    91 (68.42%)    15 (11.28%)   5 (3.76%)    133
137            995            172           15           1319
 24 (12.00%)   14 (7.00%)    17 (8.50%)    64 (32.00%)   0 (0.00%)    51 (25.50%)   24 (12.00%)    5 (2.50%)    1 (0.50%)    200
 15 (5.17%)    24 (8.28%)    18 (6.21%)    81 (27.93%)   0 (0.00%)   102 (35.17%)   32 (11.03%)    7 (2.41%)   11 (3.79%)    290
 44 (18.88%)   22 (9.44%)    16 (6.87%)    41 (17.60%)   0 (0.00%)    75 (32.19%)   29 (12.45%)    4 (1.72%)    2 (0.86%)    233
 43 (9.29%)    53 (11.45%)   62 (13.39%)  110 (23.76%)   3 (0.65%)    77 (16.63%)   84 (18.14%)   24 (5.18%)    7 (1.51%)    463
  7 (5.26%)     2 (1.50%)    10 (7.52%)    38 (28.57%)   7 (5.26%)    37 (27.82%)   19 (14.29%)    2 (1.50%)   11 (8.27%)    133
133           115           123           334           10           342           188            42           32           1319
 86 (43.00%)    54 (27.00%)    59 (29.50%)    1 (0.50%)    200
130 (44.83%)    50 (17.24%)   107 (36.90%)    3 (1.03%)    290
 79 (33.91%)    65 (27.90%)    78 (33.48%)   11 (4.72%)    233
209 (45.14%)   111 (23.97%)   137 (29.59%)    6 (1.30%)    463
 70 (52.63%)    21 (15.79%)    38 (28.57%)    4 (3.01%)    133
574            301            419            25           1319
63 82
47 20~29 38 29
WAP
50 50
33 36 WAP 20 46 50~59 33 100 20~29 44 40~49 92
57 39 42 30 39 48 40~49 100 30~39 60
79 50 65 74 64 36 54 92 88 44 20 100
43 56 50 56
WAP
(ERP)
(CRM)
(ROI)
(EC)
(Customer Profile Analysis)
(Structure Analysis)
(Customer Pyramid Migration)
(Pareto Analysis)
(Product Position Analysis)
(Business Performance Analysis)
CRM
3G ..
Data Mining: Useful Analysis Tool
= + Data Mining Find the MODEL
Data Mining: It's New! It's Hot!
What are you waiting for?
Your issue. Our solution.