Data Mining Techniques for CRM
Group 1 members
9534608 謝岱高, 9634524 廖鎔熠, 9634532 黃雅莉, 9634543 郭奇龍
Data Mining in CRM
“ ...through data mining – the extraction of hidden predictive information from large databases – organizations can identify valuable customers, predict future behaviors, and enable firms to make proactive, knowledge-driven decisions.”
Agenda
•Introduction, Definition: Paul
•The Evolution & Applications of Data Mining: Eneida
•Internal Considerations & Data Mining Techniques: Ximena
•Data Mining and CRM (relationship & customer privacy): Lester
•Case Studies (Neural Networks, CHAID): JPG
•CHAID vs. Neural Nets; Conclusions: Edith
Introduction
•Product-oriented view vs. customer-oriented view
•Design-build-sell vs. sell-build-redesign
•One-on-one marketing vs. mass marketing
•Goal of the revolution: establish a long-term relationship with each customer
•The advent of the Internet and technological tools accelerates the modern CRM revolution
•CRM is important for B2C or C2B, and even more crucial in B2B environments
Why Data Mining?
Between businesses and customers…
•Collecting customer demographics and behavior data makes precision targeting possible
•Helps to devise an effective promotion plan when new products are developed
•Creates and solidifies close customer relationships
Between businesses…
•Helps to smooth transactions, communications and collaboration
•Simplifies and improves the logistics and procurement process
What is Data Mining?
•“…a sophisticated data search capability that uses statistical algorithms to discover patterns and correlations in data.”
•“…another way to find meaning in data.”
•Data mining is part of a larger process called knowledge discovery
What Data Mining is ~NOT~
•Data mining software does not eliminate the need to know the business, understand the data, or be aware of general statistical methods.
•DM does not find patterns or knowledge without verification.
•DM helps to generate hypotheses, but it does not validate the hypotheses.
Evolutionary Stages of Data Mining
Data Collection (1960s)
•Retrospective, static data delivery
•Summations or averages
•Computers, tapes, disks
•IBM, CDC
Data Access (1980s)
•Retrospective, dynamic data delivery at record level
•Branch sales at a specific period of time
•RDBMS, SQL, ODBC
•Oracle, Sybase, Informix, IBM, Microsoft
Data Navigation (1990s)
•Retrospective, dynamic data delivery at multiple levels
•Global view or drill down
•OLAP, multidimensional databases, data warehouses
•Pilot, IRI, Arbor, Redbrick
Data Mining (2000s)
•Prospective, proactive information delivery
•Online analytic tools, feedback and information exchange
•Advanced algorithms, multiprocessor computers, massive databases
•Lockheed, IBM, SGI
Breakdown of Data Mining from a Process Orientation
Data Mining
Discovery
•Conditional Logic
•Affinities and Associations
•Trends and Variations
Predictive Modeling
•Outcome Prediction
•Forecasting
Forensic Analysis
•Deviation Detection
•Link Analysis
Applications of Data Mining
Retail
1. Performing basket analysis
2. Sales forecasting
3. Database marketing
4. Merchandise planning and allocation
Banking
1. Card marketing
2. Cardholder pricing and profitability
3. Fraud detection
4. Predictive life-cycle management
Telecommunications
1. Call detail record analysis
2. Customer loyalty
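Basket analysis, the first retail application listed, can be illustrated with a minimal sketch that counts how often pairs of items appear together and derives support and confidence for each pairing. The baskets, item names, and the `min_support` threshold below are invented for illustration and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets; the item names are invented for illustration.
baskets = [
    {"flash drive", "memory card", "card reader"},
    {"flash drive", "memory card"},
    {"memory card", "card reader"},
    {"flash drive", "graphics card"},
    {"flash drive", "memory card", "graphics card"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets that contain both items
        if support < min_support:
            continue
        # One rule per direction: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how common a pairing is overall, while confidence measures how often the consequent appears given the antecedent; a retailer would typically look at both before acting on a rule.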
OTHER APPLICATIONS
•Customer segmentation: discrete segments by adding variables
•Manufacturing: customize products; predict features
•Warranties: number of clients who will ask for claims
•Frequent flier incentives: identify groups who can receive incentives
INTERNAL CONSIDERATIONS
Skillsets and technologies must be available to integrate them:
•Data mining
•Decision-making process
Knowledge gained through DM:
•Sell to and service customers
•Manage inventory
•Supervise employees
•Work to correct and prevent loss
-An algorithm for scoring
-A score for a particular customer or employee
-An action associated with a customer, employee or transaction
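The chain from scoring algorithm to score to action can be sketched as follows. This is only an illustration: the logistic form, the attribute names, the weights, and the thresholds are all hypothetical and are not taken from the slides.

```python
import math

# Hypothetical logistic scoring model; the weights and bias are invented.
WEIGHTS = {"months_since_last_purchase": -0.30, "purchases_last_year": 0.45}
BIAS = -1.0


def score(customer):
    """The scoring algorithm: a 0..1 propensity score from customer attributes."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def action_for(score_value):
    """The action associated with a score (thresholds are assumptions)."""
    if score_value >= 0.7:
        return "send premium offer"
    if score_value >= 0.4:
        return "send standard mailing"
    return "no contact"


customer = {"months_since_last_purchase": 1, "purchases_last_year": 8}
customer_score = score(customer)  # the score for a particular customer
print(round(customer_score, 3), action_for(customer_score))
```

Keeping the scoring function separate from the action rule means the business can change thresholds without retraining or redeploying the model.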
DATA MINING TECHNIQUES
They are applied to tasks of predictive modeling and forensic analysis.
DM Approaches
•Data retained
 -Nearest Neighbor
 -Case-Based Reasoning
•Data distilled
 -Logical: numeric and non-numeric data
 -Cross Tabulational: non-numeric data
 -Equational: numeric data
They extract patterns and then use them for various purposes.
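Nearest neighbor is a "data retained" approach: no model is distilled, and each prediction is made by looking up the most similar stored records. A minimal sketch, assuming two invented numeric attributes and made-up records:

```python
import math
from collections import Counter

# Hypothetical retained records: (annual_spend, visits_per_month) -> responded?
# All values are invented for illustration.
records = [
    ((200.0, 1.0), "no"),
    ((250.0, 2.0), "no"),
    ((900.0, 6.0), "yes"),
    ((1100.0, 8.0), "yes"),
    ((1000.0, 5.0), "yes"),
]


def knn_predict(records, query, k=3):
    """Classify `query` by majority vote among the k closest retained records."""
    # Euclidean distance on raw values; in practice attributes are usually
    # rescaled first so that one large-valued attribute does not dominate.
    by_distance = sorted(records, key=lambda r: math.dist(r[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(records, (950.0, 6.0)))
print(knn_predict(records, (220.0, 1.0)))
```

Because the full data set is kept and searched at prediction time, this approach is easy to update with new records but becomes slower as the stored data grows.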
Pros and cons to data mining approaches
Logical
 Pros: Work well with multidimensional and OLAP data; able to deal with numeric and nonnumeric data in a uniform manner
 Cons: Unable to work with smooth surfaces that typically occur in nature
Cross-tabulation
 Pros: Simple to use with a small number of nonnumeric values
 Cons: Not scalable; limited ability to handle numeric values; limited ability to handle conjunctions
Equation
 Pros: Work well on large sets of data; work well with complex multidimensional models; ability to approximate smooth surfaces
 Cons: Require all data to be numeric; system can quickly become a black box
CUSTOMER RELATIONSHIP MANAGEMENT
Know, Target, Sell, Service
Definition
CRM: Development of the offer
3 Which’s
2 Stage Concept
1. From product to customer orientation; market strategy from the outside in
2. Push the development of customer orientation; innovating the value proposition
Components of CRM
Customer Information
•Customer data
 -Internal customer data: billing records, surveys, web logs, credit card records
 -Outside source data: external data sources (current address, web page viewing profiles)
•Data warehouse
•Historical data
Analyze the Data
•Data mining techniques + customer orientation
Campaign Execution & Tracking
•Interactions between marketing, information technology, and sales channels
Data Mining & CRM: The Relationship
Customer Life Cycle: Prospects, Respondents, Active Customers, Former Customers
Inputs: what information is available
Data mining output: what is likely to be of interest
Case Studies: Neural Networks vs. CHAID
Case #1: Neural Networks
Neural Networks
The exact way in which the brain enables thought is one of the great mysteries of science.
Neurons
NeoVistas Solutions’ Decision Series
•For retail, insurance, telecommunications, and healthcare
•Includes discovery tools based on neural networks, clustering, genetic algorithms, and association rules
The problem
•Large retailer
•Over $1 billion in sales
•Overstocked on slow-moving products
•Under-stocked on the most popular items at critical selling periods
Solution
With clustering and a neural network:
•Review point-of-sale history and equate store groupings to sales patterns.
•Forecast stocking requirements on a store-by-store basis.
Results
•Management is able to forecast seasonal trends at the store-item level.
•The Decision Series tools showed that clustering similar items into actionable groups streamlined the ordering process.
•Revenues increased by 11.6%
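The grouping of stores by sales pattern in this case can be sketched with a plain k-means clustering. The slides do not say which clustering algorithm the Decision Series tools use, so k-means is swapped in here purely as an illustration; the store names and sales figures are invented.

```python
import math

# Hypothetical weekly unit sales per store for two items: (item_a, item_b).
stores = {
    "store_1": (120.0, 15.0),
    "store_2": (110.0, 20.0),
    "store_3": (30.0, 95.0),
    "store_4": (25.0, 105.0),
    "store_5": (115.0, 18.0),
}


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute means."""
    assignment = {}
    for _ in range(iterations):
        assignment = {
            name: min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            for name, p in points.items()
        }
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids


# Seed the centroids with two stores so the run is deterministic.
assignment, centroids = kmeans(stores, [stores["store_1"], stores["store_3"]])
for name, cluster in sorted(assignment.items()):
    print(name, "-> cluster", cluster)
```

Each cluster centroid is the average sales profile of its stores, which could serve as a starting point for a per-group stocking forecast before refining it store by store.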
Case #2: CHAID
Applied Metrix
•Uses a combination of CHAID segmentation and logistic regression response probability modeling to establish predictive models that are deployed over a proprietary Internet system
The problem
•Home equity marketer that extended home equity lines of credit at the national level.
•The client’s goal was to increase the efficiency of targeting current mortgage customers who might be interested in the client’s service.
The Solution
•CHAID identified 16 distinct market segments.
•One segment in particular accounted for 65% of responses to the mailing.
Results
•The highest-rated group from the predictive model had by far the highest response rate to the equity line of credit campaign: 85% above average for the direct mailing.
•The goal of the program was a 10% increase in response rate, but the actual response rate increased 30%.
•The firm was able to increase profits by over one million dollars in the first year after implementation.
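CHAID chooses its segments by testing which predictor is most strongly associated with the response, using the chi-square statistic. The sketch below shows only that core comparison; full CHAID also merges similar categories and works with adjusted p-values, which are omitted here. The predictor names and all counts are invented.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if predictor and response were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical mailing results: each row is one category of a predictor,
# columns are (responded, did not respond). All counts are invented.
candidates = {
    "home_value_band": [[90, 910], [40, 960], [20, 980]],
    "region": [[52, 948], [48, 952], [50, 950]],
}


def best_split(candidates):
    """Pick the predictor with the largest chi-square statistic.

    Comparing raw statistics is only fair here because both tables have
    the same shape, and therefore the same degrees of freedom.
    """
    return max(candidates, key=lambda name: chi_square(candidates[name]))


for name, table in candidates.items():
    print(name, round(chi_square(table), 2))
print("split on:", best_split(candidates))
```

A predictor whose categories respond at very different rates produces a large statistic and becomes the next split, which is how one segment can end up concentrating most of the responders.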
Case #3
PNY Technologies, Inc., Oct. 2007
PNY Locations
[Map legend: Manufacturing & Sales; Sales Office]
Locations shown: New Jersey, California, Miami, Taiwan, China, UK, France, Germany, Italy, Norway, Spain, Benelux
*All US product ships from NJ location
•13 locations worldwide
•PNY products are sold in over 50 countries
•482 employees worldwide
PNY Product Mix Shift
[Chart; recovered data label: Flash 53%]
Flash = Flash Cards & Drives, Mobile
Memory = Consumer & OEM
Graphics = Consumer & Professional
Consolidated Revenue by Channel
[Chart: share of consolidated revenue (0% to 100%) by year, 2001 to 2007E; series: OEM, Channel]
Current U.S. Channels of Distribution
•Distribution
•Mail Order/E-Commerce
•Major Retail
•Regional Retail
•System Integration
•VARs
Revenue Growth
[Chart: revenue by year, 2001 to 2007E; growth labels as extracted: +23.8%, +6.7%, +8.7%, +17.3%, +23.0%, +21.8%]
US - 2006 Market Share:
US Industry Overview by Category - Units
2004 vs. 2005 units (in thousands)
PNY
Category        2004     2005     Change
Flash Drives    1,444    2,434    +69%
Flash Cards*    1,831    2,832    +55%
Memory          1,612    1,623    +1%
Graphics          304      259    -15%
Industry total
Category        2004     2005     Change
Flash Drives    7,280    11,441   +57%
Flash Cards*    12,316   18,246   +48%
Memory          5,448    5,294    -3%
Graphics        1,963    1,889    -4%
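The year-over-year percentages on the 2004 vs. 2005 units slide follow directly from the unit counts, and the same figures give PNY's share of industry units. A short check, using the numbers from the slide (the function and variable names are just for this sketch):

```python
# Unit volumes in thousands (2004, 2005), copied from the 2004 vs. 2005 slide.
units = {
    "PNY": {
        "Flash Drives": (1444, 2434),
        "Flash Cards": (1831, 2832),
        "Memory": (1612, 1623),
        "Graphics": (304, 259),
    },
    "Industry": {
        "Flash Drives": (7280, 11441),
        "Flash Cards": (12316, 18246),
        "Memory": (5448, 5294),
        "Graphics": (1963, 1889),
    },
}


def pct_change(old, new):
    """Year-over-year change as a whole-number percentage."""
    return round((new - old) / old * 100)


def unit_share(category, year_index):
    """PNY units as a percentage of industry units (0 = 2004, 1 = 2005)."""
    pny = units["PNY"][category][year_index]
    industry = units["Industry"][category][year_index]
    return pny / industry * 100


for category in units["PNY"]:
    pny_growth = pct_change(*units["PNY"][category])
    industry_growth = pct_change(*units["Industry"][category])
    print(f"{category}: PNY {pny_growth:+d}%, industry {industry_growth:+d}%, "
          f"PNY 2005 unit share {unit_share(category, 1):.1f}%")
```

Comparing PNY's growth with the industry's in each category shows where the company gained or lost ground relative to the market.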
Market Share – Month of August
USB Unit Share - Aug (PNY #3)
Sandisk 29.1%, Memorex 15.5%, PNY 12.2%, Sony 9.8%, Dane Elec 8.8%, Other 24.6%
SD Unit Share - Aug (PNY #2)
Sandisk 47.8%, PNY 19.7%, Kingston 8.0%, Dane Elec 6.9%, Lexar 4.7%, Other 13.0%
VGA Unit Share - Aug (PNY #2)
Evga 21.9%, PNY 17.6%, Bfg 10.4%, XFX 9.8%, ATI 6.7%, Other 33.6%
PC Memory Unit Share - Aug (PNY #2)
Kingston 33.4%, PNY 20.3%, Corsair 11.3%, K-Byte 4.3%, Crucial 4.2%, Centon 4.0%, Other 22.6%
Flash Drive Overview – YTD Aug 2007
USB Unit Share – YTD Aug 2007
Sandisk 29.2%, Memorex 16.8%, PNY 15.1%, Kingston 6.2%, Sony 5.9%, Other 26.8%
Observations
•PNY holds the #3 share position YTD
•1GB represents the largest segment within the category with 40% of the unit sell-thru
•2GB represents 31% of YTD sell-thru
[Chart: USB Flash Drive Capacity Trend; share of units (0% to 100%) by month, Jan to Dec; capacities: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB]
TOP SELLING SKUs
Rank  Brand      Model Description                          Units
1     SANDISK    2GB MICRO CRUZER USB 2.0 FLASH DRIVE       1,588,925
2     SANDISK    1GB MICRO CRUZER USB 2.0 FLASH DRIVE       1,303,906
3     SANDISK    4GB MICRO CRUZER USB 2.0 FLASH DRIVE       709,936
4     PNY        1GB ATTACHE FLASH DRIVE USB 2.0            632,066
5     PNY        2GB ATTACHE USB 2.0 FLASH DRIVE            549,732
6     PNY        1GB ATTACHE USB 2.0 FLASH DRIVE 3-PK       180,892 x 3
7     MEMOREX    1GB TRAVELDRIVE FLASH DRIVE USB 2.0        460,621
8     KINGSTON   1GB DATA TRAVELER USB 2.0 FLASH DRIVE      355,135
9     MEMOREX    32MB USB 2.0 FLASH MEMORY DRIVE            341,189
10    DANE ELEC  1GB USB 2.0 FLASH DRIVE                    321,531
Secure Digital Overview – YTD Aug 2007
SD Unit Share – YTD Aug 2007
Sandisk 45.5%, PNY 21.8%, Dane Elec 7.7%, Kingston 6.9%, Lexar 4.6%, Other 13.5%
Observations
•PNY holds the #2 market share YTD at 21.8%
•Secure Digital accounts for 55% of Flash Card sell-through YTD
•1GB is the highest selling capacity at 41%, followed by 2GB at 38%
[Chart: SD Capacity Trend; share of units (0% to 100%) by month, Jan to Dec; capacities: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB]
Top 10 Selling SKUs
Rank  Brand      Description                                Units
1     SANDISK    2GB SECURE DIGITAL MEMORY CARD             1,869,337
2     SANDISK    1GB SECURE DIGITAL CARD                    1,659,494
3     PNY        2GB SECURE DIGITAL FLASH CARD              992,137
4     PNY        1GB SECURE DIGITAL FLASH CARD              818,208
5     SANDISK    512MB SECURE DIGITAL CARD                  492,187
6     SANDISK    2GB SECURE DIGITAL ULTRA II CARD           485,729
7     SANDISK    1GB SECURE DIGITAL ULTRA II CARD           398,858
8     KINGSTON   1GB SECURE DIGITAL FLASH CARD              372,796
9     DANE ELEC  1GB SECURE DIGITAL CARD                    338,153
10    PNY        1GB SECURE DIGITAL FLASH CARD 3-PK         96,100 x 3
Memory Overview – YTD Aug 2007
Memory Unit Share – YTD Aug 2007
Kingston 30.9%, PNY 20.2%, Corsair 10.0%, Centon 6.9%, K-Byte 4.0%, Other 28.0%
Observations
•PNY holds the #2 market share in Memory
•7 of the top 10 selling SKUs in the industry are DDR
•Notebook Memory accounts for 25% of Memory sell-thru YTD
[Chart: Memory Capacity Trend; share of units (0% to 100%) by month, Jan to Dec; capacities: 128MB, 256MB, 512MB, 1GB, >2GB]
TOP 10 SKUs
Rank  Brand      Model Description                          Units
1     Kingston   512MB PC3200 DDR SDRAM DIMM Kit            222,493
2     PNY        1GB Optima PC3200 DDR SDRAM DIMM           209,273
3     PNY        512MB PC3200 DDR SDRAM DIMM                178,867
4     PNY        1GB PC-5300 DDR2 667MHz SODIMM             151,579
5     Kingston   512MB PC2700 DDR SDRAM DIMM                109,327
6     Centon     1GB 2@512MB PC3200 DDR SDRAM               86,945
7     PNY        256MB SDRAM 168Pin PC100 DIMM              86,071
8     PNY        1GB PC2700 DDR SODIMM                      85,025
9     PNY        512MB PC2700 DDR SODIMM                    82,666
10    Kingston   1GB PC2-4200 533MHz DDR2 SODIMM            80,041
VGA Overview – YTD Aug 2007
Graphics Unit Share – YTD Aug 2007
Evga 19.2%, PNY 17.5%, Bfg 10.4%, ATI 9.1%, XFX 7.5%, Visiontek 5.7%, Other 30.6%
Observations
•PNY holds the #2 overall share in the Consumer Graphics category YTD
•PNY has 5 of the top 10 selling SKUs in the industry
•512MB represents 18% of the sell-thru YTD
[Chart: VGA Capacity Trend; share of units (0% to 100%) by month, Jan to Dec; capacities: 64MB, 128MB, 256MB, 320MB, 512MB, 640MB, 768MB, 1GB]
Top 10 SKUs
Rank  Brand  Item Description                               Units
1     PNY    256MB DDR Verto GeForce FX 5200 PCI            75,209
2     PNY    256MB DDR Verto GeForce FX 5200 8XAGP          43,698
3     PNY    512MB GDDR2 GeForce 7600 GS AGP                37,402
4     Evga   640MB GDDR3 e-GeForce 8800 GTS PCI-Exp16       28,229
5     PNY    512MB GDDR2 GeForce 7600GS PCI-ExpX16          27,234
6     ATI    512MB GDDR2 Radeon X1650 PRO PCIE-x16          25,703
7     Evga   256MB GDDR3 nVIDIA GeForce 7600 GT PCIE        23,273
8     ATI    512MB GDDR2 Radeon X1650 PRO 8XAGP             21,258
9     PNY    256MB DDR2 Verto GeForce 7300 GT PCIE          20,625
10    ATI    256MB DDR Radeon 9550 8XAGP                    19,623
CHAID vs. Neural Nets
CHAID: CHi-squared Automatic Interaction Detector/Detection
Comparison criteria:
•Clarity and explicability
•Implementation/Integration
•Data requirements
•Accuracy of model
•Construction of model
•Cost
•Application
Clarity and Explicability
•CHAID: easier to understand
•Neural Nets: opaque
•CHAID is easy to explain to a domain expert or business user
•CHAID wins!!!
Implementation/Integration
•Implementation difficulty: CHAID < Neural Nets
•The risk of missing code by an IT department: CHAID < Neural Nets
•Performance: CHAID > Neural Nets (significantly faster)
•CHAID wins!!!
Data Requirements
•CHAID: more data must be provided
•All data must be preprocessed
•Neural Nets: binary format
•CHAID: continuous independent variables must be banded
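Banding means converting a continuous variable into a small number of ordered categories before CHAID can use it. A minimal sketch, assuming an income variable with invented band edges and labels:

```python
from bisect import bisect_right

# Hypothetical band edges and labels for annual income; values are invented.
EDGES = [30_000, 60_000, 100_000]
LABELS = ["under 30k", "30k-60k", "60k-100k", "100k and over"]


def band(value, edges=EDGES, labels=LABELS):
    """Return the label of the band containing `value` (lower edge inclusive)."""
    # bisect_right counts how many edges are <= value, which is the band index.
    return labels[bisect_right(edges, value)]


incomes = [18_500, 30_000, 75_000, 240_000]
print([band(x) for x in incomes])
```

The choice of edges matters: bands that are too wide hide real differences between customers, while bands that are too narrow leave too few records in each category for a reliable chi-square test.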
Accuracy of Model
•Neural Nets provide more accurate (powerful & predictive) models for complex problems
•Have risks
•Neural Nets wins!!!
Construction of Model
•CHAID: easier and quicker to construct
•Neural Nets: many parameters that need to be set
•Hard to apply vs. easy to detect errors
•CHAID wins!!!
Costs
•High cost (Neural Nets): time & high level of building skills
•CHAID wins!!!
Applications
•Customer loyalty, purchase propensity, customer lifetime value
•Neural Nets > CHAID (both directed and undirected ways)
•Continuous independent variables vs. categorical variables with high cardinality (performance)
•Classification problems vs. estimation problems
•Easier to build and implement and less costly (CHAID)
THANK YOU!!!