Jan Hřivňák - What the numbers won't tell you, but their analysis will suggest
Extend your BI solution
with Predictive Analytics
for better decisions
Jan Hrivnak
Consultant & Analyst
Customer Stories
“These guys know what they’re doing. Thanks to their targeted approach, our open and conversion rates increased by over 100%, which led to 45% campaign revenue growth.” Dalia Lasaite, CG Trader
“They helped us with our Facebook acquisition campaigns, identifying VIP customers and using them for lookalike audience creation. The result was impressive – we generated over four times more campaign revenue and our ROI was almost 50% higher.” Martim Chamrad, Craneballs
“A predictive model driven recommendation engine was super-easy to implement, and we found that the recommendations are 85% more likely to lead to sales.” Chirag Nirmal, Bow & Drape
“Behavioral segmentation gave us a fresh look at our current clients, from another perspective and in a different context. It has been helping us to identify clients who changed their shopping approach towards our company, so we could react to this change adequately.” David Kroupa, Seznam
Data Mining Concept
Data mining is the process of automatically discovering useful information in large data repositories. (Tan, Steinbach, Kumar, 2006)
A process of non-trivial extraction of implicit, previously unknown and potentially useful information from data (Fayyad et al., 1996).
A process of revealing hidden relationships in data.
Exploratory analysis of observational data.
Data -> Information -> Decision.
Traditional techniques may be unsuitable due to:
Large amount of data
High dimensionality of data
Heterogeneous, distributed nature of data
[Diagram: Data Mining shown together with Statistics, AI, Machine Learning and Pattern Recognition]
Data Mining Tasks
In general: predictive vs. descriptive
Classification (credit risk calculation)
Estimation (long-term customer value)
Segmentation (groups of subjects with similar behavior)
Shopping cart analysis (products being bought together)
Fraud detection (suspicious credit card transactions, claim validation)
Anomaly detection (aircraft systems monitoring during flight, medical systems)
Prediction (“Churn” – which customers will leave next year?)
Social networks mining, spatial data mining
Data quality mining (data quality measurement and improvement)
Descriptive: find human-interpretable patterns that describe the data.
Predictive: use some variables to predict unknown or future values of other variables.
Data Mining Methods
Decision trees
Association analysis
Clustering
Graphical probabilistic models
Neural networks
Kohonen self-organizing maps
Support vector machine
Nearest neighbor
Linear and non-linear regression
Logistic regression
Time series analysis
Genetic algorithms
Fuzzy modeling
GUHA, …
Areas of Data Mining Applications
Banking & insurance (fraud detection, predicting customer lifetime value, …)
Telecommunication (same as above)
Direct marketing
Supply chain management
eCommerce
Trading (technical analysis)
Scientific research
Medicine & healthcare (medical expert systems)
Technical fault diagnosis
…
Data Quality: a Critical Issue
“Garbage in, garbage out”
90% of time: data preparation (ETL)
10% of time: the DM itself
Data transformation issues
Data ambiguity (e.g. Gender = ‘F’, ‘Female’, ‘woman’, ‘male’, ‘man’, etc.)
Missing values
Duplicate values
Naming conventions of terms and objects
Different currencies
Different formats of numbers and text strings
Referential integrity
Missing dates
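A minimal sketch of how a few of the transformation issues above (ambiguous values, missing values, duplicates) might be handled in code. The mapping table, the record layout and the function names are illustrative assumptions, not part of the original slides.

```python
# Hypothetical mapping; extend it with the spellings found in your own data.
GENDER_MAP = {
    "f": "F", "female": "F", "woman": "F",
    "m": "M", "male": "M", "man": "M",
}

def normalize_gender(raw):
    """Map free-text gender values to 'F' / 'M'; return None when unknown or missing."""
    if raw is None:
        return None
    return GENDER_MAP.get(str(raw).strip().lower())

def clean_records(records):
    """Normalize gender and drop records whose id has already been seen."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # duplicate: keep the first occurrence only
        seen_ids.add(rec["id"])
        cleaned.append({**rec, "gender": normalize_gender(rec.get("gender"))})
    return cleaned

# Made-up sample records for illustration.
records = [
    {"id": 1, "gender": "Female"},
    {"id": 2, "gender": " man "},
    {"id": 2, "gender": "male"},   # duplicate id
    {"id": 3, "gender": None},     # missing value
    {"id": 4, "gender": "woman"},
]
cleaned = clean_records(records)
for rec in cleaned:
    print(rec)
```

Missing values are kept as None here rather than guessed, so a later step can decide whether to impute or exclude them.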
Software for Data Mining
Commercial
SPSS PASW Modeler / Clementine (http://www-01.ibm.com/software/analytics/spss/)
SAS (http://www.sas.com/)
Microsoft SQL server (http://www.microsoft.com/sqlserver/2008/en/us/default.aspx)
Microsoft Excel (DM Add-In; http://www.microsoft.com/sqlserver/2008/en/us/data-mining-addins.aspx)
Oracle DM (http://www.oracle.com/technology/products/bi/odm/index.html)
Kxen (http://www.kxen.com/)
MS Azure ML (Cloud)
…
OpenSource or Freeware
Weka (http://www.cs.waikato.ac.nz/ml/weka/)
R (http://www.r-project.org/)
Orange (http://www.ailab.si/Orange/)
LISP Miner (http://lispminer.vse.cz/)
Ferda (http://ferda.wiki.sourceforge.net/)
…
Benefits for Customers
Better understanding of their business
Increasing efficiency
Increasing safety, reliability
Possibility of restructuring business processes.
Possibility of changing people’s mindset.
Possibility of increasing profit, decreasing risks, better financial stability.
Competitive advantage
Risks
Uncertain result
Data mining can reveal facts that are already known or obvious
The result depends on data quality (errors) and on the distribution of values (skewness, kurtosis, ...)
Overfitting (the model fits the training data too closely and does not generalize well) can occur, but there are ways to minimize it.
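One common way to notice overfitting is to compare accuracy on the training data with accuracy on data the model has not seen. The sketch below is only an illustration on synthetic, noisy data: a 1-nearest-neighbor model memorizes the training set perfectly, while a larger neighborhood typically generalizes better. The data generator, function names and parameter values are assumptions made for this example.

```python
import random

def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label
        data.append((x, label))
    return data

def predict_knn(train, x, k):
    """Majority vote among the k training points closest to x (use an odd k)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > k

def accuracy(train, data, k):
    """Share of points in `data` whose label is predicted correctly."""
    hits = sum(1 for x, label in data if predict_knn(train, x, k) == label)
    return hits / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)
test = make_data(200, rng)      # held-out data, never used for fitting

for k in (1, 15):
    print(f"k={k}: train accuracy={accuracy(train, train, k):.2f}, "
          f"test accuracy={accuracy(train, test, k):.2f}")
```

A large gap between training and test accuracy is the warning sign; hold-out data, cross-validation and simpler models are typical ways to reduce it.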
Use Case: Progressive rewards
Each customer has a different worth
Focus on customers individually
=> Happy Customer
=> $$$
[Chart: both axes run from 0% to 100%; customers ordered from Best to Worst; the point “Your cut optimum” is marked]
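One plausible reading of the chart is a cumulative curve: customers are sorted from best to worst and the curve shows what share of total revenue the top x% of customers bring in. The sketch below computes such points under that assumption; the revenue figures and the function name are made up for illustration.

```python
def cumulative_revenue_share(revenues):
    """Return (customer_share, revenue_share) points, best customers first."""
    ordered = sorted(revenues, reverse=True)
    total = sum(ordered)
    points = []
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / len(ordered), running / total))
    return points

# Made-up revenue per customer.
revenues = [500, 50, 20, 200, 10, 120, 30, 40, 20, 10]
points = cumulative_revenue_share(revenues)
for customer_share, revenue_share in points:
    print(f"top {customer_share:.0%} of customers -> {revenue_share:.0%} of revenue")
```

Where the curve flattens, additional customers add little revenue, which is one way to choose a cut-off for individual rewards.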
Use Case: Up-sell, X-sell
Focus on Loyal customers => +4%
Convert them into VIP customers
=> Increase Revenue $$$
Use Case: Up-sell, X-sell
Heat maps (columns: R = 1 to 5; empty cells omitted)

R x M, cell value F:
M    R=1  R=2  R=3  R=4  R=5
1      1    1    1    1    2
2      1    1    1    2    5
3      1    2    3    3    7
4      1    3    3    5   10
5      2    6    7   12   21

R x F, cell value $:
F = 1: 45k, 197k, 118k, 114k
F = 2: 162k, 89k, 55k
F = 3: 369k, 238, 507k, 246k, 342k
F = 4: 260k, 575k, 1407k, 1061k, 397k
F = 5: 918k, 1476k, 2863k, 6677k

R x F, cell value $:
F = 1: 1M, 15M, 8M, 3M
F = 2: 24M, 3M, 0M
F = 3: 13M, 23M, 14M, 9M, 11M
F = 4: 5M, 25M, 97M, 57M, 14M
F = 5: 17M, 50M, 157M, 935M

R x F, cell value #:
F = 1: 27, 78, 69, 29
F = 2: 140, 37, 4
F = 3: 34, 95, 27, 36, 32
F = 4: 18, 43, 69, 54, 34
F = 5: 19, 34, 55, 140

Captions: # of Customers - Heat Map RF; Heat Map RF; Avg Frequency - Heat Map RM; Avg Transactions / product - Heat Map RF
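The heat maps above look like an RFM-style segmentation. Assuming R, F and M stand for recency, frequency and monetary value scores on a 1 to 5 scale (the slides do not spell this out), a rank-based scoring and a customer-count grid could be sketched as follows. The customer data and function names are invented for the example.

```python
from collections import Counter

def quintile_scores(values, higher_is_better=True):
    """Assign each value a score from 1 to 5 by rank-based quintile (5 = best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // len(values) + 1
    return scores

# Made-up customers: days since last purchase, number of orders.
recency_days = [3, 40, 7, 200, 15, 90, 1, 365, 30, 60]
order_counts = [12, 2, 9, 1, 5, 3, 20, 1, 4, 2]

r_scores = quintile_scores(recency_days, higher_is_better=False)  # fewer days is better
f_scores = quintile_scores(order_counts, higher_is_better=True)

# Number of customers in each (R, F) cell, as in a "# of Customers" heat map.
heat_map = Counter(zip(r_scores, f_scores))
for f in range(1, 6):
    print(f"F={f}:", [heat_map.get((r, f), 0) for r in range(1, 6)])
```

Cells with high R and high F scores hold the loyal customers that an up-sell or cross-sell campaign would target first.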
Use Case: Customer Churn
Identify customers who are likely to leave for a competitor in a given period
Historical data (previous months)
Regular predictions (current month)
Marketing campaign (next month)
Potential churn (next 2 months)
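To train a churn model on historical data, each customer first needs a label. A minimal sketch, assuming churn means "was active before, but made no purchase in the following two months"; that definition, the month numbering and the names below are assumptions for illustration, since the slides only give the time windows.

```python
def churn_label(purchase_months, as_of_month, horizon=2):
    """True if the customer bought up to `as_of_month` but not in the next `horizon` months."""
    was_active = any(m <= as_of_month for m in purchase_months)
    bought_later = any(as_of_month < m <= as_of_month + horizon
                       for m in purchase_months)
    return was_active and not bought_later

# Made-up purchase history: customer -> months (1 = first month) with a purchase.
history = {
    "anna": [1, 2, 3, 5, 6],
    "ben": [1, 2, 3],
    "cyril": [2, 4],
    "dana": [5],          # first purchase after the cut-off, so not a churn candidate
}

labels = {name: churn_label(months, as_of_month=4) for name, months in history.items()}
print(labels)
```

Labels built this way for past months can serve as the target for any classifier, which is then applied to the current month to drive next month's campaign.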
Use Case: Market basket analysis, shelf content optimization
Offering products that the customer is most likely to buy
Defining the contents of the shelf according to what people most often buy together
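"Bought together" is usually quantified with support, confidence and lift. The sketch below computes these for item pairs only, on made-up baskets; it is a simplified illustration of association analysis, not the method used in the projects described here, and the function name and threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs, best lift first."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # > 1 means bought together more than by chance
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Made-up baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "beer"],
    ["bread", "butter", "beer"],
    ["milk"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Pairs with high lift are candidates for being placed on the same shelf or offered together.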
Use Case: Claim handling
Automation of the claim handling process, and therefore saving money
Speeding up the process
Reducing complexity without impacting the result
Better understanding of the real key factors in the decision process
Identifying suspicious exceptions in the decision process (fraud detection)
Optimizing the process to be more accurate in terms of whether a claim should be accepted or rejected
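One way to see which attributes are the real key factors of an accept/reject decision is to measure how much each attribute reduces uncertainty about the outcome (information gain, the criterion used by many decision tree learners). The sketch below uses invented claims and hypothetical attribute names; it is not the model from the project.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` by `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(labels) / len(rows) * entropy(labels)
                    for labels in groups.values())
    return base - remainder

# Made-up claims; attribute names are hypothetical.
claims = [
    {"in_warranty": "Y", "country": "FI", "decision": "accepted"},
    {"in_warranty": "Y", "country": "SE", "decision": "accepted"},
    {"in_warranty": "Y", "country": "FI", "decision": "accepted"},
    {"in_warranty": "Y", "country": "SE", "decision": "accepted"},
    {"in_warranty": "N", "country": "FI", "decision": "rejected"},
    {"in_warranty": "N", "country": "SE", "decision": "rejected"},
    {"in_warranty": "N", "country": "FI", "decision": "rejected"},
    {"in_warranty": "N", "country": "SE", "decision": "rejected"},
]

for attribute in ("in_warranty", "country"):
    print(attribute, round(information_gain(claims, attribute, "decision"), 3))
```

In this toy data the warranty flag fully determines the decision while the country carries no information, which mirrors the idea that only a few attributes may really be needed.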
Projects - References
Mondi (paper mill) – production process optimization, cost optimization
Nordic mobile device manufacturer – claim handling
Overkill, CGTrader, Seznam.cz – customer segmentation, campaigns and marketing
Enterasys – analysis of won opportunities
NBA – customer segmentation, market basket analysis
Only a few attributes are really needed
WHO
START
RECVG TO SHIPD DAYS
SRVC CODE
SENDER CNTRY
SRVC COSTS CRNCY
IS INFORMATION ONLY
RETN TYPE
IN WRTY IND
References
http://video.google.com/videosearch?q=David+Mease&emb=0#
http://www.microsoft.com/sqlserver/2008/en/us/data-mining-addins.aspx
http://www.microsoft.com/emea/spotlight/event.aspx?id=99
CRISP-DM (http://www.crisp-dm.org/)