Marketing Analytics: a Source of Informational Advantage
MGMT E-6750, Harvard Extension School, Harvard University
Andrew Banasiewicz, [email protected]
Source: Banasiewicz, Andrew D., Marketing Database Analytics, 2013, Routledge, New York, NY. All Rights Reserved.
Big Data
According to the McKinsey Global Institute, Big data “…refers to datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze.”
Consequently, big data can take on a variety of formats, including the “traditional” numerically-coded, the “new” text-encoded or mixed sources; it has been estimated that about 95% of all data is textual.
In and of itself, data is merely a raw material that requires an (at times considerable) amount of processing before it can yield value;
“Traditional” business analytics focused on the easier to analyze numeric data, which comprises roughly 5% of all available data;
Analyst-driven vs. machine learning approaches; Confirmatory vs. exploratory
The definition of what constitutes “big” (data) is variable across entities (e.g., companies) and across time. To the degree to which “big” is synonymous with “difficult to handle and/or analyze”, the threshold will continue to move up…
How Big is “Big”?
The Library of Congress as the yardstick of choice:
McKinsey Global Institute: Organizations globally stored more than 7 exabytes of data on disk drives in 2010, which is about 28,000 x the information stored in the U.S. Library of Congress (which was reported as having 235 terabytes of storage in April of 2011)…
Winterberry Group: In 2011, Facebook users uploaded an amount of data roughly equal to 3,600 x the print collection of the U.S. Library of Congress…
Tableau: All of the books in the Library of Congress total about 15 terabytes, which is the amount of data generated by Twitter in a single day…
So…how big is the Library of Congress, really? The size of the Library cannot be accurately expressed in digital metrics;
142 million items in physical collections – books and printed items account for only about 32 million of the total (the rest include maps, manuscripts, globes, photos, recordings, sheet music, etc.);
6 million items stored at the new Packard Campus for Audio-Visual Conservation are being digitized at the rate of 3-5 petabytes (3,000-5,000 terabytes) per year; the process is expected to take several decades…
Doing more with less: Business process automation – ATMs, inventory management systems, electronic transaction processing; Algorithm-based decisioning – application processing, ordering, price adjustments;
Doing things previously not possible or not economically feasible: Near-real-time decision support systems; Micro-segmentation and corresponding product design/adaptation; On-going in-market experimentation; Comprehensive promotional impact measurement;
New business models: Insurance underwriting;
Big Data – Big Opportunities
Volume: According to Google, from the dawn of time through 2003, human civilization generated approximately 5 exabytes of information – by 2009, that much data (equivalent to about 25 quadrillion tweets) was generated every 2 days…
By 2010, all the digital data in existence was estimated at about 1,200 exabytes, while the amount of data created in 2011 surpassed 1.8 zettabytes (1.8 trillion gigabytes);
The Large Hadron Collider generates 40 terabytes of data per second;
Twitter generates about 15 terabytes of data per day…
· 1 Bit = Binary Digit
· 8 Bits = 1 Byte
· 1024 Bytes = 1 Kilobyte
· 1024 Kilobytes = 1 Megabyte
· 1024 Megabytes = 1 Gigabyte
· 1024 Gigabytes = 1 Terabyte
· 1024 Terabytes = 1 Petabyte
· 1024 Petabytes = 1 Exabyte
· 1024 Exabytes = 1 Zettabyte
· 1024 Zettabytes = 1 Yottabyte
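The unit ladder above can be checked with a few lines of Python. This is a minimal sketch: the `UNITS` list and the `to_bytes` and `convert` helper names are illustrative assumptions, not something taken from the source.

```python
# Binary (1024-based) unit ladder, in the order listed on the slide.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` to bytes (1024-based)."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a quantity between any two units on the ladder."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


if __name__ == "__main__":
    print(convert(1, "exabyte", "terabyte"))   # terabytes in one exabyte
    print(convert(3, "petabyte", "terabyte"))  # terabytes in three petabytes
```

Round figures such as "3-5 petabytes (3,000-5,000 terabytes)" follow the decimal (1,000-based) convention, so they differ slightly from the 1024-based results this sketch returns.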
Velocity: By some estimates, the volume of data grows at an annually compounded rate of about 40% (McKinsey) to 60% (other estimates); Another estimate indicates that the volume of data grew by a factor of 9 in a span of 5 years…
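The two growth estimates can be compared by converting the 5-year multiple into an implied annual compounded rate. The sketch below is illustrative only; the function names are assumptions introduced for this example.

```python
def implied_annual_rate(growth_factor, years):
    """Annual compounded rate that produces `growth_factor` over `years`."""
    return growth_factor ** (1 / years) - 1


def growth_after(rate, years):
    """Total growth multiple after compounding `rate` for `years`."""
    return (1 + rate) ** years


if __name__ == "__main__":
    print(f"9x in 5 years implies {implied_annual_rate(9, 5):.1%} per year")
    for rate in (0.40, 0.60):
        print(f"{rate:.0%} per year for 5 years -> {growth_after(rate, 5):.1f}x")
```

A 9-fold increase over 5 years works out to roughly 55% per year, which sits inside the 40% to 60% range quoted above.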
Variety: From traditional numeric (e.g., UPC scanner data) to text-encoded social interactions to online clickstreams to location (e.g., GPS) to weather to sensors to… Virtually all new and emerging communication and transaction processing technologies capture data…
Big Data – Big Challenges
Knowledge extraction - techniques: Conventional statistical methods don’t always work well, even with numerically-coded data; Text-coded data presents additional challenges: Does not adhere to the “computer-friendly” two-dimensional data matrix format; Ambiguity of human language;
Big Data – (More) Big Challenges
Knowledge extraction - technologies: Data capture – storage – management – access; Data amalgamation – multi-source analytics;
Data policies: Privacy vs. utility tradeoff; Data security; Intellectual property rights (i.e., who owns the data) and related legal considerations;
Organizational change and talent: To truly reap the benefits of data, behavioral change is a must! CIOs are infrastructure-focused, not knowledge creation-focused; Deep analytic know-how is relatively scarce;
Challenges to deployment of analytics: Data issues: Ranges from quality to accessibility to sharing (within firms); Expertise issues: The skills required to translate data into insight; packaged vs. custom approaches; Cultural issues: Analytic orientation or the use of data as decision driver;
Big Data Analytics – More Challenges
Evidence-Based Management & obstacles to behavioral change: Evidence-Based Management: Sample problems – different solutions; Our dear, yet biased intuition;
7 ≈ thickness of an avg. notebook
10 ≈ width of a hand (thumb included)
14 ≈ height of an avg. person
17 ≈ two-story house
20 ≈ quarter of the way up the Sears Tower
30 ≈ past the outer limits of Earth’s atmosphere
50 ≈ 87 million miles (almost the distance to the Sun)
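The list above reads like the familiar paper-folding illustration of exponential growth, with each number being a count of folds. That reading, and the 0.1 mm sheet thickness used below, are assumptions rather than something stated on the slide. Under those assumptions, a rough sketch of the doubling arithmetic is:

```python
SHEET_THICKNESS_MM = 0.1  # assumed thickness of a single sheet; not from the source


def folded_thickness_m(folds, sheet_mm=SHEET_THICKNESS_MM):
    """Thickness in metres after the sheet has been doubled `folds` times."""
    return sheet_mm * 2 ** folds / 1000


if __name__ == "__main__":
    for folds in (7, 10, 14, 17, 20, 30, 50):
        print(f"{folds:>2} folds: {folded_thickness_m(folds):,.3f} m")
```

With a 0.1 mm sheet, 14 folds gives about 1.6 m and 30 folds about 107 km. The exact figures, especially at 50 folds, shift with the thickness assumed, which is why intuition tends to underestimate them so badly.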
Behavioral Change & Biased Intuition
An individual has been described by a neighbor as follows:
“Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”
Is Steve more likely to be a librarian or a farmer?
There are more than 20 male farmers for every male librarian in the U.S.
Biased Intuition: Example #2
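The base-rate point in the Steve example can be made concrete with Bayes' rule. In this sketch only the 20-to-1 ratio comes from the slide (where it is a lower bound); the two likelihoods, meaning how often the description fits a librarian versus a farmer, are hypothetical values chosen purely for illustration.

```python
def posterior_librarian(farmers_per_librarian, p_desc_given_librarian,
                        p_desc_given_farmer):
    """P(librarian | description), considering only these two occupations."""
    prior_librarian = 1 / (1 + farmers_per_librarian)
    prior_farmer = 1 - prior_librarian
    evidence = (p_desc_given_librarian * prior_librarian
                + p_desc_given_farmer * prior_farmer)
    return p_desc_given_librarian * prior_librarian / evidence


if __name__ == "__main__":
    # Hypothetical likelihoods: the description fits 40% of librarians
    # but only 10% of farmers.
    p = posterior_librarian(20, 0.40, 0.10)
    print(f"P(librarian | description) = {p:.2f}")
```

Even if the description were four times as likely to fit a librarian, the 20-to-1 base rate would still leave Steve far more likely to be a farmer.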
PROCESS INTRODUCTION
Marketing Database Analytics (MDA)
(Based on Marketing Database Analytics, Banasiewicz, A. D., 2013, Routledge, New York, NY)
Marketing Database Analytics
In view of the above, the goal of marketing database analytics is to contribute to the creation of informational advantage by providing an ongoing flow of decision-guiding, competitively-advantageous knowledge.
According to Drucker, the overriding objective of any business is to create a customer – given that, it follows that marketing has three (3) primary goals:
1. New customer acquisition (persuasion);
2. Current customer retention (persuasion);
3. Marketing mix optimization (economic rationalization);
Knowledge: Explicit vs. Tacit
Knowledge as a Source of Competitive Advantage
The Creation of Explicit Knowledge
The Data – Information – Knowledge Continuum
Process Foundation: The General Systems Model
The General Systems Model and the Logic of Marketing Database Analytics
correlation vs. causation
MDA: Exploration – Explanation – Prediction – Validation – Update
The Same MDA Logic Shown Differently
From the General Systems Model to the Marketing Database Analytics (MDA) Process
Exploratory Analyses – Segmentation – Behavioral Predictions – Incrementality Measurement
The Marketing Database Analytics (MDA) Process
The Marketing Database Analytics (MDA) Process: Process – Skills – Tools
DATA MINING VS. PREDICTIVE ANALYTICS
Marketing Database Analytics
Data Exploration vs. Hypothesis Testing
Data exploration: Open-ended search for relationships. No or non-specific pre-existing beliefs; Early knowledge-creation steps; An attempt to reach beyond what is currently known;
Hypothesis testing: Confirming currently held beliefs. Focused on specific, pre-existing beliefs; More advanced knowledge-creation steps; An attempt to validate what is currently believed;
Predictive analytics: a special case of hypothesis testing. Purpose-driven: churn; response; Uniqueness, not generalizability focused; Demands ongoing refresh; Efficacy directly measurable;
Exploration vs. Explanation/Prediction
Exploratory Analyses
t-test; F-test; χ² test
0.3413 * 2 ≈ 68% of area under the curve
Hypothesis Testing: The Basics of Significance Testing
α = 0.10 or 90% Confidence Level
(0.3413 * 2) + (0.1359 * 2) ≈ 95% of area under the curve
Hypothesis Testing: The Basics of Significance Testing
α = 0.05 or 95% Confidence Level
(0.3413 * 2) + (0.1359 * 2) + (0.0215 * 2) ≈ 99.7% of area under the curve
Hypothesis Testing: The Basics of Significance Testing
α = 0.01 or 99% Confidence Level
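The areas quoted on these slides can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `area_within` and `two_tailed_critical_z` are assumptions introduced for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the standard normal curve within +/- k standard deviations."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def two_tailed_critical_z(alpha):
    """z value that leaves alpha / 2 in each tail of the distribution."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within +/-{k} SD: {area_within(k):.4f}")
    for alpha in (0.10, 0.05, 0.01):
        print(f"alpha = {alpha:.2f}: critical z = {two_tailed_critical_z(alpha):.3f}")
```

The areas within 1, 2 and 3 standard deviations are about 68%, 95% and 99.7%, whereas the two-tailed cut-offs for the 90%, 95% and 99% confidence levels fall at roughly 1.645, 1.96 and 2.576 standard deviations.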
The fundamental premise of hypothesis testing:
Null hypothesis H0: X = Y
Alternative hypothesis Ha: X ≠ Y
Key Hypothesis Testing Concepts
When things go awry: Type I vs. Type II error
Type I: Incorrectly concluding that there is a difference;
Type II: Incorrectly concluding that there is no difference;
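A Type I error can be illustrated by simulation: draw repeated samples from a population where the null hypothesis is true and count how often a two-tailed z-test rejects it anyway. This is a hedged sketch, not the author's procedure; the function name, the sample size of 30, the number of trials and the seed are all assumptions made for the example.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Share of trials that reject a true null hypothesis (mean 0, sigma 1)."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z = mean / (sigma / sqrt(n)), with sigma = 1
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"observed Type I error rate: {type_i_error_rate():.3f}")
```

The observed rejection rate should land close to the chosen α, since α is by construction the probability of a Type I error when the null hypothesis holds.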
Standard deviation: difference between the actual value and the estimated mean; the variability around the mean.
Standard error: difference between the estimate and the “true” value; the variability of the mean estimate.
These confusing “errors”: Standard Error vs. Standard Deviation
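The relationship between the two quantities is that the standard error of the mean equals the sample standard deviation divided by the square root of the sample size. A minimal sketch, using hypothetical order values that do not come from the source:

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


if __name__ == "__main__":
    # Hypothetical order values, for illustration only.
    orders = [20, 22, 19, 24, 25, 18, 21, 23]
    print(f"mean = {mean(orders):.2f}")
    print(f"standard deviation = {stdev(orders):.2f}")
    print(f"standard error = {standard_error(orders):.2f}")
```

The standard deviation describes how much individual values vary, and it does not shrink as more data is collected; the standard error describes how precisely the mean is estimated, and it shrinks as the sample grows.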
One-tailed vs. Two-tailed tests
If current knowledge allows a directional alternative hypothesis (e.g., mean value of factor A is larger than value X), then a one-tailed significance test should be used.
If previous research results were mixed, or the research is purely exploratory, or if population parameters are poorly understood, then a two-tailed significance test should be used.
When to perform significance tests? When we use sample data, NOT population. How to interpret significance tests? Confidence interval - NOT point estimates.
Statistical Significance Testing
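The difference between one-tailed and two-tailed tests, and the confidence-interval reading of a result, can be sketched with a simple z-test. The function names and the input figures below are hypothetical; a normal approximation with a known standard deviation is assumed.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_statistic(sample_mean, hypothesized_mean, sd, n):
    """z = (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - hypothesized_mean) / (sd / sqrt(n))


def p_value(z, tails=2):
    """Upper-tail p-value when tails=1; two-sided p-value when tails=2."""
    if tails == 1:
        return 1 - Z.cdf(z)
    return 2 * (1 - Z.cdf(abs(z)))


def confidence_interval(sample_mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    half_width = Z.inv_cdf(0.5 + level / 2) * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    # Hypothetical figures: sample mean 52 vs. hypothesized mean 50.
    z = z_statistic(52, 50, sd=10, n=100)
    print(f"z = {z:.2f}")
    print(f"one-tailed p = {p_value(z, tails=1):.4f}")
    print(f"two-tailed p = {p_value(z, tails=2):.4f}")
    print("95% CI:", confidence_interval(52, sd=10, n=100))
```

For the same data the one-tailed p-value is half the two-tailed one, which is why a directional hypothesis should be justified by prior knowledge rather than chosen after the fact.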
Beware of Statistical Significance Tests!
Incommensurate goals of theory development and practical applications
Theory Development: universal generalizations; sample-to-population; expected precision: direction; variable sample size
Practical Applications: competitive advantage; now-to-future; expected precision: magnitude; typically large sample size
Statistical vs. practical significance: Often invoked, but nonsensical distinction!
http://faculty.vassar.edu/lowry/polls/calcs.html
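One reason significance tests call for caution with the large samples typical of practical applications is that the same small difference moves from non-significant to highly significant purely as the sample grows. The sketch below uses a pooled two-proportion z-test with equal group sizes; the response rates and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    pooled = (p1 + p2) / 2  # valid because both groups have the same size
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical response rates: 2.1% vs. 2.0%.
    for n in (1_000, 100_000, 1_000_000):
        p = two_proportion_p_value(0.021, 0.020, n)
        print(f"n = {n:>9,} per group: p = {p:.4f}")
```

The 0.1-point difference is identical in every run; only the sample size changes, so the p-value alone says little about the magnitude that matters in practice.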
UNDERSTANDING THE DATA – ANALYTIC PLANNING – EXPLORATION
Marketing Database Analytics
The Importance of Analytic Planning
An Analytic Planning Template
Getting to Know the Data
Data can be: Root or derived; Qualitative or quantitative
Data’s Analysis-Readiness
In business, the vast majority of data is a byproduct of electronic transaction processing, computer/network connectivity and other processes; as a result, it is rarely captured in analysis-ready form.
Delving into the Available Data
Univariate analysis
Metadata
Delving into the Available Data
Search for associations: Bivariate and multivariate analyses
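As one example of a bivariate search for association, the Pearson correlation coefficient between two variables can be computed directly. This is a minimal sketch with hypothetical data; the slides do not specify which association measures are used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: promotional spend vs. units sold.
    spend = [10, 20, 30, 40, 50]
    units = [12, 25, 31, 48, 52]
    print(f"r = {pearson_r(spend, units):.3f}")
```

A coefficient near +1 or -1 indicates a strong linear association, but it says nothing about which variable, if either, drives the other.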