An introduction to A-B testing

Data Mining Group, 王犇 (garfieldwang)

2014-10

Controlled experiment

Example

• random variable

• null hypothesis

• Z-score approximation
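These three ideas fit together in the usual conversion-rate comparison. Each visitor's conversion is a Bernoulli random variable, the null hypothesis is that both variants share the same conversion rate, and for large samples the difference between the two observed rates is approximately normal, so it can be standardized into a Z-score. A minimal sketch, assuming conversion counts per variant are available; the function name and the counts in the usage line are illustrative, and the unpooled standard error used here is one common choice (a pooled estimate is another).

```python
import math


def conversion_z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z-score for the difference in conversion rates p_b - p_a.

    Each visitor is treated as a Bernoulli random variable. Under the null
    hypothesis p_a == p_b the statistic is approximately standard normal
    when n_a and n_b are large (normal approximation).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error of the difference of two sample proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; rates of 0 or 1 in both groups")
    return (p_b - p_a) / se


if __name__ == "__main__":
    # Hypothetical counts: 200/2000 conversions in A, 250/2000 in B.
    print(round(conversion_z_score(200, 2000, 250, 2000), 2))  # 2.5
```

The resulting statistic is what the hypothesis-testing steps compare against a critical value.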

Example

Hypothesis testing

1. State the null and alternative hypotheses clearly (one-tailed or two-tailed test), e.g. a one-tailed test.

2. Choose the test size (significance level), e.g. test size α = 0.05, which gives a one-tailed critical value of 1.645.

3. Make a decision: reject or do not reject the null hypothesis, e.g. test statistic = 2.25, p-value = 0.02 …

4. Draw a conclusion and interpret it substantively.
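Steps 2 and 3 can be checked numerically with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch assumes the test statistic is standard normal under the null hypothesis; it recovers the critical value for α = 0.05 and computes one- and two-tailed p-values for a statistic of 2.25. The helper names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_value(alpha: float, two_tailed: bool = False) -> float:
    """Z value beyond which H0 is rejected at significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return STD_NORMAL.inv_cdf(1 - tail)


def p_value(z: float, two_tailed: bool = False) -> float:
    """Probability under H0 of a statistic at least as extreme as z.

    The one-tailed version uses the upper tail (H1: effect > 0).
    """
    if two_tailed:
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return 1 - STD_NORMAL.cdf(z)


def decide(z: float, alpha: float = 0.05, two_tailed: bool = False) -> str:
    """Step 3: reject H0 when the statistic falls in the rejection region."""
    if two_tailed:
        reject = abs(z) > critical_value(alpha, two_tailed=True)
    else:
        reject = z > critical_value(alpha)
    return "reject H0" if reject else "do not reject H0"


if __name__ == "__main__":
    z = 2.25
    print(f"critical value: {critical_value(0.05):.3f}")  # 1.645
    print(f"one-tailed p:   {p_value(z):.4f}")
    print(f"two-tailed p:   {p_value(z, two_tailed=True):.4f}")
    print(decide(z))  # reject H0
```

For z = 2.25 the one-tailed p-value comes out near 0.012 and the two-tailed one near 0.024. Either way the statistic exceeds 1.645 and H0 is rejected at α = 0.05.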

Statistical Power

• Type I error (α): the probability of rejecting the null hypothesis when it is true

• Type II error (β): the probability of failing to reject the null hypothesis when it is false

• Power of a test (1 − β): the probability that the test correctly rejects a false null hypothesis
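For a one-tailed Z-test these three quantities are tied together. If the true effect is δ and the test statistic has standard error SE, the statistic is roughly Normal(δ/SE, 1) under the alternative. Therefore β = Φ(z₁₋α − δ/SE) and power = 1 − β. A minimal sketch under that normal-approximation assumption; `effect` and `std_error` are illustrative parameter names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_tailed_power(effect: float, std_error: float, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a one-tailed Z-test of H0: effect = 0.

    Under the alternative the statistic is Normal(effect / std_error, 1),
    so beta is the chance it still falls below the critical value.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # threshold giving Type I rate alpha
    beta = STD_NORMAL.cdf(z_crit - effect / std_error)  # Type II error rate
    return 1 - beta


if __name__ == "__main__":
    # No real effect: "power" collapses to the Type I error rate alpha.
    print(round(one_tailed_power(effect=0.0, std_error=1.0), 3))  # 0.05
    # An effect z_0.95 + z_0.80 standard errors away gives 80% power.
    shift = STD_NORMAL.inv_cdf(0.95) + STD_NORMAL.inv_cdf(0.80)
    print(round(one_tailed_power(effect=shift, std_error=1.0), 3))  # 0.8
```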

Determining sample size

• Formula 1

Determining sample size

• the sample size is set by the point where the upper critical value for α on the null curve meets the cutoff for β on the alternative curve

• 80% power, 95% confidence level (Lehr's equation)

• assumes that the sampling distribution of the mean is normal
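Lehr's constant can be traced back to a textbook formula for comparing two means, though that formula is not necessarily the slide's Formula 1. With common standard deviation σ and a difference Δ worth detecting, the normal-approximation sample size per group is n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ². With two-sided α = 0.05 and 80% power, the constant 2(1.96 + 0.84)² is about 15.7, and Lehr's equation rounds it to 16, giving n ≈ 16σ²/Δ². The sketch compares the two. σ and Δ are inputs you would have to estimate for your own metric, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample test of means
    (normal approximation, same variance sigma**2 in both groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def lehr_n_per_group(sigma: float, delta: float) -> int:
    """Lehr's rule of thumb: 16 * sigma^2 / delta^2 (alpha = 0.05 two-sided, 80% power)."""
    return math.ceil(16 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    # Detecting a shift of half a standard deviation.
    print(n_per_group(sigma=1.0, delta=0.5))       # 63
    print(lehr_n_per_group(sigma=1.0, delta=0.5))  # 64
```

Lehr's version is slightly conservative because it rounds 15.7 up to 16. Its appeal is that it can be done in one's head.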

Determining sample size

• Formula 2

– when |skewness| > 1, use at least 355 × s² users for each variant, where s is the skewness of the metric

– so that the distribution of the mean is close to normal

– skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean [from Wikipedia]
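The rule can be applied directly to a sample of the metric. This sketch assumes the moment coefficient of skewness g₁ = m₃ / m₂^1.5 (population moments; bias-corrected estimators differ slightly for small samples) and applies the 355 × s² threshold only when |s| > 1, as the slide states. The sample values and function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Optional, Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])
    m3 = fmean([(x - mean) ** 3 for x in values])
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return m3 / m2 ** 1.5


def min_users_per_variant(skew: float) -> Optional[int]:
    """Rule of thumb: with |skewness| > 1, use at least 355 * s**2 users
    per variant. Returns None when |s| <= 1 (rule not triggered)."""
    if abs(skew) <= 1:
        return None
    return math.ceil(355 * skew ** 2)


if __name__ == "__main__":
    # Hypothetical metric with a long right tail (e.g. revenue per user).
    sample = [1, 1, 1, 1, 10]
    s = skewness(sample)
    print(round(s, 3), min_users_per_variant(s))  # 1.5 799
```

In practice the skewness would be estimated from historical data for the metric, not from a handful of points.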

Rules - Small Changes can have a Big Impact to Key Metrics

Session success rate improved, time-to-success improved, +$10M annually. This kind of success is rare.

Rules - Speed Matters a LOT

• every 100 ms speedup improves revenue by 0.6%

Rules - Reducing Abandonment is Hard, Shifting Clicks is Easy

• local improvements are easy

• global improvements are much harder

• successes:

– significant improvements to relevance

– anti-malware flight

More Tips

• A-A test

• Primacy & newness effects

• Robots

• Long-term goals
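An A-A test splits traffic into two groups that receive the identical experience, so it tests the experimentation system itself. About a fraction α of such tests should still come out "significant" purely by chance. This small simulation sketch checks that expectation, assuming a normally distributed metric and a two-sided Z-test. The metric's mean and spread, the run counts and the seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def mean_and_var(xs: list) -> tuple:
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def aa_false_positive_rate(runs: int = 1000, n: int = 200,
                           alpha: float = 0.05, seed: int = 7) -> float:
    """Fraction of simulated A-A tests whose two-sided Z-test rejects H0.

    Both groups are drawn from the same distribution, so every rejection
    is a false positive; the rate should land near alpha.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        # Identical experience: same hypothetical metric distribution (mean 10, sd 2).
        a = [rng.gauss(10.0, 2.0) for _ in range(n)]
        b = [rng.gauss(10.0, 2.0) for _ in range(n)]
        mean_a, var_a = mean_and_var(a)
        mean_b, var_b = mean_and_var(b)
        z = (mean_b - mean_a) / ((var_a + var_b) / n) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    print(aa_false_positive_rate())  # expected to be close to 0.05
```

On a real system, an A-A rejection rate far from α, or a persistent difference in traffic volume between the two groups, usually points to a bug in assignment, logging or analysis.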

Beyond A-B test

• Overlapping Experiment Infrastructure: More, Better, Faster Experimentation
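The core idea behind overlapping experiments (Tang et al., KDD 2010) is to split parameters into layers. Each layer buckets users with its own independent hash, so one user can be in one experiment per layer at the same time. A minimal sketch of that idea, assuming a SHA-256 hash of the layer name plus user ID and 1000 buckets per layer. The layer names, experiment names, bucket ranges and functions are all hypothetical, not the paper's actual system.

```python
import hashlib

NUM_BUCKETS = 1000  # assumed number of buckets per layer

# Hypothetical layers: experiment name -> bucket range [start, end) within that layer.
LAYERS = {
    "ui": {"blue_button": (0, 100), "big_font": (100, 200)},
    "ranking": {"new_ranker": (0, 50)},
}


def bucket(user_id: str, layer: str) -> int:
    """Deterministically map a user to a bucket within one layer.

    Salting the hash with the layer name makes assignments in different
    layers (approximately) independent of each other.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign(user_id: str) -> dict:
    """Return at most one experiment per layer for this user."""
    result = {}
    for layer, experiments in LAYERS.items():
        b = bucket(user_id, layer)
        for name, (start, end) in experiments.items():
            if start <= b < end:
                result[layer] = name
                break
    return result


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, assign(uid))
```

Because the layers hash independently, an experiment taking 10% of one layer and another taking 5% of a second layer overlap on roughly 0.5% of users. This reuse of traffic is what lets more experiments run concurrently.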

RTX: garfieldwang

Thanks