An Introduction to A/B Testing
Hypothesis testing
1. State the null and alternative hypotheses clearly (one-tailed or two-tailed test). e.g. one-tailed
2. Determine the test size (significance level). e.g. test size (alpha) = 0.05, critical value = 1.645
3. Decision-making: reject or do not reject the null hypothesis. e.g. test statistic = 2.25, p-value = 0.02 …
4. Draw a conclusion and interpret it substantively
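The four steps can be sketched for a right-tailed z-test. This is a minimal illustration only: it assumes the test statistic is a standard normal z-score, which the slide does not state, and the function name `one_tailed_z_test` is hypothetical.

```python
from statistics import NormalDist

def one_tailed_z_test(z_stat, alpha=0.05):
    """Right-tailed z-test: return (critical value, p-value, reject H0?)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Step 2: the critical value cuts off the upper alpha tail.
    critical = std_normal.inv_cdf(1 - alpha)
    # Step 3: the p-value is the upper-tail area beyond the observed statistic.
    p_value = 1 - std_normal.cdf(z_stat)
    return critical, p_value, z_stat > critical

critical, p_value, reject = one_tailed_z_test(2.25, alpha=0.05)
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With alpha = 0.05 the critical value comes out at about 1.645, as on the slide, and a statistic of 2.25 falls in the rejection region.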
Statistical Power
• Type I error (α): the probability of rejecting the null hypothesis when it is true
• Type II error (β): the probability of failing to reject the null hypothesis when it is false
• Power of a test (1 - β): the probability that it correctly rejects a false null hypothesis
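To make the relationship between α, β and power concrete, here is a small sketch for a right-tailed z-test. It assumes the estimate is normally distributed with a known standard error; `power_one_tailed`, `effect` and `std_error` are illustrative names, not from the slides.

```python
from statistics import NormalDist

def power_one_tailed(effect, std_error, alpha=0.05):
    """Power (1 - beta) of a right-tailed z-test for a given true effect."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)  # rejection threshold under H0
    # beta: probability that the statistic stays below the threshold
    # when the true effect shifts its mean to effect / std_error.
    beta = z.cdf(critical - effect / std_error)
    return 1 - beta

# A true effect of 2.5 standard errors gives roughly 80% power.
power = power_one_tailed(effect=2.5, std_error=1.0)
print(f"power = {power:.3f}")
```

When the true effect is zero, the same function returns α: the only rejections are Type I errors.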
Determining sample size
• choose the sample size at which the critical value that cuts off the upper α tail of the null curve also cuts off the lower β tail of the alternative curve
• 80% power, 95% confidence level (Lehr's equation)
• assumes that the distribution of the mean is normal
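The slide names Lehr's equation without showing it. As commonly stated, it gives n ≈ 16 σ² / Δ² per group for a two-sided test at α = 0.05 with 80% power, where σ is the standard deviation and Δ the difference to detect. The sketch below compares that shortcut with the normal-approximation formula it rounds; the function and parameter names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

def lehr_sample_size(sigma, delta):
    """Lehr's shortcut: 2 * (1.96 + 0.84)^2 is about 15.7, rounded up to 16."""
    return math.ceil(16 * sigma ** 2 / delta ** 2)

# Detect a difference of 1 unit when the standard deviation is 10 units.
exact = sample_size_per_group(sigma=10.0, delta=1.0)
shortcut = lehr_sample_size(sigma=10.0, delta=1.0)
print(exact, shortcut)
```

The shortcut is slightly conservative because 16 rounds the constant upward.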
Determining sample size
• Formula 2
– When |skewness| > 1, use at least 355 × s² samples for each variant, where s is the skewness of the metric
– so that the distribution of the mean is close to normal
– skewness: a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean [from wiki]
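As a rough sketch of how the 355 × s² rule could be applied, the code below estimates skewness from a sample and converts it to a minimum per-variant sample size. It assumes the simple moment-based skewness estimator (third central moment over the 1.5 power of the second); the helper names and the toy revenue data are hypothetical.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: m3 / m2^1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def min_samples_per_variant(skewness):
    """Rule of thumb from the slide: 355 * skewness^2 (meant for |skewness| > 1)."""
    return math.ceil(355 * skewness ** 2)

# Toy revenue-per-user data: most users spend nothing, one spends a lot.
revenue = [0, 0, 0, 0, 0, 0, 0, 0, 0, 100]
skew = sample_skewness(revenue)
print(f"skewness = {skew:.2f}, minimum per variant = {min_samples_per_variant(skew)}")
```

Heavily skewed metrics such as revenue push the required sample size up quickly, since the requirement grows with the square of the skewness.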
Rules - Small Changes Can Have a Big Impact on Key Metrics
Session success rate improved, time-to-success improved, +$10M annually
This kind of success is rare
Rules - Reducing Abandonment is Hard, Shifting Clicks is Easy
• local improvements are easy
• global improvements are much harder
• successes:
– significant improvements to relevance
– anti-malware flight
References
• [1] Jesse Farmer. Statistical Analysis and A/B Testing
• [2] Ron Kohavi. Controlled experiments on the web: survey and practical guide
• [3] Ron Kohavi. Seven Rules of Thumb for Web Site Experimenters. KDD 2014
• [4] Diane Tang. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. KDD 2010
• [5] Charles DiMaggio. Power Tools for Epidemiologists. 2014
• [6] Gerald van Belle. Statistical Rules of Thumb