*He also is using 95% confidence as a cut-off. Don't do that. You don't need muc...

btilly · on Jan 9, 2013

Statistical significance grows roughly like the square root of the number of samples.

No, no, no. You are confusing the growth of the number of standard deviations (which does grow like the square root of the number of samples) with the increase in certainty as you add standard deviations. The chance of error falls off like e^(-O(t^2)), where t is the number of standard deviations. This literally falls off faster than exponential.
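Here is a small sketch that puts numbers on the "faster than exponential" claim. It assumes a normal approximation and a two-tailed test, so the probability of landing at least t standard deviations from zero is erfc(t/√2). With a pure exponential, each extra standard deviation would shrink that probability by the same constant factor. Here the factor itself keeps shrinking. The name `two_tailed_p` is illustrative only.

```python
import math

def two_tailed_p(t: float) -> float:
    """Probability that a standard normal lands at least t SDs from zero (either side)."""
    return math.erfc(t / math.sqrt(2))

# Tail probability at 1..5 standard deviations.
tails = [two_tailed_p(t) for t in range(1, 6)]

# Ratio of successive tails: constant for a pure exponential, shrinking here.
ratios = [later / earlier for earlier, later in zip(tails, tails[1:])]

for t, p in enumerate(tails, start=1):
    print(f"t = {t}: p = {p:.3g}")
print("successive ratios:", [round(r, 4) for r in ratios])
```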

What does this mean in the real world? In a standard 2-tailed test you get to 95% confidence at 1.96 standard deviations, 99% confidence at 2.58 standard deviations, and 99.9% confidence at 3.29 standard deviations. These numbers are all a long way from 5 standard deviations.
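The critical values quoted above can be recovered from the inverse CDF of the standard normal. The sketch below is a minimal version that uses Python's standard-library `statistics.NormalDist`. The helper name `critical_z` is hypothetical. For contrast, the last line also computes the two-tailed confidence reached at 5 standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(confidence: float) -> float:
    """Two-tailed critical value: `confidence` of the mass lies within +/- z."""
    alpha = 1 - confidence
    return std_normal.inv_cdf(1 - alpha / 2)

for conf in (0.95, 0.99, 0.999):
    print(f"{conf:.1%} confidence -> {critical_z(conf):.2f} standard deviations")

# Going the other way: the two-tailed confidence reached at 5 standard deviations.
five_sigma_conf = 1 - 2 * std_normal.cdf(-5)
print(f"5 standard deviations -> {five_sigma_conf:.7%} confidence")
```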

Let's flip that around and take 95% confidence as your base. If you are measuring a real difference, then on average 99% confidence requires a test to get 32% more data, and 99.9% confidence requires a test to get 68% more data. Depending on your business, the number of samples you get is often proportional to the time it takes to run the test. If a mistake affecting x% of your company involves significant dollar figures, the cost of running all of your tests to higher confidence tends to be much, much less than the cost of one mistake.

That is why I say that if the cost of collecting more data is not prohibitive, you shouldn't be satisfied with 95% confidence.

ISL · on Jan 11, 2013

Assume a random variable is barely resolved at 1-sigma off zero with N samples. If I wish to increase my confidence that it really is off zero (and the mean with N samples is actually the mean of the distribution), then I'll need 4N samples to halve my uncertainty and double the significance of the observation (as measured in sigma-units). It is in that sense that the significance of a measurement increases like \sqrt(N).
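This scaling argument can be checked directly. The sketch below assumes independent samples with a hypothetical true mean `MU = 0.1` and per-sample standard deviation `SIGMA = 1`. The standard error of the mean is sigma/\sqrt(N), so the significance in sigma units is mu / (sigma/\sqrt(N)). Choosing N = 100 places the mean exactly 1 sigma off zero. Quadrupling N then doubles the significance.

```python
import math

def significance(mu: float, sigma: float, n: int) -> float:
    """Distance of the true mean from zero, in standard errors of the sample mean."""
    standard_error = sigma / math.sqrt(n)
    return mu / standard_error

MU, SIGMA = 0.1, 1.0  # hypothetical effect size and per-sample spread
N = 100               # chosen so the effect sits at 1 sigma

for n in (N, 4 * N, 16 * N):
    print(f"N = {n:5d}: {significance(MU, SIGMA, n):.2f} sigma")
```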

Viewed from my perspective, if you'd like to go from 2-sigma (95%) to 3.29-sigma, you'd need (3.29^2)/(2^2)=2.7 times the amount of data used to get the 2-sigma result, or 170% more samples.
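Significance grows like \sqrt(N), so the data needed to move a fixed effect from one sigma level to another scales with the square of the ratio of the two levels. The sketch below reproduces that arithmetic and places the linear ratio beside it for comparison. The helper name `data_multiplier` is hypothetical.

```python
def data_multiplier(z_from: float, z_to: float) -> float:
    """Factor by which N must grow to move a fixed effect from z_from to z_to sigma."""
    return (z_to / z_from) ** 2

quadratic = data_multiplier(2.0, 3.29)  # correct: significance ~ sqrt(N)
linear = 3.29 / 2.0                     # the mistaken linear scaling, for comparison

print(f"quadratic: {quadratic:.2f}x the data")
print(f"linear:    {linear:.2f}x the data")
```

The quadratic factor comes out near 2.71, which rounds to the 2.7x (about 170% more samples) stated above.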

It looks like you've reached your conclusion that I'd need 68% more data to reach 99.9% by taking 3.29/1.96=1.68. I believe that this is in error. Uncertainty (in standard deviations) decreases like 1/\sqrt(N), not 1/N.

\sqrt(N) has driven me to depression more than once.

btilly · on Jan 11, 2013

You are right that I used linear where I should have used quadratic.

However consider this. To go from 95% to 99% confidence takes 73% more data collection. So for 73% more data, you get 5x fewer mistakes.

To go from 95% to 99.9% confidence takes 182% more data. So for less than 3x the data, you get 50 times fewer mistakes.
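The same figures can be reproduced with exact two-tailed critical values instead of rounded ones. The sketch below assumes that the "mistakes" being counted are false positives, whose rate under the null is 1 minus the confidence level. `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-tailed critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

BASE = 0.95
z_base = critical_z(BASE)

for target in (0.99, 0.999):
    extra_data = (critical_z(target) / z_base) ** 2 - 1  # quadratic, not linear
    fewer_mistakes = (1 - BASE) / (1 - target)           # false-positive rate ratio
    print(f"{BASE:.0%} -> {target:.1%}: {extra_data:.0%} more data, "
          f"{fewer_mistakes:.0f}x fewer false positives")
```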

My point remains. Confidence improves very, very rapidly.

ISL · on Jan 11, 2013

Neat to see the other side of the coin. In our lab, individual measurements can take as long as a year. \sqrt(N), when constrained by human realities, presents a wall beyond which we cannot pass without experimental innovation.

As the derivative of \sqrt(N) is 1/(2\sqrt(N)), your first measurement teaches you the most. Every measurement teaches you less than the last. In general, we measure as much as we must, double the size of the dataset as a consistency check, and move on. The allocation of time is one of the most important decisions of an experimenter.
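The sketch below puts numbers on these diminishing returns. It treats significance as proportional to \sqrt(N) and drops the constant of proportionality, so only differences and ratios are meaningful. The exact gain from one more measurement, \sqrt(N+1) - \sqrt(N), shrinks with every measurement and approaches the derivative 1/(2\sqrt(N)). Doubling the dataset, as in the consistency check described above, raises significance by a factor of \sqrt(2) ≈ 1.41 regardless of N. The function names are hypothetical.

```python
import math

def gain_from_next(n: int) -> float:
    """Increase in sqrt(N) (significance, up to a constant) from measurement n+1."""
    return math.sqrt(n + 1) - math.sqrt(n)

def derivative(n: int) -> float:
    """d/dN sqrt(N) = 1 / (2 sqrt(N))."""
    return 1 / (2 * math.sqrt(n))

for n in (1, 4, 16, 100):
    print(f"N = {n:3d}: next measurement adds {gain_from_next(n):.4f} "
          f"(derivative {derivative(n):.4f})")

# Doubling the dataset raises significance by sqrt(2), whatever N is.
print(f"doubling N: x{math.sqrt(2 * 50) / math.sqrt(50):.3f}")
```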

btilly · on Jan 11, 2013

Ah. Well I talk about the cost of data acquisition for a reason.

I've seen a number of businesses whose body of active users does not change very fast. So when they run an A/B test, before long their active users are all in it, and before too much longer those of their active users who would have done X will have done X, and data stops piling up. In that case there is a natural amount of data to collect, and you've got to stop at that point and do the best you can.

Businesses are as alike as snowflakes - I am happy to talk about generalities but in the end you have to know what your business looks like and customize to that.

cftm · on Jan 9, 2013

If memory serves, though, the further out you push your sigmas, the greater the likelihood of introducing a type-2 error.
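The trade-off can be made concrete, with the caveat that it holds at a fixed sample size. The sketch below assumes a two-tailed z-test and a real effect that sits a hypothetical 3 standard errors from zero (`TRUE_EFFECT_Z`). The type-2 error rate, meaning the chance of missing the real effect, is the probability that the observed z still lands inside ±z_crit.

```python
from statistics import NormalDist

std_normal = NormalDist()
TRUE_EFFECT_Z = 3.0  # hypothetical: real effect is 3 standard errors from zero

def type2_error(confidence: float, effect_z: float = TRUE_EFFECT_Z) -> float:
    """P(fail to reject) when the observed z ~ Normal(effect_z, 1)."""
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    observed = NormalDist(mu=effect_z, sigma=1)
    return observed.cdf(z_crit) - observed.cdf(-z_crit)

for conf in (0.95, 0.99, 0.999):
    print(f"{conf:.1%} confidence: type-2 error {type2_error(conf):.1%}")
```

TRUE_EFFECT_Z itself grows like \sqrt(N), so collecting more data is what offsets the stricter threshold.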

There is no easy answer in statistics!