
So this is something that people don't seem to grok very well, and it really depends on the type of statistical analysis used.

Say you make the assumption that the quantity being estimated is truly fixed: that there's some true value for the force of gravity or some true value for the number of people that vote for X or Y.

The second assumption that comes along is that the stochasticity observed comes from your perspective as an observer, not from the ground truth. To put it more bluntly: if you repeated your whole procedure many times, 95% of those repetitions would produce a result like the one you observed... but the ground truth is still fixed. Gravity has a fixed value, despite your experimental error, and you may have been lucky enough to observe it in your sample.
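To make that concrete, here's a small sketch of the fixed-truth / noisy-observation setup. The true value of g, the noise level, and the sample size are all made up for illustration; the point is that the truth never moves, only the measurements scatter around it:

```python
import random
import math

random.seed(0)

TRUE_G = 9.81      # fixed ground truth (known here only because it's a demo)
NOISE_SD = 0.05    # assumed measurement error
N = 30             # number of noisy observations

obs = [random.gauss(TRUE_G, NOISE_SD) for _ in range(N)]
mean = sum(obs) / N
# standard error of the mean, from the sample standard deviation
se = math.sqrt(sum((x - mean) ** 2 for x in obs) / (N - 1)) / math.sqrt(N)

# Normal-approximation 95% interval: mean +/- 1.96 standard errors.
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"estimate {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The interval is a statement about the measurement procedure, not about g itself: g was 9.81 the whole time.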

Predicting elections with frequentist methods has this same characteristic, except the quantity being observed itself shapeshifts over time, and respondents can even lie... so then there are other complications that need to be dealt with.

This is where that 50% feeling comes from. There are two outcomes, and one will be true. Your data analysis just tells you that if you repeated your procedure, you'd expect 95% of those repetitions to give you the outcome you observed.



If you expect to get it right (in this particular prediction, Clinton to win) with 95% probability, what does it mean to say that this 95% is with low confidence or with high confidence?


Not OP, but that opens a whole different can of worms. “Confidence” has a specific meaning in the context of statistical theory, and specifically in a particular flavor of statistics called “frequentism”. I won’t get into what is involved in frequentism, and how it differentiates itself from the alternative, Bayesianism, but essentially “confidence” refers to a measure that really says more about the statistical methodology used to arrive at the estimated value (in this case, that Hillary had a 95% chance of winning) than about the value itself. This makes it a bit esoteric and something that people misinterpret all the time.

Basically, confidence refers to a hypothetical scenario: if the data-gathering process were to be repeated and the same analysis done each time, X% of the confidence intervals (essentially, the +/- bounds around your estimate) would contain the true value of what you are trying to estimate.

So in this hypothetical scenario, we say we have the power to go back in time and recollect the polling data in 2016 and run the same analysis used to arrive at that 95% number. And let’s say we use this power over and over again, a very large number of times. Then 95% of the error bounds we construct should contain the true value of the probability Hillary wins, whatever that is.

The thing is that those error bounds can be huge. You can have 95% confidence that the probability that Hillary wins is between 3% and 98%, for example. You can also have 10% confidence that the probability of a Hillary win is between 94% and 96%. Without the confidence intervals, a “confidence level” doesn’t say much. It’s also predicated on the assumption you haven’t screwed up your data collection process or analysis methodology. And if you are predicting something will occur with a probability of 95%, and it doesn’t, that doesn’t automatically mean you are wrong, but the likelihood of you having screwed something up is definitely higher.


I agree that this is a different can of (nasty) worms.

The message I replied to said:

> It was not wrong to say Hillary had a 95% chance of winning the presidential election

A frequentist confidence level cannot be interpreted as a probability unless one goes through some (often misunderstood, as you pointed out) contortions. In your scenario where you have 95% confidence in something, it would be wrong to say that Clinton had a 95% chance of winning.


The way I see it (as someone who knows nothing about statistics), confidence would be the difference between 9/10 vs 900/1000. And/or how much effort you spend ramming a square peg in a round hole to have a prediction model.
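The 9/10 vs 900/1000 intuition can be put in numbers: both give the same 90% point estimate, but the interval around it is far wider for the small sample. A minimal sketch using the normal approximation (for n as small as 10 a Wilson interval would really be more appropriate; `normal_ci` is just an illustrative helper):

```python
import math

def normal_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(normal_ci(9, 10))      # 90% estimate, wide interval: only 10 trials
print(normal_ci(900, 1000))  # same 90% estimate, much narrower interval
```

The standard error shrinks with the square root of n, so 100x the data gives an interval about 10x narrower.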

You have a lot of data about donkeys vs elephants. But this contest is between a mule and a mammoth. If you assume a mule is equivalent to a donkey and a mammoth is equivalent to an elephant, the mule has 95% odds in its favor. But you recognize the assumptions so your prediction doesn't have a high confidence.




