>So if you conduct 100 experiments, you'll get 100 (non-overlapping) CIs! So in which of those 100 CIs does the "true population mean" lie? All 100 of them? 95 of them?! You tell me.
I don't know why you get the idea that all 100 will be non-overlapping. That's simply false.
And yes, if your assumptions were correct, regular (i.e. frequentist) statistics will state that roughly 95% of the CIs will contain the true mean. There is nothing absurd about it.
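That "roughly 95%" claim is easy to check by simulation. A minimal sketch, using a hypothetical Normal(50, 10) population and the normal approximation for the interval (all the specific numbers here are made up for illustration):

```python
import math
import random
import statistics

# Hypothetical setup: the true population is Normal(mu=50, sigma=10).
# Run 100 "experiments" of n=30 draws each, build a 95% CI for the
# mean from each sample, and count how many intervals contain mu.
random.seed(42)

MU, SIGMA, N, RUNS = 50.0, 10.0, 30, 100
Z = 1.96  # normal-approximation critical value for 95%

covered = 0
for _ in range(RUNS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)  # estimated std. error
    lo, hi = mean - Z * sem, mean + Z * sem
    if lo <= MU <= hi:
        covered += 1

print(f"{covered} of {RUNS} intervals contain the true mean")
```

The count lands near 95 on most seeds (a t-based interval would be slightly more honest at n=30, but the point survives): the 95% is a property of the *procedure* across repetitions, not of any single interval.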
Speaking as a stats TA: literally over 90% of the students taking the class will conduct 1 experiment, not 100, which gives you 1 CI, not 100, and then say that particular CI has a 95% chance of containing the population mean! When I tell them the mean is either in that CI or not (either 100% in, or 0% in), they google the CI definition and point me to it. That's why I said the definition doesn't work for the masses. It can be interpreted as "if you conduct 100 experiments and get 100 CIs, then roughly 95 of those will contain the true mean", but nobody ever conducts 100 experiments, so from their point of view it's an absurd definition.
It's imperative to understand that these definitions are written not for the average user of statistics, but for a trained statistician. Unfortunately, the average stats consumer vastly outnumbers the professionals. Papers are littered with statements like "the p-value proves H0" or "proves H1". I have had numerous conversations with scientists (not statisticians, but pharma/epidemiology/engineering people who show up to the stats lab for a consult) explaining that their p-value proves neither H0 nor H1. "What do you mean you can't prove H1? Oh, you mean it only rejects H0? OK, but isn't that the same as proving H1? It isn't?! Well, in my field, if I just state that it rejects H0 it won't be well understood, so I'm going to say instead that H1 has been proved!" So there's little the statistician can do.
Regarding overlap, I meant total/exact overlap, as in: no two CIs will be identical on any continuous distribution.
>Speaking as a stats TA: literally over 90% of the students taking the class will conduct 1 experiment, not 100, which gives you 1 CI, not 100, and then say that particular CI has a 95% chance of containing the population mean!
And as a TA, I hope you are marking them wrong! My stats professor took great pains to point this out, as does my stats textbook.
>but nobody ever conducts 100 experiments, so from their point of view it's an absurd definition.
The definition is not absurd. It's just a definition. You can argue that CIs are being abused and are not as helpful as people think they are, and I'll probably agree with you. But that's no excuse for people to use them and get a pass for not even knowing what they mean!
I contend that it was defined as such because at the time no one had anything better. I suspect that much is true even now. I'm not aware of any obviously better alternatives, and articles like these even suggest there aren't any - just that we need to be mindful that the CI alone doesn't allow for reliable conclusions.
At the end of the day, this is not a problem with the CI definition. It's not a statistics/technical problem. It's a social/cultural one. As such, the solution isn't to change statistics, but to fix the cultural problem: Why do we keep letting people get away with such analyses? Are there any journals that have a clear policy on these analyses? Are referees rejecting papers because of an over-reliance on p-values?
Let's not change basic statistics definitions and concepts because the majority of non-statisticians don't understand them. When the majority of the public can't understand basic rules of logic (like A => B does not mean that (not A => not B)), we don't argue for a change to the discipline of logic. When huge numbers of people violate algebra (e.g. sqrt(a^2+b^2) = a + b), we don't blame mathematics for their sins. (I know I'm picking extreme cases for illustration, but the principle is the same.)

I had only one stats class in my curriculum. If you want people to perform correct statistics as part of the profession, make sure it is fundamental to much of their work. It would have been trivial to add a statistics component to most of my engineering classes, and that would hammer in the correct interpretation. Yet while we were required to know calculus and diff eq for most of our classes, none required any statistics beyond the notion of the mean (and very occasionally, some probability).
Statistics is a tool. It will always be the responsibility of the person invoking the tool to get it right.
>Regarding overlap, I meant total/exact overlap, as in: no two CIs will be identical on any continuous distribution.
Isn't that the whole point of inferential statistics? You have a population with a true mean. You cannot poll the whole population, hence you take a sample. This is inherently random, so there is variance in your estimate (obviously). What should be clear is that the CI moves with your point estimate. Furthermore, you never know the true stddev, so you estimate it from your sample. Now both the center and the width of the CI vary with each sample. I can't comprehend how you could hope to get the same interval from different samples, given that it is quite possible to get all your points below the mean in one sample and above the mean in another.
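To make the point concrete, here is a small sketch (hypothetical Normal(50, 10) population, normal-approximation intervals, all parameters invented for illustration): two independent samples give different point estimates and different estimated standard deviations, so both the center and the width of the resulting CIs differ.

```python
import math
import random
import statistics

random.seed(1)

def ci95(sample):
    """95% CI for the mean, normal approximation with estimated stddev."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return (mean - 1.96 * sem, mean + 1.96 * sem)

# Two independent samples from the same Normal(50, 10) population.
a = [random.gauss(50, 10) for _ in range(30)]
b = [random.gauss(50, 10) for _ in range(30)]

print("CI from sample A:", ci95(a))
print("CI from sample B:", ci95(b))
```

On a continuous distribution the two intervals coincide with probability zero: that would require both sample means and both sample standard deviations to be exactly equal.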
I think people are bashing statistics because it isn't helping them come to a clear conclusion (which is fair). But as I said, all the proposals I've seen appear to be just as problematic.