I roll a die three times and get three ones: p < 0.01 (under the null hypothesis of a fair die, using a two-tailed test on the average).
Hmm. At a glance, that doesn't seem right. Yes, the chance of rolling three 1s is 1/(6^3), but if we rolled only once and got a single 1, we wouldn't have any reason to suspect that the die was unfair. So maybe we should consider only the last two rolls, and conclude with p ≈ .03 that the die is unfair? Otherwise, consider the case where we rolled 1, 5, 2: that exact sequence also has probability 1/216, yet surely we shouldn't treat a series of non-repeated outcomes as p < .01 evidence of an unfair die?
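A quick brute-force check of the numbers in the comment above, enumerating all 216 equally likely ordered outcomes of three fair rolls. This is only a sketch; the function names are hypothetical. It shows that three ones and the sequence 1, 5, 2 are equally unlikely as exact sequences, and that "the last two rolls match the first" has probability 1/36 ≈ 0.028.

```python
from fractions import Fraction
from itertools import product

# All 6**3 = 216 equally likely ordered outcomes of three rolls of a fair die.
outcomes = list(product(range(1, 7), repeat=3))

def prob_of_sequence(seq):
    """Probability that three fair rolls produce exactly this ordered sequence."""
    hits = sum(1 for o in outcomes if o == tuple(seq))
    return Fraction(hits, len(outcomes))

def prob_last_two_match_first():
    """Probability that rolls 2 and 3 both equal roll 1, whatever roll 1 was."""
    hits = sum(1 for a, b, c in outcomes if a == b == c)
    return Fraction(hits, len(outcomes))

print(prob_of_sequence((1, 1, 1)))   # 1/216
print(prob_of_sequence((1, 5, 2)))   # 1/216
print(prob_last_two_match_first())   # 1/36
```

Since every specific sequence has the same probability, the probability of the exact sequence observed cannot by itself serve as the test statistic; a test needs a statistic (such as the average) that orders outcomes from "typical" to "extreme".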
If the die is fair, the expected average score is 3.5. One can define a test based on that value and reject the null hypothesis when the average score is too low or too high.
The sampling distribution of the average can be calculated exactly. For three rolls the extreme values are 1 (three ones) and 6 (three sixes), each occurring with probability 1/216. The two-tailed p-value for three ones (or three sixes) is therefore 2/216 ≈ 0.0093.
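The exact sampling distribution described above can be computed by enumeration. The sketch below assumes the two-tailed p-value is the probability, under a fair die, of an average at least as far from 3.5 as the observed one (the null distribution is symmetric, so this equals doubling the one-tailed probability). It works with sums rather than averages to keep the arithmetic exact; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(n_rolls=3, faces=6):
    """Exact null distribution of the sum of n_rolls fair dice (sum = n * average)."""
    counts = Counter(sum(o) for o in product(range(1, faces + 1), repeat=n_rolls))
    total = faces ** n_rolls
    return {s: Fraction(c, total) for s, c in counts.items()}

def two_tailed_p(rolls, faces=6):
    """P(average at least as far from its expected value as observed), fair die."""
    n = len(rolls)
    dist = sum_distribution(n, faces)
    center = n * (faces + 1)  # twice the expected sum; keeps everything in integers
    observed = abs(2 * sum(rolls) - center)
    return sum(p for s, p in dist.items() if abs(2 * s - center) >= observed)

print(two_tailed_p([1, 1, 1]))         # 1/108, i.e. 2/216
print(float(two_tailed_p([1, 1, 1])))  # about 0.0093
print(two_tailed_p([1, 5, 2]))         # 14/27, about 0.52
```

Under this test, the sequence 1, 5, 2 (average 8/3) is unremarkable, with a p-value above 0.5, even though its exact-sequence probability equals that of three ones.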
You raise a valid point. This is clearly not the best test for detecting unfair dice. Consider a die that shows only 3 or 4, each with probability 1/2: the average of three rolls always lies between 3 and 4, so the test would never reject the null hypothesis, which is even less often than for a fair die! (In that case the power, zero, is below alpha, which is obviously pretty bad.)
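The power argument can be checked numerically. The sketch below assumes a significance level alpha = 0.01 and builds the rejection region from the fair-die null distribution of the sum (equivalently, the average) of three rolls; `rejection_probability` is a hypothetical helper, not a library function. It then evaluates how often that region is hit when the die actually follows some other distribution.

```python
from fractions import Fraction
from itertools import product

def rejection_probability(face_probs, n_rolls=3, alpha=Fraction(1, 100)):
    """P(reject H0: fair die) when the true face probabilities are face_probs.

    Rejects when the two-tailed p-value of the observed sum, computed under
    a fair die, is <= alpha.
    """
    fair = {f: Fraction(1, 6) for f in range(1, 7)}
    center = n_rolls * 7  # twice the expected sum under H0

    def dist_of_sum(probs):
        dist = {}
        for rolls in product(list(probs), repeat=n_rolls):
            p = Fraction(1)
            for r in rolls:
                p *= probs[r]
            s = sum(rolls)
            dist[s] = dist.get(s, 0) + p
        return dist

    null = dist_of_sum(fair)

    def p_value(s):
        dev = abs(2 * s - center)
        return sum(p for t, p in null.items() if abs(2 * t - center) >= dev)

    alt = dist_of_sum(face_probs)
    return sum(p for s, p in alt.items() if p_value(s) <= alpha)

fair_die = {f: Fraction(1, 6) for f in range(1, 7)}
three_or_four = {3: Fraction(1, 2), 4: Fraction(1, 2)}
print(rejection_probability(fair_die))       # 1/108: size of the test
print(rejection_probability(three_or_four))  # 0: power below alpha
```

With alpha = 0.01 the rejection region contains only the sums 3 and 18, so a die restricted to 3 and 4 can never land in it.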