For a parameter θ, a 1 − α confidence interval is an interval C_n = (a, b),
where a = A(X_1, ..., X_n) and b = B(X_1, ..., X_n) are functions of the data such that
P_θ(θ ∈ C_n) ≥ 1 − α, for all θ ∈ Θ.
In simpler words, (a, b) traps θ with probability at least 1 − α. We call 1 − α the coverage of the confidence interval. When we say 95% confidence interval, we mean that we have chosen α = 0.05.
Suppose that we measure the heights of 50 people chosen at random from the world. The sample mean turns out to be 170 cm, and the standard deviation of the heights in this sample is 20 cm. The 95% confidence interval is 170 ± 5.54 cm.
This means we are 95% confident that the mean height of all the people in the world lies between 164.46 cm and 175.54 cm. More precisely, if we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true mean. Now let's see how to calculate the confidence interval. For this we need three pieces of data:
Number of observations n: 50
Mean (Sample mean) X̄: 170
Standard deviation (of sample) s: 20
The confidence interval is X̄ ± Z*(s/√n)
The only unknown quantity is Z, which we look up in the Z table below this example. The value of Z for a 95% confidence interval is 1.96. Therefore the confidence interval is
170 ± 1.96 * (20/√50) = 170 ± 5.54 cm
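The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `normal_ci` is made up for illustration, and Z is taken from the inverse CDF of the standard Normal rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(mean, sd, n, confidence=0.95):
    """Return (lower, upper, margin) for the interval mean ± Z * sd / sqrt(n)."""
    alpha = 1 - confidence
    # Z is the point with probability alpha/2 to its right under N(0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin, margin


# Numbers from the height example: n = 50, sample mean 170, sample sd 20.
lower, upper, margin = normal_ci(mean=170, sd=20, n=50)
print(f"170 ± {margin:.2f} cm -> ({lower:.2f}, {upper:.2f})")
# prints: 170 ± 5.54 cm -> (164.46, 175.54)
```

Passing a different `confidence` (for example 0.99) widens the interval, since a larger Z is needed to trap the mean more often.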
Normal-based confidence interval (derivation and intuition)
You just saw a neat little method for finding a confidence interval. Ever wondered where the formula comes from? Let's find out.
Suppose that θ̂_n ≈ N(θ, ŝe²). Let Φ be the CDF of a standard Normal and let
z_{α/2} = Φ⁻¹(1 − α/2), i.e. P(Z > z_{α/2}) = α/2 and
P(−z_{α/2} < Z < z_{α/2}) = 1 − α, where Z ~ N(0, 1). Let
C_n = (θ̂_n − z_{α/2} ŝe, θ̂_n + z_{α/2} ŝe). Then
P_θ(θ ∈ C_n) → 1 − α.
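To make the definition of the critical value concrete, the sketch below computes it from the inverse CDF Φ⁻¹ and checks that the central region between −z and z has probability 1 − α. It assumes Python's standard-library `statistics.NormalDist`; the function name `z_critical` is only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1


def z_critical(alpha):
    """Critical value z with P(Z > z) = alpha / 2 for Z ~ N(0, 1)."""
    return std_normal.inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    z = z_critical(alpha)
    # Probability mass between -z and z; this should equal 1 - alpha.
    central = std_normal.cdf(z) - std_normal.cdf(-z)
    print(f"alpha={alpha:.2f}  z={z:.3f}  P(-z < Z < z)={central:.3f}")
```

For α = 0.05 this gives a critical value of about 1.96, the Z used in the height example.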
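Proof: Let Z_n = (θ̂_n − θ)/ŝe. By assumption Z_n ⇝ Z, where Z ~ N(0, 1). Hence
P_θ(θ ∈ C_n) = P_θ(θ̂_n − z_{α/2} ŝe < θ < θ̂_n + z_{α/2} ŝe)
= P_θ(−z_{α/2} < (θ̂_n − θ)/ŝe < z_{α/2})
→ P(−z_{α/2} < Z < z_{α/2})
= 1 − α.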
Note that the standard error of the sample mean is σ/√n, which we estimate by ŝe = s/√n, and that X̄ is the point estimate. Substituting these into the interval above gives X̄ ± Z*(s/√n), the formula used in the previous example, where the value of Z is read from the table of the standard Normal CDF Φ.
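One way to build intuition for the coverage claim is to simulate it. The sketch below is an illustration under assumed conditions, not part of the derivation: it draws many samples from a Normal population with known mean, builds X̄ ± Z*(s/√n) each time, and counts how often the interval contains the true mean. The population parameters, number of trials, seed, and the function name `coverage` are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean=170.0, true_sd=20.0, n=50, alpha=0.05,
             trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = mean(sample)               # point estimate
        se_hat = stdev(sample) / sqrt(n)   # estimated standard error s / sqrt(n)
        if x_bar - z * se_hat < true_mean < x_bar + z * se_hat:
            hits += 1
    return hits / trials


print(f"empirical coverage: {coverage():.3f}")
```

The printed fraction should land near 0.95. It is typically slightly below for moderate n, because s is itself an estimate and the Normal approximation only holds in the limit.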