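The bootstrap is a non-parametric method for estimating measures of accuracy such as standard error, bias, variance, and confidence intervals. Suppose we want to know VF(Tn), i.e. the variance of Tn under the true (unknown) distribution F of the data, where Tn = g(X1, . . . , Xn) is any function of the data. The bootstrap does this in two steps:

  • Estimate VF(Tn) with Vn(Tn), the variance of Tn when the data are drawn from the empirical distribution F̂n.
  • Approximate Vn(Tn) using simulation.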

What is Simulation?

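Simulation means drawing a large number of random values from a distribution and using them to approximate properties of that distribution. Suppose we draw IID samples Y1, . . . , YB from a distribution G. By the law of large numbers,

$$\bar{Y}_B = \frac{1}{B} \sum_{j=1}^{B} Y_j \xrightarrow{P} E(Y)$$

as B → ∞. So for a large sample from G, we can use the sample mean $\bar{Y}_B$ to approximate E(Y).

Generalizing the above formula, for any function h with finite mean we have

$$\frac{1}{B} \sum_{j=1}^{B} h(Y_j) \xrightarrow{P} E(h(Y))$$

as B → ∞. In particular, the variance of the simulated values approximates the variance of Y:

$$\frac{1}{B} \sum_{j=1}^{B} \left( Y_j - \bar{Y}_B \right)^2 \xrightarrow{P} V(Y)$$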
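
As a minimal sketch of this approximation in R, the snippet below takes G to be the Exponential(1) distribution. That choice is an assumption made only for illustration (it has E(Y) = 1, E(Y²) = 2 and V(Y) = 1), and the names y, h, mean_hat, second_moment_hat and var_hat are illustrative.

```r
set.seed(42)                 # make the simulation reproducible

B <- 100000                  # number of simulated draws
y <- rexp(B, rate = 1)       # Y_1, ..., Y_B drawn IID from G = Exponential(1)

# The sample mean approximates E(Y) = 1
mean_hat <- mean(y)

# More generally, averaging h(Y_j) approximates E(h(Y)); here h(y) = y^2, so E(h(Y)) = 2
h <- function(y) y^2
second_moment_hat <- mean(h(y))

# The same idea gives the variance: V(Y) = 1 for this choice of G
var_hat <- mean((y - mean_hat)^2)

print(c(mean = mean_hat, second_moment = second_moment_hat, variance = var_hat))
```

Increasing B should bring all three values closer to their targets, which is the behaviour the law of large numbers describes.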

Bootstrap Variance estimation

Our objective is to approximate Vn(Tn), i.e. the variance of Tn when the data are distributed according to F̂n. To simulate Tn, we simulate X1*, . . . , Xn* from F̂n and then compute Tn* = g(X1*, . . . , Xn*). Since F̂n puts mass 1/n at each data point, drawing an observation from F̂n is equivalent to drawing one point at random from the original dataset. Drawing n such observations is therefore the same as sampling n points from the original data with replacement. The steps of bootstrap variance estimation are:

  • Draw X1*, . . . , Xn* ~ F̂n.
  • Compute Tn* = g(X1*, . . . , Xn*)
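  • Repeat the two steps above B times to get Tn,1*, . . . , Tn,B*.
  • Let
    $$v_{boot} = \frac{1}{B} \sum_{b=1}^{B} \left( T^*_{n,b} - \frac{1}{B} \sum_{r=1}^{B} T^*_{n,r} \right)^2$$
    The bootstrap estimate of the standard error is then the square root of v_boot.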
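
The steps above can be sketched as a small R function. This is a rough illustration rather than a definitive implementation: boot_var is a hypothetical helper name (not from any package), and the standard normal data are made up for the example.

```r
# Bootstrap estimate of the variance of a statistic g computed from data x.
# boot_var is a hypothetical helper, written only for this illustration.
boot_var <- function(x, g, B = 1000) {
  n <- length(x)
  t_star <- numeric(B)
  for (b in seq_len(B)) {
    # Draw X1*, ..., Xn* from the empirical distribution:
    # n points sampled from x with replacement
    x_star <- sample(x, size = n, replace = TRUE)
    # Compute Tn* = g(X1*, ..., Xn*)
    t_star[b] <- g(x_star)
  }
  # (1/B) * sum of squared deviations of the replicates from their mean
  mean((t_star - mean(t_star))^2)
}

set.seed(1)
x <- rnorm(100)                  # illustrative data
v_mean <- boot_var(x, mean)      # bootstrap variance of the sample mean
v_median <- boot_var(x, median)  # bootstrap variance of the sample median
print(c(v_mean = v_mean, v_median = v_median))
```

For the sample mean, the result can be sanity-checked against the usual estimate var(x) / length(x); the two should be close. For the median there is no such simple formula, which is where the bootstrap is most useful.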

Summary

Let us take the example of estimating the standard error of the difference in medians of two samples. The algorithm, written in R, is:

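set.seed(1)                  # for reproducibility
x1 <- rnorm(50, mean = 0)    # first sample (placeholder data; substitute your own)
x2 <- rnorm(60, mean = 1)    # second sample (placeholder data; substitute your own)
n1 <- length(x1)
n2 <- length(x2)
observed_value <- median(x2) - median(x1)   # point estimate from the original data

B <- 1000
Tboot <- numeric(B)          # holds the B bootstrap replicates

for (i in 1:B) {
   xx1 <- sample(x1, size = n1, replace = TRUE)   # resample x1 with replacement
   xx2 <- sample(x2, size = n2, replace = TRUE)   # resample x2 with replacement
   Tboot[i] <- median(xx2) - median(xx1)
}

se <- sqrt(var(Tboot))       # bootstrap standard error; var() divides by B - 1, not B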
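
One way to use the standard error computed above is to form a rough confidence interval for the difference in medians. The sketch below uses a normal approximation (estimate ± z × se), which assumes the distribution of the statistic is roughly normal; that assumption, the helper name median_diff_ci, and the simulated data are not from the original example.

```r
# Hypothetical helper: bootstrap standard error and normal-approximation
# confidence interval for the difference in medians of two samples.
median_diff_ci <- function(x1, x2, B = 1000, level = 0.95) {
  estimate <- median(x2) - median(x1)
  # Each replicate resamples both samples with replacement at their original sizes
  Tboot <- replicate(B, {
    median(sample(x2, replace = TRUE)) - median(sample(x1, replace = TRUE))
  })
  se <- sd(Tboot)
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(estimate = estimate, se = se,
    lower = estimate - z * se, upper = estimate + z * se)
}

set.seed(7)
x1 <- rnorm(50, mean = 0)   # placeholder data
x2 <- rnorm(60, mean = 1)   # placeholder data
result <- median_diff_ci(x1, x2)
print(result)
```

If the bootstrap replicates look clearly skewed, the normal approximation may be poor and the interval should be treated with caution.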
