# Bootstrap

Bootstrap is a non-parametric method for estimating measures of accuracy such as the standard error, bias, variance, and confidence interval of an estimator. Suppose we want to know $V_F(T_n)$, i.e. the variance of $T_n$, where $T_n = g(X_1, \ldots, X_n)$ is any function of the data $X_1, \ldots, X_n \sim F$.

- Estimate $V_F(T_n)$ with $V_{\hat{F}_n}(T_n)$, where $\hat{F}_n$ is the empirical distribution of the data.
- Approximate $V_{\hat{F}_n}(T_n)$ using simulation.

### What is Simulation?

Simulation is a way of generating large amounts of data such that inference on the simulated data matches inference on the real-world data. Suppose we draw IID samples $Y_1, \ldots, Y_B$ from a distribution $G$. By the law of large numbers,

$$\bar{Y}_B = \frac{1}{B} \sum_{j=1}^{B} Y_j \xrightarrow{P} \int y \, dG(y) = \mathbb{E}(Y)$$

as $B \to \infty$. So for a large sample from $G$, we can use the sample mean $\bar{Y}_B$ to approximate $\mathbb{E}(Y)$.

Generalizing the above formula, for any function $h$ we have

$$\frac{1}{B} \sum_{j=1}^{B} h(Y_j) \xrightarrow{P} \int h(y) \, dG(y) = \mathbb{E}(h(Y)).$$
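As a minimal sketch of this idea in Python (the distribution $G$ and the function $h$ below are arbitrary choices for illustration, not from the text): for $Y \sim \text{Exponential}(1)$ we have $\mathbb{E}(Y) = 1$ and, taking $h(y) = y^2$, $\mathbb{E}(Y^2) = 2$, and both are recovered by averaging over a large simulated sample.

```python
import random

random.seed(0)
B = 200_000

# Draw B IID samples from G; here G is Exponential(1), so E(Y) = 1.
ys = [random.expovariate(1.0) for _ in range(B)]

# Law of large numbers: the sample mean approximates E(Y).
mean_y = sum(ys) / B

# Generalization: the sample mean of h(Y) approximates E(h(Y)).
# With h(y) = y^2 and Y ~ Exponential(1), E(Y^2) = 2.
mean_h = sum(y * y for y in ys) / B

print(mean_y, mean_h)
```

With $B$ this large, both averages land close to their true values, which is exactly the approximation the bootstrap relies on.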

### Bootstrap Variance estimation

Our objective is to estimate $V_{\hat{F}_n}(T_n)$, i.e. the variance of $T_n$ when the distribution of the data is $\hat{F}_n$. **To simulate $T_n$ we simulate $X_1^*, \ldots, X_n^*$ from $\hat{F}_n$ and then compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.** Since $\hat{F}_n$ puts mass $1/n$ at each data point, drawing an observation from $\hat{F}_n$ is equivalent to drawing one point at random from the original dataset. These are the steps of bootstrap variance estimation:

- Draw $X_1^*, \ldots, X_n^* \sim \hat{F}_n$.
- Compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.
- Repeat steps 1 and 2, $B$ times, to get $T_{n,1}^*, \ldots, T_{n,B}^*$.
- Let

$$v_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} \left( T_{n,b}^* - \frac{1}{B} \sum_{r=1}^{B} T_{n,r}^* \right)^2.$$
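The steps above can be sketched in Python (a minimal illustration; the statistic $g$ is taken here to be the sample median, and the data are made up for the example):

```python
import random
import statistics

def bootstrap_variance(data, g, B=2000, seed=0):
    """Estimate the variance of T_n = g(data) under the empirical
    distribution: draw n points with replacement from the data,
    compute a replicate T_n*, repeat B times, and take the sample
    variance of the replicates."""
    rng = random.Random(seed)
    n = len(data)
    t_boot = [g(rng.choices(data, k=n)) for _ in range(B)]
    mean_t = sum(t_boot) / B
    return sum((t - mean_t) ** 2 for t in t_boot) / B

# Hypothetical dataset; g is the sample median.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
v_boot = bootstrap_variance(data, statistics.median)
print(v_boot)
```

Note that `rng.choices(data, k=n)` samples with replacement, which is exactly drawing $n$ observations from $\hat{F}_n$.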

### Summary

Let us take an example of estimating the difference in medians of two samples. The algorithm is:

```r
# x1: first sample, x2: second sample (numeric vectors)
n1 <- length(x1)
n2 <- length(x2)
theta_hat <- median(x2) - median(x1)  # observed difference in medians

B <- 1000
Tboot <- numeric(B)  # bootstrap replicates
for (i in 1:B) {
  xx1 <- sample(x1, n1, replace = TRUE)
  xx2 <- sample(x2, n2, replace = TRUE)
  Tboot[i] <- median(xx2) - median(xx1)
}
se <- sqrt(var(Tboot))
```
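The same algorithm can be sketched in Python for comparison (the two samples below are hypothetical stand-ins, not from the text):

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical samples standing in for the two datasets.
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]
x2 = [rng.gauss(1.0, 1.0) for _ in range(60)]

# Observed difference in medians.
t_hat = statistics.median(x2) - statistics.median(x1)

B = 1000
t_boot = []
for _ in range(B):
    xx1 = rng.choices(x1, k=len(x1))  # resample with replacement
    xx2 = rng.choices(x2, k=len(x2))
    t_boot.append(statistics.median(xx2) - statistics.median(xx1))

mean_t = sum(t_boot) / B
se = math.sqrt(sum((t - mean_t) ** 2 for t in t_boot) / B)
print(t_hat, se)
```

The bootstrap standard error `se` is simply the square root of the sample variance of the $B$ replicates.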