Empirical Distribution Function and Estimation of Statistical Functionals
The most basic inference problem is the nonparametric estimation of a CDF and of functions of a CDF.
Let X1, ..., Xn ~ F be IID, where F is a distribution function on the real line. The empirical distribution function F̂n is the CDF that puts mass 1/n at each data point Xi. Formally,
F̂n(x) = (1/n) ∑ᵢ₌₁ⁿ I(Xi ≤ x), where I(Xi ≤ x) = 1 if Xi ≤ x and 0 otherwise.
In words, the EDF (also called the ECDF, empirical cumulative distribution function) evaluated at x is the fraction of sample observations less than or equal to x.
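A minimal Python sketch of this definition, assuming NumPy is available; `ecdf` is a hypothetical helper name, not a library function. It sorts the sample once and, for each query point x, counts how many observations are less than or equal to x.

```python
import numpy as np

def ecdf(data):
    """Return a function x -> F̂n(x), the fraction of observations <= x."""
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size

    def F_hat(x):
        # side="right" counts how many sorted values are <= x
        return np.searchsorted(xs, x, side="right") / n

    return F_hat

sample = [3.0, 1.0, 2.0, 2.0]
F_hat = ecdf(sample)
print(F_hat(0.5), F_hat(2.0), F_hat(10.0))  # 0.0 0.75 1.0
```

Because `np.searchsorted` accepts arrays, `F_hat` can also be evaluated at many points at once.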
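For any fixed x, nF̂n(x) is a sum of n IID Bernoulli(F(x)) indicators, so nF̂n(x) ~ Binomial(n, F(x)). Hence:
- E(F̂n(x)) = F(x), so F̂n(x) is unbiased
- V(F̂n(x)) = F(x)(1 − F(x)) / n
- MSE = F(x)(1 − F(x)) / n → 0 as n → ∞ (the bias is zero, so the MSE equals the variance)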
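These properties can be checked by simulation. The sketch below assumes NumPy and Uniform(0, 1) data, for which F(x) = x, fixes x = 0.3, and compares the Monte Carlo bias and MSE of F̂n(x) with F(x)(1 − F(x))/n. The seed, sample sizes and number of replications are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x0 = 0.3        # point at which F̂n is evaluated
F_x0 = x0       # Uniform(0, 1) assumption: F(x) = x on [0, 1]
reps = 2000     # simulated samples per sample size

results = {}
for n in (10, 100, 1000):
    samples = rng.uniform(0.0, 1.0, size=(reps, n))
    # F̂n(x0) for each simulated sample: fraction of observations <= x0
    F_hat_x0 = (samples <= x0).mean(axis=1)
    bias = F_hat_x0.mean() - F_x0
    mse = np.mean((F_hat_x0 - F_x0) ** 2)
    theory = F_x0 * (1 - F_x0) / n  # F(x)(1 - F(x)) / n
    results[n] = (bias, mse, theory)
    print(f"n={n:4d}  bias={bias:+.4f}  mse={mse:.6f}  theory={theory:.6f}")
```

The bias should hover near zero for every n, while the MSE should track the theoretical value and shrink roughly tenfold each time n grows tenfold.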
Plug-in estimator (for estimating functions of the CDF)
A statistical functional T(F) is any function of F. Examples are the mean μ = ∫x dF(x) and the variance σ² = ∫(x − μ)² dF(x).
The plug-in estimator of θ = T(F) is defined by
θ̂n = T(F̂n)
In other words, just plug in F̂n (the EDF) for the unknown F.
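As a sketch of the plug-in idea (assuming NumPy; all function names are hypothetical), the functionals below are written for an arbitrary discrete distribution given as support points and probabilities. The plug-in estimate is then obtained by passing in the EDF, that is, the data with mass 1/n on each observation.

```python
import numpy as np

def mean_functional(values, probs):
    # T(F) = ∫ x dF(x) for a discrete F with mass probs at values
    return np.sum(values * probs)

def variance_functional(values, probs):
    # T(F) = ∫ (x - μ)^2 dF(x), where μ = ∫ x dF(x)
    mu = mean_functional(values, probs)
    return np.sum((values - mu) ** 2 * probs)

def empirical_distribution(data):
    # The EDF as a discrete distribution: mass 1/n on each observation
    values = np.asarray(data, dtype=float)
    probs = np.full(values.size, 1.0 / values.size)
    return values, probs

data = np.array([2.0, 4.0, 4.0, 6.0])
values, probs = empirical_distribution(data)
mu_hat = mean_functional(values, probs)          # plug-in mean
sigma2_hat = variance_functional(values, probs)  # plug-in variance
print(mu_hat, sigma2_hat)  # 4.0 2.0
```

The plug-in variance divides by n rather than n − 1, so it matches `np.var` with its default `ddof=0`.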
A functional of the form ∫r(x) dF(x) is called a linear functional. For a discrete distribution with probability mass function f, ∫r(x) dF(x) is defined as ∑j r(xj) f(xj). The EDF F̂n is discrete, putting mass 1/n at each Xi.
Hence the plug-in estimator of a linear functional T(F) = ∫r(x) dF(x) is
T(F̂n) = ∫r(x) dF̂n(x) = (1/n) ∑ᵢ₌₁ⁿ r(Xi).
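For a linear functional the plug-in estimate reduces to a sample average. A minimal sketch, assuming NumPy and a vectorized `r`; `linear_plugin` is a hypothetical name. Taking r(x) = I(x ≤ t) recovers F̂n(t), since F(t) = ∫I(x ≤ t) dF(x) is itself a linear functional.

```python
import numpy as np

def linear_plugin(r, data):
    """Plug-in estimate of ∫ r(x) dF(x): the average of r(X_i)."""
    data = np.asarray(data, dtype=float)
    return np.mean(r(data))

data = np.array([1.0, 2.0, 3.0, 4.0])
second_moment = linear_plugin(lambda x: x ** 2, data)  # (1 + 4 + 9 + 16) / 4
cdf_at_2 = linear_plugin(lambda x: (x <= 2.0).astype(float), data)  # F̂n(2)
print(second_moment, cdf_at_2)  # 7.5 0.5
```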
Example: Estimating the mean.
Let μ = T(F) = ∫x dF(x).
The plug-in estimator is μ̂ = ∫x dF̂n(x) = X̄n, where X̄n = (1/n) ∑ᵢ₌₁ⁿ Xi is the sample mean.
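The standard error is se = √V(X̄n) = σ/√n, where σ² = V(X1).
The estimated standard error is σ̂/√n, where σ̂ is an estimate of σ, for example the plug-in estimate σ̂ = √((1/n) ∑ᵢ₌₁ⁿ (Xi − X̄n)²).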
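Putting the example together, here is a sketch of the estimated mean and standard error. It assumes NumPy and takes σ̂ to be the plug-in estimate (dividing by n); using the sample standard deviation with n − 1 instead is a common alternative that changes little for large n. The data are simulated from a normal distribution with σ = 2 purely for illustration, so the true standard error is 2/√400 = 0.1. `mean_and_se` is a hypothetical helper name.

```python
import numpy as np

def mean_and_se(data):
    """Return the plug-in mean X̄n and the estimated standard error σ̂/√n."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mu_hat = data.mean()          # plug-in estimate of μ: the sample mean
    sigma_hat = data.std(ddof=0)  # plug-in estimate of σ (divides by n)
    se_hat = sigma_hat / np.sqrt(n)
    return mu_hat, se_hat

rng = np.random.default_rng(1)  # simulated data, for illustration only
sample = rng.normal(loc=5.0, scale=2.0, size=400)
mu_hat, se_hat = mean_and_se(sample)
print(f"mean = {mu_hat:.3f}, estimated se = {se_hat:.4f}")
```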