Let X_{1}, . . . , X_{n} be IID with PDF f(x; θ).

The likelihood function is defined by

L_{n}(θ) = ∏_{i=1}^{n} f(X_{i}; θ).

The log-likelihood function is defined by ℓ_{n}(θ) = log L_{n}(θ).

The maximum likelihood estimator (MLE), denoted by θ̂_{n}, is the value of θ that maximizes L_{n}(θ).
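The definition can be illustrated numerically. The following sketch (a hypothetical illustration, assuming NumPy and a N(θ, 1) model, which is not part of the text above) evaluates the log-likelihood on a grid of candidate θ values and picks the maximizer; for this model the maximizer should sit next to the sample mean, which is the closed-form MLE.

```python
import numpy as np

# Hypothetical illustration: X_1, ..., X_n ~ N(theta, 1), so f(x; theta)
# is the normal density. The MLE is the theta maximizing l_n(theta).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)

def log_likelihood(theta, data):
    # l_n(theta) = sum_i log f(X_i; theta) for the N(theta, 1) model
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (data - theta) ** 2)

grid = np.linspace(0.0, 4.0, 4001)        # candidate values of theta
values = [log_likelihood(t, x) for t in grid]
theta_hat = grid[int(np.argmax(values))]  # grid maximizer of l_n

# For this model the MLE has a closed form: the sample mean.
print(theta_hat, x.mean())
```

A grid search is used only to keep the sketch self-contained; any numerical optimizer applied to the negative log-likelihood would serve the same purpose.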

Example:

Suppose that X_{1}, . . . , X_{n} ~ Bernoulli(p). The probability function is f(x; p) = p^{x}(1 − p)^{1−x}. The unknown parameter is p. Then

L_{n}(p) = ∏_{i=1}^{n} p^{X_{i}}(1 − p)^{1−X_{i}} = p^{S}(1 − p)^{n−S},

where S = ∑_{i} X_{i}.

Hence ℓ_{n}(p) = S log p + (n − S) log(1 − p).

Taking the derivative of ℓ_{n}(p) and equating it to 0 gives S/p − (n − S)/(1 − p) = 0, whose solution is the MLE p̂ = S/n.
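The Bernoulli example can be checked with a short simulation. In this sketch (assuming NumPy; the true value p = 0.3, the sample size, and the grid are illustrative choices, not from the text) the closed-form MLE p̂ = S/n is compared against a direct grid maximization of ℓ_{n}(p).

```python
import numpy as np

# Simulate X_1, ..., X_n ~ Bernoulli(p) with an illustrative p = 0.3.
rng = np.random.default_rng(1)
n = 1000
x = rng.binomial(1, 0.3, size=n)
S = x.sum()

p_hat = S / n  # closed-form MLE from the derivative calculation

def loglik(p):
    # l_n(p) = S log p + (n - S) log(1 - p)
    return S * np.log(p) + (n - S) * np.log(1 - p)

grid = np.linspace(0.001, 0.999, 999)       # interior of (0, 1)
p_grid = grid[int(np.argmax(loglik(grid)))] # grid maximizer of l_n

# The grid maximizer should agree with p_hat up to the grid spacing.
print(p_hat, p_grid)
```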

### Properties of the Maximum Likelihood Estimator

- The MLE is **consistent**: θ̂_{n} converges in probability to θ_{*}, where θ_{*} denotes the true value of the parameter θ.
- The MLE is **equivariant**: if θ̂_{n} is the MLE of θ, then g(θ̂_{n}) is the MLE of g(θ).
- The MLE is **asymptotically normal**: (θ̂_{n} − θ_{*})/ŝe ⇝ N(0, 1), where ŝe is the estimated standard error of θ̂_{n}.
- The MLE is **asymptotically optimal** or **efficient**: roughly, this means that among all well-behaved estimators, the MLE has the smallest variance, at least for large samples.
- The MLE is approximately the Bayes estimator.
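Asymptotic normality can be seen in simulation. This sketch (assuming NumPy; the Bernoulli model, p = 0.3, the sample size, and the replication count are illustrative assumptions) standardizes the MLE by its estimated standard error ŝe = √(p̂(1 − p̂)/n) over many replications; the resulting values should look approximately N(0, 1).

```python
import numpy as np

# Check (p_hat - p_true) / se_hat ~ approx N(0, 1) for the Bernoulli MLE.
rng = np.random.default_rng(2)
p_true, n, reps = 0.3, 2000, 5000

z = []
for _ in range(reps):
    x = rng.binomial(1, p_true, size=n)
    p_hat = x.mean()                           # MLE p_hat = S / n
    se_hat = np.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    z.append((p_hat - p_true) / se_hat)

z = np.array(z)
# Mean and standard deviation of the standardized MLE should be
# close to 0 and 1 respectively if asymptotic normality holds.
print(z.mean(), z.std())
```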