# Parametric and Non-Parametric models

A **statistical model** is a set of **distributions** (or a set of **densities**). A **parametric model** is a statistical model that can be described by a **finite** number of parameters; for example, the normal distribution is parameterized by two parameters, μ and σ. **Non-parametric models** can't be described by a **finite** number of parameters: the amount of information the parameters capture grows with the size of the data. An example of a non-parametric estimation target is the CDF of a distribution.

A parametric model takes the form

𝔉 = { f(x; θ) : θ ∈ Θ }

where θ is an unknown parameter (or a vector of parameters) that can take values in the parameter space Θ.
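As a concrete sketch (with invented data, not from the text): the normal family {N(µ, σ²) : µ ∈ ℝ, σ > 0} is a parametric model with θ = (µ, σ), and fitting it amounts to estimating just those two numbers, no matter how large the sample is. The maximum-likelihood estimates are the sample mean and the population standard deviation:

```python
import random
import statistics

# Simulated data from N(mu=5, sigma=2); in a parametric model the whole
# distribution is pinned down by the finite parameter vector theta = (mu, sigma).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# Maximum-likelihood estimates under the normal model:
mu_hat = statistics.fmean(data)      # sample mean
sigma_hat = statistics.pstdev(data)  # population standard deviation

print(mu_hat, sigma_hat)  # close to the true values 5.0 and 2.0
```

Note that the size of θ stays at two whether we observe a hundred points or a million; that is what makes the model parametric.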

To get an intuitive sense of parametric and non-parametric models, let's relate them to machine learning algorithms. A machine learning problem can be summarised by the equation *Y = f(x)*.

An algorithm learns this target mapping from the data. Let’s consider a Linear regression problem with two variables/features. The equation would be:

Y = b0 + b1*x1 + b2*x2

where b0, b1, and b2 are coefficients that our model learns in order to predict Y from the inputs x1 and x2. The learning problem reduces to estimating the coefficients b0, b1, and b2. The number of parameters is fixed in advance: once you estimate them, the model is ready. This is an example of a parametric model. Other examples of parametric models are Linear Discriminant Analysis, simple neural networks, the Perceptron, Naive Bayes, etc.
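A minimal sketch of the estimation step (with hypothetical, noise-free data and a hand-rolled solver, not from the text): with three observations, the three coefficients of Y = b0 + b1*x1 + b2*x2 are recovered exactly by solving a 3×3 linear system:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data generated from Y = 1 + 2*x1 + 3*x2 (no noise):
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
A = [[1.0, x1, x2] for x1, x2 in points]        # design matrix rows [1, x1, x2]
Y = [1 + 2 * x1 + 3 * x2 for x1, x2 in points]  # targets

b0, b1, b2 = solve(A, Y)
print(b0, b1, b2)  # recovers the three fixed parameters: 1.0 2.0 3.0
```

With noisy data one would instead minimise squared error (ordinary least squares), but either way the model is fully specified by the same three numbers.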

Non-parametric models don't have a fixed/finite number of parameters. For example, the k-nearest-neighbours algorithm classifies a point according to the classes of its k nearest neighbours in the training data, so the training set itself effectively serves as the model's parameters. These models are more powerful and flexible for large, complex data but are more prone to overfitting. Other examples are Support Vector Machines and decision trees such as CART and C4.5.
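A hedged sketch of this idea (toy data and function names are invented): a pure-Python k-nearest-neighbours classifier whose "parameters" are the stored training points, so their number grows with the data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    The model's 'parameters' are the training points themselves, so their
    count grows with the data: a non-parametric method."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b")]

print(knn_predict(train, (0.3, 0.3)))  # "a"
print(knn_predict(train, (4.8, 5.1)))  # "b"
```

There is no training step at all: prediction consults the raw data directly, which is why the model's complexity scales with the dataset.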

### Some definitions

- **One-dimensional parametric estimation**: Let X_{1}, . . . , X_{n} be independent **Bernoulli(p)** random variables. The problem is to estimate the parameter **p**.
- **Two-dimensional parametric estimation**: Suppose that X_{1}, . . . , X_{n} ~ F and we assume that the data have a **normal** distribution. The problem is then to estimate **µ and σ**. If we are interested in only one parameter, say µ, then σ is a **nuisance parameter**.
- **Non-parametric estimation of the CDF**: Let X_{1}, . . . , X_{n} be independent observations from a CDF F. The problem is to estimate F assuming only **F ∈ {all CDFs}**.
- **Non-parametric density estimation**: Let X_{1}, . . . , X_{n} be independent observations from a CDF F and let f = F′ be the PDF. The problem is to estimate f. Here we can't just assume that f ∈ {all PDFs}; we need to assume some smoothness on f.
- **Non-parametric estimation of functionals**: Let X_{1}, . . . , X_{n} ~ F. Any function of F is called a **statistical functional**. Examples of functionals are the mean, the variance, and the median.
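To make the non-parametric entries concrete, here is a sketch (with simulated data, not from the text) of the empirical CDF, the standard non-parametric estimator of F, together with a plug-in estimate of one statistical functional, the mean:

```python
import bisect
import random

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # X_1, ..., X_n ~ N(0, 1)

def ecdf(data):
    """Return the empirical CDF: F_hat(x) = (# observations <= x) / n."""
    xs = sorted(data)
    n = len(xs)
    def F_hat(x):
        return bisect.bisect_right(xs, x) / n  # count of points <= x
    return F_hat

F_hat = ecdf(sample)

# Plug-in estimate of the statistical functional T(F) = mean:
mean_hat = sum(sample) / len(sample)

print(F_hat(0.0))  # near 0.5 for a standard-normal sample
print(mean_hat)    # near 0.0
```

The estimator F_hat makes no assumption beyond F ∈ {all CDFs}, and the plug-in principle estimates any functional T(F) by T(F_hat).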

There are three main areas of interest in inferential problems:

1. Estimation

2. Confidence set

3. Hypothesis testing

Let’s start with a brief introduction to each of them.