The purpose of mathematical statistics is to determine properties of a (usually large) population on the basis of a so-called random sample. We pick one individual "at random" (meaning that each individual has the same chance of being chosen) and record the value of some characteristic (e.g. length, height, weight, ...) associated with this individual. This value is a random variable whose distribution is the relative frequency distribution of that characteristic within the population. If we repeat this process $n$ times, we get a random sample $X_1, \dots, X_n$.
Sampling can be done with or without replacement, which means that a given individual can be chosen more than once or only once, respectively. The former leads to a sample of independent, identically distributed (i.i.d.) random variables $X_1, \dots, X_n$. In the latter case, $X_1, \dots, X_n$ are not independent. However, if the population is large compared with the sample size, there is almost no difference between the two methods, so we can assume that sampling is always done with replacement.
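A minimal Python sketch of the two sampling schemes, using the standard-library `random` module on a hypothetical toy population (individuals are identified with their recorded values for simplicity). `random.choices` draws with replacement, giving i.i.d. draws; `random.sample` draws without replacement. The helper `prob_no_repeat` is also hypothetical: it computes the probability $\prod_{i=0}^{n-1}(1 - i/N)$ that a with-replacement sample of size $n$ from $N$ individuals never picks the same individual twice. When $N$ is much larger than $n$ this is close to 1, so repeats are rare and the two schemes behave almost identically.

```python
import random

# Hypothetical toy population: recorded values (e.g. heights in cm).
population = [160, 165, 170, 175, 180, 185, 190]


def sample_with_replacement(pop, n, rng):
    """Draw n individuals independently; the same one may recur (i.i.d.)."""
    return rng.choices(pop, k=n)


def sample_without_replacement(pop, n, rng):
    """Draw n distinct individuals; successive draws are dependent."""
    return rng.sample(pop, n)


def prob_no_repeat(N, n):
    """P(no individual is drawn twice) when sampling n times with replacement from N."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p


rng = random.Random(0)
print(sample_with_replacement(population, 5, rng))
print(sample_without_replacement(population, 5, rng))
print(prob_no_repeat(10**6, 100))  # large population, small sample: close to 1
```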
A random sample of size $n$ from a distribution $P$ is a sequence $X_1, \dots, X_n$ of i.i.d. random variables with common distribution $P$. The number $n$ is called the sample size and each $X_i$ an observation.
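Given a random sample, we would like to make statements about the underlying distribution; this is made possible by the Glivenko-Cantelli theorem. For a sequence $X_1, X_2, \dots$ of i.i.d. random variables with common distribution function $F$, define the empirical distribution function as

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad x \in \mathbb{R},$$

i.e. the proportion of the first $n$ observations that do not exceed $x$. Then $F_n$ converges to $F$ uniformly with probability one:

$$P\Bigl(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0\Bigr) = 1.$$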
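The following Python sketch illustrates the theorem numerically, assuming for concreteness that $F$ is the Uniform(0, 1) distribution function $F(x) = x$ on $[0, 1]$. The function names `ecdf` and `sup_distance_uniform` are illustrative, not from any library. Since $F_n$ is a step function and $F$ is increasing, the supremum $\sup_x |F_n(x) - F(x)|$ (the Kolmogorov-Smirnov statistic) is attained at the jumps of $F_n$, so it can be computed exactly from the sorted sample.

```python
import bisect
import random


def ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # proportion of observations that are <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


def sup_distance_uniform(sample):
    """Exact sup_x |F_n(x) - x| for a sample from Uniform(0, 1).

    At the i-th order statistic x, F_n jumps from (i-1)/n to i/n,
    so the largest deviations occur just before or at these jumps.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


rng = random.Random(42)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    # the distance shrinks roughly like 1/sqrt(n) as n grows
    print(n, round(sup_distance_uniform(sample), 4))
```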
If the distribution of the $X_i$ can be characterized by one or more real numbers (parameters), we speak of parametric statistics; otherwise we speak of non-parametric statistics.
If $T$ is a (measurable) function from $\mathbb{R}^n$ to $\mathbb{R}^k$ and $X_1, \dots, X_n$ is a random sample, then $T(X_1, \dots, X_n)$ is called a statistic.
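Important statistics include the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and the sample variance $S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$.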
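Two statistics that are used constantly in practice are the sample mean and the sample variance. A minimal Python sketch computing both as functions of the observed values, assuming the $1/(n-1)$ normalization for the variance (one common convention; some texts divide by $n$ instead). The result is cross-checked against the standard-library `statistics` module, whose `variance` also divides by $n - 1$.

```python
import statistics


def sample_mean(xs):
    """T(x_1, ..., x_n) = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """T(x_1, ..., x_n) = 1/(n-1) * sum of (x_i - mean)^2."""
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data), sample_variance(data))
# cross-check against the standard library
print(statistics.mean(data), statistics.variance(data))
```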
A statistic $T = T(X_1, \dots, X_n)$ is called sufficient for the parameter $\theta$ if the conditional distribution of $(X_1, \dots, X_n)$ given $T$ does not depend on $\theta$.
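A standard illustration, chosen here as an example model rather than taken from the text: for $X_1, \dots, X_n$ i.i.d. Bernoulli($p$), the statistic $T = X_1 + \dots + X_n$ is sufficient for $p$. The Python sketch below computes the conditional distribution of the sample given $T$ exactly, using `fractions.Fraction` to avoid rounding, for two different values of $p$ and checks that they coincide (each outcome $x$ has conditional probability $1/\binom{n}{t}$ with $t = \sum x_i$). The function name `conditional_given_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(n, p):
    """Exact P(X = x | T(X) = sum(x)) for X_1..X_n i.i.d. Bernoulli(p)."""
    # joint probability of every 0/1 outcome vector x
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
    }
    # distribution of the statistic T = sum(x)
    marg = {}
    for x, px in joint.items():
        marg[sum(x)] = marg.get(sum(x), 0) + px
    # conditional probability of x given the observed value of T
    return {x: px / marg[sum(x)] for x, px in joint.items()}


n = 4
a = conditional_given_sum(n, Fraction(1, 3))
b = conditional_given_sum(n, Fraction(3, 4))
print(a == b)  # same conditional law for both parameter values
```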
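An important criterion for sufficiency is the following (the Fisher-Neyman factorization theorem): Let $\{P_\theta : \theta \in \Theta\}$ be a parametric family of distributions dominated by a σ-finite measure $\mu$, and let $f_\theta = dP_\theta / d\mu$ denote the density of $P_\theta$. The statistic $T$ is sufficient for $\theta$ if and only if the so-called likelihood function

$$L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} f_\theta(x_i)$$

admits a decomposition

$$L(\theta; x_1, \dots, x_n) = g_\theta\bigl(T(x_1, \dots, x_n)\bigr)\, h(x_1, \dots, x_n),$$

where $h$ does not depend on $\theta$.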
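As a worked example of the factorization, assume (for illustration only) a sample from the normal distribution $N(\theta, 1)$ with known variance 1, dominated by Lebesgue measure. Expanding $-(x_i - \theta)^2/2 = -x_i^2/2 + \theta x_i - \theta^2/2$ gives $g_\theta(t) = \exp(\theta t - n\theta^2/2)$ with $t = \sum x_i$ and $h(x) = (2\pi)^{-n/2}\exp(-\sum x_i^2/2)$, so $T = \sum x_i$ is sufficient for $\theta$. The Python sketch below checks this identity numerically; the names `likelihood`, `T`, `g`, and `h` are hypothetical, and the sample size $n$ is treated as a known constant in `g`.

```python
import math


def likelihood(theta, xs):
    """L(theta; x) = product of N(theta, 1) densities."""
    return math.prod(
        math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi) for x in xs
    )


def T(xs):
    """Candidate sufficient statistic: the sum of the observations."""
    return sum(xs)


def g(theta, t, n):
    """Factor that depends on the data only through t = T(x)."""
    return math.exp(theta * t - n * theta ** 2 / 2)


def h(xs):
    """Factor that does not depend on theta."""
    return (2 * math.pi) ** (-len(xs) / 2) * math.exp(-sum(x * x for x in xs) / 2)


xs = [0.3, -1.2, 2.5, 0.8]
for theta in (-1.0, 0.0, 0.7, 2.0):
    lhs = likelihood(theta, xs)
    rhs = g(theta, T(xs), len(xs)) * h(xs)
    print(theta, math.isclose(lhs, rhs, rel_tol=1e-9))
```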