The purpose of mathematical statistics is to determine properties of a (usually large) population from a so-called random sample. We pick one individual "at random" (meaning that each individual has the same chance of being chosen) and record the value of some quantity (e.g. length, height, weight, ...) associated with this individual. This value $X$ is a random variable whose distribution is the (relative) frequency distribution of that quantity within the population. If we repeat this process $n$ times, we get a random sample $X_1, \dots, X_n$.
Sampling can be done with or without replacement, which means that a given individual can be chosen more than once or at most once, respectively. The former leads to a sample $X_1, \dots, X_n$ of independent, identically distributed (i.i.d.) random variables $X_i$. In the latter case the $X_i$ are not independent. However, if the population is large relative to the sample size, there is almost no difference between the two methods, so we can assume that sampling is always done with replacement.
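To get a feeling for "large enough", note that the two methods only differ when sampling with replacement picks some individual more than once. The following minimal sketch computes the probability of that event exactly; the function name `prob_repeat` and the population sizes are illustrative assumptions, not part of the text above.

```python
def prob_repeat(population_size: int, sample_size: int) -> float:
    """Probability that sampling with replacement picks some individual twice."""
    p_all_distinct = 1.0
    for k in range(sample_size):
        # The (k+1)-th draw must avoid the k individuals already chosen.
        p_all_distinct *= (population_size - k) / population_size
    return 1.0 - p_all_distinct


# Illustrative population sizes with a fixed sample size of 10.
for N in (100, 10_000, 1_000_000):
    print(N, prob_repeat(N, 10))
```

For a fixed sample size the probability of a repeated individual shrinks roughly like $n(n-1)/(2N)$ as the population size $N$ grows, which is why the distinction becomes negligible.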
A random sample of size $n$ from distribution $F$ is a sequence $X_1, \dots, X_n$ of i.i.d. random variables with common distribution $F$. $n$ is called the sample size and each $X_i$ an observation.
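Given a random sample, we would like to make statements about the underlying distribution. This is made possible by the Glivenko-Cantelli theorem: For a sequence $X_1, X_2, \dots$ of i.i.d. random variables with common distribution $F$ we define the empirical distribution function as
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \le x\}}, \qquad x \in \mathbb{R}.$$
Then, $F_n$ converges to $F$ uniformly with probability one, that is, $\sup_x |F_n(x) - F(x)| \to 0$ almost surely.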
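The uniform convergence can be observed numerically. The sketch below builds the empirical distribution function of a sample and measures its largest deviation from the true distribution function; the choice of the Uniform(0, 1) distribution, the seed, and the helper names `ecdf` and `sup_distance_uniform` are assumptions made for illustration.

```python
import bisect
import random


def ecdf(sample):
    """Return the empirical distribution function F_n of the given sample."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # Fraction of observations that are <= x.
        return bisect.bisect_right(data, x) / n

    return F_n


def sup_distance_uniform(sample):
    """Compute sup_x |F_n(x) - F(x)| for F(x) = x, the Uniform(0, 1) CDF.

    F_n is a step function, so the supremum is approached at a jump point:
    either just before the jump (x - i/n) or at it ((i + 1)/n - x).
    Assumes all sample values lie in [0, 1].
    """
    data = sorted(sample)
    n = len(data)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(data))


rng = random.Random(0)  # fixed seed, only for reproducibility
for n in (10, 100, 1_000, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_uniform(sample), 4))
```

The printed distances should tend to decrease as $n$ grows, in line with the theorem, although a single simulated run is of course no proof.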
If the distribution of $X$ can be characterized by one or more real numbers (parameters), we speak of parametric statistics, otherwise we speak of non-parametric statistics.
If $T$ is a function from $\mathbb{R}^n$ to $\mathbb{R}^k$ and $X_1, \dots, X_n$ is a random sample, $T(X_1, \dots, X_n)$ is called a statistic.
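Important statistics include the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ and the sample variance $S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$.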
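As a small illustration of a statistic as a function of the sample, the following sketch evaluates two commonly used statistics, the sample mean and the unbiased sample variance, on a list of observed values. The function names and the data are hypothetical and chosen only for this example.

```python
def sample_mean(xs):
    """Sample mean: the arithmetic average of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1); needs at least 2 values."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


observations = [1.0, 2.0, 3.0, 4.0]  # hypothetical observed values
print(sample_mean(observations), sample_variance(observations))
```

Both functions map a point of $\mathbb{R}^n$ to a single real number, so they are statistics in the sense defined above.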
A statistic $T = T(X_1, \dots, X_n)$ is called sufficient for the parameter $\theta$, if the conditional distribution of $X_1, \dots, X_n$ given $T$ does not depend on $\theta$.
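The definition can be checked by brute force in a simple case. The sketch below assumes i.i.d. Bernoulli observations with success probability $p$ (an example chosen here, not taken from the text) and the statistic given by the number of successes. It enumerates all outcomes and computes the conditional distribution of the sample given that statistic with exact rational arithmetic. The function name `conditional_dist_given_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_dist_given_sum(p, n):
    """Conditional law of (X_1, ..., X_n) given T = X_1 + ... + X_n.

    The X_i are assumed i.i.d. Bernoulli(p) with 0 < p < 1.
    """
    # Joint probability of every 0/1 sequence of length n.
    joint = {}
    for x in product((0, 1), repeat=n):
        s = sum(x)
        joint[x] = p**s * (1 - p) ** (n - s)

    # Probability of each value of the statistic T.
    prob_t = {}
    for x, pr in joint.items():
        prob_t[sum(x)] = prob_t.get(sum(x), 0) + pr

    # Conditional probability of each sequence given its own value of T.
    return {x: pr / prob_t[sum(x)] for x, pr in joint.items()}


dist_a = conditional_dist_given_sum(Fraction(1, 3), 3)
dist_b = conditional_dist_given_sum(Fraction(4, 5), 3)
print(dist_a == dist_b)
print(dist_a[(1, 0, 0)])
```

Given the number of successes $t$, every sequence with $t$ ones has conditional probability $1 / \binom{n}{t}$ whatever the value of $p$, so the two dictionaries coincide and the number of successes is sufficient for $p$ in this model.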
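An important criterion for sufficiency is the following: Let $\{P_\theta : \theta \in \Theta\}$ be a parametric family of distributions dominated by a measure $\mu$, and $f_\theta = dP_\theta / d\mu$. The statistic $T$ is sufficient for $\theta$ if the so-called likelihood function
$$L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} f_\theta(x_i)$$
admits a decomposition
$$L(\theta; x_1, \dots, x_n) = g_\theta\bigl(T(x_1, \dots, x_n)\bigr) \, h(x_1, \dots, x_n),$$
where $h$ does not depend on $\theta$.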
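As a worked illustration of this criterion, consider i.i.d. Bernoulli observations with success probability $p$, taking the dominating measure to be the counting measure on $\{0, 1\}$. This model and the symbols $p$, $t$, $g_p$ and $h$ are assumptions chosen for the example. With $T(x_1, \dots, x_n) = \sum_{i=1}^{n} x_i$, the likelihood factorizes as follows:

```latex
\begin{align*}
f_p(x) &= p^{x} (1 - p)^{1 - x}, \qquad x \in \{0, 1\}, \\
L(p; x_1, \dots, x_n) &= \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i}
  = p^{t} (1 - p)^{n - t}, \qquad t = \sum_{i=1}^{n} x_i, \\
g_p(t) &= p^{t} (1 - p)^{n - t}, \qquad h(x_1, \dots, x_n) = 1 .
\end{align*}
```

Since $h$ is constant and $g_p$ depends on the data only through $t$, the number of successes is sufficient for $p$.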