When I look at the Wikipedia entry for biostatistics, the relation to biometrics is not obvious to me. Historically, biometrics was more concerned with characterizing individuals by some phenotypes of interest, with wide application in population genetics (as exemplified by the work of Fisher), whereas part of the discipline now focuses on biometric systems (whose objective is the "recognition or identification of individuals based on some physical or behavioral characteristics that are intrinsically unique for each individual", according to Boulgouris et al., Biometrics, 2010). Anyway, there are still journals like Biometrika and Biometrics; although I read the latter only irregularly, most of its articles focus on "biostatistical" theoretical or applied work. The same applies to Biostatistics. By "biostatistical" applications, I mean applications or models related to the biomedical domain in a wide sense (biology, health science, genetics, etc.).
According to the Encyclopedia of Biostatistics (2005, 2nd ed.),
(...) As is clear from the above examples, biostatistics is problem oriented. It is specifically directed to questions that arise in biomedical science. The methods of biostatistics are the methods of statistics -- concepts directed at variation in observations and methods for extracting information from observations in the face of variation from various sources, but notably from variation in the responses of living organisms and particularly human beings under study. Biostatistical activity spans a broad range of scientific inquiry, from the basic structure and functions of human beings, through the interactions of human beings with their environment, including problems of environmental toxicities and sanitation, health enhancement and education, disease prevention and therapy, the organization of health care systems and health care financing.
In sum, I think that Biostatistics is part of a super-family, Statistics, and shares most of its methods, but has a more focused area of interest (hence a historical background, specific designs, and a general theoretical framework) and dedicated modeling strategies.
E. L. Lehmann, in his classic Theory of Point Estimation, answers this question on pp. 1-2.
The observations are now postulated to be the values taken on by random variables which are assumed to follow a joint probability distribution, $P$, belonging to some known class...
...let us now specialize to point estimation...suppose that $g$ is a real-valued function defined [on the stipulated class of distributions] and that we would like to know the value of $g$ [at whatever is the actual distribution in effect, $\theta$]. Unfortunately, $\theta$, and hence $g(\theta)$, is unknown. However, the data can be used to obtain an estimate of $g(\theta)$, a value that one hopes will be close to $g(\theta)$.
In words: an estimator is a definite mathematical procedure that comes up with a number (the estimate) for any possible set of data that a particular problem could produce. That number is intended to represent some definite numerical property ($g(\theta)$) of the data-generation process; we might call this the "estimand."
The estimator itself is not a random variable: it's just a mathematical function. However, the estimate it produces is based on data which themselves are modeled as random variables. This makes the estimate (thought of as depending on the data) into a random variable and a particular estimate for a particular set of data becomes a realization of that random variable.
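The distinction between the estimator (a function) and the estimate (a random variable once the data are treated as random) can be sketched in a few lines of Python. This is only an illustration: the choice of the sample mean as the estimator, and the Normal data-generating process with mean 5 and standard deviation 2, are assumptions that do not come from Lehmann's text.

```python
import random
import statistics

def sample_mean(data):
    """The estimator: a fixed rule that maps any data set to a number."""
    return sum(data) / len(data)

def draw_data(rng, n=20, mu=5.0, sigma=2.0):
    """Hypothetical data-generating process: n independent Normal(mu, sigma) draws."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)  # seeded so the run is reproducible

# One data set gives one estimate: a realization of a random variable.
one_estimate = sample_mean(draw_data(rng))

# Repeating the experiment reveals the distribution of the estimate.
estimates = [sample_mean(draw_data(rng)) for _ in range(1000)]
print("one estimate:", one_estimate)
print("mean of estimates:", statistics.mean(estimates))
print("spread of estimates:", statistics.stdev(estimates))
```

The function `sample_mean` never changes between runs; only its input does. Under these assumptions the spread of the estimates should be close to $\sigma/\sqrt{n} = 2/\sqrt{20} \approx 0.45$.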
In one (conventional) ordinary least squares formulation, the data consist of ordered pairs $(x_i, y_i)$. The $x_i$ have been determined by the experimenter (they can be amounts of a drug administered, for example). Each $y_i$ (a response to the drug, for instance) is assumed to come from a probability distribution that is Normal but with unknown mean $\mu_i$ and common variance $\sigma^2$. Furthermore, it is assumed that the means are related to the $x_i$ via a formula $\mu_i = \beta_0 + \beta_1 x_i$. These three parameters--$\sigma$, $\beta_0$, and $\beta_1$--determine the underlying distribution of $y_i$ for any value of $x_i$. Therefore any property of that distribution can be thought of as a function of $(\sigma, \beta_0, \beta_1)$. Examples of such properties are the intercept $\beta_0$, the slope $\beta_1$, the value of $\cos(\sigma + \beta_0^2 - \beta_1)$, or even the mean at the value $x=2$, which (according to this formulation) must be $\beta_0 + 2 \beta_1$.
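As a rough sketch of this formulation, the following Python code simulates such data and estimates two of the properties just mentioned. The true parameter values, the design points, and the "plug-in" approach (substituting the least-squares estimates into the formula for the property) are assumptions made for illustration; plugging in is one possible estimator among many, not the only one.

```python
import math
import random

def ols_fit(xs, ys):
    """Least-squares estimates (b0, b1, s) of the intercept, slope, and noise SD."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard deviation
    return b0, b1, s

# Hypothetical true parameters, chosen only for this illustration.
BETA0, BETA1, SIGMA = 1.0, 0.5, 0.3

rng = random.Random(1)
xs = [0.0, 1.0, 2.0, 3.0, 4.0] * 10  # values fixed by the experimenter
ys = [rng.gauss(BETA0 + BETA1 * x, SIGMA) for x in xs]

b0, b1, s = ols_fit(xs, ys)

# Plug-in estimates of two properties of the underlying distribution.
mean_at_2 = b0 + 2 * b1
odd_property = math.cos(s + b0 ** 2 - b1)
print("estimated mean at x=2:", mean_at_2, "true:", BETA0 + 2 * BETA1)
print("estimated cos(...):", odd_property,
      "true:", math.cos(SIGMA + BETA0 ** 2 - BETA1))
```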
In this OLS context, a non-example of an estimator would be a procedure to guess at the value of $y$ if $x$ were set equal to 2. This is not an estimator because this value of $y$ is random (in a way completely separate from the randomness of the data): it is not a (definite numerical) property of the distribution, even though it is related to that distribution. (As we just saw, though, the expectation of $y$ for $x=2$, equal to $\beta_0 + 2 \beta_1$, can be estimated.)
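The difference between the two targets can be seen in a small simulation. The parameter values below are hypothetical and are treated as known purely to isolate the point: even with the distribution fully specified, the mean at $x=2$ is a single fixed number, while a new response at $x=2$ still varies from draw to draw.

```python
import random

BETA0, BETA1, SIGMA = 1.0, 0.5, 0.3  # hypothetical true parameters

def mean_at(x):
    """The estimand: a fixed number once the parameters are fixed."""
    return BETA0 + BETA1 * x

def new_response(rng, x):
    """A future observation at x: random even when the parameters are known."""
    return rng.gauss(mean_at(x), SIGMA)

rng = random.Random(2)
estimand_values = [mean_at(2.0) for _ in range(5)]
future_ys = [new_response(rng, 2.0) for _ in range(5)]
print("mean at x=2: ", estimand_values)  # the same value every time
print("new y at x=2:", future_ys)        # a different value every time
```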
In Lehmann's formulation, almost any formula can be an estimator of almost any property. There is no inherent mathematical link between an estimator and an estimand. However, we can assess--in advance--the chance that an estimator will be reasonably close to the quantity it is intended to estimate. Ways to do this, and how to exploit them, are the subject of estimation theory.
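One way to make such an advance assessment is by simulation under an assumed model. The sketch below compares three estimators of a Normal mean by the estimated chance that each lands within 0.5 of the true value. The model, the tolerance, the sample size, and the three candidate estimators are all assumptions chosen for illustration.

```python
import random
import statistics

MU, SIGMA, N = 5.0, 2.0, 25  # hypothetical data-generating process

# Three candidate estimators of MU; any formula qualifies as an estimator.
estimators = {
    "sample mean": lambda d: sum(d) / len(d),
    "sample median": statistics.median,
    "first observation": lambda d: d[0],
}

def chance_close(estimator, tol=0.5, reps=2000, seed=3):
    """Monte Carlo estimate of P(|estimate - MU| < tol) under the assumed model."""
    rng = random.Random(seed)  # same seed, so every estimator sees the same data sets
    hits = 0
    for _ in range(reps):
        data = [rng.gauss(MU, SIGMA) for _ in range(N)]
        if abs(estimator(data) - MU) < tol:
            hits += 1
    return hits / reps

results = {name: chance_close(f) for name, f in estimators.items()}
for name, p in results.items():
    print(f"{name}: {p:.3f}")
```

All three are legitimate estimators in Lehmann's sense; the simulation only indicates how often each is expected to be close under this particular model, before any real data are collected.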
Ex ante means before the event; ex post means after the event. In this example, I think they mean, respectively, before and after the event that produces the statistical difference you're testing.
On the other hand, a priori and a posteriori are terms from philosophy, denoting respectively knowledge that is logically derived and knowledge that requires empirical evidence (Wikipedia).