There is no strict convention, but quite often $\theta$ stands for the set of parameters of a distribution.
So much for plain English; let's look at some examples instead.
Example 1. You want to study the throw of an old-fashioned thumbtack (the ones with a big circular bottom). You assume that the probability that it falls point down is an unknown value that you call $\theta$. You could define a random variable $X$ and say that $X=1$ when the thumbtack falls point down and $X=0$ when it falls point up. You would write the model
$$P(X = 1) = \theta \\
P(X = 0) = 1-\theta,$$
and you would be interested in estimating $\theta$ (here, the probability that the thumbtack falls point down).
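To make this concrete, here is a minimal simulation sketch of Example 1. The true value $\theta = 0.6$, the seed, and the number of throws are all assumptions for illustration; the point is that the maximum-likelihood estimate of $\theta$ in this Bernoulli model is just the proportion of point-down throws.

```python
import random

random.seed(42)
true_theta = 0.6  # hypothetical true probability of falling point down

# Simulate 10,000 throws: X = 1 (point down) with probability theta.
throws = [1 if random.random() < true_theta else 0 for _ in range(10_000)]

# The maximum-likelihood estimate of theta is the sample mean of X.
theta_hat = sum(throws) / len(throws)
```

With 10,000 throws, `theta_hat` lands within a few thousandths of the true value.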
Example 2. You want to study the disintegration of a radioactive atom. Based on the literature, you know that the amount of radioactivity decreases exponentially, so you decide to model the time to disintegration with an exponential distribution. If $t$ is the time to disintegration, the model is
$$f(t) = \theta e^{-\theta t}.$$
Here $f(t)$ is a probability density, which means that the probability that the atom disintegrates in the time interval $(t, t+dt)$ is $f(t)dt$. Again, you will be interested in estimating $\theta$ (here, the disintegration rate).
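A quick numerical sketch of Example 2, with an assumed rate $\theta = 2$ (an arbitrary value for illustration): in the exponential model, the maximum-likelihood estimate of the rate is the reciprocal of the mean observed time to disintegration.

```python
import random

random.seed(0)
theta = 2.0  # hypothetical disintegration rate

# Simulate 10,000 times to disintegration from f(t) = theta * exp(-theta * t).
times = [random.expovariate(theta) for _ in range(10_000)]

# MLE of the rate: the reciprocal of the sample mean.
theta_hat = len(times) / sum(times)
```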
Example 3. You want to study the precision of a weighing instrument. Based on the literature, you know that the measurements are Gaussian, so you decide to model the weighing of a standard 1 kg object as
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}.$$
Here $x$ is the measurement given by the scale, $f(x)$ is the probability density, and the parameters are $\mu$ and $\sigma$, so $\theta = (\mu, \sigma)$. The parameter $\mu$ is the target weight (the scale is biased if $\mu \neq 1$), and $\sigma$ is the standard deviation of the measurement every time you weigh the object. Again, you will be interested in estimating $\theta$ (here, the bias and the imprecision of the scale).
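A sketch of Example 3, with assumed values $\mu = 1.003$ kg (a slightly biased scale) and $\sigma = 0.02$ kg: here $\theta = (\mu, \sigma)$ is a pair, and the usual estimates are the sample mean and sample standard deviation of repeated weighings.

```python
import random
import statistics

random.seed(1)
mu, sigma = 1.003, 0.02  # hypothetical bias and imprecision (kg)

# Simulate 10,000 weighings of the standard 1 kg object.
weights = [random.gauss(mu, sigma) for _ in range(10_000)]

# Estimate both components of theta = (mu, sigma).
mu_hat = statistics.fmean(weights)
sigma_hat = statistics.pstdev(weights)
```

Since `mu_hat` comes out above 1, the simulated scale would be flagged as biased.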
It means "probability element."
This paragraph is taken from the following book chapter: Order Statistics
If $X$ is such that the probability that $X\leq x$ is $F(x)$, or briefly if
$$
Pr(X\leq x)=F(x),
$$
then we say that $X$ is a random variable which has the cdf $F(x)$. If $F(x)$ has a continuous derivative $f(x)$, then $f(x)dx$ is called the probability element of $X$, and $f(x)$ the probability density function (pdf) of $X$.
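The definition above can be checked numerically: for a small $dx$, $f(x)\,dx$ should closely approximate $F(x+dx) - F(x)$. The sketch below does this for the standard normal distribution (chosen here just as a convenient example), writing its cdf with the error function.

```python
import math

def f(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def F(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

x, dx = 0.5, 1e-4
exact_prob = F(x + dx) - F(x)  # Pr(x < X <= x + dx)
element = f(x) * dx            # the probability element f(x) dx
```

The two quantities agree to about eight decimal places, illustrating why $f(x)\,dx$ is read as "the probability that $X$ falls in $(x, x+dx)$".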
I think it would have something to do with measure theory as well. Going over a Probability Theory course might help too.
Margins
Margins here refers to the values on the edges (margins!) of the table, that is, the total number of reds, total number of blacks, total number of drawn, and total number of not drawn. The related term marginal distribution refers to the distribution of a single variable obtained from a joint distribution of several variables by averaging over the other variables (etymologically, the term indeed comes from the values written on the margins of tables).
Conditioning
Conditioning refers to computing conditional distributions, that is, probability distributions given some information. Here, conditioning on the margins means that the margins are fixed, i.e., we assume that there are in total 6680 red balls (and 12160 black balls), as well as 382 drawn balls (and 18458 balls not drawn). So, for example, any table with those same margins would be a possible realization of our random distribution. Under the null hypothesis that getting drawn and the color of the ball are independent, conditioning on the margins leads to the hypergeometric distribution.
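With the margins from the table above (6680 red, 12160 black, 382 drawn), this can be sketched directly: under independence, the number of red balls among the drawn follows a hypergeometric distribution. The sketch below builds its pmf from binomial coefficients and checks two of its basic properties.

```python
from math import comb

# Margins from the table: total balls, red balls, drawn balls.
M, K, n = 18840, 6680, 382

def hypergeom_pmf(k):
    """Probability of exactly k reds among the n drawn, conditioning on the margins."""
    return comb(K, k) * comb(M - K, n - k) / comb(M, n)

pmf = [hypergeom_pmf(k) for k in range(n + 1)]
total = sum(pmf)                            # should be 1
mean = sum(k * p for k, p in enumerate(pmf))  # should equal n * K / M
```

The expected number of reds drawn is $nK/M \approx 135.4$; how far an observed count falls from that is what a test conditional on the margins (e.g., Fisher's exact test) measures.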
Alternatively, if the experiment were such that one draws balls until 160 reds are obtained, it would not make sense to condition on the margins (as the total number of drawn balls could have turned out to be something other than 382). In this case, realizations would have different margins, e.g., a different total number of drawn balls.
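This alternative experiment can be simulated as well (seed and number of repetitions are arbitrary choices here): draw without replacement until 160 reds appear, and observe that the total number drawn, i.e. the margin, now varies from realization to realization.

```python
import random

random.seed(7)
balls = [1] * 6680 + [0] * 12160  # 1 = red, 0 = black

def draws_until_160_reds():
    """Shuffle the urn, draw until 160 reds appear, return the total drawn."""
    random.shuffle(balls)
    reds = 0
    for i, ball in enumerate(balls, start=1):
        reds += ball
        if reds == 160:
            return i

# Repeat the experiment: the "drawn" margin is now random.
totals = [draws_until_160_reds() for _ in range(5)]
```

Each run needs at least 160 draws, but the totals differ across runs, which is exactly why conditioning on a fixed drawn-balls margin is not appropriate for this design.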