The distribution is simply the assignment of probabilities to sets of possible values of the random variable. If I tell you how probable it is that a certain random variable is between $3$ and $5$, and also how probable it is that it's in every other possible set, then I've told you the distribution. There are infinitely many sets, so I can't do this for each one individually; perhaps a more down-to-earth way to say it is this: Suppose $X$ and $Y$ are random variables. If, for every set, the probability that $X$ is in that set is the same as the probability that $Y$ is in that same set, then $X$ and $Y$ have the same distribution.
A probability density function is a way of characterizing some distributions. For example, consider the function
$$
f(x) = \begin{cases} 0 & \text{if }x<0, \\ e^{-x} & \text{if }x\ge 0. \end{cases}
$$
To say that this is the probability density function of a random variable $X$ is to say that for every measurable set $A$ of real numbers,
$$
\Pr(X\in A) = \int_A f(x)\,dx.
$$
The probability assigned to each set $A$ is given by the integral above. A more concrete example:
$$
\Pr(3<X<5) = \int_3^5 e^{-x}\,dx\text{ and }\Pr(X\ge 2) = \int_2^\infty e^{-x}\,dx.
$$
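As a rough numerical check of the two integrals above, here is a minimal Python sketch. The helper names `density` and `integrate` are illustrative, the midpoint rule is just one convenient way to approximate the integral, and the infinite upper limit is truncated at $50$ on the assumption that the tail beyond it (which equals $e^{-50}$) is negligible.

```python
import math

def density(x: float) -> float:
    """The density f from the text: 0 for x < 0, exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Pr(3 < X < 5): numerical integral versus the closed form e^{-3} - e^{-5}.
p_3_5_numeric = integrate(density, 3.0, 5.0)
p_3_5_exact = math.exp(-3) - math.exp(-5)

# Pr(X >= 2): closed form e^{-2}. Numerically, the upper limit is
# truncated at 50 (an assumption; the neglected tail is e^{-50}).
p_ge_2_numeric = integrate(density, 2.0, 50.0)
p_ge_2_exact = math.exp(-2)

print(p_3_5_numeric, p_3_5_exact)
print(p_ge_2_numeric, p_ge_2_exact)
```

The numerical and closed-form values should agree to several decimal places, which is all that "the probability of $A$ is the integral of $f$ over $A$" asks for in this example.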
Not every probability distribution has a density. Say we let $X$ be the number of aces (ones) when a die is thrown four times. Then $X\in\{0,1,2,3,4\}$. The probability distribution assigns a positive number to every set that intersects that last set. For example, the set $\{x : x\ge 3.2\}$ intersects $\{0,1,2,3,4\}$, and thus the probability distribution of $X$ assigns a positive number to that set. But there is no function $f$ such that for every set $A$ we have $\int_A f(x)\,dx$ equal to the probability that $X\in A$: the integral of any $f$ over the one-point set $\{0\}$ is $0$, whereas $\Pr(X=0)=(5/6)^4>0$.
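The die example can be worked out with exact arithmetic. This is only a sketch: it assumes a fair die, so that $X$ is binomial with $4$ trials and success probability $1/6$, and the names `pmf` and `prob` are illustrative. It shows the probability sitting on individual points, which is what an ordinary integral $\int_A f(x)\,dx$ cannot reproduce.

```python
from fractions import Fraction
from math import comb

# X = number of aces (ones) in four throws of a fair die: Binomial(4, 1/6).
p = Fraction(1, 6)
pmf = {k: comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(5)}

def prob(predicate) -> Fraction:
    """Pr(X in A), where the set A is described by a membership predicate."""
    return sum((mass for k, mass in pmf.items() if predicate(k)), Fraction(0))

total = prob(lambda x: True)
p_ge_3_2 = prob(lambda x: x >= 3.2)  # only the point 4 lies in this set
p_zero = prob(lambda x: x == 0)      # the one-point set {0}

print(total)     # 1
print(p_ge_3_2)  # 1/1296
print(p_zero)    # 625/1296
```

The set $\{x : x\ge 3.2\}$ gets probability $1/1296$ only because it contains the point $4$, and the single point $0$ carries probability $625/1296$.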
PS prompted by comments below: To put it in a different kind of language: Say $m$ is a measure (not necessarily assigning finite measure to the whole space) on the set of all measurable subsets of a space $S$. A probability density with respect to the measure $m$ is a measurable function $f:S\to[0,\infty)$ such that the function
$$
A\mapsto \int_A f\,dm
$$
is a probability measure on the set of measurable subsets of $S$.
A probability distribution on $S$ is simply a probability measure on the set of all measurable subsets of $S$. But not quite "simply": The probability distribution of a random variable $X:\Omega\to S$ is the probability measure on measurable subsets of $S$ that assigns measure $P(\{\omega\in\Omega : X(\omega)\in A\})$ to each measurable subset $A$ of $S$.
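On a finite space this definition can be carried out directly. The sketch below uses a hypothetical example that is not from the discussion above: $\Omega$ is the set of outcomes of two fair coin flips, $P$ is uniform, and $X$ counts heads, so $S=\{0,1,2\}$. The names `omega`, `P`, `dist`, and `distribution` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair coin flips.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}

def X(w) -> int:
    """Random variable X: Omega -> S = {0, 1, 2}, the number of heads."""
    return w.count("H")

# Mass of each point s in S is P({w : X(w) = s}).
dist = defaultdict(Fraction)
for w, mass in P.items():
    dist[X(w)] += mass

def distribution(A) -> Fraction:
    """Measure assigned to a subset A of S: P({w in Omega : X(w) in A})."""
    return sum((mass for s, mass in dist.items() if s in A), Fraction(0))

print(dict(dist))
print(distribution({1, 2}))  # 3/4
```

Here `distribution` is a probability measure on subsets of $S$, built entirely from $P$ and $X$; the space $\Omega$ no longer appears once it has been computed.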
PPS: When $f\ge0$ is a measurable function on $\mathbb R$ (with respect to the Borel or Lebesgue-measurable subsets of $\mathbb R$), one sometimes refers to the "measure" $f(x)\,dx$, meaning the measure
$$
A\mapsto \int_A f(x)\,dx.
$$
If in addition $\displaystyle\int_{\mathbb R} f(x)\,dx=1$, so that $f$ is a probability density, then one may similarly refer to the "probability distribution" $f(x)\,dx$.
(Of course, not all probability distributions on Borel subsets of the real line are of this form.)
Best Answer
In the theory of probability as developed by Kolmogorov, random variables are just measurable functions (provided that the underlying space is a probability space, which is just a measure space whose total measure is $1$).
That's actually the key idea of Kolmogorov's theory. It allows us to rigorously encode everything that one did with elementary probability theory into measure spaces and measurable functions. In this formulation, things like expected value (and other moments) are just linear functionals, which allows one to use all of the results of measure theory and functional analysis (such as inequalities, convergence results, etc.) as tools for probabilistic calculations.
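To make "expected value is a linear functional" concrete, here is a small sketch on a hypothetical finite probability space (one throw of a fair die), where the integral with respect to $P$ reduces to a finite sum. The names `expectation`, `X`, and `Y` and the coefficients `a`, `b` are illustrative choices.

```python
from fractions import Fraction

# Hypothetical finite probability space: one throw of a fair die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def expectation(X) -> Fraction:
    """E[X]: the integral of X with respect to P, here a finite sum."""
    return sum((X(w) * P[w] for w in omega), Fraction(0))

def X(w):
    return w        # face value

def Y(w):
    return w * w    # squared face value

# Linearity: E[aX + bY] = a E[X] + b E[Y].
a, b = 3, -2
lhs = expectation(lambda w: a * X(w) + b * Y(w))
rhs = a * expectation(X) + b * expectation(Y)
print(lhs, rhs)
```

Both sides come out equal, as linearity of the integral requires; in this setting a random variable is nothing more than a function on `omega`.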