I know count is simply the amount of times the observations occur in a single bin width, but what is density?
[Math] the Difference between Frequency and Density in a Histogram
statistics
Related Solutions
Do it in such a way that the area of each box is proportional to the number of data points, or to the probability (depending on what kind of histogram it is).
The units on the vertical axis should be the reciprocal of the units on the horizontal axis, since when you multiply them to get probability, it needs to be dimensionless. For example, if $x$ is in inches, $y$ is in units per inch or percent per inch.
The difference between the classical definition and the empirical are similar to the difference between a theory and an experiment in physics. The theory is developed in an abstract (perfect) way while the experiments are practical observations. The same happens with probability.
In the classical definition you assign a probability to an event based on abstract reasoning. For example: what is the probability of getting the result 2 when performing the calculation $\frac{2x}{2}$, where $x=1,2,\dots,100$? There are $n=100$ possible outcomes, but the result 2 occurs only once, so $m=1$. This is because we can write $\frac{2x}{2}=\frac{2}{2}x=x$, so only $x=2$ gives the result 2. In this case we say the probability is $1/100$. So, in classical probability you think about the space of outcomes and find an abstract reason to assign the probability (here we used mathematical logic to count the possibilities and the favorable outcomes).
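As a small sketch (not part of the original argument), the classical count $m/n$ for this example can be checked by enumerating the whole outcome space:

```python
# Classical probability: enumerate every possible outcome and count.
# Outcomes: the value of (2*x)/2 for x = 1, 2, ..., 100.
outcomes = [(2 * x) / 2 for x in range(1, 101)]

n = len(outcomes)      # n = 100 possible outcomes
m = outcomes.count(2)  # the result 2 occurs only for x = 2, so m = 1

print(m / n)  # the classical probability m/n = 1/100 = 0.01
```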
In the empirical definition, on the other hand, you don't reason abstractly: you just do experiments and count. So, to solve the last problem, you would do as many calculations as you can from the 100 possible and count how many times you get 2. For example, if you perform this experiment on the first 10 numbers ($N=10$), you get the result 2 only once ($N(A)=1$), so your estimate for the probability is $\frac{1}{10}$. This is not the right probability, but the more experiments you do, the better the estimate gets. The closer you get to exhausting the possible outcomes, the closer you are to the true probability.
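A minimal sketch of the empirical approach, assuming we draw $x$ uniformly at random from $1,\dots,100$ (the helper name `empirical_estimate` is hypothetical):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Empirical probability: draw x at random from 1..100, compute (2*x)/2,
# and count how often the result equals 2.  More draws -> better estimate.
def empirical_estimate(num_trials):
    hits = sum(1 for _ in range(num_trials)
               if (2 * random.randint(1, 100)) / 2 == 2)
    return hits / num_trials

small = empirical_estimate(10)       # noisy estimate from only 10 trials
large = empirical_estimate(100_000)  # much closer to the classical 1/100
print(small, large)
```

With few trials the estimate bounces around; with many, it settles near the classical value $0.01$.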
Now, everything is fine when the possible outcomes are finite in number. The classical approach gives the right result but might require complex reasoning, while the empirical approach gives, without effort, an estimate that improves with the number of "measurements/experiments".
What about when you have an infinite number of outcomes? For example: what is the probability of selecting the number 6 from a box containing all the natural numbers from 1 to 100? What if the box contains the numbers up to $n=10.000$, or $n=10000000000.......0$? The classical definition has an answer for you. Since 6 is unique, the probabilities are $\frac{1}{100}$, $\frac{1}{10.000}$ and $\frac{1}{10000000....0}$. The last probability is almost zero, which is the case "$P(A)=\lim_{n\rightarrow \infty}\frac{m}{n}=0$".
The empirical definition will never give you a good answer to this question, since it can never exhaust the possible outcomes. If in $N$ tries the experimenter never selects the number 6, then the estimate is indeed $\frac{N(A)}{N}=\frac{0}{N}=0$, but the result was "correct" only by chance. Instead, if she selects 6 at the beginning of the experiment, the estimate is $\frac{N(A)}{N}=\frac{1}{N}$, and she gets close to the true value only after a tremendous number of experiments. Note that it is never the goal of the empiricist to reach infinity ($N\rightarrow\infty$), since she always works with finite samples; she is not after perfect knowledge but after useful approximations.
Another example: what is the probability of tails when flipping a fair coin? The classical approach argues that the probability of "tails" in one flip is $1/2$, because there are only two possible outcomes and "tails" is one of them: $\frac{m}{n}=\frac{1}{2}$. The empiricist will do $N$ experiments, count how many times $A=\text{tails}$ occurs, and find $\frac{N(\text{tails})}{N}$. This always yields a finite ratio, since $N$ is always a finite number of experiments.
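The coin-flip comparison can be sketched as a simulation (a hedged illustration; the function name `estimate_p_tails` is hypothetical). As the law of large numbers discussed below suggests, the empirical ratio drifts toward $1/2$ as $N$ grows:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Empirical estimate of P(tails) for a fair coin: flip N times, count tails.
# By the law of large numbers, N(tails)/N gets close to 1/2 for large N.
def estimate_p_tails(num_flips):
    tails = sum(random.random() < 0.5 for _ in range(num_flips))
    return tails / num_flips

for num_flips in (10, 1_000, 100_000):
    print(num_flips, estimate_p_tails(num_flips))
```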
Regardless of whether the empiricist wants to reach infinity, by the law of large numbers the average result from a large number of experiments gets closer to the expected value of the phenomenon studied. By this law $\lim_{N\rightarrow\infty}\frac{N(A)}{N}$ converges (it does not equal) to the true probability of the event $A$, which is a constant.
The axiomatic definition is conceived in an abstract, perfect manner such that no mathematical contradiction can occur. This makes it possible to build a solid theory using mathematical logic. The probability axioms were first proposed by Kolmogorov and can be found here: http://en.wikipedia.org/wiki/Probability_axioms.
Best Answer
Illustrations:
Suppose $X_1, X_2, \dots, X_{100}$ is a random sample of size $n = 100$ from a normal distribution with mean $\mu=100$ and standard deviation $\sigma=15.$ Also, we have bins (intervals) of equal width, which we use to make a histogram.
The vertical scale of a 'frequency histogram' shows the number of observations in each bin. Optionally, we can also put numerical labels atop each bar that show how many individuals it represents.
The vertical scale of a 'density histogram' shows units that make the total area of all the bars add to $1.$ This makes it possible to show the density curve of the population using the same vertical scale.
Suppose the tallest bar holds 30 observations, so this bar accounts for relative frequency $\frac{30}{100} = 0.3$ of the observations. The width of this bar is $10,$ so its density is $0.03$ and its area is $0.03(10) = 0.3.$ The density curve of the population distribution $\mathsf{Norm}(100, 15)$ can then be superimposed on the histogram using the same vertical scale. The area beneath this density curve is also $1.$ (By definition, the area beneath a density function is always $1.)$ Optionally, tick marks below the histogram can show the locations of the individual observations.
Definitions: If the frequency of the $i$th bar is $f_i,$ then its relative frequency is $r_i = f_i/n,$ where $n$ is the sample size. Its density is $d_i = r_i/w_i,$ where $w_i$ is its width. Ordinarily, you should make a density histogram only if each bar has the same width.
Notes: (1) Another type of histogram (that you did not ask about) would be a 'relative frequency' histogram with relative frequencies (not densities) on the vertical scale. (2) The sample mean of the data shown is $\bar X =102.98$ and the sample standard deviation is $S = 15.37.$ (3) These histograms were made using R statistical software.