It sounds to me like you need to detect the tempo of the music, not the pitch. If you try to use a pitch-detection algorithm, its output is going to fluctuate rapidly as it locks onto the high frequencies in your music. What you need is something that filters out all but the lowest frequencies and then determines the tempo in beats per minute (BPM), as well as the phase of the beat, so that you can do the beatmatching you originally mentioned.
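To give a feel for that filter-then-count idea, here is a very crude sketch; the 150 Hz cutoff, the BPM search range, and the function name are all my own guesses rather than tuned or standard values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_bpm(samples, sr, cutoff_hz=150.0, bpm_range=(60, 180)):
    """Rough tempo estimate: low-pass, take the envelope, autocorrelate."""
    # Keep only the low end, where kick drums and bass lines live.
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    low = filtfilt(b, a, samples)

    # A rectified, mean-removed signal is a cheap stand-in for the
    # energy envelope.
    env = np.abs(low)
    env = env - env.mean()

    # The autocorrelation of the envelope peaks at the beat period.
    # (np.correlate is O(n^2): fine for a short clip, slow on a whole song.)
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]

    # Only consider lags that correspond to a plausible BPM.
    lag_min = int(sr * 60 / bpm_range[1])
    lag_max = int(sr * 60 / bpm_range[0])
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return 60.0 * sr / best_lag
```

A real beat tracker would work on a smoothed onset-strength signal and track the beat phase over time; this only guesses a single global BPM for a mono clip.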
Beyond a toy sketch like that, though, I don't think anyone here is going to be able to give you a simple formula that does this reliably straight from the samples. Digital signal processing is, by its very nature, a fairly mathematical subject. If you Google for "beat-matching signal-processing" or "beat-matching matlab", you should be pointed in the right direction, since you may well find a published algorithm that does exactly what you need. For instance, I found the following paper by searching: "Design of an Automatic Beat-Matching Algorithm for Portable Media Devices". It might be worth a look if you can get it without paying, say through a university with a subscription. Otherwise, there are hundreds of similar papers on this subject. Also, many universities teach an audio signal processing class, and the notes from these classes are often online. Beat-matching is a common project for students in such classes, and I'm sure you will be able to find examples where people have done it.
Sorry I couldn't give you more explicit advice, but I hope I understood your question correctly and have pointed you in the right direction. Good luck.
Let us imagine that your factory manufactures two products, one of which is small, and the other is large. These products are shipped out in boxes. Suppose that your boxes come in two sizes, small and large. Suppose further that you can ship a small product in a large box, but that you cannot ship a large product in a small box.
Instead of products / boxes of various sizes, a more information-theoretic way of looking at things would be to think of the factory as a binary source, and to view the box-enlargement process as a binary channel. Let $X$ and $Y$ be discrete random variables with alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, where $\mathcal{X} = \mathcal{Y} = \{0,1\}$. If the output of the production line is a small product, then $X = 0$, otherwise $X = 1$. If a small box is shipped out, then $Y = 0$, otherwise $Y = 1$. Hence, the random variable $X$ gives us the size of the product, while the random variable $Y$ gives us the size of the box. We can view $X$ and $Y$ as the input and output of a binary channel, respectively.
To deceive your competitors, every time a small product is ready to be shipped you flip a coin and, depending on the outcome, either ship it in a large box or not. If it goes into a large box, then $X = 0$ and $Y = 1$: the "channel" has introduced an error. The channel is defined by the transition probabilities
$\{ \mathbb{P}[Y = 0 \mid X = 0], \mathbb{P}[Y = 1 \mid X = 0], \mathbb{P}[Y = 0 \mid X = 1], \mathbb{P}[Y = 1 \mid X = 1] \}$.
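In this story, half of those probabilities are already pinned down, because a large product never fits in a small box. Writing $q$ for the probability that the coin sends a small product into a large box ($q$ is notation I'm introducing here, not something fixed by the problem), the channel is

$\mathbb{P}[Y = 1 \mid X = 0] = q, \qquad \mathbb{P}[Y = 0 \mid X = 0] = 1 - q, \qquad \mathbb{P}[Y = 0 \mid X = 1] = 0, \qquad \mathbb{P}[Y = 1 \mid X = 1] = 1,$

which is what coding theorists call a Z-channel.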
A competitor observes the sizes of the boxes being shipped out and tries to infer what the actual sizes of the products inside the boxes are. In other words, your competitor would like to infer what the probability mass function (p.m.f.) of $X$ is, knowing only the p.m.f. of $Y$. To keep your competitor maximally confused, you would like to maximize the conditional entropy $H (X \mid Y)$, which is the uncertainty about $X$ given $Y$. Recall that the mutual information is
$I (X;Y) = H(X) - H(X \mid Y)$
and it gives us the reduction in the uncertainty of $X$ due to knowledge of $Y$. We would like to minimize the mutual information; this is equivalent to maximizing the conditional entropy $H(X \mid Y)$, since $H(X)$ is fixed (it depends only on the p.m.f. of $X$, which is assumed given).
The mutual information can also be written as $I(X;Y) = D( p(x,y) \,\|\, p(x)\, p(y) )$, the Kullback-Leibler divergence between the joint p.m.f. and the product of the marginal p.m.f.'s; see [1] for details. You therefore have a relative-entropy minimization problem.
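To see the minimization numerically under the assumptions above (the only free knob is the coin bias $q$; the parameter names and the function itself are mine), here is a short sketch:

```python
import numpy as np

def mutual_information(p_small, q):
    """I(X;Y) in bits for the box channel.

    p_small = P(X = 0), the fraction of small products;
    q = P(large box | small product), the coin bias.
    Large products always ship in large boxes, so the rest
    of the channel is fixed by the problem statement.
    """
    # Joint p.m.f. p(x, y): rows indexed by X, columns by Y.
    joint = np.array([
        [p_small * (1 - q), p_small * q],    # X = 0 (small product)
        [0.0,               1.0 - p_small],  # X = 1 (large product)
    ])
    px = joint.sum(axis=1, keepdims=True)    # marginal of X
    py = joint.sum(axis=0, keepdims=True)    # marginal of Y
    prod = px @ py                           # p(x) * p(y)
    mask = joint > 0                         # 0 * log(0/...) contributes 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / prod[mask])))

# Sweep the coin bias: the competitor's information shrinks as more
# small products hide in large boxes.
for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"q = {q:.2f}  I(X;Y) = {mutual_information(0.5, q):.4f} bits")
```

With half the products small, $I(X;Y)$ falls from 1 bit at $q = 0$ down to 0 at $q = 1$: if every product ships in a large box, $Y$ is constant and tells the competitor nothing, so $H(X \mid Y) = H(X)$. In practice, large boxes presumably cost more, which is what would keep the optimal $q$ away from 1.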
Usually we are given the channel, and we choose the p.m.f. of $X$ that maximizes the mutual information $I(X;Y)$; that maximum is the capacity of the channel. In this problem we are given the p.m.f. of $X$, and we choose the channel that minimizes the mutual information, a sort of "dual" of the capacity problem.
References:
[1] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, 2nd ed., John Wiley & Sons, 2006.
"The K-L divergence is only defined if P and Q both sum to 1 and if Q(i) > 0 for any i such that P(i) > 0."
I suspect that the second condition is your problem. Say you have some outcome x that appears in P but not in Q; the divergence is then infinite by convention. In your code you are probably adding a zero contribution to the sum for such terms, so that you don't have to divide by zero or take the logarithm of zero, but this effectively throws away mass from P, and what remains of the sum can come out negative.
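To make that concrete, here is a sketch of a checked implementation together with the silent-skip version of the sum (the function name and the example distributions are mine):

```python
import numpy as np

def kl_divergence(p, q):
    """D(P || Q) in bits, refusing inputs that violate the two conditions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if not (np.isclose(p.sum(), 1.0) and np.isclose(q.sum(), 1.0)):
        raise ValueError("P and Q must each sum to 1")
    if np.any((p > 0) & (q == 0)):
        raise ValueError("Q(i) = 0 where P(i) > 0: the divergence is infinite")
    mask = p > 0  # terms with P(i) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = [0.5, 0.5, 0.0]
q = [0.9, 0.0, 0.1]   # q[1] = 0 while p[1] > 0

# The silent-skip version: drop every term where the log would blow up.
bad = sum(pi * np.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0 and qi > 0)
print(bad)              # ~ -0.42, a negative "divergence"
# kl_divergence(p, q)   # raises ValueError instead
```

Half of P's mass sits on the skipped outcome, so the surviving term is no longer a sum of log-ratios weighted over all of P, and Gibbs' inequality no longer guarantees a non-negative result.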
http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence