The value of x used to generate any point on that trend line is indeed the "very big number" that is the way Excel actually stores dates. It is - as you partially surmised - the number of days since Jan. 1, 1900 - so for any remotely current dates, the number is pretty large. If you are keeping not just dates, but dates and times - the time portion is kept as a fraction of a day (so 12 noon on a particular day is stored with a number 0.5 larger than the preceding midnight).
You will also want to be careful to stick just to the region of your data - with a 6th order polynomial - if you move very far outside the range between 2005 and 2012 - the values of the trend line are quite likely to become quite extreme (I think, looking at your coefficients, that if you tried to plot a value for a date in 1970 - or 2050 - it would be incredibly huge, but I might have things backwards, and those values could be extremely negative numbers).
Edited to add - User777 below is correct. Excel stores dates so that Jan. 1, 1900 is 1 - so it stores a number that is either the number of days since "Jan. 0, 1900" - or "Dec. 31, 1899" - whichever way is easier for you to understand.
What you are graphing is a cumulative distribution function ("cdf"). If your data samples were generated from a probability density function and you know what that function is, then you can find the formula for the cdf by computing its integral. If not, you'll have to do some empirical curve fitting if you want a formula for predicting Y from X.
(Before I continue - if all you want to do is predict the Y value from an X value, you don't necessarily have to compute a closed-form mathematical formula; following the instructions for "Part 2" of Avraham's answer above will get you what you want. To be more specific, say your 50000 data points are in rows 1 to 50000 of column A, and you want the percentage that are above (say) X=6540, then just type =COUNTIF(A1:A50000, ">6540") / COUNT(A1:A50000)
into any cell to get the percentage of the data points that are greater than that value of X.)
Okay, so let's say you want to fit a simple formula to this data. What you are looking at is some kind of sigmoid function; most cumulative distribution functions take this form. Hard to know exactly what sort of sigmoid without knowing where the data came from, but for ease of calculation, let's go with the general form of the sigmoid that you see at the top of the Wikipedia article about on the sigmoid function, which looks like this in Excelese (try plotting it with values of X ranging from -6 to 6):
=1/(1+EXP(-X))
To match your graph, we want to flip it around. In addition, we'll add two new variables.
One we'll call A, it controls the sharpness of the drop-off (values greater than 1 are sharper; positive decimal values closer to 0 are flatter. You'll probably need a value pretty close to zero). Another we'll call B - it's just the horizontal translation (how much the graph gets pushed to the left or right... probably should be a pretty large number to fit your data). The resulting formula is:
=(1-(1/(1+EXP(-((X-B)*A)))))
At this point it becomes a matter of parameter fitting - i.e. figuring out what values of A and B minimize the sum of squared differences between your data and the output of this function. You can do this with Excel via the Solver plugin, or with Eureqa Desktop. Although this method will probably not give you an exact function for the cdf, it should give you a pretty good fit.
Best Answer
Use
LINEST
, as shown:The method is to create new columns (C:E here) containing the variables in the fit. Perversely,
LINEST
returns the coefficients in the reverse order, as you can see from the calculation in the "Fit" column. An example prediction is shown outlined in blue: all the formulas are exactly the same as for the data.Note that
LINEST
is an array formula: the answer will occupy p+1 cells in a row, where p is the number of variables (the last one is for the constant term). Such formulas are created by selecting all the output cells, pasting (or typing) the formula in the formula textbox, and pressing Ctrl-Shift-Enter (instead of the usual Enter).