Algorithm For Ranking Dates Based On Deviation From Specific Date

algorithms, conic sections, logarithms

I may be over-thinking this, but it has been a while since I did proper maths (hoping to fix that soon), so I am wondering what the most concise way to accomplish this is.

I am developing a weighted ranking of inventory based (in part) on availability of the inventory item. Some of the weights will just be booleans, so are easy.

For example, if the product is queried and the available date was in the past, the item gets 0 percent for the boolean score for Is Not In Past, one of my parameters. This is easy because if it is instead available in the future, it gets 100% for this parameter. This is what I mean by boolean in this case, that the item either gets the full score or nothing.

The trouble comes when I don't want to score as a boolean (an either/or value) but on a sliding scale. For example, if we determine due to seasonality that items available near Christmas are the most likely to be sold, this can't simply be a boolean, because an item that is two days ahead of or behind December 25th should have a slightly higher score than one that is three days ahead or behind. That is, it isn't either/or.

My question is, given a date such as December 25th, 2018, how do I calculate the variance or deviation of a date from this date in a sliding scale way, so that the highest value or percent goes to those on or closest to the date, and it drops off the farther you get?

Ideally it would be an equation I could plug a UNIX timestamp or other date parameter into to get the deviation, and the result would be pluggable into the same weighted system that assigns percentages alongside the booleans.

So for example, for the availability category it has two parameters, each worth 50% of the total availability score/weight.

50% is either lost or gained on the boolean condition Is Not In Past and then 50% given based on how close the item is available to Christmas.

Example, an item sold on Christmas would be at 100% because it isn't in the past (automatic 50% based on boolean plus 50% for no variance from the date).
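The 50/50 weighting described above can be sketched as follows. This is only a minimal illustration: the function name `availability_score` and the `closeness_pct` argument (the sliding-scale score, 0 to 100, computed by whatever formula is eventually chosen) are assumptions made for the example, not part of any existing system.

```python
from datetime import datetime, timezone

def availability_score(available: datetime, now: datetime, closeness_pct: float) -> float:
    """Combine the two availability parameters, each worth 50% of the total."""
    # Boolean parameter "Is Not In Past": full marks or nothing.
    not_in_past_pct = 100.0 if available >= now else 0.0
    # Sliding-scale parameter: closeness_pct is assumed to already be in 0..100.
    return 0.5 * not_in_past_pct + 0.5 * closeness_pct

christmas = datetime(2018, 12, 25, tzinfo=timezone.utc)
query_time = datetime(2018, 11, 1, tzinfo=timezone.utc)  # hypothetical query date

# Available exactly on Christmas, queried beforehand: 50 (boolean) + 50 (no deviation).
print(availability_score(christmas, query_time, closeness_pct=100.0))
```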

I am wondering how to do this. I imagine it as a parabola with Christmas in the center, but when I looked up parabolic equations and deviations, the results were about a collection of values deviating from a mean. In this case I'm not interested in the mean of my values, just how much they deviate from a given date. Furthermore, a way to have a linear scale until a certain time is exceeded would be good too: for example, the score goes down at a steady pace from Christmas until you hit 3 months, then declines sharply, and maybe drops off completely after a year.
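The "steady, then sharp, then nothing" behaviour can be sketched as linear interpolation between a few breakpoints. The breakpoint values below (80% at 90 days, 10% at 120 days, 0% at 365 days) are invented purely for illustration, as is the function name; only the overall shape comes from the description above.

```python
MS_PER_DAY = 86_400_000
TARGET_MS = 1_545_696_000_000  # 25 December 2018, 00:00 UTC, in milliseconds

# (days away from target, score in percent); assumed values, tune to taste.
BREAKPOINTS = [(0.0, 100.0), (90.0, 80.0), (120.0, 10.0), (365.0, 0.0)]

def piecewise_closeness(x_ms, target_ms=TARGET_MS, breakpoints=BREAKPOINTS):
    """Score a timestamp by linear interpolation between breakpoints."""
    # Distance from the target in days; abs() makes the score symmetric.
    days_off = abs(x_ms - target_ms) / MS_PER_DAY
    # Find the segment containing days_off and interpolate within it.
    for (d0, s0), (d1, s1) in zip(breakpoints, breakpoints[1:]):
        if days_off <= d1:
            return s0 + (s1 - s0) * (days_off - d0) / (d1 - d0)
    # Beyond the last breakpoint the score has dropped off completely.
    return 0.0

print(piecewise_closeness(TARGET_MS))                    # on the day itself
print(piecewise_closeness(TARGET_MS - 45 * MS_PER_DAY))  # 45 days before
print(piecewise_closeness(TARGET_MS + 400 * MS_PER_DAY)) # more than a year after
```

Working in days rather than raw milliseconds keeps the breakpoints readable; the conversion is the only place the timestamp unit matters.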

I know I'm not posting this with equations and this is mostly text, which I apologize for, I'm just trying to explain my problem as I'm not versed enough to know the proper terms to google or equations to look up. Please assist or even just offer points in the right direction!

EDIT:

I decided to try a parabola, and did this:

$\left(x-1545696000000\right)^2\ =\ 4\left(-86400000\right)\left(y-100\right)$

1545696000000 is the UNIX timestamp in milliseconds of December 25, 2018 (00:00 UTC).

I chose 86400000 as the focal width because I wasn't sure about my units or intervals as I moved away from 1545696000000. That is, the score should only be impacted significantly if the date is a day off, and 86400000 is the number of milliseconds in a day. I chose 100 because my highest score is a percent, so 100 at most.

I graphed this parabola and it seems right in terms of my concept. However, I'm looking for confirmation from brighter mathematical minds that this is a good approximation of what I wanted. I looked at "What is the focal width of a parabola?" but was a little unsure.

EDIT 2: When I plot $x = 1545609600000$ the UNIX timestamp of the previous day, it intersects the main parabola where $y<0$ which is too harsh of a grade (0% for only one day off) – again I think this relates to the focal width but would like a mathematical principle on how to set it in this case rather than guessing.
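Solving the parabola from the first edit for $y$ may make the role of the focal parameter clearer. Writing $d = 1545696000000$ for the target timestamp and $p = 86400000$ for the chosen constant (both symbols are introduced here only for the derivation):

$$y = 100 - \frac{(x - d)^2}{4p}$$

One day off means $x - d = p$, so

$$y = 100 - \frac{p^2}{4p} = 100 - \frac{p}{4} = 100 - 21600000,$$

which is far below $0$. If instead the score should reach $0$ at some offset $T$ (an assumed design choice, say 30 days, i.e. $T = 2592000000$ ms), then

$$0 = 100 - \frac{T^2}{4p} \quad\Longrightarrow\quad p = \frac{T^2}{400}, \qquad y = 100\left(1 - \left(\frac{x - d}{T}\right)^2\right).$$

With $T$ equal to 30 days, a date one day off scores $100\left(1 - \tfrac{1}{900}\right) \approx 99.89$. The score still goes negative for offsets larger than $T$, so it would need to be clamped at $0$.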

EDIT 3: Perhaps logarithmic scale is best for this instead??

Example: $y=\log_{10}\left(\frac{x}{1545696000000} \right) + 100$ and I calculate the percentage offset from $y = 100$?

Best Answer

I've read through your descriptions and examples and it is not at all clear what you are trying to say, but I was able to glean a few things.

I think you probably want a Gaussian function (the bell curve of the normal distribution):

$$y = e^{-a(x - d)^2}\times 100\%$$

In programming parlance

y = 100 * exp(-a*(x - d)^2)

Where $x$ is the current date, $d$ is the target date (Christmas, in your example), $a$ is an adjustable parameter, and $y$ is the output value.

$y$ will be at $100\%$ when $x = d$, but the more $x$ differs from $d$ in either direction, the lower $y$ will be. The parameter $a$ controls how fast $y$ decays as $x$ moves away from $d$. The bigger $a$ is, the faster $y$ will fall.
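One possible way to pick $a$ is to decide how far from $d$ the score should have fallen to $50\%$, and solve $e^{-a h^2} = 1/2$ for $a = \ln 2 / h^2$. The sketch below does that in Python; the 30-day half-score distance, the function name, and the choice to measure $x - d$ in days rather than milliseconds are all assumptions made for illustration.

```python
import math

MS_PER_DAY = 86_400_000
TARGET_MS = 1_545_696_000_000  # 25 December 2018, 00:00 UTC, in milliseconds

def gaussian_closeness(x_ms, target_ms=TARGET_MS, half_score_days=30.0):
    """Return 100 * exp(-a * (x - d)^2), with a set so the score is 50% at half_score_days."""
    # Measure the offset in days so that a is not an unwieldy tiny number.
    days_off = (x_ms - target_ms) / MS_PER_DAY
    # exp(-a * h^2) = 1/2  =>  a = ln(2) / h^2
    a = math.log(2) / half_score_days ** 2
    return 100.0 * math.exp(-a * days_off ** 2)

print(gaussian_closeness(TARGET_MS))                   # on the target date
print(gaussian_closeness(TARGET_MS - 3 * MS_PER_DAY))  # 3 days before
print(gaussian_closeness(TARGET_MS + 3 * MS_PER_DAY))  # 3 days after, same score
```

A smaller `half_score_days` corresponds to a larger $a$ and a faster falloff.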

Some important characteristics of this function:

  • It is symmetric about $d$. 3 days before will have exactly the same value of $y$ as 3 days after.
  • $y$ never drops all the way to $0\%$. This is unlike your attempted formulas, which will drop below $0$. The normal distribution just tapers off: 10,000 years in the future would still give a ridiculously small but non-zero value (in exact mathematics; in computing, at some point the value becomes too small for the computer to represent, so after that it will just return $0$).

There are a multitude of functions that you could use. If these characteristics are not right for your purpose, then there are other choices.
