[Math] How to interpolate between sets of data

data analysis, interpolation

I'm probably using the wrong terminology, making it difficult to find a starting point.

I have a set of motor data that looks like this:

[Image: motor data, not reproduced here]

I can easily create a trend line for a given flow rate (GPM). For example, the 4 GPM trend line is:

y = 6E-12x^4 - 3E-08x^3 + 8E-06x^2 - 0.0503x + 1161.8

If I input 2300 psi for x I get the correct interpolation of 937.86 RPM.

Question

What I need is for the equation to take into account all flow rates dynamically.

For example, the user will want to know what the RPM is for 2300 psi at 4.5 GPM.

So I need to interpolate between interpolations… I'm sure there is a standard term for this 😉

The answer doesn't have to be in Excel, but that's what I'm using to prove out the algorithm.

Also, I'm trying to automate this because I have multiple tables of data for different sizes of motors and for different values such as speed and torque. I want to let the user pick a different motor and have their calculations update automatically, without pulling out the book.

Best Answer

Your data seems to be pretty smooth, so a simple solution is bilinear interpolation. For a given (PSI, GPM) pair, find the four neighboring known data points. In the case of (2300, 4.5), these are the table entries at the neighboring pressures 2030 and 2400 psi combined with the neighboring flow rates 4 and 5.5 GPM.

First perform two independent linear interpolations on PSI, one at each of the two neighboring flow rates, and then a final linear interpolation on GPM between these two interpolated values. (Reversing the order of interpolation, GPM first and then PSI, gives the same result.)
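The procedure can be sketched in a few lines of Python. This is only an illustration: the function name `bilinear` and the RPM values in the table are made-up placeholders (the real table is in the image, which is not available here), so only the grid coordinates 2030/2400 psi and 4/5.5 GPM come from the answer.

```python
from bisect import bisect_right

def bilinear(psi_grid, gpm_grid, rpm, psi, gpm):
    """Bilinearly interpolate rpm[i][j], known at (psi_grid[i], gpm_grid[j])."""
    # Find the grid cell that contains the query point (clamped to the table).
    i = min(max(bisect_right(psi_grid, psi) - 1, 0), len(psi_grid) - 2)
    j = min(max(bisect_right(gpm_grid, gpm) - 1, 0), len(gpm_grid) - 2)
    x0, x1 = psi_grid[i], psi_grid[i + 1]
    y0, y1 = gpm_grid[j], gpm_grid[j + 1]
    tx = (psi - x0) / (x1 - x0)  # fractional position along PSI
    ty = (gpm - y0) / (y1 - y0)  # fractional position along GPM
    # Two linear interpolations along PSI, one per neighboring flow rate.
    low = rpm[i][j] + tx * (rpm[i + 1][j] - rpm[i][j])
    high = rpm[i][j + 1] + tx * (rpm[i + 1][j + 1] - rpm[i][j + 1])
    # Final linear interpolation along GPM between the two results.
    return low + ty * (high - low)

# PLACEHOLDER RPM values, not taken from the real motor table.
psi_grid = [2030.0, 2400.0]
gpm_grid = [4.0, 5.5]
rpm = [
    [950.0, 1300.0],  # 2030 psi at 4 GPM and 5.5 GPM
    [930.0, 1280.0],  # 2400 psi at 4 GPM and 5.5 GPM
]

result = bilinear(psi_grid, gpm_grid, rpm, 2300.0, 4.5)
print(result)
```

With a full table, `psi_grid` and `gpm_grid` would simply list every tabulated pressure and flow rate in increasing order. Queries outside the table are extrapolated from the nearest cell, which may or may not be acceptable for motor data.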

Related Solutions

[Math] How to find a surface from two lines

Your lines must intersect to determine a plane; skew lines don't define a plane. But if they do intersect, then produce 3 non-collinear points $(x_1,y_1,z_1)$, $(x_2,y_2,z_2)$, $(x_3,y_3,z_3)$. In general, you will have a planar equation of the form $z = ax + by + c$ (except for the special case of a vertical plane). Your coefficients are the 3 unknowns, and you have three points. Using these 3 points, you can solve for your coefficients and get an equation for your plane.

Since you have two different lines, you can choose any two points on one of the lines, and a third point on the other line; you do not need to use their point of intersection. For convenience, try taking as many of the coordinates to be zero as possible.
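As a rough sketch of that calculation, the three unknowns can be found by solving the 3x3 linear system directly. The function names `det3` and `plane_through` and the three sample points are made up for illustration; Cramer's rule is used here only because it keeps the example free of external libraries.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def plane_through(p1, p2, p3):
    """Return (a, b, c) such that z = a*x + b*y + c passes through the points."""
    pts = [p1, p2, p3]
    # Each point gives one equation a*x + b*y + c*1 = z.
    A = [[float(x), float(y), 1.0] for x, y, _ in pts]
    z = [float(p[2]) for p in pts]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear or the plane is vertical")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A with the z values.
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][col] = z[r]
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)

# Hypothetical sample points with many zero coordinates.
a, b, c = plane_through((0, 0, 1), (1, 0, 3), (0, 1, 0))
print(a, b, c)  # coefficients of z = a*x + b*y + c
```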

In practical applications, your data will not be this nice. You will have a number of points scattered throughout space, not all lying on the same plane. Every three of these determines a plane, so it's typical to form a triangular mesh by repeatedly applying the above method for each local triplet. It's also more common to form two vectors and take a cross product to get a normal to the plane, and to write it in point-normal form; you may want to look into that method after you're comfortable with the one I mentioned above, especially if you're already familiar with vectors and the cross product.
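A minimal sketch of the cross-product method for a single triangle, assuming three hypothetical sample points; the helper names `cross` and `point_normal` are illustrative only. The plane is returned as a normal vector $n$ and a constant $d$ with $n \cdot x = d$.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def point_normal(p1, p2, p3):
    """Return (n, d) describing the plane n . x = d through three points."""
    u = tuple(b - a for a, b in zip(p1, p2))  # edge vector p1 -> p2
    v = tuple(b - a for a, b in zip(p1, p3))  # edge vector p1 -> p3
    n = cross(u, v)
    if n == (0, 0, 0):
        raise ValueError("points are collinear")
    d = sum(ni * pi for ni, pi in zip(n, p1))
    return n, d

# Hypothetical sample points.
pts = [(0, 0, 1), (1, 0, 3), (0, 1, 0)]
n, d = point_normal(*pts)
print(n, d)
```

Unlike the $z = ax + by + c$ form, this representation also handles vertical planes, since nothing is divided by the $z$ component of the normal.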

Update: Now that I see the actual situation you have, your question is not basic at all. In fact, you have almost no hope of getting what I think you want out of your data. Your interpolated guess in the second figure is not likely to be close to the truth, but it is also not the worst thing you could do with your data. I don't know what algorithm you (or your software) used to get the interpolation, but there are other ways to guess the value of your function at the missing points. You should read the various methods outlined at http://en.wikipedia.org/wiki/Multivariate_interpolation. This is not my area of expertise, so maybe someone else will weigh in, but I think you can probably find a method there that produces results you prefer to the interpolation you already have, and that has some parameters that let you fine-tune the results. I think you'll probably end up with, at best, a very bad guess for the missing data, especially if you want to interpolate far from either of the curves. You will also need to choose a discrete subset of your curves to employ most of the methods, but I think you probably already have a discrete dataset that you used to produce them in the first place.

[Math] How to appropriately normalize financial data

Wow...detailed question!! But very interesting :) Having some background in economics and finance, I will offer some ideas on this, but note that I am not a professional banker or underwriter, so please confer with your colleagues and the literature before using anything from this site (which I am sure you were going to do anyway).

I agree with you that your colleagues' approach seems confused, especially in the sense that its normalization on term length assumes that your risk/reward function is perfectly linear (i.e., twice the margin for twice the term length), which is probably not true since there is a qualitative, human element in setting term length and margin.

From your description, your problem is actually multi-dimensional, and hence cannot be so conveniently normalized to gauge performance. Below, I will offer some ideas that may help you better understand where you are mispricing risk.

First, some assumptions and definitions (please correct if I am wrong):

Your firm is risk-averse, and so requires increasing profits from riskier investments. Note that there is no objective way to price risk, as there is always this subjective element to it. However, once you know your risk posture, you can make progress.
Let's change the definition of margin ($m$) to be the profit margin, i.e., $m=\frac{\text{anticipated revenue}}{\text{purchase price}}$. This removes the effect of the loan amount from our evaluation.
The expected margin ($p$) is the margin ($m$) adjusted for the default rate ($d$): $p=(1-d)m$.
The subjective risk, $r$, is a function of the specific client information $C$. For a given risk level $r$, the possible loan terms can be described by a function of the margin $m$ and term length $t$, hence: $r(C) = f(m,t)$. This is an implicit relationship, as both the left and right hand sides are the results of human evaluations, but formally this is what you are doing.
The default rate is a function of the risk and the loan terms: $d=g(m,t,r)$.
I am ignoring any internal discount factors you may be using to get the NPV of your loans. You will need to adjust your expected margins for the time value of money if you think this will be relevant.

OK, now let's see what we can do given the above:

The key to evaluating your performance will be to verify that you are acting risk-averse (i.e., consistent with your risk posture). To do this, you will need to do two things, one difficult, one relatively easy:

Hard part: You will need to know what "risk category" or "risk level" your analysts assigned to each loan at the time of application (not ex post facto). If you already have such a system in place, then use those risk categories; if not, you will need to use the assigned margins and payback periods to infer the risk. A simple function that will do this is $r(m,t)=\frac{m}{t}$. This function assumes that if two loans have the same period, then the one with the higher margin was perceived as riskier. Likewise, if two loans have the same margin but one has a longer period, then the one with the longer period is assumed to be less risky. The exact risk may be some power of this ratio or some multiple of it, but at least you will be correctly ordering your loans by perceived risk.
Easy part: Calculate the actual margin, $\hat m = \frac{\text{actual revenue}}{\text{purchase price}}$, for each loan.

To get a measure of performance, you will want to perform a regression using your observed triples $x_i \equiv (r_i,t_i,\hat m_i)$, with $r_i$ and $t_i$ being the predictors and $\hat m_i$ being the response. Specifically, we will model $\hat m_i$ as follows:

$\hat m_i = \varepsilon (r_i t_i)^k$, where $k$ is an unknown parameter and $\varepsilon$ is a lognormal random variable on $[0,\infty)$ with logmean $\mu$ and logvariance $\sigma^2$ (both unknown).

I chose the lognormal for computational convenience. A full, albeit more complex, treatment would require generalized linear models, which I think may be too much for this application.

Taking the natural logarithm of both sides, we get the usual linear regression equation with normally distributed errors:

$\ln(\hat m_i) = \ln(\varepsilon)+k\ln(r_it_i) = \ln(\varepsilon)+ks_i$, where $s_i = \ln(r_it_i)$

You can now estimate $k$ by performing a simple linear regression with $\ln(\hat m_i)$ as the response and $s_i$ as the predictor.

The regression output (from Excel, Minitab, or whatever you use) should give you a confidence interval, or a standard error and degrees of freedom, for the slope parameter $k$. You will want to test $k>1$ against $k\leq 1$. If $k>1$, it means that you are acting in a risk-averse manner and are hence properly pricing your risk.
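If you would rather script the regression than use a spreadsheet, the following is a minimal sketch using ordinary least squares on the log-transformed model. The function name `fit_k` and all of the loan data are synthetic and purely illustrative (generated with a known slope of 1.3 so the fit can be checked); real inputs would be your own $r_i$, $t_i$, and $\hat m_i$.

```python
import math
import random

def fit_k(r, t, m_hat):
    """Regress ln(m_hat) on s = ln(r*t); return (k, mu, se_k, dof)."""
    s = [math.log(ri * ti) for ri, ti in zip(r, t)]
    y = [math.log(mi) for mi in m_hat]
    n = len(s)
    s_bar = sum(s) / n
    y_bar = sum(y) / n
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (yi - y_bar) for si, yi in zip(s, y))
    k = sxy / sxx              # slope estimate
    mu = y_bar - k * s_bar     # intercept, the estimated logmean of epsilon
    rss = sum((yi - mu - k * si) ** 2 for si, yi in zip(s, y))
    dof = n - 2
    se_k = math.sqrt(rss / dof / sxx)  # standard error of the slope
    return k, mu, se_k, dof

# SYNTHETIC data for illustration only: true slope 1.3, lognormal noise.
random.seed(0)
true_k = 1.3
r = [random.uniform(0.05, 0.5) for _ in range(200)]
t = [random.uniform(3.0, 24.0) for _ in range(200)]
m_hat = [random.lognormvariate(0.0, 0.1) * (ri * ti) ** true_k
         for ri, ti in zip(r, t)]

k, mu, se_k, dof = fit_k(r, t, m_hat)
t_stat = (k - 1.0) / se_k  # one-sided test statistic for k > 1
print(k, se_k, dof, t_stat)
```

The statistic `t_stat` would then be compared with the upper critical value of a $t$ distribution with `dof` degrees of freedom; a large positive value supports $k>1$.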

For a more detailed view, you can make a 3-D plot of your original triples to see if there are certain subsets where the $\hat m$ surface "slopes downward" significantly. You may be good at identifying very low and very high risks but are inconsistent in the middle risk ranges.
