Solved – Multivariate (Polynomial?) Regression

machine learningmultiple regressionpythonscikit learn

I'm trying to solve one problem, I'm not sure how to do it…

Here is my problem (related to Heat transfer/Fluid mechanics) that I tried to simplify :

I have a few independents measurements $x_1, x_2, x_3, x_4$ and $y$
- These datas give me some parameters, using some constants :
  - $P_1 = x_1 + i$
  - $P_2 = j * x_1*x_2 $
  - $P_3 = x_3^\frac{1}{2} + k$
  - $P_4 = x_4^4 – \frac{x_3^4}{l}$
- Then I have : $$y = a_1 * P_1 + a_2* P_2 + a_3 * P_3 + a_4 * P_4$$

My questions are :

How can I find the coefficients $a_1, a_2, a_3$ and $a_4$ (using python) ?
- Shall calculate all $P_i$ and make a linear regression on the formula ?
- Shall expand everything and make a polynomial regression on $x_i$ ?
- Shall i do something else than regression (ANN, etc.) ?
- Am I missing something here ?
What should I do in case of one $P_i$ is dependent ?
For example : $P_5 = x_1(x_2+1) $ (which can be extracted from $P_1$ and $P_2$) with $y = … + a_5 * P_5$

Best Answer

You can calculate pi and run linear regression. You should not be confused about the term "polynomial regression". A regression on polynomial basis expansion (even some of the terms do not exists) can be called polynomial regression.
Drop the dependent variables or add regularization

Related Solutions

Solved – Interaction term in multivariate polynomial regression

Following the guidelines of Blackwell ("all models are wrong, but some are useful "), you have no obligation of including an interaction term, since:

you may not afford including an interaction term due to a small number of observations in your dataset;
you may have too much noise in your phenomenon so that including interaction terms can be futile, since you seldom will get them statistically significant; and
higher degree interaction terms (like $x_1^3 x_5^2 x_9^5$, for instance) will very, very, very rarely have any meaningful interpretation in terms of your real phenomenon (and it only gets worse if your $y$ is multivariate).

OK, in strict mathematical speak, you can (perhaps ought to) include them, but considering strictly modeling issues, most of time it will not make sense for your first approach to a phenomenon, albeit it can and will make more sense in the course of a series of improvements of an initial model, like, for instance, in the regression of the concentration of one product in function of the concentration of several reagents and catalysts in a very complex chemical reaction.

But please notice that all of it will make sense only if your experiments are extremely well controlled and if you have a horribly awful lot of observations, both conditions more compatible with a rather long series of experiments in a well consolidated research, not in a very first attempt to get to understand, say, a social phenomenon.

And, last but not least, as asked, $$ y = \sum_{i_1=0}^{3} \sum_{i_2=0}^{3-i_1} \sum_{i_3=0}^{3-i_1-i_2} \sum_{i_4=0}^{3-i_1-i_2-i_3} \beta_{i_1i_2i_3i_4} x_1^{i_1}x_2^{i_2}x_3^{i_3}x_4^{i_4} +\varepsilon, $$ which, according to my calculations, will be a hell of a long expression. =)

Something like $$ y = \beta_{0000} + \beta_{1000} x_1+ \beta_{0100} x_2+ \beta_{0010} x_3+ \beta_{0001} x_4+ \beta_{2000} x_1^2+ \beta_{0200} x_2^2+ \beta_{0020} x_3^2+ \beta_{0002} x_4^2+ \beta_{1100} x_1x_2+ \beta_{1010} x_1x_3+ \beta_{1001} x_1x_4+ \beta_{0110} x_2x_3+ \beta_{0101} x_2x_4+ \beta_{0011} x_3x_4+ \beta_{3000} x_1^3+ \beta_{0300} x_2^3+ \beta_{0030} x_3^3+ \beta_{0003} x_4^3+ \beta_{1110} x_1x_2x_3+ \beta_{1101} x_1x_2x_4+ \beta_{0111} x_2x_3x_4+ \ldots+ \varepsilon. $$

Edit: I took the time to write a lazy script in R to generate an expression in LaTeX with all the terms of the sum above, which output this: $$ y = \beta_{0000} + \beta_{0001} x_4 + \beta_{0002} x_4^2 + \beta_{0003} x_4^3 + \beta_{0010} x_3 + \beta_{0011} x_3 x_4 + \beta_{0012} x_3 x_4^2 + \beta_{0020} x_3^2 + \beta_{0021} x_3^2 x_4 + \beta_{0030} x_3^3 + \beta_{0100} x_2 + \beta_{0101} x_2 x_4 + \beta_{0102} x_2 x_4^2 + \beta_{0110} x_2 x_3 + \beta_{0111} x_2 x_3 x_4 + \beta_{0120} x_2 x_3^2 + \beta_{0200} x_2^2 + \beta_{0201} x_2^2 x_4 + \beta_{0210} x_2^2 x_3 + \beta_{0300} x_2^3 + \beta_{1000} x_1 + \beta_{1001} x_1 x_4 + \beta_{1002} x_1 x_4^2 + \beta_{1010} x_1 x_3 + \beta_{1011} x_1 x_3 x_4 + \beta_{1020} x_1 x_3^2 + \beta_{1100} x_1 x_2 + \beta_{1101} x_1 x_2 x_4 + \beta_{1110} x_1 x_2 x_3 + \beta_{1200} x_1 x_2^2 + \beta_{2000} x_1^2 + \beta_{2001} x_1^2 x_4 + \beta_{2010} x_1^2 x_3 + \beta_{2100} x_1^2 x_2 + \beta_{3000} x_1^3 + \varepsilon $$

The script itself is

d <- 3
formula <- "$$y = "  
`%$%` <- paste0
for (i1 in 0:d) {  
    for (i2 in 0:(d-i1)) {  
        for (i3 in 0:(d-i1-i2)) {  
            for (i4 in 0:(d-i1-i2-i3)) {  
                formula <-   
                    formula %$%   
                    "\\beta_{" %$%   
                    i1 %$% i2 %$% i3 %$% i4 %$%   
                    "}" %$%   
                    (if (i1>0) (" x_1" %$% if (i1>1) ("^" %$% i1))) %$%   
                    (if (i2>0) (" x_2" %$% if (i2>1) ("^" %$% i2))) %$%   
                    (if (i3>0) (" x_3" %$% if (i3>1) ("^" %$% i3))) %$%   
                    (if (i4>0) (" x_4" %$% if (i4>1) ("^" %$% i4))) %$%  
                    " + "  
            }  
        }  
    }  
}  
formula <- formula %$% "\\varepsilon$$"  
cat(formula)

At last, notice that it generated no fourth degree terms, like $\beta_{1111} x_1 x_2 x_3 x_4$, since you required a third degree polynomial ($d=3$) with four variables ($p=4$).

Solved – Is it possible to perform a regression where you have an unknown / unknowable feature variable

The complete formula for a linear model is (in quasi matrix form)

$$Y=\beta X+\epsilon$$

So we have multiple coefficents for the variables that we are controlling for, and then we have $\epsilon$, which is everything else which we did not explain with our included variables.

In this error term belong all the variables which we did not consider, either because we do not have information for them or because we simply do not know of them (random deviation).

So there is just no way for you to know what in this term belongs to what unknown term.

Best Answer

Related Solutions

Solved – Interaction term in multivariate polynomial regression

Solved – Is it possible to perform a regression where you have an unknown / unknowable feature variable

Related Question