Is there a function or library in Python to automatically compute the best polynomial fit for a set of data points?
I am not really interested in the ML use case of generalizing to a set of new data; I am just focusing on the data I have. I realize that the higher the degree, the better the fit. However, I want something that penalizes complexity, or that looks at where the error elbows. When I say elbowing, I mean something like this (although usually it is not so drastic or obvious):
One idea I had was to use NumPy's polyfit (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.polyfit.html) to compute polynomial regression for a range of orders/degrees. polyfit requires the user to specify the degree of polynomial, which poses a challenge because I don't have any assumptions or preconceived notions. The higher the degree of fit, the lower the error will be, but eventually it plateaus like the image above. Therefore, I want to automatically find the degree at which the error curve elbows, i.e. where the improvement from adding one more degree drops off sharply. If my error is E and d is my degree, that means maximizing the second difference of the error curve: (E[d-1] - E[d]) - (E[d] - E[d+1]).
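A minimal sketch of this idea, assuming sum-of-squared-errors as the error measure E and a hypothetical helper name elbow_degree: fit np.polyfit for each candidate degree, record the error, and pick the degree that maximizes the second difference of the error curve.

```python
import numpy as np

def elbow_degree(x, y, max_degree=10):
    """Return the degree at the elbow of the error-vs-degree curve.

    For each degree d in 1..max_degree, fit a polynomial with np.polyfit
    and record the sum of squared residuals E[d]. The elbow is taken as
    the d maximizing (E[d-1] - E[d]) - (E[d] - E[d+1]), i.e. the second
    difference of the error curve.
    """
    degrees = np.arange(1, max_degree + 1)
    errors = []
    for d in degrees:
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        errors.append(np.sum(residuals ** 2))
    errors = np.array(errors)
    # Second difference: improvement gained at degree d minus the
    # improvement gained at degree d+1 (only defined for interior degrees).
    drop = (errors[:-2] - errors[1:-1]) - (errors[1:-1] - errors[2:])
    return int(degrees[1:-1][np.argmax(drop)])

# Example: noisy cubic data, where the elbow should land near degree 3.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 2 * x**3 - x + rng.normal(scale=1.0, size=x.size)
print(elbow_degree(x, y))
```

Note that the raw second difference has no penalty scale attached, so on noisy data it can be fooled by small fluctuations in E; that is part of why the criterion may or may not be "valid" in general.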
Is this even a valid approach? Are there other tools and approaches, perhaps using well-established Python libraries like numpy or scipy, that can help find the appropriate polynomial fit (without the order/degree being specified)? I would appreciate any thoughts or suggestions! Thanks!
Best Answer
Usually you would not fit polynomial models to your data willy-nilly without good reason. So assuming this is not a problem and is acceptable to you, I present two options: