Solved – Possible reasons for an R-squared value equal to 1

bioinformatics, cross-validation, r-squared, weka

I have a natural data set of biological data from my lab, with attributes such as binding energy, exon type, and oligo length. My goal is to train a model to predict skipping rate. I used Weka with no filter on my data and set the number of folds for cross-validation to 10 (I have 515 instances in total). I used the greedy option of the LinearRegression function. The summary gave me a model that predicts skipping rate with exon type ignored, and it reported a perfect R^2 = 1 and an infinite F-statistic. I don't trust this result and I want to show that it is invalid. However, even when I manually divide my data into training and test sets using a 2/3 split, shuffle both of them, and run the same LinearRegression evaluation, I still get R^2 = 1 and an infinite F-statistic. What could be going wrong in my analysis?

Best Answer

R^2 = 1 means that a simple linear model fits your data perfectly. It's really hard to tell what's going on here without being familiar with your data, but from the output it looks like you're trying to predict a predicted value. If I'm understanding this correctly, you have a model that's modeling a modeled value; that is, the skipping rate you are predicting appears to be the output of a model rather than a raw measurement. As long as the same model was used in both cases, a perfect fit is exactly what you would expect, and neither cross-validation nor a separate test set will reveal a problem, because the held-out targets follow the same formula.
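To illustrate the point, here is a minimal sketch in plain Python (not Weka, and not your actual data). The column meanings, coefficients, and noise level are all made up for the example; only the 515 instances and the 2/3 split come from the question. It fits ordinary least squares on a training split and computes R^2 on the held-out split, once for a target that is itself produced by a linear formula and once for the same target with measurement noise added.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]

def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def holdout_r2(X, y, n_train):
    """Fit on the first n_train rows, score R^2 on the remaining rows."""
    beta = fit_ols(X[:n_train], y[:n_train])
    return r_squared(y[n_train:], predict(beta, X[n_train:]))

rng = random.Random(0)
n = 515                      # number of instances, as in the question
n_train = 2 * n // 3         # 2/3 training split, as in the question

# Hypothetical predictors: (binding energy, oligo length). Values are invented.
X = [(rng.uniform(-30.0, -5.0), rng.uniform(15.0, 30.0)) for _ in range(n)]

# Case 1: the target is the output of a linear formula (a "modeled value").
y_modeled = [2.0 - 0.05 * e + 0.3 * length for e, length in X]

# Case 2: the same quantity observed with measurement noise (assumed sd = 1).
y_measured = [t + rng.gauss(0.0, 1.0) for t in y_modeled]

r2_modeled = holdout_r2(X, y_modeled, n_train)
r2_measured = holdout_r2(X, y_measured, n_train)

print("held-out R^2, modeled target: ", r2_modeled)
print("held-out R^2, measured target:", r2_measured)
```

In the first case the held-out R^2 is 1 up to floating-point error, no matter how the data are shuffled or split; in the second it drops well below 1. If your skipping rate behaves like the first case, it is worth checking how that column was produced before trusting the fit.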