Residuals – What Does the Y-Axis in Frequency vs Residuals Graph Mean
frequency, residuals
[Figure: frequency vs. residuals graphs, from Asner et al. 2013]
This is from a paper I'm reading. I understand the residuals part, but I'm not sure what the frequency means. Can someone please explain these graphs?
Please let me know if you need more information.
Best Answer
The units on the vertical axes are relative frequencies per unit of $x.$ That is, these plots are histograms. They represent relative frequency in terms of areas under the curve rather than by heights of the curve.
The way you can tell is that the areas under all the graphs are unity. A quick visual check is to approximate one of these graphs as a triangle. For instance, the red Barro Colorado graph has a base of approximately $40 - (-75)=115$ and a height of $0.015,$ so its area must be close to $(1/2)\times 115\times 0.015 \approx 0.86,$ which is practically $1$ for such a rough estimate. The other graphs similarly check out.
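A minimal sketch of this check, assuming made-up residuals (the normal distribution, its parameters, and the bin count are illustrative choices, not the paper's data). It builds a density-scaled histogram by hand, confirms that its area is 1, and repeats the triangle estimate from the paragraph above:

```python
import random

def density_histogram(values, n_bins):
    """Return (bin_width, heights), with heights in relative frequency per unit of x."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the maximum value into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    # Dividing by n gives relative frequency; dividing by width gives "per unit of x".
    heights = [c / (n * width) for c in counts]
    return width, heights

random.seed(0)
# Hypothetical residuals, not taken from the paper.
residuals = [random.gauss(-10.0, 25.0) for _ in range(10_000)]

width, heights = density_histogram(residuals, n_bins=40)
area = sum(h * width for h in heights)  # total area under the histogram

# Triangle approximation from the text: base 40 - (-75) = 115, height 0.015.
triangle_area = 0.5 * (40 - (-75)) * 0.015

print(f"histogram area = {area:.6f}")
print(f"triangle estimate = {triangle_area:.4f}")
```

The heights themselves are small numbers (on the order of 0.01 here) precisely because they are divided by the bin width; only the areas are relative frequencies.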
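According to the units calculus, then, the units on the vertical axes must be

$$\frac{\text{relative frequency}}{\text{units of } x} = \frac{1}{\text{units of } x}$$

because relative frequencies are unitless.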
Both equations have an $\epsilon$ term. If you model them as two separate equations, that's fine. But if you model them as one system, do you want to assume that the $\epsilon$ terms are uncorrelated? If you do, then don't correlate them, that is, don't estimate a correlation between the residuals. Usually you don't want to assume that, so you'd correlate the residuals.
An example: Say you want to look at the effect of age (in adults) on: speed at running 100m, speed at running 5 miles. I'd expect a negative relationship for both of these, but if you modeled them in one equation, you'd expect unexplained variance in 100m running speed to be correlated with 5 mile running speed, controlling for age - so the residuals are correlated.
You can also think of this in terms of latent variables - there are common causes of the residual for both 100m and 5 mile speeds, and hence you can hypothesize the existence of a latent (unmeasured) variable.
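To make the running example concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the coefficients, the noise levels, and the latent `fitness` variable are invented, and the two outcomes are fit by separate simple regressions on age rather than by a joint model. It only shows that a shared unmeasured cause leaves the two sets of residuals correlated after controlling for age:

```python
import random

def ols_residuals(x, y):
    """Fit y = a + b*x by least squares; return (slope, residuals)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

random.seed(1)
n = 2000
age = [random.uniform(20, 70) for _ in range(n)]
# Hypothetical latent variable: general fitness, not measured in the model.
fitness = [random.gauss(0, 1) for _ in range(n)]

# Both speeds decline with age and both depend on the same latent fitness.
speed_100m = [9.0 - 0.04 * a + 0.5 * f + random.gauss(0, 0.3)
              for a, f in zip(age, fitness)]
speed_5mi = [4.5 - 0.02 * a + 0.4 * f + random.gauss(0, 0.3)
             for a, f in zip(age, fitness)]

slope_100m, res_100m = ols_residuals(age, speed_100m)
slope_5mi, res_5mi = ols_residuals(age, speed_5mi)
residual_corr = pearson(res_100m, res_5mi)

print(f"slopes on age: {slope_100m:.3f}, {slope_5mi:.3f}")
print(f"correlation of residuals: {residual_corr:.2f}")
```

Setting the `fitness` coefficients to zero in both speed equations should drive the residual correlation toward zero, which is the case where leaving the residuals uncorrelated would be reasonable.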
The sampling difference is a problem. To compare apples-to-apples, both series need to be based on the same frequency and timing. In this case, if Series B is based on the last day of each month, then Series A also needs to be based on the same days. If you can't get daily data for Series A then you may need to interpolate on the weekly data.
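As a rough sketch of the interpolation step, the snippet below linearly interpolates a weekly series at month-end dates. The dates and values are hypothetical, and linear interpolation is just one simple choice; whether it is appropriate depends on how the real series behaves between observations.

```python
from datetime import date, timedelta

def interpolate_at(target, dates, values):
    """Linearly interpolate values (observed on sorted dates) at the target date."""
    points = list(zip(dates, values))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if d0 <= target <= d1:
            frac = (target - d0).days / (d1 - d0).days
            return v0 + frac * (v1 - v0)
    raise ValueError("target date lies outside the observed range")

# Hypothetical weekly Series A, observed every Friday starting 2021-01-01.
weekly_dates = [date(2021, 1, 1) + timedelta(weeks=k) for k in range(10)]
weekly_values = [100.0 + 2.0 * k for k in range(10)]

# Month-end dates on which the hypothetical monthly Series B is observed.
month_ends = [date(2021, 1, 31), date(2021, 2, 28)]
series_a_monthly = [interpolate_at(d, weekly_dates, weekly_values)
                    for d in month_ends]

for d, v in zip(month_ends, series_a_monthly):
    print(d.isoformat(), round(v, 3))
```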
If seasonality is involved, then using weekly data is an even bigger problem. The main seasonality issue with weekly data is that there aren't exactly 52 weeks in a year. Using 365 days per year and 7 days per week gives about 52.14 weeks per year. That's not an integer, which means that when a year ends, the associated week may or may not end. As a result, all calculations have to be modified to reflect that. Monthly data has exactly 12 months per year. Each month may not have the same number of days, but that is typically understood for monthly data.
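The arithmetic is easy to check. The sketch below also counts weeks per calendar year under the ISO 8601 week convention, which is an assumption made here for illustration (the text does not say which week definition the data uses). Under that convention some years have 52 weeks and some have 53:

```python
from datetime import date

def iso_weeks_in_year(year):
    """Number of ISO 8601 weeks in a year (52 or 53)."""
    # December 28 always falls in the last ISO week of its year.
    return date(year, 12, 28).isocalendar()[1]

weeks_per_year = 365 / 7  # not an integer
week_counts = {year: iso_weeks_in_year(year) for year in range(2015, 2022)}

print(f"365 / 7 = {weeks_per_year:.2f}")
for year, count in week_counts.items():
    print(year, count)
```

A seasonal period of 52 therefore drifts against the calendar, which is why weekly seasonal calculations need adjusting while monthly ones can use a fixed period of 12.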