When people refer to "Box-Cox transformations", two distinct concepts are often conflated. The first is what the original paper was actually about: a methodology for finding, within a family of transformations, the one that is "best" under the assumption that some member of the family yields normal residuals with equal variance and a linear relationship. This is what you already did with the response (dependent) variable.
In the paper, Box and Cox presented a family of transformations (the paper actually contains more than one, but variations on the main one are what people usually mean). This family, or a variation on it, is also often called the Box-Cox transformations. If you want to apply one of these transformations to a predictor variable, just plug the variable into the formula below.
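For reference, the main one-parameter family from the 1964 paper (defined for $y > 0$) is

$$
y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log y, & \lambda = 0, \end{cases}
$$

so "plugging in" a predictor simply means evaluating this expression at a chosen value of $\lambda$.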
To determine what types of transformations to apply to predictor variables, the most important thing to use is knowledge about your data and the science behind it. How do you expect weight to change with age, height, or income? (I would be surprised if a Box-Cox transform is the best for any of these).
There are tools such as spline fits, ACE (alternating conditional expectations), AVAS (additivity and variance stabilization), and others that can suggest transformations, but you need to use subject knowledge and common sense to convert their output into meaningful transformations; a rough sketch follows below.
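Here is a minimal sketch of how those tools might be tried in R, assuming the `mgcv` package for spline fits and the `acepack` package for ACE/AVAS; the data frame `d` and the variable names (`weight`, `age`, `height`, `income`) are hypothetical placeholders, not from the thread.

```r
library(mgcv)      # penalized regression splines
library(acepack)   # ace() and avas()

# Spline fit: let the data suggest the shape of each relationship.
fit <- gam(weight ~ s(age) + s(height) + s(income), data = d)
plot(fit, pages = 1)   # inspect the estimated smooth for each predictor

# ACE: jointly estimate transformations of the response and the predictors.
# (avas() has the same interface but also targets constant variance.)
X <- as.matrix(d[, c("age", "height", "income")])
a <- ace(X, d$weight)
plot(X[, 1], a$tx[, 1])  # suggested transformation of age; read off a simple form
```

The point of the plots is not to use the machine output directly, but to see whether it resembles an interpretable form (log, square root, a threshold, and so on) that the science behind your data can justify.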
(mostly copied from a comment by Nick Cox) The Box-Cox transform does not really fail here; rather, it is unnecessary: there is no need for a transformation when max/min is small. With max/min small, all the observations are (relatively) far from zero, so over such a short interval the power transform is well approximated by a linear function, and a merely linear rescaling of a variable changes nothing of substance in a regression.
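A small illustration of that point (the numbers are made up): when max/min is close to 1, a log or power transform is almost exactly linear in the original variable.

```r
x <- seq(100, 110, length.out = 101)  # max/min = 1.1, all values far from zero
cor(x, log(x))                        # ~0.99999: log(x) is nearly linear in x
summary(lm(log(x) ~ x))$r.squared     # ~1: a linear function of x reproduces log(x)
```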
It could be used to describe that, but it will typically mean more than that.
Consider that if you just look at $Y$ and find a Box-Cox transformation before you consider your $x$-variables, you are looking at the marginal distribution of $Y$, when the issue in regression is really (a) the shape of the relationships with those predictors and (b) the conditional distribution of $Y$ given them (especially getting things like the conditional variance reasonably close to constant). As such, you can't really hope to find a suitable transformation without doing it within the context of the regression itself.
So typically this would be done "simultaneously" with the regression, not one thing and then the other. For example, to use the `MASS::boxcox` function in R you pass it a model object; if you give it the same $y$ but a different model, the estimate of $\lambda$ you end up with will be different. However, once you have an estimate of $\lambda$ in the context of a model, you can then transform your $y$ variable and rerun the regression (just as the routine that finds suitable values of $\lambda$ does at each value of $\lambda$ it looks at).
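A minimal sketch of that workflow; the model formula and variable names are assumed for illustration, and `boxcox` requires a positive response.

```r
library(MASS)

fit <- lm(y ~ x1 + x2, data = d)       # boxcox() takes the model, not just y
bc  <- boxcox(fit, plotit = FALSE)     # profile log-likelihood over a grid of lambda
lambda <- bc$x[which.max(bc$y)]        # lambda maximizing the profile likelihood

# Transform y with that lambda and rerun the regression.
d$y_bc <- if (abs(lambda) < 1e-8) log(d$y) else (d$y^lambda - 1) / lambda
fit_bc <- lm(y_bc ~ x1 + x2, data = d)
```

Refitting `boxcox` with a different right-hand side (say, `lm(y ~ x1, data = d)`) will generally move the estimated $\lambda$, which is exactly the point made above about the estimate being conditional on the model.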
No direct connection, outside the obvious one (Cox himself).