Larry Wasserman defines a parametric model as a set of distributions "that can be parameterized by a finite number of parameters" (p. 87). In contrast, a nonparametric model is a set of distributions that cannot be parameterized by a finite number of parameters.
Thus, by that definition, standard logistic regression is a parametric model: it has a finite set of parameters, namely the regression coefficients, usually one per predictor plus an intercept (constant term).
Logistic regression is a particular form of the generalised linear model. Specifically, it uses a logit link function to model binomially distributed data.
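To make both points concrete, here is a minimal sketch of logistic regression as a GLM with a logit link, fit by iteratively reweighted least squares (IRLS, the standard GLM fitting algorithm). The simulated data and coefficient values are assumptions for illustration, not from any cited source.

```python
import numpy as np

def inv_logit(eta):
    # inverse of the logit link: maps the linear predictor to a probability
    return 1 / (1 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])            # the finite parameter vector
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor X * beta
        mu = np.clip(inv_logit(eta), 1e-9, 1 - 1e-9)  # fitted probabilities
        w = mu * (1 - mu)                  # binomial variance weights
        z = eta + (y - mu) / w             # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.binomial(1, inv_logit(-0.5 + 1.2 * x))   # assumed true parameters
X = np.column_stack([np.ones_like(x), x])
beta_hat = fit_logistic_irls(X, y)   # exactly two parameters: intercept, slope
```

However many observations we collect, the model is still summarised by those two numbers, which is precisely what makes it parametric in Wasserman's sense.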
Interestingly, it is possible to perform a nonparametric logistic regression (e.g., Hastie, 1983). This might involve using splines or some form of non-parametric smoothing to model the effect of the predictors.
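A hedged sketch of what such a nonparametric fit might look like: the predictor enters through a linear-spline basis rather than a single slope, so the fitted effect can be an arbitrary piecewise-linear curve. The knot placement, simulated data, and IRLS fit are illustrative assumptions, not Hastie's (1983) actual algorithm.

```python
import numpy as np

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))

def spline_basis(x, knots):
    # intercept, x, and one hinge function per knot
    return np.column_stack([np.ones_like(x), x] +
                           [np.maximum(x - k, 0) for k in knots])

def fit_logistic_irls(X, y, n_iter=25):
    # standard iteratively reweighted least squares for a binomial GLM
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(inv_logit(eta), 1e-9, 1 - 1e-9)
        w = mu * (1 - mu)
        z = eta + (y - mu) / w
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def log_lik(X, y, beta):
    mu = np.clip(inv_logit(X @ beta), 1e-9, 1 - 1e-9)
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=800)
y = rng.binomial(1, inv_logit(2 * np.sin(1.5 * x)))   # nonlinear truth

B = spline_basis(x, np.linspace(-2, 2, 5))   # flexible spline fit
L = np.column_stack([np.ones_like(x), x])    # ordinary linear-logit fit
ll_spline = log_lik(B, y, fit_logistic_irls(B, y))
ll_linear = log_lik(L, y, fit_logistic_irls(L, y))
```

Because the spline basis nests the straight-line model, its maximised likelihood is at least as high, and with a genuinely nonlinear effect like the one simulated here it will be substantially higher.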
References
- Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
- Hastie, T. (1983). Non-parametric logistic regression. SLAC PUB-3160.
You quote several pieces of advice, all of it no doubt intended helpfully, but it is difficult to find much merit in any of it.
In each case I rely totally on what you cite as a summary. In the authors' defence I would like to believe that they add appropriate qualifications in surrounding or other material. (Full bibliographic references in usual name(s), date, title, (publisher, place) or (journal title, volume, pages) format would enhance the question.)
Field
This advice is no doubt intended helpfully, but it is at best vastly oversimplified. Field's advice seems intended generally, although the reference to Levene's test implies a temporary focus on analysis of variance.
For example, suppose I have one predictor that on various grounds should be logged and another that is a $(1, 0)$ indicator variable. The latter (a) cannot be logged, as $\log 0$ is undefined, and (b) should not be logged in any case. (Indeed, mapping an indicator variable to any other two distinct values has no important effect.)
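The parenthetical point can be checked directly: recoding a $(1, 0)$ indicator to any other two distinct values changes only the coefficient and intercept, never the fitted values. The simulated data below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)                          # continuous predictor
d = rng.integers(0, 2, size=n).astype(float)    # (1, 0) indicator
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(size=n)

def ols_fitted(X, y):
    # least-squares fit; returns the fitted values X @ beta_hat
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

X01 = np.column_stack([np.ones(n), x, d])
# arbitrary recoding of the indicator to the two values 7 and -3
Xab = np.column_stack([np.ones(n), x, np.where(d == 1.0, 7.0, -3.0)])

fit01 = ols_fitted(X01, y)
fitab = ols_fitted(Xab, y)   # identical fitted values, rescaled coefficient
```

Any two-value recoding is an affine function of the original indicator, so with an intercept in the model the two design matrices span the same column space and yield the same fit.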
More generally, it is common -- in many fields the usual situation -- that some predictors should be transformed and the rest left as is.
It's true that encountering in a paper or dissertation a mixture of transformations applied differently to different predictors (including, as a special case, the identity transformation, i.e. leaving a variable as is) is often a matter of concern for a reader. Is the mix a well-thought-out set of choices, or was it arbitrary and capricious?
Furthermore, in a series of studies, consistency of approach (always applying logarithms to a response, or never doing so) aids enormously in comparing results, while differing approaches make comparison more difficult.
But that's not to say there could never be reasons for a mix of transformations.
I don't see that most of the section you cite has much bearing on the key advice you highlight in yellow. This in itself is a matter of concern: it's a strange business to announce an absolute rule and then not really to explain it. Conversely, the injunction "Remember" suggests that Field's grounds were supplied earlier in the book.
Anonymous paper
The context here is regression models. As often, talking of OLS strangely emphasises estimation method rather than model, but we can understand what is intended. GWR I construe as geographically weighted regression.
The argument here is that you should transform non-normal predictors and leave the others as is. Again, this raises the question of what you can and should do with indicator variables, which cannot be normally distributed (as above, the answer is that their non-normality is not a problem). But the injunction has it backwards in implying that non-normality of predictors is itself the problem. Not so: it is no part of regression modelling to assume anything about the marginal distributions of the predictors.
In practice, if you make predictors more nearly normal, then you will often be applying transformations that make the functional form $X\beta$ more nearly right for the data, which I would assert to be the major reason for transformation, despite the enormous emphasis on error structure in many texts. In other words, logging predictors to get them closer to normality can be doing the right thing for the wrong reason if you get closer to linearity in the transformed space.
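A small simulation sketches this "right thing for the wrong reason" mechanism: the predictor is lognormal and the true logit is (by assumption) linear in $\log x$, so logging $x$, which incidentally makes it look normal, is really fixing the functional form. All data and coefficients are simulated assumptions.

```python
import numpy as np

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=25):
    # iteratively reweighted least squares for a binomial GLM
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(inv_logit(eta), 1e-9, 1 - 1e-9)
        w = mu * (1 - mu)
        z = eta + (y - mu) / w
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def log_lik(X, y, beta):
    mu = np.clip(inv_logit(X @ beta), 1e-9, 1 - 1e-9)
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

rng = np.random.default_rng(3)
log_x = rng.normal(size=2000)
x = np.exp(log_x)                                     # heavily skewed predictor
y = rng.binomial(1, inv_logit(-0.3 + 1.5 * log_x))    # truth linear in log(x)

X_raw = np.column_stack([np.ones_like(x), x])
X_log = np.column_stack([np.ones_like(x), np.log(x)])
ll_raw = log_lik(X_raw, y, fit_logistic_irls(X_raw, y))
ll_log = log_lik(X_log, y, fit_logistic_irls(X_log, y))
```

The logged predictor gives the better-specified model and the higher likelihood, even though the stated motive for logging ("make it normal") had nothing to do with why it worked.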
There is so much extraordinarily good advice on transformations in this forum that I have focused on discussing what you cite.
P.S. You add a statement starting "For instance, in a comparison of means, comparing logs to raw data would obviously yield a significant difference." I am not clear what you have in mind, but comparing values for one group with logarithms of values for another group would just be nonsensical. I don't understand the rest of your statement at all.
Best Answer
Why odds ratios look strange on transformed variables
Transformations change the metric of the variable. An odds ratio is the predicted multiplicative change in the odds for a one-unit increase in the IV, holding all other IVs constant. What "one unit" means is very different after a square-root transformation.
For example, if you had a raw scale running from 1 to 100, then the difference between 16 and 25 on the raw scale becomes the difference between 4 and 5, a single unit, on the square-root scale. Thus, it is not surprising that your odds ratios became much larger after the square-root transformation.
If you want to examine the effect of the transformation in a scaling-neutral way, you could standardise your IVs (i.e., make them z-scores). Thus, you could compare the odds ratio of a z-score of the raw variable to a z-score of the transformed variable. This will allow you to isolate the effect of changing the relative distance between categories.
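Both points can be sketched in a short simulation: the odds ratio per "one unit" inflates under a square-root transformation, while z-scoring puts the two versions on a comparable footing. The data-generating values are illustrative assumptions.

```python
import numpy as np

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=25):
    # iteratively reweighted least squares for a binomial GLM
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(inv_logit(eta), 1e-9, 1 - 1e-9)
        w = mu * (1 - mu)
        z = eta + (y - mu) / w
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def odds_ratio(v, y):
    # OR for a one-unit increase in the (single) IV v
    X = np.column_stack([np.ones_like(v), v])
    return np.exp(fit_logistic_irls(X, y)[1])

rng = np.random.default_rng(4)
x = rng.uniform(1, 100, size=2000)                 # raw 1-to-100 scale
y = rng.binomial(1, inv_logit(-2.5 + 0.05 * x))    # assumed true model

zscore = lambda v: (v - v.mean()) / v.std()
or_raw, or_sqrt = odds_ratio(x, y), odds_ratio(np.sqrt(x), y)
or_z_raw = odds_ratio(zscore(x), y)
or_z_sqrt = odds_ratio(zscore(np.sqrt(x)), y)
```

Here `or_sqrt` is far larger than `or_raw` purely because a square-root "unit" covers much more of the raw scale, whereas the two z-scored odds ratios, each per one standard deviation, are directly comparable.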
Whether to transform non-normal predictors in logistic regression
Normality of predictors is not an assumption of logistic regression, or linear regression for that matter. See @whuber's answer here for more details.
That said, you may find one scaling of your IVs more predictive or interpretable. I'd use criteria like that to decide whether you want to transform a predictor variable.