There are a lot of questions here, but I'll try to answer them all.
Should every mathematical theory take place in an ∞-category? Or is 'real' mathematics basically evil?
I would say that all mathematics should take place in its natural context. Sometimes you have things that are sets where equality makes sense, like an ordinary presheaf, and then you work in a 1-category. Sometimes you have things where only isomorphism makes sense, like a presheaf of categories, and then you work in a 2-category. Etc.
It is true that any n-category for finite n can be considered a special case of an ∞-category with only identity cells above n, so in this degenerate sense all n-categories are ∞-categories, and thus one might say that "all mathematics takes place in an ∞-category" — at least if one believes that all mathematics takes place in an n-category for some n! But even that is not clear, e.g. some mathematics naturally takes place in other categorical structures, such as a double category or a proarrow equipment. Some mathematics uses no category theory at all (at least as far as anyone has noticed so far), and so it would be a stretch to say that it takes place in any sort of category.
Anyway, may we think of it as a usual functor, without running into trouble? Or is it important, in practice, to have this higher category theoretic point of view? Or is it possible to turn this functor into an honest functor, by choosing the tensor products $M\otimes_A B$ carefully?
I would say qualified yes, yes, and yes, respectively. You can think of it as a usual functor as long as doing so doesn't cause you to think that it behaves in any way that a pseudofunctor doesn't! Which is sort of a vacuous statement, but the point is that pseudofunctors really shouldn't be a very scary concept (as opposed to a technical definition, which might be a bit complicated, though cf. Harry's comment) — they really are just like ordinary functors, except that you're dealing with things (e.g. categories) for which it doesn't really make sense to ask morphisms to be equal, only isomorphic.
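To make the difference concrete, here is a sketch of the extra data a pseudofunctor carries in the extension-of-scalars example. The ring homomorphisms $\psi\colon A\to B$ and $\chi\colon B\to C$ and the name $\alpha_{\chi,\psi}$ are notation assumed here for illustration, not taken from the question.

$$\alpha_{\chi,\psi}\colon (M\otimes_A B)\otimes_B C \xrightarrow{\ \cong\ } M\otimes_A C, \qquad (m\otimes b)\otimes c \mapsto m\otimes \chi(b)\,c .$$

A strict functor would require the two sides to be equal. A pseudofunctor asks only for these natural isomorphisms (plus a unit isomorphism $M\otimes_A A\cong M$), subject to the condition that the two ways of collapsing a triple extension $((M\otimes_A B)\otimes_B C)\otimes_C D$ down to $M\otimes_A D$ agree.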
On the other hand, the "higher category theoretic" fact that pseudofunctors are not all strict functors is very important. I believe that Bénabou, the inventor of bicategories, once said that the important thing about bicategories is not that they themselves are "weak," but that the morphisms between them are weak. In particular, although every bicategory is equivalent to a strict 2-category, not every pseudofunctor between bicategories is equivalent to a strict functor.
But on the third hand, it is true that any pseudofunctor with values in the 2-category Cat is equivalent to a strict functor. In the language of fibrations, this says that any fibration is equivalent to a split one. Tyler mentioned one construction of an equivalent strict functor in the case of modules and tensor products. There is also a general construction which, applied to the case of modules, will replace $Mod_A$ by a category whose objects are pairs (M,φ) where M is an R-module and φ:R→A is a ring homomorphism. We regard such a pair as a formal representative of $M\otimes_R A$ and define morphisms between them accordingly, to get a category equivalent to $Mod_A$. Now the extension-of-scalars functor $\psi_!:Mod_A \to Mod_B$ is represented by the functor taking a pair (M,φ) to (M,ψφ), which is strictly functorial since composition of ring homomorphisms is so.
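Spelled out on objects, and with the caveat that the notation is my own reading of the construction (in particular the ring $R$ is part of the data, so an object is really a triple $(R,M,\varphi)$, and $\chi\colon B\to C$ is a second ring homomorphism introduced for illustration), this looks roughly as follows:

$$\mathrm{Hom}\big((M,\varphi),(N,\varphi')\big) := \mathrm{Hom}_{Mod_A}\big(M\otimes_R A,\ N\otimes_{R'} A\big), \qquad \varphi\colon R\to A,\ \ \varphi'\colon R'\to A,$$

$$\chi_!\big(\psi_!(M,\varphi)\big) = \chi_!(M,\psi\varphi) = (M,\chi\psi\varphi) = (\chi\psi)_!(M,\varphi).$$

The second line is an equality on the nose, not just an isomorphism. The coherence isomorphisms of the original pseudofunctor have not disappeared; they are used in defining what $\psi_!$ does to morphisms.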
I'm not entirely sure what you're looking for in an answer, but maybe I'll flesh out my comment.
It looks like what you're describing is equivalent to the homotopy category associated to the model structure on Cat where the weak equivalences are equivalences of categories. (I can say "the" because there is only one such, as pointed out in the comments. The cofibrations are functors injective on objects, and the fibrations are "isofibrations".)
I would say that in this context your category has been much studied. In particular, it is interesting to ask questions about homotopy limits and colimits in this category because many useful constructions arise in this way. (Homotopy (co)limits with this model structure are the same as "2-(co)limits" which is the name appearing in most of the literature, especially older literature.)
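For instance, the homotopy pullback of two functors $F\colon\mathcal{A}\to\mathcal{C}$ and $G\colon\mathcal{B}\to\mathcal{C}$ (names chosen here purely for illustration) can be modeled, as far as I know, by the category of triples

$$\mathcal{A}\times^{h}_{\mathcal{C}}\mathcal{B} \;=\; \big\{\,(a,b,\theta)\ :\ a\in\mathcal{A},\ b\in\mathcal{B},\ \theta\colon F(a)\xrightarrow{\ \cong\ }G(b)\,\big\},$$

where a morphism $(a,b,\theta)\to(a',b',\theta')$ is a pair $u\colon a\to a'$, $v\colon b\to b'$ with $\theta'\circ F(u) = G(v)\circ\theta$. The strict pullback would instead demand $F(a)=G(b)$; the two agree up to equivalence when $F$ or $G$ is an isofibration.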
An example application of this language is the following theorem: The subcategory of presentable (resp. accessible) categories is closed under homotopy limits.
Using this one can prove that most of your favorite things are presentable (resp. accessible). For example, the category of modules over a monad arises via a homotopy limit construction, and this takes care of most things of interest.
Here's a neat application of this (which is the ordinary category version of a result that can be found, for example, in Lurie's HTT, 5.5.4.16).
Say you want to localize a category $\mathcal{C}$ with respect to some collection of morphisms, $S$. Usually $S$ will not be given as a set, but if $\mathcal{C}$ is presentable you're usually okay if $S$ is generated by a set. Well, it turns out that if $F: \mathcal{C} \rightarrow \mathcal{D}$ is a colimit preserving functor between presentable categories, and $S$ is a (strongly saturated) collection of morphisms in $\mathcal{D}$ that is generated by a set, then $F^{-1}S$ is a (strongly saturated) collection of morphisms in $\mathcal{C}$ generated by a set. The argument goes by way of showing that the subcategory of the category of morphisms generated by $F^{-1}S$ is presentable, using a homotopy pullback square.
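In my reading of the argument, the square in question looks roughly like the following, where $\mathrm{Fun}([1],-)$ denotes the category of morphisms and $S$ and its preimage $F^{-1}S$ are regarded as full subcategories of it (this notation is an assumption on my part):

$$\begin{array}{ccc} F^{-1}S & \longrightarrow & S \\ \downarrow & & \downarrow \\ \mathrm{Fun}([1],\mathcal{C}) & \xrightarrow{\ F\circ -\ } & \mathrm{Fun}([1],\mathcal{D}) \end{array}$$

Assuming $S$ is closed under isomorphism in the category of morphisms, the right-hand inclusion is an isofibration, so this strict pullback is also a homotopy pullback, and the closure theorem above applies once the other three corners are known to be presentable.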
Adapting this to the model category or $\infty$-category setting, one sees immediately that localizing with respect to homology theories is totally okay and follows formally from this type of argument. (Basically, after fiddling around with cells to prove the category of spectra is presentable, you don't have to fiddle any more to get localizations. This is in contrast to the usual argument found in Bousfield's paper. You've moved the cardinality bookkeeping into a general argument about homotopy limits of presentable categories.)
Anyway, apologies for the very idiosyncratic application of this language; these things have been on my mind recently. I'm sure there are much more elementary reasons why one would care about using the model category structure on Cat.
I am the author of that article in Inference. Mochizuki has explicitly said he is working with the truncation of the natural 2-categories of objects he wants to work with, for instance categories and isomorphism classes of functors, rather than categories, functors and natural transformations. This is a loss of information, even when two functors might be uniquely isomorphic.
As far as I can tell, this leads to a complication when one wants to treat diagrams in categories as being made up of specific objects, rather than isomorphism classes of objects: a diagram is a functor, after all. This leads to the 'solution' of considering only small subcategories $D \hookrightarrow C$ as diagrams in $C$. Up to cofinality (replacing $D$ by a (co)final subcategory), equivalence (one might need to replace $C$ by an equivalent category) and natural isomorphism (and finally the functor by a naturally isomorphic one), this is perfectly fine. But then if someone comes along who wasn't privy to this private fan dance, and who is ok with diagrams as functors, and in particular non-injective-on-objects functors, they will disagree that every node of the diagram must be unequal to every other node of the diagram, and you are going to disagree that one can have all nodes of the diagram equal, with no ill-effects.
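A small example of a diagram that is not injective on objects (my own illustration, not taken from the article): let $J$ be the category with two objects and two parallel arrows between them, and consider the functor $J\to C$ sending both objects to the same object $X$ and the two arrows to endomorphisms $f,g\colon X\to X$.

$$X \underset{g}{\overset{f}{\rightrightarrows}} X, \qquad \lim\big(J\to C\big) = \mathrm{eq}(f,g).$$

As a functor this is a perfectly good diagram with both nodes equal. To present it as a subcategory $D\hookrightarrow C$ with two distinct nodes, one first has to replace one copy of $X$ by an isomorphic object (possibly after passing to an equivalent category), which changes the limit only up to canonical isomorphism.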
Demanding that nodes of a diagram are unequal isn't a statement compatible with the principle of equivalence, since it is perfectly consistent with structural mathematics that the objects of a large category don't even have an equality predicate, or more prosaically, one cannot tell the difference between naturally isomorphic diagrams. Category theory is agnostic on whether objects are equal or not, as opposed to replacing equal things with unequal but isomorphic things. Mochizuki is very much using the language of category theory, but he is not doing category theory, nor is he working in the spirit of it, and certainly is not using higher category theory, even if that would in fact tighten up some of his argument (though not the exposition). Just because a computer scientist uses natural numbers, it doesn't mean they are doing number theory.