You make good points, and I am inclined to agree that the definition is strange.
But it works, because if you have a classifying topos that works for all infinitary pretoposes then it also works for all cocomplete toposes in particular.
If I recall correctly the concept of pretopos is not used in MacLane–Moerdijk, so maybe they say "cocomplete topos" instead as a compromise.
Incidentally, I believe the classifying toposes that arise in practice do work for all infinitary pretoposes.
One way of justifying this is to observe that every small set of objects in an infinitary pretopos is contained in a (not necessarily full) subcategory that is a Grothendieck topos and such that the inclusion preserves finite limits, small colimits, and is conservative.
Every "inverse image functor" from a Grothendieck topos will factor through such a subcategory (but the subcategory will depend on the functor), so the category of "inverse image functors" will be a union of categories of geometric morphisms between Grothendieck toposes.
Similarly, every model of the theory you want to classify will also be contained in such a subcategory, so the category of models will be a union of categories of models in Grothendieck toposes.
A very thorough paper discussing this topic can be found here.
To summarise the paper, a statement in the language of category theory is a first-order statement about a category which can quantify over objects and, given two objects $A, B$, over the arrows from $A$ to $B$, where the atomic propositions are always of the form $f = g$, and both $f$ and $g$ are arrows with the same domain and codomain. It is not permitted in this language to discuss equality of objects, nor can we discuss equality of arrows unless they syntactically have the same domain and codomain.
An example of such a statement would be $\forall A \exists f : A \to B \forall g : A \to B (f = g)$. This is the statement that $B$ is a terminal object.
Fix a terminal object $1$. The notation $x :\in A$ is shorthand for $x : 1 \to A$. Given $f : A \to B$ and $x :\in A$, the term $f(x) :\in B$ is shorthand for $f \circ x$.
The following statement expresses one part of the property of well-pointedness:
$$\forall A, B \forall f, g : A \to B, (\forall x :\in A, f(x) = g(x)) \to f = g$$
Note the above part of well-pointedness, if interpreted as a statement in normal set theory, is just the statement of function extensionality.
If we are talking about a category with enough structure, such as a topos, we can take a statement $\phi$ in the language of category theory and construct a new statement $X \Vdash \phi$ in the language of category theory. This translation is known as the “stack semantics”. If we restrict $\phi$ to only contain quantifiers over arrows and not over objects, we can interpret $\Vdash$ using the Kripke-Joyal semantics or any number of other methods and get an equivalent statement. The stack semantics is a generalisation of this.
We say a statement $\phi$ holds in a category $C$ externally if $\phi$, interpreted in the obvious way, is true about $C$. So for instance, $\forall A, B \forall f, g : A \to B, (\forall x :\in A, f(x) = g(x)) \to f = g$ is true externally in $C$ if and only if $1$ is a generator in $C$.
But we can also discuss what it means for $\phi$ to be true internally in $C$. This is exactly saying that $1 \Vdash \phi$ is true externally in $C$. In particular, $\forall A, B \forall f, g : A \to B, (\forall x :\in A, f(x) = g(x)) \to f = g$ is always true in $C$.
We have the following two results:
The following are equivalent: (a) for every statement $\phi$ which doesn’t involve quantifies over objects (but can take parameters in $C$), $\phi \iff 1 \Vdash \phi$ holds in $C$, (b) $C$ is (constructively) well-pointed
The following are equivalent: (a) for every statement $\phi$ with parameters in $C$, $\phi \iff 1 \Vdash \phi$ holds in $C$, (2) $C$ is (constructively) well-pointed and satisfies the axiom scheme of collection.
The axiom of choice is a good example of this phenomenon. The external axiom can be stated as $\phi :\equiv \ulcorner$For all epis $f : A \to B$, there is some $g : B \to A$ such that $f \circ g = 1_B \urcorner$. The internalisation $1 \Vdash \phi$ is equivalent to the statement that for all $f : A \to B$, the right adjoint to pullback $\Pi_f : C / A \to C / B$ preserves epis. If $C$ is well-pointed, the two statements are equivalent.
Best Answer
Let $\mathscr{C}$ be a category with an initial object $0$, such that the product $C\times 0$ exists for any object $C$ in $\mathscr{C}$. The following are equivalent:
For $2\Rightarrow 1$, the projection map $\pi_2\colon C\times 0\to 0$, coming from the definition of the product, is an isomorphism. So $C\times 0\cong 0$.
For $1\Rightarrow 2$, let $f\colon C\to 0$ be an arrow. Then we have an arrow $(\text{id}_C,f)\colon C\to C\times 0$ and the projection arrow $\pi_1\colon C\times 0\to C$ in the other direction. We have $\pi_1\circ (\text{id}_C,f) = \text{id}_C$, and $(\text{id}_C,f)\circ \pi_1 = \text{id}_{C\times 0}$, since $C\times 0$ is initial. Thus $C\cong C \times 0$, so $C$ is initial. It follows that $f$ is an isomorphism, since the unique arrow between two initial objects is automatically an isomorphism.
The hypothesis that $C\times 0$ exists for any object $C$ is actually not necessary, since if $0$ is a strict initial object, it follows that $C\times 0$ exists for all $C$.
Suppose $0$ is strict initial, and let $C$ be an arbitrary object. We'll check that $0$, with the unique projection maps $\pi_1\colon 0\to C$ and $\pi_2\colon 0\to 0$, satisfies the universal property of the product.
Suppose $X$ is an object and $f\colon X\to C$ and $g\colon X\to 0$ are two arrows. Since $0$ is strict initial, $g$ is an isomorphism, so $X$ is initial. Thus we can define $(f,g) = g\colon X\to 0$, and we have $\pi_1\circ (f,g) = f$ and $\pi_2\circ (f,g) = g$ automatically, since $X$ is initial.