About 1: Yes!
About 2: (Internal logic of Zariski topos) I don't think it has been done systematically. A glimpse of it is in Anders Kock, Universal projective geometry via topos theory, if I remember well, and certainly in some other places. But one point is that it is not at all easy to find formulas in the internal language which express what you have in mind. See my answer at "synthetic" reasoning applied to algebraic geometry
About 3:You can indeed glue all sorts of things:
Things fitting into the axiomatic framework of "geometric contexts":
Look at the "master course on Algebraic stacks" here: http://perso.math.univ-toulouse.fr/btoen/videos-lecture-notes-etc/
This one is great reading to understand the functorial point of view on schemes and manifolds!
Commutative Monoid objects in good monoidal (model) categories: http://arxiv.org/abs/math/0509684
Commutative monads (here you can glue monoids, semirings and other algebraic structures mixing them all): http://arxiv.org/abs/0704.2030
In Shai Haran's "Non-Additive Geometry" you can even glue the monoids and semirings etc. with relations (although I wouldn't know why)
You can also glue things "up to homotopy instead" of strictly - this is roughly what Lurie's infinity-topoi are about, and also the model catgeory part of the 2nd point, or any oter approaches to derived algebraic geometry
(Edit in 2017) The PhD thesis by Zhen Lin Low is relevant. "The main purpose of this thesis is to give a unified account of this procedure of constructing a category of spaces built from local models and to study the general properties of such categories of spaces."
One of several good points of view on what a Grothendieck topology does, is to say it determines which colimits existing in your site should be preserved under the Yoneda embedding, i.e. what glueing takes already place among the affine objects. So, if you insist on glueing groups it could be a good idea to look e.g. for a topology which takes amalgamated products (for me this means glueing groups, you may want only selected such products, e.g. along injective maps) to pushouts of sheaves...
Then feel free to develop a theory on this and send me a copy!
About 4: (Why don't people work with sheaves instead of schemes) They do. One situation where they do is when taking the quotient of a scheme by a group action. The coequalizer in the category of schemes is often too degenerate. One answer is taking the coequalizer in the category of sheaves, the "sheaf quotient" (but sometimes better answers are GIT quotients and stack quotients).
My short answer, is that in the great majority of situation where this $\mathcal{Y}/f^*$ plays a role, it is a mistake to see it as a topos. There are exceptions, but most of the time it is not meant to be a topos, but rather a $\mathcal{X}$-indexed topos.
To clarify the discussion, I will follow Joyal's terminology:
I'm calling "Logos" what we usually call a Grothendieck topos. Morphisms of logos are the continuous left exact functor, i.e. the $f^*$.
Toposes are the object of the opposite category of the category of logos. If $\mathcal{X}$ is a topos I denote by $Sh(\mathcal{X})$ the corresponding logos, which I think of as the category of sheaves of sets over $\mathcal{X}$. If I say "$x \in \mathcal{X}$" I mean that $x$ is a point (or maybe a generalized point) of $\mathcal{X}$.
This is mean't to mimic the picture of the conection between frames and locales (where the frame corresponding to a locales is denoted by $\mathcal{O}(\mathcal{X})$.
Now, The topos corresponding to the logos $Sh(\mathcal{Y})/f^*$ is the following object:
$$ \left(\mathbb{S} \times \mathcal{Y} \right) \coprod_{\mathcal{Y}} \mathcal{X}$$
where $\mathbb{S}$ is the Sierpinski topos, i.e. corresponding to the topological space with one closed point and one open point. (this is easily seen using that colimits of toposes corresponds to limits of logos, which are justs limits of categories)
So you can think of it as a kind of "cone construction" where the Sierpinski space is used as an interval. This object can indeed be interesting, and you can somehow guess that it will be especially interesting for the theory of local morphisms as you mentioned in your post.
But, in my opinion, this topos as simply nothing to do with the idea of working with $\mathcal{Y}$ as "an object over $\mathcal{X}$", i.e. to try to think of the map $\mathcal{Y} \rightarrow \mathcal{X}$ by somehow looking the fiber $\mathcal{Y}_x$ for $x \in \mathcal{X}$ and how it varies when $x$ varies in $\mathcal{X}$. Which is what we want to do in all the case you mentioned:
A proper map is a map whose fiber are compacts, in a "nicely locally uniforme way""
A locally connected map is map whose fiber are locally connected in a nicely locally uniform way.
A separated map... well you can actually also see it as a map whose fiber are separated in a locally uniforme way, but that is not quite what the definition you gave really says. The way I think about this definition is that when you have a class of map stable under composition and pullback then it is nice to consider the class of maps whose diagonals have this property, because you then get for free the lemma that "if $f \circ g$ is proper and $f$ is separated then $g$ is proper".
Let's now look at what happen when you want to work with $\mathcal{Y}$ as an object over $\mathcal{X}$ :
Essentially, instead of looking at $Sh(\mathcal{Y})$ you want to look at something like $Sh(\mathcal{Y}_x)$ for all $x \in \mathcal{X}$, which should give you some kind of family of logoses parametrized by $x \in \mathcal{X}$.
In the same way that the correct notion of "familly of set parametrized by $x \in \mathcal{X}$ is a sheaves over $\mathcal{X}$ the correct notion of such familly of logos is something like "a sheaves of logos". It is not quite just a "sheaves in the category of logos" because the definition of logos involved some infinitary operations (the infinite coproduct/colimits) which you want to replace by $Sh(\mathcal{X})$-indexed colimits/coproducts. So the correct notion is what I would call an "internal logos" and is a special kind of sheaf of logos (It is exactly a sheaf of logos which admits $Sh(\mathcal{X})$-indexed disjoint and universal coproducts).
Also note that these sheaf of logos in particular gives you the type of structure that you asked about in your last question. (they satisfies stronger property though, like Beck Cheavely conditions related to the fact that they have indexed colimits)
Then for technical reason, we tend to look at sheaves of categories rather as indexed categories or fibered categories, that is why you end up with a fibration over $Sh(\mathcal{X})$.
But if you somehow forget that it was mean't to be a "sheaf of logos" over $Sh(\mathcal{X})$ and see the total category of the fibration as a new logos, then you just get a completely different and new object that have very little to do with what I was describing.
I'm finishing with an informal discussion of why the $\mathcal{X}$-indexed logos corresponding to $\mathcal{Y} \rightarrow \mathcal{X}$ should be $Sh(\mathcal{Y})/f^*$. This is not a proof, just some kind of heuristic arguement. The proof is justs that there is a relatively deep theorem saying that the category of $Sh(\mathcal{X})$-indexed logoses is equivalent to the category of toposes over $\mathcal{X}$ and that the equivalence is given by this construction, and is compatible with pullbacks along maps $\mathcal{X'} \rightarrow \mathcal{X}$.
So What should be the "sheaf of logos" (or internal logos) corresponding to $\mathcal{Y} \rightarrow \mathcal{X}$. I need to think about "what should be its section over some étale cover $p: \mathcal{E} \rightarrow \mathcal{X}$".
I want something that, in a continuous way associate to each $e \in \mathcal{E}$ a sheaves over $\mathcal{Y}_p(e)$, i.e. something that continuously in $e \in \mathcal{E}$ and $y \in \mathcal{Y}_{p(e)}$ associate a set. So basically it is a sheaf of set over $\mathcal{Y} \times_{\mathcal{X}} \mathcal{E}$.
Now if $\mathcal{E}$ is the etale space of a sheaves $ E \in Sh(\mathcal{X})$, one has that $Sh(\mathcal{E}) = Sh(\mathcal{X})/E$, and $Sh(\mathcal{Y} \times_{\mathcal{X}} \mathcal{E}) = Sh(\mathcal{Y})/f^* E$
So in the end you do get the fibered category we are talking about.
Best Answer
There are known statements that are true in any Grothendieck topos, but not in every elementary topos with NNO. For instance:
Freyd's theorem that a complete small category is a preorder is not constructively provable and can fail in elementary toposes with NNO, such as the effective topos; but can be shown to hold in any Grothendieck topos by an "external" argument (see this question).
Axioms that assert things like "the axiom of choice fails in at most a small (i.e. set-indexed rather than proper-class-indexed) way" often tend to be true in Grothendieck toposes (at least when defined over ZFC), but can fail in elementary toposes that correspond to things like "permutation models over proper classes". Some "small choice" axioms are WISC, AMC, SVC, and SCSA, but I don't know offhand the situation for all of these regarding their truth in Grothendieck topoi.
An axiom scheme that holds in all Grothendieck toposes, but not all elementary toposes with NNO, is "autology" (representability of large quantification in the stack semantics) as defined in this paper. This is a sort of "replacement axiom" for topos theory, and fails in models like $V_{\omega\cdot 2}$.
A Grothendieck topos (defined over ZFC) also has all higher inductive types, including in particular localizations at any set of maps, and free algebras for all (even infinitary) algebraic theories. These can also fail in elementary toposes with NNO, such as a model of ZF in which $\omega$ is the only infinite regular cardinal (see Blass's paper "Words, free algebras, and coequalizers").