Let KP0 be the induction-free fragment of KP. That is, the axioms of KP0 are extensionality, pairing, union, Δ0-separation, Δ0-collection, and (since I see no way of proving this from the other axioms) the existence of transitive closures.
Following Barwise, I define the class of Σ-formulas to be the smallest class which contains atomic formulas and their negations, and which is closed under conjunction, disjunction, bounded quantification, and unbounded existential quantification. Similarly, I define the of Π-formulas to be the smallest class which contains atomic formulas and their negations, and which is closed under conjunction, disjunction, bounded quantification, and unbounded universal quantification. The negation of a Σ-formula φ is canonically equivalent to a Π-formula and vice versa; I will identify ¬φ with its canonical equivalent without further comment.
The following three facts are standard in KP. The usual proofs actually go through in KP0.
Σ-Reflection. For every Σ-formula KP0 ⊦ φ ↔ ∃aφ(a), where φ(a) is obtained from φ by replacing every unbounded quantifier ∃x by the corresponding bounded quantifier ∃x∈a.
Σ-Collection. For every Σ-formula φ(u,v),
KP0 ⊦ ∀u∈x∃vφ(u,v) →∃y∀u∈x∃v∈yφ(u,v).
Δ-Separation. For every Σ-formula φ(u) and every Π-formula ψ(u),
KP0 ⊦ ∀u∈x(φ(u) ↔ ψ(u)) → ∃y∀u∈x(u ∈ y ↔ φ(u)),
where y does not occur free in φ nor ψ.
Instead of Mike Shulman's foundation axiom, I will use the classically equivalent regularity axiom:
(Reg) x ≠ ∅ → ∃y∈x(x∩y = ∅).
This axiom already implies a certain amount of induction because of Δ-separation.
Δ-Induction. For every Σ-formula φ(u) and every Π-formula ψ(u),
KP0 + Reg ⊦ ∀u(φ(u) ↔ ψ(u)) → (∀u(∀v∈uφ(v) → φ(u)) → ∀uφ(u)).
For convenience, I will now assume that the language contains a function symbol rk(x) for the ordinal rank of a set. Let A be a model of KP0 + Reg. A cut in A is a proper initial segment I of the ordinals of A which has no least upper bound; a Σ-cut (resp. Π-cut) is one which can be defined by a Σ-formula (resp. Π-formula).
Σ-Cut Lemma. If A is a model of KP0 + Reg and I is a Σ-cut in A, then AI = {x ∈ A : rk(x) ∈ I} is a model of KP0 + Reg.
Sketch of proof. The main point of contention is Δ0-collection. Suppose that φ(u,v) is a Δ0-formula such that AI ⊧ ∀u∈x∃vφ(u,v). Consider the set of ordinals
J = {α ∈ A : ∃u∈x∀v(rk(v) < α → ¬φ(u,v))}.
This is a Π-definable initial segment of I. We cannot have I = J,
since that would violate Δ-induction. Pick β ∈ I \ J. Working in A, we can find a set y such that
∀u∈x∃v∈y(rk(v) < β ∧ φ(u,v)).
If necessary, we can cut down y so that rk(y) ≤ β and hence y ∈ AI. ∎
If A has a minimal Σ-cut, then this gives a model of KP0 that satisfies Σ-induction. Unfortunately, there are models of KP0 + Reg without minimal Σ-cut.
Now, on the positive side a minor twist of the proof of the Σ-cut lemma gives the following.
Π-Cut Lemma. If A is a model of KP0 that satisfies Π-induction and I is any cut of A, then AI is a model of KP0.
(Π-induction implies that the Π set J defined as above must have a least upper bound β ∈ I. The rest of the proof is identical.)
So when A satisfies Π-induction, we can take a sufficiently small cut of A to get a model of KP (with full induction). In fact, we can take the maximal wellfounded initial segment of the ordinals of A, with the caveat that this cut is very often just ω.
When I have a chance, I'll try to add examples showing that you can't do better than the Σ-cut lemma and the Π-cut lemma.
Addendum. I just stumbled across two papers that address these questions. The first paper is by Domenico Zambella Foundation versus induction in Kripke-Platek set theory [J. Symb. Logic 63 (1998), MR1665739] where it is shown that foundation implies open-induction, but not Σ1-induction over KP0 (without transitive closures). This is complemented by Antonella Mancini and Domenico Zambella A Note on Recursive Models of Set Theories [Notre Dame J. Formal Logic 42 (2001), MR1993394] where they generalize Tennenbaum's Theorem by showing that fragments of KP which prove Σ1-induction have no computable models other than the standard one.
Best Answer
For an ultrapower of $V$ by an ultrafilter $\mathcal{U}$ there is an exact characterization of when the ultrapower will be well-founded: precisely when $\mathcal{U}$ is closed under countable intersections.
As for how a model $M$ could possibly satisfy regularity but not be wellfounded, the problem is that there may be infinite descending chains that $M$ cannot 'see': each individual object may belong to $M$ but the chain itself may not. I can be a little more precise. Let's say $R$ is what $M$ understands to be the $\in$-relation. There may well be $x_0,\ldots x_n,\ldots $ belonging to $M$ so that $x_{n+1}Rx_n$ for all $n\in\omega$; as long as the sequence $\langle x_n:n\in\omega\rangle$ does not belong to $M$ the axiom of regularity from $M$'s point of view need not be violated.
You don't need ultrapowers to construct such models. Assuming CON(ZFC) you can build one using the compactness theorem for first order logic.