What you seem to be missing is the early history. You can check the paper by Fienberg (2006), *When Did Bayesian Inference Become "Bayesian"?*. First, he notes that Thomas Bayes was the first to suggest using a uniform prior:
> In current statistical language, Bayes' paper introduces a uniform
> prior distribution on the binomial parameter, $\theta$, reasoning by
> analogy with a "billiard table" and drawing on the form of the
> marginal distribution of the binomial random variable, and not on the
> principle of "insufficient reason," as many others have claimed.
Pierre-Simon Laplace was the next person to discuss it:
> Laplace also articulated, more clearly than Bayes, his argument for
> the choice of a uniform prior distribution, arguing that the posterior
> distribution of the parameter $\theta$ should be proportional to what
> we now call the likelihood of the data, i.e.,
> $$ f(\theta\mid x_1,x_2,\dots,x_n) \propto f(x_1,x_2,\dots,x_n\mid\theta) $$
> We now understand that this implies that the prior distribution for
> $\theta$ is uniform, although in general, of course, the prior may not
> exist.
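To make Laplace's statement concrete, here is a minimal numerical sketch in Python (with made-up binomial data) of the case Bayes himself considered: under a uniform $\mathrm{Beta}(1,1)$ prior, the posterior density is exactly proportional to the likelihood.

```python
import numpy as np
from scipy import stats

# Made-up binomial data: x successes out of n trials.
n, x = 20, 7

theta = np.linspace(0.001, 0.999, 999)

# Likelihood of the data as a function of theta.
likelihood = stats.binom.pmf(x, n, theta)

# Under a uniform Beta(1, 1) prior, the posterior is Beta(x + 1, n - x + 1).
posterior = stats.beta.pdf(theta, x + 1, n - x + 1)

# The ratio posterior / likelihood is constant in theta, i.e.
# f(theta | x) is proportional to f(x | theta), as Laplace argued.
ratio = posterior / likelihood
print(np.allclose(ratio, ratio[0]))  # True
```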
Moreover, Carl Friedrich Gauss also referred to using an uninformative prior, as noted by David and Edwards (2001) in their book *Annotated Readings in the History of Statistics*:
> Gauss uses an ad hoc Bayesian-type argument to show that the posterior
> density of $h$ is proportional to the likelihood (in modern
> terminology):
> $$ f(h\mid x) \propto f(x\mid h) $$
> where he has assumed $h$ to be uniformly distributed over $[0,
> \infty)$. Gauss mentions neither Bayes nor Laplace, although the
> latter had popularized this approach since Laplace (1774).
and, as Fienberg (2006) notes, "inverse probability" (and, with it, the use of uniform priors) was popular around the turn of the twentieth century:
> [...] Thus, in retrospect, it shouldn't be surprising to see inverse
> probability as the method of choice of the great English statisticians
> of the turn of the century, such as Edgeworth and Pearson. For
> example, Edgeworth (49) gave one of the earliest derivations of what
> we now know as Student's $t$-distribution, the posterior distribution
> of the mean $\mu$ of a normal distribution given uniform prior
> distributions on $\mu$ and $h =\sigma^{-1}$ [...]
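That classical result is easy to check numerically. Below is a small Monte Carlo sketch in Python (with simulated data; everything here is illustrative) using the closely related prior $\pi(\mu,\sigma^2)\propto 1/\sigma^2$: the standardized marginal posterior of $\mu$ matches a Student's $t$ with $n-1$ degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data (illustrative): n observations from a normal distribution.
x = rng.normal(loc=10.0, scale=3.0, size=25)
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

# Under the prior pi(mu, sigma^2) ∝ 1/sigma^2 the posterior factorizes as
#   (n - 1) * s^2 / sigma^2 | x  ~  chi^2_{n-1}
#   mu | sigma^2, x              ~  Normal(xbar, sigma^2 / n)
draws = 200_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=draws)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# Standardized posterior draws of mu should follow Student's t_{n-1}.
t_draws = (mu - xbar) / np.sqrt(s2 / n)
print(stats.kstest(t_draws, stats.t(df=n - 1).cdf))  # large p-value expected
```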
The early history of the Bayesian approach is also reviewed by Stigler (1986) in his book *The History of Statistics: The Measurement of Uncertainty before 1900*.
In your short review you also do not seem to mention Ronald Aylmer Fisher (again quoting Fienberg, 2006):
> Fisher moved away from the inverse methods and towards his own
> approach to inference he called the "likelihood," a concept he claimed
> was distinct from probability. But Fisher's progression in this regard
> was slow. Stigler (164) has pointed out that, in an unpublished
> manuscript dating from 1916, Fisher didn't distinguish between
> likelihood and inverse probability with a flat prior, even though when
> he later made the distinction he claimed to have understood it at this
> time.
Jaynes (1986) provided his own short review paper, *Bayesian Methods: General Background. An Introductory Tutorial*, which you could check, but it does not focus on uninformative priors. Moreover, as noted by AdamO, you should definitely read *The Epic Story of Maximum Likelihood* by Stigler (2007).
It is also worth mentioning that, strictly speaking, there is no such thing as an "uninformative prior," so many authors prefer to talk about "vague priors" or "weakly informative priors".
A theoretical review is provided by Kass and Wasserman (1996) in *The Selection of Prior Distributions by Formal Rules*, who go into greater detail about choosing priors, with an extended discussion of the use of uninformative priors.
By Bayes' theorem,
$$ \text{posterior} \propto \text{prior} \times \text{likelihood} $$
so the posterior combines information that comes from your data (through the likelihood) with information that comes from your prior; there is no posterior without a prior. In the case of the quote you provided, a closed-form posterior distribution is available because the Wishart distribution is a conjugate prior for the precision matrix of a multivariate normal distribution, so the posterior has a "standalone" closed form.
This means that for such a model you do not need to resort to MCMC simulation, since the posterior is known in closed form.
This does not mean that no prior is used here; we have already used it. On the other hand, if you do not want to use a prior at all, then you should use maximum likelihood estimation rather than Bayesian estimation.
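To illustrate why no sampling is needed in the conjugate case, here is a minimal sketch in Python (the data, the known mean, and the prior hyperparameters are all made up) of the standard Wishart update for the precision matrix $\Lambda$ of a multivariate normal with known mean: the posterior is again a Wishart with updated parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: n draws from a bivariate normal with known mean mu.
mu = np.array([0.0, 0.0])
n = 50
x = rng.multivariate_normal(mu, [[2.0, 0.5], [0.5, 1.0]], size=n)

# Made-up Wishart prior on the precision matrix: Lambda ~ W(nu0, W0).
nu0 = 3.0
W0 = np.eye(2)

# Conjugate update: with S = sum_i (x_i - mu)(x_i - mu)^T, the posterior is
#   Lambda | x ~ Wishart(nu0 + n, (W0^{-1} + S)^{-1})
S = (x - mu).T @ (x - mu)
nu_n = nu0 + n
W_n = np.linalg.inv(np.linalg.inv(W0) + S)

# The posterior mean of the precision matrix is available in closed form,
# so no MCMC is required:
print(nu_n * W_n)  # E[Lambda | x] = nu_n * W_n
```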
Taking your example and adjusting it slightly to $\pi(\mu,\sigma^2)\propto\frac{1}{\sigma^2}$, similar to Wikipedia's example:

* An argument that this prior is non-informative is that it is location-invariant and scale-invariant (uniform on the logarithmic scale), so that, for example, it gives equal prior weight to all possible values of the mean and your results will be indifferent to the units of measurement (such as millimetres or kilometres).
* An argument that this prior is informative is that it suggests you think the mean is more likely to be greater in distance from $0$ than any particular large value you state, and that you think it more likely than not that the variance is either smaller than any particular small value you state or greater than any particular large value you state; in other words, it embodies the information that you believe it more likely than not that the mean and variance will be extreme to an incredible degree.
By the time you have some actual observations, these are less likely to be substantial arguments: with enough data, most moderately sensible priors produce broadly similar posterior distributions in most cases.
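A small sketch of that last point in Python (simulated data; for simplicity $\sigma$ is treated as known, and the weakly informative prior is an arbitrary choice): the posteriors for $\mu$ under a flat prior and under a $\mathrm{Normal}(0, 10^2)$ prior become practically indistinguishable as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 3.0            # sigma treated as known, for simplicity
m0, tau0 = 0.0, 10.0   # arbitrary weakly informative Normal(m0, tau0^2) prior

for n in (5, 50, 5000):
    x = rng.normal(2.0, sigma, size=n)
    xbar = x.mean()

    # Flat (improper uniform) prior on mu: posterior is Normal(xbar, sigma^2 / n).
    flat_mean, flat_sd = xbar, sigma / np.sqrt(n)

    # Conjugate Normal(m0, tau0^2) prior: precision-weighted update.
    prec = 1 / tau0**2 + n / sigma**2
    conj_mean = (m0 / tau0**2 + n * xbar / sigma**2) / prec
    conj_sd = np.sqrt(1 / prec)

    # The two posteriors converge as the data accumulate.
    print(n, round(flat_mean - conj_mean, 5), round(flat_sd - conj_sd, 5))
```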