My personal opinion is that one should consider the 2-category of categories, rather than the 1-category of categories. I think the axioms one wants for such an "ET2CC" will be something like:
- Firstly, some exactness axioms amounting to its being a "2-pretopos" in the sense I described here: http://ncatlab.org/michaelshulman/show/2-categorical+logic . This gives you an "internal logic" like that of an ordinary (pre)topos.
- Secondly, the existence of certain exponentials (this is optional).
- Thirdly, the existence of a "classifying discrete opfibration" $el\to set$ in the sense introduced by Mark Weber ("Yoneda structures from 2-toposes") which serves as "the category of sets," and internally satisfies some suitable axioms.
- Finally, a "well-pointedness" axiom saying that the terminal object is a generator, as is the case one level down with in ETCS. This is what says you have a 2-category of categories, rather than (for instance) a 2-category of stacks.
Once you have all this, you can use finite 2-categorical limits and the "internal logic" to construct all the usual concrete categories out of the object "set". For instance, "set" has finite products internally, which means that the morphisms $set \to 1$ and $set \to set \times set$ have right adjoints in our 2-category Cat (i.e. "set" is a "cartesian object" in Cat). The composite $set \to set\times set \to set$ of the diagonal with the "binary products" morphism is the "functor" which, intuitively, takes a set $A$ to the set $A\times A$. Now the 2-categorical limit called an "inserter" applied to this composite and the identity of "set" can be considered "the category of sets $A$ equipped with a function $A\times A\to A$," i.e. the category of magmas.
Now we have a forgetful functor $magma \to set$, and also a functor $magma \to set$ which takes a magma to the triple product $A\times A \times A$, and there are two 2-cells relating these constructed from two different composites of the inserter 2-cell defining the category of magmas. The "equifier" (another 2-categorical limit) of these 2-cells it makes sense to call "the category of semigroups" (sets with an associative binary operation). Proceeding in this way we can construct the categories of monoids, groups, abelian groups, and eventually rings.
A more direct way to describe the category of rings with a universal property is as follows. Since $set$ is a cartesian object, each hom-category $Cat(X,set)$ has finite products, so we can define the category $ring(Cat(X,set))$ of rings internal to it. Then the category $ring$ is equipped with a forgetful functor $ring \to set$ which has the structure of a ring in $Cat(ring,set)$, and which is universal in the sense that we have a natural equivalence $ring(Cat(X,set)) \simeq Cat(X,ring)$. The above construction then just shows that such a representing object exists whenever Cat has suitable finitary structure.
One can hope for a similar elementary theory of the 3-category of 2-categories, and so on up the ladder, but it's not as clear to me yet what the appropriate exactness properties will be.
Wikipedia also says that Troelstra said in 1988 that there were no satisfactory foundations for ultrafinitism. Is this still true? Even if so, are there any aspects of ultrafinitism that you can get your hands on coming from a purely classical perspective?
There are no foundations for ultrafinitism as satisfactory for it as (say) intuitionistic logic is for constructivism. The reason is that the question of what logic is appropriate for ultrafinitism is still an open one, for not one but several different reasons.
First, from a traditional perspective -- whether classical or intuitionistic -- classical logic is the appropriate logic for finite collections (but not K-finite). The idea is that a finite collection is surveyable: we can enumerate and look at each element of any finite collection in finite time. (For example, the elementary topos of finite sets is Boolean.) However, this is not faithful to the ultra-intuitionist idea that a sufficiently large set is impractical to survey.
So it shouldn't be surprising that more-or-less ultrafinitist logics arise from complexity theory, which identifies "practical" with "polynomial time". I know two strands of work on this. The first is Buss's work on $S^1_2$, which is a weakening of Peano arithmetic with a weaker induction principle:
$$A(0) \land (\forall x.\;A(x/2) \Rightarrow A(x)) \Rightarrow \forall x.\;A(x)$$
Then any proof of a forall-exists statement has to be realized by a polynomial time computable function. There is a line of work on bounded set theories, which I am not very familiar with, based on Buss's logic.
The second is a descendant of Bellantoni and Cook's work on programming languages for polynomial time, and Girard's work on linear logic. The Curry-Howard correspondence takes functional languages, and maps them to logical systems, with types going to propositions, terms going to proofs, and evaluation going to proof normalization. So the complexity of a functional program corresponds in some sense to the practicality of cut-elimination for a logic.
IIRC, Girard subsequently showed that for a suitable version of affine logic, cut-elimination can be shown to take polynomial time. Similarly, you can build set theories on top of affine logic. For example, Kazushige Terui has since described a set theory, Light Affine Set Theory, whose ambient logic is linear logic, and in which the provably total functions are exactly the polytime functions. (Note that this means that for Peano numerals, multiplication is total but exponentiation is not --- so Peano and binary numerals are not isomorphic!)
The reason these proof-theoretic questions arise, is that part of the reason that the ultra-intuitionist conception of the numerals makes sense, is precisely because they deny large proofs. If you deny that large integers exist, then a proof that they exist, which is larger than the biggest number you accept, doesn't count! I enjoyed Vladimir Sazonov's paper "On Feasible Numbers", which explicitly studies the connection.
I should add that I am not a specialist in this area, and what I've written is just the fruits of my interest in the subject -- I have almost certainly overlooked important work, for which I apologize.
Best Answer
Here is another approach, which I believe I first learned from Toby Bartels. Suppose $X$ is an arbitrary differentiable manifold (think of the state space of some physical system), and define a variable (one might also say "observable") to be a smooth real-valued function on $X$. If $x:X\to \mathbb{R}$ is such a "variable", then its differential is, as usual in differential geometry, a smooth function ${\rm d}x:T X \to \mathbb{R}$ on the tangent bundle of $X$. We also have the tangent map $T x : T X \to T\mathbb{R} \cong \mathbb{R}\times\mathbb{R}$, with $T x = (x, {\rm d}x)$.
If $y:X\to \mathbb{R}$ is another such "variable", then it might be related to $x$ by an equation such as $y = x^2$ or $x^2 + y^2 = 4$. Being equalities of real-valued functions, these are pointwise equalities. If $y= x^2$, then we can say that "$y$ is a function of $x$" in the sense that there is a function $f:\mathbb{R}\to\mathbb{R}$ such that $y = f\circ x$, namely $f = \lambda u. u^2$ (see this question). In this case, the chain rule of differential geometry tells us that $T y:T X \to T \mathbb{R}$ is the composite $T X \xrightarrow{T x} T \mathbb{R} \xrightarrow{T f} T \mathbb{R}$. Since $T f (u,v) = (f(u), f'(u) \cdot v)$, this means that (in addition to $y = f\circ x$) we have ${\rm d}y = f'(x) \cdot {\rm d}x$. This is a simple pointwise equality of functions $T X \to \mathbb{R}$, so we can divide by ${\rm d} x$ (at least assuming it is never zero) to get $f'(x) = \frac{{\rm d}y}{{\rm d}x}$, or in this case $\frac{{\rm d}y}{{\rm d}x} = 2x$.
Similarly, if $x^2+y^2=4$, then $y$ is not a function of $x$ in this sense, but $x^2+y^2$ and $4$ are two smooth functions $X\to \mathbb{R}$, where the first is expressed as a composite $$X\xrightarrow{(x,y)} \mathbb{R}\times\mathbb{R} \xrightarrow{\lambda u v. u^2+v^2} \mathbb{R}.$$ Thus the chain rule of differential geometry again gives us $2 x \,{\rm d}x + 2 y \,{\rm d}y = 0$ as a pointwise equality of functions $T X \to \mathbb{R}$, so that we can solve it as usual in elementary calculus to get $\frac{{\rm d}y}{{\rm d}x} = -\frac{x}{y}$.