The unitalization approach can be made to work.
Let $C_K = \{ (r,s) \in R \times R \mid r-s \in K \}$ be the congruence defined by an ideal $K$.
Then we have three maps defined on $S = C_I \otimes_R C_J$:
- $\pi_0 : S \to C_I$ induced by the first projection $C_J \to R$ and the identity on $C_I$
- $\pi_1 : S \to C_J$ induced by the first projection $C_I \to R$ and the identity on $C_J$
- $\mu : S \to R \times R$ induced by the inclusions $C_I \to R\times R$ and $C_J \to R \times R$, followed by multiplication in $R \times R$.
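To make the three maps concrete, here is how they act on a simple tensor $(a,b)\otimes(c,d)$ with $(a,b)\in C_I$ and $(c,d)\in C_J$. This is a sketch that reads "induced by" as the usual $R$-bilinear extension and takes $\mu$ to be multiplication in $R\times R$:

$$\begin{aligned}
\pi_0\big((a,b)\otimes(c,d)\big) &= c\,(a,b) = (ca,\ cb) \in C_I,\\
\pi_1\big((a,b)\otimes(c,d)\big) &= a\,(c,d) = (ac,\ ad) \in C_J,\\
\mu\big((a,b)\otimes(c,d)\big) &= (a,b)\cdot(c,d) = (ac,\ bd) \in R\times R.
\end{aligned}$$

The first two really do land in $C_I$ and $C_J$, since $cb - ca = c(b-a) \in I$ and $ad - ac = a(d-c) \in J$.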
Letting $\Delta \subseteq R \times R$ be the image of the diagonal, define $T = \{ x \in S \mid \pi_0(x) \in \Delta \wedge \pi_1(x) \in \Delta \}$. I claim that $\mu(T) = C_{IJ}$.
On the level of $R$-modules, we have isomorphisms $C_K \cong R \oplus K$, for example $(r,s) \mapsto (r, s-r)$, and so we have
$$ S \cong R \oplus I \oplus J \oplus (I \otimes_R J) $$
In this form, the maps $\pi_i$ become projections onto the relevant summands, so $T$ is precisely the submodule $R \oplus (I \otimes_R J)$. This eliminates the $I$ and $J$ summands you were having trouble with.
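For a concrete view of this decomposition, write elements of $C_I$ and $C_J$ as $(r, r+i)$ and $(s, s+j)$ with $i \in I$, $j \in J$. Using $(r,s)\mapsto(r,s-r)$ together with $R\otimes_R J \cong J$ and $I \otimes_R R \cong I$, a simple tensor and its images under $\pi_0$, $\pi_1$ work out as follows (a sketch, assuming $R$ commutative):

$$\begin{aligned}
(r, r+i)\otimes(s, s+j) &\longmapsto (rs,\ si,\ rj,\ i\otimes j) \in R\oplus I\oplus J\oplus(I\otimes_R J),\\
\pi_0\big((r,r+i)\otimes(s,s+j)\big) &= (rs,\ rs+si) \longleftrightarrow (rs,\ si) \in R\oplus I,\\
\pi_1\big((r,r+i)\otimes(s,s+j)\big) &= (rs,\ rs+rj) \longleftrightarrow (rs,\ rj) \in R\oplus J.
\end{aligned}$$

Since $\Delta \cap C_K$ corresponds to $R \oplus 0$ under $C_K \cong R\oplus K$, the condition $\pi_0(x)\in\Delta$ says exactly that the $I$-component of $x$ vanishes, and $\pi_1(x)\in\Delta$ says exactly that its $J$-component vanishes.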
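By splitting the $R$-module maps, $T$ is generated as an $R$-module by elements of the form
$$ (r, r+i) \otimes (s, s+j) - (0,i) \otimes (s,s) - (r,r) \otimes (0,j) $$
with $r, s \in R$, $i \in I$, $j \in J$. Applying $\mu$ to such an element gives $(rs,\ rs + rj + is + ij) - (0, is) - (0, rj) = (rs, rs + ij)$. Every such element lies in $C_{IJ}$; taking $r = s = 1$, $i = j = 0$ gives $(1,1)$, and taking $r = s = 0$ gives $(0, ij)$. Since these generate $C_{IJ}$ as an $R$-module, $\mu(T) = C_{IJ}$ as claimed.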
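As a quick numerical sanity check of that computation, here is a minimal Python sketch over $R = \mathbb{Z}$ with the illustrative (assumed) choice $I = 2\mathbb{Z}$, $J = 3\mathbb{Z}$, so $IJ = 6\mathbb{Z}$. It takes the third term of the generator to be $(r,r)\otimes(0,j)$, which is what makes the result come out to $(rs, rs+ij)$. The helper names `mu`, `generator_image` and `in_C` are hypothetical, and this only spot-checks the generators on a finite range; it is not a proof.

```python
# Spot-check of mu on the generators of T, over R = Z.
# Assumed example: I = 2Z, J = 3Z, hence IJ = 6Z.
# An element (a, b) of C_K = {(a, b) : a - b in K} is stored as a tuple.

def mu(x, y):
    """mu on a simple tensor x ⊗ y: multiply the pairs componentwise in R x R."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)

def sub(p, q):
    """Componentwise difference in R x R."""
    return (p[0] - q[0], p[1] - q[1])

def generator_image(r, s, i, j):
    """mu applied to (r, r+i)⊗(s, s+j) - (0, i)⊗(s, s) - (r, r)⊗(0, j)."""
    full = mu((r, r + i), (s, s + j))
    i_part = mu((0, i), (s, s))
    j_part = mu((r, r), (0, j))
    return sub(sub(full, i_part), j_part)

def in_C(pair, modulus):
    """Membership in C_K for the ideal K = modulus * Z."""
    return (pair[0] - pair[1]) % modulus == 0

I_GEN, J_GEN = 2, 3  # hypothetical generators of I and J
for r in range(-4, 5):
    for s in range(-4, 5):
        for a in range(-3, 4):
            for b in range(-3, 4):
                i, j = I_GEN * a, J_GEN * b
                img = generator_image(r, s, i, j)
                assert img == (r * s, r * s + i * j)
                assert in_C(img, I_GEN * J_GEN)
print("all checks passed")
```

The choice of $\mathbb{Z}$ is only for convenience: membership in $C_{6\mathbb{Z}}$ reduces to a divisibility test, so every image can be checked directly.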
If you're like Bill Lawvere, you will have no trouble doing everything with categories. Also, it may seem like you've gone some way in learning math, but trust me, there's much more to come. So don't worry about your background, what courses you might not have taken, or whether one view of a subject is better than another. Just keep gobbling up everything that comes in your path, and when in doubt as to which way to do something, do it both ways.
If, however, you're a problem solver, you should just learn combinatorics or some other subject rife with insanely hard yet easily understood problems, and live happily ever after.
I didn't have time to write up a proper answer when I first saw your question, and much of what I have to say has already been said in the comments and in Tim's answer, but I'll still offer some specifics on the things mentioned in the comments.
The definition of a set is, as Tim Campion pointed out, the axioms of whatever set theory you're working in. They do not so much define a set as describe a primitive notion that behaves the way we think sets should, with different axiomatizations giving rise to differing notions of set.
These differing axiomatizations do have an impact on category theory because they change the behavior of ${\bf Set}$, and consequently the behavior of presheaf categories, and they also impact our ability to manipulate large categories. As mentioned by Ingo Blechschmidt in the comments, Mike Shulman has an excellent paper surveying some of these consequences. I will summarize some of them here, but I highly recommend you check out his paper.
A striking result referenced in the Shulman paper is due to Colin McLarty, who showed that under Quine's New Foundations (NF) axiomatization of set theory, ${\bf Set}$ is not Cartesian closed.
In ZFC we really only run into issues if we want to manipulate large categories as a whole, for example ${\bf Set}$ or ${\bf Group}$, which are not actually objects in ZFC since they're proper classes. We can get around this with shenanigans about formulas in the metalanguage, but anyone looking for an integrated and 'natural' treatment of large categories on level footing with small ones will be disappointed in this setting.
NBG (von Neumann–Bernays–Gödel set theory) is a conservative extension of ZFC (meaning it doesn't prove anything about sets that ZFC can't) which does allow proper classes to be real objects in the theory, but we still run into some discomfort when dealing with large categories. NBG manages to be conservative over ZFC by restricting its class comprehension axiom to formulas that quantify only over sets, not over proper classes. In practical terms, as Mike points out in his paper linked above, this means (for example) that we can't prove by induction that a large category $\mathcal{C}$ has an $n$-fold Cartesian product $\mathcal{C}^n$. We can get around this by constructing it directly as the category of functions from $n$ into $\mathcal{C}$, but the unavailability of canonical proof methods like induction is troubling.
MK (Morse–Kelley set theory) is a non-conservative extension of ZFC: essentially NBG, but with full class comprehension allowed, so we have access to all of the standard proof tools for large categories. This new theory can prove things ZFC can't, like the consistency of ZFC, and is therefore strictly stronger in a meaningful sense. MK still has serious issues of its own when working with large categories, though: we can't define the category of functors between two large categories, and the same applies to NBG.
Using full MK further suggests that we try to look at the category of classes, since classes are really the collections we want to work with, right? And bam, once again we're back to a situation where we have to play games in the metalanguage, or conservatively extend or step up the consistency strength of our theory. This leads mathematicians to devices like Grothendieck universes, where it's always possible to step up to the next universe if we need to talk about 'all the somethings' in the current universe. Having a single universe is equivalent to working in ZFC plus an axiom asserting the existence of an inaccessible cardinal; being able to step up indefinitely, as described here, amounts to asserting that there are arbitrarily large inaccessible cardinals.
All the extra baggage of universes or inaccessibles is still somewhat of a sledgehammer for the problem at hand, though; all we want is for large categories to 'be like small categories' in enough ways that we can carry out all the constructions we care about with large categories, but inaccessibles or Grothendieck universes also have a plethora of other consequences (like the need to juggle universes¹). A solution to these problems comes in the form of reflection principles, which are essentially axioms asserting that proper classes look enough like sets that we don't have to soil ourselves when they appear, but don't endow them with enough independence to give rise to a whole hierarchy of universes we need to ask questions about. All of this is discussed at length in the Shulman paper referenced above, with additional references therein.
First paper: Shulman, Mike. Set theory for category theory. arXiv:0810.1279v2 [math.CT].
Second paper: McLarty, Colin. Failure of Cartesian closedness in NF. J. Symbolic Logic 57 (1992), no. 2, 555–556. https://doi.org/10.2307/2275291
¹ As Tim points out in the comments, how many universes we have to juggle on this route is up to us. Skilled jugglers may use infinitely many, while those new to the approach may use only two.