This answer is a small attempt to address your third question. You may find the article,
Jethro van Ekeren, The six-vertex model, $R$-matrices, and quantum groups,
to be helpful.
One of the main sources in the development of quantum groups was the field of exactly solvable models in statistical mechanics. A number of simple mathematical models were devised at the same time as quantum mechanics was being developed in order to understand phase transitions (changes of state) that occur in certain magnetic materials like iron, and that can also be used to understand properties of more familiar changes of state like the liquid-gas transition. One of these models was the Ising model, which is a classical (non-quantum) model. Similar models that came much later were the six-vertex and eight-vertex models. On the quantum mechanical side, there was the Heisenberg spin chain, among others.
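For concreteness, here is the standard form of the Ising energy function (the notation below, $J$ for the coupling constant and $h$ for the external field, is the usual convention rather than anything specific to this discussion). To each configuration of spins $\sigma_i = \pm 1$ on the lattice sites one assigns the energy

$$E(\sigma) = -J \sum_{\langle i,j \rangle} \sigma_i \sigma_j \;-\; h \sum_i \sigma_i,$$

where the first sum runs over nearest-neighbor pairs of sites, and a configuration occurs with Boltzmann probability proportional to $e^{-E(\sigma)/k_B T}$. Onsager's exact solution, discussed below, is for the case $h = 0$.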
The remarkable thing about these models is that, at least for certain values of their parameters, it is possible to compute certain physically interesting quantities, such as the free energy, exactly in the limit of infinite system size (the thermodynamic limit). In physics, symmetries of a system are associated with conservation laws. What makes these particular systems exactly solvable is that, in the thermodynamic limit, they have an infinite-dimensional group of symmetries, and therefore infinitely many conservation laws. (Without these conservation laws, the computation of the free energy becomes increasingly intractable as the system size grows.) These infinite-dimensional symmetry groups are mathematically interesting. In 1944 Lars Onsager computed the free energy of the two-dimensional Ising model with the external magnetic field parameter set equal to $0$ by introducing a certain infinite-dimensional algebra, which was later discovered to be connected to Kac–Moody algebras.
Onsager's solution was considered difficult by physicists, and a number of alternative solution methods were discovered in subsequent decades. One of these, the method of commuting transfer matrices, proved particularly fruitful in terms of generalizations. In Onsager's work the Ising model is defined on a two-dimensional square lattice, intended as a simplified model of the crystalline lattice of real metals. A binary (two-valued) variable (a "spin") is associated with each lattice site, and each spin interacts with its nearest neighbors. The transfer matrix is an operator that corresponds to adding a row of sites to the lattice. In the Ising model, the six-vertex model, and other exactly solvable two-dimensional models, the key to solvability is that transfer matrices with different values of a certain parameter commute with each other. The quantum mechanical models mentioned above, such as the Heisenberg spin chain, are one-dimensional models whose Hamiltonian (energy operator) is a matrix in the same commuting family. The members of this commuting family can be thought of as physical operators representing conserved quantities of the spin chain (since they commute with the Hamiltonian, these quantities do not change with time).
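As a sketch of how the transfer matrix works in the simplest possible setting, consider the one-dimensional Ising chain (which has no phase transition, but illustrates the mechanism). With $N$ spins, periodic boundary conditions, zero external field, and inverse temperature $\beta$, the partition function factors through a $2 \times 2$ transfer matrix whose entry $T_{\sigma\sigma'} = e^{\beta J \sigma \sigma'}$ accounts for one nearest-neighbor bond:

$$Z \;=\; \sum_{\sigma_1, \dots, \sigma_N = \pm 1} e^{\beta J \sum_i \sigma_i \sigma_{i+1}} \;=\; \operatorname{Tr} T^N, \qquad T = \begin{pmatrix} e^{\beta J} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta J} \end{pmatrix}.$$

The eigenvalues of $T$ are $2\cosh\beta J$ and $2\sinh\beta J$, so in the thermodynamic limit the free energy per site is governed by the largest eigenvalue: $f = -\beta^{-1}\log(2\cosh\beta J)$. In two dimensions the transfer matrix adds a whole row of $N$ sites at a time and has size $2^N \times 2^N$, which is why the extra structure of a commuting family becomes essential.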
Rodney Baxter discovered that commutativity of transfer matrices is implied by what is now known as the Yang–Baxter equation, which involves an operator called the $R$-matrix. The $R$-matrix can be regarded as the fundamental building block out of which transfer matrices are constructed, corresponding to the interactions of a single lattice site with its neighbors. The Yang–Baxter equation relates two different ways in which three sites can interact. It has a graphical representation closely connected with knot theory.
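In one common normalization (conventions differ from source to source), the Yang–Baxter equation for an $R$-matrix $R(u)$ acting on $V \otimes V$, with subscripts indicating the pair of tensor factors of $V \otimes V \otimes V$ on which each operator acts, reads

$$R_{12}(u - v)\, R_{13}(u)\, R_{23}(v) \;=\; R_{23}(v)\, R_{13}(u)\, R_{12}(u - v).$$

Baxter's observation is that if $R$ satisfies this equation, then the transfer matrices $T(u)$ built from it commute for all values of the spectral parameter: $[T(u), T(v)] = 0$.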
Quantum groups are the algebraic structures in which $R$-matrices satisfying the Yang–Baxter equation naturally arise. These solutions to the Yang–Baxter equation give rise to new families of commuting transfer matrices, and therefore to new exactly solvable models describing new types of phase transitions.
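As a concrete example (written here in one standard trigonometric normalization; other gauges appear in the literature), the $R$-matrix of the six-vertex model, which arises from the quantum affine algebra $U_q(\widehat{\mathfrak{sl}}_2)$, is

$$R(u) = \begin{pmatrix} \sinh(u+\eta) & 0 & 0 & 0 \\ 0 & \sinh u & \sinh \eta & 0 \\ 0 & \sinh \eta & \sinh u & 0 \\ 0 & 0 & 0 & \sinh(u+\eta) \end{pmatrix},$$

where $u$ is the spectral parameter and $\eta$ (related to the deformation parameter by $q = e^{\eta}$) measures the deviation from the classical case. At $u = 0$ this reduces to $\sinh(\eta)$ times the permutation operator on $V \otimes V$.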
There are connections between these models and other parts of physics. These particular exactly solvable models have critical points, at which the physical system exhibits a form of scale invariance. Scale invariance actually implies the stronger conformal invariance, and these models at their critical points have continuum limits called conformal field theories. These are a key ingredient in string theory, and play a role in many interesting mathematical developments as well (for example in the proof of the Moonshine conjectures).
The short answer is: it depends! To do differential geometry you don't really need category theory at all, and the same could (nearly) be said for some flavors of algebraic geometry. That said, some people (myself included) learn things best from a categorical standpoint. If you get excited whenever people mention universal properties, and are happiest defining things in terms of the functors they represent, then starting with some category theory may be a good thing for you. In that case, I would recommend working through the first chapters of the classic Categories for the Working Mathematician. In particular, you want a solid understanding of limits, adjoint functors, and the relation between the two.
Now, if you don't even know group theory yet, starting with category theory is a bad idea. It would be best to start with some abstract algebra, using one of the standard texts.
It may be that you are a more normal mathematician for whom "categories first" or "algebra first" is not a good idea. In this case, if you are interested in differential geometry, the best thing to do would be to learn differential geometry, and only spend time on other topics as necessary.
Algebraic geometry can be almost entirely non-categorical or hyper-categorical, depending on what you are interested in doing. How much category theory you will need to know depends primarily on your own tastes in algebraic geometry.
If you have never seen anything about Hopf algebras, you might look at Section 2.2 of my own thesis. It is a very leisurely introduction in the technically easy finite-dimensional case.
For a first look at $\mathrm{C}^*$-algebraic quantum groups, these notes of Roland Vergnioux might be a good starting point:
These notes motivate the definition very well and relate it clearly to the commutative situation.
However, perhaps use this as a reference and instead look at graduate lecture notes such as the following (in no particular order):
Between these you are in good nick.