First off, you mean $f(r) = 1/r^2$, so that the field is $f(r)\,\hat r = \hat r/r^2$. As you've noted, the divergence is zero when $r \neq 0$, since $$\nabla \cdot \big( f(r) \hat r \big) = \frac{1}{r^2} \frac{\partial }{\partial r} \left( r^2 f(r) \right) = \frac{1}{r^2} \frac{\partial}{\partial r}(1) = 0,$$
but let's try to understand what happens as $r \to 0$. Since the function doesn't make sense there, let's try to make sense of it in a weak sense (i.e. in the sense of distributions; a surface integral will appear naturally once we do). Personally, I like to notice that $ -\nabla \frac{1}{r} = \frac{\hat r}{r^2}$, so that $\nabla \cdot \frac{\hat r}{r^2} = -\Delta \frac{1}{r}$ (where $\Delta$ is the Laplacian operator), and computing the divergence amounts to computing $\Delta \frac{1}{r}$. Let $\phi \in C^\infty$ be a nice test function, and consider the integral over the ball $B = \{ x \in \mathbb{R}^3 : \|x\| < \epsilon \}$:
$$ \int _B \phi(x) \Delta \left ( \frac{1}{r} \right ) dx $$
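Before evaluating this integral, here is a quick numerical sanity check of the $r \neq 0$ claim above (a sketch of my own, not part of the argument; the sample point is an arbitrary choice). Central finite differences in Cartesian coordinates give a divergence of essentially zero away from the origin:

```python
import math

def F(x, y, z):
    """The field \\hat r / r^2 in Cartesian components: (x, y, z) / r^3."""
    r3 = (x*x + y*y + z*z) ** 1.5
    return (x / r3, y / r3, z / r3)

def divergence(x, y, z, h=1e-5):
    """Central-difference approximation of div F at (x, y, z)."""
    return ((F(x + h, y, z)[0] - F(x - h, y, z)[0])
          + (F(x, y + h, z)[1] - F(x, y - h, z)[1])
          + (F(x, y, z + h)[2] - F(x, y, z - h)[2])) / (2 * h)

# any point away from the origin works; this one is arbitrary
print(divergence(1.0, 2.0, -0.5))  # ≈ 0 up to discretization error
```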
Using Green's identity (i.e. integration by parts), you may check that
$$ \int _B \left [ \phi(x) \Delta \left ( \frac{1}{r} \right ) - \frac{1}{r} \Delta \phi \right ]dx = \oint_{\partial B} \left [ \phi \frac{\partial }{\partial r} \frac{1}{r} - \frac{1}{r} \frac{\partial \phi }{ \partial r} \right ] dS$$
Note that $dx = r^2 \sin \theta dr \, d\theta \, d \varphi$, thus
$$ \int _B \phi(x) \Delta \left ( \frac{1}{r} \right ) dx= \underbrace{ \int_B \frac{1}{r} \Delta \phi \, dx }_{ \mathcal{O}(\epsilon^2)}+ \oint_{\partial B} \left [ \underbrace{\phi \frac{\partial }{\partial r} \frac{1}{r} }_{\mathcal{O}(1)}- \underbrace{\frac{1}{r} \frac{\partial \phi }{ \partial r}}_{\mathcal{O}(\epsilon)} \right ] dS$$
Now if we take the limit as the ball shrinks to the origin (the bad point), i.e. $\epsilon \to 0$, the terms of positive order in $\epsilon$ die, and we're left with
$$ \lim_{\epsilon \to 0} \int _B \phi(x) \Delta \left ( \frac{1}{r} \right ) dx = \lim_{ \epsilon \to 0} \oint_{\partial B} \phi \frac{\partial }{\partial r} \frac{1}{r} \, dS = -\lim_{ \epsilon \to 0} \int_0^{2 \pi } \int_0^{\pi} \phi \, \sin \theta \, d \theta \, d \varphi = - 4 \pi \phi(0)$$
using $\frac{\partial}{\partial r}\frac{1}{r} = -\frac{1}{r^2}$ and $dS = \epsilon^2 \sin\theta \, d\theta \, d\varphi$ on $\partial B$, together with the fact that $\phi \to \phi(0)$ on the shrinking sphere.
Thus some people like to write
$$ \Delta \frac{1}{r} = - 4 \pi \delta (\mathbf r) $$
or equivalently, for the field in question, $\nabla \cdot \frac{\hat r}{r^2} = 4 \pi \delta(\mathbf r)$.
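To see the limit concretely, here is a small numerical sketch (the test function $\phi(x) = e^{-\|x\|^2}$ and the grid size are my own choices): the surface term $-\oint \phi \sin\theta \, d\theta \, d\varphi$ approaches $-4\pi\phi(0) = -4\pi$ as $\epsilon \to 0$.

```python
import math

def phi(x, y, z):
    """A smooth test function with phi(0) = 1. (My own choice.)"""
    return math.exp(-(x*x + y*y + z*z))

def surface_term(eps, n=200):
    """Midpoint-rule approximation of -oint phi * sin(theta) dtheta dphi
    over the sphere of radius eps (theta polar, phi azimuthal)."""
    dth = math.pi / n
    dph = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        for j in range(n):
            ph = (j + 0.5) * dph
            x = eps * math.sin(th) * math.cos(ph)
            y = eps * math.sin(th) * math.sin(ph)
            z = eps * math.cos(th)
            total += phi(x, y, z) * math.sin(th) * dth * dph
    return -total

for eps in (0.5, 0.1, 0.01):
    print(eps, surface_term(eps))  # tends to -4*pi*phi(0) ≈ -12.566
```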
Here's a way of calculating the divergence.
First, some preliminaries. The first thing I'll do is express the partial derivative operators $\partial_x,\partial_y,\partial_z$ in terms of $\partial_r, \partial_\theta, \partial_\varphi$, using the chain rule. Take a function $v:\Bbb R^3\to\Bbb R$ and compose it with the function $g:\Bbb R^3\to\Bbb R^3$ that changes to spherical coordinates (note the convention here: $\varphi$ is the polar angle and $\theta$ the azimuthal one): $$g(r,\theta,\varphi) = (r\cos\theta\sin\varphi,r\sin\theta\sin\varphi,r\cos\varphi)$$
The result is $\tilde v(r,\theta,\varphi)=(v\circ g)(r,\theta,\varphi)$, i.e. "$v$ written in spherical coordinates". An abuse of notation is usually/almost-always committed here, and we write $v(r,\theta,\varphi)$ to denote what is actually the new function $\tilde v$. I will use that notation myself now. Anyway, the chain rule states that $$\begin{pmatrix}\partial_x v & \partial_y v & \partial_z v\end{pmatrix} \begin{pmatrix} \cos\theta\sin\varphi &-r\sin\theta\sin\varphi & r\cos\theta\cos\varphi \\ \sin\theta\sin\varphi & r\cos\theta\sin\varphi &r\sin\theta\cos\varphi \\\cos\varphi & 0 & -r\sin\varphi\end{pmatrix} = \begin{pmatrix}\partial_r v & \partial_\theta v & \partial_\varphi v\end{pmatrix}$$
From this we get, for example (by inverting the matrix) that $$\partial_x = \cos\theta\sin\varphi\partial_r - \frac{\sin\theta}{r\sin\varphi}\partial_\theta + \frac{\cos\theta\cos\varphi}{r}\partial_\varphi$$
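One can sanity-check an expression like this numerically: apply the claimed combination of $\partial_r,\partial_\theta,\partial_\varphi$ (computed by finite differences) to a test function and compare with the known Cartesian derivative. Below is a sketch; the test function $v(x,y,z) = x^2 y + z$ and the sample point are my own choices:

```python
import math

def g(r, th, ph):
    """Spherical parametrization used above (phi polar, theta azimuthal)."""
    return (r*math.cos(th)*math.sin(ph),
            r*math.sin(th)*math.sin(ph),
            r*math.cos(ph))

def v(x, y, z):
    """Test function with known derivative dv/dx = 2xy."""
    return x*x*y + z

def v_sph(r, th, ph):
    """v written in spherical coordinates, i.e. v o g."""
    return v(*g(r, th, ph))

def partial(f, args, i, h=1e-6):
    """Central-difference partial derivative of f w.r.t. its i-th argument."""
    a = list(args); a[i] += h
    b = list(args); b[i] -= h
    return (f(*a) - f(*b)) / (2*h)

r, th, ph = 1.3, 0.7, 1.1            # arbitrary sample point
args = (r, th, ph)
dv_dr, dv_dth, dv_dph = (partial(v_sph, args, i) for i in range(3))

# the claimed expression for d/dx
dvdx_spherical = (math.cos(th)*math.sin(ph)*dv_dr
                  - math.sin(th)/(r*math.sin(ph))*dv_dth
                  + math.cos(th)*math.cos(ph)/r*dv_dph)

x, y, z = g(r, th, ph)
dvdx_cartesian = 2*x*y
print(dvdx_spherical, dvdx_cartesian)  # the two agree to high precision
```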
The rest will have similar expressions. Now that we know how to take partial derivatives of a real valued function whose argument is in spherical coords., we need to find out how to rewrite the value of a vector valued function in spherical coordinates. To be precise, the new basis vectors (which vary from point to point now) of $\Bbb R^3$ are found by differentiating the spherical parametrization w.r.t. its arguments (and normalizing). Thus (one example), $$\mathbf e_r = \frac{\partial_r g}{\|\partial_r g\|} = \frac{\begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\varphi\end{pmatrix}}{\left\|\begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\varphi\end{pmatrix}\right\|} = \begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\varphi\end{pmatrix} \\[4ex] = \cos\theta\sin\varphi\, \mathbf i + \sin\theta\sin\varphi\, \mathbf j + \cos\varphi\,\mathbf k$$
I don't know how to justify this without speaking of tangent spaces, but I'm sure you can ask your teacher for an explanation. After calculating the new unit vectors, you'll again have to invert the relation to obtain $\mathbf i,\mathbf j,\mathbf k$ in terms of $\mathbf e_r,\mathbf e_\theta,\mathbf e_\varphi$. But that part is just linear algebra!
Now that everything is set up, we can calculate the divergence. But what is the divergence? What I mean is, how do we write it as an abstract object that acts on functions? Here is one possibility, in terms as familiar as possible to a calculus student (there are other definitions too): $$\mathrm{div}(\cdot) = \partial_x\left(\langle \mathbf{i},\cdot\rangle\right) + \partial_y\left(\langle \mathbf{j},\cdot\rangle\right) + \partial_z\left(\langle \mathbf{k},\cdot\rangle\right)$$
Where the symbol $\langle\cdot,\cdot\rangle$ is used for the dot product. Try to convince yourself why the above formula is so.
Now just substitute all of the expressions we just derived for the basis vectors and the differential operators. Finally, place an arbitrary vector field $$ \mathbf E = E_r(r,\theta,\varphi)\,\mathbf e_r + E_\theta(r,\theta,\varphi)\,\mathbf e_\theta + E_\varphi(r,\theta,\varphi)\,\mathbf e_\varphi$$
in place of the "$\cdot$" in the (new) expression for $\mathrm{div}$, and expand.
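If you carry the expansion through, you should recover the usual spherical-divergence formula, which in this answer's convention ($\varphi$ polar, $\theta$ azimuthal) reads $$\mathrm{div}\,\mathbf E = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 E_r\right) + \frac{1}{r\sin\varphi}\frac{\partial}{\partial\varphi}\left(\sin\varphi\, E_\varphi\right) + \frac{1}{r\sin\varphi}\frac{\partial E_\theta}{\partial\theta}.$$ Here is a numerical sketch checking that formula against a brute-force Cartesian computation, on a test field of my own choosing ($E_r = r^2$, $E_\theta = r\cos\theta$, $E_\varphi = r\sin\varphi$):

```python
import math

def basis(th, ph):
    """Unit vectors e_r, e_theta, e_phi from the parametrization g (phi polar)."""
    e_r  = (math.cos(th)*math.sin(ph), math.sin(th)*math.sin(ph), math.cos(ph))
    e_th = (-math.sin(th), math.cos(th), 0.0)
    e_ph = (math.cos(th)*math.cos(ph), math.sin(th)*math.cos(ph), -math.sin(ph))
    return e_r, e_th, e_ph

def E_sph(r, th, ph):
    """Spherical components (E_r, E_theta, E_phi) of the hypothetical test field."""
    return r*r, r*math.cos(th), r*math.sin(ph)

def E_cart(x, y, z):
    """The same field expressed in Cartesian components."""
    r  = math.sqrt(x*x + y*y + z*z)
    ph = math.acos(z / r)        # polar angle
    th = math.atan2(y, x)        # azimuth
    Er, Eth, Eph = E_sph(r, th, ph)
    e_r, e_th, e_ph = basis(th, ph)
    return tuple(Er*e_r[i] + Eth*e_th[i] + Eph*e_ph[i] for i in range(3))

def div_cartesian(x, y, z, h=1e-5):
    """Brute-force divergence via central differences."""
    return ((E_cart(x+h, y, z)[0] - E_cart(x-h, y, z)[0])
          + (E_cart(x, y+h, z)[1] - E_cart(x, y-h, z)[1])
          + (E_cart(x, y, z+h)[2] - E_cart(x, y, z-h)[2])) / (2*h)

def div_formula(r, th, ph):
    """Spherical-divergence formula evaluated term by term for E_sph above:
    (1/r^2) d_r(r^2 * r^2)                 = 4r
    (1/(r sin ph)) d_ph(sin ph * r sin ph) = 2 cos ph
    (1/(r sin ph)) d_th(r cos th)          = -sin th / sin ph"""
    return 4*r + 2*math.cos(ph) - math.sin(th)/math.sin(ph)

r, th, ph = 1.2, 0.8, 1.0        # arbitrary sample point
x, y, z = (r*math.cos(th)*math.sin(ph), r*math.sin(th)*math.sin(ph), r*math.cos(ph))
print(div_cartesian(x, y, z), div_formula(r, th, ph))  # the two agree
```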
Best Answer
The mapping is still continuous (at least in the spherical to Cartesian direction). You can find these in pretty much any reference on polar coordinates, and I am sure you are aware of them: $$\tag1 \begin{align} x &= r \sin\theta\cos\varphi \\ y &= r \sin\theta\sin\varphi \\ z &= r \cos\theta \end{align} $$ (The back-transformations are indeed not continuous, but not for the reason you seem to be thinking of - it is simply the topological matter of having to "cut" a circle.)
But yes, the directions of the basis vectors for the polar coordinates are not the same at each point. That's just as it should be - otherwise the coordinate system would not be polar at all. By definition, each basis vector points in the direction such that the corresponding coordinate increases, while the other two are (locally) constant, i.e. stationary. In other words, if I imagine the position vector $\vec r$ itself as a field, then its partial derivatives with respect to each coordinate give me the directions of the basis vectors: $$ \begin{align} \hat r &= \partial {\vec r}/\partial r \\ (r)\hat \theta &= \partial {\vec r}/\partial \theta \\ (r\sin\theta)\hat \varphi &= \partial {\vec r}/\partial \varphi \end{align} $$ (The extra factors in the latter two are to normalize them.)
If you would like to see what those look like in the (original) Cartesian system, use $\vec r = (x, y, z)$ with the expressions from (1) to get, for example, $$ \hat r = \frac{\partial}{\partial r}(r \sin\theta\cos\varphi,r \sin\theta\sin\varphi,r \cos\theta) = \left({x\over r}, {y\over r}, {z\over r}\right) = {\vec r\over r}. $$
This clearly points in a different direction for points with different $\theta, \varphi$: in fact, it points always radially out. If you do the same for the other two, you should see that $\hat\theta$ and $\hat \varphi$ point along a meridian and along a parallel, respectively (if you think of the angles as latitude and longitude).
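The recipe "differentiate $\vec r$ with respect to each coordinate, then normalize" is easy to check numerically. A small sketch (the sample point is an arbitrary choice of mine) confirming that $\hat r = \vec r / r$ and that the three resulting unit vectors are mutually orthogonal:

```python
import math

def rvec(r, th, ph):
    """Position vector from (1): theta is the polar angle, phi the azimuthal one."""
    return (r*math.sin(th)*math.cos(ph),
            r*math.sin(th)*math.sin(ph),
            r*math.cos(th))

def partial(f, args, i, h=1e-7):
    """Central-difference partial derivative of a vector-valued f."""
    a = list(args); a[i] += h
    b = list(args); b[i] -= h
    return tuple((p - q) / (2*h) for p, q in zip(f(*a), f(*b)))

def unit(v):
    n = math.sqrt(sum(c*c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(p*q for p, q in zip(a, b))

r, th, ph = 2.0, 1.1, 0.4                      # arbitrary sample point
rhat  = unit(partial(rvec, (r, th, ph), 0))
thhat = unit(partial(rvec, (r, th, ph), 1))
phhat = unit(partial(rvec, (r, th, ph), 2))

x, y, z = rvec(r, th, ph)
print(rhat)                                    # matches (x/r, y/r, z/r)
print(dot(rhat, thhat), dot(rhat, phhat), dot(thhat, phhat))  # all ≈ 0
```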