Over the complex numbers (or any other algebraically closed field with $\operatorname{char} k\neq 2$), every invertible matrix has a square root. In fact, over $\mathbb C$, since every invertible matrix has a logarithm, we can take the one-parameter family of matrices $e^{t\log A}$, and taking $t=1/2$ yields a square root of $A$. To see the existence of matrix logarithms, it suffices to show that $I+N$ has a logarithm, where $N$ is nilpotent, and this follows from the Taylor series of $\log(1+x)$ (similar to Ted's proof of the existence of square roots).
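As a numerical sketch of this construction (using SciPy's `logm` and `expm`; the particular matrix is just an arbitrary, safely invertible example):

```python
import numpy as np
from scipy.linalg import expm, logm

# Sketch: over C, an invertible matrix A has a logarithm, and
# e^{(1/2) log A} is then a square root of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 5 * np.eye(4)  # shift keeps A safely invertible

L = logm(A)           # a matrix logarithm of A
R = expm(0.5 * L)     # t = 1/2 in the one-parameter family e^{t log A}

assert np.allclose(R @ R, A)
```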
Thus, we can determine if a matrix $A$ has a square root by restricting to $\displaystyle\bigcup_n \ker A^n$, which is the largest subspace on which $A$ acts nilpotently. In what follows, we will assume that $A$ is nilpotent.
Up to conjugation, $A$ is determined by its Jordan normal form. However, for a nilpotent matrix the data $a_i'=\dim \ker A^i$ for all $i$ is equivalent to the JNF. This is obviously a non-decreasing sequence. Less obvious is that the sequence $(a_i)$, where $a_i=a'_i-a'_{i-1}$, is non-increasing, and hence forms a partition of $\dim V$, where $A:V\to V$. We note that this data is equivalent to the data in the JNF, as $a_i-a_{i+1}$ is the number of Jordan blocks of size $i$. More explicitly, a Jordan block of size $k$ corresponds to the partition $(1,1,\ldots, 1, 0,0,0,\ldots)$ with $k$ ones, and if a nilpotent matrix $A=\oplus A_i$ is written in block form where each block $A_i$ corresponds to a partition $\pi_i$, then $A$ corresponds to the partition $\pi=\sum \pi_i$, where the sum is taken termwise, e.g. $(2,1)+(1,1)+(7,4,2)=(10,6,2)$.
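The correspondence between kernel dimensions and Jordan blocks is easy to check numerically; `jordan_block` and `partition_of` below are hypothetical helper names sketching the argument, not part of it:

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(k):
    """Nilpotent Jordan block of size k: ones on the superdiagonal."""
    return np.eye(k, k, 1)

def partition_of(A):
    """The partition (a_1, a_2, ...) with a_i = dim ker A^i - dim ker A^{i-1}."""
    n = A.shape[0]
    dims, P = [0], np.eye(n)
    while dims[-1] < n:          # terminates because A is nilpotent
        P = P @ A
        dims.append(n - np.linalg.matrix_rank(P))
    return tuple(d1 - d0 for d0, d1 in zip(dims, dims[1:]))

# A Jordan block of size 3 gives (1, 1, 1), and a direct sum of blocks
# corresponds to the termwise sum of partitions: (1,1,1) + (1,1) = (2,2,1).
assert partition_of(jordan_block(3)) == (1, 1, 1)
assert partition_of(block_diag(jordan_block(3), jordan_block(2))) == (2, 2, 1)
```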
Moreover, $A^2$ corresponds to the partition $(a_1+a_2, a_3+a_4,\ldots, a_{2i-1}+a_{2i}, \ldots).$ Because every matrix is conjugate to a matrix in JNF, and $R^2=A$ implies $(SRS^{-1})^2=SAS^{-1}$, we see that a nilpotent matrix has a square root if and only if the corresponding partition has a "square root."
The only obstruction to a partition having a square root is the presence of two consecutive equal odd entries. Otherwise, we can obtain one (of many) square roots by replacing each $a_i$ with the pair $\lceil a_i/2 \rceil, \lfloor a_i/2 \rfloor$; the absence of consecutive equal odd entries is precisely what keeps the resulting sequence non-increasing, so that it is again a partition.
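Both the pairing rule for $A^2$ and this square-root construction can be tested in code (`square_partition` and `sqrt_partition` are hypothetical helpers sketching the argument):

```python
import math

def square_partition(p):
    """Partition of A^2 from that of A: pair up consecutive entries."""
    p = list(p) + [0]
    pairs = [p[i] + p[i + 1] for i in range(0, len(p) - 1, 2)]
    return tuple(s for s in pairs if s)

def sqrt_partition(p):
    """Split each a_i into ceil(a_i/2), floor(a_i/2); return None when the
    result fails to be non-increasing (two consecutive equal odd entries)."""
    out = []
    for a in p:
        out += [math.ceil(a / 2), a // 2]
    if any(x < y for x, y in zip(out, out[1:])):
        return None
    return tuple(x for x in out if x)

assert sqrt_partition((3, 3)) is None   # consecutive equal odd entries
assert sqrt_partition((1, 1)) is None   # a single 2x2 Jordan block has no root
r = sqrt_partition((4, 1))
assert r == (2, 2, 1) and square_partition(r) == (4, 1)
```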
If we can prove that $f$ is differentiable with bounded derivative $Df$, Lipschitz continuity follows from the fundamental theorem of calculus applied along line segments; the supremum of $\lVert Df \rVert$ then gives a Lipschitz constant.
Denote the space of symmetric $b\times b$ matrices by $\textrm{Sym}_b$. Define
\begin{align*}
&f_1: \mathbb{R}^{c\times b} \rightarrow \mathrm{Sym}_b &f_1(X) = (A+BX)^T(A+BX) \\
&f_2: \textrm{Sym}_b \rightarrow \textrm{Sym}_b &f_2(X) = X^\frac{1}{2} \\
&f_3: GL(b,\mathbb{R}) \rightarrow GL(b,\mathbb{R}) &f_3(X) = X^{-1}.
\end{align*}
As long as the singular values of $A+BX$ stay away from zero, $f = f_3\circ f_2 \circ f_1$ is defined, and $Df(X) = Df_3\left(f_2(f_1(X))\right)\cdot Df_2(f_1(X))\cdot Df_1(X)$ by the chain rule.
First, $Df_1(X)Y = Y^TB^T(A+BX) + (A+BX)^TBY$, hence $\lVert Df_1(X) \rVert_2 \leq \lVert B^T(A+BX)\rVert_2 + \lVert (A+BX)^TB \rVert_2$ by the triangle inequality. Now, since the largest singular value of $A+BX$ is bounded by $r$, $\lVert A+BX \rVert_2 \leq r$ and thus $\lVert Df_1(X) \rVert_2 \leq 2r \lVert B \rVert_2$.
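A finite-difference check of this formula (the dimensions, and the shapes $A\in\mathbb{R}^{c\times b}$, $B\in\mathbb{R}^{c\times c}$, are illustrative assumptions; the original only requires $A+BX$ to be conformable):

```python
import numpy as np

rng = np.random.default_rng(1)
b, c = 3, 4
A = rng.standard_normal((c, b))
B = rng.standard_normal((c, c))
X = rng.standard_normal((c, b))
Y = rng.standard_normal((c, b))  # direction of differentiation

def f1(X):
    M = A + B @ X
    return M.T @ M

# claimed: Df_1(X)Y = Y^T B^T (A+BX) + (A+BX)^T B Y
M = A + B @ X
D_claimed = Y.T @ B.T @ M + M.T @ B @ Y
D_numeric = (f1(X + 1e-7 * Y) - f1(X)) / 1e-7

assert np.allclose(D_claimed, D_numeric, atol=1e-4)
```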
We can differentiate the square root via the inverse function theorem. Indeed, let
$g: \textrm{Sym}_b \rightarrow \textrm{Sym}_b$, $g(X) = X^2$; then $\mathrm{Id} = D(g\circ f_2)(X) = Dg(f_2(X)) \cdot Df_2(X)$, hence $Df_2(X) = \left(Dg(f_2(X))\right)^{-1}$. By the product rule, $Dg(X)Y = XY+YX$. Fortunately, we can diagonalize this map: let $u_1,\dots,u_b$ be an orthonormal basis of eigenvectors of $X$ with eigenvalues $\mu_1,\dots,\mu_b$. Then $Y_{jk} = \frac{1}{2}\left(u_ju_k^T + u_ku_j^T\right)$, $1\leq j\leq k \leq b$, gives a basis of $\textrm{Sym}_b$ that is orthogonal w.r.t. the Frobenius scalar product $\left< A,B\right> = \mathrm{tr}(AB)$ on $\mathrm{Sym}_b$ and its induced norm $\lVert \cdot \rVert_F$. Since
$$XY_{jk} + Y_{jk}X = \frac{1}{2}\left( Xu_j u_k^T + Xu_k u_j^T + u_j u_k^T X + u_k u_j^T X \right) = \frac{1}{2}\left( \mu_j u_j u_k^T + \mu_k u_k u_j^T + \mu_k u_j u_k^T + \mu_j u_k u_j^T \right) = (\mu_j+\mu_k)Y_{jk},$$
the eigenvalues of $Df_2(X) = \left(Dg(f_2(X))\right)^{-1}$ are given by $(\lambda_j+\lambda_k)^{-1}$, where $\lambda_1,\dots,\lambda_b$ are the eigenvalues of $f_2(X)$. Since $Df_2$ is additionally symmetric w.r.t. the Frobenius scalar product, we infer $\lVert Df_2(X) \rVert_F \leq \frac{1}{2|\xi|}$, where $\xi$ is the eigenvalue of $f_2(X)$ with the smallest absolute value. In our case, $\xi$ is the smallest singular value of $A+BX$, which we assumed to be greater than $r^{-1}$. Since all norms on finite-dimensional vector spaces are equivalent, $\lVert Df_2(f_1(X))\rVert_2 \leq Cr $ for a dimension-dependent constant $C$.
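The diagonalization of $Y\mapsto XY+YX$ can be checked numerically: on all of $\mathbb{R}^{b\times b}$ this map is the Kronecker sum of $X$ with itself, whose spectrum consists exactly of the pairwise sums $\mu_j+\mu_k$ (a sketch with an arbitrary symmetric matrix, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)
b = 4
S = rng.standard_normal((b, b))
X = S + S.T                      # symmetric, eigenvalues mu_1, ..., mu_b

# The map Y -> XY + YX, viewed on R^{b x b}, is the Kronecker sum
# X (x) I + I (x) X, so its eigenvalues are the sums mu_j + mu_k.
K = np.kron(X, np.eye(b)) + np.kron(np.eye(b), X)
mu = np.linalg.eigvalsh(X)
expected = np.sort([mu[j] + mu[k] for j in range(b) for k in range(b)])

assert np.allclose(np.sort(np.linalg.eigvalsh(K)), expected)
```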
Finally, it is well known that $Df_3(X)Y = -X^{-1}YX^{-1}$, hence $\lVert Df_3(X) \rVert_2 \leq \lVert X^{-1} \rVert_2^2$. This implies $\lVert Df_3(f_2(f_1(X))) \rVert_2 \leq r^2$.
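A finite-difference check of the derivative of inversion, $Df_3(X)Y = -X^{-1}YX^{-1}$ (the sign is irrelevant for the norm bound; the matrices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(3)
b = 4
X = rng.standard_normal((b, b)) + 5 * np.eye(b)  # safely invertible
Y = rng.standard_normal((b, b))

Xinv = np.linalg.inv(X)
D_claimed = -Xinv @ Y @ Xinv
D_numeric = (np.linalg.inv(X + 1e-7 * Y) - Xinv) / 1e-7

assert np.allclose(D_claimed, D_numeric, atol=1e-4)
```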
Altogether, we find $\lVert Df(X) \rVert_2 \leq \lVert Df_3(f_2(f_1(X))) \rVert_2 \lVert Df_2(f_1(X)) \rVert_2 \lVert Df_1(X) \rVert_2 \leq 2Cr^4\lVert B \rVert_2$, yielding the desired Lipschitz constant.
Best Answer
The mapping $g:X\mapsto |X|:=(XX^T)^{1/2}$ that maps a square matrix to its polar part is Lipschitz continuous. This was first proved in Araki and Yamagami (1981), "An inequality for the Hilbert-Schmidt norm", Comm. Math. Phys., 81:89–98 (see Theorem 1 on p. 89 and its proof on p. 90). The authors proved that $\|\,|X|-|Y|\,\|_F\le\sqrt{2}\|X-Y\|_F$. The Lipschitz constant $\sqrt{2}$ is the best possible in the general case, but it can be improved to $1$ when $X$ and $Y$ are self-adjoint. See also Section 5 of Rajendra Bhatia (1994), "Matrix Factorizations and Their Perturbations", Linear Algebra and Its Applications, 197–198:245–276.
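A quick numerical sanity check of the $\sqrt{2}$ bound on random matrices (using `scipy.linalg.sqrtm` for the positive square root; this only probes the inequality, it does not prove it):

```python
import numpy as np
from scipy.linalg import sqrtm

def polar_part(X):
    """|X| = (X X^T)^{1/2}, the positive factor of the polar decomposition."""
    return np.real(sqrtm(X @ X.T))

rng = np.random.default_rng(4)
for _ in range(100):
    X = rng.standard_normal((4, 4))
    Y = rng.standard_normal((4, 4))
    lhs = np.linalg.norm(polar_part(X) - polar_part(Y), "fro")
    rhs = np.sqrt(2) * np.linalg.norm(X - Y, "fro")
    assert lhs <= rhs + 1e-8   # Araki-Yamagami: || |X|-|Y| ||_F <= sqrt(2) ||X-Y||_F
```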
Since $\pmatrix{f(A)&0\\ 0&0}=\left|\pmatrix{A&B^{1/2}\\ 0&0}\right|$, your $f$ is also Lipschitz continuous with the same constant $\sqrt{2}$, but I don't know whether this constant can be improved or not.