Cipolla's algorithm (http://en.wikipedia.org/wiki/Cipolla's_algorithm) is an efficient algorithm for finding a square root modulo a prime number. Is there an efficient algorithm for finding a square root modulo a prime power?
Efficient Algorithm for Finding Square Root Modulo a Prime Power – Computational Number Theory
Related Solutions
Based on the comments, it looks like this is not a question specific to Pell's equation, and that you just want to evaluate a single binomial with big inputs as quickly as possible.
If you check the equation directly using fast multiplication algorithms (e.g., Schönhage-Strassen), you can expect the calculation to require about $(\log z)\cdot(\log \log z)$ operations (ignoring a further $\log \log \log z$ factor), where $z = \max \{ x, y, D \}$.
If you want a way to quickly find a negative answer, you can check relative sizes and leading digits, then try reduction modulo small integers. If you start with a random negative answer, you can expect it to be eliminated after at most a few divisions (i.e., about $\log z$ operations).
To find a positive answer using modular arithmetic, you can use the Chinese remainder theorem. To prove that the identity holds, it suffices to check it modulo $n$, for $n$ ranging over a collection of positive integers whose least common multiple is larger than $x^2$ and $Dy^2$. It is common to check modulo a large collection of small primes, and this will require about $\log z$ primes and $(\log z)^2$ operations. Another natural choice on a binary computer is Fermat numbers, of the form $2^{2^n}+1$, since the division-with-remainder can be optimized; this ends up looking a lot like direct calculation.
In summary, the advantage of checking modulo small primes is that it lets you quickly eliminate negative answers, and the disadvantage is that (if I'm not mistaken) it is roughly quadratically slower than direct calculation when you have a positive answer. You can choose your method depending on exactly what sort of calculation you plan to do.
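The modular check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the identity being tested is taken to be $x^2 - Dy^2 = 1$ with non-negative integer inputs, the function names `primes` and `pell_identity_holds` are hypothetical, and the primes are generated by naive trial division. A negative answer is rejected at the first prime where the residues disagree; a positive answer is accepted once the product of the primes used exceeds a bound on $|x^2 - Dy^2 - 1|$ obtained from bit lengths.

```python
from itertools import count


def primes():
    """Yield primes in increasing order by trial division (adequate for a sketch)."""
    found = []
    for c in count(2):
        if all(c % p for p in found if p * p <= c):
            found.append(c)
            yield c


def pell_identity_holds(x, y, D):
    """Decide whether x^2 - D*y^2 == 1 using residues modulo small primes only.

    Assumes x, y, D are non-negative integers (an assumption of this sketch).
    """
    # |x^2 - D*y^2 - 1| < 2**bits, estimated from bit lengths alone so that
    # the large products are never formed.
    bits = max(2 * x.bit_length(), D.bit_length() + 2 * y.bit_length()) + 1
    product = 1
    for p in primes():
        xr, yr = x % p, y % p
        if (xr * xr - (D % p) * yr * yr - 1) % p != 0:
            return False  # a single failing prime disproves the identity
        product *= p
        # Once the product of the moduli is at least 2**bits, the Chinese
        # remainder theorem forces x^2 - D*y^2 - 1 to be exactly zero.
        if product.bit_length() > bits:
            return True
```

For example, `pell_identity_holds(17, 12, 2)` accepts, since $17^2 - 2\cdot 12^2 = 1$, while `pell_identity_holds(7, 5, 2)` is rejected at the first prime, since $7^2 - 2\cdot 5^2 = -1$.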
There are many algorithms that are suitable for different contexts. Asymptotically, a combination of Newton's method with FFT is the best known method, according to R. P. Brent, "Multiple-precision zero-finding methods and the complexity of elementary function evaluation" [Analytic computational complexity (Proc. Sympos., Carnegie-Mellon Univ., Pittsburgh, Pa., 1975), 151–176; MR0423869]. However, for numbers with up to a million digits, Zimmermann's Karatsuba square root algorithm gives better results. For numbers up to 50 digits or so, a good implementation of the traditional schoolbook method is perhaps even better.
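As a point of reference, here is a minimal sketch of the plain Newton iteration for the integer square root in Python. It relies on Python's built-in big-integer arithmetic and is neither the FFT-based variant Brent analyzes nor Zimmermann's Karatsuba square root; the name `isqrt_newton` is hypothetical.

```python
def isqrt_newton(n):
    """Return floor(sqrt(n)) for a non-negative integer n via Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is at least sqrt(n), so the iterates
    # decrease monotonically towards floor(sqrt(n)).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y
```

The result can be compared against the standard library's `math.isqrt` (Python 3.8+) as a sanity check.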
Best Answer
Joe Silverman's comment gives the method. If the square root of $A$ mod $p$ is $0$, there is an easy first step: let $\gcd(A, p^n)=p^j$. If $j = n$, then $A \equiv 0 \pmod{p^n}$ and $0$ is a square root. Otherwise, if $j$ is odd, give up (there is no square root); if $j = 2k$, write $A=p^{2k}B$ and find the square root of $B$ modulo $p$ (if $B$ is a quadratic residue).
I ascertained this by looking at the modular square root code in Maple (it is a bit tricky to see the subprocedures).
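The comment itself is not reproduced here, so the following Python sketch is an assumption about the standard route rather than a transcription of it or of Maple's code: strip the power of $p$ from $A$ as described, find a square root of $B$ modulo $p$ (here with Tonelli-Shanks), and lift it to a root modulo the prime power by Newton/Hensel lifting. It assumes $p$ is an odd prime; all function names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
def sqrt_mod_prime(a, p):
    """Tonelli-Shanks: a square root of a modulo an odd prime p, or None."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        return None  # Euler's criterion: a is a non-residue
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1  # smallest quadratic non-residue
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:  # least i with t^(2^i) == 1
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c = i, b * b % p
        t, r = t * c % p, r * b % p
    return r


def lift_sqrt(b, p, n, r):
    """Given r*r == b (mod p), p odd, p not dividing b, return a root mod p**n."""
    k = 1
    while k < n:
        k = min(2 * k, n)  # each Newton step doubles the p-adic precision
        mod = p ** k
        r = (r - (r * r - b) * pow(2 * r, -1, mod)) % mod
    return r


def sqrt_mod_prime_power(a, p, n):
    """Return some x with x*x == a (mod p**n), or None if none exists (p odd)."""
    mod = p ** n
    a %= mod
    if a == 0:
        return 0
    j = 0
    while a % p == 0:  # write a = p**j * B with p not dividing B
        a //= p
        j += 1
    if j % 2:
        return None  # odd power of p: no square root
    r = sqrt_mod_prime(a, p)
    if r is None:
        return None
    # Root of B modulo p**(n - j), scaled back by p**(j/2).
    return lift_sqrt(a, p, n - j, r) * p ** (j // 2) % mod
```

The function returns one root only; for $B$ coprime to $p$ the other root is its negative, and when $j > 0$ there are further roots that this sketch does not enumerate.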
According to Wikipedia, the Tonelli-Shanks algorithm is more efficient than Cipolla's for odd primes not of the form $64Q+1$: let $m$ be the number of bits in the binary expansion of $p$, and write $p-1=Q2^S$ with $Q$ odd. Then it is asserted that Cipolla's method is better exactly when $S(S-1)>8m+20$. Of course, for even primes neither method is needed.
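The quoted criterion is easy to evaluate for a given prime. The following small Python helper is a sketch of exactly that inequality; the name `cipolla_preferred` is hypothetical, and the function makes no claim about actual running times beyond what the asserted rule says.

```python
def cipolla_preferred(p):
    """Evaluate the asserted rule S*(S-1) > 8*m + 20 for an odd prime p.

    m is the bit length of p, and p - 1 = Q * 2**S with Q odd.
    """
    m = p.bit_length()
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    return s * (s - 1) > 8 * m + 20
```

For instance, the rule favours Tonelli-Shanks for $p = 257$ (where $S = 8$, $m = 9$) and Cipolla for the Fermat prime $p = 65537$ (where $S = 16$, $m = 17$).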
The designers of Maple seem to have determined or decided that trying $2,3,4,\ldots$ is best for primes under $80$ or so. I wasn't able to understand (in the limited time I put into it) which of the modular square root methods Maple uses in the prime case for larger primes.
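For very small primes, the trial approach amounts to something like the following Python sketch. It is an illustration of exhaustive search only, not Maple's actual code, and the name `sqrt_mod_small_prime` is hypothetical.

```python
def sqrt_mod_small_prime(a, p):
    """Find x with x*x == a (mod p) by trying candidates in order, or None."""
    a %= p
    # Roots come in pairs x and p - x, so searching up to p // 2 is enough.
    for x in range(p // 2 + 1):
        if x * x % p == a:
            return x
    return None
```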