Some comments, too extensive to fit into the comment box:
(1) There is a fairly recent reworking of at least some parts of the proof in
the book "Heegner points and Rankin $L$-series", MSRI Publ. 49. (Brian Conrad
in particular has a paper in there reworking the deformation theory arguments.)
(2) The theorem is a computation: one computes the height of the Heegner point,
using Néron--Tate local heights, and relates the answer (a sum of contributions from
each place) to a corresponding expression for the derivative of the $L$-function.
(3) It is Kolyvagin's work which shows that if the Heegner point is non-zero,
then it generates the Mordell--Weil group (up to finite index); so if you want
motivation for the truth of Gross--Zagier, you can think of it as being a consequence
of BSD + Kolyvagin. (This may be ahistorical, though.)
(4) Historically, Birch was the one who computed Heegner points on elliptic curves,
and found that they were generators of the Mordell--Weil group (up to finite index)
precisely when the rank was one. This was a big source of encouragement for Gross
(as he explained at one point when I was in grad school), because it meant that
there should be a relation between the derivative at 1 and the height of the
Heegner point, and one just had to find it.
(5) The arithmetico-geometric parts of Gross--Zagier are wonderful; I wouldn't at all
think of it as futile to study them. I've not studied the analytic parts, but no doubt they're equally wonderful.
(6) You might start with the Crelle paper of Gross--Zagier, which essentially treats the
case of level one. Since the modular curve of level one has genus 0, the height is necessarily zero, and so one gets a very nice formula relating the sum of the finite local heights to the archimedean local height. And one can prove the same formula another way,
using a special case of the analytic arguments that in the general setting compute the
derivative. The fact that the same formula is obtained these two different ways is a special case of the general Gross--Zagier formula; but it may be simpler to understand the two sides and the comparison between them in this level one setting.
(7) As far as I understand, Kato says nothing in the analytic rank one case.
For BSD in this case, one needs Gross--Zagier plus Kolyvagin.
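Schematically, points (2), (3) and (6) can be summarized as follows. The notation is shorthand introduced here and not taken from the papers: $y_K$ is the Heegner point, $\hat h$ the Néron--Tate height, $\lambda_v$ the local height at the place $v$, and $c$ an explicit nonzero constant whose exact normalization is not reproduced.

$$\hat h(y_K) = \sum_v \lambda_v(y_K), \qquad L'(E/K,1) = c \cdot \hat h(y_K).$$

In the level one setting of (6) the global height vanishes, so the same decomposition reads, again only schematically,

$$\sum_{v \text{ finite}} \lambda_v = -\lambda_\infty .$$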
I don't think that this should be too hard: take a simple family of curves, such as
$y^2 = x^3 + px$ or something similar, and choose $p$ from a certain set of residue classes to guarantee that the 2-Selmer group has rank 1. You can complete the proof either by invoking rather deep constructions using Heegner points, or by finding a family for which conditions such as $p = a^4 + b^2$ (there are infinitely many primes of this form) give you a global point (choose your family in such a way that $b^2 = px^4 - y^4$ occurs as a
principal homogeneous space in your standard 2-descent; see e.g. Silverman's book).
Sorry for being a little bit vague - a hard disk crash currently prevents me from looking at my own notes.
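To make the "global point" step concrete, here is a minimal numerical sketch. It assumes the family $y^2 = x^3 - px$; that sign is a choice made for this illustration, not one fixed in the text above. If $p = a^4 + b^2$, then $(x, y) = (1, a)$ solves $b^2 = px^4 - y^4$, and one checks directly that $(-a^2, ab)$ lies on $y^2 = x^3 - px$, since $(-a^2)^3 + pa^2 = a^2(p - a^4) = a^2 b^2$. The function names are hypothetical, and the code verifies only the curve equation, not the Selmer rank.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def on_curve(p: int, x: int, y: int) -> bool:
    """Check whether (x, y) lies on the assumed curve y^2 = x^3 - p*x."""
    return y * y == x ** 3 - p * x


def points_from_representations(bound: int):
    """Yield (p, a, b, (x, y)) for primes p = a^4 + b^2 with 1 <= a, b <= bound.

    The candidate point is (-a^2, a*b), as explained in the text above.
    """
    for a in range(1, bound + 1):
        for b in range(1, bound + 1):
            p = a ** 4 + b ** 2
            if is_prime(p):
                yield p, a, b, (-a * a, a * b)


if __name__ == "__main__":
    for p, a, b, (x, y) in points_from_representations(4):
        # Print the prime, its representation, the point, and the curve check.
        print(p, (a, b), (x, y), on_curve(p, x, y))
```

For prime $p$ the torsion subgroup of $y^2 = x^3 - px$ should consist only of $(0,0)$ and the point at infinity (compare the torsion computation for $y^2 = x^3 + Dx$ in Silverman's book), so a point with $ab \neq 0$ would have infinite order. The script does not check this.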
Best Answer
The BSD conjecture for an abelian variety $A$ over a function field holds if Ш$(A)[\ell^\infty]$ is finite for some prime $\ell$ ($\ell = p$ allowed). This is a theorem of Schneider, Bauer and Kato--Trihan. If $A$ is a constant abelian variety, Ш$(A)$ is finite by Milne's PhD thesis.
Edit: Since the analytic rank $\rho$ is always greater than or equal to the algebraic rank, one has BSD if $\rho = 0$ (by the equivalence of weak BSD and the finiteness of an $\ell$-primary component of Ш). I show this inequality even for abelian schemes over higher dimensional bases over finite fields in http://kellertimo.name/Height.pdf, Lemma 2.17.
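Spelled out, with $K$ denoting the function field (a symbol introduced here for the illustration), the argument for $\rho = 0$ is the chain

$$\rho = 0 \;\Longrightarrow\; 0 \le \operatorname{rk} A(K) \le \rho = 0 \;\Longrightarrow\; \operatorname{rk} A(K) = \rho \;\Longrightarrow\; \text{Ш}(A)[\ell^\infty] \text{ finite} \;\Longrightarrow\; \text{BSD for } A,$$

where the third implication is the equivalence of weak BSD with finiteness of an $\ell$-primary component, and the last is the theorem quoted above.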