Create a graph in which the points are nodes and two points are connected with an edge if and only if they lie within distance $d$ of each other. Stated in these terms, your criteria become
In short, you want to compute the connected components of this graph. Linear-time algorithms are well known and simple to execute, as described in the Wikipedia article.
To do this efficiently for a lot of points in 3D, you want to limit the amount of distance calculation you perform. Data structures such as octrees, or even simpler ones (such as exploiting a lexicographic sorting of the points) work well.
A somewhat complicated answer to your first question about distances goes like this.
Rencher's Multivariate Analysis cites Lance and Williams (1967) who proposed a general (flexible beta) method underlying most of the existing hierarchical cluster analysis: for three objects A, B and C, be that clusters or individual points considered to be clusters of size 1, if clusters A and B are joined into AB, then the new distance from AB to C can be expressed by
$$
d(AB,C) = \alpha_A d(C,A) + \alpha_B d(C,B) + \beta d(A,B) + \gamma| d(C,A) - d(C,B) |,
$$
They suggested $\alpha_A + \alpha_B + \beta=1$, $\alpha_A = \alpha_B$, $\gamma=0$ and $\beta < 1$, although you could try other choices. E.g., single linkage is $\alpha_A = \alpha_B = 1/2$, $\beta=0$, $\gamma=-1/2$ (contracting the space), complete linkage is the same except for $\gamma=1/2$ (diluting the space), and Ward's method is $\alpha_A = (n_A + n_C)/(n_A + n_B + n_C)$, $\alpha_B = (n_B + n_C)/(n_A + n_B + n_C)$, $\beta= - n_C/(n_A + n_B + n_C)$, and $\gamma=0$ (somewhat space-contracting). The book discusses the properties of the algorithms, such as monotonicity and space contraction/dilution as functions of these parameters.
So an algorithm can be build just based on the distances, and does not need the data matrix. However, the distance matrix for typical problems (sample size $n$ > dimension of the data $p$) takes more memory than the data matrix, so this may not be a very efficient way of approaching the problem, unless of course the data matrix is all you have.
Best Answer
The obvious first thing to try is hierarchical clustering.
Because it can use an arbitrary distance matrix.
Why don't you just try some? That's faster than asking on a web site...