One place to start would be Silverman's nearest-neighbor estimator, with the weights incorporated somehow (I am not sure exactly what your weights represent here). The nearest-neighbor method can evidently be formulated in terms of distances. I believe your first- and second-nearest-neighbor methods are versions of the nearest-neighbor method, but without a kernel function and with a small value of $k$.
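Silverman's $k$-th nearest-neighbor estimator is $\hat f(t) = k/(2 n d_k(t))$, where $d_k(t)$ is the distance from $t$ to its $k$-th nearest sample point. A minimal sketch of the unweighted version (how your weights would enter depends on what they represent, so I leave them out):

```python
# Sketch of Silverman's k-nearest-neighbor density estimate:
# f_hat(t) = k / (2 * n * d_k(t)), where d_k(t) is the distance
# from t to the k-th nearest data point. Unweighted version only;
# note d_k(t) = 0 (t coinciding with k or more data points) would
# need special handling.
def knn_density(t, data, k):
    n = len(data)
    d_k = sorted(abs(x - t) for x in data)[k - 1]  # k-th nearest distance
    return k / (2 * n * d_k)

# Example: 5 points, k = 2, evaluated at t = 2
print(knn_density(2.0, [0.0, 1.0, 2.0, 3.0, 4.0], k=2))  # 0.2
```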
I think the part you're asking about is this part of the paper:
> It can be shown (see Silverman 1986) that the optimal choice for $h$ (in the sense of minimising the mean square error) for the triangular kernel is
>
> $h=2.575\sigma N^{-\frac15}$
Note that the paper you mention references Silverman 1986 ... whose book is on density estimation (that's even the title). The bandwidth calculation they refer to is for density estimation.
This particular one derives from Silverman's optimal bandwidth estimator (see here for example). If we start with equation 3.2.1 in Silverman's book (that you mentioned):
$$h_\text{opt}=k_2^{-2/5}\left\{ \int K(t)^2 \text{d}t \right\}^{1/5}\left\{ \int f''(x)^2\text{d}x \right\}^{-1/5} n^{-1/5}$$
where $k_2=\int t^2 K(t)\,\text{d}t$, and substitute a particular assumption for $f$, then a particular kernel for $K$ will yield an optimal bandwidth. Different kernels have different constants; using a Gaussian for both $K$ and $f$ gives 1.059.
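For the Gaussian/Gaussian case, $k_2 = 1$, $\int K(t)^2\,\text{d}t = 1/(2\sqrt{\pi})$, and $\int f''(x)^2\,\text{d}x = \frac{3}{8\sqrt{\pi}}\sigma^{-5}$, so the constant collapses to $(4/3)^{1/5}$. A quick numeric check:

```python
import math

# Gaussian kernel K and Gaussian f: k2 = 1, ∫K² = 1/(2√π),
# ∫f''² = (3 / (8√π)) σ⁻⁵, so the constant is (4/3)^(1/5).
const = ((1 / (2 * math.sqrt(math.pi))) * (8 * math.sqrt(math.pi) / 3)) ** 0.2
print(round(const, 3))  # 1.059
```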
The figure 2.575 is for the triangular kernel. I don't know for certain what is used for $f$ to get it, but my belief is that it's based on a Gaussian $f$. (Edit: yes, it looks like that belief was correct; I took some time to sit and work it through.)
Calculating:
$K(t) = (1-|t|)\, I_{[-1,1]}(t)$
$k_2=\int t^2 K(t)\,\text{d}t = \frac16$; $\int K(t)^2 \,\text{d}t = \frac23$; $\int f''(x)^2\,\text{d}x = \frac{3}{8\sqrt{\pi}}\sigma^{-5}$
$$h_\text{opt}=k_2^{-2/5}\left\{ \int K(t)^2 \text{d}t \right\}^{1/5}\left\{ \int f''(x)^2\text{d}x \right\}^{-1/5} n^{-1/5}$$
$$ = (36\cdot \frac23\cdot \frac{8\sqrt{\pi}}{3})^{\frac15}\sigma n^{-\frac15}$$
$$ = (64\sqrt{\pi})^{\frac15}\sigma n^{-\frac15}$$
$$ = 2.576 \sigma n^{-\frac15}$$
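The three integrals and the final constant are easy to confirm numerically (a sketch using `scipy.integrate.quad`):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

K = lambda t: (1 - abs(t)) * (abs(t) <= 1)   # triangular kernel
k2, _ = quad(lambda t: t**2 * K(t), -1, 1)   # -> 1/6
K2, _ = quad(lambda t: K(t)**2, -1, 1)       # -> 2/3
# f = standard normal density, so f''(x) = (x² - 1) φ(x)
f2, _ = quad(lambda x: ((x**2 - 1) * norm.pdf(x))**2,
             -np.inf, np.inf)                # -> 3 / (8√π)

h_const = k2**(-2/5) * K2**(1/5) * f2**(-1/5)
print(round(h_const, 3))  # 2.576
```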
The discrepancy in the last digit (2.576 vs. 2.575) is almost certainly due to premature rounding; specifically, it may come from taking a rounded-off value related to $\int f''(x)^2\,\text{d}x$ in the Gaussian case directly from Silverman.
Best Answer
I assume your sample is from a continuous distribution. Then the situation is somewhat like it is for making histograms. If you have enough data, you want to choose the width of histogram bars to be just thin enough to suggest the shape of the population density curve without the 'raggedy' look that comes from neighboring bars of very different heights (maybe interspersed with some empty bins).
If the population density is multi-modal (as for a mixture of several distributions), then you don't want bars so thick that you can't see the individual modes. [The link, provided by @whuber while I was writing this answer, pays particular attention to detecting modes.]
Using the KDE (kernel density estimator) `density` in R, I have found that the default bandwidth is usually about right. Sometimes I halve it and sometimes I double it, depending on the task at hand. Sampling from a normal distribution, you may need a sample as large as a thousand to get a useful density estimate.
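For comparison (a Python aside, not the answer's R code): scipy's `gaussian_kde` is a rough analogue of R's `density`, though its default is Scott's factor $n^{-1/5}$ in one dimension, while R's default `bw.nrd0` uses a slightly different rule of thumb.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
kde = gaussian_kde(x)        # default: Scott's rule
# In 1-D the default factor is n^(-1/5); the bandwidth used is
# this factor times the sample standard deviation.
print(round(kde.factor, 4))  # 1000^(-0.2) ≈ 0.2512
```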
The three panels below compare R's default histogram bins, KDEs with default bandwidths (dotted brown), and the population density (blue) for samples of sizes 500, 1000, and 5000. (Maybe the histogram at the right could use thinner bars, but the default KDE bandwidth seems about right.)
R code for figure:
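(The answer's R code is not reproduced here. As a rough stand-in, and my sketch rather than the author's code, a Python analogue of the same three-panel comparison, using `gaussian_kde` in place of R's `density` and Sturges bins to mimic R's default histogram, might look like this.)

```python
# Hypothetical stand-in for the figure code: histogram, default-bandwidth
# KDE (dotted brown), and true normal density (blue) for three sample sizes.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(2023)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
grid = np.linspace(-4, 4, 400)
for ax, n in zip(axes, (500, 1000, 5000)):
    x = rng.normal(size=n)
    ax.hist(x, bins="sturges", density=True)              # R-like default bins
    ax.plot(grid, gaussian_kde(x)(grid), "brown", ls=":") # default bandwidth
    ax.plot(grid, norm.pdf(grid), "blue")                 # population density
    ax.set_title(f"n = {n}")
fig.savefig("kde_panels.png")
```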
Now we look at a bimodal population: a (for simplicity, exactly) 50:50 mixture of two distributions with means far enough apart to make the population bimodal.
Here again, we use defaults in R. The histogram at the left should probably have fewer bars, but the KDE bandwidth seems OK. Often, the KDE bandwidth is less 'fussy' than the width of histogram bars.
Note: If the population density has restricted support (for example, $[0,\infty)$ for gamma distributions or $[0,1]$ for beta distributions), the KDEs in R are often positive just outside the region of support.
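This boundary leakage is easy to see directly: fit a KDE to nonnegative data and evaluate it at a negative point. A sketch with scipy (R's `density` behaves the same way, since the Gaussian kernel is positive everywhere):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
x = rng.exponential(size=500)  # support is [0, ∞)
kde = gaussian_kde(x)
# The estimate is strictly positive below 0, outside the true support.
print(kde(-0.25)[0] > 0)       # True
```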