When I'm trying to be relatively uninformative, I have tended to use a uniform prior on $\ln \sigma$ with a specified upper bound, which corresponds to $p(\sigma) \propto 1/\sigma$ over a finite range. That's relatively uninformative, and equal to Jeffreys' prior over that range (though not equal to what the Jeffreys prior would be if you knew there was an upper bound on $\sigma$ and what it was). If the posterior piles up against your upper bound, you can increase it and rerun, unless you have some strong reason for choosing that particular bound. This approach was suggested by Andrew Gelman in the "Prior distributions for variance parameters" paper here. (Some of the other articles in that issue of Bayesian Analysis may be relevant too, hence the link to the journal page.)
However, recently I've tried the beta-prime prior suggested in the first response to
Weakly informative prior distributions for scale parameters, and that worked out well for me too. Importance sampling on the output of the MCMC indicated that the differences between the posteriors of the parameters of interest under the two priors were trivial, which, after all, is what you want when you're trying to be relatively uninformative, and it spares you that annoying specification of an upper bound on $\sigma$.
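As a sketch of the relationship between a uniform prior on $\ln \sigma$ and the $p(\sigma) \propto 1/\sigma$ density, here's how you'd draw from that bounded prior (the bounds 0.01 and 100 are arbitrary illustrations, not recommendations):

```python
import numpy as np
from scipy.stats import loguniform

# A uniform prior on log(sigma) over [log(a), log(b)] is the same as a
# reciprocal (log-uniform) prior p(sigma) proportional to 1/sigma on [a, b].
a, b = 0.01, 100.0  # arbitrary illustrative bounds on sigma

rng = np.random.default_rng(0)
sigma = np.exp(rng.uniform(np.log(a), np.log(b), size=100_000))

# scipy's loguniform has density 1 / (x * log(b/a)), i.e. proportional to 1/x,
# so halving the density requires doubling sigma:
assert np.isclose(loguniform.pdf(1.0, a, b) / loguniform.pdf(2.0, a, b), 2.0)

# The draws respect the finite range.
print(sigma.min(), sigma.max())
```

If the posterior mass piles up near `b`, you'd increase `b` and rerun, as described above.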
This question and answer may also be relevant:
Random effect on scale parameter
Don't confuse the statistic with the p-value.
The KS statistic was small, meaning the biggest distance between the empirical distribution and the fitted power law was small (i.e. a close fit). The corresponding p-value follows from the statistic and is accordingly large (i.e. the observed deviation isn't large enough to be distinguished from deviation due to randomness alone).
Assuming they've calculated the p-value correctly, there's nothing there that indicates a deviation from the proposed model. Of course, with enough data almost any distribution will be rejected, but that doesn't necessarily indicate a poor fit* or mean it wouldn't make a suitable model for all kinds of purposes.
* (just one whose deviations from the proposed model you can tell from randomness)
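To illustrate the distinction between the statistic and the p-value, and how sample size drives rejection, here's a small sketch; the choice of a $t_{30}$ distribution tested against a standard normal as the "very slightly wrong" model is just an illustrative assumption:

```python
import numpy as np
from scipy.stats import kstest, t

rng = np.random.default_rng(1)

# Data from a t-distribution with 30 df: very close to, but not exactly, N(0, 1).
small = t.rvs(df=30, size=100, random_state=rng)
large = t.rvs(df=30, size=500_000, random_state=rng)

d_small, p_small = kstest(small, "norm")
d_large, p_large = kstest(large, "norm")

# With little data the tiny deviation is indistinguishable from randomness;
# with enough data even this near-perfect model gets rejected -- yet the
# statistic (the actual size of the misfit) stays small throughout.
print(f"n=100:    D={d_small:.3f}, p={p_small:.3f}")
print(f"n=500000: D={d_large:.4f}, p={p_large:.2e}")
```

The large-sample D remains tiny (a close fit in absolute terms) even while the p-value collapses, which is exactly the "rejected but still a good fit" situation described above.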
That a continuous function might fit a discrete distribution well enough not to be detected isn't necessarily surprising, as long as the discreteness isn't too heavy** and there isn't so much data that the deviations between the step-function nature of the actual distribution and the continuous form of the tested distribution become obvious from the sample.
** e.g. where most of the probability is taken up by only a small number of values.
That said, if you'd like a discrete distribution that can look sort of lognormalish, a negative binomial is one that can sometimes look a bit like a "discrete lognormal".
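As a sketch, one simple way to pin down such a negative binomial is to moment-match it to a target mean and variance, which requires overdispersion (variance above the mean); the target values below are made-up illustrations:

```python
from scipy.stats import nbinom

# Moment-match a negative binomial to a target mean m and variance v (needs v > m).
# In scipy's (n, p) parameterization: mean = n(1-p)/p and var = n(1-p)/p^2,
# which solves to p = m/v and n = m^2/(v - m). scipy allows non-integer n.
m, v = 10.0, 40.0          # made-up illustrative targets
p = m / v
n = m * m / (v - m)

dist = nbinom(n, p)
# The matched distribution recovers the targets (up to float rounding)
# and, like a lognormal, is right-skewed.
print(dist.mean(), dist.var())
```

You could then overlay `dist.pmf` on a histogram of the data to judge the "discrete lognormal" resemblance by eye.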
Very heavy-tailed distributions can be hard to assess from Q-Q plots because the high quantiles are extremely variable and so deviations even from a correct model can be considerable (to assess how much, simulate data from similar power-law distributions).
If you don't have zeros in your data, I'd suggest looking on the log-log scale, or if the discreteness dominates the appearance on that scale, you might consider a P-P plot (which will work even with zeroes).
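To get a feel for how variable the upper quantiles are even under a correct power-law model, you can simulate from it repeatedly and compare the spread of an extreme statistic with that of a central one (the Pareto exponent, sample size, and replication count below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n, reps = 1.5, 1000, 200   # arbitrary illustrative choices

# Pareto(alpha) samples via the inverse CDF: x = u**(-1/alpha), u ~ Uniform(0, 1).
samples = rng.uniform(size=(reps, n)) ** (-1.0 / alpha)

# Even with the model exactly right, the sample maximum varies wildly
# across replications...
maxima = samples.max(axis=1)
print("max ranges over a factor of", maxima.max() / maxima.min())

# ...while a central quantile such as the median is comparatively stable.
medians = np.median(samples, axis=1)
print("median ranges over a factor of", medians.max() / medians.min())
```

Large apparent deviations in the top few points of a Q-Q plot are therefore not, by themselves, evidence against the model; this kind of simulation gives a reference for how much wobble to expect.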
Rather than just trying to guess distributions from some arbitrary list of common distributions, what should drive the choice of distribution and alternatives is theory, first and foremost. I'm not really in a position to do that for you.
If you haven't read A. Clauset, C.R. Shalizi, and M.E.J. Newman (2009), "Power-law distributions in empirical data" SIAM Review 51(4), 661-703
(arxiv here) and Shalizi's So You Think You Have a Power Law — Well Isn't That Special? (see here), I would suggest giving them both a look (probably the second one first).
Best Answer
Reference
$$ x \sim \log \mathcal{N}(\mu, \sigma^2) \\ \text{if} \\ p(x) = \frac{1}{x \sqrt{2\pi} \sigma} e^{- \frac{\left( \log(x) - \mu\right)^2}{2\sigma^2}}, \quad x > 0 $$
where $$ \text{E}[x] = e^{\mu + \frac{1}{2}\sigma^2}. $$
Note that $$ y \sim \log \mathcal{N}(m, v^2) \iff \log(y) \sim \mathcal{N}(m, v^2), $$
per this Q&A.
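Both the density and the mean formula above can be checked numerically with scipy, whose parameterization of the lognormal is `lognorm(s=sigma, scale=exp(mu))`; the particular $\mu = 0.5$, $\sigma = 1.2$ here are arbitrary:

```python
import numpy as np
from scipy.stats import lognorm, norm

mu, sigma = 0.5, 1.2   # arbitrary illustrative parameters
dist = lognorm(s=sigma, scale=np.exp(mu))

# E[x] = exp(mu + sigma^2 / 2)
assert np.isclose(dist.mean(), np.exp(mu + 0.5 * sigma**2))

# x ~ logN(mu, sigma^2)  <=>  log(x) ~ N(mu, sigma^2):
# the densities agree through the change of variables p_x(x) = p_log(log x) / x.
x = 2.7
assert np.isclose(dist.pdf(x), norm.pdf(np.log(x), mu, sigma) / x)
```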
Answer
Theoretically? In most situations yes (see the logical equivalency above). The only case I found where it was useful to use the log-normal distribution explicitly was a case study of pollution data. In that instance, it was important to model weekdays and weekends differently in terms of pollution concentration ( $\mu_1 > \mu_2$ in the prior*), but have the expected values of the two log-normal distributions without restriction (I had to allow $e^{\mu_1 + \frac{1}{2}\sigma_1^2} \le e^{\mu_2 + \frac{1}{2}\sigma_2^2}$). Which day each measurement was taken was unknown, so the separate parameters had to be inferred.
You could certainly argue that this could be done without invoking the log-normal distribution, but this is what we decided to use and it worked.
The reason for this is just a consequence of our notion of distance on the support. Since $\log$ is a monotone increasing function, log-transforming variables preserves order. For example, the median of the log-normal distribution is just $e^\mu$, the exponential of the median of the log-values (since the normal distribution mean is also its median).
However, the $\log$ function only preserves order, not the distance function itself, and means are all about distance: the mean is the point that minimizes the expected squared (Euclidean) distance to the other points, weighted by their probabilities. The log-transform compresses values unevenly (i.e., larger values are compressed more). In fact, the log of the mean of the log-normal distribution is higher than the mean of the log-values (i.e. $\mu$) by $\frac{1}{2}\sigma^2$: $$ \log \left(e^{\mu + \frac{1}{2} \sigma^2} \right) = \mu + \frac{1}{2} \sigma^2 > \mu. $$ That is, taking logs pulls the mean down relative to the original scale by an amount that depends on the spread of the distribution (through $\sigma$), precisely because the $\log$ function compresses distances unevenly.
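Both facts, that the median of the log-normal is $e^\mu$ (order is preserved) while the log of its mean exceeds $\mu$ by exactly $\frac{1}{2}\sigma^2$ (distance is not), are easy to verify numerically; the parameter values are arbitrary:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 1.0, 0.8   # arbitrary illustrative parameters
dist = lognorm(s=sigma, scale=np.exp(mu))

# Monotone transforms preserve order, so the median carries through:
assert np.isclose(dist.median(), np.exp(mu))

# But not the mean: log(E[x]) exceeds E[log(x)] = mu by exactly sigma^2 / 2.
assert np.isclose(np.log(dist.mean()) - mu, 0.5 * sigma**2)
```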
*As a side note, these kinds of artificial constraints in priors tend to under-perform other methods for inferring/separating distributions.