It sounds like you're talking about what's sometimes called a regressogram, with a log-scaled x-variable.
There are a number of issues here, not necessarily in logical order:
- the quantity you're plotting is a mean, so if you want to plot a median absolute deviation, it's the MAD of the means you want.
- your suggestion $\text{MAD}/\sqrt n$ leads to the question "when is the MAD of the mean equal to the MAD of the data divided by $\sqrt n$?"
- when you say "it seems that median absolute deviation is a better estimator than mean absolute deviation" ... that depends on what we're talking about: a better estimator of what, and under what circumstances?
So, "when is the MAD of the mean equal to the MAD of the data divided by $\sqrt n$?"
The answer is that, unlike the situation with the standard deviation, this is not generally the case. The reason standard deviations of averages scale as they do is that variances of independent random variables add (more precisely, the variance of the sum is the sum of the variances when the variables are independent), irrespective of the distributions of the components (as long as the variances all exist). It is this particular property that largely accounts for the popularity of variances and standard deviations.
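In symbols, for independent $X_1, \dots, X_n$ with common variance $\sigma^2$,
$$\operatorname{Var}(\bar X) = \operatorname{Var}\!\left(\frac1n\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \qquad \text{so} \qquad \operatorname{sd}(\bar X) = \frac{\sigma}{\sqrt n}.$$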
Neither the median deviation nor the mean deviation has that property in general.
However, when the data are normal, they will in effect inherit that property: at the normal, the population mean deviation and median deviation are each a fixed multiple of the standard deviation, normals are closed under convolution (so the sample mean is again normal), and standard deviations scale that way.
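Concretely, if $X_i \sim N(\mu, \sigma^2)$ then $\bar X \sim N(\mu, \sigma^2/n)$, and since the population MAD of a normal is a fixed multiple of its standard deviation, $\text{MAD}(X) = \sigma\,\Phi^{-1}(3/4) \approx 0.6745\,\sigma$, we get
$$\text{MAD}(\bar X) = \frac{\sigma}{\sqrt n}\,\Phi^{-1}(3/4) = \frac{\text{MAD}(X)}{\sqrt n}.$$
The same argument goes through for the mean deviation, with the constant $\sqrt{2/\pi}$ in place of $\Phi^{-1}(3/4)$.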
If the data were reasonably close to normal, it could perhaps be adequate.
What else might be done? One way to estimate the standard error of a statistic is via the bootstrap; for the mean deviation - being a mean - this should do well in large samples. Unfortunately, medians don't do so well under the bootstrap, and this issue will carry over to median absolute deviations.
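For instance, here is a minimal bootstrap sketch in Python/numpy (the data, sample size and number of resamples are placeholders, not anything from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_dev(x):
    """Mean absolute deviation about the mean."""
    return np.mean(np.abs(x - np.mean(x)))

def bootstrap_se(x, stat, n_boot=2000):
    """Bootstrap standard error of a statistic: resample with replacement,
    recompute the statistic, take the SD of the replicates."""
    n = len(x)
    reps = [stat(rng.choice(x, size=n, replace=True)) for _ in range(n_boot)]
    return np.std(reps, ddof=1)

x = rng.normal(size=200)                 # illustrative data
print(bootstrap_se(x, mean_abs_dev))     # bootstrap SE of the mean deviation
```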
If you have some probability model for your data, there's also simulation as a way of approaching the problem.
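As a sketch of that approach (the lognormal model, its parameters and the sample size here are purely illustrative assumptions): simulate many samples from the assumed model, compute the statistic in each, and use the spread of those replicates as its standard error.

```python
import numpy as np

rng = np.random.default_rng(1)

def median_abs_dev(x):
    """Median absolute deviation about the median (unscaled)."""
    return np.median(np.abs(x - np.median(x)))

n, n_sims = 200, 5000
sims = rng.lognormal(mean=0.0, sigma=0.5, size=(n_sims, n))  # assumed model

mads = np.array([median_abs_dev(row) for row in sims])
print(mads.mean(), mads.std(ddof=1))   # Monte Carlo estimate of the statistic's SE
```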
I thought about it some more, and I have a couple of ideas.
(1) About measurement uncertainty: from what you said, it's big enough to take into account. I agree with the formula for qi -- this is just the mass of the distribution for x[i] which falls into bin B[k]. From that, the expected proportion of x which falls into B[k] (let's call that q(B[k])) is the average of those bits over all the data, i.e., q(B[k]) = sum(qi, i, 1, N)/N. Then the height of histogram bar k is q(B[k]), and its variance is q(B[k])*(1 - q(B[k])).
So I disagree about the variance -- I think the summation over i should be inside q in variance = q*(1 - q), not outside.
It occurs to me that you'll want to ensure that the q(B[k]) sum to 1 -- maybe that's guaranteed by construction. In any event you'll want to verify that. EDIT: Also, as the measurement error becomes smaller and smaller, you should find that the q(B[k]) converge to the simple n[k]/sum(n[k]) estimate.
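To make that concrete, here's a small sketch of the whole construction; the data, the normal measurement-error model, and the bin edges are assumptions for illustration, not anything taken from the question:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Illustrative data: measured values x[i] with per-observation error SDs s[i],
# and log-spaced bin edges defining the bins B[k]
x = rng.lognormal(mean=0.0, sigma=0.5, size=100)
s = np.full_like(x, 0.05)               # assumed normal measurement error
edges = np.logspace(-1, 1, 11)          # 10 bins between 0.1 and 10

def smeared_hist(x, s, edges):
    """q(B[k]) = (1/N) * sum_i q[i,k], where q[i,k] is the mass of the
    (assumed normal) error distribution of x[i] that falls in bin B[k]."""
    cdf = norm.cdf(edges[None, :], loc=x[:, None], scale=s[:, None])
    q_ik = np.diff(cdf, axis=1)         # shape (N, number of bins)
    return q_ik.sum(axis=0) / len(x)

qB = smeared_hist(x, s, edges)
print(qB.sum())                         # ~1 if little mass falls outside the outer edges

# Small-error limit: with tiny s this reproduces the simple count proportions
counts, _ = np.histogram(x, bins=edges)
print(np.allclose(smeared_hist(x, np.full_like(x, 1e-9), edges), counts / len(x)))
```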
(2) About prior information on nonempty bins: I recall that adding a fixed number to the numerator and denominator in n[k]/n, i.e., (n[k] + m[k])/(n + sum(m[k])), is equivalent to assuming a prior over the bin proportion, with the prior mean being m[k]/sum(m[k]). As you can see, the larger m[k], the stronger the influence of the prior. (This business about the prior count is equivalent to assuming a conjugate prior for the bin proportion -- "conjugate prior beta binomial" is a topic you can look up.)
Since q(B[k]) is not just a proportion of counts, it's not immediately clear to me how to incorporate the prior count. Maybe you need (q(B[k]) + m[k])/Z where Z is whatever makes the adjusted proportions sum to 1.
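One way to read that (an assumption on my part, treating N*q(B[k]) as an effective count) is:

```python
import numpy as np

# Placeholder quantities, not from the question
qB = np.array([0.00, 0.10, 0.35, 0.40, 0.15])  # estimated bin proportions q(B[k])
m = np.ones_like(qB)                           # prior pseudocounts m[k]
N = 200                                        # number of observations

# Treat N*q(B[k]) as effective counts and apply (count + m[k]) / (N + sum(m[k]));
# since the q(B[k]) sum to 1, the adjusted proportions sum to 1 by construction
smoothed = (N * qB + m) / (N + m.sum())
print(smoothed, smoothed.sum())
```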
However, I don't know how hard you should try to fix up the bin proportions. You were saying you don't have enough prior information to pick a parametric distribution -- if so, maybe you also don't have enough to make assumptions about bin proportions. That's a kind of higher-level question you can consider.
Good luck and have fun, it seems like an interesting problem.
Best Answer
The $R^2$ value is essentially a rescaling of the $F$-statistic; see this question. The $F$-statistic is very widely used for model-fit checks, as part of ANOVA.
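Concretely, for a linear regression with $p$ predictors (plus an intercept) fitted to $n$ observations, the overall $F$-statistic and $R^2$ are related by
$$F = \frac{R^2/p}{(1-R^2)/(n-p-1)}, \qquad\text{equivalently}\qquad R^2 = \frac{pF}{pF + n - p - 1},$$
so each is a monotone function of the other.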