I think it will be helpful to separate the question into two parts:
- What is the functional form of your empirical distribution? and
- What does that functional form imply about the generating process in your network?
The first question is a statistics question. If you've applied the methods of Clauset et al. for fitting the power-law distribution and those methods gave you a $p>0.1$ for the upper-tail fit, then you're allowed to say that the upper tail (looking at your figure, this is $x\geq15$ or so) is plausibly power-law distributed. If the methods gave you $p<0.1$ then you can't say that, even if the fit looks good to the eye. Deciding whether the log-normal fit is better means basically doing the same thing. Can you reject that model as a generating process for the degree distribution data you have? If not, then you're allowed to put the log-normal into the "plausible" category.
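For concreteness, the core of that goodness-of-fit procedure (draw synthetic samples from the fitted model, refit each, and ask how often they fit *worse* than the data) can be sketched in a few lines. This is a simplified numpy sketch for the continuous case with a fixed, known $x_\text{min}$; the full Clauset et al. recipe also re-estimates $x_\text{min}$ for each synthetic sample and models the data below $x_\text{min}$ semi-parametrically, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_mle(x, xmin):
    """Continuous-case MLE for the power-law exponent."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(x, xmin, alpha):
    """Kolmogorov-Smirnov distance between the empirical tail CDF
    and the fitted power-law CDF."""
    tail = np.sort(x[x >= xmin])
    n = len(tail)
    model = 1.0 - (tail / xmin) ** (1.0 - alpha)
    upper = np.arange(1, n + 1) / n   # empirical CDF just after each point
    lower = np.arange(0, n) / n       # ... and just before each point
    return max(np.max(np.abs(upper - model)), np.max(np.abs(lower - model)))

def gof_pvalue(x, xmin, n_boot=200):
    """Fraction of synthetic samples (drawn from the fitted model) whose
    own best-fit KS distance is at least as large as the observed one."""
    alpha = alpha_mle(x, xmin)
    d_obs = ks_distance(x, xmin, alpha)
    n = int(np.sum(x >= xmin))
    worse = 0
    for _ in range(n_boot):
        # inverse-transform sampling from the fitted continuous power law
        synth = xmin * (1.0 - rng.random(n)) ** (-1.0 / (alpha - 1.0))
        worse += ks_distance(synth, xmin, alpha_mle(synth, xmin)) >= d_obs
    return worse / n_boot
```

If the data really follow a power law, this p-value is roughly uniform on $[0,1]$; by the convention above, $p < 0.1$ rejects the power-law hypothesis.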
As a small technical point, degrees are integer quantities, while a log-normal distribution requires a continuous variable, so the two are not really compatible (unless you are only talking about $x\gg1$ when the difference between integers and real values for these kinds of questions becomes negligible). To do the statistics properly, you'd want to write down the pdf for a "log-normally" distributed integer quantity, derive estimators for it and apply those to your data.
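To illustrate that point, one simple way (my own choice, not a construction from the paper) to define a log-normal pmf on the positive integers is to bin the continuous log-normal CDF at half-integers and renormalize; the function name is mine:

```python
import math

def discrete_lognormal_pmf(k_max, mu, sigma):
    """pmf over the integers 1..k_max: the mass of a continuous log-normal
    on [k - 1/2, k + 1/2], renormalized so the probabilities sum to 1."""
    def cdf(x):
        # log-normal CDF expressed via the error function
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))
    mass = [cdf(k + 0.5) - cdf(k - 0.5) for k in range(1, k_max + 1)]
    total = sum(mass)
    return [m / total for m in mass]
```

A proper fit would then maximize the log-likelihood of this pmf over $\mu$ and $\sigma$, rather than plugging integer degrees into the continuous density.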
The second question is actually the harder of the two. As some people pointed out in the comments above, many mechanisms produce power-law distributions, and preferential attachment (in all its variations and glory) is just one of them. Thus, observing a power-law distribution in your data (even a genuine one that passes the necessary statistical tests) is not sufficient evidence to conclude that the generating process was preferential attachment. More generally: if some mechanism A produces pattern X in data (e.g., a log-normal degree distribution in your network), observing pattern X in your data is not evidence that your data were produced by mechanism A. The data are consistent with A, but that does not mean A is the right mechanism.
To really show that A is the answer, you have to test its mechanistic assumptions directly and show that they also hold for your system, and preferably also show that other predictions of the mechanism also hold in the data. A really great example of the assumption-testing part was done by Sid Redner (see Figure 4 of this paper), in which he showed that for citation networks, the linear preferential attachment assumption actually holds in the data.
Finally, the term "scale-free network" is overloaded in the literature, so I would strongly suggest avoiding it. People use it to refer to networks with power-law degree distributions and to networks grown by (linear) preferential attachment. But as we just explained, these two things are not the same, so using a single term to refer to both is just confusing. In your case, a log-normal distribution is completely inconsistent with the classic linear preferential attachment mechanism, so if you decide that log-normal is the answer to question 1 (in my answer), then it would imply that your network is not 'scale free' in that sense. The fact that the upper tail is 'okay' as a power-law distribution would be meaningless in that case, since there is always some portion of the upper tail of any empirical distribution that will pass that test (and it will pass because the test loses power when there isn't much data to go on, which is exactly what happens in the extreme upper tail).
Ok, after getting statistical consultation from my university and reading Clauset et al. and White et al. more carefully, I think I finally understand what the right approach is. So, adopting Clauset et al.'s terminology, assume I am trying to fit a power law of the following form:
$$f(x) = x^{-\alpha}$$
Previous Issues
The first issue I had was that I assumed that power laws would generally have positive exponents, but apparently this is not so; in the literature, the term "power law" typically refers to power laws with negative exponents. In the beginning, this confused me: because of the positive $\alpha$ values, I thought Clauset et al. did not apply to my case.
The second issue was that I misunderstood what Clauset et al.'s $x_\text{min}$ was used for, as I was still thinking of power laws with a positive exponent, so the lower limit didn't make much sense to me. Now I get that for negative exponents, $f(x)$ diverges as $x \rightarrow 0$, so in order to have a probability density function $p(x)$, we need a lower limit $x_\text{min}$ such that $\int_{x_\text{min}}^\infty p(x)\, dx = 1$.
The third issue was that I thought that in my original formulation $f(x) = k \cdot x^{-\alpha}$, I should be able to fit $k$ and $\alpha$ independently. But now I understand that $k$ is not an independent parameter: it is the normalization constant, which depends on $x_\text{min}$ and $\alpha$. That is why I was never able to determine the MLEs of both $k$ and $\alpha$.
Finally, the last issue was that I misunderstood what Clauset et al. refer to as the continuous and discrete cases. I thought this referred to whether the data I was trying to fit were continuous or discrete (in my case: discrete), when it actually refers to whether the underlying distribution the data were drawn from is continuous or discrete (in my case: continuous).
Solution
So following Clauset et al.'s definition,
$$p(x) = C \cdot x^{-\alpha},$$
where $C$ is the normalization constant. For the continuous case this means that
$$C = (\alpha - 1) \cdot x_\text{min}^{\alpha - 1}.$$
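For completeness, $C$ follows from requiring the density to integrate to one (assuming $\alpha > 1$, so the integral converges):
$$\int_{x_\text{min}}^\infty C \, x^{-\alpha} \, dx = \frac{C}{\alpha - 1} \, x_\text{min}^{1 - \alpha} = 1 \quad\Longrightarrow\quad C = (\alpha - 1) \cdot x_\text{min}^{\alpha - 1}.$$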
More importantly however, instead of using the approximation suggested for the discrete case (which I originally thought I had to use but which doesn't work for my data), for the continuous case, the MLE for $\alpha$ is easy to compute via:
$$\hat{\alpha} = 1 + n \left[\sum_{i=1}^n \ln \frac{x_i}{x_\text{min}}\right]^{-1},$$
"where $x_i$, $i = 1, ..., n$, are the observed values of $x$ such that $x_i \ge x_\text{min}$".
With this, it is only a matter of estimating the correct $x_\text{min}$. Clauset et al., for example, suggest fitting the data using each observed $x$ value as a candidate $x_\text{min}$ and choosing the one that produces the smallest Kolmogorov–Smirnov distance.
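That scan can be sketched as follows (numpy; a deliberately simplified version of the procedure — real implementations also deal with ties, discreteness, and small-tail pathologies):

```python
import numpy as np

def fit_xmin(x):
    """Try each unique data value as xmin, fit alpha by the continuous-case
    MLE on the tail, and keep the (xmin, alpha) pair with the smallest
    Kolmogorov-Smirnov distance."""
    x = np.sort(np.asarray(x, dtype=float))
    best = (np.inf, None, None)  # (KS distance, xmin, alpha)
    for xmin in np.unique(x)[:-1]:  # keep at least two tail points
        tail = x[x >= xmin]
        n = len(tail)
        alpha = 1.0 + n / np.sum(np.log(tail / xmin))
        model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
        emp_cdf = np.arange(1, n + 1) / n
        ks = np.max(np.abs(emp_cdf - model_cdf))
        if ks < best[0]:
            best = (ks, xmin, alpha)
    return best
```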
Luckily, there already exists a Python library that can do all this: powerlaw
(accompanying paper). (The same and similar packages are also available for several other programming languages; see e.g. here.)
All that being said, one should first check whether a power law is even the right model, or whether, for example, an exponential or log-normal distribution is a better choice. Besides the papers, this is also discussed by Aaron Clauset here, and the powerlaw
package offers likelihood-ratio tests to check this. In my case, though, I think a power law still makes the most sense when considering the underlying physical laws that generate the particle distribution.
Lastly, for those interested, there is also a more recent paper on power-law fitting by Virkar and Clauset.
Best Answer
Not sure if this is still of interest, but you are estimating the exponent of the CCDF (complementary CDF) of the degree distribution, which is $-\alpha + 1$, where $\alpha = 3$ for the Barabási–Albert model. You should therefore add 1 to your estimate, which indeed gives something close to the expected 3.
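The exponent relation is easy to verify directly: for a continuous power law with density exponent $\alpha$, the CCDF is $P(X \geq x) = (x/x_\text{min})^{-(\alpha - 1)}$, so a log–log fit of the empirical CCDF has slope $1 - \alpha$, and adding 1 to the magnitude of that slope recovers $\alpha$. A numpy sketch with synthetic $\alpha = 3$ data:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, xmin = 3.0, 1.0  # density exponent of the Barabasi-Albert model

# sample from p(x) ~ x^-3 via inverse-transform sampling, then build the
# empirical CCDF over the sorted sample
x = np.sort(xmin * (1.0 - rng.random(100_000)) ** (-1.0 / (alpha - 1.0)))
ccdf = 1.0 - np.arange(len(x)) / len(x)  # P(X >= x_i)

# least-squares slope on log-log axes, avoiding the noisy extreme tail
lo, hi = 100, 90_000
slope = np.polyfit(np.log(x[lo:hi]), np.log(ccdf[lo:hi]), 1)[0]
print(slope)        # close to -(alpha - 1) = -2
print(1.0 - slope)  # recovered density exponent, close to 3
```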