AIC will be negative whenever $k<\ln(L)$ (see the question "Negative values for AIC in General Mixed Model" and its answers). Also keep in mind that the log-likelihood can be positive: the likelihood function is usually a probability density, which is non-negative and can exceed $1$, so its logarithm can exceed $0$.
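A minimal numerical sketch of this point, in Python. The data and the parameter values `mu` and `sigma` are hypothetical, chosen only for illustration (they are not fitted estimates): with a narrow normal density, the density exceeds $1$ near the mean, so $\ln(L)$ is positive and AIC turns negative once $k<\ln(L)$.

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log of the normal density N(mu, sigma^2) evaluated at x
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def aic(k, loglik):
    # AIC = 2k - 2 ln(L)
    return 2 * k - 2 * loglik

# Hypothetical data tightly clustered around 0, so a narrow normal describes it well
data = [-0.02, 0.01, 0.0, 0.03, -0.01]
mu, sigma = 0.0, 0.05  # assumed parameter values, not fitted estimates
k = 2                  # two parameters: mu and sigma

# The density exceeds 1 at every data point here, so each log term is positive
loglik = sum(normal_logpdf(x, mu, sigma) for x in data)

print(loglik > 0)          # ln(L) is positive
print(aic(k, loglik) < 0)  # AIC is negative because k < ln(L)
```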
The interpretation when comparing models is the same, though: the model with the smallest AIC is the "best" one, in the sense that AIC favors models with small $k$ and large $\ln(L)$.
For the likelihood itself, higher is "better", and since the log-likelihood is a monotonically increasing transformation of the likelihood, the same applies to $\ln(L)$. This is consistent with minimizing $\text{AIC}$, in which $\ln(L)$ enters with a negative sign:
$$\text {AIC} = 2k -2\ln(L)$$
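The comparison rule does not care about the sign of the values. A short sketch, assuming three hypothetical candidate models described only by their $(k, \ln L)$ pairs (the numbers are invented for illustration): every AIC below is negative, and the model with the smallest (most negative) value is still the one preferred.

```python
def aic(k, loglik):
    """AIC = 2k - 2 ln(L); smaller is better."""
    return 2 * k - 2 * loglik

# Hypothetical (k, ln L) pairs for three candidate models fitted to the same data
models = {
    "small": (2, 10.0),
    "medium": (4, 13.5),
    "large": (8, 14.0),
}

scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
best = min(scores, key=scores.get)  # pick the model with the smallest AIC

print(scores)
print("best:", best)
```

Here the larger model gains only $0.5$ in $\ln(L)$ for four extra parameters, so the penalty $2k$ outweighs the improvement in fit.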
Regarding your example in R, I'm pretty sure `fit1$likelihood` and `fit2$likelihood` are actually two times the negative log-likelihoods, i.e. $-2\ln(L)$. The reason is simple:

```r
fit1$likelihood - fit1$AIC
#[1] -38
fit2$likelihood - fit2$AIC
#[1] -54
```

If the AIC calculation is right, this means `fit1` has 19 parameters and `fit2` has 27 parameters, since $-2\ln(L) - \left(2k - 2\ln(L)\right) = -2k$.
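The arithmetic can be checked with a short sketch. It assumes, as argued here, that the `likelihood` component stores $-2\ln(L)$; the helper name `n_params_from_difference` is hypothetical and not part of any R package.

```python
def n_params_from_difference(likelihood_minus_aic):
    # Assumes `likelihood` holds -2 ln(L) and AIC = 2k - 2 ln(L),
    # so likelihood - AIC = -2k and therefore k = -(likelihood - AIC) / 2.
    return -likelihood_minus_aic / 2

# Differences printed by the R code above
print(n_params_from_difference(-38))  # fit1
print(n_params_from_difference(-54))  # fit2
```

If the stored quantity were instead $\ln(L)$ itself, the difference would not reduce to a multiple of $k$, which is why integer results like these support the $-2\ln(L)$ reading.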
The formulas for deviance and AIC are:
$$\text{deviance} = 2\log\mathcal{L}(\text{saturated model}\, |\, \text{data}) - 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$
$$\text{AIC} = 2k- 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$
where $\mathcal{L}$ is the likelihood and $k$ is the number of model parameters. For a fixed dataset and model family, the saturated model is fixed, and therefore for our purposes the equation for deviance is:
$$\text{deviance} = \text{constant} - 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$
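As a concrete check of this equation, here is a sketch using a Poisson model. The Poisson family, the intercept-only model and the counts are all assumptions chosen for illustration; the question does not say which family is used. It shows that $\text{AIC} - \text{deviance} = 2k - 2\log\mathcal{L}(\text{saturated model}\,|\,\text{data})$, which for a fixed dataset depends only on $k$.

```python
import math

def poisson_loglik(y, mu):
    # Sum of log Poisson probabilities: y*ln(mu) - mu - ln(y!)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

y = [2, 5, 3, 7, 4]  # hypothetical counts (all > 0 so ln(y) is defined)

# Saturated model: one mean per observation, mu_i = y_i
ll_sat = poisson_loglik(y, y)

# Intercept-only model: a single common mean whose MLE is the sample mean (k = 1)
k = 1
ybar = sum(y) / len(y)
ll_model = poisson_loglik(y, [ybar] * len(y))

deviance = 2 * ll_sat - 2 * ll_model
aic = 2 * k - 2 * ll_model

# AIC - deviance equals 2k - 2*ll_sat: only k and the fixed saturated model matter
print(aic - deviance, 2 * k - 2 * ll_sat)
```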
Plotting AIC against deviance the way that you've done, we expect the data to fall along a straight line if there exist constants $c_1$ and $c_2$ such that:
$$c_1 \cdot \text{AIC} + c_2 \approx \text{Deviance}$$
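This can only be the case if, across the models, $k$ is (approximately) an affine function of $\log\mathcal{L}$, which includes the trivial case where every model has the same number of parameters. Although this is not a relationship that I have previously come across, it seems plausible.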
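Substituting the AIC and deviance formulas into the straight-line condition makes the requirement explicit (assuming $c_1 \neq 0$):

$$
\begin{aligned}
c_1\left(2k - 2\log\mathcal{L}\right) + c_2 &\approx \text{constant} - 2\log\mathcal{L} \\
\Longrightarrow\quad k &\approx \frac{\text{constant} - c_2}{2c_1} + \frac{c_1 - 1}{c_1}\,\log\mathcal{L}
\end{aligned}
$$

With $c_1 = 1$ the slope term vanishes and every model must have the same $k$; strict proportionality $k \propto \log\mathcal{L}$ is the special case $\text{constant} = c_2$.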
However, it could also be that a different formula for deviance is being used altogether, as intimated here.
For continuous data, $L$ is not a joint probability (a value that would lie in $[0,1]$) but a joint probability density. Since a density only needs to be non-negative and is not bounded from above, $\ln(L)$ can be either positive or negative. Hence $\text{AIC}$ can also be either positive or negative.
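Because $L$ is a density, its scale depends on the units of the data. A minimal sketch, assuming a normal model fitted by maximum likelihood to hypothetical measurements: expressing the same values in millimetres instead of metres shifts $\ln(L)$ down by $n\ln(1000)$ and flips the sign of both $\ln(L)$ and AIC, while the model and the fit are unchanged.

```python
import math

def normal_max_loglik(data):
    # Maximized normal log-likelihood, using the MLEs
    # mu_hat = sample mean and sigma_hat^2 = mean squared deviation (divisor n)
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def aic(k, loglik):
    # AIC = 2k - 2 ln(L)
    return 2 * k - 2 * loglik

# Hypothetical measurements in metres, and the same values in millimetres
metres = [1.02, 0.98, 1.01, 0.99, 1.00]
millimetres = [1000 * x for x in metres]

for label, data in [("metres", metres), ("millimetres", millimetres)]:
    ll = normal_max_loglik(data)
    print(label, round(ll, 3), round(aic(2, ll), 3))  # k = 2 (mean and variance)
```

Since every model fitted to the same data is shifted by the same amount, AIC differences between models are unaffected; only the sign of an individual AIC value changes, which is why that sign carries no meaning on its own.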