There are two things going on here that both lead to wanting low autocorrelation, but from slightly different angles.
First, if you have some sort of sampler (e.g., Metropolis-Hastings, a Gibbs sampler, whatever), you would like to have very little autocorrelation in the samples. The easiest way to see why is through the MCMC error: for the posterior mean, for example, the MCMC error will be lower if you have weakly correlated samples than if you have strongly correlated samples. That's the easy explanation, but it's worth keeping in mind more generally: all else being equal, a sampler that produces weakly correlated samples is preferred over one that produces strongly correlated samples.
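To make the MCMC-error point concrete, here is a minimal sketch (all function names are illustrative, not from any library) comparing two stationary AR(1) chains with the same N(0, 1) marginal but different autocorrelation. The Monte Carlo error of the mean estimate, measured as its standard deviation across independent replications, is clearly larger for the strongly correlated chain.

```python
import numpy as np

def ar1_chain(rho, n, rng):
    """Stationary AR(1) chain with N(0, 1) marginal and lag-1 correlation rho."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    innov_sd = np.sqrt(1.0 - rho**2)  # keeps the marginal variance at 1
    for t in range(1, n):
        x[t] = rho * x[t - 1] + innov_sd * rng.standard_normal()
    return x

def mc_error_of_mean(rho, n=2000, reps=200, seed=0):
    """Std. dev. of the sample mean across independent replications."""
    rng = np.random.default_rng(seed)
    means = [ar1_chain(rho, n, rng).mean() for _ in range(reps)]
    return np.std(means)

err_weak = mc_error_of_mean(rho=0.1)
err_strong = mc_error_of_mean(rho=0.95)
print(err_weak, err_strong)  # the strongly correlated chain has the larger error
```

Both chains target the same distribution with the same number of samples; only the correlation structure differs, and that alone inflates the error of the posterior-mean estimate.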
Second, which I think is more what you are getting at, is using the empirical autocorrelation to determine how much "burn-in" to remove. The reasoning behind this is that typically we don't have great starting points; we often begin far from the mode. Including these starting points in a finite sample can add bias, as we are over-representing them by beginning there and potentially slowly drifting toward the mode. Different samplers may "drift" faster than others. Once we get close to the mode, we should essentially be bouncing around it. But here is the important point: when we are far from the mode, many samplers will have a strong 'tug' back toward the mode, as moving toward the mode will likely greatly increase the posterior probability. However, when we are close to the mode, our sampler should be taking random jumps, with much less 'tug' toward the mode (because the samples are already close to it). The stronger the tug, the higher the autocorrelation. Thus, if we observe heavy autocorrelation early in our chain that then levels off after a while, this is indicative of having been very far from the mode early on.
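This 'tug' is easy to demonstrate with a small sketch (illustrative, not any package's API): a random-walk Metropolis chain targeting N(0, 1) but started far away at x = 50. Early on, moves toward the mode are almost always accepted, producing a trending, highly autocorrelated stretch; once near the mode, the chain just bounces around it.

```python
import numpy as np

def metropolis_chain(n, start, step_sd, rng):
    log_target = lambda x: -0.5 * x**2  # standard normal, up to a constant
    x = start
    chain = np.empty(n)
    for t in range(n):
        prop = x + step_sd * rng.standard_normal()
        # accept with the usual Metropolis probability
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        chain[t] = x
    return chain

rng = np.random.default_rng(1)
chain = metropolis_chain(5000, start=50.0, step_sd=1.0, rng=rng)
print(chain[:200].mean(), chain[-1000:].mean())
# the early segment still carries the far-away start; the tail hovers near 0
```

The early segment's mean is pulled far above zero by the transient, which is exactly the bias that motivates discarding burn-in.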
It means that your chain most likely did not converge. By this I mean you should be wary of the entire chain, not just of the dimensions with a low effective number of samples.
Solutions
Burn-in (discarding early part of the chain) - see also this question
Sometimes a low effective number of samples is just because the chain started in a low-probability region, and found the basin of convergence (the high probability region, or typical set) only later on.
I do not recommend that you simply play with burn-in at this point, since with only one chain it's hard to tell what's going on.
Instead, you might want to run a few chains in parallel, as opposed to a single long chain (see MacKay's book, Chapter 29). With multiple chains it is usually easier to spot lack of convergence, although there are different schools of thought here on the number of chains (I usually do 3 or 4).
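A minimal sketch of what multiple chains buy you (the sampler and the crude Gelman-Rubin statistic below are illustrative, not any particular package's implementation): with dispersed starting points, disagreement between chains is easy to quantify, whereas a single chain can look locally fine while being nowhere near converged.

```python
import numpy as np

def rw_metropolis(n, start, rng, step_sd=1.0):
    """Random-walk Metropolis targeting a standard normal."""
    x, out = start, np.empty(n)
    for t in range(n):
        prop = x + step_sd * rng.standard_normal()
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
            x = prop
        out[t] = x
    return out

def gelman_rubin(chains):
    """Crude potential scale reduction factor (no split, no burn-in removal)."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    return np.sqrt(((n - 1) / n * W + B / n) / W)

rng = np.random.default_rng(0)
starts = [-50.0, -25.0, 25.0, 50.0]
r_short = gelman_rubin([rw_metropolis(100, s, rng) for s in starts])
r_long = gelman_rubin([rw_metropolis(5000, s, rng) for s in starts])
print(r_short, r_long)  # r_short is far above 1; r_long is close to 1
```

With only 100 steps the chains are still marching in from their dispersed starts and disagree wildly; with 5000 steps they overlap and the statistic approaches 1.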
Regarding burn-in, there are also several opinions. Some people, such as Geyer, say that it's pointless, as long as you are sure that you start in a high probability region:
Any point you don't mind having in a sample is a good starting point.
However, this is easier said than done in 150 dimensions. State-of-the-art statistical packages discard as much as 50% of the chain as burn-in (see Stan), but one of the reasons for such a long burn-in is that it is also used to adapt some of the sampler's parameters.
In your case, I definitely recommend that you initialize your sampler from a good guess (e.g., somewhere around the mode is better than nothing, although the mode itself is likely not in the typical set), and do some burn-in since in high dimensions it's hard to know where the probability mass resides (not the same as the probability density).
In my opinion, it's better to burn-in more than less (at worst, you are simply throwing away some effective samples, whereas if you do not burn-in you might be keeping samples that are not representative of the target distribution).
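That trade-off can be seen in a small sketch (illustrative, not any package's API): a random-walk Metropolis chain targeting N(0, 1) but started at x = 50. Keeping the whole chain biases the posterior-mean estimate upward; discarding the first 25% as burn-in removes most of that bias at the cost of a few samples that were unrepresentative anyway.

```python
import numpy as np

def rw_metropolis(n, start, rng, step_sd=1.0):
    """Random-walk Metropolis targeting a standard normal."""
    x, out = start, np.empty(n)
    for t in range(n):
        prop = x + step_sd * rng.standard_normal()
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
            x = prop
        out[t] = x
    return out

rng = np.random.default_rng(2)
chain = rw_metropolis(2000, start=50.0, rng=rng)
mean_full = chain.mean()           # biased upward by the initial transient
mean_burned = chain[500:].mean()   # first 25% discarded as burn-in
print(mean_full, mean_burned)
```

The true posterior mean is 0; the full-chain estimate is visibly off, while the burned-in one is close.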
Sample more
Simple enough, start where your MCMC chain ended (or from some other good guess, as mentioned above), discard the previous chain, and take more samples overall.
Better tune your sampler
I don't know which MCMC method you are using, but most samplers have several tunable hyper-parameters (e.g., jump length(s) for Metropolis-Hastings with Gaussian proposal, mass matrix for Hamiltonian Monte Carlo, etc.). Performance of most MCMC samplers heavily depends on the choice of these parameters, so you might have a look at your sampled distribution and figure out whether you can improve them.
Change sampler
Sometimes, the sampler you are using might be not the best choice. For example, if your problem deals with continuous variables with a smooth, differentiable target distribution, you probably want to use some variant of Hamiltonian Monte Carlo (state of the art is NUTS, as implemented in Stan).
Change parameterization
As suggested by @Björn in the comments, you might want to try a different parameterization of your model.
Simple reparameterizations amount to a linear or nonlinear transformation of your variables; the former, for example, to reduce correlation between variables, and the latter, for example, to get rid of long, fat tails in the posterior which may be hard to sample from. More complex reparameterizations include the addition of auxiliary variables that might help mixing (e.g., by bridging different regions of the posterior), but this becomes extremely model-dependent.
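As a sketch of the linear case (the numbers here are made up for illustration): strongly correlated Gaussian variables can be whitened with the Cholesky factor of their covariance, so that a sampler can take roughly independent steps in the new coordinates.

```python
import numpy as np

# A toy 2-D Gaussian with strong correlation between the two variables.
cov = np.array([[1.0, 0.95],
                [0.95, 1.0]])
L = np.linalg.cholesky(cov)

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], cov, size=5000)  # original, correlated
y = x @ np.linalg.inv(L).T                               # whitened coordinates

c_orig = np.corrcoef(x.T)[0, 1]
c_white = np.corrcoef(y.T)[0, 1]
print(c_orig, c_white)  # strong correlation before, near zero after
```

In practice you would estimate the covariance from a pilot run; the point is only that a linear change of variables can turn a narrow diagonal ridge, which forces tiny correlated steps, into a round target.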
Best Answer
High autocorrelation literally means that the Markov chain is taking small steps and is not able to make long jumps. This may be because the proposal distribution has a small variance, so the proposed jumps stay too close to the current state. However, if you increase the proposal variance too much, many proposals will be rejected, which also increases the autocorrelation. Thus a balance is required. Usually an acceptance probability of 0.234 is the target for high-dimensional problems, and 0.44 for one-dimensional problems. Since you have two dimensions here, you should tune your proposal variance so that the acceptance ratio is between 0.234 and 0.44.
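A simple way to do that tuning is to scan a few proposal standard deviations and measure the acceptance rate, as in this sketch (illustrative names, 2-D standard normal target): tiny steps accept almost everything, huge steps almost nothing, and something in between lands in the recommended band.

```python
import numpy as np

def acceptance_rate(step_sd, n=20000, seed=0):
    """Acceptance rate of random-walk Metropolis on a 2-D standard normal."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    accepted = 0
    for _ in range(n):
        prop = x + step_sd * rng.standard_normal(2)
        if np.log(rng.uniform()) < 0.5 * (x @ x - prop @ prop):
            x = prop
            accepted += 1
    return accepted / n

rates = {s: acceptance_rate(s) for s in (0.1, 2.4, 10.0)}
for step, rate in rates.items():
    print(step, rate)
# acceptance falls monotonically as the proposal standard deviation grows
```

From a scan like this you would pick a step size whose rate lies in the 0.234-0.44 band; bisecting between a too-small and a too-large step converges quickly.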
This is not simple to answer, but in short, yes, it is a negative thing. The long explanation is: if you have high autocorrelation, then each sample is highly correlated with the previous one, so the contribution of each new sample is smaller. That is, each sample provides significantly less information than it would with low autocorrelation. In addition, because the samples are highly correlated, the chain may not have explored the state space well enough. However, both issues can be somewhat handled if you just choose a large sample size. There are ways to figure out how many samples are reasonable by accounting for the autocorrelation: my answer here.
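One common recipe for that accounting (a sketch, not any specific package's estimator) is the effective sample size n_eff = n / (1 + 2 * sum of autocorrelations), truncating the sum when the estimated autocorrelation first drops below zero:

```python
import numpy as np

def effective_sample_size(chain):
    """n / (1 + 2 * sum of positive-lag autocorrelations), truncated at the
    first negative autocorrelation estimate."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0
    for rho in acf[1:]:
        if rho < 0:
            break
        tau += 2 * rho
    return n / tau

rng = np.random.default_rng(0)

def ar1(rho, n=5000):
    """Stationary AR(1) chain with N(0, 1) marginal."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    return x

ess_iid = effective_sample_size(ar1(0.0))
ess_corr = effective_sample_size(ar1(0.9))
print(ess_iid, ess_corr)
# the weakly correlated chain retains nearly all 5000 samples' worth of
# information; the strongly correlated one retains only a small fraction
```

The ratio n / n_eff tells you roughly how many extra samples you need to compensate for the autocorrelation.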
One of them, as explained, is the variance of the proposal. Another factor that affects autocorrelation is the starting value of the Markov chain. If the starting value is far from an area of high probability, the chain might move only very slowly towards that area. Yet another reason can be multimodality of the target distribution. If the target is multimodal, the chain might get "stuck" in one mode for a long time and fail to jump across modes often, which increases autocorrelation as well. Finally, some target distributions are complicated in ways other than being multimodal: they might have heavy tails or a non-smooth density, which will also affect autocorrelation.
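The multimodal failure mode is easy to reproduce in a sketch (illustrative, not any library's API): random-walk Metropolis with a small step on a well-separated equal mixture of N(-10, 1) and N(10, 1). Started in the left mode, the chain never crosses to the right mode in this short run, so its samples stay correlated at very long lags.

```python
import numpy as np

def log_mixture(x):
    """Log density (up to a constant) of an equal mixture of
    N(-10, 1) and N(10, 1)."""
    return np.logaddexp(-0.5 * (x + 10) ** 2, -0.5 * (x - 10) ** 2)

rng = np.random.default_rng(0)
x, samples = -10.0, []
for _ in range(5000):
    prop = x + 0.5 * rng.standard_normal()   # small proposal step
    if np.log(rng.uniform()) < log_mixture(prop) - log_mixture(x):
        x = prop
    samples.append(x)
chain = np.array(samples)
print(chain.max(), chain.mean())
# the chain stays in the left mode: its maximum never reaches zero, and its
# mean sits near -10 instead of the true mixture mean of 0
```

Note that every within-mode diagnostic can look healthy here; only multiple chains started in different modes (or a run long enough to cross) would reveal the problem.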