I assume that PreferA = 1 when a person preferred A previously and 0 otherwise, and that ControlFALSE = 1 when treated and 0 when control.
The odds of preferring A for a person who did not do so previously and did not receive the treatment (ControlFALSE = 0 and PreferA = 0) are $\exp(3.135) \approx 23$, i.e. there are 23 such persons who prefer A for every such person who prefers B. So A is very popular.
The main effect of treatment refers to a person who did not prefer A previously (PreferA = 0). In that case the baseline odds are multiplied by a factor $\exp(-2.309) = .099$, a decrease of $(1-.099) \times 100\% = 90.1\%$, when she or he is subjected to the treatment. So the odds of choosing A for those who were treated and did not prefer A previously are $.099 \times 23 = 2.3$: there are 2.3 such persons who prefer A for every such person who prefers B. Among this group A is thus still more popular than B, but less so than in the untreated/baseline group.
The main effect of preferring A previously refers to a person in the control group (ControlFALSE = 0). In that case the baseline odds are multiplied by a factor $.006$, a decrease of $99.4\%$, when someone preferred A previously. (So those who preferred A previously are a lot less likely to do so now. Does that make sense?)
The interaction effect compares the effect of treatment between those persons who preferred A previously and those who did not. If a person preferred A previously (PreferA = 1), then the odds ratio of treatment increases by a factor $\exp(2.850) = 17.3$. So the odds ratio of treatment for those who preferred A previously is $17.3 \times .099 = 1.71$. Alternatively, this odds ratio could be computed as $\exp(2.850 - 2.309)$.
So the exponentiated constant gives you the baseline odds, the exponentiated coefficients of the main effects give you the odds ratios when the other variable equals 0, and the exponentiated coefficient of the interaction term tells you the ratio by which the odds ratio changes.
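If you want to check these numbers yourself, here is a minimal Python sketch, assuming the coding of ControlFALSE and PreferA stated above; the PreferA main-effect coefficient is not quoted in this answer (only its exponentiated value $.006$ is), so $\ln(.006)$ is plugged in as an approximation.

```python
import numpy as np

# Coefficients reported in the output; the PreferA main effect is not quoted
# above, so ln(.006) is used as an approximation of it.
b0      = 3.135           # intercept
b_treat = -2.309          # ControlFALSE (treatment) main effect
b_prefA = np.log(0.006)   # PreferA main effect (approximate)
b_int   = 2.850           # ControlFALSE:PreferA interaction

def odds(treated, prefA):
    """Odds of preferring A now, for one cell of the 2 x 2 design."""
    return np.exp(b0 + b_treat*treated + b_prefA*prefA + b_int*treated*prefA)

print(odds(0, 0))               # baseline odds, ~23
print(odds(1, 0))               # treated, no previous preference, ~2.3
print(odds(1, 0) / odds(0, 0))  # odds ratio of treatment when PreferA = 0, ~0.099
print(odds(1, 1) / odds(0, 1))  # odds ratio of treatment when PreferA = 1, ~1.72
```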
In regression analysis it is straightforward to include numeric predictors in a model. For example, if we wanted to predict, say, the weights of children in a school as a linear function of age, we could specify the following model:
\begin{eqnarray*}
Y_{i} & = & \beta_{0}+\beta_{age}X_{age,i}+\epsilon_{i}
\end{eqnarray*}
where $Y_i$ is the weight of the $i$th child, $X_{age,i}$ is the age of the $i$th child, $\epsilon_i$ is a random error term associated with the $i$th child, $\beta_0$ is the intercept parameter, and $\beta_{age}$ is the slope parameter associated with age. A fitted model might look like this:
\begin{eqnarray*}
E[Y] & = & 35+3X_{age}
\end{eqnarray*}
What this model says is that the expected weight of any student can be estimated as $35$ plus $3$ times the student's age. So if a student's age is $7$, the expected weight of that student would be $35 + 3\times 7 = 56$.
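In code, this fitted response function is just a one-line calculation (a minimal Python sketch using the numbers from the example):

```python
def expected_weight(age):
    # fitted response function E[Y] = 35 + 3 * age
    return 35 + 3 * age

print(expected_weight(7))   # -> 56
```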
Now, let's say that instead of using age to predict weight, we were interested in predicting a student's weight based on race. How could this be represented in mathematical terms? After all, we can't multiply categories of race by estimated regression coefficients. For example, how would this function make any sense if a student were black: $E[Y]=35 + 3 \times Black$? It doesn't make sense to multiply "3 times black" or "3 times white" or any category for that matter.
Dummy coding is a way to handle this. Dummy variables are a simple way to "code" (or map, or translate) the categorical information in our dataset so that categorical groups can be represented in mathematical terms in a regression model. They also facilitate interpretation. If we have a categorical variable, for example "Race," then we may have several different categories/levels of this variable, say Black, White, and Asian. How can we create a regression model and include race as a predictor, similar to the way we worked with age?
Well, it turns out that with dummy variables we create new variables (the dummy variables) that are coded either zero (0) or one (1) to represent the categories. Generally speaking, when $c$ categories are present, we need $c-1$ dummy variables. In the race example we have three race categories, so $c=3$. This means we need $c-1=3-1=2$ dummy variables to represent the $3$ racial categories in our regression model. We'll call the new variables $X_{black,i}$ and $X_{white,i}$ (if you are wondering where the Asian category went, hold tight: I'll explain shortly). Then we'll code the information in our dataset as follows:
\begin{eqnarray*}
X_{black,i} & = & \begin{cases}
1 & \text{if the $i$th student is black}\\
0 & \text{otherwise}
\end{cases}
\end{eqnarray*}
and
\begin{eqnarray*}
X_{white,i} & = & \begin{cases}
1 & \text{if the $i$th student is white}\\
0 & \text{otherwise}
\end{cases}
\end{eqnarray*}
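If it helps to see this coding in action, here is a minimal sketch in Python/pandas; the data frame and its values are made up purely for illustration.

```python
import pandas as pd

# Made-up data; only the race column matters here.
df = pd.DataFrame({"race": ["black", "white", "asian", "white", "black"]})

# c = 3 categories, so c - 1 = 2 dummy variables; Asian is the reference group.
df["X_black"] = (df["race"] == "black").astype(int)
df["X_white"] = (df["race"] == "white").astype(int)

print(df)
#     race  X_black  X_white
# 0  black        1        0
# 1  white        0        1
# 2  asian        0        0
# 3  white        0        1
# 4  black        1        0
```

In practice most software does this coding for you behind the scenes (e.g. pandas' `get_dummies` or R's handling of factors), but it is exactly this 0/1 scheme that ends up in the design matrix.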
Using this dummy coding, a regression model for weight $Y_i$ would be:
\begin{eqnarray*}
Y_{i} & = & \beta_{0}+\beta_{black}X_{black,i}+\beta_{white}X_{white,i}+\epsilon_i
\end{eqnarray*}
and the corresponding response function might be something like:
\begin{eqnarray*}
E[Y] & = & 35+5X_{black}+3X_{white}
\end{eqnarray*}
where $\hat{\beta}_0=35$, $\hat{\beta}_{black}=5$, and $\hat{\beta}_{white}=3$. To interpret this model, it's instructive to write out the model that would be estimated for a black student. When a student is black, $X_{black}=1$ and $X_{white}=0$, so the response function becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times1+3\times0=35+5=40
\end{eqnarray*}
Now, when a student is white, $X_{black}=0$ and $X_{white}=1$, so the response function becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times0+3\times1=35+3=38
\end{eqnarray*}
If the student is Asian, then both $X_{black}=0$ and $X_{white}=0$, and the response function just becomes:
\begin{eqnarray*}
E[Y] & = & 35+5\times0+3\times0=35+0+0=35
\end{eqnarray*}
As you can see, the Asian category is represented by just the intercept in our model, so we don't need any $X$-value coded $1$ to represent it. By coding $X_{black,i}=0$ and $X_{white,i}=0$, we are representing the Asian racial category.
So, with the dummy coding, a black student is expected to have a mean weight of $40$ pounds, a white student a mean weight of $38$ pounds, and an Asian student a mean weight of only $35$ pounds.
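The same arithmetic as a small Python sketch, plugging the fitted coefficients of the example into the response function:

```python
# Fitted response function from the example: E[Y] = 35 + 5*X_black + 3*X_white
b0, b_black, b_white = 35, 5, 3

def expected_weight(x_black, x_white):
    return b0 + b_black * x_black + b_white * x_white

print(expected_weight(1, 0))   # black student  -> 40
print(expected_weight(0, 1))   # white student  -> 38
print(expected_weight(0, 0))   # Asian student  -> 35 (reference group)
```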
As you can see, dummy coding allows the regression model's predictions to change depending on the category an observation belongs to. If you'd like to see some additional examples of how dummy coding works, this website has some excellent examples and explanations.
Lastly, it should be noted that you can use this type of coding universally in regression modeling, so it can be used with ANOVA models, mixed models, $3\times3$ factorial models, etc.
It is not "forbidden" to enter further control variables to a model with interaction terms. It just makes the model larger (more complex). What is referred to as mediated moderation or moderated mediation is just a certain linear model (see e.g. here). Whether this model represents your theoretic beliefs can only be judged by you but I strongly recommend the books by Andrew Hayes on that.
Independently of the exact model you are aiming for, you can enter further variables. (If you see it from a path-model perspective, you could even decide which of the variables you want to control for with your control variables.) You only slightly change the interpretation of your model, in that all coefficients of the moderated model describe the (imaginary) case in which the other variables are held constant.
You mainly need to consider this when you illustrate your interaction effect. That is, when you plot your conditional regression coefficients, you should also state which level of the control variables you conditioned on (typically the mean of all control variables).
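To illustrate what "conditioning on the mean of the control variables" means in practice, here is a minimal Python sketch; the coefficients, variable names, and the mean of the control variable are entirely hypothetical, not output from your model.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical fitted model:
#   y = b0 + bx*x + bm*m + bxm*x*m + bc*c   (m = moderator, c = control variable)
b0, bx, bm, bxm, bc = 1.0, 0.5, 0.3, 0.8, 0.2

x = np.linspace(0, 10, 100)   # focal predictor
c_mean = 4.2                  # control variable held at its sample mean

for m in (0, 1):              # two levels of the moderator
    y_hat = b0 + bx*x + bm*m + bxm*x*m + bc*c_mean
    plt.plot(x, y_hat, label=f"moderator = {m}, control at its mean")

plt.xlabel("focal predictor")
plt.ylabel("predicted outcome")
plt.legend()
plt.show()
```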
Hope this helps - feel free to ask further clarification questions so I can optimize my answer.