Regarding 1. No, I don't think so. But any statistical program (R, SAS, SPSS) will do this for you. Unless you had numbers for the different categories, in which case what you did is just wrong. What software did you use? Can you show us code? (From question 2 it looks like you use SPSS).
Regarding 2. Can you please give us context? What variables are these? If M is "causing" X, then it sounds like M is mediating the relationship between X and Y.
Regarding 3. I Googled Hayes macro and found that he has written different macros for different kinds of mediation. Given that, it isn't surprising that they give different answers. What are you trying to do?
No, this is not possible.
An ANOVA aims to test differences between group/condition means on one dependent variable. E.g. DV: number of strawberries/strawberry plant; IV1: three watering groups (100ml/day, 200ml/day, 300ml/day); IV2: three fertilizer groups (5g/day, 10g/day, 15g/day). Thus, you'd have a 3x3 factorial ANOVA without repeated measures, which you could turn into a 3x3 repeated measures ANOVA by measuring the number of strawberries at several points in time (e.g. after 7, 14, and 21 days). An ANCOVA aims to do the same, but here you introduce at least one continuous variable, a so-called covariate, for which you do not specify an interaction with the other factors. This reduces the error variance in your DV (number of strawberries/plant) that is due to this covariate, and thereby increases statistical power for the effects of your IVs. You could just as well set up AN(C)OVAs as regression models (for which you'd have to code your factorial IVs differently), although in some statistics courses ANOVAs are taught differently, i.e. via formulas which compare different parts of the total variance in your DV (variance within treatment groups and variance between treatment groups) in order to obtain an F-value, from which you determine the corresponding p-value using the F-distribution. But I'm getting into details. The important part is: AN(C)OVAs compare means!
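The variance-partition view mentioned above can be sketched in a few lines. All numbers below are made up purely for illustration (a one-way version with just the three watering groups, to keep it short):

```python
import numpy as np

# Hypothetical strawberry counts for three watering groups (illustrative data).
groups = {
    "100ml/day": np.array([12.0, 14, 11, 13, 15]),
    "200ml/day": np.array([18.0, 20, 17, 19, 21]),
    "300ml/day": np.array([16.0, 15, 17, 18, 14]),
}

values = np.concatenate(list(groups.values()))
grand_mean = values.mean()
k = len(groups)   # number of groups
n = len(values)   # total number of observations

# Between-group sum of squares: variation of group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: variation of observations around their group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))  # → 18.0
```

Comparing this F-value against the F-distribution with $(k-1, n-k)$ degrees of freedom gives you the p-value.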
Yet, a mediation analysis has a totally different aim. Here, you test whether your IV (e.g. water/day) predicts your DV (number of strawberries/plant). So far this isn't any different from the procedure laid out above, and you might predict the mean number of berries/plant from the watering conditions; all of this you could analyse in an ANOVA. But now comes the critical part: here you are also interested in whether the effect of your IV on your DV is mediated by some other variable. E.g. you could reason that water/day affects the height of your strawberry plants, thereby leading to variation in the sun exposure of the plants, and therefore to variation in berries/plant (for the simplicity of the example, let's assume plant height and sun exposure to be one combined variable). This effect (water/day -> plant height|sun exposure -> berries/plant) is called the indirect effect, while the effect water/day -> berries/plant is called the direct effect. Both the direct and the indirect effect should be tested for significance with specialized models.
I guess that your intention to test three ANOVAs could be inspired by the model by Baron & Kenny, which exemplifies the logic of mediation analyses. There, you test three regressions (IV->DV, IV->M, and M->DV). Yet, there are more appropriate procedures based on bootstrapping, and I would suggest you use the PROCESS macro by Andrew F. Hayes, which is available for both SPSS and SAS and furthermore includes a nice GUI for SPSS (I do not know about SAS). There is also a nice template PDF available for many different mediation (and moderation) analyses, which lets you easily figure out which model is appropriate for you. I hope this helps. Oh, and Andrew Hayes has also published a book on mediation and moderation. Check whether you have access to it via an institution you are associated with. It explains quite nicely what happens in mediation and moderation.
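To make the three-regression logic concrete, here is a minimal sketch in Python using ordinary least squares via NumPy. The data are simulated and the `slopes` helper is hypothetical; in practice you'd use PROCESS or another bootstrapping approach for inference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated example: water/day (IV) -> plant height (M) -> berries/plant (DV).
water = rng.normal(200, 50, n)                                # IV
height = 0.1 * water + rng.normal(0, 2, n)                    # mediator depends on IV
berries = 0.5 * height + 0.02 * water + rng.normal(0, 1, n)   # DV

def slopes(y, *predictors):
    """OLS coefficients (excluding the intercept) via least squares."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

c = slopes(berries, water)[0]            # total effect:  IV -> DV
a = slopes(height, water)[0]             # path a:        IV -> M
c_prime, b = slopes(berries, water, height)  # direct effect c' and path b (M -> DV)

indirect = a * b
# For linear OLS on the same sample, the total effect decomposes exactly:
# c = c' + a*b
```

The significance of the indirect effect `a*b` is exactly what the bootstrapping procedures in PROCESS are designed to test.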
Best Answer
In regression analysis without categorical variables, it's straightforward to include numeric predictors in a regression model. For example, if we wanted to predict, say, the weights of children in a school as a linear function of age, we could specify the following model:
\begin{eqnarray*} Y_{i} & = & \beta_{0}+\beta_{age}X_{age,i}+\epsilon_{i} \end{eqnarray*}
where $Y_i$ is the weight of the $i$th child, $X_{age,i}$ is the age of the $i$th child, $\epsilon_i$ is a random error term associated with the $i$th child, $\beta_0$ is an intercept parameter and $\beta_{age}$ is the slope parameter associated with the variable age. A fitted model for this might look like this:
\begin{eqnarray*} E[Y] & = & 35+3X_{age} \end{eqnarray*}
What this model says is that the expected weight of any student can be estimated as $35$ plus $3$ times the student's age. So if a student's age were $7$, then the expected weight of this student would be $35 + 3\times 7 = 56$.
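The fitted model is just a function you can evaluate. A tiny sketch (the coefficients are the made-up values from the example above):

```python
def expected_weight(age):
    # Fitted response function E[Y] = 35 + 3 * age (example coefficients).
    return 35 + 3 * age

print(expected_weight(7))  # → 56
```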
Now, let's say that, instead of using age to predict weight, we were interested in predicting a student's weight based on race. How could this be represented in mathematical terms? After all, we can't multiply categories of race by estimated regression coefficients. For example, how would this function make any sense if a student were black: $E[Y]=35 + 3 \times Black$? It doesn't make sense to multiply "3 times black" or "3 times white" or any category, for that matter.
Dummy coding is a way to handle this. Dummy variables are a simple way to "code" (or map or translate) categorical information or categorical representations in our dataset so that categorical groups can be represented in mathematical terms in a regression model. They also facilitate interpretation. If we have a categorical variable, for example, "Race," then we may have several different categories/levels of this variable, say, Black, White, and Asian. How can we create a regression model and include race as a predictor similar to the way we worked with age?
Well, it turns out that with dummy variables, we create new variables (the dummy variables) that are coded either zero (0) or one (1) to represent the categories. Generally speaking, when $c$ categories are present, we will need $c-1$ dummy variables. In the race example, we have three race categories, so $c=3$. This means we will need $c-1=3-1=2$ dummy variables to represent the $3$ racial categories in our regression model. We'll call the new variables $X_{black,i}$ and $X_{white,i}$ (if you are wondering where the Asian category went, hold tight: I'll explain shortly). Then we'll code the information in our dataset as follows:
\begin{eqnarray*} X_{black,i} & = & \begin{cases} 1 & \text{if the $i$th student is black}\\ 0 & \text{otherwise} \end{cases} \end{eqnarray*}
and
\begin{eqnarray*} X_{white,i} & = & \begin{cases} 1 & \text{if the $i$th student is white}\\ 0 & \text{otherwise} \end{cases} \end{eqnarray*}
Using this dummy coding, a regression model for weight $Y_i$ based on these dummy variables would be:
\begin{eqnarray*} Y_{i} & = & \beta_{0}+\beta_{black}X_{black,i}+\beta_{white}X_{white,i}+\epsilon_i \end{eqnarray*}
and the corresponding response function might be something like:
\begin{eqnarray*} E[Y] & = & 35+5X_{black}+3X_{white} \end{eqnarray*}
where $\hat{\beta}_0=35$, $\hat{\beta}_{black}=5$, and $\hat{\beta}_{white}=3$. To interpret this model, it's instructive to write out the model that would be estimated for a black student. When a student is black, $X_{black}=1$ and $X_{white}=0$, so the response function becomes:
\begin{eqnarray*} E[Y] & = & 35+5\times1+3\times0=35+5=40 \end{eqnarray*}
Now, when a student is white, $X_{black}=0$ and $X_{white}=1$, so the response function becomes:
\begin{eqnarray*} E[Y] & = & 35+5\times0+3\times1=35+3=38 \end{eqnarray*}
If the student is Asian, then both $X_{black}=0$ and $X_{white}=0$, and the response function just becomes:
\begin{eqnarray*} E[Y] & = & 35+5\times0+3\times0=35+0+0=35 \end{eqnarray*}
As you can see the Asian category is represented by just the intercept in our model, so we don't need any $X$-value coded $1$ to represent it. By coding $X_{white,i}=0$ and $X_{black,i}=0$, we are representing the Asian racial category.
So, with the dummy coding, a black student is expected to have a mean weight of $40$ pounds, a white student a mean weight of $38$ pounds, and an Asian student a mean weight of only $35$ pounds.
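To see that the fitted coefficients really do recover the group means (intercept = reference-group mean, each dummy coefficient = difference from the reference group), here is a small sketch with made-up weights chosen so the group means come out to $40$, $38$, and $35$:

```python
import numpy as np

# Hypothetical weights (pounds); group means are 40 (black), 38 (white), 35 (asian).
weights = {"black": [39.0, 41, 40], "white": [37.0, 39, 38], "asian": [34.0, 36, 35]}

y, x_black, x_white = [], [], []
for race, ws in weights.items():
    for w in ws:
        y.append(w)
        x_black.append(1.0 if race == "black" else 0.0)  # dummy for Black
        x_white.append(1.0 if race == "white" else 0.0)  # dummy for White
# Asian is the reference category: both dummies are 0.

# Fit Y = b0 + b_black * X_black + b_white * X_white by least squares.
X = np.column_stack([np.ones(len(y)), x_black, x_white])
beta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
b0, b_black, b_white = beta

print(round(b0, 2), round(b_black, 2), round(b_white, 2))  # → 35.0 5.0 3.0
```

Here $\hat{\beta}_0$ is the Asian (reference) group mean, and $\hat{\beta}_{black}$ and $\hat{\beta}_{white}$ are the differences of the Black and White group means from it, exactly as in the worked example above.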
As you can see, dummy coding allows the regression model's prediction to change depending on the category of the observation. If you'd like to see some additional examples of how dummy coding works, this website has some excellent examples and explanations.
Lastly, it should be noted that you can use this type of coding universally in regression modeling, so it can be used with ANOVA models, mixed models, $3\times3$ factorial models, etc.