In general, the existence of an interaction means that the effect of one variable depends on the value of the other variable with which it interacts. If there isn't an interaction, then the effect of one variable is the same regardless of the value of the other variable.
This is easiest to understand in the case of linear regression. Imagine we are looking at the adult height (say at 25) of a child based on the adult height of the father. We further include sex as an additional predictor variable, because men and women differ considerably in adult height. Let's imagine that there is no interaction between these two variables (which may be true, at least to a first approximation). We could then plot our model simply as two lines on a scatterplot of the data. We may want to use different colors or symbols / line styles for men vs. women, but at any rate we would see a football-ish (or rugby-ball-ish, depending on where you live) shaped cloud of points with two parallel lines going through it. The important part is that the lines are parallel; if someone asked you what the effect would be of the father being 1 inch (1 cm) taller, you would respond with $\beta_{\text{height}}$. If they further asked you what the effect would be if the child were male or female, you would respond, 'that doesn't matter, you would expect them to be $\beta_{\text{height}}$ taller as an adult either way'. That is because the lines are parallel (with the same slope, $\beta_{\text{height}}$) / there is no interaction.
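The parallel-lines idea can be made concrete with a tiny numeric sketch. The coefficients below are made up for illustration only, not estimates from any real data:

```python
# Hypothetical coefficients for the no-interaction model:
# child height = b0 + b_height * (father's height) + b_sex * male
b0, b_height, b_sex = 20.0, 0.7, 5.0

def child_height(father_height, male):
    # No interaction: both sexes share the same slope, b_height,
    # so the two fitted lines are parallel.
    return b0 + b_height * father_height + b_sex * male

# Effect of the father being 1 inch taller, for each sex:
diff_female = child_height(70, 0) - child_height(69, 0)
diff_male = child_height(70, 1) - child_height(69, 1)
# Both differences equal b_height (0.7): sex shifts the line up or down,
# but does not change the effect of the father's height.
```

The only role of `b_sex` here is to shift the intercept; the answer to "what if the father were 1 inch taller?" is the same for both groups.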
Now imagine the effect of anxiety on test-taking performance when examining two populations: emotionally stable vs. emotionally unstable people. Let's imagine that there is an interaction such that emotionally unstable people are more strongly affected by anxiety. Then, if we plotted the model similarly, we would see two lines that are not parallel. One line (representing emotionally stable individuals) might slope downward gradually, while the other line (representing unstable individuals) might drop much more quickly. If we had used reference cell coding, with the stable individuals as the reference category, the fitted regression model might be:
$$
\text{test performance}=\beta_0 + \beta_1\text{anxiety} + \beta_2\text{unstable} + \beta_3\text{anxiety}*\text{unstable}
$$
In such a case, the slope of the first line would be $\beta_1$ (since $\text{unstable}$ would equal 0), but the slope of the second line would be $\beta_1+\beta_3$. If someone asked you how much test-taking performance would be impaired if anxiety went up by one unit, you would have to say, 'that depends: emotionally stable students would score $\beta_1$ points lower, but emotionally unstable students would drop by $\beta_1+\beta_3$ points'.
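You can verify the two slopes directly from the regression equation. Again, the coefficient values here are purely hypothetical:

```python
# Hypothetical coefficients for the anxiety x stability model:
# performance = b0 + b1*anxiety + b2*unstable + b3*anxiety*unstable
b0, b1, b2, b3 = 80.0, -2.0, -1.0, -3.0  # b3 is the interaction coefficient

def performance(anxiety, unstable):
    return b0 + b1 * anxiety + b2 * unstable + b3 * anxiety * unstable

# Slope for stable students (unstable = 0): just b1
slope_stable = performance(1, 0) - performance(0, 0)

# Slope for unstable students (unstable = 1): b1 + b3
slope_unstable = performance(1, 1) - performance(0, 1)
```

With these values, a one-unit increase in anxiety costs stable students 2 points but unstable students 5 points; the difference between the two slopes is exactly the interaction coefficient $\beta_3$.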
This is the essence of what an interaction is. In addition, these examples illustrate the necessity of interpreting simple effects, rather than main effects, when an interaction exists, and the value of using plots of your model to facilitate understanding.
With a generalized linear model, the situation is essentially the same, but you may have to take into account the additional complexity of the link function (a non-linear transformation), depending on which scale you want to use to make your interpretation. Consider the case of logistic regression: there are (at least) three scales available. The betas exist on the logit (log odds) scale, whereas $\pi$ (the probability of 'success') exists only in the interval $(0,1)$ and behaves quite differently; the odds lie in between. So you need to choose which of these you want to use to interpret your model. For example, with respect to the log odds, the model is linear, and everything can be understood just as above.
If you want to use the odds, you can get odds ratios by exponentiating your betas. For example, if there is no interaction, the odds ratio associated with a one-unit increase in $X_1$ is $\exp(\beta_1)$. This would also be the odds ratio for the reference category (like the emotionally stable students above) if there were an interaction with a dichotomous variable, but the contrasting category would be associated with an odds ratio of $\exp(\beta_1)*\exp(\beta_3)$ (that is, $\exp(\beta_1+\beta_3)$, since the interaction coefficient $\beta_3$ enters the slope for that group).
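A quick sketch of the odds-ratio arithmetic, with invented logit-scale coefficients:

```python
import math

# Hypothetical logit-scale coefficients: b1 for X1, b3 for the interaction
b1, b3 = 0.4, 0.3

# Odds ratio for a one-unit increase in X1, reference category:
or_reference = math.exp(b1)

# Odds ratio for the contrasting category: exp(b1 + b3),
# which equals exp(b1) * exp(b3) by properties of exponentials.
or_contrasting = math.exp(b1 + b3)
```

The multiplicative form is why odds ratios combine by multiplication rather than addition: the interaction multiplies the reference-category odds ratio by $\exp(\beta_3)$.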
Unfortunately, neither of those is very intuitively accessible for people, and the non-linear transformation (the link function) makes life more complicated. It is important to recognize that this isn't specific to interactions; the change in the probability of 'success' associated with increasing $X$ by one unit is never the same as the change associated with (say) decreasing $X$ by one unit (except in the special case where $x_i$ is associated with $\pi=.5$). In other words, the change in probability associated with a one-unit change in $X$ depends on where you are starting from (in this sense, you could perhaps metaphorically say that it interacts with itself). The best way to determine the change in probability associated with moving from one level of $X$ to another is to plug in those levels, solve the regression equation for $\hat\pi$, and then subtract. The same thing is true if you have more than one variable but no interaction with the variable in question. This isn't anything special; it's just that 'where you are starting from' depends on the other variables as well. Again, the best way to determine the change in probability is to solve for $\hat\pi$ at both places and subtract.
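The plug-in-and-subtract procedure can be shown with a single-predictor logistic model; the coefficients are hypothetical:

```python
import math

def inv_logit(eta):
    # Inverse logit: maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical logistic-regression coefficients
b0, b1 = -4.0, 1.0

# Change in probability for a one-unit increase in X depends on where you start:
change_near_floor = inv_logit(b0 + b1 * 1) - inv_logit(b0 + b1 * 0)  # far from pi = .5
change_near_middle = inv_logit(b0 + b1 * 5) - inv_logit(b0 + b1 * 4)  # around pi = .5
```

Both changes correspond to the same one-unit increase in $X$ (and the same $\beta_1$ on the logit scale), yet the probability change near $\pi=.5$ is several times larger than the change near the floor.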
Interactions in a GLiM should also be treated similarly. It is best not to interpret interaction effects directly, but only simple effects (that is, the effect of $X_1$ on $Y$ holding $X_2$ constant). In addition, it's best to overlay plots of the predicted values (say, when $X_2=0$ vs. when $X_2=1$) on a scatterplot of your data. Now, for a logistic regression it is often difficult to get a decent plot of your data, as the points are all 0's and 1's, so you might just choose to leave them out. Nonetheless, a plot of the two curves will typically be the best thing to use. After you have the plot, a qualitative (verbal) description is often easy (e.g., 'probabilities don't start moving away from 0 until larger levels of $X_1$, and even then, rise more slowly').
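Computing the two predicted curves is just a matter of evaluating $\hat\pi$ over a grid of $X_1$ values at each level of $X_2$. A sketch with hypothetical coefficients (including an interaction, `b3`):

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: logit(pi) = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0, b1, b2, b3 = -3.0, 0.5, -1.0, 0.8

xs = [i * 0.5 for i in range(21)]  # grid of X1 values from 0 to 10

# Predicted probability curves at the two levels of the dichotomous X2:
curve_x2_0 = [inv_logit(b0 + b1 * x) for x in xs]
curve_x2_1 = [inv_logit(b0 + b1 * x + b2 + b3 * x) for x in xs]
# These two lists are what you would overlay as curves on the plot
# (e.g., with matplotlib: plot(xs, curve_x2_0); plot(xs, curve_x2_1)).
```

Because $\beta_3$ enters the slope for the $X_2=1$ group, that curve rises more steeply, which is the visual signature of the interaction.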
Your situation is perhaps a little more complicated than this, because you have two continuous variables, rather than a continuous and a dichotomous one. However, this isn't a problem. Typically in this situation, people will be thinking primarily in terms of one of the predictor variables; then you can plot the relationship between that variable and $Y$ at several levels of the other predictor. If there are theoretically meaningful levels, you could use those; if not, you could use the mean and +/- 1 SD. If you don't have a preference for one of the variables, you could flip a coin, or plot it both ways and see which is easier to work with.
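The mean and +/- 1 SD strategy amounts to evaluating the simple slope of $X_1$ (on the logit scale, $\beta_1+\beta_3 X_2$) at three fixed values of $X_2$. A sketch, with a made-up coefficient pair and made-up sample statistics for $X_2$:

```python
# Hypothetical values: b1 is the coefficient on X1, b3 the X1*X2 interaction
b1, b3 = 0.5, 0.2
x2_mean, x2_sd = 3.0, 1.5  # hypothetical sample mean and SD of X2

# Evaluate the simple slope of X1 at mean - 1 SD, mean, and mean + 1 SD of X2
levels = [x2_mean - x2_sd, x2_mean, x2_mean + x2_sd]
simple_slopes = [b1 + b3 * x2 for x2 in levels]
# One predicted-probability curve per level would then be plotted,
# exactly as in the dichotomous case above.
```

With $\beta_3>0$, the simple slope grows as $X_2$ increases, so the three plotted curves fan out rather than run parallel.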
I don't know if / how SPSS will let you make those plots, but if you aren't able to find a way, they should be easy to make manually in Excel.
Best Answer
The advent of generalized linear models has allowed us to build regression-type models of data when the distribution of the response variable is non-normal--for example, when your DV is binary. (If you would like to know a little more about GLiMs, I wrote a fairly extensive answer here, which may be useful although the context differs.) However, a GLiM, e.g. a logistic regression model, assumes that your data are independent. For instance, imagine a study that looks at whether a child has developed asthma. Each child contributes one data point to the study--they either have asthma or they don't. Sometimes data are not independent, though. Consider another study that looks at whether a child has a cold at various points during the school year. In this case, each child contributes many data points. At one time a child might have a cold, later they might not, and still later they might have another cold. These data are not independent because they came from the same child. In order to appropriately analyze these data, we need to somehow take this non-independence into account. There are two ways: One way is to use the generalized estimating equations (which you don't mention, so we'll skip). The other way is to use a generalized linear mixed model. GLiMMs can account for the non-independence by adding random effects (as @MichaelChernick notes). Thus, the answer is that your second option is for non-normal repeated measures (or otherwise non-independent) data. (I should mention, in keeping with @Macro's comment, that generalized linear mixed models include linear models as a special case and thus can be used with normally distributed data. However, in typical usage the term connotes non-normal data.)
Update: (The OP has asked about GEE as well, so I will write a little about how all three relate to each other.)
Here's a basic overview:
Since you have multiple trials per participant, your data are not independent; as you correctly note, "[t]rials within one participant are likely to be more similar than as compared to the whole group". Therefore, you should use either a GLMM or the GEE.
The issue, then, is how to choose whether the GLMM or the GEE would be more appropriate for your situation. The answer depends on the subject of your research--specifically, the target of the inferences you hope to make. With a GLMM, the betas tell you about the effect of a one-unit change in your covariates on a particular participant, given their individual characteristics. With the GEE, the betas tell you about the effect of a one-unit change in your covariates on the average of the responses of the entire population in question. This is a difficult distinction to grasp, especially because there is no such distinction with linear models (in which case the two are the same thing).
One way to try to wrap your head around this is to imagine averaging over your population on both sides of the equals sign in your model. For example, this might be a model: $$ \text{logit}(p_i)=\beta_{0}+\beta_{1}X_1+b_i $$ where: $$ \text{logit}(p)=\ln\left(\frac{p}{1-p}\right),~~~~~\&~~~~~~b\sim\mathcal N(0,\sigma^2_b) $$ There is a parameter that governs the response distribution ($p$, the probability, with binary data) on the left side for each participant. On the right hand side, there are coefficients for the effect of the covariate[s] and the baseline level when the covariate[s] equals 0. The first thing to notice is that the actual intercept for any specific individual is not $\beta_0$, but rather $(\beta_0+b_i)$. But so what? If we are assuming that the $b_i$'s (the random effect) are normally distributed with a mean of 0 (as we've done), certainly we can average over these without difficulty (it would just be $\beta_0$). Moreover, in this case we don't have a corresponding random effect for the slopes and thus their average is just $\beta_1$. So the average of the intercepts plus the average of the slopes must be equal to the logit transformation of the average of the $p_i$'s on the left, mustn't it? Unfortunately, no. The problem is that in between those two is the $\text{logit}$, which is a non-linear transformation. (If the transformation were linear, they would be equivalent, which is why this problem doesn't occur for linear models.) The following plot makes this clear:
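The failure of averaging to commute with the logit can be demonstrated numerically. The fixed effects and random intercepts below are invented for illustration (the $b_i$'s are symmetric around 0, matching the mean-zero assumption):

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fixed effects and random intercepts (mean 0, as assumed)
b0, b1 = -2.0, 1.0
b_i = [-3.0, -1.5, 0.0, 1.5, 3.0]

x = 1.0

# Average of the subject-specific probabilities:
mean_of_probs = sum(inv_logit(b0 + b1 * x + b) for b in b_i) / len(b_i)

# Probability implied by plugging in the average intercept (b = 0):
prob_at_mean_intercept = inv_logit(b0 + b1 * x)

# The two differ, because the logit is a non-linear transformation:
# averaging on the logit scale is not averaging on the probability scale.
```

If the transformation were linear, the two quantities would coincide; here they differ noticeably, which is exactly the conditional (GLMM) vs. marginal (GEE) gap described above.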
Imagine that this plot represents the underlying data-generating process for the probability that a small class of students will be able to pass a test on some subject with a given number of hours of instruction on that topic. Each of the grey curves represents the probability of passing the test with varying amounts of instruction for one of the students. The bold curve is the average over the whole class. In this case, the effect of an additional hour of teaching, conditional on the student's attributes, is $\beta_1$--the same for each student (that is, there is not a random slope). Note, though, that the students' baseline abilities differ--probably due to differences in things like IQ (that is, there is a random intercept). The average probability for the class as a whole, however, follows a different profile than the students. The strikingly counter-intuitive result is this: an additional hour of instruction can have a sizable effect on the probability of each student passing the test, but have relatively little effect on the probable total proportion of students who pass. This is because some students might already have had a large chance of passing while others might still have little chance.
The question of whether you should use a GLMM or the GEE is the question of which of these functions you want to estimate. If you want to know about the probability of a given student passing (if, say, you were the student, or the student's parent), you would want to use a GLMM. On the other hand, if you want to know about the effect on the population (if, for example, you were the teacher, or the principal), you would want to use the GEE.
For another, more mathematically detailed, discussion of this material, see this answer by @Macro.