Short answer: your gut feeling is right.
Longer answer: The strength of multiple imputation lies in the pooling procedure. The MICE manual discusses this in depth: the authors state that imputation is not a technique you apply to a dataset once to fill in the empty cells. Rather, it is a combination of (1) setting up a strategy to replace missing data (chained equations, in the case of mice), (2) performing the analysis on each completed dataset, and (3) pooling the results to answer your research question (i.e. the reason you performed the analysis in the first place). As such, these steps are mandatory.
Now, more specifically to your situation. In the original data, values may be missing selectively, which could lead to bias. Moreover, most analyses require complete data on all variables, so you would otherwise need to exclude cases or handle them in some other way. With imputation you complete the missing data with 'guesstimates', under the assumption that your data are 'missing at random, conditional on known and observed data' (MAR). However, because these are guesses based on your data, you add some randomness and repeat this completion process multiple times in order to create a distribution of guesses.
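That "prediction plus noise, repeated m times" idea can be sketched in a few lines. This is a minimal illustration on hypothetical simulated data using plain NumPy, not the chained-equations algorithm itself; all variable names are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: y depends on x; roughly 30% of y is missing.
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
missing = rng.random(n) < 0.3            # MAR here by construction
y_obs, x_obs = y[~missing], x[~missing]

# Fit a regression of y on x using the observed cases only.
X_obs = np.column_stack([np.ones(x_obs.size), x_obs])
beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
resid_sd = np.std(y_obs - X_obs @ beta)

# Draw m completions: prediction + fresh random noise each time,
# so the guesses vary across imputation sets.
m = 5
X_mis = np.column_stack([np.ones(int(missing.sum())), x[missing]])
imputations = np.array([
    X_mis @ beta + rng.normal(scale=resid_sd, size=int(missing.sum()))
    for _ in range(m)
])
print(imputations.shape)  # (m, number of missing cells)
```

Each row of `imputations` would then be plugged into a copy of the dataset, yielding m completed datasets rather than one.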
If you were to analyse these data in the 'long' (stacked) format you mention, you would essentially have inflated your sample size by a factor equal to the number of imputation sets! This will certainly make your estimates look more precise. But this is wrong: cases that were complete from the start have simply been copied, and, more importantly, you have not taken into account the uncertainty of your guesstimates.
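You can see the artificial precision gain with a toy example (hypothetical numbers, stdlib only): stacking m identical copies of the same cases shrinks the standard error of the mean by roughly the square root of m, even though no new information was added.

```python
import math
import statistics

# Toy sample of 8 "completed" values (hypothetical numbers).
sample = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0]
m = 5  # number of imputation sets

def se_of_mean(values):
    """Standard error of the mean: sample SD over sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

se_single = se_of_mean(sample)
se_stacked = se_of_mean(sample * m)   # the same cases copied m times

# Stacking shrinks the standard error by ~sqrt(m): a purely
# artificial gain in precision, since the copies add no information.
print(round(se_single / se_stacked, 2))
```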
The better way, therefore, is to analyse the data per imputation set. This gives you m sets of results. However, you do not know which imputation set is the 'most correct' (if such a set even exists), so the average coefficient across all m models is your best estimate of the 'true' effect. For the precision (and hence hypothesis testing and confidence intervals) you then need to handle the standard error appropriately. This is where the uncertainty comes into play: using Rubin's rules, you combine the within-imputation variances (the squared standard errors, not the standard errors themselves) and add 'a little extra' to represent the variation of the estimates across imputation sets.
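Rubin's rules as described above fit in a few lines of stdlib Python. The per-imputation estimates and standard errors below are made-up numbers standing in for the output of m separate analyses:

```python
import math
import statistics

# Hypothetical results from m = 5 analyses, one per completed dataset:
estimates = [0.52, 0.48, 0.55, 0.50, 0.47]   # coefficient in each imputed set
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]  # its standard error in each set
m = len(estimates)

# Pooled point estimate: the plain average of the m estimates.
q_bar = statistics.mean(estimates)

# Within-imputation variance: average of the squared standard errors.
w = statistics.mean([se ** 2 for se in std_errors])

# Between-imputation variance: sample variance of the estimates --
# the 'little extra' reflecting disagreement across imputation sets.
b = statistics.variance(estimates)

# Total variance and pooled standard error (Rubin's rules).
t = w + (1 + 1 / m) * b
pooled_se = math.sqrt(t)

print(round(q_bar, 3), round(pooled_se, 4))
```

Note that `pooled_se` is always at least as large as the naive average of the per-set standard errors, because the between-imputation term only adds uncertainty.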
Conclusion
Finally, constructing your confidence intervals and performing your hypothesis tests using these pooling rules usually reduces bias compared to a complete case analysis. Compared to your long-format dataset, the pooled coefficient might be quite similar, but, as your gut feeling rightly told you, the long-format results are far too precise (confidence intervals too narrow, p-values too low) given what these imputation analyses can actually support.
Best Answer
There is no pooled dataset with multiple imputation in SPSS or any other software. Pooling is done on the results of the analyses for the separate completed datasets.
You might do this by doing some averaging or something, but you'd be missing some of the value of multiple imputation (as you'd be eliminating between-imputation variability, which is integral to the methodology).
As noted above, there's no pooling of datasets, only pooling of analysis results from different completed datasets. Pooling algorithms are given in the Multiple Imputation Pooling Algorithms chapter of the IBM SPSS Statistics Algorithms manual, which is available online (in the program, click Help>Documentation in PDF Format, select English or other desired language, then scroll down to the Manuals section and look for that title). The pooling of results is done using what are known as Rubin's rules. There's lots of information about those on the Internet.
To incorporate year into analyses, you'd probably want to go back to the original data with all three years in a single dataset, with cases for a given year properly identified, then re-impute data, with year included as an imputation variable. Then you could add year as a categorical predictor or factor in your GLM analyses.
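A sketch of that analyse-then-pool workflow with year as a categorical predictor, using NumPy only. The m completed datasets are simulated here as a stand-in for the output of re-imputation, and all names are illustrative, not SPSS or mice syntax:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for m completed (imputed) datasets, each with an outcome,
# one predictor, and a year column. In practice these would come from
# re-imputing the combined three-year data.
m, n = 5, 150
years = rng.choice([2018, 2019, 2020], size=n)
x = rng.normal(size=n)

def fit_ols(y, x, years):
    """OLS of y on x plus year dummies (2018 as reference category).

    Returns the coefficient for x and its standard error."""
    d19 = (years == 2019).astype(float)
    d20 = (years == 2020).astype(float)
    X = np.column_stack([np.ones_like(x), x, d19, d20])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], float(np.sqrt(cov[1, 1]))

# One analysis per completed dataset (fresh noise mimics the
# variation between imputation sets).
results = [
    fit_ols(1.0 + 0.8 * x + 0.3 * (years == 2020) + rng.normal(size=n), x, years)
    for _ in range(m)
]
estimates = [est for est, _ in results]
ses = [se for _, se in results]

# Pool the m results with Rubin's rules.
q_bar = float(np.mean(estimates))
w = float(np.mean(np.square(ses)))
b = float(np.var(estimates, ddof=1))
pooled_se = float(np.sqrt(w + (1 + 1 / m) * b))
print(round(q_bar, 2), round(pooled_se, 3))
```

The key point is structural: the year factor enters every per-set model, and only the resulting coefficients and standard errors are pooled, never the datasets themselves.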