Avoiding the numeric problem via logarithms
Usually the computations are done using logarithms to avoid underflow problems. The posterior probability $P(k \mid x)$ is proportional to $P(k)\,p(x\mid k)$; the denominator of Bayes' theorem only normalizes the probabilities so that they sum to 1, and it is easy to see that the result does not change if you multiply every $P(k)\,p(x\mid k)$ by the same constant or, equivalently, add a constant to their logarithms. So,
- Compute log-posterior: $\log P(k) + \log p(x \mid k)$ for all $k$.
- Add a suitable constant to all the log-posterior values (e.g., subtract the maximum so that the new maximum is 0) to bring them to reasonable scale.
- Exponentiate the shifted log-posteriors and normalize so they sum to 1 (a short code sketch follows).
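To make this concrete, here is a minimal sketch in Python/NumPy. The function name and the `priors`, `means`, `covs` arguments are placeholders of my own; I'm using SciPy's multivariate normal for the class-conditional densities, assuming a Gaussian model like the one in your question:

```python
import numpy as np
from scipy.stats import multivariate_normal

def class_posteriors(x, priors, means, covs):
    """Posterior P(k|x) for each class k, computed in log space to avoid underflow."""
    # Step 1: log-posterior log P(k) + log p(x|k) for every class
    log_post = np.array([
        np.log(priors[k]) + multivariate_normal.logpdf(x, mean=means[k], cov=covs[k])
        for k in range(len(priors))
    ])
    # Step 2: subtract the maximum so the largest value is 0
    # (adding a constant in log space multiplies all unnormalized posteriors equally)
    log_post -= log_post.max()
    # Step 3: exponentiate and normalize so the posteriors sum to 1
    post = np.exp(log_post)
    return post / post.sum()
```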
Statistical interpretation of Euclidean distance
I assume you mean that you put $(x-\mu_k)^T(x-\mu_k)$ in the exponent instead of $(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)$. This is equivalent to setting the covariance matrix to the identity, which in turn means that you assume each component of $x$ has unit variance around the corresponding component of the class mean, and that the different components are independent. This could be justified depending on your application, but I suspect the issue stems just from the numeric problem discussed above, or from using covariance matrices that do not make sense (your question did not explain how the covariance matrices are obtained).
Caveat: if you have different $\Sigma_k$ for different classes, the $\Sigma_k$ in the normalization constant (the factor in front of the exponential) of your modified pdf will scale the likelihoods depending on $k$ (but not on $x$), so in the identity-covariance case it acts as a strange way of changing the prior distribution over classes. However, if your $\Sigma$ is the same for all classes, the normalization constant does not affect the results, and the modification is exactly equivalent to using an identity covariance matrix.
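For comparison, here is a small sketch of the two quadratic forms being discussed (my own illustrative code, not taken from the question):

```python
import numpy as np

def squared_euclidean(x, mu):
    # (x - mu)^T (x - mu): the exponent with Sigma implicitly set to the identity
    d = x - mu
    return d @ d

def squared_mahalanobis(x, mu, sigma):
    # (x - mu)^T Sigma^{-1} (x - mu): the exponent of the full Gaussian density
    d = x - mu
    return d @ np.linalg.solve(sigma, d)
```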
My reply is technically more relevant to fuzzy sets than to fuzzy logic, but the two concepts are practically inseparable. I delved into the academic journal articles on fuzzy logic a couple of years ago in order to write a tutorial series on implementing fuzzy sets in SQL Server. Although I can hardly be considered an expert, I'm fairly familiar with the literature and use the techniques regularly to solve practical problems. The strong impression I gleaned from the published research is that the practical potential of fuzzy sets is still untapped, mainly due to a deluge of research on dozens of other families of techniques that can address complementary sets of problems.
The Crowded Marketplace of Ideas in Data Science/Machine Learning etc.
There's been such rapid progress in Support Vector Machines, neural nets, random forests, etc. that it's impossible for specialists, analysts, data scientists, programmers or consumers of their products to keep up with it all. In my series of blog posts I speak at length about how the development of algorithms for fuzzy sets and fuzzy logic is generally 20+ years ahead of the available software, but the same can be said of many related fields; I read intensively on neural nets and can think of scores of worthwhile neural architectures that were developed decades ago but never put widely into practice, let alone coded in easily available software.
That being said, fuzzy logic and sets are at an odd disadvantage in this crowded marketplace of ideas, mainly because of their moniker, which was controversial back when Lotfi A. Zadeh coined it. The point of fuzzy techniques is simply to approximate certain classes of discretely valued data on continuous scales, but terms like "approximate continuous-valued logic" and "graded sets" aren't exactly eye-catching. Zadeh admitted that he used the term "fuzzy" in part because it was attention-getting, but looking back, it may have subtly garnered the wrong kind of attention.
How the Term "Fuzz" Backfires
To a data scientist, analyst or programmer, it's a term that may evoke a vibe of "cool tech"; to those interested in AI/data mining/etc. only insofar as it can solve business problems, "fuzzy" sounds like an impractical hassle. To a corporate manager, a doctor involved in medical research, or any other consumer not in the know, it may evoke images of stuffed animals, 70s cop shows or something out of George Carlin's fridge. There has always been a tension in industry between the two groups, with the latter often reining in the former from writing code and performing research merely for the sake of intellectual curiosity rather than profit; unless the first group can explain why these fuzzy techniques are profitable, the wariness of the second will prevent their adoption.
Uncertainty Management & the Family of Fuzzy Set Applications
The point of fuzzy set techniques is to remove fuzz that is already inherent in the data, in the form of imprecise discrete values that can be modeled better on approximated continuous scales, contrary to the widespread misperception that "fuzz" is something you add in, like a special topping on a pizza. That distinction may be simple, but it encompasses a wide variety of potential applications, ranging from natural language processing to Decision Theory to control of nonlinear systems. Probability hasn't absorbed fuzzy logic, as Cliff AB suggested it might, primarily because probabilities are just a small subset of the interpretations that can be attached to fuzzy values. Fuzzy membership functions are fairly simple in that they just grade how much a record belongs to a particular set by assigning one or more continuous values, usually on a scale of 0 to 1 (although for some applications I've found that -1 to 1 can be more useful). The meaning we assign to those numbers is up to us, because they can signify anything we want, such as Bayesian degrees of belief, confidence in a particular decision, possibility distributions, neural net activations, scaled variance, correlation, etc., not just PDF, EDF or CDF values. I go into much greater detail in my blog series and at this CV post, much of which was derived by working through my favorite fuzzy resource, George J. Klir and Bo Yuan's Fuzzy Sets and Fuzzy Logic: Theory and Applications (1995). They go into much greater detail on how to derive entire programs of "Uncertainty Management" from fuzzy sets.
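To give a feel for how simple a membership function can be, here is a toy sketch of my own (a triangular membership function for a fuzzy set like "warm"); nothing about the shape or the cutoffs is canonical, they are purely illustrative:

```python
def warm_membership(temp_c, low=15.0, peak=22.0, high=30.0):
    """Triangular membership in the fuzzy set 'warm', graded on a 0-to-1 scale."""
    if temp_c <= low or temp_c >= high:
        return 0.0
    if temp_c <= peak:
        return (temp_c - low) / (peak - low)
    return (high - temp_c) / (high - peak)

print(warm_membership(20))   # ~0.71: mostly, but not fully, 'warm'
```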
If fuzzy logic and sets were a consumer product, we could say they've failed to date due to lack of marketing and product evangelization, plus a paradoxical choice of a brand name. While researching this, I can't recall running into a single academic journal article that tried to debunk any of these applications in a manner akin to Minsky and Papert's infamous book on perceptrons. There's just a lot of competition in the marketplace of ideas these days for the attention of developers, theorists, data scientists and the like for products that are applicable to similar sets of problems, which is a positive side effect of rapid technical progress. The downside is that there's a lot of low-hanging fruit here that's going unpicked, especially in the realm of data modeling, where fuzzy techniques are most applicable. As a matter of fact, I recently used them to solve a particularly puzzling language modeling problem and was applying them to a similar one when I took a break to check CV and found this post.
Best Answer
Perhaps you're already aware of this, but Chapters 3, 7 and 9 of George J. Klir and Bo Yuan's Fuzzy Sets and Fuzzy Logic: Theory and Applications (1995) provide in-depth discussions on the differences between the fuzzy and probabilistic versions of uncertainty, as well as several other types related to Evidence Theory, possibility distributions, etc. It is chock-full of formulas for measuring fuzziness (uncertainties in measurement scales) and probabilistic uncertainty (variants of Shannon's Entropy, etc.), plus a few for aggregating across these various types of uncertainty. There are also a few chapters on aggregating fuzzy numbers, fuzzy equations and fuzzy logic statements that you may find helpful. I translated a lot of these formulas into code, but am still learning the ropes as far as the math goes, so I'll let Klir and Yuan do the talking. :) I was able to pick up a used copy for $5 a few months back. Klir also wrote a follow-up book on Uncertainty around 2004, which I have yet to read. (My apologies if this thread is too old to respond to - I'm still learning the forum etiquette.)
Edited to add: I’m not sure which of the differences between fuzzy and probabilistic uncertainty the OP was already aware of and which he needed more info on, or what types of aggregations he meant, so I’ll just provide a list of some differences I gleaned from Klir and Yuan, off the top of my head. The gist is that yes, you can fuse fuzzy numbers, measures, etc. together, even with probabilities – but it quickly becomes very complex, albeit still quite useful.
Fuzzy set uncertainty measures a completely different quantity than probability and its measures of uncertainty, like the Hartley Function (for nonspecificity) or Shannon's Entropy. Fuzziness and probabilistic uncertainty don't affect each other at all. There is a whole range of measures of fuzziness available, which quantify uncertainty in measurement boundaries (this is related to the measurement uncertainties normally discussed on CrossValidated, but not identical to them). The "fuzz" is added mainly in situations where it would be helpful to treat an ordinal variable as continuous, none of which has much to do with probabilities.
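For instance, one simple measure of fuzziness sums each element's distance from a crisp (0 or 1) membership; this is along the lines of the measures Klir and Yuan present, though I'm sketching it from memory rather than quoting their formula:

```python
def fuzziness(memberships):
    """Total distance of the membership values from the nearest crisp set."""
    # A membership of 0 or 1 contributes nothing; 0.5 contributes the maximum (1.0)
    return sum(1.0 - abs(2.0 * m - 1.0) for m in memberships)

print(fuzziness([0.0, 1.0, 1.0]))   # 0.0 - a crisp set has no fuzziness
print(fuzziness([0.5, 0.5, 0.5]))   # 3.0 - maximal fuzziness
```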
Nevertheless, fuzzy sets and probabilities can be combined in myriad ways - such as adding fuzzy boundaries on probability values, or assessing the probability of a value or logical statement falling within a fuzzy range. This leads to a huge, wide-ranging taxonomy of combinations (which is one of the reasons I didn't include specifics before my first edit).
As far as aggregation goes, the measures of fuzziness and entropic measures of probabilistic uncertainty can sometimes be summed together to give total measures of uncertainty.
To add another level of complexity, fuzzy logic, numbers and sets can all be aggregated, which can affect the amount of resulting uncertainty. Klir and Yuan say the math can get really difficult for these tasks, and since equation translations are one of my weak points (so far), I won't comment further. I just know these methods are presented in their book.
Fuzzy logic, numbers, sets etc. are often chained together in a way probabilities aren't, which can complicate computation of the total uncertainty. For example, a computer programmer working in a Behavior-Driven Development (BDD) system might translate a user's statement that "around half of these objects are black" into a fuzzy statement (around) about a fuzzy number (half). That would entail combining two different fuzzy objects to derive the measure of fuzziness for the whole thing.
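A toy sketch of what such a translation might look like, collapsing the hedge "around" and the fuzzy number "half" into a single membership function for brevity; the triangular shape and the width are my own illustrative assumptions, not anything from Klir and Yuan:

```python
def around_half(proportion, width=0.15):
    """Degree (0 to 1) to which an observed proportion satisfies 'around half'."""
    return max(0.0, 1.0 - abs(proportion - 0.5) / width)

# 27 of 60 objects are black -> the statement holds to degree ~0.67
print(around_half(27 / 60))
```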
Sigma counts are more important in aggregating fuzzy objects than the kind of ordinary counts used in statistics. These are never greater than the ordinary "crisp" count, because the membership functions that define fuzzy sets (which are always on the 0 to 1 scale) measure partial membership, so that a record with a score of 0.25 only counts as a quarter of a record.
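A sigma count is simply the sum of the membership values, as in this small sketch (the example values are made up):

```python
def sigma_count(memberships):
    """Fuzzy cardinality: partial memberships are summed instead of counted as 1."""
    return sum(memberships)

# The crisp count of these four records is 4; the sigma count is only 2.0
print(sigma_count([0.25, 0.25, 0.5, 1.0]))
```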
All of the above gives rise to a really complex set of fuzzy statistics, statistics on fuzzy sets, fuzzy statements about fuzzy sets, etc. If we're combining probabilities and fuzzy sets, we now have to consider, for example, which of several different types of fuzzy variance to use.
Alpha cuts are a prominent feature of fuzzy set math, including the formulas for calculating uncertainties. They divide datasets into nested sets based on the values of the membership functions. I haven't yet encountered a similar concept with probabilities, but keep in mind that I’m still learning the ropes.
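A quick sketch of an alpha cut (the element names and membership values here are made up for illustration):

```python
def alpha_cut(membership, alpha):
    """Crisp set of elements whose membership is at least alpha."""
    return {x for x, mu in membership.items() if mu >= alpha}

tall = {"ann": 0.9, "bob": 0.6, "cam": 0.3}
print(alpha_cut(tall, 0.5))   # {'ann', 'bob'} - nested inside the 0.25-cut, which holds all three
```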
Fuzzy sets can be interpreted in nuanced ways that produce the possibility distributions and belief scores used in fields like Evidence Theory, which includes the subtle concept of probability mass assignments. I liken it to the way in which conditional probabilities etc. can be reinterpreted as Bayesian priors and posteriors. This leads to separate definitions of fuzziness, nonspecificity and entropic uncertainty, although the formulas are obviously similar. They also give rise to strife, discord and conflict measures, which are additional forms of uncertainty that can be summed together with ordinary nonspecificity, fuzziness and entropy.
Common probabilistic concepts like the Principle of Maximum Entropy are still operative, but sometimes require tweaking. I'm still trying to master the ordinary versions of them, so I can't say more than to point out that I know the tweaks exist.
The long and the short of it is that these two distinct types of uncertainty can be aggregated, but that this quickly blows up into a whole taxonomy of fuzzy objects and stats based on them, all of which can affect the otherwise simple calculations. I don't even have room here to address the whole smorgasbord of fuzzy formulas for intersections and unions, which include the T-norms and T-conorms sometimes used in the above calculations of uncertainty. I can't provide a simple answer, but that's not just due to inexperience - even 20 years after Klir and Yuan wrote, a lot of the math and use cases still don't seem settled. For example, I can't find a clear, general guide on which T-conorms and T-norms to use in particular situations, yet that choice will affect any aggregation of the uncertainties. I can look up specific formulas for some of these if you'd like; I coded some of them recently, so they're still somewhat fresh. On the other hand, I'm an amateur with rusty math skills, so you'd probably be better off consulting these sources directly. I hope this edit is of use; if you need more clarification/info, let me know.
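For reference, here are a few of the standard T-norm/T-conorm pairs (the definitions themselves are standard textbook material; which pair to use for a given problem is the unsettled part I mentioned):

```python
# T-norms (fuzzy intersections) and their dual T-conorms (fuzzy unions),
# all operating on membership values in [0, 1]
def t_min(a, b):          return min(a, b)               # standard (Goedel) intersection
def s_max(a, b):          return max(a, b)               # standard union

def t_product(a, b):      return a * b                   # algebraic product
def s_prob_sum(a, b):     return a + b - a * b           # probabilistic sum

def t_lukasiewicz(a, b):  return max(0.0, a + b - 1.0)   # bounded difference
def s_lukasiewicz(a, b):  return min(1.0, a + b)         # bounded sum
```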