Latent Analysis – When Are Latent Analyses Useful?

factor-analysis, latent-variable, pca

As far as I understand, latent profile analysis, clustering or similar latent analyses are about finding something hidden in the data. Are there any guidelines or thoughts on when these techniques are useful? I have seen principal component analysis (PCA) being used for simplifying image data. There it was useful since the learned "structure" could at least be visually verified.

In most other applications, I feel that latent analyses don't provide much information. For example, I could ask a class of 100 students various questions to get, let's say, 6 variables. So, $n = 100$ and $p = 6$. Now, I could do very sophisticated latent analyses to figure out what lies "underneath" those students.

However, I would say that most latent things that I would find are just random noise. In other words, I could ask the same questions to very many classes and probably I would find 300 different latent profiles, but only a few of them would be theoretically meaningful.

Best Answer

Latent variable analyses, such as factor analysis, are useful when we want to analyze a construct that we can't measure directly in a single question, but which we think MIGHT be imperfectly measured by a whole bunch of different questions. They can be especially helpful if we're not sure that the thing we want to measure even exists and need to find that out.

Here's an example. We think that people might suffer from this thing we are calling "depression," but we don't really know how to measure it, or even whether it's just one thing: maybe there are a bunch of different states that we CALL depression but which are really distinct constructs. So how do we proceed? Well, we can start by coming up with a list of questions that we think MIGHT measure depression:

Do you often feel sad? Do you have little interest in pleasure or doing things? Do you often feel tired or have little energy? Do you think about hurting yourself? Do you have trouble concentrating?

Of course, some of these questions might not actually measure depression (they might measure anxiety or something else). And some might be better measures than others. But that's what we're going to figure out.

Our theory is that there is some underlying construct "depression" that CAUSES people to give the answers to these questions that they do. If that's true then these variables should all correlate with each other, because they're being influenced by the same thing. If one or more of these variables doesn't strongly correlate with the others, then it's probably NOT being influenced by the same thing as the others (which we assume is depression).
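To make that reasoning concrete, here is a minimal simulation sketch, not real survey data. The item names, the loadings, and the sample size are all assumptions chosen for illustration: four items are driven by one simulated latent variable, and a fifth is pure noise by construction.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the sketch is reproducible

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

n = 2000  # hypothetical number of respondents

# Assumed loadings: how strongly the latent variable drives each item.
# "unrelated" has a loading of 0, so it is not influenced by the latent variable.
loadings = {"sad": 0.8, "no_interest": 0.7, "tired": 0.6,
            "concentration": 0.6, "unrelated": 0.0}

# One latent score per respondent, then item = loading * latent + item-specific noise.
latent = [random.gauss(0, 1) for _ in range(n)]
items = {
    name: [l * f + sqrt(1 - l * l) * random.gauss(0, 1) for f in latent]
    for name, l in loadings.items()
}

names = list(items)
corr = {(a, b): pearson(items[a], items[b]) for a in names for b in names}

# Print the correlation matrix, one row per item.
for a in names:
    print(f"{a:>14}", " ".join(f"{corr[a, b]:6.2f}" for b in names))
```

In a run of this sketch, the four items that share the latent variable should correlate noticeably with each other, while the "unrelated" row should hover around zero.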

So we throw all of these variables into an exploratory factor analysis. The FA tries to find out if there are one or more underlying latent factors related to all of these items, and then it tells us how closely correlated each item is with the underlying factor. Let's say we do this and the FA finds only one strong factor, and all of the items "load" pretty strongly on that factor. This strongly suggests that there really is a single underlying latent variable, the thing we are calling "depression." If one item didn't load on it, then we would know that item is measuring something else, and could kick it out of the analysis.
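As a rough illustration of what "loading" means, the sketch below estimates single-factor loadings from simulated data. It uses iterated principal-axis factoring, which is only one of several estimation methods and not necessarily what any given package does by default. The data, the generating loadings in `true_loadings`, and the helper `one_factor_loadings` are all hypothetical.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def one_factor_loadings(R, n_iter=50):
    """Iterated principal-axis factoring for a single factor (illustrative helper)."""
    p = len(R)
    # Initial communalities: each item's largest absolute off-diagonal correlation.
    h = [max(abs(R[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(n_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal.
        A = [[h[i] if i == j else R[i][j] for j in range(p)] for i in range(p)]
        # Power iteration for the leading eigenvector of A.
        v = [1.0] * p
        for _ in range(200):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
        # Loadings are the eigenvector scaled by the square root of its eigenvalue.
        loadings = [sqrt(max(eigval, 0.0)) * x for x in v]
        h = [l * l for l in loadings]  # updated communalities
    return loadings

# Simulated data: four items driven by one latent variable, one item that is pure noise.
n = 2000
names = ["sad", "no_interest", "tired", "concentration", "unrelated"]
true_loadings = [0.8, 0.7, 0.6, 0.6, 0.0]  # assumed values, for illustration only
latent = [random.gauss(0, 1) for _ in range(n)]
data = [[l * f + sqrt(1 - l * l) * random.gauss(0, 1) for f in latent]
        for l in true_loadings]

R = [[pearson(a, b) for b in data] for a in data]
est = one_factor_loadings(R)

for name, l in zip(names, est):
    print(f"{name:>14}: loading = {l:5.2f}")
```

The item with a loading near zero is the one you would consider dropping. With real data you would more likely reach for an established factor analysis routine than a hand-rolled helper like this one.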

Furthermore, since the results of the analysis tell us HOW strongly each item is correlated with the latent variable, we can use this information to combine the items together into a new variable that measures depression better than any one item in isolation. We have therefore created a single observed measure of what was previously an unobserved construct that we weren't even sure existed.
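The sketch below illustrates why a combined score can beat any single item. It uses simulated data where the latent variable is known, which is never the case with real data. The weights stand in for loadings from a fitted factor analysis and are assumed values, and a simple loading-weighted sum is just one way to combine items; regression-based or Bartlett factor scores weight them differently.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the sketch is reproducible

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

n = 2000
# Hypothetical loadings, standing in for the output of a factor analysis.
weights = [0.8, 0.7, 0.6, 0.6]

# Simulate the latent variable and four noisy items that measure it.
# Items have (roughly) unit variance by construction, so no rescaling is done here.
latent = [random.gauss(0, 1) for _ in range(n)]
items = [[l * f + sqrt(1 - l * l) * random.gauss(0, 1) for f in latent]
         for l in weights]

# Composite score: loading-weighted sum of each respondent's item values.
composite = [sum(l * item[k] for l, item in zip(weights, items))
             for k in range(n)]

# How well does each single item, and the composite, track the latent variable?
item_corrs = [pearson(item, latent) for item in items]
composite_corr = pearson(composite, latent)

for i, r in enumerate(item_corrs, start=1):
    print(f"item {i} vs latent:    {r:.2f}")
print(f"composite vs latent: {composite_corr:.2f}")
```

Because the item-specific noise partly averages out in the sum, the composite should track the simulated latent variable more closely than even the best single item.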

This is one of the uses of latent variable analysis.

Another use is when you aren't sure of the structure of the latent variable. For example, you might be interested in "political ideology" but not be sure whether it's just a single "left-right" scale or whether there are distinct "economic" and "social" dimensions. To figure this out, ask a bunch of questions about both economic and social issues, and throw them all into a factor analysis. Is there just one factor or two? Or three?
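One common first look at "how many factors?" is the eigenvalues of the item correlation matrix. The sketch below simulates six hypothetical items, three driven by an "economic" factor and three by an independent "social" factor, and counts eigenvalues above 1. That cutoff (the Kaiser rule) is only a rough heuristic, and the item structure, loadings, and `jacobi_eigenvalues` helper are assumptions for illustration.

```python
import random
from math import sqrt, atan2, cos, sin

random.seed(3)  # fixed seed so the sketch is reproducible

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def jacobi_eigenvalues(M, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in M]
    p = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(A[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * atan2(2 * A[i][j], A[i][i] - A[j][j])
                c, s = cos(theta), sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = A[k][i], A[k][j]
                    A[k][i] = c * aki + s * akj
                    A[k][j] = -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = A[i][k], A[j][k]
                    A[i][k] = c * aik + s * ajk
                    A[j][k] = -s * aik + c * ajk
    return sorted((A[i][i] for i in range(p)), reverse=True)

n = 2000
loading = 0.8  # assumed loading of every item on its own factor

# Two independent simulated factors.
economic = [random.gauss(0, 1) for _ in range(n)]
social = [random.gauss(0, 1) for _ in range(n)]

def make_item(factor):
    """One noisy item driven by the given factor."""
    return [loading * f + sqrt(1 - loading ** 2) * random.gauss(0, 1)
            for f in factor]

# Three economic items followed by three social items.
data = [make_item(economic) for _ in range(3)] + \
       [make_item(social) for _ in range(3)]

R = [[pearson(a, b) for b in data] for a in data]
eigenvalues = jacobi_eigenvalues(R)
n_factors = sum(1 for ev in eigenvalues if ev > 1.0)  # Kaiser rule, a rough heuristic

print("eigenvalues:", " ".join(f"{ev:.2f}" for ev in eigenvalues))
print("eigenvalues above 1:", n_factors)
```

If ideology were a single left-right scale, all six items would correlate with each other and only one eigenvalue would stand out. Here two do, which matches the two-dimensional structure built into the simulation.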

(Caveat: I'm really only talking about factor analysis or things like latent class analysis here. PCA has a somewhat different logic to it and isn't really designed for latent variable analysis per se from a theoretical perspective, even though you can use it for that. But that's another discussion.)
