From beneath you, it devours

I’m showing my ’90s girl power roots in that title, referencing a theme from Season 7 of Buffy the Vampire Slayer. In that season, the big bad, called “the First Evil,” starts bubbling up from the Hellmouth the town was built on (a doorway into our world for all sorts of demons and vampires), intent on destroying the world. While I’m certainly not claiming that statistical models are evil (although many graduate students would argue with me on that), there is a point to this nostalgia. What I want (what I really, really want) to get across in this post is that the specifics of the statistical model an analysis technique is built on have implications for what you can say about the results; the statistical theory and methods that produced your results do matter, and the devil may be in the details. And if you’re not judicious in choosing what you do and what you say about it, you may get devoured by reviewers — in my experience, by Reviewer 3 in particular, who actually is evil.

As a prime example, let’s look at factor analysis (FA) and principal components analysis (PCA), which a lot of people wrongly believe are the same thing or interchangeable analysis methods. The equations below (from Gorsuch, 1983) state how a correlation matrix ($R_{vv}$) may be described by the factor/component loading pattern ($P_{vf}$) and the factor/component correlation matrix ($R_{ff}$), for FA and PCA respectively.

FA: $R_{vv} = P_{vf} R_{ff} P'_{vf} + U_{vv}$
PCA: $R_{vv} = P_{vf} R_{ff} P'_{vf}$

Without getting into the nitty-gritty of what they actually mean, just eyeballing them they appear very similar, verging on identical. However, that bit on the end of the FA equation (the $+\ U_{vv}$) is much like taking the road less traveled: it makes all the difference. The $U_{vv}$ represents item uniqueness, which is a statistic-y way to say error; the factor analysis equation is set up so that not all of the variability in the items has to be predicted. That is, the error term means each item may have unpredicted variability, something unique to that item alone. This further implies that the other part of the equation (the P’s and R before the plus sign) predicts only the variability that is common among the items. The PCA equation, on the other hand, doesn’t include an error term. This implies that all variability is fair game: your software of choice will output a solution that best describes everything, both the variability that is due to a similar aspect among all items and the variability that is specific to individual items.
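
To make that concrete, here’s a minimal sketch in Python using scikit-learn’s FactorAnalysis and PCA on synthetic data I invented for illustration (the loadings, noise level, and seed are my own, not from Gorsuch or any real study). With an orthogonal factor, $R_{ff}$ is just the identity, so the FA-implied correlation matrix is the loading cross-product plus the uniquenesses on the diagonal, while a full-rank PCA reproduces the observed correlation matrix exactly, no error term required.

```python
# A hedged sketch, not a recipe: one-factor synthetic data, invented numbers.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n, v = 2000, 6

# One common factor drives all six items; each item also gets its own noise,
# which is exactly the "uniqueness" U_vv that FA models and PCA does not.
common = rng.normal(size=(n, 1))
X = common @ np.array([[0.8, 0.7, 0.6, 0.8, 0.7, 0.6]])
X += rng.normal(scale=0.6, size=(n, v))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize: cov == corr
R = np.corrcoef(X, rowvar=False)                  # observed R_vv

# FA: R_vv ≈ P_vf P'_vf + U_vv (orthogonal factor, so R_ff drops out as I)
fa = FactorAnalysis(n_components=1).fit(X)
P = fa.components_.T                              # P_vf, shape (v, 1)
R_fa = P @ P.T + np.diag(fa.noise_variance_)      # uniquenesses on the diagonal

# PCA: R_vv = P_vf P'_vf with ALL components and no error term at all
pca = PCA(n_components=v).fit(X)
P_pca = pca.components_.T * np.sqrt(pca.explained_variance_)
R_pca = P_pca @ P_pca.T

print(np.abs(R - R_fa).max())   # small residual misfit from the one-factor fit
print(np.abs(R - R_pca).max())  # ~machine precision: full PCA reproduces everything
```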

Despite how similar the equations look, how similar the results can be across methods for some datasets, or how mixed in together FA and PCA are in popular statistical software packages (*cough* SPSS *side eye* SAS), these analysis methods are not interchangeable. Underneath, they do very different things. FA seeks to describe only the “stuff” that is common to all of the items, conforming to the idea of a latent variable, while PCA seeks to describe all the “stuff,” which is why it is cited as a data reduction technique. So, if you do a PCA on your data and then present your results in the context of “factors,” “underlying constructs,” or “latent variables,” you can’t say I didn’t warn you when demon Reviewer 3’s comments come back and it eats you, starting with your bottom.
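
If you want to see that non-interchangeability for yourself, here’s one more hedged sketch (same invented setup as above): identical data, one factor versus one component, and the loadings come out noticeably different, because PCA packs each item’s unique variance into its component while FA leaves it in $U_{vv}$.

```python
# Same invented one-factor setup as the sketch above; an illustration only.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n = 2000
common = rng.normal(size=(n, 1))
X = common @ np.array([[0.8, 0.7, 0.6, 0.5]])   # four items, one factor
X += rng.normal(scale=0.7, size=(n, 4))         # plus item-specific noise
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

fa_load = FactorAnalysis(n_components=1).fit(X).components_.T
pca = PCA(n_components=1).fit(X)
pca_load = pca.components_.T * np.sqrt(pca.explained_variance_)

# Signs are arbitrary in both methods, so compare magnitudes side by side.
# The PCA column runs consistently higher: the component soaks up unique
# variance that FA correctly leaves out of the loadings.
print(np.column_stack([np.abs(fa_load), np.abs(pca_load)]).round(2))
```

The inflation in the PCA column is the whole point: those numbers describe a parsimonious summary of everything in the items, not an underlying construct, and the words you use in your results section should match.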