A rose by any other name is still a 0.8

In the “wild,” IRT scores are typically on the standard normal metric, meaning that bell-shaped distribution with a mean of 0 and a standard deviation of 1. Due to SCIENCE!, we know that almost all of the scores in such a distribution (about 99.7% of them) will fall between -3 (well below average) and +3 (well above average), with a lot of the scores piling up closer to 0.
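For anyone who wants to check the “almost all” claim rather than take SCIENCE!'s word for it, here is a minimal Python sketch. It assumes scores follow an exact standard normal distribution (real score distributions only approximate this), and the function name `share_within` is made up for illustration.

```python
import math


def share_within(k):
    """Proportion of a standard normal distribution lying between -k and +k.

    Uses the identity P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


# Print the share of scores within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within +/-{k}: {share_within(k):.4f}")
```

The share within 1 standard deviation comes out to roughly 68%, which is the “piling up closer to 0” part, and the share within 3 is roughly 99.7%.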


For statistics-minded folks, this distribution is pretty familiar and easy to interpret (You got a 0.2 on the Awesome Scale of Awesomeness? Ah, I see you’re entirely average in your awesomeness; your mom got a -3? You know she stinks at being awesome, right?). But a lot of the world (and hence a lot of the consumers of scores) isn’t statistics-minded. Generally speaking, if someone took an IQ test and got a 1.4, it’s hard for them to interpret that as “pretty good” without a lot of explanation from some nerd.

Because of this (and other assorted reasons), people in charge of reporting scores often “adjust” the metric. So rather than -3 to 3, they’ll do some math on the “wild” IRT score (say, multiply it by 10 and add 50). After this is done, instead of an average of 0, you get an average of 50; instead of a “low score” of -3, you have a low of around 20 (that is, (-3*10) + 50); and instead of a “high score” of +3, you have a high of around 80 (that is, (3*10) + 50). In short, instead of the -3 to 3 range for most scores I mentioned earlier, we’re talking about scores that sit comfortably inside 0 to 100 (mostly between 20 and 80), with an average of 50, and THAT metric seems to make sense to people.
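The “do some math” step is just a linear rescaling, so it can be sketched in a few lines of Python. This is only an illustration using the multiply-by-10-and-add-50 example from above; the function names `to_reported_score` and `to_wild_score` are made up here and are not taken from any IRT program.

```python
def to_reported_score(theta, scale=10.0, center=50.0):
    """Rescale a "wild" IRT score (mean 0, SD 1) to a reporting metric.

    The defaults (multiply by 10, add 50) follow the example in the text.
    """
    return scale * theta + center


def to_wild_score(reported, scale=10.0, center=50.0):
    """Undo the rescaling: recover the "wild" score from the reported one."""
    return (reported - center) / scale


# A low score, an average score, the 1.4 from the IQ example, and a high score.
for theta in (-3.0, 0.0, 1.4, 3.0):
    print(f"wild {theta:+.1f} -> reported {to_reported_score(theta):.1f}")
```

Because the rescaling is linear, nothing about the ordering of people or the relative distances between them changes; only the labels on the ruler do.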

Those familiar with psychological and quality of life research may recognize this as the “PROMIS metric” and/or the “T-scores” one gets, for example, from the Neuro-QOL family of scales. Of course, this is a common metric used with many assessments (e.g., MMPI & PAI). After taking the long way around to my point, I hope it’s clear-ish that there is nothing special about these scores or the metric. They’re typical IRT scores, calculated in a typical way. Given nerd power and the right bits of information, any IRT program could be used to get the “wild” scores, and then they can be math-ed up to put them on that 0-100 metric. The same is true for the summed score to PROMIS metric/T-score conversion tables that are notably available for PROMIS and Neuro-QOL scales; those conversion tables (which are a general IRT-thing) are what I’ll examine in-depth in my next post.