Depression drugs might work better than we think (or worse) because depression scales have a severe flaw
Scott Alexander has a post up arguing that we may be underestimating how large the effect of antidepressants is. But there's another reason to think we're misestimating the effect, one that cuts to the heart of how we measure depression and a lot of other things.
I like the level of detail you're going into here, but I think the pattern of effect sizes we see for many conditions, not just depression, points to some general, conceptual problem with how we measure certain kinds of treatment outcomes. I came across this when I was researching my post on naltrexone (https://notpeerreviewed.wordpress.com/2021/05/10/can-we-take-the-devil-out-of-the-bottle-evidence-and-personal-experience-with-naltrexone-for-alcohol-abuse/); most of those studies used seemingly cardinal rather than ordinal outcomes and still showed effect sizes that don't seem to reflect the experiences people report, and I suspect we would see this for many treatments and conditions. I have to admit I don't have a good sense of what the answer might be.
What matters is not whether moving from 2 to 3 on a subitem is as big a change as moving from 3 to 4. What matters is whether each step contributes to the sum to the same extent (and what it represents). This is a subtle but important distinction. Relatedly, you should think more about psychometrics, validity, and reliability, and how they apply to this situation.
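The point about how much each step contributes to the summed score can be made concrete with a small sketch. It assumes two hypothetical items per patient, each answered on a 0 to 4 ordinal scale, and the responses are invented purely for illustration (they are not trial data). The two scoring maps, EQUAL_STEPS and STRETCHED_TOP, are hypothetical as well: both preserve the order of the response levels and differ only in how much the top step is assumed to count. Under these assumptions, the sign of the treatment-minus-control difference in mean total score flips.

```python
from statistics import mean

# Hypothetical item responses (0 = absent ... 4 = severe), two items per patient.
# Invented for illustration only; these are NOT data from any study.
treatment = [[1, 1], [1, 1], [1, 1], [1, 1], [4, 4]]
control = [[2, 2], [2, 2], [2, 2], [2, 2], [2, 2]]

# Two order-preserving ways to turn the same response levels into numbers.
EQUAL_STEPS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
STRETCHED_TOP = {0: 0, 1: 1, 2: 2, 3: 3, 4: 8}  # hypothetical: "severe" weighs double


def total_score(items, scoring):
    """Sum the numeric values assigned to each ordinal item response."""
    return sum(scoring[level] for level in items)


def mean_difference(scoring):
    """Treatment minus control mean total; negative means treatment looks better."""
    treated = mean(total_score(p, scoring) for p in treatment)
    untreated = mean(total_score(p, scoring) for p in control)
    return treated - untreated


for name, scoring in [("equal steps", EQUAL_STEPS), ("stretched top", STRETCHED_TOP)]:
    print(f"{name}: {mean_difference(scoring):+.1f}")
```

Run as-is, it prints a difference of -0.8 under equal steps (treatment looks better) and +0.8 under the stretched scoring (treatment looks worse), even though no patient's ordinal responses changed.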