Self-report is a crapshoot: a tour through a large database of questions
A lot of the time, self-report just doesn't hold together
Nerd Alert. This article is about stats. Nonetheless, it gets kinda weird and entertaining, and it’s an important topic. So read on.
Self-report data is data about people gained by just asking them and recording their answer. Whether or not self-report is in general accurate is very important to some of my PhD research. I’ve done a lot of reading on the topic, but I wanted to get an intuitive handle on how “good” it is. So, I picked a large dataset with a lot of self-report questions on varied topics (the European Social Survey of 2016) and investigated.
There have been various previous attempts to get to the bottom of the general robustness of self-report, but they have mostly suffered either from a highly restricted domain (e.g. self-report about symptoms, wellbeing or personality) or from being relatively abstract (e.g. theorizing in general about the biases that can afflict self-report). My approach was to find a massive dataset with questions on almost every area of life and look at those questions for which we could check consistency against other questions through correlation.
To give those without a social science background a bit of context: a correlation of less than R=0.3 is often considered weak, a correlation of about R=0.5 is often considered moderate, and a correlation of greater than R=0.7 is often considered strong. We can square a correlation coefficient to get the proportion of variance in one variable that the other can explain.
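To make the squaring step concrete, here is a minimal sketch in plain Python (the rule-of-thumb bands above, nothing from the survey itself):

```python
# Squaring a correlation coefficient gives the proportion of variance
# in one variable that the other can explain.
for r in (0.3, 0.5, 0.7):
    print(f"r = {r}: {r ** 2:.0%} of variance explained")
# r = 0.3: 9% of variance explained
# r = 0.5: 25% of variance explained
# r = 0.7: 49% of variance explained
```

Notice that a "weak" R=0.3 explains under 10% of the variance, and even a "strong" R=0.7 still leaves about half of it unexplained.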
Let’s play a game. Try to guess the level of correlation between each of these pairs of variables. All data is from the European Social Survey, a representative sample of Europeans.
Consider the correlation between “How often do you use the internet” and “how many minutes per day do you use the internet”. Stop reading for a moment and write down on a piece of paper your best guess as to the correlation.
Ready?
The correlation is R=0.191, meaning that answers to one question explain about 3.6% of the variance in answers to the other. This particular failure to correlate is probably partly a reflection of restriction of range: about 60% of people gave the maximum answer, that they used the internet every day. Even so, 3.6% of the variance is just 3.6% of the variance.
I’ll just include the full questions so you know there’s no hocus pocus going on here:
“On a typical day, about how much time do you spend using the internet on a computer, tablet, smartphone or other device, whether for work or personal use?”
And
“People can use the internet on different devices such as computers, tablets and smartphones. How often do you use the internet on these or any other devices, whether for work or personal use?”
EDIT: On even closer examination, there’s a big oddity in the data for this question. I won’t go into it in detail here to avoid breaking the flow, but suffice to say it makes the small correlation less terrible than at first appearance, though still pretty terrible. Disregard this one when adjusting your priors about self-report, and focus on the others.
Next question: “How often do you take part in social activities compared to others of same age”, for which the answers range from “Much less” to “Much more”, and “How often do you socially meet with friends, relatives or colleagues”, for which the answers range from “Never” to “Every day”.
The correlation is just R=0.36: about 13% of variance explained.
“Do most people try to take advantage of you, or try to be fair?” and the question “Most people can be trusted or you can't be too careful”.
Here the correlation is a robust R=0.56: around 31% of variance explained. You could argue this is still disconcertingly lowish. There’s scarcely any daylight between these two questions, yet one explains less than a third of the variance in the other.
“[What is your] subjective general health” and “[how often are you] Hampered in daily activities by illness/disability/infirmity/mental problem”.
Here we get a robustish R=-0.58, but again, these variables are extremely tightly intertwined, and yet one only explains about a third of the variance in the other.
“How worried about climate change [are you]” and “Climate change good or bad impact across world”
A mere R=-.29 (about 8% of variance). One must imagine there are a lot of people who aren’t worried about climate change but think it will be very bad, and a lot of people who don’t think climate change will be very bad, but are worried about it.
“Large differences in income acceptable to reward talents and efforts” and “For fair society, differences in standard of living should be small”
R=-.29 (8% of variance).
Even more so than the climate change question, these are very nearly the same ducking question. Unless there’s a surplus of people who think that large differences are acceptable, but not to reward talent and effort? I suppose this is technically Hayek’s position, but it seems a little niche.
“[It is] Important to seek fun and things that give pleasure” and “[It is] Important to have a good time”
R=0.5, which might seem goodish, but really, a few subtleties aside, what’s the difference here? (“Have a good time” is an idiom that means “enjoy yourself”, after all.) Is it really acceptable that one only explains 25% of the variance in the other?
Self-rated “Satisfaction with life” and self-rated “happiness”
R=0.7, which is about what you would want, considering that there is some degree of randomness in answering questions. It is entirely mysterious to me why these correlate at this level, whereas many other questions, arguably much less distinct, correlate at a fraction of this rate.
Conclusions
There are a few things to take from this. The first is that the degree of correlation between self-report variables is a poor guide to the degree to which the real variables are related to each other in the world.
You might say that this is not such a problem. Often we just want to know whether there is any correlation at all. If there is, it will confirm our hypothesis. If not, it will tell against it.
The problem is that with a large enough sample size, almost any two questions will be significantly correlated. A correlation can arise for almost any reason, even once it’s turned into a regression and a bunch of different controls are added in. Correlation usually only provides substantial evidence when it is “reasonably hefty”.
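To see why "almost any two questions will be significantly correlated" in a big survey, here is a rough sketch of the significance test for a Pearson correlation. The sample sizes are hypothetical, and it uses a normal approximation to the t distribution, which is fine at survey scale:

```python
import math

def corr_p_value(r, n):
    """Two-sided p-value for Pearson r under the null of zero correlation,
    via the usual t statistic, approximated as normal (good for large n)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))  # 2 * normal survival function

# A trivial r = 0.05 (0.25% of variance) in a 40,000-person sample
# sails past conventional significance thresholds...
print(corr_p_value(0.05, 40_000))
# ...while the same r in a 100-person study is nowhere near them.
print(corr_p_value(0.05, 100))
```

The point: statistical significance scales with sample size, so in a survey of tens of thousands, even a correlation explaining a quarter of a percent of variance will look "significant". That is why only the size of the correlation, not its mere existence, carries much evidential weight here.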
But as we’ve seen, correlations can be smallish, even when we know for a fact that the real relationship must be large.
A question this leaves me with is, are there equally good cases of the inverse? Where we have excellent reason to think there is little relationship between variables, but nevertheless, they correlate strongly in self-report?
Another question: how are we meant to interpret other correlations now? The correlation between income and happiness is 0.22. In the past I would have scoffed and said that this was a small correlation; now I find myself thinking “that’s considerably larger than the correlation between saying you frequently use the internet and saying you use the internet for many minutes a day”.
Weird, huh?
Still, I won’t be discarding self-report just yet. If anything, it makes the often largish correlations found in my particular areas of interest all the more impressive. If someone shows a biggish correlation between two self-report variables, I’d say it’s odds on that the real correlation is quite a bit larger still.
Well, this is damning. I think this probably applies to self-reports about myself to myself as well. If I ask myself, "On a scale of 1-to-10, how well are you doing?" vs. "How are you?" I probably get poorly correlated answers.