Below are some additional notes on four survey question topics that warranted more specific information than my 20 survey question tips could offer:
1. How even the most innocent of differences can produce a statistically significant effect.
2. One way to ask for sensitive information without respondents admitting anything.
3. The heaping phenomenon.
4. Where to find previously made and tested psychometric scales.
The effect of wording
Consider the following two questions asked in surveys by Quinnipiac University and Gallup, respectively.
1) "Do you think the United States is doing enough to
address climate change, doing too much, or do you think more needs to be done
to address climate change?" (Quinnipiac University Poll)
2) "Do you think the U.S. government is doing too much,
too little, or about the right amount in terms of protecting the
environment?" (Gallup Poll)
On the surface these look like the same question, but there are a few subtle differences.
Question (1) asks if the entire United States is doing enough. Question (2) specifically asks if “the U.S. government” is doing enough.
Question (1) also asks specifically about climate change, whereas (2) asks about the environment as a whole.
The order of the prompted responses in (1) is “enough”, “too much”, and “needs more”. The order of responses in (2) is “too much”, “too little”, and “about right”.
These differences may seem minor or subtle, but they can affect the rate of responses to such surveys as well as the responses themselves.
The website pollingreport.com reports that survey questions (1) and (2) were asked until 1291 and 1041 responses were collected, respectively. Their 95% margins of error were 4% and 3.3%, respectively, which makes the margin of error for the contrast between the two polls about 5.2%.
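For two independent polls, the margins of error combine roughly in quadrature (root sum of squares) to give the margin of error of the difference. A minimal sketch of that arithmetic in Python, using the two reported margins:

import math

# 95% margins of error reported for the two polls, in percentage points
moe_1 = 4.0   # question (1), n = 1291
moe_2 = 3.3   # question (2), n = 1041

# For independent samples, errors add in quadrature
moe_contrast = math.sqrt(moe_1 ** 2 + moe_2 ** 2)
print(f"{moe_contrast:.1f}%")  # ~5.2%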
These questions were both asked of the American public in March 2018, so the population is essentially the same. However, in (1), the Quinnipiac University question, 22% answered ‘doing enough’, while in (2), the Gallup question, 28% answered ‘about right’. That is a difference larger than the margin of error, arising just from minor differences in polling choices like wording and sampling. Furthermore, 5% of the respondents to (1) answered as unsure, whereas only 1% of the respondents to (2) did.
Neither survey question is bad or notably better than the other, and still we see differences like this. These differences can happen even for surveys developed and executed at a professional level by impartial, unbiased polling groups.
Sensitive information
One way to collect proportion information about something
embarrassing or illegal is to allow respondents to mask their answers through a
more benign condition.
As an example, consider the following question:
Flip a coin, then answer: which of the following describes your situation best?
A) My coin landed heads.
B) My coin landed tails, or I have used illegal drugs this year, or both.
This sort of setup is useful when the variable of interest is the prevalence or proportion of the sensitive condition. Half of the respondents land tails and must answer (B) no matter what; of the half who land heads, a fraction P answers (B) because of the condition. So if the true proportion of the condition, for example drug use, is P, the expected proportion answering (B) is E = 1/2 + P/2 = (1 + P)/2.
We can estimate P by inverting this formula: P = 2E - 1.
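As a sanity check, here is a minimal simulation of this setup in Python; the 10% true proportion is a made-up value for illustration:

import random

random.seed(1)
true_p = 0.10   # hypothetical true proportion of the sensitive condition
n = 10_000

# Each respondent answers (B) if the coin lands tails OR they have the condition
answers_b = sum((random.random() < 0.5) or (random.random() < true_p)
                for _ in range(n))

e = answers_b / n    # observed share choosing (B)
p_hat = 2 * e - 1    # invert E = (1 + P)/2 to estimate P
print(p_hat)         # should land near 0.10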
The main appeal of this method is that respondents can collectively reveal the proportion of something without any individual respondent admitting to the behavior.
However, since you cannot know which respondents actually have the sensitive condition, further analyses like chi-squared tests are difficult or impossible.
From a survey wording standpoint this method also introduces
an increased potential for confusion or distrust compared to a simple question.
Another drawback compared to asking something directly is that the margin of error is doubled: since P = 2E - 1, any sampling error in the observed share E is multiplied by two in the estimate of P. If the true proportion P is small, it is even plausible that the estimate of P would be less than 0.
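A quick Monte Carlo sketch of both drawbacks, using a small made-up true proportion: the spread of the estimate of P is twice the spread of the observed share E, and individual estimates can come out negative.

import random
import statistics

random.seed(2)
true_p, n, trials = 0.02, 500, 2000

shares, estimates = [], []
for _ in range(trials):
    b = sum((random.random() < 0.5) or (random.random() < true_p)
            for _ in range(n))
    e = b / n
    shares.append(e)              # observed share answering (B)
    estimates.append(2 * e - 1)   # estimate of P

# The factor-of-two inflation, and the possibility of negative estimates
print(statistics.stdev(estimates) / statistics.stdev(shares))  # ~2.0
print(min(estimates))  # typically below 0 when true P is this small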
The heaping phenomenon
Consider the following pair of graphs from the 2011 census on literacy in
India.
In the histogram of age counts there are spikes in reported age frequencies every 5 years. Why might this be? The size and regularity of the spikes is far beyond what random variation, historic events, or demographic trends would suggest.
A key detail is that these ages are reported ages instead of actual ages. Poor conditions mean many older Indians are unsure of their exact age in years, so they report an approximate age. Some of the ages will tend toward the nearest multiple of 5 because of a phenomenon called heaping.
Also consider the graph of literacy rate (Y) over age (X). The downward spikes are a result of the same heaping phenomenon as it interacts with another variable: respondents who heap their age to the nearest five years are also less likely to be literate. Although age heaping is an extreme example, any open-ended numeric question is subject to heaping.
Heaping happens to any number that is not known exactly. For example, the answer to an open-ended survey question on annual income is likely to be heaped to the nearest $5,000 or $10,000. Similarly, the amount someone is willing to pay for an item is likely to be heaped to the nearest dollar or price point. This is why such questions are often asked as ordinal ranges instead of open-ended questions.
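One quick way to check reported numbers for heaping is to count how many end in 0 or 5; demographers’ Whipple’s index works this way on ages 23 to 62. A minimal sketch with made-up data (the function name and the example ages are hypothetical):

def share_ending_0_or_5(ages, low=23, high=62):
    """Share of reported ages that are multiples of 5, within [low, high].

    With no heaping, roughly 2 of every 10 terminal digits are 0 or 5,
    so values well above 20% suggest rounding to multiples of five.
    """
    in_range = [a for a in ages if low <= a <= high]
    return sum(a % 5 == 0 for a in in_range) / len(in_range)

# Hypothetical reported ages: note the pile-ups at 40, 45, and 50
reported = [38, 40, 40, 40, 41, 45, 45, 45, 45, 47, 50, 50, 52, 40, 45]
print(f"{share_ending_0_or_5(reported):.0%}")  # far above the ~20% baseline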
Pre-made psychometric scales
There has been a great deal of work already done on psychometric scales, and many of these scales have been published and validated for you to use in your own survey. These scales include the ECQ2 (Emotional Control Questionnaire), the PNS (Personal Need for Structure survey), the NPI (Narcissistic Personality Inventory), and the VLQ (Valued Living Questionnaire). These are all found in the Acceptance and Commitment Therapy measures package by Dr. Joseph Ciarrochi and Linda Bilich, along with notes on each measure’s validity.
There are several related notions of reliability and validity, and ways to measure them. Test-retest reliability describes the level of consistency between test scores when the same test is taken by the same person at different times. There is also internal consistency, which describes surveys for which each question measures the same underlying psychometric property; one way to measure internal consistency is Cronbach's alpha.
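For reference, Cronbach's alpha can be computed from the item variances and the variance of respondents’ total scores. A minimal sketch with made-up 1-to-5 Likert responses (the function name and data are hypothetical):

def cronbach_alpha(items):
    """Cronbach's alpha: items is a list of item score lists (one per question).

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)
    n_respondents = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical questions answered by five respondents (1-5 scale)
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # ~0.86 here; higher means more consistent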
Also, it’s very hard to tell whether a survey is actually measuring the psychometric property it’s intended to measure (its construct validity). Scores on tests for depression, for instance, can be vastly different for different tests on the same person.