Special thanks to Kahyin Toh, a researcher with Milieu, who helped with the experiments
Grid questions are enticing to use, and understandably so. They condense multiple questions into a matrix: the rows consist of questions and the columns present a predefined response list. The response list is introduced just once at the top and is typically a single-select Likert-type scale or a multi-select brand or attribute list. Think customer satisfaction surveys or brand dipsticks. This way, survey creators can bundle questions in a visually compact format, and respondents get to answer them in a supposedly easy and efficient manner. What’s more, research also shows that grid surveys lead to faster completion times.
There is one big problem though: a false sense of economy. Faster and easier does not always translate to high-quality data. Furthermore, with surveys increasingly taken on smartphones, grid questions don’t provide the best user experience on a smaller screen. Even survey companies like Qualtrics look at grids unfavourably.
We at Milieu feel strongly about this and have therefore stayed away from offering grids as a question type on our survey platform. However, we wanted to delve deeper to understand how exactly grid questions impact data quality and survey taking experience. Hence, we put them to the test in a controlled experiment!
In this experiment, respondents were randomly assigned to two groups. One group (n=332) was presented with the survey in a grid question format (via Google Forms). The other group (n=512) was presented with the exact same survey but in a single-item design, i.e. each item asked as a separate question (via Milieu Surveys).
Here is what the grid question format looked like on Google Forms
Here is what the single-item survey questions looked like on Milieu Surveys
Some context first. Respondents were shown a list of attributes (presented as rows), and for each attribute they had to indicate which retail brand(s) they associate it with (presented as columns). The question was configured as a multi-select, so for each attribute row they could select as many brands as applicable from a list of 6 brands + none of the above.
We found that respondents consistently selected fewer responses (in this case Brand names) across the different attributes in the grid question survey compared to the single-item survey.
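To make “fewer responses” concrete, here is a minimal Python sketch of how such selection counts could be compared, assuming each respondent’s answer is reduced to the number of brands ticked per attribute. The attribute names and counts below are toy data for illustration, not the experiment’s actual responses or analysis code.

```python
def mean_selections(responses):
    """Average number of brands selected per attribute row.

    `responses` maps each attribute to a list of per-respondent
    selection counts (how many brands that respondent ticked).
    """
    return {attr: sum(counts) / len(counts) for attr, counts in responses.items()}

# Toy counts (not the real data): grid respondents tick fewer brands.
grid = {"value for money": [1, 1, 2], "wide range": [0, 1, 1]}
single = {"value for money": [2, 3, 2], "wide range": [1, 2, 2]}
print(mean_selections(grid))
print(mean_selections(single))
```

Comparing these per-attribute means between the two groups is what surfaces the under-selection pattern.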
The image below shows the difference in selection rates across all attributes for one brand between grid and single-item survey questions.
What does this imply?
In this specific case, if you are running a brand health or dipstick study, respondents may be under-selecting your brand across the different attributes in a grid survey versus a single-item survey, in some cases by up to 16 percentage points (we will back up the under-selection in the next section). This is despite respondents being asked the same question in both surveys. While this question presents the brand list as columns, the concern could hold true even if the grid is flipped, i.e. with brands presented as rows and attributes as columns. In that case, your brand might falsely appear to underperform on the various attributes.
The under-selection could be attributed to a couple of reasons.
One, grid questions are not optimised for a smartphone screen. As the image below shows, the layout forces respondents to scroll horizontally to view all the brands, which adds to the response burden. Respondents may then fail to keep all the brands in mind while answering each item, leading to an under-selection of brands.
Second, in grids, items are condensed into a matrix with the same response list, giving the impression that less cognitive effort is required. This false sense of comfort tempts respondents to process the information and make judgements less thoroughly. The result is satisficing, i.e. favouring faster, easier responses, which in this case means selecting fewer brands for each attribute to breeze through the grid. In single-item surveys, respondents can carefully consider their answers one question at a time.
Under-selection of response items could mean failing to capture respondents’ true sentiment and behaviour, which can lead to misleading insights and erroneous decision-making.
But how can we be more certain that they are not considering their responses carefully? We tested for attention levels across the two survey design groups. This brings us to the next observation.
To gauge whether respondents were carefully reading through the questions and not randomising their responses in a grid, we inserted an attention check question.
The image below is an example of the attention check question used in the grid survey. This was a 20-item grid question. YourChoice Mart is a fake retailer that does not exist in Singapore, so respondents should have selected “never purchased from here”.
In this grid, respondents were asked to rate the frequency of purchase across different retailers. In the response list we added in a cheater option, which is a fake brand name (i.e. YourChoice Mart). A similar question was asked in the single-item survey. Respondents who selected the correct answer option (i.e. “never purchased from here”) were classified as PASS. Respondents who selected a wrong answer option (anything apart from “never purchased from here”) were classified as FAIL.
We observed that 16% of respondents in the single-item survey failed the attention check question. In the grid survey, this rate more than doubled: a whopping 34% of respondents failed, suggesting they were randomising rather than carefully considering their responses.
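For readers who want to sanity-check whether a gap like 16% vs 34% could plausibly be chance, a two-proportion z-test is one standard approach. The sketch below reconstructs approximate fail counts from the reported percentages and sample sizes (n=512 single-item, n=332 grid); it is illustrative, not the exact analysis we ran.

```python
import math

def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Two-proportion z-test statistic for the difference in fail rates."""
    p_a, p_b = fail_a / n_a, fail_b / n_b
    p_pool = (fail_a + fail_b) / (n_a + n_b)          # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Approximate counts reconstructed from the reported percentages:
# grid: 34% of n=332 failed; single-item: 16% of n=512 failed.
z = two_proportion_z(round(0.34 * 332), 332, round(0.16 * 512), 512)
print(round(z, 2))
```

A z-statistic this far beyond the conventional cut-offs would indicate the difference in fail rates is very unlikely to be noise.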
When asked to rate a retail brand across a set of attributes (single-select agreement scale), respondents in the grid survey were significantly more likely to select the same answer across different questions (i.e. straightlining) than those in the single-item survey. While a meagre 4% of respondents in the single-item survey selected the same response on the agreement scale for all 20 questions, this shot up to 10% in the grid survey.
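Straightlining is straightforward to flag programmatically: a respondent whose answers across all 20 items collapse to a single unique value. A minimal sketch, using toy responses rather than our actual data:

```python
def straightlining_rate(responses):
    """Share of respondents giving the identical rating to every item.

    `responses` is a list of per-respondent answer lists, e.g. ratings
    on a 1-5 agreement scale across the 20 attribute questions.
    """
    straightliners = [r for r in responses if len(set(r)) == 1]
    return len(straightliners) / len(responses)

# Toy data (not the real responses): two straightliners out of four.
sample = [
    [3] * 20,             # straightliner
    [5] * 20,             # straightliner
    [1, 2] * 10,          # differentiates
    [1, 2, 3, 4, 5] * 4,  # differentiates
]
print(straightlining_rate(sample))  # 0.5
```

Running this per survey group is enough to reproduce a comparison like the 4% vs 10% figures above.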
Now, those who select the same rating for ALL questions are the serious offenders. But what about those who do not give the same response to every question, yet straightline for some and only weakly differentiate their responses for others? This response pattern is referred to as nondifferentiation.
To test for this, we measured how spread out the responses were. The standard deviation (SD, a measure of spread) in the single-item survey was 0.93, whereas the SD in the grid survey was 0.76. This difference was significant at the .001 level.
The lower variance for grids means respondents were less likely to hold different opinions across the attributes, leading to less differentiated responses. Respondents in the single-item survey were significantly more likely to spread their responses along the rating scale, which is what you would expect in a survey that tests 20 different attributes.
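One common way to quantify nondifferentiation is the mean within-respondent SD across items; the sketch below assumes that operationalisation (the figures above do not pin down the exact measure), and the ratings are toy data for illustration.

```python
import statistics

def nondifferentiation_index(responses):
    """Mean within-respondent SD across items; lower values mean
    respondents differentiated less between the attributes."""
    return statistics.mean(statistics.pstdev(r) for r in responses)

# Toy data (not the experiment's responses): the "grid-like" group
# clusters around one rating, the "single-item-like" group spreads out.
grid_like   = [[3, 3, 3, 4, 3], [2, 2, 2, 2, 3]]
single_like = [[1, 3, 5, 2, 4], [5, 1, 4, 2, 3]]
print(nondifferentiation_index(grid_like))
print(nondifferentiation_index(single_like))
```

A lower index for the grid group, as in the SD figures above, is the nondifferentiation signature.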
Last, who better than the respondents to let us know how grids compare to single-item questions when it comes to user experience?
When asked to rate their frustration levels on a 5-point scale, respondents who took the grid survey were much more likely to report greater frustration levels (Top 2 Box = 34%) compared to those who took the single-item survey (Top 2 Box = 15%).
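Top 2 Box is simply the share of respondents choosing the top two points of the scale. A minimal sketch with toy ratings (not the actual frustration data):

```python
def top2box(ratings, scale_max=5):
    """Share of respondents picking the top two scale points
    (here, the two most-frustrated options on a 5-point scale)."""
    top = sum(1 for r in ratings if r >= scale_max - 1)
    return top / len(ratings)

# Toy ratings (not the real data): 2 of 10 land in the top two boxes.
print(top2box([1, 2, 3, 3, 2, 4, 5, 1, 2, 3]))  # 0.2
```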
On the surface, grid questions may appeal for their perceived ease of use and speed. However, this experiment sheds light on how the grid design can instead encourage poor survey-taking behaviour. Your data quality risks being compromised because grids can lead to under-selection of responses in multi-select questions, and straightlining or less differentiated responses in single-select ones. Furthermore, grids entice respondents to randomise their responses, leading to more failures on attention check questions.
This problem is compounded by the fact that surveys are increasingly taken on smartphones, where the grid design is at a huge disadvantage. A poor user experience can frustrate respondents, which is exactly what we observed in this experiment: respondents who took the grid survey reported greater frustration than those who took the single-item survey (34% found the experience frustrating vs 15% for the single-item survey).
Fret not, because an alternative exists! Single-item questions are not only optimised for mobile phones, they also give us better data quality, which matters above and beyond space efficiency and survey completion time. However, bear in mind that single-item surveys can increase the question count; a good rule of thumb is not to go beyond 10 minutes (read more here from another experiment we conducted). Some platforms offer modified grids such as accordion grids or progressive grids, but these have been shown to have their own inherent issues, such as higher break-off rates compared to regular grids or single-item surveys (read here).
One thing to keep in mind is that even though grids may make surveys seem shorter, the number of decisions a respondent needs to make and submit remains the same. The next time you design a survey, avoid grids, since they come at a cost to data quality. If you really can’t find a workaround, at least keep them short and simple.