How to design better cross-country surveys and avoid bias

Cross-country studies are a compelling avenue of research that allows researchers to compare the opinions, attitudes, and behaviour of people across different cultures. For instance, brands can learn how their customers are similar or different across countries and use those insights to make better business decisions.

Cross-country research typically involves developing a questionnaire, translating it to local languages, running fieldwork across markets, analysing data, and comparing the results. While seemingly straightforward, there are inherent complexities in the process that can bias the analysis.

Central to this issue is the survey instrument used. Cross-country research often involves surveys with Likert-type rating scales that measure varying levels of constructs such as agreement levels, importance, satisfaction, etc. Top 2 box (T2B) percentages and/or aggregated mean scores are then compared to assess regional differences (or similarities) in the datasets.

However, this assumes that people from different countries understand and respond to rating scales in a similar manner. The reality is that there are inherent cultural differences in the way people use rating scales. The meaning attached to Strongly Agree or a 6 on a 10-point scale may differ depending on where you come from.

When culture-specific differences influence how respondents answer questions in a survey, the observed patterns in response styles are considered a form of response bias (e.g., Yang, Harkness, Chin, & Villar, 2010).
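For readers less familiar with the two summary measures mentioned above, here is a minimal sketch of how Top 2 box percentages and mean scores might be computed per country. The function name, the input layout, and the ratings are made up for illustration and are not taken from the Milieu study.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_country(responses, scale_max):
    """Return {country: {"t2b_pct": ..., "mean": ...}} for integer ratings.

    `responses` is an iterable of (country, rating) pairs. Top 2 box
    counts the two highest scale points (scale_max and scale_max - 1).
    """
    by_country = defaultdict(list)
    for country, rating in responses:
        by_country[country].append(rating)

    summary = {}
    for country, ratings in by_country.items():
        top2 = sum(1 for r in ratings if r >= scale_max - 1)
        summary[country] = {
            "t2b_pct": 100 * top2 / len(ratings),
            "mean": mean(ratings),
        }
    return summary


# Made-up ratings on a 5-point agreement scale, for illustration only.
sample = [("SG", 3), ("SG", 4), ("SG", 2), ("SG", 3),
          ("VN", 5), ("VN", 4), ("VN", 5), ("VN", 3)]
result = summarise_by_country(sample, scale_max=5)
print(result)
```

Both figures are simple aggregates of the raw ratings, so any systematic difference in how a market uses the scale flows straight into them.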

What does this mean for cross-country research?

Ignoring cultural biases in survey responses can lead to unreliable and erroneous conclusions. It is important to gauge whether observed results reflect true differences in the construct being measured or whether they are partly attributable to differences in response styles.

Several studies have investigated this phenomenon. Research has shown that respondents in Latin American countries have a greater tendency to pick extreme responses (using only the most positive or most negative response options) compared to other countries. On the other hand, East Asian respondents are less likely to provide high ratings (Dolnicar and Grün, 2007), with Singaporeans being more likely to select the mid-to-lower end of the scale, resulting in low aggregate scores.

Many more such studies exist. We wanted to build on the existing literature by testing different types of response scales (e.g., agreement scales, likelihood scales), response scale lengths (e.g., 5-point, 7-point, and 11-point scales), scale design formats, and survey topics (e.g., satisfaction surveys, product tests, general attitudes), with a focus on Southeast Asia. The region is unique because it has a richly diverse and heterogeneous set of cultural systems and languages.

To better understand whether culture has an effect on survey responses, we carried out a set of controlled experiments with N = 15,415 respondents in Singapore, Malaysia, Indonesia, the Philippines, Vietnam, and Thailand. Below, we highlight some learnings you can apply to achieve reliable, robust results when carrying out cross-country surveys.

Considerations to keep in mind while carrying out cross-country surveys

1. Ensure consistent survey instruments. The single most important factor is to use the exact same survey instrument in every market. It is not enough to ensure that you’re measuring similar constructs (e.g., agreement levels, likelihood) and topics of interest. You need to make sure that the question items and response scales are identical. To be more specific, if you are using an agreement scale to measure a topic, use it in all countries, because different ways of measuring the same behaviour or attitude can arrive at similar conclusions without being directly comparable. It is also important to keep the order of questions the same, as varying orders can introduce order effects (read about order bias and other survey biases here).

2. Ensure consistent survey modes. Methodological differences in survey administration can account for a large part of inter-market differences in responses. For instance, if data is collected via a mobile survey in one country but via a desktop survey in another, the two modes can introduce a bias in responses because the user experience can differ vastly. It is therefore important to keep the mode of survey administration consistent across markets. If you are forced to adopt a hybrid survey mode in order to reach more respondents (for example, part mobile, part in-person), ensure that the proportion of respondents in each mode is roughly similar across all markets.
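One way to check the mode mix before comparing markets is sketched below. It assumes a simple fieldwork log of (market, mode) pairs; the function names, market codes, and counts are hypothetical.

```python
from collections import Counter, defaultdict


def mode_shares(records):
    """Return {market: {mode: share of that market's respondents}}."""
    counts = defaultdict(Counter)
    for market, mode in records:
        counts[market][mode] += 1
    return {
        market: {mode: n / sum(c.values()) for mode, n in c.items()}
        for market, c in counts.items()
    }


def max_share_gap(shares, mode):
    """Largest difference between markets in the share of one mode."""
    values = [s.get(mode, 0.0) for s in shares.values()]
    return max(values) - min(values)


# Hypothetical fieldwork log, for illustration only.
records = ([("MY", "mobile")] * 8 + [("MY", "in_person")] * 2
           + [("TH", "mobile")] * 5 + [("TH", "in_person")] * 5)
shares = mode_shares(records)
gap = max_share_gap(shares, "mobile")
print(shares, round(gap, 2))
```

What counts as "roughly similar" is a judgment call. A large gap, like the one in this toy log, is a signal to rebalance the sample or to treat mode as a factor in the analysis.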

3. Use the same response scale lengths. If a question uses a 5-point rating scale in one country, it should use the same scale in the other countries. Top 2 box scores or mean scores can vary considerably depending on the length of the scale used. For instance, a Milieu research experiment found that, on average, 35% of Vietnamese respondents chose the top box on a 5-point scale. However, this score dropped drastically to 23% on an 11-point scale. This can affect inter-market differences as well and lead to erroneous conclusions.
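Part of the reason is purely arithmetic: the top box, or the top 2 boxes, cover a smaller share of the available options on a longer scale. The sketch below shows that share for the three scale lengths tested. It is only a baseline that would hold if every option were picked equally often, not a model of how real respondents behave, and the function name is hypothetical.

```python
def chance_top_share(n_points, k=1):
    """Share of scale points that fall in the top k boxes."""
    if not 1 <= k <= n_points:
        raise ValueError("k must be between 1 and n_points")
    return k / n_points


for n in (5, 7, 11):
    print(f"{n}-point: top box {chance_top_share(n):.1%}, "
          f"top 2 box {chance_top_share(n, 2):.1%}")
```

On a 5-point scale the top 2 boxes make up 40% of the options, while on an 11-point scale they make up about 18%, so the same underlying attitude can produce very different Top 2 box scores.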

4. Some countries tend to be consistent high raters irrespective of the response scale length and question content. In our research study, we found that, on average across multiple response scale lengths, respondents in Vietnam and the Philippines were more than twice as likely as Singaporeans to use the highest rating on a scale (31% for VN and PH vs 13% for SG). Notably, this pattern holds even when we look at each scale length individually.

The Philippines and Vietnam are consistently high top-choice raters compared to other markets across different scale lengths. However, the difference is considerably reduced when an 11-point scale is used.

Additionally, varying the type of question asked produced similar results. Respondents in Vietnam and the Philippines were the most likely to pick the top rating choice across general attitudes, product test, and customer satisfaction questions.

What this suggests is that, irrespective of the question content and response scale length, some countries are likely to be high raters compared to others. It is good to get a sense of the response patterns in the markets you are surveying, and of whether there are consistently high raters, so that you can account for this when analysing and interpreting the results.
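A quick way to get that sense is to tabulate the top-box rate per market and scale length, as in the minimal sketch below. The record layout and the data are made up, and the `TOP_VALUE` mapping assumes the 11-point scale runs from 0 to 10, which the article does not state.

```python
from collections import defaultdict

# Assumed highest value for each scale length (11-point taken as 0 to 10).
TOP_VALUE = {5: 5, 7: 7, 11: 10}


def top_box_rates(records):
    """Return {(country, scale_length): share of ratings at the top value}."""
    totals = defaultdict(int)
    tops = defaultdict(int)
    for country, scale_length, rating in records:
        key = (country, scale_length)
        totals[key] += 1
        if rating == TOP_VALUE[scale_length]:
            tops[key] += 1
    return {key: tops[key] / totals[key] for key in totals}


# Made-up records of (country, scale_length, rating), for illustration only.
records = [
    ("VN", 5, 5), ("VN", 5, 5), ("VN", 5, 4), ("VN", 5, 3),
    ("SG", 5, 5), ("SG", 5, 3), ("SG", 5, 4), ("SG", 5, 3),
    ("VN", 11, 10), ("VN", 11, 8), ("SG", 11, 7), ("SG", 11, 6),
]
rates = top_box_rates(records)
for key in sorted(rates):
    print(key, f"{rates[key]:.0%}")
```

Comparing these rates across unrelated questions and topics helps separate a market-level response style from a genuine difference on any single item.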

5. Some markets tend to be more “neutral” in their responses. In our research study, we found that, on average, Malaysian and Singaporean participants were more likely to select the neutral response relative to other markets (ranging from 18% to 22% for Singapore and 18% to 25% for Malaysia). It is good practice to get a sense of what this number is for the markets you are studying, since it can have implications for your results. For instance, if it is important to get an opinion (e.g., likability of a design or experience with customer service), you can consider using an even-numbered rating scale rather than an odd-numbered one. Sometimes, respondents choose a neutral response option because they are not provided with an “I don’t know / not sure” option. Good response scale design can minimise this neutral response bias.
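The neutral-response rate can be measured with a few lines of code. The sketch below assumes integer ratings on a scale with known end points; the function names and ratings are hypothetical. It also makes the even-versus-odd point concrete: a scale with an even number of points has no midpoint to select.

```python
def midpoint(scale_min, scale_max):
    """Return the neutral point of an odd-length scale, else None."""
    n_points = scale_max - scale_min + 1
    if n_points % 2 == 0:
        return None  # even-numbered scales have no neutral option
    return (scale_min + scale_max) // 2


def neutral_rate(ratings, scale_min, scale_max):
    """Share of ratings that sit exactly on the scale midpoint."""
    mid = midpoint(scale_min, scale_max)
    if mid is None:
        raise ValueError("even-numbered scales have no neutral point")
    return sum(1 for r in ratings if r == mid) / len(ratings)


# Made-up ratings on a 1 to 5 scale, for illustration only.
ratings = [3, 3, 4, 2, 5]
print(midpoint(1, 5), midpoint(0, 10), midpoint(1, 4))
print(neutral_rate(ratings, 1, 5))
```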

6. Design format of the response scale matters. Choosing the right response scale format can minimise response bias and yield more accurate responses to enable reliable cross-country comparisons.

Our research found that the format in which the scale is presented to the respondent can have a significant impact on the results. For example, we ran one version of our test using a standard single-select rating scale (shown on the left below), and another where we showed respondents the same scale using a spinner format, which is a native format on iOS and Android devices (shown on the right below). The spinner was configured to display the middle of the scale (i.e., 5) by default upon loading the question, and the respondent needed to interact with the spinner in order to proceed to the next question. Our findings showed that the spinner design helped to reduce top-choice bias by a significant margin for the countries that exhibited the strongest bias (in our study, Vietnam and the Philippines).

For an 11-point scale (which is commonly used in NPS ratings), Vietnamese respondents are approximately twice as likely to pick the top box on a standard scale relative to the spinner scale (40% on a standard scale vs 23% on a spinner scale).

What does this mean for cross-country comparison of data? The size of differences in cross-country results for certain countries might be a research artefact and not reflective of true differences. In this case, the difference between, say, Singapore and Vietnam might be exaggerated by a poor response scale design that contributes to response bias.

Some other platforms have reported that using a visual format for rating scales can reduce inter-country variance by almost a third. Visual formats often work well on desktop screens, but they translate to a poor user experience on a smartphone screen. Avoid the standard single-select format for long response scales and explore alternative rating scale formats to see what best suits your research methodology and needs. Bear in mind that smartphone screens have limited real estate, so scales that are optimised for them will yield better results.

7. Translation matters. The importance of a rigorous and thorough translation process cannot be overstated. It is crucial that the intended meaning and nuances of language are preserved. While it is common to outsource translations to a third-party translation service, we highly recommend having them reviewed, ideally by someone who understands the context and rationale of the survey.

Another point to note is that just because a survey is being run in a country with its own local language(s), not everyone may be comfortable with that language. It is good practice to serve the survey in each respondent's preferred language rather than assume everyone is fluent in the most commonly spoken local language. For instance, in Malaysia we often run surveys in English in parallel with Malay (the national language) because a considerable proportion of the panel is more comfortable in English.

8. Clean out potentially inattentive respondents. Watch out for respondents who answer at random or without careful consideration. One example is straight-lining, where respondents select the same response for all questions regardless of the question content or what their actual opinion is. This can exacerbate response biases for certain countries and can render inter-market comparisons inaccurate if not accounted for.

The format of the response scale can influence this too. For example, we found that compared to standard rating scales, using a spinner rating scale led to significantly lower straight-lining for 11-point response scales. Keep an eye out for respondents who display poor survey-taking behaviour and exclude them from the data set so that you have more reliable data for accurate cross-country comparisons.
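A very simple straight-lining check is sketched below. It flags respondents who gave an identical rating to every item in a block. The function names, the respondent IDs, and the `min_items` threshold are assumptions for illustration; an identical answer across a short block can be perfectly legitimate, so treat the flag as a prompt for review rather than proof of inattention.

```python
def is_straight_liner(answers, min_items=5):
    """True if every item in a sufficiently long block got the same rating."""
    return len(answers) >= min_items and len(set(answers)) == 1


def drop_straight_liners(respondents, min_items=5):
    """Return only the respondents who are not flagged as straight-liners."""
    return {
        rid: answers
        for rid, answers in respondents.items()
        if not is_straight_liner(answers, min_items)
    }


# Made-up grid answers keyed by respondent ID, for illustration only.
respondents = {
    "r1": [5, 5, 5, 5, 5, 5],   # identical across six items: flagged
    "r2": [4, 5, 3, 4, 2, 5],   # varied answers: kept
    "r3": [3, 3, 3],            # too few items to judge: kept
}
cleaned = drop_straight_liners(respondents)
print(sorted(cleaned))
```

Running the same check separately for each market and scale format makes it easier to see whether straight-lining is concentrated in particular countries or designs.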

Learn with Milieu: Watch the full video below, where our COO, Stephen Tracy, and Senior Research Manager, Antarika Sen, share findings on the impact of language and culture on survey response scales.

Milieu Insight is a reputable survey software and panel research agency, aiding businesses in leveraging data for strategic growth.

Get in touch

If you have questions related to cross-country surveys, Milieu's survey platform, or other products and services, feel free to reach out to sales@mili.eu.

Author

Antarika Sen
