The Global Perspective: Addressing the Challenges of Cross-Cultural Scaling
By Pete Cape, SSI
The survey results are back. The Japanese branch of your client's hotel chain rates an 8.34 out of 10 for cleanliness. The same survey, given to Germans rating the hotel's locations in Germany, scores 7.46. The difference between the two is statistically significant. So, do you tell your client that its Japanese employees are keeping the hotel cleaner than its German ones?
While the data seems to indicate that's true, making that assessment right away dismisses the potential impact of cross-cultural scaling. That is to say that respondents' interpretation of your survey language and how they use scales may have a lot to do with the country and culture where they live, and that may affect the answers they give – irrespective of the service they receive.
Here's the first cultural question to consider: Do the Japanese consistently rate the same item (in this case, cleanliness) more highly than Germans do? Our survey has two variables (the actual cleanliness of the hotel and the way cultures use scales) and therefore four possible outcomes:
- Japanese and Germans rate the same way; the hotels are different
- Japanese and Germans rate the same way; the hotels are the same
- Japanese and Germans rate differently; the hotels are different
- Japanese and Germans rate differently; the hotels are the same
Depending on which of these statements is true, you'll recommend that the client take a different course of action. Therein lies our dilemma – filtering out the cultural meaning in survey scales to understand what their results are really telling us.
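The four outcomes listed above are simply every combination of two same-or-different questions. A minimal sketch of that enumeration follows; the factor labels are paraphrased from the text, and treating each factor as a strict same/different choice is an assumption made for illustration.

```python
from itertools import product

# Factor labels paraphrased from the text; each is treated as a simple
# same/different question, which is an assumption of this sketch.
FACTORS = [
    "how the two cultures use the scale",
    "actual cleanliness of the hotels",
]


def possible_outcomes(factors):
    """Return every same/different combination of the given factors."""
    return [
        dict(zip(factors, combo))
        for combo in product(("same", "different"), repeat=len(factors))
    ]


outcomes = possible_outcomes(FACTORS)
for outcome in outcomes:
    print("; ".join(f"{name}: {state}" for name, state in outcome.items()))
print(len(outcomes), "possible outcomes")
```

Each extra same/different factor passed to possible_outcomes doubles the number of combinations.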
Cultural Meaning in Scales
It is common in market research to use numeric scales to represent degrees of difference. They are (or at least look) mathematical: The spaces between the points are equal (or at least appear so), and the progression of the numbers from low to high suggests an improving picture, from bad to good. Still, numeric scales used for rating are not equivalent to centimetres on a ruler. Even the size and meaning of numeric scales are steeped in culture.
For most of us, it begins at school. In the United States, the scale works between 0 and 4 – though children receive their actual grades in terms of A to F. In France, the scoring is 0-20, where the top and bottom grading sections are wider than the middle three, and "good" marks start around 14 out of 20 (70 percent up the scale). In Italy, the scale is from 1 to 10, and in Russia, 1 to 5. Then, of course, in Germany students are rated from 6 to 1, with 1 being the highest. Now add a bit of pressure on a panelist asked to make decisions repeatedly and quickly, and they'll very likely revert to ingrained rating rules of thumb.
There are mathematical ways of dealing with the varying scale issue, but they are not without their own problems. Normalising to unity (i.e., getting a score between 0 and 1) can be a relatively simple data transformation: for each data point, calculate the distance between it and the minimum of all the data points, then divide this by the range (maximum minus minimum) of the entire data set. Doing this separately for each culture's data set should, in theory, solve the mathematical problem. Doing so, however, presupposes that there is a scale usage effect on the data, and that panelists' observations are not empirically true.
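As a rough illustration of normalising to unity, the sketch below rescales each culture's ratings separately. The ratings are invented for the example, and the function and variable names are hypothetical.

```python
def normalise_to_unity(scores):
    """Min-max rescale a list of ratings to the 0-1 range."""
    lowest, highest = min(scores), max(scores)
    span = highest - lowest
    if span == 0:
        # Every rating is identical, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(score - lowest) / span for score in scores]


# Hypothetical ratings, invented purely for illustration.
ratings_by_culture = {
    "Japan": [7, 8, 9, 10, 8],
    "Germany": [5, 7, 8, 9, 6],
}

# Normalise each culture's data set on its own, as the text describes.
normalised = {
    culture: normalise_to_unity(ratings)
    for culture, ratings in ratings_by_culture.items()
}
```

After the transformation, each culture's lowest rating maps to 0 and its highest to 1, whatever the original scale usage. That is exactly why the approach discards any real difference in level between the two groups.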
Problems with verbal scales are fewer but they include, of course, issues with translation. Is "assez bien" in French better (or worse) than "befriedigend" in German? Evaluating scale meaning increases our initial set of possible outcomes from 4 to 8, since we now have 3 variables (the actual cleanliness of the hotel, the way cultures use scales, the meaning of the scale points).
It's also worth looking at cultural expectations around the item being rated. Firstly, the survey author's cultural and linguistic perspective decides what gets rated and how the item is described. Secondly, if (as a survey taker) my expectation of cleanliness is low, then a "somewhat good" cleanliness rating from someone else's perspective is going to look "extremely good" from mine. This issue can be solved in part by considering service "gaps" rather than absolutes – that is, the "gap" between expectation and delivery – but still, adding in the issue of expectation takes our set of possible, real outcomes from 8 to 16.
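A minimal sketch of the gap idea, assuming each respondent gives both an expectation rating and a delivery rating on the same scale. The groups, field names and numbers are all hypothetical.

```python
from statistics import mean


def service_gap(expectation, delivery):
    """Delivery rating minus expectation rating, both on the same scale."""
    return delivery - expectation


def mean_gap_by_group(responses):
    """Average the service gap within each respondent group."""
    gaps = {}
    for response in responses:
        gap = service_gap(response["expectation"], response["delivery"])
        gaps.setdefault(response["group"], []).append(gap)
    return {group: mean(values) for group, values in gaps.items()}


# Hypothetical responses, invented purely for illustration.
responses = [
    {"group": "A", "expectation": 9, "delivery": 8},
    {"group": "A", "expectation": 9, "delivery": 9},
    {"group": "B", "expectation": 6, "delivery": 7},
    {"group": "B", "expectation": 5, "delivery": 7},
]

gaps = mean_gap_by_group(responses)
```

In this invented data, group A gives the higher absolute delivery ratings but has a negative average gap, while group B rates delivery lower yet sees its expectations exceeded.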
Finding a Control
Once we better understand the dimensions of the cross-country rating bias, it is obvious there needs to be some calibration among survey responses. We might do that by running a test. First, find an item that is culturally neutral. There should be something that we all, the world over, agree on as an essential human truth (social values and moral psychology may be fertile ground). Then, find some measuring stick (our scale) that we can agree has the same meaning at each of its points. Any difference in ratings between cultures must then be due to cultural bias (plus sample error, and we already have a paradigm for dealing with that). Applying the calibration factor to any future ratings will adjust the scores, making them equivalent.
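The calibration step might be sketched as follows. This assumes an additive calibration factor (each culture's mean on the control item minus the average of the culture means); the text does not say whether the factor should be additive or multiplicative. The control ratings are invented, and only the 8.34 and 7.46 scores come from the hotel example.

```python
from statistics import mean


def calibration_offsets(control_ratings):
    """Offset of each culture's control-item mean from the cross-culture average.

    Assumes an additive bias, which is a simplification for this sketch.
    """
    culture_means = {c: mean(r) for c, r in control_ratings.items()}
    reference = mean(culture_means.values())
    return {c: m - reference for c, m in culture_means.items()}


def calibrate(score, culture, offsets):
    """Remove a culture's estimated rating bias from an observed score."""
    return score - offsets[culture]


# Hypothetical ratings of a culturally neutral control item.
control_ratings = {
    "Japan": [8, 9, 8, 9],
    "Germany": [7, 8, 7, 8],
}

offsets = calibration_offsets(control_ratings)
adjusted = {
    "Japan": calibrate(8.34, "Japan", offsets),
    "Germany": calibrate(7.46, "Germany", offsets),
}
```

With these invented control ratings the offsets are +0.5 for Japan and -0.5 for Germany, which is enough to change the ordering of the two cleanliness scores. Real control data could just as easily leave the ordering intact.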
Simple, right? Hardly. But it's worth looking outside of our own market research frame of reference to find a small sample of cross-cultural items that ought to draw equal responses around the world.
Calibrating for Cultural Bias
If researchers work with cross-culturally validated scales and take the time to evaluate whether their items for rating are equally salient, then the odds are in their favour that what they observe (given a confidence interval) is actually the truth. However, in real life, given the plethora of unknowns in cross-cultural scaling, the probability that they are wrong is higher than the statistical test suggests.
Nonetheless, as an industry, the better we become at calibrating our scales for cultural bias, the better we'll be at advising our clients about where to concentrate their scarce resources.