Big data can support smart market research, but only if researchers embrace the basics of understanding what it is they want to measure — and how.
Today’s businesses see market data as a commodity. Readily accessible information about consumer activity and preferences allows market researchers to develop large data sets to mine for consumer insights. And indeed, a look through recent market research industry publications shows that discussions in the field have been dominated by a focus on data analysis.
But more often than not, insight into what customers really care about is hampered by the quality of the data being collected. Some market researchers conflate the idea of data quality with sample size, with the belief that reliability, validity, and other characteristics of “good measurement” derive solely from the amount of data collected. This is certainly not the case.
A heavy emphasis on data collection and analysis is irrelevant if it omits the first and most important step of market research — the design of the metrics. In the psychometric tradition, survey development and the construction of specific survey questions have been emphasized as the most important step in the research process. Unfortunately, this step gets short shrift from most market researchers today.
Failing to assess the measures that are the foundation of business decisions poses a colossal risk. Making data-driven decisions based on poor measures can be far worse than making decisions without data at all.
To help organizations think more critically about the measures they use to collect information about consumers, we’ve outlined four common misconceptions held by many market researchers and provided suggestions for how to break away from these mistaken beliefs.
Historically, when market researchers wanted to measure a construct, such as how consumers feel about a particular brand (for example, “brand love”), they would ask respondents to rate questions that directly describe the construct, such as “How much do you love this brand?”
This kind of “measurement by describing” has its share of problems. For instance, many constructs are too abstract for regular consumers to report on in concrete terms. Think about how you’d reply if you were asked how much brand love you have for Tide laundry detergent. Most people couldn’t get more specific than reporting general approximations such as “a lot” or “a little.”
Researchers have begun to move toward methods that use self-reported data in better ways. Instead of asking, “How much do you love this brand?” today’s best practice is to pose statements that a consumer might endorse if they loved the brand. For example, “disagree/agree” statements like “I would drive 20 miles to purchase [Brand]” would be fully endorsed only if the respondent really loved the brand. We derive the level of the construct from behaviors respondents say they would engage in.
The “measure by deriving” approach requires a deep conceptual understanding of what is being measured. But many market researchers still ask questions the old way, descriptive of what’s being measured (“I like the ad I just saw”) rather than descriptive of derivative behaviors (“I showed the ad to friends”).
If market researchers continue to write surveys that measure constructs overtly instead of by their derivative behaviors, the data will likely be subject to uncertainty and error that could easily be avoided. They need to put more thought into what exactly they want to know, carefully consider what behaviors should be the consequence of that construct, and develop the measure from there.
Market research is not exempt from the financial pressures of business. Cost-consciousness trickles into our work when clients ask for short surveys that cover as many topics as possible — which often results in a single question for each topic. For example, “How satisfied are you with your experience?” might be the only question in a survey that assesses “customer experience.”
There are a number of challenges with this kind of thinking. First, the single question might not be a unique measure of “customer experience” but instead a measure of some other construct such as “agreeableness.” It is not uncommon in the world of psychometrics for items to “cross load,” meaning they can be a measure of more than one thing. Second, assuming that a question does measure what we want it to measure, it rarely ever measures all aspects of a construct. In the example of customer experience, many aspects — including price, promotions, interaction with employees, and perceptions of the brand — join to influence an individual’s global impression of his or her experience. Asking just one question fails to assess these multiple aspects and neglects the variations in people’s impressions.
In addition, limiting a measure to only one question doesn’t always allow an organization to measure the change in a respondent’s impressions, because it forces a measurement ceiling on respondents. Imagine a scenario where an organization wants to test new ads by measuring the effect the ad has on brand impression. Respondents who already love the brand might have “maxed out” on their measure of brand impression if they gave the brand a 7 on a seven-point scale prior to ad exposure. Even if they’re shown really good ads, they can’t increase their score in the posttest; they can only stay at a 7. This limitation of the measure does not correspond with the way reality works: More often than not, brand loyalists’ passion for a brand can be increased with ads and brand actions that resonate.
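A small simulation can make this ceiling effect concrete. The sketch below is purely illustrative: the latent impressions, the half-point “true” lift from the ad, and the sample size are all invented numbers, not results from any study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent brand impressions before and after seeing the ad.
# Loyalists start near the top of the scale; the ad genuinely raises
# everyone's impression by half a point (an invented effect size).
pre_latent = rng.normal(loc=6.5, scale=0.8, size=10_000)
post_latent = pre_latent + 0.5

# Observed scores are rounded to whole numbers and capped at the
# seven-point scale's ceiling (and floor).
pre_observed = np.clip(np.round(pre_latent), 1, 7)
post_observed = np.clip(np.round(post_latent), 1, 7)

true_lift = (post_latent - pre_latent).mean()  # 0.5 by construction
observed_lift = (post_observed - pre_observed).mean()

# The ceiling eats much of the effect: respondents already at 7 cannot move up.
print(f"true lift: {true_lift:.2f}, observed lift: {observed_lift:.2f}")
```

Because so many loyalists are already at the scale maximum before exposure, the observed lift is far smaller than the true lift, which is exactly the distortion a single capped question produces.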
Having multiple questions that measure different aspects of the same construct is the most dependable way to make sure data-driven decisions are actually based on all of the aspects of the construct of interest. It’s a way to capture all of the information that the measure can supply.
Even if market researchers heed the advice to use multiple questions for each construct they are measuring, they must remember that different questions provide different information.
Consider the standardized tests that are used widely in the U.S. for college admissions. The SAT and ACT are designed to include easy items that most test-takers will answer correctly and difficult items that fewer can answer correctly. Market research questions work in a similar manner. There are some behaviors that almost anyone would do whether they love a brand or not, such as read some of the brand’s posts on social media. There are other behaviors that only those who really love the brand would do, such as spend a large portion of their disposable income on it or travel a considerable distance to purchase it.
This point can be illustrated visually. (See “Easy- and Harder-to-Endorse Statements Provide the Whole Picture.”) For example, assume that the construct (also called latent trait) “brand love” ranges from -4 to 4. People who dislike the brand are on the lower end and people who love it are on the higher end. Respondents in the middle, at 0, have a 51% probability of endorsing the “easy” statement of “I would read [Brand’s] posts on social media” and only a 6.8% probability of endorsing the “difficult” statement of “I would drive 20 miles to purchase [Brand].” A higher level of brand love is required to endorse the more difficult statement. In this example, a respondent with a higher latent score of 1 has a 75% probability of endorsing the easy item and jumps to a 90% probability of endorsing the hard item.
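The probabilities quoted above are the kind produced by a logistic item characteristic curve. Here is a minimal sketch, assuming a two-parameter logistic (2PL) item response model; the discrimination (`a`) and difficulty (`b`) values are back-solved to roughly reproduce the figures in the example, whereas real item parameters would be calibrated from response data.

```python
import math

def endorse_probability(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a respondent with latent trait level
    `theta` endorses an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative parameters chosen to approximate the probabilities above.
easy = dict(a=1.05, b=-0.04)  # "I would read [Brand's] posts on social media"
hard = dict(a=4.82, b=0.544)  # "I would drive 20 miles to purchase [Brand]"

for theta in (0.0, 1.0):
    p_easy = endorse_probability(theta, **easy)
    p_hard = endorse_probability(theta, **hard)
    print(f"theta = {theta:+.1f}: easy item {p_easy:.1%}, hard item {p_hard:.1%}")
```

The difficulty parameter `b` is the latent level at which endorsement becomes a coin flip, which is why the “drive 20 miles” statement demands much more brand love than the social-media statement before respondents start agreeing with it.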
Marketers need to select questions and statements with a range of difficulties to ensure that useful information can be captured from all respondents.
One of the most important jobs market researchers have is to provide an interpretation of what they are measuring.
One way to do this is by using norms, which are created when a researcher has collected the same measurement on a number of individuals or groups. By knowing how others score, researchers can see where an individual falls relative to everyone else. Another approach is criterion-referenced scaling. For example, tests for a driver’s license are criterion-referenced: “proficiency” is established using a cut-off score (for instance, getting 85% of answers correct is required to pass). Rather than considering how an applicant scores relative to others, the focus is on whether the driver is proficient based on scores that are predetermined by state agencies.
But while both norms and criterion-referenced approaches help with the interpretation of a score relative to other people (via norms) or relative to some external criterion (via criterion referencing), they do not aid in interpreting how an individual scores relative to the latent trait being measured. In the example “Easy- and Harder-to-Endorse Statements Provide the Whole Picture,” that would be how much an individual dislikes or loves a brand on the scale of -4 to 4. Pairing how individuals score on a measure with their level of the latent trait is a step that market researchers often skip.
[Figure: Easy- and Harder-to-Endorse Statements Provide the Whole Picture. The plot shows two statements from a survey: the red line is the “easier” statement, requiring a lower level of brand love for a respondent to endorse; the blue line is the “harder” statement.]
Imagine you’re analyzing survey results about “brand loyalty” from a five-question survey, where each question was rated on a seven-point Likert scale. If respondents gave the questions an average score of 5 (or a sum score of 25), how would you interpret that?
The truth is that you simply might not know what constitutes a high score without scaling. Some might see this as a bad score because it is far from the maximum they could have gotten — an average of 7, with a sum score of 35. However, just because the scale maximum is 35, that doesn’t mean that a respondent who exhibits a high level of brand loyalty will say 7 across every question. It could be that a 25 is a high score on this scale.
Regardless of what they’re measuring, market researchers must recognize that scaling is a necessary step. While most market researchers use norms, it is increasingly important to use statistical models such as item response theory to establish a correspondence between responses on a measure and level on the latent trait.
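As a rough illustration of what such a correspondence looks like, the sketch below maps a pattern of endorse/not-endorse answers onto a latent scale using a 2PL item response model and a grid-search maximum-likelihood estimate. The item parameters, the function names, and the five-item survey are all hypothetical; production IRT work would use a calibrated model and a proper estimation routine.

```python
import math

def p_endorse(theta, a, b):
    # 2PL item characteristic curve
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items):
    """Grid-search maximum-likelihood estimate of the latent trait from
    binary endorse (1) / not-endorse (0) responses. `items` is a list of
    (discrimination, difficulty) pairs; all values here are illustrative."""
    grid = [g / 100.0 for g in range(-400, 401)]  # theta in [-4, 4]

    def log_lik(theta):
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = p_endorse(theta, a, b)
            ll += math.log(p if r == 1 else 1.0 - p)
        return ll

    return max(grid, key=log_lik)

# Five hypothetical items ordered from easy to hard (difficulty b rising).
items = [(1.2, -2.0), (1.2, -1.0), (1.2, 0.0), (1.2, 1.0), (1.2, 2.0)]

# A respondent who endorses the three easiest statements but not the two
# hardest lands in the middle of the latent scale, not at the top.
print(estimate_theta([1, 1, 1, 0, 0], items))
```

This is the kind of correspondence that answers the sum-score question above: instead of guessing whether a 25 out of 35 is high, the model places the respondent directly on the latent trait that the questions were written to measure.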
Measurement is a tough thing to get right. The more we work with clients and their vendors, the less emphasis we see being put on how measures are created. While big data gives us safer ground for generalizing our results, it is no substitute for the careful crafting of a measure that has been tested for reliability and validity.
By thinking carefully about what is being measured and adopting psychometric best practices, researchers and executives can make data-driven decisions that stand on the strongest possible footing.
By: Ken Faro and Elie Ohana