In brief
- A new study shows LLMs can mimic human purchase intent by mapping free-text answers to Likert ratings through semantic similarity.
- The method achieved 90% of human test–retest reliability on 9,300 real survey responses.
- The study raises questions about bias, generalization, and how far “synthetic consumers” can stand in for real people.
Forget focus groups: A new study found that large language models can forecast, with striking accuracy, whether you want to buy something, rivaling the reliability of traditional survey-based marketing tools.
Researchers at the University of Mannheim and ETH Zürich have found that large language models can replicate human purchase intent—the “How likely are you to buy this?” metric beloved by marketers—by transforming free-form text into structured survey data.
In a paper published last week, the team introduced a method called “Semantic Similarity Rating,” which converts the model’s open-ended responses into numerical ratings on a five-point Likert scale, the format long used in traditional consumer research.
Rather than asking a model to pick a number between one and five, the researchers had it respond naturally—“I’d definitely buy this,” or “Maybe if it were on sale”—and then measured how semantically close those statements were to canonical answers like “I would definitely buy this” or “I would not buy this.”
Each answer was mapped in embedding space to the nearest reference statement, effectively turning LLM text into statistical ratings. “We show that optimizing for semantic similarity rather than numeric labels yields purchase-intent distributions that closely match human survey data,” the authors wrote. “LLM-generated responses achieved 90% of the reliability of repeated human surveys while preserving natural variation in attitudes.”
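To make the idea concrete, here is a minimal sketch of that mapping in Python. It assumes a sentence-transformers encoder and hypothetical anchor wordings; the paper’s actual reference statements, embedding model, and scoring rule may differ.

```python
# A minimal sketch of the embedding-based mapping described above. The anchor
# wording, the encoder choice, and the "pick the nearest anchor" rule are
# illustrative assumptions, not the authors' exact pipeline.
from sentence_transformers import SentenceTransformer, util

# Canonical five-point purchase-intent statements (wording is hypothetical).
ANCHORS = {
    1: "I would definitely not buy this.",
    2: "I would probably not buy this.",
    3: "I might or might not buy this.",
    4: "I would probably buy this.",
    5: "I would definitely buy this.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
anchor_vecs = model.encode(list(ANCHORS.values()), convert_to_tensor=True)

def to_likert(response: str) -> int:
    """Map an open-ended response to the Likert point of its nearest anchor."""
    vec = model.encode(response, convert_to_tensor=True)
    sims = util.cos_sim(vec, anchor_vecs)[0]  # cosine similarity to each anchor
    return list(ANCHORS.keys())[int(sims.argmax())]

print(to_likert("I'd definitely buy this."))   # maps to the top of the scale
print(to_likert("Maybe if it were on sale."))  # lands somewhere in the middle
```

The choice of anchor wording matters here: as the authors note below, small changes to the reference statements can shift which anchor a response lands nearest to.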
In tests across 9,300 real human survey responses about personal-care products, the SSR method produced synthetic respondents whose Likert distributions nearly mirrored the originals. In other words: when asked to “think like consumers,” the models did.
Why it matters
The finding could reshape how companies conduct product testing and market research. Consumer surveys are notoriously expensive, slow, and vulnerable to bias. Synthetic respondents—if they behave like real ones—could let companies screen thousands of products or messages for a fraction of the cost.
It also lends weight to a deeper claim: that the geometry of an LLM’s semantic space encodes not just language understanding but attitudinal reasoning. By comparing answers in embedding space rather than treating them as literal text, the study suggests that model semantics can stand in for human judgment with surprising fidelity.
At the same time, it raises familiar ethical and methodological risks. The researchers tested only one product category, leaving open whether the same approach would hold for financial decisions or politically charged topics. And synthetic “consumers” could easily become synthetic targets: the same modeling techniques could help optimize political persuasion, advertising, or behavioral nudges.
As the authors put it, “market-driven optimization pressures can systematically erode alignment”—a phrase that resonates far beyond marketing.
A note of skepticism
The authors acknowledge that their test domain—personal-care products—is narrow and may not generalize to high-stakes or emotionally charged purchases. The SSR mapping also depends on carefully chosen reference statements: small wording changes can skew results. Moreover, the study relies on human survey data as “ground truth,” even though such data is notoriously noisy and culturally biased.
Critics point out that embedding-based similarity assumes that language vectors map neatly onto human attitudes, an assumption that may fail when context or irony enters the mix. The paper’s own reliability numbers—90% of human test-retest consistency—sound impressive but still leave room for significant drift. In short, the method works on average, but it’s not yet clear whether those averages capture real human diversity or simply reflect the model’s training priors.
The bigger picture
Academic interest in “synthetic consumer modeling” has surged in 2025 as companies experiment with AI-based focus groups and predictive polling. Similar work by MIT and the University of Cambridge has shown that LLMs can mimic demographic and psychometric segments with moderate reliability, but no prior effort had demonstrated a close statistical match to real purchase-intent data.
For now, the SSR method remains a research prototype, but it hints at a future where LLMs might not just answer questions but represent the public itself.
Whether that’s an advance or a hallucination in the making is still up for debate.