Synthetic data: the end of market research as we know it, or just synthetic hype?
The marketing world loves a hot new trend, as we saw last year with the metaverse. We’re seeing the same hype now with synthetic data and its promise to fundamentally change the game – this time when it comes to insight.
Now yes, to an over-stretched, budget-squeezed marketer there’s undeniable appeal at the surface level, but it’s important to dig a little deeper into the pros and cons. While it might offer speed and ease, will it also offer accuracy? Will it really be able to capture the nuances of real-life customer conversations? In a dynamic market like B2B technology, will it be able to keep up with trends and attitudes in the same way as asking a human being?
Any effective marketing places the customer at its core, so it seems counter-intuitive to stop listening to that customer and replace them with a synthetic version, all for the sake of convenience.
In the world of B2B tech, our clients come to us to discover what they don’t know, to test hypotheses and to measure reactions and perceptions among a hard-to-reach audience of tech decision-makers. Will synthetic data be able to deliver what they need?
Now, you might think that as a “human-first” market research agency we would say all of that – and you’d be right to an extent. But we also believe it’s not as simple as choosing one over the other: the two often work best in combination.
In fact, the concept of synthetic data in research isn’t that new. Synthetic data itself isn’t common in research for marketing communications, and so it’s something Vanson Bourne has never used, but for decades our industry has used “augmented” data, for example when weighting underrepresented parts of a sample. Proponents argue that synthetic data brings a new level of precision to this sort of practice.
So amidst the hype, we must ask: will synthetic data revolutionize research, or will it lead to misunderstandings?
In this blog, we’ll explore:
- What synthetic data is and how it’s created
- Synthetic responses vs. synthetic respondents
- Use cases and limitations
- Vanson Bourne’s current stance on synthetic data
What is Synthetic Data?
At its core, synthetic data is artificially generated using AI tools, machine learning, algorithms and large language models (LLMs) to mimic real human data. These tools operate on probabilities, predicting the most likely data point or next word in a sequence. However, as researcher Indi Young notes, these models don’t understand meaning; they generate outputs based on their programming and on patterns identified in their training data.
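To illustrate what “predicting the most likely next word” means, here is a toy Python sketch of a bigram model, a drastically simplified stand-in for an LLM, trained on three hypothetical survey verbatims. It only counts which word tends to follow which. Real LLMs use neural networks over far longer contexts, but their output is likewise driven by patterns in training data rather than by understanding.

```python
from collections import Counter, defaultdict

# Hypothetical "training data": a few short survey verbatims.
corpus = [
    "our it budget will increase next year",
    "our it budget will stay flat next year",
    "our it budget will increase this year",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn follow-on counts into probabilities for the next word."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def most_likely_next(counts, word):
    """Pick the highest-probability continuation: a pattern, not a meaning."""
    followers = counts.get(word)
    if not followers:
        return None  # never seen in training data: nothing to go on
    return followers.most_common(1)[0][0]


model = train_bigrams(corpus)
print(next_word_probabilities(model, "will"))
print(most_likely_next(model, "will"))  # increase
print(most_likely_next(model, "outage"))  # None
```

The model “prefers” increase only because it appeared more often after will in its training data, and it has nothing to say about a word it has never seen.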
In market research, synthetic data can take two primary forms: synthetic responses and synthetic respondents.
Synthetic Responses
A synthetic response is an AI-generated answer to a single question. This can fill gaps in survey data by leveraging existing survey data points and making a best guess. For instance, if a respondent misses a question, an AI tool can generate a probable answer based on the other data from the respondent in that survey.
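For illustration, the first approach can be sketched as k-nearest-neighbour imputation, a long-established statistical technique used here as a stand-in for an AI tool. The survey rows, question names and the choice of k=3 are hypothetical, and commercial tools (including LLM-based ones) may work quite differently.

```python
from collections import Counter

# Hypothetical survey: respondents answer Q1-Q3 on a 1-5 scale.
# None marks a question the respondent skipped.
responses = [
    {"id": "r1", "Q1": 5, "Q2": 4, "Q3": 5},
    {"id": "r2", "Q1": 4, "Q2": 4, "Q3": 4},
    {"id": "r3", "Q1": 1, "Q2": 2, "Q3": 1},
    {"id": "r4", "Q1": 2, "Q2": 1, "Q3": 2},
    {"id": "r5", "Q1": 5, "Q2": 5, "Q3": None},  # skipped Q3
    {"id": "r6", "Q1": 5, "Q2": 5, "Q3": 5},
]


def distance(a, b, questions):
    """Sum of absolute answer differences on questions both people answered."""
    return sum(
        abs(a[q] - b[q])
        for q in questions
        if a[q] is not None and b[q] is not None
    )


def impute(respondent, pool, target, k=3):
    """k-nearest-neighbour imputation: return the most common answer to
    `target` among the k respondents whose other answers are most similar.
    The result is a synthetic answer, not one the respondent actually gave."""
    other_questions = [q for q in respondent if q not in ("id", target)]
    donors = [p for p in pool if p is not respondent and p[target] is not None]
    donors.sort(key=lambda p: distance(respondent, p, other_questions))
    votes = Counter(p[target] for p in donors[:k])
    return votes.most_common(1)[0][0]


r5 = next(r for r in responses if r["id"] == "r5")
print(impute(r5, responses, "Q3"))  # 5
```

Here the respondent who skipped Q3 answered Q1 and Q2 much like r6, r1 and r2, so the sketch fills the gap with their most common Q3 answer. It is still a best guess: nothing guarantees r5 would actually have answered 5.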
Alternatively, synthetic responses can be generated by scouring the internet to answer questions that were never put to real respondents. This method works well for consumer studies, where there is ample reference data, but falls short in B2B tech research because far less relevant data is publicly available. It’s easy to find reference points for consumer brand preferences, for example, but much harder to generate a credible response about IT decision-makers’ opinions on their budgets.

Synthetic Respondents
Synthetic respondents are AI-generated personas built from existing data. These virtual respondents can represent different demographic groups or be interviewed as individuals. In consumer research, it’s relatively straightforward to train models using social media and internet data and create a “person” to interview.
In customer research, organizations with sufficient data can also create synthetic respondents. Some tools even claim high similarity scores between synthetic and real customer data. While synthetic data can be quicker and cheaper, the critical question remains: is it worth potentially alienating or misunderstanding a portion of your audience?
In the B2B tech context, synthetic data faces more significant challenges. Trust is paramount in B2B buying cycles, and marketers need credible, distinctive content that resonates with their target personas. How can you achieve this without real human input? Most B2B surveys also combine opinion-based and factual questions, which makes it difficult for AI to generate credible responses without sufficient reference data. Even with enough data, there’s a risk of outdated insights, especially around major incidents like the recent Microsoft/CrowdStrike outage. Fieldwork with humans conducted during the outage would capture their real-time experiences, whereas synthetic data could only draw on reference data that predates the outage. Luckily, we have an expert network of IT leaders who can give us their timely opinions.

An Alternative to Weighting Data
Market researchers have used data weighting for decades, and synthetic respondents may enhance this practice. Weighting makes each real respondent in an underrepresented group count for more; synthetic respondents instead generate additional responses from existing data to increase the numbers in that group, potentially reducing inaccuracies. This approach can complement primary research with real humans, offering cost and time benefits while maintaining the integrity of the research.
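To make the trade-off concrete, here is a minimal Python sketch, assuming a hypothetical sample of 80 enterprise and 20 SMB respondents where the target population is split 50/50. It computes standard post-stratification (cell) weights and Kish’s effective sample size, then shows a deliberately naive synthetic top-up that simply resamples real rows. The segment names, figures and resampling method are illustrative assumptions, not how Vanson Bourne or any vendor generates synthetic respondents.

```python
import random


def cell_weights(sample_counts, population_shares):
    """Post-stratification weight per group: population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] * n / sample_counts[g] for g in sample_counts}


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def synthetic_boost(group_rows, target_n, rng):
    """Deliberately naive synthetic top-up (hypothetical): resample real rows
    from an underrepresented group until it reaches target_n. Each synthetic
    row is a copy of one of the real rows, flagged synthetic=True."""
    extra = [
        dict(rng.choice(group_rows), synthetic=True)
        for _ in range(target_n - len(group_rows))
    ]
    return group_rows + extra


# Hypothetical sample: 80 enterprise and 20 SMB interviews,
# while the target population is split 50/50.
sample = {"enterprise": 80, "smb": 20}
population = {"enterprise": 0.5, "smb": 0.5}

weights = cell_weights(sample, population)
print(weights)  # {'enterprise': 0.625, 'smb': 2.5}

per_respondent = [weights["enterprise"]] * 80 + [weights["smb"]] * 20
print(effective_sample_size(per_respondent))  # 64.0

rng = random.Random(7)
smb_rows = [{"segment": "smb", "q1": rng.randint(1, 5)} for _ in range(20)]
boosted = synthetic_boost(smb_rows, 80, rng)
print(len(boosted), sum(1 for r in boosted if r.get("synthetic")))  # 80 60
```

With these numbers, 100 weighted interviews carry roughly the statistical precision of 64 unweighted ones, the kind of loss that synthetic boosting is meant to address. The naive top-up reaches 80 SMB rows, but all 60 synthetic rows derive from the same 20 real people, which is why it matters to disclose how synthetic respondents were generated.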
However, transparency is crucial. Researchers must clearly explain when and how synthetic data is used to ensure credibility, especially in B2B tech research, where trust is critical. Guidance on this is evolving all the time, and we keep up with it so that our clients don’t need to.

Summary of Benefits and Limitations of Synthetic Data in General

Vanson Bourne’s Current Stance on Synthetic Data
For 25 years, we’ve built expertise in reaching senior IT and business decision-makers. In our opinion, synthetic data doesn’t yet offer the same level of insight. Human respondents provide unique B2B insights that aren’t available online, so primary research remains essential for valid, credible insights, especially in a market as dynamic as technology.
While synthetic data holds potential, it’s not yet a complete substitute for real human insights in B2B tech research. A balanced approach, combining synthetic and primary data, ensures credible, reliable research outcomes. Stay tuned as we continue to explore this evolving landscape.
Talk to us about B2B research with real respondents