We’re all starting to hear about it, but what is synthetic data? Simply put, synthetic data is “fake” data that is mathematically manufactured to mirror the patterns and behaviours of real-world datasets.
For B2B marketers, this means using AI to create a digital “twin” of your customer base. Instead of waiting weeks for a survey to come back, you use a model trained on your existing customer calls, ads, and behavioural data to predict how an audience will react.
However, it isn’t a silver bullet. Marketers need to be aware of significant concerns: the “rubbish in, rubbish out” risk (if your training data is biased, your AI will be too) and the danger of oversimplification, where nuance gets lost in statistical significance.
In this episode, Kate O’Keeffe, CEO and Co-Founder at Heatseeker, joins Jodi to explain how to navigate these challenges and put synthetic data to work. Kate walks us through how to maximise the effectiveness of your data to minimise risk in your decision-making. We explore the fascinating process of chatting with an agent that embodies your ideal target persona – learning their thoughts, feelings, and opinions on your brand as if they were in the room with you.
This conversation gets to the heart of how synthetic data can do truly immense things for large organisations, providing a unified perspective that aligns boards and teams. Kate and Jodi don’t shy away from the controversial bits, covering data quality, the “honesty” of AI personas, and how to handle rapid consumer shifts.
Listen on Apple Podcasts or Spotify, or watch on YouTube.
The FINITE Podcast is sponsored by Clarity, a full-service digital marketing and communications agency. Through ideas, influence and impact, Clarity empowers visionary technology companies to change the world for the better.
Find the full transcript here:
Jodi (00:01)
Hi everyone, and welcome back to the FINITE Podcast. Do we have a treat for you today? Synthetic data. You must have heard it creeping into marketing and board conversations. It’s about the ability to maximize your behavioral data and minimize the risk in business decision-making, putting your audience and what they want first. So I’ve had the absolute pleasure of talking to Kate O’Keeffe, founder and CEO at Heatseeker.
They are in the business of synthetic data, and Kate does a wonderful job of breaking down exactly what synthetic data is, how it works, and what it does. Kate is also really transparent about some of the challenges and concerns around synthetic data, and how businesses can overcome these to make the best use of their data. Kate has a wonderful storytelling ability and is able to really break down these complex concepts. So I hope you enjoy it.
Jodi (01:01)
Hi Kate, welcome to the FINITE Podcast.
Kate O’Keeffe (01:05)
Hi, Jodi, excited to be here.
Jodi (01:08)
I’m very excited to have you here. Today we are talking about synthetic data. I’ve just started hearing it pop up everywhere, so I feel like it’s just entering the ether for marketers. I’m sure listeners will be very interested in exploring this topic with you. But before we do that, I would love it if you could give us a bit of an overview, a highlight reel of your background and experience so far in marketing: how you got into marketing and where you are now.
Kate O’Keeffe (01:38)
Sure. Well, yeah, I spent my 20s with a consumer brand. I started my own shoe company, Jodi. I was a shoe designer, and we had stores in a couple of cities. I loved that work, but I felt like I wanted to do something more deeply technical. I was obsessed with technology, and the funny thing about making shoes for a living is that it’s not a very technical craft. So I was lucky enough to join a company called Cisco, a deep tech company in Silicon Valley, as their head of innovation. They were looking for an entrepreneur in residence, and with my startup background and consumer background, I fit the bill. So I spent 10 years building innovations and companies with them.
That’s probably what really started my journey into understanding how hard it can be to get a view of customers’ needs, wants and desires. It’s not just about getting the right answer about what our customers want; we need an answer that everybody can agree on. Especially as our organizations get larger, it’s not enough to get the right answer. You’ve got to get the right answer that everybody can be comfortable with. That can be one of the problems we have, whether it’s synthetic data or even a survey response: if folks around our business don’t believe it or don’t get it, if our agency doesn’t believe it or doesn’t get it, we’re not going to get very far with it. So this is where I started to understand this problem really deeply, when I was trying to build products and startups and the like for Cisco. It’s like, I don’t just need the right answer. I need an answer everybody gets and can get behind.
And so, yeah, after my time at Cisco, I spent three years as a partner in BCG’s digital ventures business, where it was my job to create digital attacker brands for really big brands. That’s when I felt that pain even more keenly. I would be in a tiny little startup on the edge of a big bank or a big telco, and I had heard something from the customer, but the minute it came time to move the hundreds of people with me on this journey, I often found: “Oh, but is that really our customer? And is that the customer that matters?” Or: “Oh, that was that survey you did, but I don’t know if I believe in the technique or the technology that you used to get that answer, and I think what I thought of this morning in the shower is much more important, so that’s what I want us to do now.”
So what I found is that, especially as organizations get to a certain size, we need the right answer, and then we need it to be something everybody can get behind if we’re to get anywhere with it. That’s where I started to fall in love with this idea that we need the right answer, we need it to be something that we believe in, but we also need it right now, in this moment. Like while you and I are arguing, Jodi (you know, you’re a spicy character, I’m a spicy character, we’ve both got strong views), the time to get the answer to what the customer really wants is right now, in this moment. And that’s when I started to get really intrigued by the power of technology. How can we get synthetic data that’s even better quality than what we might have been able to get if we’d gotten a dozen customers in a room? And how might we use that to move faster, and move our brands faster, in this current climate?
Jodi (05:01)
Yeah, that’s a really interesting story, and it’s really poignant how you encountered that in your own experience at Cisco. We had the SVP of marketing at Cisco on the podcast recently, and she only just scratched the surface of all of the momentum that’s going on there and how everything has to move in these huge waves to make change. She’s leading that, at the top of that wave. So yeah, really interesting. I’m sure it’s a challenge that many of our listeners have dealt with, even in smaller startups and scale-ups: when you’ve got multiple big fish in a small pond, there can always be those differences. So it’s interesting that you see data as a source of truth that decisions can depend on. But data is also quite malleable. You can pick up some bits from it while other people see other bits. It’s context dependent. Some people have different perspectives even when leaning on the data, I guess. So I’d love to hear more about that in a bit, and how you have used synthetic data to solve those issues as well.
First of all, I wanted to acknowledge that you work with and produce synthetic data, so we’re getting a particular perspective, a very, very valuable one. I’m so interested in hearing more, but I just wanted to put it out there that you’re a leader in synthetic data and that’s the angle. I would love to hear your definition of synthetic data. What actually is it? Can you be as clear as possible?
Kate O’Keeffe (06:50)
Yeah, so look, let’s start from the beginning. Not all synthetic data is created equally. And I want to start by acknowledging those listeners today who have started playing in this space and felt really disappointed by some of the outcomes. As with all nascent technologies, if the data synthesizes the wrong thing, then we’re going to end up with terrible results. I just want to start there.
Where I want to start is that so much market research is done using shaky techniques to begin with. You know, surveys where panels are chasing gift cards, and interviewees who want to be a good participant. They want to tell us that they like our product. So many parts of market research are shaky, and there are so many parts of market research that are really rich and incredibly valuable. I, for one, am a huge fan, and we still conduct it for Heatseeker right now. We conduct incredibly high-quality ethnography, where we spend a lot of time belly to belly with our customers, with marketers that use our data, following them around in a really weird way. That form of research, observational, behavioral market research, I really believe is incredibly valuable. However, we all know that there’s a preponderance of likelihood-to-buy surveys and “do you like this over that?” questions. We all know that that kind of data is disastrous in quality.
As marketers, we’re often involved in a dance. So often I hear, “If I don’t give my CFO a survey that says the likelihood to buy is above X, I won’t get my budget, so I won’t be able to run my campaign, so I won’t sell anything and nothing will work.” And so we know it’s a bit shaky, the CFO probably knows it’s a bit shaky, but there’s nothing else available to us as marketers, so we kind of have to roll with it.
I feel, as a marketer myself, that makes me really sad. I mean, CFOs, now that I’m picking on them, have first-class tooling. At any given moment, a CFO can tell you down to the decimal point how many dollars and cents are in the bank, what the discounted cash flow is, what the dollars are going to be next week and what the likelihood of the pipeline is. They have all of that clarity, but as marketers we often don’t have it in real time, and there’s often data in there that’s really shaky, so that makes me really uncomfortable. So I think before we even start talking about synthetic data, we need to talk about what those synthetics are trained on.
I believe in leveraging the highest-quality data to train our synthetics, and that is an incredibly important part of the process. It’s a rubbish-in, rubbish-out situation when it comes to synthetics. So let’s start with that. Your data should be purely behavioral. I know that’s a bit of a shock to those of us who do a lot of insights work or come from an insights background. That was my role at BCG, running all of that strategic design work and research. But we really need to be purely behavioral. So let’s not ask a customer or a user to speculate about what they might do. Let’s present them with stimulus in their natural habitat and see how they respond to it. So that’s what Heatseeker uses. We use a technique where you give us a growth question. What’s the most important job to be done? What’s the most important human friction that you’re facing?
What’s the buying driver that you would respond to? And we craft a panel of ads that we publish in the real world, and we see how people respond to those ads. They don’t know that they’re in a study. They click on those ads, they fill out forms, and their lead will be processed like any other lead by the brands that we work with. In this way, we get purely behavioral data. There are very few questions that you would put in a survey that we can’t structure into a behavioral study in that way. So that’s the first thing: let’s move to a purely behavioral way of collecting data.
So experiments are one way; first-party data is another. Let’s get your shopping cart data. Let’s get your loyalty plan data. Let’s get your customer interviews. You know, it’s funny: we train a lot of our synthetics on customer calls, and as we all know, there are often a lot of complaints in there. So sometimes the synthetic personas that we build based on that data are very spicy and have very strong views about your product and how they’re treated. But that’s because we’re using that truly behavioral data. So we find a blend of purely experiment-based behavioral data plus your first-party data gives you a pretty full picture: everywhere from awareness through to advocacy on the customer journey, we can train using real behavioral data.
So let’s start with getting you the best-quality data to train your synthetics. Then the next step is creating a synthetic persona. In our case, most of our customers consume their synthetics by talking to them, talking to their segment for the first time, rather than their segment being on a PowerPoint deck that sits in a drive somewhere that I click on occasionally. We turn your performance marketing data, your customer calls and the experiments that we run for you into somebody that you can talk to. It turns out people know how to talk to people, so we don’t have to train them on anything fancy when it comes to their synthetics. Talk to an avatar of your customer and use that to riff: “Why didn’t you like my campaign six months ago? It was my favorite. Why don’t you like my idea? Can you be my co-pilot and help me write the copy and design the campaign brief? Can you evaluate this creative I’ve gotten from my agency and tell me how you’re going to respond? And do that as yourself. Do that as the segment that you represent, so that I can move faster as a marketer.”
What we found at Heatseeker is that when we originally started our company, we were just in the insights business, but it turns out, Jodi, that the truth hurts. When you give people an insight, all you’ve done is make them sad and give them a lot of homework. What we find now is that it’s about putting your personas to work: help me craft what you need; help me write a call center script that you will respond to; help me explain an internet outage for my telco in words that you will understand and won’t find offensive. Those are all real-world examples of how our customers are using their synthetic personas right now to solve the problems that marketers have.
Jodi (13:54)
It’s absolutely fascinating that you’re distilling all of this data into… is it one agent? How many agents or personas would a typical client have to represent the broad set of their target audience?
Kate O’Keeffe (14:12)
Yeah, we usually build a persona for every high-potential segment that they serve.
So we have some customers that have many, many. For example, we have one customer in the consumer space that does a lot of big partnerships with other big brands. And as you know, when you do a big partnership with another big brand, there are your customers and there are their customers, but who on earth would be both? That’s a new customer that we need to understand really fast. We need to find out exactly who it is that is responding to that. It takes just eight to ten days to do all of the experiments that we need to build you a synthetic persona from scratch. So it’s a great example: you’ve got a brand-new partnership that’s going to launch in six weeks or eight weeks or whatever it is, and my god, you’d better make sure that the whole campaign launch hits right the first time, that the performance marketing is just right, the website’s just right, the landing pages are just right. And the best person to do that is the persona that represents the exact audience that’s going to respond to it. So it’s a really clever way that people are putting our personas to work.
Jodi (15:31)
Yeah, that is super clever. And I didn’t imagine it as so personal, almost human-like, where you can interact with an audience that is totally representative of your target audience. How does it account for nuance within that audience? Could it potentially be an oversimplification? You know, how much can we really
Kate O’Keeffe (15:42)
Yeah.
Jodi (15:59)
paint broad strokes over a whole population of people and predict them to act in the same ways and think the same things in every scenario?
Kate O’Keeffe (16:01)
Yeah.
Yeah, so that’s a fantastic question. So first of all, the personas are supposed to be mathematically representative of the group they’ve been chosen to represent.
So I want them, to some degree, to lack some of the nuance, because I want you to be able to count on what they’re saying. We’ve all listened to a focus group where noisy Frank in the corner sounded like he was making so much sense. We went with that qual, and all of a sudden we launched the thing and it’s not quite right. Noisy Frank in the focus group didn’t actually represent the many folks in that segment.
So we deliberately make sure that everything the persona says is representative of the group. However, there are a couple of caveats to that. If you’ve done anthropological-style research, ethnographic research, we do listen to those customer interviews, and we listen for the exact verbatims. We listen to the customer calls, and as much as we can, we try to use those verbatims in the persona so that you hear the exact words that customers use in real life when they’re speaking to you. So that’s one way that we’re really trying to humanize the response, and it is really human. We serve one of the meal delivery companies, and you’ll hear about feeling anxious about the mom tax, and it’s 5:17 and Junior’s got sport tonight. You’ll hear about that because it will have shown up in an interview, and when we ran the experiment we were able to see that a statistically significant portion of that segment feels that way and that’s the job that they’re doing.
However, to your point, how do we capture some of the delicate nuance and find the 17% of folks in the segment that the persona does not represent? The other piece is just making more personas. We have another function in our product where we will run a synthetic panel for you, and we will actually train a representative sample of the entire segment. So you can still see, wow, 59% of this particular segment feel this way, but 17% don’t, and those that don’t feel that really deeply. Then we can make some decisions: should I run some live experiments outside the panel, really find out who they are, and maybe build another persona around them so I can speak to them more deeply?
So yeah, we feel that pretty keenly. There’s the real-time case: while you and I are arguing in the meeting, I just want an answer right this minute, and it’s okay that that answer is just statistically representative of the whole group. And sometimes, when we get into the deeper nuance, I want to know: is that 100% of that group, or is it 73% of that group? Am I interested in what that last mile had to say? Then it’s a bigger question, and we run live experiments. They take about eight minutes to set up, and they’re done in about four to six days. Then we get the real people in the real world, and we can say, shit, we’ve got a group here that we really want to listen to. So it’s a really great question, and one that we think about really keenly.
Jodi (19:31)
Yeah, absolutely. I can definitely sense that. It’s interesting that you look for statistical significance as a first point, but then notice those big spikes where people feel really strongly. That really explains it. I was going to ask, do you ever overlay anything? Do you ever manipulate the synthetic data in some way to make the agent more critical than they would be, or more honest than they would be, to optimize those synthetic survey participants into better survey participants?
Kate O’Keeffe (20:06)
Yeah.
Okay, we start with the quant. If you think about the training data we have, we start with these live experiments we do in the real world. And the beauty of that is there’s no one in there who’s worried about being a good participant. There’s no one there who’s worried about hurting the interviewer’s feelings. There’s no one in that mix thinking, “Well, if I don’t tell this person what I want, I won’t get my gift card.” So we actually remove a lot of that lack of truth-telling, because when you’re doom-scrolling at midnight and you click on something, it’s because you want it. It’s not because I’m trying to get something out of this study that I’m participating in.
So first and foremost, the personas that we have, I mean, they’ll break your heart. Or they break mine; I mean, it’s my product, and I’m watching them speak to a large group of executives about their brand, and the persona will very articulately explain, “Your brand stinks. I don’t love it. And this is why I prefer your competitor.”
I mean, you’ll realize, Jodi, why I had to keep building my product, so we would start doing something about that bad news, because for a long time we were often just in the bad-news business. These synthetics pull no punches; they just share exactly how they feel about a brand. And so, yeah, we don’t have to manipulate them. What they say is what, mathematically, they feel and what they respond to. As I said, we do look for quotes from real customers in that exact segment, in that scenario, whether they’ve spoken to an interviewer or called the call center to complain and they’re describing the scenario: “I’ve been waiting for my meal for an hour now. It’s cold and my kid has allergies, and what am I supposed to do now?”
If we hear that, we will give that to the persona, if we know that it’s statistically significant and relevant to the group. That way, when we’re handing a brief over to an agency or to someone, they’re getting some of that inspiration. I think that’s really important, and it might be where we wrap: this idea that synthetics should reduce your risk, which allows you to take bolder steps.
When we were talking about how this idea came about, it’s not enough to have the right idea. You’ve got to be able to bring everybody with you. The problem with synthetic data is often: can everybody trust it? Can we count on it in the same way we used to be able to count on other forms of research? What we’re able to do now is share: here are all the live market experiments that were supportive of this insight, and here is everything we heard in the performance marketing. We can see a competitor ad that we understand must have performed well because it killed ours, which was running at the same time. So we’re able to give all of that data to back up the insight that we have. And then when we have bold ideas, we’ll often say, of course we can test them synthetically, but we can also test them in the real world, perhaps unbranded, under an incognito brand.
And that’s what we want to give marketers the permission to do. I really believe AI marketing is not about a race to the bottom. It’s not about a race to the most boring, saddest creative just because AI doesn’t know how to swing for the fences. With the right testing and the right data, we should be able to take a really bold strategy to finance and to the board and say, “Look, we tested this live in the market. We can swing for the fences here. This is something that lights this segment up. This is really important to them. And this sort of edgy creative that we’re using with this edgy celebrity is really who they want to hear from right now.” So we really want to support marketers with not just the right answer, but the evidence that they need, and to get it in a timely way too, so that they can take everybody with them.
Jodi (24:31)
Absolutely. Thank you so much for that summary. You kind of took the words out of my mouth. I was going to ask about the distinction between using AI to interpret real data versus maximizing data with synthetics. So yeah, it definitely seems like it’s all about risk minimization in a way, and almost like the more data you have, the more you can rely on its interpretation and what AI interprets from it. Yeah, great summary. I have so many more questions, but yes, I think you’re right. I think it’s about time to wrap up. I’m sure that the audience has loads of questions as well. Maybe we can get you on a follow-up episode to get more into those and talk about it more. But for now…
Kate O’Keeffe (25:07)
I would love that.
Jodi (25:21)
So much learning, so much to take away. Thank you so much,
Kate O’Keeffe (25:22)
Yeah. Thank you, Jodi.
Jodi (25:25)
Kate. It’s been an absolute pleasure to have you on the show.
Kate O’Keeffe (25:29)
Thank you.