126 – Designing a Product for Making Better Data Products with Anthony Deighton

Experiencing Data with Brian O'Neill (Designing for Analytics)

Today I’m joined by Anthony Deighton, General Manager of Data Products at Tamr. Throughout our conversation, Anthony unpacks his definition of a data product and we discuss whether or not he feels that Tamr itself is actually a data product. Anthony shares his views on why it’s so critical to focus on solving for customer needs and not simply the newest and shiniest technology. We also discuss the challenges that come with building a product that’s designed to facilitate the creation of better internal data products, as well as where we are in this new wave of data product management, and the evolution of the role.

 

Highlights/ Skip to:

  • I introduce Anthony, General Manager of Data Products at Tamr, and the topics we’ll be discussing today (00:37)
  • Anthony shares his observations on how BI analytics are an inch deep and a mile wide due to the data that’s being input (02:31)
  • Tamr’s focus on data products and how that reflects in Anthony’s recent job change from Chief Product Officer to General Manager of Data Products (04:35)
  • Anthony’s definition of a data product (07:42)
  • Anthony and I explore whether he feels that decision support is necessary for a data product (13:48)
  • Whether or not Anthony feels that Tamr qualifies as a data product (17:08)
  • Anthony speaks to the importance of focusing on outcomes and benefits as opposed to endlessly knitting together features and products (19:42)
  • The challenges Anthony sees with metrics like Propensity to Churn (21:56)
  • How Anthony thinks about design in a product like Tamr (30:43)
  • Anthony shares how data science at Tamr is a tool in his toolkit and not viewed as a “fourth” leg of the product triad/stool (36:01)
  • Anthony’s views on where we are in the evolution of the DPM role (41:25)
  • What Anthony would do differently if he could start over at Tamr knowing what he knows now (43:43)

Quotes from Today’s Episode

  • “We are at the beginning of a really big wave in data and data management and it’s an exciting time to be involved in the space.” Anthony Deighton (01:56)
  • “If I simply deliver the data to you and say, ‘Well, knock yourself out,’ that’s not going to solve your decisioning problem; you need a mechanism of analyzing that data.” Anthony Deighton (14:50)
  • “Data products have value. And that value is you should pay for them. And we could argue—or I think you might agree—that they are actually more valuable than the platform in the sense that they’re more closely connected to the business value, the decision value that you’re going to get out of that data.” Anthony Deighton (18:19)
  • “Data management professionals spend a lot of time knitting together tools and never actually delivering value to the user who needs to use that data to make decisions. So, a huge amount of work—to your point—not a lot of value.” Anthony Deighton (20:45)
  • “Yes, we have product managers, yes, we use a lot of design resources as part of this process, and ultimately, the source of the inspiration needs to be the customer and looking at how they’re using these data products. This raises one of the really important product questions and challenges, which is, it’s really important to listen to the challenges [customers] have, but [remember that] customers are generally not good designers.” Anthony Deighton (31:34)
  • “The world is full of software companies that get excited about their own technology and attempt to sell that technology. [However, it is] not a great way to build and scale a business because the vast majority of buyers could care less about the quality of your machine learning or AI; what they want is the outcome.” Anthony Deighton (37:25)
  • “In the technology and software space, it’s so easy to anchor on the technology, the code and the whatever—it’s always more successful when you anchor on the customer and the pain associated with that customer—you know, where customers complain and get frustrated—and teasing out the root why of that pain.” Anthony Deighton (44:52)


Transcript

Brian: Welcome back to Experiencing Data. This is Brian T. O’Neill. Today I have Anthony Deighton on the line from Tamr, just sort of down the street from my house. We’re both in the People’s Republic of Cambridge. I don’t know if you live in Cambridge, but I know Tamr is right down the street, basically in Harvard Square.

 

Anthony: Right in the heart, heart of Harvard Square on Church Street. So, super great place to be.

 

Brian: Yeah. Yeah, that’s great. So, we met at the CDOIQ Symposium that just happened here in Cambridge, I don’t know, June or July, or something li—I think it was in July, and you had given a presentation on the data product topic, which was kind of the talks that I was chasing in the time I had to be there, which was not a ton. But I enjoyed that. And one of my hobbies is collecting definitions of data products for fun, and so you had a slide up about that, and I posted that on LinkedIn and it generated some conversations.

 

So, I kind of wanted to dig into your perspective on this. And also your work on Tamr, the product itself. So, thinking about how you blend—especially now that you’re using machine learning, as I understand it, in Tamr—how you’re blending the traditional trio of, you know, engineering, product, and design, and maybe now with data science as well. So, that’s kind of broadly the topics today. So, welcome, and yeah, I’m looking forward to jumping in.

 

Anthony: Yeah. Oh, those are all totally fascinating topics. And I think, you know, we are at the beginning of a really big wave in data and data management and it’s an exciting time to be involved in the space. And so—and very much to your point and introduction, it’s the convergence of a lot of really important themes around AI and machine learning, around how people use and consume data, how businesses manage and generate data, and those things coming together. Yeah, it’s a great time. And the conference is a great conference. It’s a really good collection of people that are thinking deeply about the future of data. So, great conference to be at.

 

Brian: Speaking of deep, actually, I want to ask you about shallow. So, at the end of our meeting, as I recall, you said something like, you’re going an inch deep and a mile wide after BI analytics self-service. And we didn’t have time to get into that, but I just wrote that down. There was something there that I wanted to get into. And I just—do you remember saying that? If you don’t, we could just totally move on, but you were referring to something and I wanted to unpack that a little bit.

 

Anthony: Sure. So, I suspect what I was referring to, in the data and analytics space, the tooling that we’ve delivered for analyst business users to consume and analyze data, like, it’s really never been better. Like, it’s really fantastic. So, if you want to create bar charts and line charts, it is like, trivially easy to do that. In fact, I make light of it, but like, you can actually create forecasts and you can do machine learning models and you can do, you know, sophisticated scatter plots with millions of points where it does aggregate. Like, the level of tooling we’ve delivered to users for that side is tremendous.

 

But the quality of the data that goes into those analyses, that’s the shallow problem. That’s the part where most organizations have not spent a lot of time and energy on it. And so, it’s like we’ve over-invested in that sort of front-end experience. And I’m as much to blame as it relates to that as anyone, having worked for many years at Qlik, in terms of trying to transform the front-end market so that it’s something that anyone and everyone can do sophisticated analysis. And we’ve avoided the elephant in the room, which is the quality of the data is junk and most people don’t believe this enterprise data.

 

And we could talk about this, but I think this, it’s not because people were self-delusional that they avoided this elephant in the room; it’s that it wasn’t technically possible to solve that problem and that prior approaches, which are largely rule-based manual approaches had failed and we needed a new approach. And we can talk about, you know, what that means.

 

Brian: And I just—it’s worth mentioning, at the time when we—when I invited you to come on the show, you had a different job title. So, you were Chief Product Officer at Tamr and you’ve moved into a new role as General Manager of Data Products. So, it sounds like you’re really focusing on this data product space now, correct?

 

Anthony: That’s absolutely right. And the innovation that Tamr has been pushing, it sort of reflects a little bit in the job transition, as you mentioned. So like, the history of Tamr, where it started, is out of academic research at MIT around, how do we build AI to do what’s called entity resolution? So, it’s the term of art is, how do I find common examples of data records across many different sources? And, you know, Tamr started as an academic project called Data Tamr, and you know, over the last bunch of years, has actually generated 16-plus patents on that core AI for doing that—again, term of art—entity resolution.
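
To make the term concrete, here is a minimal sketch of entity resolution: deciding which records from different sources describe the same real-world entity. It is not Tamr's method, which the conversation describes as AI-based. It substitutes simple name normalization and string similarity, and the source names, field names, suffix list, and 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Assumed list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove common company suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity of two names after normalization, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records, threshold=0.85):
    """Greedily group records whose names look alike (illustrative only)."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            # Compare against the first record of each existing cluster.
            if similarity(rec["name"], cluster[0]["name"]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


# Hypothetical records from three hypothetical source systems.
records = [
    {"source": "crm_us", "id": "A-17", "name": "Acme Corp."},
    {"source": "billing_eu", "id": "9921", "name": "ACME Corporation"},
    {"source": "support", "id": "c_55", "name": "Globex LLC"},
]
clusters = resolve(records)
```

Hand-written rules like these are the kind of rule-based, manual approach Anthony says failed at enterprise scale, which is the motivation he gives for a learned approach.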

 

What we’ve been doing at Tamr from a product perspective, and when I was Chief Product Officer—or as I was Chief Product Officer as I started working at Tamr—is taking that core technical innovation and packaging it as a data product. So, you can think of this as the Tamr platform isn’t just doing AI for entity resolution. It’s also doing AI for entity resolution, but layering on with that third-party data enrichment services, data quality services, and a set of user interfaces for people to provide feedback and curation of the data that the product is creating, and then delivering that data to consumption endpoints. And so, that’s a bigger solution that, again, we call a data product. And now we’re shifting the business to really focusing and selling that data product.

 

And the transition I would describe here is very similar to what we saw in the ’90s in application software. In the early ’90s, or late ’80s even, the way you built user applications is you licensed an Oracle Database, a 5GL programming language, a bunch of tooling, and you built an application that solves your problem. And along comes Siebel and Salesforce and PeopleSoft and a bunch of application software companies, and they said, “That’s stupid. Like, we’re all building the same application over and over again. Why don’t we just build a CRM application”—or as Siebel called it in the beginning, an SFA application—“And we’ll just deliver a packaged solution to that problem.”

 

We’re seeing the same transition in the data management space. Today, largely, the data management space is about delivering a bunch of tooling and leaving it to the customer to knit that tooling together to deliver value to data consumers. Our view—Tamr’s view—is that we can deliver packaged applications around common use cases and we can call those data products, and we can apply product thinking to the challenge of data and data consumption. And that’s really the shift.

 

Brian: Just for—to baseline the conversation, I was hoping you could either read or explain your definition of data product. I have a slide up from it. I’m curious if that slide represents your own view. But could you kind of give us your—when you talk about data product, what do you mean when you say that?

 

Anthony: Sure. So, the slide that we used at CDOIQ is still quite valid. A data product is clean, curated, and continuously updated data that’s consumption-ready. And so, there are a couple of important ideas embedded in that, I think, simple idea. Really, let’s start from the end and work our way backwards.

 

The focus of data products is on the consumption of the data and not trying to manage and govern sources. The data management space seems obsessed with this idea that we’re going to sort of go and get—you know, like, index all of the sources of data or govern all of the sources of data or trying to manage all of the sources of data. And sometimes that obsession takes the form of we want to put it all in one place. So, like, we’re going to put it all in—God forbid—Teradata, right? Or sometimes that obsession takes the form of, we want to put rules on top of all of the sources or it takes the form of, we want to create a semantic definition of every single source.

 

And all of these are a road to nowhere because it’s not humanly possible to go across hundreds or even thousands, or hundreds of thousands of sources in the enterprise and try to apply this work to it, and especially if you’re trying to do it manually. In contrast, the idea of the data product is to focus on a simple single table of the best information that you have across all of these sources in a consumption-ready endpoint. And so, it could be all of the information you have about your customers or all of the information you have about your suppliers or all of the information you have about your parts or your products or your patients if you’re in the healthcare space or payers. And if you think about your business from the perspective of what data really matters to make decisions, think about that in the context of how do you consume that information, not what sources hold it. So, that’s this consumption idea.

 

And then cleaned, curated, and continuously updated. So, the idea is, how do we get the best version, the most clean, ready-to-go version of that data? How do we let users apply feedback and curation to that data so that they’re updating values, they’re providing feedback on the data, say, “Well, this value is wrong.” And so, how do we go resolve that? And how do we continuously update that, meaning, apply the concepts of agile product development—software development—to the data?

 

So, whatever version of the data exists today, make sure tomorrow’s version is, you know, incrementally better and continuously update that. So, a data product is a methodology and a way of thinking about your data that focuses on how you consume that data and applies this concept of agile software development to continuously cleaning, updating, and delivering it to those consumption endpoints. That makes sense?

 

Brian: Yes. The first thing that came to my mind when you talked about it—especially—is the word ‘set.’ And I wonder, like, do you see it as, this is data as a product? So, we’re talking about a bundle of information that’s in a great curated form that’s updated that I can then use to do other tasks with it. So, it’s the set? It feels more like data as a product, which isn’t wrong or right. That’s what I’m hearing. Is that correct or is that not how you see it?

 

Anthony: Yes. I like this, your term ‘set.’ I would have probably said, ‘table.’

 

Brian: Okay.

 

Anthony: But the idea is the right one, which is these are the attributes, the columns, these are the records, and these are the values. The best version of those three concepts that we have, to use our language, about a specific entity. To use more business-oriented language, these are the records, attributes, and values associated with a business topic area that matters to you as a business. So, to make that simpler, this is everything we know about our customers. Across all of the sources, these are the attributes we know about our customers, these are the list of customers, these are the records, and these are the correct values for each of those customers. This is the correct phone number for this customer, the correct address for this customer, the correct company name for this customer if it’s a B2B example.
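
The "best version" of records, attributes, and values can be pictured with a small sketch. It assumes the source records have already been matched to one customer, and it applies one possible rule for choosing each value: take the newest non-empty one. That rule, the field names, and the sample values are assumptions for illustration. The conversation does not say how Tamr picks the correct value.

```python
from datetime import date


def best_record(matched, attributes):
    """For each attribute, keep the newest non-empty value among matched records."""
    newest_first = sorted(matched, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for attr in attributes:
        # First record (newest first) that has a non-empty value wins.
        golden[attr] = next((r[attr] for r in newest_first if r.get(attr)), None)
    return golden


# Two hypothetical source rows that describe the same customer.
matched = [
    {"company": "Acme Corp", "phone": "", "address": "1 Main St",
     "updated": date(2021, 3, 1)},
    {"company": "Acme Corporation", "phone": "617-555-0100", "address": "",
     "updated": date(2023, 6, 9)},
]
customer = best_record(matched, ["company", "phone", "address"])
```

The result is one row per customer with one value per attribute, which is the "table" Anthony describes. Re-running the function as sources change is a rough stand-in for the "continuously updated" part of his definition.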

 

And that set—to use your language—or this table—to use my language—is the output. That is the result of product thinking associated with generating that set. And so, the mechanism here is applying what we would think of as very, very logical software development techniques, but applied to data. And so, this means, you know, continuously updating and not trying to do it in one big go and then never touching it again. Getting feedback from users.

 

So, you know, you don’t build a software product, and then not ask your users whether they like it. Users provide lots of feedback, you update it. So, apply product thinking to that. And then to relate it to Tamr for a second, if you have this concept of a data product and you want to build it and deliver it to your data consumers, what system do you have for managing that process? So, if we’re building software, we have a system. We call it, you know, Jira and Confluence and Atlassian. There’s a whole set of software. And by the way, it’s not the only one. There are other examples.

 

But the point is, there’s a system for managing that process. You also have a compiler and an IDE and you have a whole set of tooling for building software. What’s the equivalent for data? And in many cases, that doesn’t exist. And so really, the idea behind Tamr at its core is how do we provide a framework for delivering these consumption-ready datasets, continuously cleansed and ready to go? And that’s really what we’re focused on.

 

Brian: There was actually, we had a thread about Tamr this morning in the Data Product Leadership Community, which just launched last week, and I was asking them, I was saying, “Hey, I’m having Anthony come on the show. And is Tamr a data product?” This is just something I threw out itself. Like, they’re purporting to provide data products as the result of what they do, but is Tamr itself one? And then this spawned a whole discussion.

 

Because we’re talking a lot about does the DPLC even have its own shared definition of what a data product is? And most of the people that are in there have some connection to my framing and definition of what that is. But like, one of the open debates is whether or not analytics or decision support of some kind is a fundamental thing that must be present in order for something to be a data product. So, I would say in your case, it sounds like analytics or decision support is not necessarily something that needs to be there for it to be a data product, in your definition. Is that accurate?

 

Anthony: Sort of. I don’t think that Tamr needs to provide the analytics front end, but I do think the analytics front end needs to be available. And what I mean by that is, if I simply deliver the data to you and say, “Well, knock yourself out,” that’s not going to solve your decisioning problem; you need a mechanism of analyzing that data. The good news is that software is readily available. There’s a wide range of different choices in an organ—and many organizations make multiple choices.

 

They use Excel, they use Google Sheets, they use Qlik, they use Power BI, they use ThoughtSpot, they use—they may have a machine learning platform they’re using. You know, and there’s a myriad—and in many cases, they’re using multiple of these. And that’s great. And the world doesn’t need more of those. There are plenty of really capable sets of tooling out there for that.

 

Brian: Sure. No debate on that. It’s more whether or not when you talk about data products and your worldview, are we assuming that inherently there must be a decision support capability built into it. And it sounds like it’s not. Like, that’s a downstream thing that someone may enable, but it’s not a requirement to kind of fit—some of this doesn’t matter that much. It’s more of a curiosity for me because there’s a lot of different definitions out there and I find when different people talk about it, they’re picturing different things in their head, different facets.

 

You have the data mesh world, which has very specific test criteria, mostly about the asset, to me, about the thing itself, the output. You talked about applying product thinking, which is very much in line with what I call—you know, my definition, I refer to it as, “The product-y one,” because it’s actually more about the verb and not the noun. It’s more about the act of applying design and product thinking to something to make it valuable enough someone would pay for it, which most of the definitions don’t talk about. And I’m like, it’s not really a product if someone won’t exchange something of value. “I’ll switch from my old way of doing it because it’s so good for me,” or, “I’ll pay for it.”

 

And that’s just my—you know, we’re here to talk about your version. So, that’s kind of where we were digging in, and we had a nice, long thread [laugh] about this. And, you know, so I was curious about whether you thought Tamr itself is a data product. But it sounds like it probably wouldn’t be is my guess. I don’t know. Unpack that for a minute.

 

Anthony: Yeah, yeah. So, a couple of important ideas. First of all, is Tamr a data product? So, Tamr is a… company and it produces a SaaS application that itself is a platform for delivering data products, but we sell the data product; we don’t sell the platform. And just to draw an analogy, Salesforce is a company that delivers a product called… Salesforce—

 

Brian: A whole bunch of them. Yeah [laugh].

 

Anthony: Let’s just start with a simple one, which is Salesforce—

 

Brian: 75 products [crosstalk 00:17:35] [laugh].

 

Anthony: Right, exactly. In the olden days, like, you know, back when it started. And it’s built on a platform that Salesforce built, called—and they eventually started selling the platform called S-Force or Force or whatever it was. But the important point is that they didn’t sell the platform; they sold the application. But they also needed to build a platform for building and delivering that application.

 

And that analogy works perfectly for Tamr. We’re a software company. We’ve obviously built a SaaS platform. We use that SaaS platform to build data products. We sell the data products; we don’t sell the platform. Whether we do that eventually, that’s a whole different question, but in the near term, that’s what we’re doing.

 

By the way, the second point you make there, which I think is arguably even more important, is the idea that data products have value. And that value is you should pay for them. And we could argue—or I think you might agree—that they are actually more valuable than the platform in the sense that they’re more closely connected to the business value, the decision value that you’re going to get out of that data. And again, I go back to this thing I started with, which is, the data management space is unique in the enterprise software space in that this community seems to think that what customers want is many, many, many different point tools that they need to knit together to produce a solution. And each of those tools themselves is not that valuable. It’s, you know, it’s a very pointed piece of pointed capability that they’re not willing to pay a lot for.

 

And the real cost comes from trying to knit them together into something that works. And that, to me, is a massive opportunity to do that knitting on behalf of people who care really about the data and making better decisions and delivering that as a product. And by the way, again, to go back to my 1980s, 1990s analogy, we’ve seen this movie before, right? This is not the first time that we’ve seen a software space evolve from a set of disparate tools, which we expected customers to knit together into a packaged application that solves a problem.

 

Brian: I totally hear you on the tool stuff, and frankly, I think it keeps a lot of people employed [laugh] having all these tools to knit together, and, “Look at all the work we’re doing over here. Look at all the stuff we’ve done.” And it’s like, but all of that [insofar 00:19:56] as co—like, what I see is, like, yes, you’ve done a lot of work, but that’s just cost so far. You haven’t—there’s no benefit. And so, I think the product orientation helps us think about, we have to deliver benefits and outcomes. And the output is tied to that, but the output is only good if there’s a benefit that gets derived from it.

 

Anthony: And the surest fire way of figuring out if there’s a benefit is are you willing to pay for it?

 

Brian: Exactly.

 

Anthony: And that’s where that creates that linkage back to where you started. And this idea of confusing working hard with delivering value—

 

Brian: Right.

 

Anthony: —is something that we see all over the place. I used to make this joke that business travel is extre—you know, it’s a lot of work: getting to the airport, figuring out how to get through security, get on the plane. Like, there’s a lot of work. In many cases, not that valuable. And so, people who confuse travel—business travel—with work often fall into a very similar trap as we see with data management professionals that spend a lot of time knitting together tools and never actually delivering value to the user who needs to use that data to make decisions. So, a huge amount of work—to your point—not a lot of value.

 

Brian: I’ll also qualify, to me it’s not just about paying for it. I think paying for it can also be exchanging value where the exchange may be, “I’m willing to give up the old way I do stuff”—

 

Anthony: Sure.

 

Brian: —because change management is such a challenge for analytics and data science teams when they try to get the business to trust a model or trust a new dashboard or to use it, they may not write a check to you, but if you can get that marketing person to stop doing it the old way and to start doing it the new way, that is an exchange of value. They gave something up to use your version because the benefit was higher. That, or swiping [laugh] the credit card, as it may be, if you’re a commercial software company, or whatever, like in your case; either one of those, to me, kind of fits the spirit of this product-y approach to it. So.

 

Anthony: Yeah, couldn’t agree more.

 

[midroll 00:21:56]

 

Brian: Let’s talk about this chain of—there’s such a chain of users and makers involved with this stuff, right? Like in your case, it’s like, I don’t know, maybe there’s a data engineering team gets its hands on the Tamr stuff first and then it gets packaged up into something that data scientists could use to build a model which goes to the application engineers, which goes to the front-end designer, which finally goes into the hands of some user. So, there’s this whole chain of people there. And I’m curious, kind of like, how does that influence how you decide to package up your, we’ll call them tables in your case, but like, let’s take customer data, for example, Propensity to Churn. Well, that was an analytic that you could, I assume Tamr could probably come up with some way to do that with all the data you have and put a column called ‘Propensity to Churn’ in the master data set.

 

But that’s actually an analytic about the customer data. So, when you ship and sell your customer data table to the data engineering team and then it goes down the pipeline or whoever, how did you decide whether or not to include a calculated value such as that, as opposed to last name—which is not a calculated value—you put Propensity to Churn in there? And I’m just making that one up, but you get the point. How did you know whose need will I satisfy: that downstream business user, the designer, the front-end developer, what the data scientists need, what the analysts need, what the engineer needed? No, no, no disambiguate. Keep everything raw because we might want to do something on our own and we don’t want your version of Propensity to Churn. We want to build our own, so just give us all the raw customer data. I’m sure you’ve had this discussion before. How did you decide, like, what goes in and what doesn’t?

 

Anthony: That’s a great question. And in a way, I have a very specific and simple solution, or answer to the question, which is Tamr wouldn’t produce a calculated value like Propensity to Churn. There’s a challenge associated with calculating that metric, building a model, figuring out, back-testing the model, all kinds of really interesting problems that, you know, again, there’s great tooling for solving. The biggest problem in calculating that metric is getting high-quality, clean data into that model so that it produces high-quality output. Or the inverse is often said, “Garbage in, garbage out,” meaning the problem in many organizations isn’t calculating a Propensity to Churn metric. They have great people to figure that out or you can even buy software for doing it or whatever.

 

The problem is, their data is organized by business unit or organized by geography. It’s chunky, like, it’s missing values, so you go to calculate the Propensity to Churn metric and you’re missing an important dimension of that customer. Or you have five copies of the customer, and so you’ve calculated incorrectly because you’re only looking at, you know, one-fifth of the customer information. Or you’re missing an entire—you’re looking at Propensity to Churn, but you’re only looking at the US data, you’re not gathering data from Europe because well, that’s over in a different database, right? So, the problem is not the calculation of the metric; the problem is that the quality of this data coming in is poor.
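
A small sketch shows why duplicate customer records distort any downstream metric. The order counts, source IDs, and the mapping of source IDs to customers are all made up for illustration. The point is only that an input computed per source record sees a fraction of each customer's activity.

```python
from collections import defaultdict

# Hypothetical (source record ID, number of orders) pairs from separate systems.
orders = [
    ("crm_us:17", 2),
    ("crm_eu:88", 1),
    ("billing:9921", 3),
    ("crm_us:40", 1),
]

# Hypothetical result of matching source records to real customers.
same_customer = {
    "crm_us:17": "acme",
    "crm_eu:88": "acme",
    "billing:9921": "acme",
    "crm_us:40": "globex",
}


def totals(rows, key=lambda source_id: source_id):
    """Sum order counts per key; by default each source record is its own key."""
    out = defaultdict(int)
    for source_id, count in rows:
        out[key(source_id)] += count
    return dict(out)


fragmented = totals(orders)                      # one "customer" per source record
unified = totals(orders, key=same_customer.get)  # one customer per real entity
```

In the fragmented view no record shows more than three orders, while the unified view shows one customer with six. A churn model fed the fragmented numbers would be scoring three low-activity customers that do not exist.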

 

And I think this challenge is getting even more acute as more organizations use data for making better decisions. And, you know, if you think about something like what we see with large language models, the biggest challenge in generating these models is getting high quality, clean data into the—to train it on. So no, we’re not focused on building a Propensity to Churn metric as part of the data product; we’re solving what we think is the root challenge there, which is, in the example, clean, curated, continuously updated customer information, so that when you calculate that Propensity to Churn metric, it’s actually correct and that you’re actually getting an accurate measure of that customer’s Propensity to Churn in your example and then making the best decision coming off of that. Oh, this customer is likely to churn; I’m going to go take this next best action. So again, we’re very clear on this point; we’re not calculating that metric. There’s great tooling for doing that. We’re making sure that that tooling has access to the best information so that when you calculate that metric, it’s actually correct.

 

Brian: So, I guess the spirit of my question wasn’t so much about that particular calculation; it was more, I guess, when we talk about customer data, I can see a business user having a logical view of what customer data [unintelligible 00:26:34] that does not map to database tables, literally, right?

 

Anthony: Exactly.

 

Brian: How many products have they ever bought, lifetime? That’s a calculated value, probably, but for them, it’s just, like… it’s like last name, first name, city, number of products bought, lifetime value; they seem like logical customer attributes. And so, how do you decide whether or not your—is your, kind of, line, like, if it involves joining a table, we don’t do it because we only talk to your—at the lowest level, we talk to the tables and we clean the data and then provide a clean version of that and we don’t ever join or do anything? Because there’s different needs along that chain of human beings involved. And that was kind of my curiosity, as a product person, how do you—

 

Anthony: Right.

 

Brian: Whose need do you want to satisfy? Is it the first line, which is the I don’t know, the DBA or the analyst? I don’t know all the technical side so much. But there’s—the first person that interfaces with Tamr is probably a fairly technical person in the plumbing department at the enterprise. Do you keep it at that level? Is that kind of your—

 

Anthony: Yeah. Maybe a way to think about this is, are you looking at an aggregate or a calculation, or are you looking at the underlying data that would power that calculation? So, our view is, where Tamr ends is the view of the data that can then power some calculation or aggregate thereafter. I want to be a little careful and say, having multiple input tables is a key piece of that requirement. I mentioned this at the CDOIQ Summit, this idea of [Dayton’s Law 00:28:09], this idea that the way data is organized reflects the organization structure of the organization that created it.

 

So, if you’ve organized your business by geography, then you typically have data in tables that are organized by geography. Or if you organized your business by product line, you typically have your data in tables that are organized by product line. And often it’s some combination of multiple dimensions. The important idea here is bringing data together from multiple sources is a key requirement in producing this clean, curated, continuously updated view of that data that can then power these aggregates or calculations, like, you know, how many products do I buy? And also, probably worth noting that there are two concepts of a join. One is joining source tables, okay, and the second is joining across entities or [unintelligible 00:29:04] or looking across key business topic areas that matter.

 

And you mentioned one, which is, what products has this customer bought? Or an aggregate would be, how many did they buy, but first-order questions, ha—what are they, those are really important to us. So, we want to think about creating key relationships between these entities, and that’s a key requirement of data products. We have this concept in the product of a Tamr ID, which is to say, an ID field that creates that linkage between those entities. Many of our customers refer to their customers by their Tamr ID.

 

That becomes a kind of language internally inside the organization, to say, “This is customer—this is the ID for that customer.” That ID was generated by Tamr. And importantly, it’s different than—or it’s not the same as the key relationships in the source tables. There’s a relationship there; we can say this Tamr ID is composed of these source table IDs, but it itself is independent. And that’s extremely valuable.

 

And then creating relationships across those entities, between product and customer, between product and part, or between parts and sub-assembly, or between a manufacturing plant and [laugh] product. Like, you can think of many of these relationships. And you could even represent them as a graph and then show the data structure of the organization. Those are really important ideas, but they’re not done at the source table level. And that’s really one of the key differences of the way we think about it, what makes it a data product and not trying to manage source data, which is a road to nowhere.
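To make the Tamr ID idea above concrete, here is a minimal Python sketch. It is an illustration only and not Tamr's implementation: the class names (`MasteredEntity`, `EntityRegistry`), the field `mastered_id`, and the sample table names and keys are all hypothetical. It also assumes the hard part, deciding which source records describe the same real-world entity, has already been done. What it shows is the shape of the idea: an ID that is generated independently of any source key, that records which source keys it is composed of, and that can carry relationships across entities as edges in a graph.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class MasteredEntity:
    """One mastered record (for example a customer) with its own independent ID."""
    entity_type: str
    # Hypothetical stand-in for a "Tamr ID": generated here, never copied from a source key.
    mastered_id: str = field(default_factory=lambda: uuid4().hex)
    # The (source_table, source_key) pairs this entity is composed of.
    source_keys: set = field(default_factory=set)


class EntityRegistry:
    """Keeps mastered entities, their source-key lineage, and cross-entity relationships."""

    def __init__(self):
        self.entities = {}          # mastered_id -> MasteredEntity
        self.by_source_key = {}     # (table, key) -> mastered_id
        self.relationships = set()  # (from_id, relation, to_id) graph edges

    def register(self, entity_type, source_keys):
        # Assumes the caller has already decided these source records are the same entity.
        entity = MasteredEntity(entity_type, source_keys=set(source_keys))
        self.entities[entity.mastered_id] = entity
        for source_key in entity.source_keys:
            self.by_source_key[source_key] = entity.mastered_id
        return entity.mastered_id

    def lookup(self, table, key):
        # Resolve a source-table key to the mastered ID, or None if it is unknown.
        return self.by_source_key.get((table, key))

    def relate(self, from_id, relation, to_id):
        self.relationships.add((from_id, relation, to_id))

    def related(self, from_id, relation):
        return sorted(t for f, r, t in self.relationships if f == from_id and r == relation)


registry = EntityRegistry()
# Two source tables organized by region hold the same customer under different keys.
customer = registry.register("customer", [("crm_emea", "C-17"), ("billing_us", "9921")])
product = registry.register("product", [("catalog", "SKU-5")])
registry.relate(customer, "bought", product)
```

With this in place, "what products has this customer bought?" is a lookup on mastered IDs (`registry.related(customer, "bought")`), and an aggregate such as "how many did they buy?" can be computed downstream from that clean view rather than from the source tables.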

 

Brian: How do you think about the design of a product like Tamr? And where does the experience of Tamr end? Because you’re not an endpoint, right? You’re actually kind of the beginning of something, but you have to end your product at some point.

 

Anthony: Sure.

 

Brian: So, there’s an experience that happens. It’s kind of a bridge to another experience. I’m just curious, how did you go—do you have designers on staff? Do your product managers do this work? How do they—especially for technical tooling, I’m just kind of curious—who does the work? How do they do it? How do you see that work happening?

 

Anthony: Yeah, so that’s a great question. And it’s, in a way, an even more challenging question because we’re building and designing something new. So, there’s no sort of prototype to build from that makes sense. So, to answer your question, yes, we have product managers, yes, we use a lot of design resources as part of this process, and ultimately, the source of the inspiration needs to be the customer and looking at how they’re using these data products. But this raises, I think, one of the really important product questions and challenges, which is, it’s really important, obviously, to talk to customers, to listen to the challenges they have, but customers are generally not good designers.

 

Brian: [laugh].

 

Anthony: So, if you ask them to solve the problem, they will produce a solution in the context of what they know and understand, typically of how things have worked in the past.

 

Brian: Right.

 

Anthony: The really hard work of software engineering in general—and product management and design—is to understand the underlying or latent need of the customer and then design a solution that’s better than they imagined. Again, I’m sure you’ve heard people have always—I don’t actually—I’ve heard that Henry Ford didn’t even say this, but anyway, this adage that is attributed to Henry Ford which is probably not actually something he said, which is that, “If I ask my customers what they wanted, they’d say faster horse.” The core truth there, whether he said it or not, is an important one, which is, again, customers are actually typically not good product designers. They typically think of solutions in the context of the current solution. And great innovation comes when you solve a problem in an unexpectedly efficient or effective way.

 

And that’s hard. And that requires testing and prototyping and listening deeply, asking Five Whys, there’s a lot of techniques for how to get there. Shameless plug for a second. One of the best people I’ve ever worked with in regard to this is a gentleman named Donald Farmer that I worked with at Qlik, and he just wrote a book called—on Innovating. You can—if you Google or go to Amazon and look up ‘Donald Farmer, Innovation,’ he has a great book that gets into very practical techniques for doing this work. I highly recommend it.

 

Brian: Yeah. You’re speaking my language here. So, I’m glad because I get tired of saying it; it’s nice when someone else says some of those [laugh]—some of these things. One of the books I read recently that I really like—and I do some advising for MIT Sandbox venture fund and I often recommend this to the young startup founders there, The Mom Test.

 

And one of the foundational concepts in The Mom Test is this idea that the customer owns the problem; you own the solution. But you’re not there to talk about the solution. And that’s kind of the boundary. You also don’t get to tell them what problem they should care about. They have their problem; it’s urgent, and it’s maybe worth paying for and you need to really lock into that and put aside—I think data people maybe struggle with this sometimes, because they can see how, “This could be so powerful for you.”

 

But, “Yeah, I don’t care. I just want X.” “But there’s a better way to do it.” But that’s a problem that they don’t feel right now. And so, a product person has to be aware of the feelings here and the desirability aspect and all of that. And so, I really liked this idea of the dividing line of: they own the problem, you own the solution.

 

Customers are not designers. I would say, I think co-design can be valuable as long as, if you’re doing co-design with a customer, you look at their solutions that they give you as really a way of figuring out what the problem is. And you can design to figure out what the problem is. And sometimes we do prototyping and we do do a little bit of work just to keep the conversation going to say, “No, no, no, no. That—I don’t care about that.” “Oh, okay, good. So, we just learned something here.”

 

We’re designing to learn, but not, like—we’re never going to throw that away. We’re just going to keep going with whatever direction we set. So, there’s different kinds of design there. And anyhow, so I like that you—how you framed this. And it very much, you know, clicks with how I think modern product is done.

 

Anthony: No, I think that’s a really important idea. And I think this is often done very badly, unfortunately.

 

Brian: [laugh].

 

Anthony: So, I think you ended up seeing a lot of product companies that sort of, you know, I don’t want to say give up, but they sort of like, “Well, let me just let the customer build the solution and then”—and in fairness, like, as a consultancy, like, if your job is to deliver, you know, if you’re a contract development organization, it’s not a—you know, I mean, there’s nothing wrong with that, but it’s not innovation. And so, that’s really about understanding in a deep way, the pain or latent requirements of the customer. Yeah.

 

Brian: Talk to me a little bit about—so you mentioned a little bit about your design and product development process there. Is there anything different about this when you add machine learning or a data science component to this? Like, the traditional—in modern software, we think about the trilogy of product, user experience, and engineering or some tech lead coming together. And it’s like, they all pull on each other a little bit, but somewhere in there is how we arrive at good products because each corner of the triangle is looking at different components of the product. Is data science just an extension of engineering and the technical part or is it, like, a fourth leg of the stool that used to be a three-legged stool; now it’s four legs? Talk to me at all—I don’t know. How do you do—is data science, like, a significant new leg to the stool at Tamr or in your perspective with data products?

 

Anthony: Yep. Maybe controversially, I don’t think of it as a new leg to the stool. I think it’s just another tool in the toolkit—

 

Brian: Okay.

 

Anthony: And arguably the newest tool, arguably the shiniest, the coolest, the whatever—like, there’s some, you know—but it is not different in that context. If you think about early database design, you know, nobody bought a database because it had a planner [laugh], a particularly efficient query planner. The value they would get, the queries ran faster, [clicks tongue] yes, got it. Like, I’m going to buy. And maybe, I’ll say this in an even snarkier way.

 

The world is full of software companies that get excited about their own technology and attempt to sell that technology. And there are even buyers, like, especially early adopters—and I would count myself among them [laugh]—that love technology for technology’s sake. It is not a great way to build and scale a business because most—you know, the vast majority of buyers could care less about the quality of your machine learning or AI; what they want is the outcome. And so, focusing energy on describing the outcome is a much better use of energy and time than attempting to get into the minutiae and explanation as to what the underlying tech is. And machine learning and AI is no different in that regard. So, that’s why it’s not another leg.

 

Brian: But they also don’t talk about the engineering, or, “Wow, the product management that went into Tamr is so amazing.” Nobody says that, right? So, I guess my question was specifically more about how you make the sausage. So, I was thinking, you know, inside the company, the way you go about designing Tamr as an enterprise software product, that’s more what my question was, which was whether or not the data science component now—you know, because I know that you guys are leveraging some of that with your entity resolution—is that a significant new skillset that needs to be part of this inner ring of product stakehold—this product team or not? Or is it really kind of a sub-specialty of the engineering area? Just curious.

 

Anthony: Yeah, I think again, I think it’s a new and shiny tool in the toolkit and arguably requires different skills and different engineering talent and these sorts of things, but do customers buy because we have world-class product management and design and machine learning and AI? No. They buy because the product solves the problem they have. Now look, I will also say great software companies are very customer responsive and they’re good at engaging the customer, they take feedback well, and they integrate it quickly into the project, the pace with which we innovate, these are important metrics and measures of whether these processes are efficient and effective. You can very easily go down a wrong road and build bad innovations.

 

And if you ignore your customers, you’re probably not going to understand their problems. Like, there’s lots of ways to screw this up. And so, in that sense, yes, customers do buy because we do these things… well, but they’re not coming in and evaluating the skill sets inside the organization and then buying the wrong product, just because they like, you know, the product management [laugh] organization. They’re buying because they liked the outcome.

 

Brian: Correct. And so, maybe I’m not articulating my question enough. I actually wanted to peel back the onion, not from a customer standpoint about benefits, but just rather, how do you make product at Tamr, your process and method. And that’s a big giant conversation that’s probably too big to get into right now, but it was more whether or not the discipline of data science in a tool that’s leveraging that in a heavy way, needs to be part of that core. And I’m making an assumption that you—product, engineering, and user experience design are kind of your core trilogy in how you work. I’m just making that assumption—

 

Anthony: Sure.

 

Brian: —that that’s how you work. And is there a fourth leg of that stool, forgetting what customers think about it, really just looking inside the sausage casing here [laugh]?

 

Anthony: No, I don’t think that some—again, I don’t think there’s some new set of skills there. The work we do around, you know, machine learning and AI is fairly classic software engineering skills and fits cleanly into this idea of product management, design and design thinking. It’s a strong point of differentiation in terms of the outcome, but in terms of the process, there’s nothing unique there.

 

Brian: Not changing the process a lot is what I’m hearing? Okay.

 

Anthony: Yeah.

 

Brian: Cool. Understood. I know we’re getting close to time here, but I’m curious about your conversations with data product managers in the customer space. Now, thinking back into your phone calls and your work on the road or whatever, are you interfacing with DPMs a lot? Are you seeing more of them? What are their challenges like? You know, that sort of thing.

 

Part of the reason I formed the DPLC, the community, is because it’s a place for these people to get together and realize, hey, there are other people like me out there. We’re not this rare breed that there—it is fairly rare, but I’m just kind of curious what you’re hearing in this space. Do you kind of feel like you’re in the company of product people when you meet DPMs or do you feel like no, I’m in the company of data people who are trying to apply product stuff to their [laugh] data science and analytics work? Kind of just curious about your thoughts about the DPMs that you might meet out in the wild in the customer space.

 

Anthony: Yeah. I think it’s closer to data people trying to apply product thinking today. But that’s not a criticism. I think that’s a—we’re on a journey.

 

Brian: Evolution. Yep.

 

Anthony: An evolution. And we are—you know, if this is a hundred-year journey, we’re on day one, meaning it is early, early days. So, it is not a surprise to me that the community is not huge. It is not a surprise to me that it is not well-optimized and, you know, not all those kinks are worked out. It is not a surprise to me that the—even that the tooling isn’t well figured out.

 

This is very consistent with what you see early in the evolution of a new category of work and of—and eventually, hopefully, software and how organizations are going to change the way they work with data. And I mean, personally, I think this is a good thing. It’s exciting. It’s a good time to be building a community as you are and engaging that community, and in a way, self-organizing a solution to these problems and thinking about how to work differently. So, this is all, it’s all a good thing.

 

But we should not be delusional to believe that it’s all figured out. And you know, if you join the community, like, there’s a playbook and we’re just going to hand you the playbook; you read the playbook and you’re off to the—no. But that’s, I think, what makes it fun because it’s like, yes, we get to go build this together.

 

Brian: Yeah, yeah. No, you’re right on there. Last question and then I wanted to kind of give you the final word if you have anything that you’d like to say to the audience. But any lessons learned? Like, if you started over as CPO at Tamr today, is there something you would change or is there something you wish, you know, Anthony from five years ago, knows [laugh] now—or knew back then that you know now? Just any—in your role as a product person in the data space?

 

Anthony: That’s a tough question. You know, at some level, product people—and I would include myself in that regard—are relentlessly self-critical. And I always make this joke: I’m a much better investor backwards, in retrospect.

 

Brian: [laugh].

 

Anthony: But I think it’s true of product management and product design as well. Like, solutions to problems are obvious in retrospect. They’re just way harder to look at going forward. And so, it’s really easy to look back on any product experience at any company, or internally at any work you’ve done, and think to yourself, “Well, geez, if I knew what I know today, it would just happen a lot faster, better, cheaper, all the rest of it.”

 

Brian: Sure.

 

Anthony: So, I’m always wary of that mode of thinking. That said, I think the thing I would anchor on—and I think in the technology and software space, it’s so easy to anchor on the technology, the code and the whatever—it’s much better, it’s always more successful when you anchor on the customer and the pain associated with that customer—you know, where customers complain and get frustrated—and teasing out the root why of that pain. This technique of Five Whys, of sort of asking why over and over again, which, you know, three-year-olds are so good at, you know, for some reason, we lose that skill as we get older, I think that’s something we look backwards, more of that and less around the actual tech is always a good anchoring point. So, that’s probably my one, my one piece.

 

Brian: Awesome. Any closing thoughts you want to give? And how can people stay in touch with you, as well, if they want to reach out?

 

Anthony: So, I’m on LinkedIn. Easiest way to keep in touch is via LinkedIn. And obviously, I’m reasonably easy to figure out my [email address at tamr.com 00:45:57].

 

And my parting thought—and maybe as a way to capstone all of our conversation—is, I think it’s an exciting time to be a part of the data products space, whether as a customer and user of data products, as a way of applying that thinking internally for how your organization manages and uses data, or as a software company as Tamr is and thinking about how we build and manage a platform for delivering these data products to users and customers. This is the beginning of what I think is a pretty big shift in the way the data management space works, and I think that’s an extremely exciting thing to be a part of, and I’m glad for you to be building community around it and gathering people to think and talk about it. But, you know. This is a—you know, if I were early in my career in the data space, this is the place to be, and driving and pushing and innovating in this space is going to produce a lot of really valuable outcomes for people and that’s going to be—that’s an exciting thing to be a part of.

 

Brian: Cool. Well, thanks. I appreciate the closing words. Anthony Deighton, newly minted General Manager of Data Products at Tamr. Thanks for coming on the show. It’s been great to have you.

 

Anthony: Likewise. Thanks for having me.
