Today I’m joined by Vishal Singh, Head of Data Products at Starburst and co-author of the newly published e-book, Data Products for Dummies. Throughout our conversation, Vishal explains how the variations in definitions for a data product actually led to the creation of the e-book, and we discuss the differences between our two definitions. Vishal gives a detailed description of how he believes Data Product Managers should be conducting their discovery and gathering feedback from end users, and how his team evaluates whether their data products are truly successful and user-friendly.
Highlights
- I introduce Vishal, the Head of Data Products at Starburst and co-author of the e-book Data Products for Dummies (00:37)
- Vishal describes how his customers at Starburst all had a common problem, but differing definitions of a data product, which led to the creation of his e-book (01:15)
- Vishal shares his one-sentence definition of a data product (02:50)
- How Vishal’s definition of a data product differs from mine, and we both expand on the possibilities between the two (05:33)
- The tactics Vishal uses to gather useful feedback to ensure the data products he develops are valuable for end users (07:48)
- Why Vishal finds it difficult to get one on one feedback from users during the iteration phase of data product development (11:07)
- The danger of sunk cost bias in the iteration phase of data product development (13:10)
- Vishal describes how he views the role of a DPM when it comes to doing effective initial discovery (15:27)
- How Vishal structures his teams and their interactions with each other and their end users (21:34)
- Vishal’s thoughts on how design affects both data scientists and end users (24:16)
- How DPMs at Starburst evaluate if the data product design is user-friendly (28:45)
- Vishal’s views on where Designers are valuable in the data product development process (35:00)
- Vishal and I discuss the importance of ensuring your products truly solve your users’ problems (44:44)
- Where you can learn more about Vishal’s upcoming events and the e-book, Data Products for Dummies (49:48)
Quotes from Today’s Episode
- “The way we see Data as a Product is more of a principle, so applying product principles on creating data sets. A data product is a more tangible, reusable element which can be consumed and searched and discovered by data consumers.” – Vishal Singh (03:57)
- “The way I think is [the purpose of] data analytics, whether it's data products or not, is to create insights. Which means that anytime a data is created, it's supposed to be created to give insights and value for someone else. Nobody creates data or data products for themselves, but [rather] so their consumers can get value out of it, use it, and then they can actually create another set of secondary or tertiary data products on top of that.” – Vishal Singh (08:35)
- “Data Product Managers are responsible for collecting and asking all their stakeholders to understand the requirements [for the potential data product]. And the first piece of understanding is whether we should be going on this path of actually creating a data product.” – Vishal Singh (17:32)
- “[By] understanding how many support tickets are coming in, what kind they are, and what kind of feedback is coming in … we can understand whether this data product is solving the right problem, creating more frustration, or creating more support issues [meaning it] is not valuable for the end consumers.” – Vishal Singh (34:39)
- “In my opinion, the designer is the uber umbrella of actually creating [and maintaining] a standard for every data product being used, like a design pattern for the data product. While the data product itself is going to follow the design pattern, which the designer has actually created. And I don't believe the designer has to be involved on every data product being created in the organization.” – Vishal Singh (39:29)
- “What is the value of the data product? Like, if I spend a dollar to create a data product, did I actually get the dollar back? Did I only get fifty cents back, or did I get ten dollars back? Because that is what the business cares about.” – Vishal Singh (47:30)
- “I'm not going to reuse the same thing again and again if the data product is not providing me value. So reusability is really important.” – Vishal Singh (48:23)
Links
- Starburst: https://www.starburst.io/
- Data Products for Dummies: https://www.starburst.io/info/data-products-for-dummies/
- “How to Measure the Impact of Data Products with Doug Hubbard”: https://designingforanalytics.com/resources/episodes/080-how-to-measure-the-impact-of-data-productsand-anything-else-with-forecasting-and-measurement-expert-doug-hubbard/
- Trino Summit: https://www.starburst.io/info/trinosummit2023/
- Galaxy Platform: https://www.starburst.io/platform/starburst-galaxy/
- Datanova Summit: https://www.starburst.io/datanova/
- LinkedIn: https://www.linkedin.com/in/singhsvishal/
- Twitter: https://twitter.com/vishal_singh
Transcript
Brian: Welcome back to Experiencing Data. This is Brian T. O’Neill. And today I have Vishal Singh on the line from Starburst. How are you doing? What’s going on?
Vishal: Thank you, Brian, for having me on your podcast and your show. I’m really doing great. I’m pretty excited for our conversation during this hour.
Brian: Yeah, yeah. Now, I hope we don’t have to think of our audience as dummies, but you wrote a Dummies book [laugh]. I’m sure most people know what we’re talking about here. There’s the ‘For Dummies,’ which I guess is a Wiley brand and I didn’t realize that until I saw yours. But you have helped contribute to a little ebook called Data Products for Dummies, so that’s why we’re jumping in. Where did the impetus for this book come from? Like, who said, “We need to have an ebook about this?” Like, what’s the origin story?
Vishal: The way I see it, like, the book came out of talking to many customers, chatting with many customers. And data products as a concept has existed, I probably will say, more than a decade, but the concept itself has picked up a lot in the last few years. And if I chat with many folks, especially my customers or my prospects and even folks in the data industry, I come across that the definition of data products is very different. But in the end, if you ask what problem they’re trying to solve, they are trying to solve exactly the same problem of collaboration: how different teams can actually talk to each other, how can I understand the data, where the data is coming from. But then the problem somehow actually mushrooms into different definitions of data products. So, that actually made Starburst encourage myself and Ryo and Andy, who are the other two co-authors on the book, to write a book explaining what our perspective on data products is.
Brian: Yeah, yeah. And I do want to recognize that Ryo Komatsuzaki and Andrew Mott were co-authors on this. You mentioned definitions here, and so as one of my lame hobbies, I collect definitions of data products. I also collect surveys about data product failures and machine learning and analytics failures in the enterprise. Those are, like, my hobbies.
I’ve kind of let that one go because I’m tired of that one, but the data product definition one is one that I always like, so, anyhow, guess what my question is? I’m curious if you have a one-sentence definition of a data product you can give us. Or a very short one. I’m curious to hear how you frame that, just so listeners have an idea of what you mean when you’re talking about it.
Vishal: The way I see it is a data product is a somewhat curated and reusable data set which actually empowers the data consumers to actually create the insights from the data. But not just empowers, but the data consumers can continuously keep trusting the data, while the data is being iterated and re-consumed and re-versioned again and again. So, how can we expose the business contexts with the trust? And those are the two main pillars of data products if I shortened the [version 00:03:35] of the definition.
Brian: Got it. Got it. Is this not what we sometimes hear as ‘Data as a Product’ where we’re talking about almost like a building block, a large Lego piece that gets assembled into a car at some point, but we’re talking about a building block for an end solution? Is that what you’re saying there, or did I not get that right?
Vishal: No, you’re right. So, when I think about Data as a Product, I think Data as a Product is more of a principle, applying product-related principles to the data ecosystem. And you take those principles to create a data product. So, it kind of goes into the… and there have been many [ways 00:04:12]. And people will use those terms quite often to define each other. But the way we see Data as a Product is more of a principle, so applying product principles on creating data sets. A data product is a more tangible, reusable element which can be consumed and searched and discovered by data consumers.
Brian: But is the endpoint, though, still that… let’s assume I can find it, the one that you just made for me, I was able to find it in some catalog. That thing that I just found is still a building block for an end-to-end solution that a consumer or user would use. Is that correct?
Vishal: Absolutely. Absolutely.
Brian: Okay. Okay.
Vishal: In fact, “reusable” is a main component of data products, because applying the full principles to create a data product which is not being used by anybody in the organization but myself defeats the purpose of, you know, searchability, self-service. Because if I’m not going to actually put it in front of other users who are going to get benefit out of it or reuse it as a building block, then… there is a lot of work that goes into creating a data product, that’s [unintelligible 00:05:18] that’s time really [unintelligible 00:05:20] to the customer that you end up using as a source of datasets to create the final data product which can be reused and consumed across the organization as a building and foundational block.
Brian: Understood. I’m super glad we had that. Like, my definition is a little different. I tend to think of, there has to be an end-to-end solution and there has to be an exchange of value. There has to be usage of the thing. That’s totally okay. I’m just, I’m glad that you have a clear definition because it helps us have a conversation here, even if it’s different than mine, which is totally fine. I’m probably in the minority. But.
Vishal: [laugh]. No, I mean, you may be right. The way I see it, like, if you look at a data product, even at the foundation element, to your point, like, where the data is coming from, the source data set, that information is still valuable to a data consumer, but they may not be looking for where the data is coming from, they might be looking for a sales data set or marketing data sets or the AI framework. But then when they look at it, they might understand that many other data sets were used to create the data product. And that is also important to understand when I’m trying to use a data product, but as a self-service component, they probably will never search for the raw data set; they’ll probably search for the curated and more business-context use case of the data product.
Brian: And I think just for listeners, you know, because I hammer on, like… this idea of it has to be end to end and there has to be an exchange of value means someone would be willing to pay to use it or they’d be willing to give something up to use the thing. And in this case, if your customer is, say, a data scientist who’s going to build a model and you’re a data product organization with data science customers, then I think you could argue that there is an exchange of value that potentially happens. That data scientist may say, “Yeah, I would, like, give my right leg to be able to use this and not have to go engineer all this stuff myself.” You could make an argument that in this little ecosystem here, that data product, your model of it actually fits into mine well. I tend to think of it as, like, until there’s an end consumer or a business user that has direct economic impact, the value really hasn’t been created yet.
Because in this case, your data scientist still needs to make something of value for the business or the end-users of that company before economic value really would be created. Otherwise, we’re mostly talking about funny money inside the business. But what I care about though is, in the ebook, you’re really talking about—you start to talk about the design of these, the experience of these, some product ideas, and that’s really what I want to get into is how you see that work happening.
So, for example, I think you said under, “Designing Data Products for Value,” you talked about capturing user feedback around user experience. And so, I’m curious, how do you do this today and how do you make sure you’re getting useful information back in order to go and create a data product of value? Like, especially if your consumer is, like, a machine-learning engineer, or something, or a data scientist, they’re not actually the ones that are going to get the benefit of that data scientist’s work. It’s someone else that’s going to get it.
So, how do you do that research work to know how to build a successful data product in your definition?
Vishal: The way I think is the purpose of data analytics, whether it’s data products or not, is to create insights. Which means that anytime data is created, it’s supposed to be created to give insight and value for someone else. Nobody creates data or data products for themselves, but they want to create the products where their consumers can get value out of it, use it, and then they can actually create another set of secondary or tertiary data products on top of that where their consumers can get value out of it. If we remove the term data and just think about a simple product, as a product manager, when we build something, we go through the priority aspect of who is asking, what needs to be built, what does the demand from my consumers look like? What feature sets and what problems are they trying to solve? What is the scale of the problem they’re trying to solve? Because my list of consumers could be five, or am I trying to solve for a hundred or am I trying to solve for a million?
Because that all goes into the definition and creation of data products, which is the scale at which the data products should operate. And then also, when I’ve created the data product, I had built some hypothesis that I’m expecting this data product to be used by, let’s say, five people on day 1 and ten people on day 20. Can I actually understand and track those metrics, and can I actually validate them against my earlier hypothesis from when we prioritized that data product? So, those things become really important.
And also talking to the customer. When I say talking, in this world, it’s hard to get feedback from talking one-on-one to customers, but actually capturing the feedback in one collaborative place where you can document and you can comment, where you can actually say this was helpful or not helpful, and also understanding the full downstream and upstream lineage [to think 00:10:23] how the data was created and how the data was being used. Because going back to the earlier point, once the data product is created, the value may be captured with the initial hypothesis, but how do I actually maintain and evolve the data products to capture another set of values? So, evolution and also understanding how this has been used over time is also really important when the data products have been shared [between 00:10:50] consumers. So, it’s going back to the very basic product principles, like how are [unintelligible 00:10:55] consumers using it and are they getting value out of it? And am I close to my consumers? And are those consumers giving me the right metrics and am I collecting the right metrics from them or not?
Brian: You mentioned how they’re using it, but let’s step back in front of it, like, before we’ve created something, right, in that discovery piece. Like, you mentioned it’s difficult to get one-on-one feedback. I find, like, that’s, like, the lowest bar of research: literally interviewing somebody, getting qualitative information. Why do you think that’s difficult? I guess I’m curious.
Vishal: So, it’s not difficult when it comes to the creation piece. It’s very difficult when it comes to the evolution piece. So, what I believe is that, like, for example, in a product world, the numbers and metrics are really important, but there are also outliers. Having one-on-one feedback can lead you to an outlier, too. For example, if I understand something and I understand it really well and if I try to explain that situation to someone else, that person may or may not understand it. It’s possible I am the outlier because I’m on the wrong path.
So, it’s like, who’s the outlier is the biggest component of building any product piece. That comes to data products, which means the initial feedback can come from a smaller set of groups, but once it’s been built as an MVP and the right data product, how do I go from version one to version two to version three? And that’s when the bigger set of data sets is important to actually give the right value back to my organization and also to the consumers. And that’s where the one-to-one can lead to driving for very narrow use cases, and capturing the broader use cases can actually evolve data products into solving broader use cases for more than one customer at the moment.
Brian: But I mean, doesn’t that just boil down to maybe you haven’t talked to enough people if you’re not sure if you’re reacting to, you know, a one-off situation? You know, a basic guideline for doing user experience research would be, well, how many people do you have to talk to? Well, when you start hearing the same information, you’ve probably talked to enough people. That’s a rough guideline.
Vishal: Yeah, absolutely.
Brian: And you can learn a lot, even with a handful, 10 people. There’s no exact rule of thumb there, but you can generally learn a lot there. So, what I get concerned about is running too much into the implementation phase early and then saying we’re going to iterate, because all most teams do is add more. No one erases and takes stuff away. They just tend to add because sunk cost bias kicks in. And if one person is getting value out of it, now there’s an argument to be made, “Well, we can’t take it away because the sales team data science team is now using that in this propensity model, so we can’t get rid of that stuff.” That’s my only concern with that [laugh].
Vishal: No, you’re right, actually. This was like, what—the initial value has to come from that, [due 00:13:49] to your point. Like, the one person value, which is what goes to point, like, outlier, like, a sunk cost can come from if we actually build the whole thing for the outlier. And there is always an outlier because there’s always—everyone has a different need. But the point becomes, like, the collecting the initial sample of the data set, but the initial sample cannot be as big as when the data is actually being used in production.
What I mean by that, like, we try to use different tools, instead of just asking customers. Because the one thing which I have also seen is that most of the requirements of data products come from the business owners and the business directors and the CXOs, like, how it should evolve and what they’re looking for. But on the implementation side, the data product is probably going to be used by data consumers, data scientists, and they are actually going to be using it. So, there are also two sets of requirements which actually we need: how can we sell the data product, the go-to-market side of the data product, and how can we also make sure this is usable by the real users who are collecting it. The second set, the real user requirements, does come from the automated metrics we can collect, like, how the queries are running, how they’re clicking the data set, did they add something on top of it to evolve the data sets?
And those behavioral data are also important to actually decide about this data product: do we need to come up with version two, or will this data product have an end of life? And do we need to rethink how to evolve it in different [sectors 00:15:17]? That is the biggest challenge when it comes to the maintenance and lifecycle management of the data product itself.
Brian: Sure, yeah. I appreciate that clarification. I think one of the things I wanted to really get out of this conversation with you is what I’ll call the zero-to-one phase and not the one-to-two phase. You’re talking about, now, analytics on data products to understand stuff. I think getting to something that’s even usable and useful to somebody at all tends to be a challenge for a lot of teams.
So, like, even in your example, here, like, let me try to phrase this as a really simple question. The business, the sales team, wants to build a propensity model. They want to be able to forecast who should we call next. In our giant database of leads, who should be called next? Who’s close to close, right?
The model will be built by a data science team. The data scientists need a data set to train the model and do that kind of work. So, who is building the data product? And are they getting their problem space defined by the data scientist or are they getting it defined by the salespeople who are going to use this? Like, how do you handle that to make sure you don’t build something that the data scientist says that they want, but the data scientist doesn’t really know what the problem space is? For example, they don’t know what the model explainability and interpretability requirements are because they were told, “We want to predict who to call next,” and they’re like, “Excellent, we can totally do that.”
And then when they ship it, the sales team doesn’t use it because they don’t trust it and they weren’t involved with how to make it. Their actual risks, their fears were not discovered at all. So again, when we make a data product, in your definition, whose job is it to figure out the data science requirements, the data scientist requirements, and the sales team’s requirements? Who does that work?
Vishal: I think the product manager. I mean there are different names for a data product manager, data product owner. I’ve also seen the different definition [coming on 00:17:22] data product owner and manager, but I’m going to use both as a single definition for the sake of clarity at this moment.
Brian: Yeah, that’s fine with me, too.
Vishal: Yeah. So, for example, data product managers, they are responsible for collecting requirements and asking all the stakeholders to understand the requirements. And that is the first piece of understanding whether we should be going on this path of actually creating a data product at all. And the requirements are collected, let’s say, I’m going to take an example of sales and marketing, the two departments that are trying to understand, like, how can we get more leads from the Google Analytics data coming in or from the Salesforce data, and analyze and create a data product from which we can understand the projection for Q4 or Q1 or something like that. The salespeople, they are actually thinking, I want to understand the projection and understand that pipeline for Q4 so I can go target those customers and actually get more revenue for my business.
That is a requirement that would maybe come from the sales team. When the requirements [that become 00:18:21] the product manager’s responsibility is to actually understand the problem statement completely clearly and also understand, like, where the datasets are, what they care about, which ecosystem they live in. Because the salespeople, if you talk to them, they might live in the Salesforce ecosystem. If you talk to marketing, they may live in a very different ecosystem. Other folks may live in a very different cloud and different ecosystems. So, that also becomes a part of collecting the requirements, like, who is the persona we are talking to, who will be using this data product, and for what use cases?
Once that has been collected, the data product manager goes to the data team and actually says, like, I have this set of users who want to use these data sets and want to actually understand analytics on top of that. Then the team itself, which is the data team, probably has data scientists, data engineers, and whatnot, and also infrastructure engineers. They come up with some basic requirements, they explore the data too, kind of a [base 00:19:22] exploration, and come up with the MVP data product, which the product manager should actually take back for continuous validation with the end-user who’s going to use it. Like, are we on the right track or not? Because one of the mistakes people can make is building something all on the wrong path and then validating in the end and realizing that there is, to your point, a sunk cost of building something incorrect that had never been validated.
So, continuous validation with the end consumers is also needed. The last piece is also [unintelligible 00:19:51] need to understand: when people think about data products, there’s a lot of focus on the data itself. There’s less focus on the infrastructure which can support the data product. What I mean by that, like, what if my set of users is 5, 10, 15, or 100? Like, how many people are going to use it, and what does the SLA look like? Are they okay with the data coming early in the morning or do they want the live data set as the data is changing?
So, those also become a part of the requirements for that persona, which means the infrastructure piece is very important in understanding what actually is needed when the data is [refreshed 00:20:27]. And the other piece, which I’ll say, like, there are many terms that have actually resonated right now, like, data contract and support and things like this, but what I think about is what happens if the data product goes down, which is also a part of the requirements that the data product needs to think about, because every product goes down. At some time, there’ll be a bug in the product, infrastructure is broken, maybe the source dataset where data is coming from is stale. The thing is, we are still in the software [unintelligible 00:20:55] goes down. Which also comes into the requirements of validating the [unintelligible 00:20:59] and consumers that if things go down, what is the expectation of how the data product should behave and how can we actually restore the trust?
So, those continuous requirements, the way we call it, continuous product planning, happen with the end consumers and working with the data scientists and data team, giving them the real problem, and what the expectations of the end-users are, and the rest of the technical side of it. As a product manager, I trust my technical team, bringing them close to the end consumer and building the right product for them to consume.
Brian: Got it. So, do you advocate for the data science team and the data engineering team and this infrastructure team to be wholly focused, in this situation, on the sales team? Or are they wholly focused on helping the data scientists get what they need to build the model? Like, who’s their customer and, like—or who’s the one to satisfy the most? Like, which game are they playing? How do you score points? Is it with the data scientist or the sales team?
Vishal: I think the sales team. So, the way I think, a data scientist team is the implementation side of it. So, the way I see it, the data team itself contains the infrastructure team, data engineering team and data scientist team. The product manager is the one who’s actually communicating with the sales team and the consumer all the time on a continuous basis, bringing those requirements and actually filtering and creating the right set of requirements for the data scientists, data engineers, and the infrastructure pieces so that they can actually build the right model which can be consumed by the sales team. So, the sales team is the one who’s going to get the value out of the data product. The implementation by the data scientists is actually being done based on the requirements collected by the product manager from what has been told by the sales team.
Brian: But is that exposure, the exposure to the sales team, only through the DPM? There’s no direct exposure between sales and the enablement team?
Vishal: No, not the way I see it. The exposure to the [sales team 00:23:04] way, when we talk to the customer, the product manager has to be involved in every conversation, but when there are technical conversations and also understanding during the implementation of those requirements, then there has to be interaction between the data scientists and the sales team because that’s when the data scientist team can come and work with the product manager and the sales team to figure out the models they’re actually trying to build and explore, and does that resonate with the sales team or not? So, interaction has to be there between the implementation and consumers, but implementation also has to make sure that what has actually been implemented is following the iterated requirements from the data product manager, or we have to figure out, does it need to be changed? The intent of involving the data product manager at every stage is because the data product manager’s work does not end with just creating the data product, but also how can we actually position this data product, sell this data product, and market this data product. So, this is why the data product manager needs to be involved in the full lifecycle, and implementation is one part of the lifecycle of how a data product is created and consumed.
Brian: You talked about self-service usability as being important as well as, quote, “Good design.” So, when you were talking about that, were you talking about the Data as a Product, the data container which the data scientist is going to use, that’s the thing that needs to be easy to use and have self-service usability and good design, or were you talking about the sales team’s interface, whatever that is? I don’t know what that looks like, an application, a dashboard, or something. Were you talking about that?
Vishal: Both, both. So, the reason I say both is because [there are 00:24:48] two sets of data products: a source data product and the consumer data product. So, the data scientists may also be using the data products instead of going back to the raw data set coming from a different team. What I mean by that is that, like, a data scientist team may say, I need to create this model based on two different data sets or two different models. And in [unintelligible 00:25:11] to actually try to figure out those models in any organization, they get blocked on the creation piece of data products.
What I mean by they get blocked, like, they will actually have to figure out where the data lives, where the model is. And that’s where the self-service component is actually really important for data scientists because there have to be primary, secondary, and tertiary data products, where the data products, when they’re created, can give value to the end consumer but can also be used to create another product on top of it as an embedded system. So, data scientists often need to get the value from the data set itself in a self-service component. The sales team also actually needs to understand the self-service component of the data products because if the sales team [lands 00:25:54] and starts demanding, let’s say, “I need to understand the Q4 projection or Q1 projection,” then instead of them creating a new backlog and new tickets for the product manager and data team to actually go build it, they need to go understand: that data set, does it exist? Who is using this data set? Has this data set already been used in the organization? Is this data set already available in some kind of dashboard we can already use?
So, if that information is available to the sales team, the chances of tickets being created in product manager backlog and product manager going back to the sales team, like, via the way of the ticket we have created already exists, I’m going to [get it 00:26:31] to you. So, how can we actually change the friction of even empowering the non-technical users, like a sales team, who will come in to actually understand what is the main data is, what business cases can I use the data sets in which dashboard this data set is being actually used for it, and how can I access those datasets? Or if I don’t have access to the dashboard, who can I actually go ask in the organization so I can actually run my projection for Q4? So, there’s a different component of there with the non-technical user coming in just trying to figure out the data exists or not, but then the other component which is technical users trying to consume this data product to create a whole new set of data products which can be consumed for a different set of consumers.
Brian: Did I hear you right that—it almost sounded like you’re saying the sales team may want to directly access the data product containers, the assets that you’re talking about directly? I guess I assumed those were building blocks, but they were not complete solutions. So, it’s more like here’s a wiki page describing this bundle of data and its SLA and its stuff but, like, to a salesperson that doesn’t give me the answer who to call. That is a foundational piece to it. But it’s not a solution. Or is it? Maybe I misunderstood.
Vishal: So, with Starburst, you can do both. That’s where maybe differential comes [unintelligible 00:27:46] when you create the wiki page documentation of data products [unintelligible 00:27:51] Starburst, not only you can understand the business use case of this data product, complete information, but also some basic ways of, like, if I want, like, basic queries, we can write, events like—also the link to, like, a Tableau dashboard or Power BI dashboard, right? So, what the sales team may be looking for, they may not be running the queries, but they may be actually going to, like, is this data product already being used in some downstream dashboards? Can I actually go access and use and see that this—can—this is useful for my use case?
So, those information can actually be packaged in a single way where not only I can actually see the raw data set—if the user have access to it because [governance layer 00:28:30] is important for it—but also can this data has already been used to create some different dashboards, which can be useful already for me, instead of actually going back. So, both it can be packaged in a Starburst data product as a single entity, which can be consumed by sales team.
Brian: I see. So, around this topic of good design and making things user-friendly and all of this, how do you measure—or how is the DPM—I guess it’s their job—how are they supposed to measure whether the design of the solution for say this fictitious sales team example we’re talking about, how do they measure that it was good or user friendly?
Vishal: [laugh]. That’s a really hard question to measure the user feedback, and this is where the metrics come in. I mean, I’m going to give my answer, which is my own opinion into what is the right approach to do it. The approach is, like, how many—what is the increase in number of traffic of the users we are tracking on the data product? Are they increasing or decreasing? That’s a basic number.
If the same person actually [unintelligible 00:29:29] we’ve seen a drop in the users coming [on 00:29:31] the data product when they actually come and use it, there’s a chance that the way the data product is created is incorrect, that it is not solving the right use cases, or that somehow the design of the data product itself is so convoluted or confusing that they’re not getting the value out of it. So, that’s a basic first way of understanding, when the user lands on the data product, how many times they’re coming back.
The second piece, which I also think about is that if the sales team is actually using the data product or whatnot, are they—those are being used in any kind of sales dashboard? Are they using Salesforce or not? Like, how many times does data product is actually being consumed with the [new 00:30:10] use cases on which the data team were actually initially trying to make the use cases. So, those things becomes really important when [understanding 00:30:18] the design of it. Design also starts with first thing which I see, the business context, which means that nobody comes and look for the columns model, you know, the raw value of the data set. They actually look for the—what is the business context?
And second thing, as a user, they actually look for is, like, where else is being this data product has been used. And exposing that information, this is a basic principles is that, like, now I actually get the trust that I am not going to be the first few person going to be using this data product that has been already been used in many different use cases, and [then I am 00:30:54] now going to use and extending into the use case I want to evolve the data product. So, those things matters a lot. But as I said, the first basic piece which I look for is the number of user traffic is growing or increasing, or same user is [unintelligible 00:31:10]—when the user comes back to data product, does that users coming back? Or is it that dropping the user over some time?
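Editor's note: the return-visit signal Vishal describes can be sketched in a few lines of Python. This is a minimal illustration, not how Starburst measures it; the `(user_id, visit_date)` log format, the function name `returning_user_rate`, and the sample data are all assumptions made for the example. It counts a user as returning only if they visited on more than one distinct day.

```python
from collections import defaultdict
from datetime import date


def returning_user_rate(visits):
    """Share of users who visited a data product on more than one distinct day.

    `visits` is an iterable of (user_id, visit_date) pairs (a hypothetical
    log format chosen for this sketch).
    """
    days_by_user = defaultdict(set)
    for user_id, day in visits:
        days_by_user[user_id].add(day)
    if not days_by_user:
        return 0.0
    returning = sum(1 for days in days_by_user.values() if len(days) > 1)
    return returning / len(days_by_user)


# Made-up visit log for one data product.
sample_visits = [
    ("ana", date(2023, 10, 2)),
    ("ana", date(2023, 10, 9)),
    ("ben", date(2023, 10, 3)),
    ("cy", date(2023, 10, 3)),
    ("cy", date(2023, 10, 3)),  # same-day repeat, not counted as a return
    ("dee", date(2023, 10, 4)),
    ("dee", date(2023, 10, 11)),
]
rate = returning_user_rate(sample_visits)
print(f"Returning users: {rate:.0%}")
```

A figure like this shows only whether people come back, so it would need to be read alongside the written feedback discussed in the conversation.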
Brian: Got it. I guess the only thing I’d wonder about there is that we can make the assumption that zero usage means something is probably really wrong or there’s a discoverability problem. They don’t know it’s there, or they know it’s there and they don’t care. But as soon as you get to some usage, how do you know that usage is goal time and not tool time? And so—and this is Jared Spool’s framing, which is, “I’m here using this data product and I’m actually getting the value out of it,” versus, “I’m here futzing with it trying to get it to do what I need. I’m trying to get the information out of it and I can’t, and I got to come back every single week to do this because the thing won’t email me an alert, which is really what I want. So, I come back every week.”
And if you look at analytics, it doesn’t tell you why. It just tells you that there’s a lot, whatever a lot is. There’s more. But it doesn’t tell you if that’s goal time. It doesn’t tell you why they’re there. So, is there no qualitative piece there to understand whether or not the usage that’s happening is actually goal time, that value is being created as opposed to frustration, it’s more of a tax that someone has to come back? How do you separate that or, I don’t know, do you think about that at all, or?
Vishal: We do. I don’t have the perfect answer at this moment how could we because the—that’s the analytics actually differs from the [unintelligible 00:32:37] different product. Because, you know, any different way you can actually count the [dead 00:32:42] clicks, multiple clicks, frustrated click. The data product is, “Oh, actually how am I getting consumed from a different users, different tools.” The way we have done is that we have actually embedded a way to comment and give the feedback once you’re running, which can be exposed directly to the manager.
So sometimes, actually exposing even the users that what was your experience in the context of that data product itself allows the data product manager or the data team to understand what the user has given feedback. So, that’s where the feedback cycle in the data product becomes really important. And that’s where the written feedback—they can write a comment, they can say this was helpful, even select, “I tried to use this data product over last week’s data set. Did not work.” The few things which we are thinking about actually putting a red/yellow/green light on the data product, saying that this is actually not going to work for the use case that had been solved and the business context which had been adding in data product seems misleading.
And those kinds of feedback can only come to the point by directly understanding the frustration of our users in the plain English, which is they can provide the feedback or they can reach out to the product manager directly saying that this did not satisfy the requirements initially I was going to use the data product for.
Brian: Got it. It sounds like you advocate or rely a lot on self-reported feedback coming from the end-users of these solutions. The sales team complaints or comments, you’re relying on written or self-generated. Is that correct?
Vishal: Absolutely. In fact, this is the plan for the future, which does not exist in the Starburst product at this moment: the ability to see how many support tickets were created and who created them, and the ability to attach those to the existing data products. Because, as a product, a data product will also have problems. How we understand and solve those problems with the right resolution time is also a metric on data products. So, by exposing and understanding how many support tickets are coming in, what kind of support tickets are coming in, and what kind of feedback is coming in, and putting those in the context of the data product, we can understand whether this data product is solving the right problem, or is it creating more frustration or more support issues and is not valuable for the inconvenience.
Brian: In terms of trying to get it right earlier in the process, am I correct in your guidebook that when we talk about design and usability and utility and self-serviceability and all these kinds of things, that the product manager is the primary person that’s doing that work, as opposed to a designer or an engineer or anything like that? Like, I didn’t see designer mentioned, but there’s a lot of focus on these value attributes from the perspective of a non-technical user, which I think is great. But I’m like, how do they do—like, who does that work of designing it to make it usable and ensuring that stuff in your universe of people that are involved? Like, whose role is that to do that?
Vishal: I think the designer in my context when I’m talking to my customers, there’s two piece of design. So, we—the design actually implement around the full experience of that marketplace, or the self-service component or data product, which is not itself related to one data product, but actually related to all the data products across every ecosystem I can see it. That’s actually a different level where the designer has to come in because designer to actually make sure the experience of searching any data products across different domains, different use cases, different categories, should be very easy and useful. That’s where the self-service component comes in. So, the platform—and this is where we talk also about in the sys—in the book—is about not just data product, but data products platform.
The designers are really important to actually create that data product platform and data product ecosystem where all the data products can be easily searched and consumed. The data product managers becomes important [unintelligible 00:36:38] every single entity of the data product in that ecosystem. Which means that they are actually following the design pattern which has already been created by the designer into how this should look, how it should be consumed, and how the flow should work, and they are using that pattern to create the building blocks of different data products in a single place, in a self-service manner, and also coming up—like, what kind of data products be created in order to satisfy the business requirements.
Brian: If I’m hearing that right, you’re saying if you have a designer involved, they should be focused on the platform itself and not particular instances, or what maybe we call them projects, like the sales team propensity model example we keep coming back to. That’s not where you would put that. You would put it on some ecosystem thing. But does that mean then you’re saying putting design on the solution that mostly will be used by the data scientists and analysts is where the value is? It’s not put it where the business, non-technical users are on the instances? Is that what you’re saying, or?
Vishal: No. So, [I’m saying 00:37:44] a very different way. So, I’m saying is that the platform design is more important because the sales team—and thinking back to example—sales team may request six different data products, which means that experience of one could be Q4 projection, one could be churn risk, something else, and whatnot. The point become, the experience of any data product or sales team and marketing team should be very consistent so they can actually understand how to, if I actually have learned to, like, you know, how does data product and ecosystem should work, I do not have to relearn again when a new data product coming. And there’s a piece, the experience of understanding the data product, want the data product, too, should do very consistent, just because that will actually drive the users, end-users, to actually find a new set of data set, which they may not be thinking in the self-service component piece.
They actually may be thinking about just Q4 [unintelligible 00:38:37] the churn risk project, too, and they might combine to come up with a whole new analysis which was not there on their mind. So, having and building that platform experience of almost every data products should have the similar user experience actually will enable more and more consumers to go and find the more data products and search and consume data product. But the entity, going back, like, the baseline design when it’s created, let’s say by the user experience with data products that contain sample queries, business context, who is using data sets, and whatnot. Once the whole template of design has been created by the designer team and this is what the data products should look like, as the data team when they are publishing that data product when they are publishing that wiki page, they can choose which one is valuable for this piece and may skip or add based on what the user is looking for. So, in my opinion, the designer are the uber umbrella of actually maintaining not one, but actually creating a standard for every data product being used, like a design pattern for the data product, while the data product itself going to follow the design pattern that the designer have actually created, and I don’t believe that designer has to be involved on every data product being created in the organization.
Brian: Isn’t it possible though to—I mean, this is kind of like this—like, right now in the design world, we talk a lot about design systems which are all about enabling reuse. It’s almost kind of like data products, right? Like, don’t build another wizard component because we already have one of those elsewhere. Use the standard. Use the regular kind when possible.
And what happens is, I think you can actually get to a point where everything is trying to be shoveled into these components and we stop using our heads, we stop using our heads and it’s possible to create a very extensive design system that doesn’t necessarily have any instances that are providing value. The design instances—the actual products and experiences—aren’t cobbled together in a way that any value is created. Now, that’s an extreme statement and I’m not saying that happens regularly, but I guess I’m kind of like, how do you guarantee that—here’s what I think I’m saying. In order to come up with a standardization, you need to know what’s working.
Vishal: Yeah, agreed.
Brian: And the only way to know what’s working is to know that the instances—or the projects or whatever we call them, like the sales team instance—that there was some successful delivery there and that we learned something about how does the sales team want, if at all, to go search for and use these data products? Or do they only want the end solution? I don’t—I mean, even just that question right there. Does the sales team really want to go look at this catalog? Is that actually something that they find value in doing or is that a tax, which is we don’t really want to be in here, but we have to be in here because we don’t know how to get the stuff we need or it doesn’t exist yet? It’s like, even if it’s a tax or a benefit, like, answering that seems to be something you would need to have in place before you’d want to replicate a design over and ov—as a standard, over and over again. I guess that’s where I’m trying to—I can’t get my head around that.
Vishal: No, it’s a very valid question. And I think the question goes back to this, if I rephrase the question, and please correct me, Brian, if I’m incorrect: if the platform itself is following the wrong design pattern, which means that there’s a chance of creating something which may not give value, how do we actually bring in the designer to make sure that, for the data products being created, they’re also getting feedback on the user experience of the complete platform where the data product can be consumed? So, when I think like that, I mean, this is a really, really good question, because what happens is that some sales team (and I’m going to take an example of two different sales teams) may actually look for categories and different tags and different ways to actually find the data product. Or they may not even care about the data set; they may care about the dashboards where it is being used because they actually just want to visualize the data, use the data, and also understand the data. They may not be writing queries.
But then they have same data product which comes with data set, they get a consumer, they might be [interested 00:42:43] in creating a new dashboard and attaching that to the same data product so they can empower the different use cases for the data product. So, the design team—and then this goes back to, like, if I have, like, 20 data products in the organization, that may have, like, five DPMs driving that based on different use cases. Which means that five DPMs do not have to work with the every different designer. The design pattern of the five DPM should be consistent with the design umbrella which has been defined for the ecosystem of delivering data products. And that’s where I believe the DPMs are more responsible for the actual component of the data set and the models on which we deliver, while the designers are actually working with all the five of them because the use case of data product manager one could be very different data product manager two, but when they are landing into self-service component of able to serve the data set, data products, the models, the views flow should be not that different from one to two, and as a designer come actually talking to the many data products, they are not linked to the one team, but they are the overall umbrella of actually creating that piece of usability where it can be consumed by different folks.
The other piece going back to the sales team—to your point—they may not be consuming the data directly, but they may still search the data product where the designer are, like, not within the Starburst platform, for example, but also looking Tableau because Tableau also have the—some kind of way to understand the data set exists and have the business context. And they may also—there might be other, too, Power BI, and Looker or whatnot. How can we also make the design of the ecosystem of Starburst also compatible with design of how the tools of the choice of the sales team, and able to work with those tools will becomes also important. And as a designer coming, too, so, like, how do you make sure the experience of Starburst has also been transmitted with the experience of Tableau which we don’t have a control, but are able to work with that experience to expose the self-service component even with the tools of their choice?
Brian: I totally agree with the idea that, the way I frame this with—when I talked to product people and sometimes design teams is, like, we want to get rid of arbitrary differences between things that are a collection of similar things. So, if there’s ten data products, where is there unnecessary differentiation between them? That we want to remove because it’s noise in the experience. So, having a consistent layout of the wiki page or a consistent way to understand, like, oh, what are all the source datasets that went into this data product or this model that’s an off-the-shelf thing that can be used in a bunch of different charts, you’d have consistency there primarily to just get rid of noise and not have to relearn. But it’s—to me were the thing that I’m always curious about is, like, you can still do all of that consistent-ising work and getting rid of the noise and making patterns, but did the sales team feel empowered to call the right people, is still—to me, like, if that’s the baseball game, that’s the way you score a run.
Everything else is counting stats, like how many strikes were thrown, how many errors, how many bunts there were. Great, but did a point get scored is ultimately kind of what we want to track to. And I’m always curious, do teams, no matter what they call these data products or how they define them, keep score that way? And if so, how are they interfacing with end-users and doing all that? Anyhow, there’s not really a question in there; that was more just me thinking out loud and kind of expressing the arc of this conversation, which is really that getting the humans-in-the-loop involved in the creation, getting empathy between the makers and the users, to me is critical.
Otherwise, it’s like, you could just build a giant catalog of stuff that doesn’t make any difference. And I feel like that’s still happening a lot where, where’s the value? Where’s the promise? Then it was AI. Where’s the promise? Now, everyone’s past the rush. It’s like the hype is over. When are we going to see something of value here?
And it needs to come down as somet—as simple as, like, the sales team feels like they’re kicking ass and they have proof of that now because they feel empowered to have this thing, this agent, next to them that they never had before. “I have a secret weapon that no one else has. Our close rates are higher, and we feel like the data is doing it.” It’s a very human… and yes, you can quantify all that stuff numerically, but it’s also the human part of the team feeling like, I have this special power in my pocket that I didn’t before. To me that’s, like, what we’re going for. I don’t know if you agree, but, like—[laugh].
Vishal: No, I completely agree. I mean, one thing we did talk about in the book is the total cost of ownership of the data product. But one thing we did debate a lot: the total cost of ownership just tells you how much it costs the business to own the data or create the data. But what is the value of the data product? That is what you’re touching on. Like, if I spent a dollar to create a data product, did I actually get the dollar back? Did I create [unintelligible 00:47:39] get 50 cents back? Or did I get $10 back?
Because the—that is what the business cares about it, to your point. If I actually created the most complex and most amazing data product and spend some dollar amount and did not come back any value to the business—which means it’s actually we lost money there. For some time, that piece is more important, and that comes from the consumer. And something which is where when I talk about downstream impact of the data product becomes really important. But there could be point, like, you know, how do we understand downstream is actually also being used, and how—who is using it downstream?
But the question, like, how many times that the data products has been used in downstream? And second, how reusable this data product is? Which means that I’m not going to reuse the same thing again and again if the data product is not providing me value. So, reusability is really important. In the piece which we struggle—and I don’t have a—like, how do we actually capture that value piece? Not the ownership, but the value piece of data product?
Right now, we talk about the value piece in terms of number of users using it, number of downstream paths, number of usability, but the biggest value piece in the end [unintelligible 00:48:50]. And insight. Is the company making value, making business out of the data product which was created for the company? That dollar amount, I’ve got to be honest, I don’t know how we capture it, but that is something I believe, once we capture it, I mean, once anyone gets to capture it, is the real metric which any business or CXO cares about.
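Editor's note: the dollar-in, dollar-out comparison Vishal describes is simple arithmetic once the value figure is known. The sketch below is an illustration under stated assumptions: `value_per_dollar` is a hypothetical helper, and the value returned is taken as a given input, which is exactly the part Vishal says he does not yet know how to capture.

```python
def value_per_dollar(value_returned: float, cost: float) -> float:
    """Dollars of business value returned per dollar spent on a data product."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return value_returned / cost


# The three outcomes from the conversation, each for one dollar spent.
# The labels are added here for readability; they are not Vishal's terms.
scenarios = {"lost money": 0.50, "broke even": 1.00, "paid off": 10.00}
ratios = {
    label: value_per_dollar(value, cost=1.00)
    for label, value in scenarios.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} dollars back per dollar spent")
```

Total cost of ownership supplies only the denominator here; the numerator still has to come from the consumers of the data product.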
Brian: Well, if I can put a plug in for a past guest and episode, if you go back to episode 80, it’s called “How to Measure the Impact of Data Products with Doug Hubbard”. So, Doug wrote a great book called How to Measure Anything. It’s a very good read and I think it provides a lot of guidelines on how to take these things that feel immeasurable when we’re talking about, like, decision support, like, when you can’t see the decisions and count them necessarily, there are ways to do this. But there’s both art and science that goes into doing that kind of work.
And it’s not easy necessarily, but there are ways to do it. So, check that out. Vishal, it has been so great to talk to you. Thanks for coming on Experiencing Data to talk about this. The book is called Data Products for Dummies: Starburst Special Edition by Vishal Singh, Ryo Komatsuzaki, and Andrew Mott. That’s available on the Starburst we—if you just Google I think, “Starburst Data Products for Dummies,” you’ll find the link to that. You have some stuff coming up, some summits or something right? What’s going on in your world before we wrap up here?
Vishal: Yeah. So, for example, Starburst, we are the company based on the open-source Trino, which was previously known as Presto. So, we have a Trino Summit coming up which is completely open-source, run by Trino Foundation. It’s [unintelligible 00:50:25] in December. So yeah, I’d love for people to check it out; it’s Trino Summit 2023.
I will get the link on the Trino website. And feel free to join, feel free to actually check that out. Also check out our Galaxy Platform: if you want to learn what data products look like, you can also actually download or sign up for the Galaxy Platform with a $500 free credit to play around with what Trino is, as I’m talking about Trino Summit, and see how a data product is actually created in the Trino context or the Starburst context.
Brian: Excellent, excellent.
Vishal: One more summit is coming, which is called Datanova. We actually do this summit every year. It has always been virtual, and next year, we are finally doing the Datanova Summit in person in New York in April. So, we are very excited about it. We have had great speakers over the last few years. Check it out. The speakers from the [unintelligible 00:51:15] and other folks. We have Apple and other companies talking at those summits, too, so great sessions in the previous summits. We are very excited to take this summit in person in New York City next year in April.
Brian: How do you spell that? Just could you spell that ‘Data Noah?’
Vishal: Yeah. D-A-T-A-N-O-V-A. Datanova.
Brian: Oh, nova. Okay. Datanova. Okay. Excellent. And what was the other event called again?
Vishal: Trino Summit.
Brian: Trino Summit. Okay, excellent. Cool. Vishal, where can people get in touch with you? LinkedIn? Twitter? Do you hang out anywhere that people can stay in touch?
Vishal: Um, well, [unintelligible 00:51:48] LinkedIn, Twitter is the best place these days. We do a lot of Trino meetups in the Boston area, so if [unintelligible 00:51:53], if you actually think about Trino meetups, I’m always there. It always happens at our company. So, we have all this in the Boston downtown area when I go to the office.
So, shoot me an email, shoot me a LinkedIn message. I’m always looking forward to meeting new people and learning from where this data product evolution is going. As I said, like, if I chat about data products in a year, I might come up with some other components of definition which we have not yet figured it out.
Brian: [laugh]. Excellent. Well, thanks again for coming on this show. And good to know you’re local here. I guess we’re both Boston-based. I forgot about that, so that’s great. Maybe we’ll cross paths in person. So, thank you again. It’s been great having you on Experiencing Data.
Vishal: Thank you, Brian. Appreciate it.