Finding it hard to know the value of your data products on the business or your end users? Do you struggle to understand the impact your data science, analytics, or product team is having on the people they serve?
Many times, the challenge comes down to figuring out WHAT to measure, and HOW. Clients, users, and customers often don’t even know what the right success or progress metrics are, let alone how to quantify them. Learning how to measure what might seem impossible is a highly valuable skill for leaders who want to track their progress with data—but it’s not all black and white. It’s not always about “more data,” and measurement is also not about “the finite, right answer.” Analytical minds, get ready to embrace subjectivity and uncertainty in this episode!
In this insightful chat, Doug and I explore examples from his book, How to Measure Anything, and we discuss its applicability to the world of data and data products. From defining trust to identifying cognitive biases in qualitative research, Doug shares how he views the world in ways that we can actually measure. We also discuss the relationship between data and uncertainty, forecasting, and why people who are trying to measure something usually believe they have a lot less data than they really do.
In total, we covered:
- A discussion about measurement, defining “trust”, and why it is important to collect data in a systematic way. (01:35)
- Doug explores “concept, object and methods of measurement” - and why most people have more data than they realize when investigating questions. (09:29)
- Why asking the right questions is more important than “needing to be the expert” - and a look at cognitive biases. (16:46)
- The Dunning-Kruger effect and how it applies to the way people measure outcomes - and Bob discusses progress metrics vs success metrics and the illusion of cognition. (25:13)
- How one of the challenges with machine learning also creates valuable skepticism - and the three criteria for experience to convert into learning. (35:35)
Quotes from Today’s Episode
- “Often things like trustworthiness or collaboration, or innovation, or any—all the squishy stuff, they sound hard to measure because they’re actually an umbrella term that bundles a bunch of different things together, and you have to unpack it to figure out what it is you’re talking about. It’s the beginning of all scientific inquiry is to figure out what your terms mean; what question are you even asking?”- Doug Hubbard (@hdr_frm) (02:33)
- “Another interesting phenomenon about measurement in general and uncertainty is that it’s in the cases where you have a lot of uncertainty when you don’t need many data points to greatly reduce it. [People] might assume that if [they] have a lot of uncertainty about something, that [they are] going to need a lot of data to offset that uncertainty. Mathematically speaking, just the opposite is true. The more uncertainty you have, the bigger uncertainty reduction you get from the first observation. In other words, if you know almost nothing, almost anything will tell you something. That’s the way to think of it.”- Doug Hubbard (@hdr_frm) (07:05)
- “I think one of the big takeaways there that I want my audience to hear is that if we start thinking about when we’re building these solutions, particularly analytics and decision support applications, instead of thinking about it as we’re trying to give the perfect answer here, or the model needs to be as accurate as possible, changing the framing to be, ‘if we went from something like a wild-ass guess, to maybe my experience and my intuition, to some level of data, what we’re doing here is we’re chipping away at the uncertainty, right?’ We’re not trying to go from zero to 100. Zero to 20 may be a substantial improvement if we can just get rid of some of that uncertainty because no solution will ever predict the future perfectly, so let’s just try to reduce some of that uncertainty.”- Brian T. O’Neill (@rhythmspice) (08:40)
- “So, this is really important: [...] you have more data than you think, and you need less than you think. People just throw up their hands far too quickly when it comes to measurement problems. They just say, ‘Well, we don’t have enough data for that.’ Well, did you look? Tell me how much time you spent actually thinking about the problem or did you just give up too soon? [...] Assume there is a way to measure it, and the constraint is that you just haven’t thought of it yet.”- Doug Hubbard (@hdr_frm) (15:37)
- “I think people routinely believe they have a lot less data than they really do. They tend to believe that each situation is more unique than it really is [to the point] that you can’t extrapolate anything from prior observations. If that were really true, your experience means nothing.”- Doug Hubbard (@hdr_frm) (29:42)
- “When you have a lot of uncertainty, that’s exactly when you don’t need a lot of data to reduce it significantly. That’s the general rule of thumb here. [...] If what we’re trying to improve upon is just the subjective judgment of the stakeholders, all the research today—and by the way, here’s another area where there’s tons of data—there’s literally hundreds of studies where naive statistical models are compared to human experts […] and the consistent finding is that even naive statistical models outperform human experts in a surprising variety of fields.”- Doug Hubbard (@hdr_frm) (32:50)
- How to Measure Anything: https://www.amazon.com/gp/product/1118539273/
- Hubbard Decision Research: https://hubbardresearch.com
Brian: Welcome back to Experiencing Data. This is Brian T. O’Neill. Today I have Doug Hubbard on the line. How’s it going, Doug?
Doug: Hey, good. Excellent. Thanks, Brian.
Brian: I was stoked; I think I told you, you were my first ever Audible book. I was driving to a show, I was performing at a show for, like, two weeks, eight shows a week, 45 minutes each way. And I’m like, “I want to, like, read or something.” And so I decided to try it out and your book had been on my list for a while. And it was great to just dig into this.
So, you’re the author of How to Measure Anything, which I think is a fantastic title. And it can be particularly helpful to data product professionals, so I wanted to dig into this whole construct here. You’re also the president of Hubbard Decision Research, so you have this magical way of helping us see the world in ways that we can actually measure it.
And I think this is really important for data product leaders. I mean, it’s important in a lot of different constructs, but when we talk about building decision support applications, analytical tools and things like this, that are supposed to have some kind of impact. How do we measure this impact, especially when it may seem like there’s no way to measure some of these squishy things? So, first thing I wanted to ask you was, can you give me, like, two or three examples of some of the toughest, squishiest things that you’ve been asked to measure or you know that your training students or whatever have been asked to measure? I want to show people the gap between what seems impossible to the possible.
Doug: Drought resilience of villages in Africa. The economic impact of restoring the Kubuqi Desert in Inner Mongolia. The trustworthiness of a person. Let’s say, the economic impact of 11 dams on the Mekong River.
The value of better pesticides regulation. Let’s see, cybersecurity risk. That seems like a hard one for people.
Brian: Talk to me about the idea of an absolute measure of the trustworthiness of a person versus the, this is what the culture believes—at this organization or in this context, this is what we believe trust is can be measured and defined by and the difference between those two things. Or is there one? Do you disagree with that premise?
Doug: Well see, that’s the first thing, is we assume when we start out that different people might be using the word completely differently. So first, we have to figure out what they mean when they use the term. What do they mean by this? So, I ask people questions like, “What do you see when you see more of it? You must have seen it vary; are there people that you trust more than other people?” They go, “Oh, yes. There’s people I trust more than other people.” “Why? What did you see that caused you to trust them more?”
“Well, they always acted in a trustworthy, reliable way in the past, and other people did not.” “So, you’re saying past behavior correlates with future behavior? When you say that someone is trustworthy, that means you can put a higher probability on them following through on some assigned task or promise or something like this?” And you might have to get specific about that promise: are you talking about them being trustworthy to help you move, or are you talking about them being trustworthy to repay a loan? That might be different.
Those are specific meanings of the term. Often things like trustworthiness or collaboration, or innovation, or any—all the squishy stuff, they sound hard to measure because they’re actually an umbrella term that bundles a bunch of different things together, and you have to unpack it to figure out what it is you’re talking about. It’s the beginning of all scientific inquiry is to figure out what your terms mean; what question are you even asking?
Brian: It sounds like what you’re saying is, it is very important to understand that we’re not always talking about absolute measurements here, we’re talking about within the scope of what is often a sta—in the context of my audience—a stakeholder, a business sponsor, someone who is investing in some data-driven solution, or product, or application, it’s their measurement of it that is the reality as opposed to an absolute definition of what trust is. So, in other words, this trust measurement you came up with, or whoever came up with it, may not be the same at Beta Corp as it is in Theta Corp. Is that correct?
Doug: Yes, exactly. And that’s simply because they’re using it to forecast different things and they’re basing it on different types of observations. Once you figure out how you observe something, you’re halfway to measuring it; the rest is trivial math, right? So, you just have to figure out what’s your data collection procedure and how you do the analytics with it after that. I mean, literally, anything.
Do some married couples love each other more than other married couples? “Well, do you think you’ve ever seen it vary? Do you honestly believe that it’s all the same? Or do you have reason to believe it varies?” Well, maybe you would say you do have reason to believe it varies.
“Great, what did you see that caused you to believe that some couples love each other more than other couples?” And here’s the next question is the, “Why do you care?” question because that actually helps frame the measurement problem, too. “What decision do you think you’re making better if you had less uncertainty on this particular problem?” So, that’s a big part of it.
If you’re thinking about, “Well, the reason I want to estimate if they’re going to stay together over a long period of time is because I’m a banker; I’m giving them a loan for a family business, and if they divorced at some point in the future, it could cause a lot of problems.” “Great. So, you’re forecasting the chance of non-repayment based on data you knew about their relationship. Okay, if that’s what you meant, that’s what you’re using it for.” Now, of course, when we end up defining these things, people might end up saying, “Well, that’s not what I meant by love in the first place.” “Okay. What else did you mean?”
It’s almost inevitably every time people define their terms more specifically, it ends up being that they actually end up with something a little bit different than what they start out with. And that’s okay. Because what they started out with was just ambiguity. And even feelings can be measured, though.
Brian: Completely. And this is in the—I come from the design background, so I’m dealing with this stuff all the time. And I’m curious, when you’re doing this inquiry to help someone themselves uncover what these definitions of something like trust might be, is there a concern that this person may feel like, “Well, I’m giving you these anecdotes—‘at the party, the couple seemed unhappy.’ You know, ‘the last couple times we went to dinner, they seemed unhappy.’ But that’s just, like, my own experience; that’s just like a few anecdotes.”
I could see a stakeholder feeling like yeah, but that can’t be just that. Like, that’s not deep enough. Talk to me about how you take that further, if they’re concerned that’s a fluff measurement, or it’s not deep enough, or it’s too circumstantial, it’s based on recency bias, based on what I can remember most recently, how do you challenge that?
Doug: Right. Well, first off, that’s kind of the stuff that we’re fighting against, when we’re measuring some things. We don’t want to rely on things like recency bias because we want you to collect the data in a systematic way. First, you got to figure out what observations you’re even collecting, though—that’s the first step is, how are we defining this thing—but you’re absolutely right; we don’t want to depend on somebody’s memory of these things. That was what they were doing before.
But once you figure out how you observe something, then you can think about what would be a procedure that would allow me to collect this in some systematic way? Now, I think sometimes people get hung up on the idea that, well, how am I going to get all the data of this couple’s interactions, right? You don’t get all the data; you get a sample of it. All statistical inference, virtually all empirical science is based on the idea of sampling part of a larger population. I always say that when physicists measured the speed of light, they didn’t actually measure all the photons, right? [laugh]. They’re taking a tiny sample of them.
And another interesting phenomenon about measurement in general and uncertainty, is that it’s in the cases where you have a lot of uncertainty when you don’t need many data points to greatly reduce it. I think sometimes people get hung up on that, too. They might assume that if you have a lot of uncertainty about something, that I’m going to need a lot of data to offset that uncertainty. Mathematically speaking, just the opposite is true. The more uncertainty you have, the bigger uncertainty reduction you get from the first observation.
In other words, if you know almost nothing, almost anything will tell you something. That’s the way to think of it.
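A minimal sketch of the "first observation" point, not from the episode. It assumes a textbook normal model: a normal prior on the unknown quantity and normally distributed measurement noise. The function names and the numbers (`prior_sd`, `noise_sd`) are hypothetical and chosen only for illustration.

```python
import math


def posterior_sd(prior_sd: float, noise_sd: float, n_obs: int) -> float:
    """Posterior standard deviation of a normal mean after n_obs noisy observations."""
    # Precisions (inverse variances) add in the normal-normal model.
    precision = 1.0 / prior_sd**2 + n_obs / noise_sd**2
    return math.sqrt(1.0 / precision)


def first_observation_reduction(prior_sd: float, noise_sd: float) -> float:
    """Fraction of the prior standard deviation removed by a single observation."""
    return 1.0 - posterior_sd(prior_sd, noise_sd, 1) / prior_sd


if __name__ == "__main__":
    noise_sd = 5.0  # hypothetical measurement noise
    for prior_sd in (10.0, 2.0):  # wide vs. narrow starting uncertainty
        for n in (0, 1, 2, 5, 20):
            print(f"prior_sd={prior_sd} n={n} "
                  f"posterior_sd={posterior_sd(prior_sd, noise_sd, n):.2f}")
        share = first_observation_reduction(prior_sd, noise_sd)
        print(f"first observation removes {share:.0%} of the uncertainty")
```

Under these assumptions, the wide prior loses a much larger share of its uncertainty to the first observation than the narrow prior does, and each further observation removes less than the one before it.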
Brian: Yeah, when I was listening to your book, as I stated before, I think one of the big takeaways there that I want my audience to hear is that if we start thinking about when we’re building these solutions, particularly analytics and decision support applications, instead of thinking about it as we’re trying to give the perfect answer here, or the model needs to be as accurate as possible, changing the framing to be if we went from something like a wild-ass guess, to maybe my experience and my intuition, to some level of data, what we’re doing here is we’re chipping away at the uncertainty, right? We’re not trying to go from zero to 100. Zero to 20 may be a substantial improvement if we can just get rid of some of that uncertainty because no solution will ever predict the future perfectly, so let’s just try to reduce some of that uncertainty. Did I get that right?
Doug: In the book, you might recall that I described three illusions that give us the idea that some things are immeasurable. I call them concept, object, and method: concept has to do with the definition of measurement; object has to do with the definition of the thing we’re measuring, we’ve kind of been talking about that; and then finally, the methods of measurement. How does statistical inference actually work when you don’t get to see the whole population, which is often the case. Under concept of measurement, we point out that measurement doesn’t mean an exact point. It hasn’t meant that for the better part of 100 years in the empirical sciences.
Ever since the 1920s when we started adopting more probabilistic methods and clinical trials and controlled experiments and so forth—and sampling methods—ever since then, we’ve been thinking about measurement as a quantitatively expressed reduction in uncertainty, based on observation. Now, we also know from game theory and decision theory that we can compute the economic value of the information. Even marginal reductions in uncertainty can be worth all sorts of value, depending on the context of it. What’s it worth for a bank to make 2% fewer loan-granting errors? You know, if they had 2% less errors in terms of rejecting what would have been a good loan or accepting what turns out to be a bad loan, what’s that worth to a bank? Well, it’s… millions a year easily, right?
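A back-of-the-envelope sketch of the bank example, not from the episode. Every figure below is hypothetical, and it reads "2% fewer errors" as a two-percentage-point drop in the share of loan decisions that turn out wrong, which is only one possible interpretation.

```python
def annual_value_of_fewer_errors(
    decisions_per_year: int,
    error_rate_reduction: float,
    average_cost_per_error: float,
) -> float:
    """Expected yearly savings from avoiding a share of wrong loan decisions."""
    errors_avoided = decisions_per_year * error_rate_reduction
    return errors_avoided * average_cost_per_error


if __name__ == "__main__":
    # Hypothetical bank: 50,000 loan decisions a year, each error costing
    # $5,000 on average (a bad loan accepted or a good loan rejected).
    value = annual_value_of_fewer_errors(50_000, 0.02, 5_000.0)
    print(f"Estimated value of the improvement: ${value:,.0f} per year")
```

The same arithmetic gives a rough ceiling on what the measurement effort itself is worth paying for under these assumed inputs.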
I once made a model for forecasting how much money new movies are going to make, and compared to forecasting models, I don’t think it was all that good. It had a correlation of point three between estimates and actuals. But their previous correlation was zero. In other words, you and I, picking the industry average every time would have done just as well as their experts in terms of forecasting outcomes. It’s as if their experts literally knew nothing about the thing that they’re supposed to be experts at—and by the way, this is not uncommon; this happens in many fields, where the experts literally don’t—by any measure, don’t appear to have any advantage over naive beginners. That difference between zero and point three for this group of investors for movies, well, it’s been worth millions of dollars since they’ve been using these, simply because they have less error, less chance of choosing a flop or rejecting what would be a, you know, blockbuster.
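A small simulation sketch of why a correlation of 0.3 can matter, not a reconstruction of the model described above. It assumes standardized, normally distributed outcomes and a forecast that is a noisy copy of the outcome. The function name, the number of projects, and the "pick the top 10%" rule are all hypothetical.

```python
import random
import statistics


def mean_outcome_of_picks(
    correlation: float,
    n_projects: int = 20_000,
    top_fraction: float = 0.1,
    seed: int = 0,
) -> float:
    """Average actual outcome of the projects with the highest forecasts."""
    rng = random.Random(seed)
    noise_weight = (1.0 - correlation**2) ** 0.5
    projects = []
    for _ in range(n_projects):
        actual = rng.gauss(0.0, 1.0)  # standardized outcome
        # Forecast built to have the requested correlation with the outcome.
        forecast = correlation * actual + noise_weight * rng.gauss(0.0, 1.0)
        projects.append((forecast, actual))
    projects.sort(key=lambda p: p[0], reverse=True)
    n_picked = int(n_projects * top_fraction)
    return statistics.fmean(actual for _, actual in projects[:n_picked])


if __name__ == "__main__":
    for r in (0.0, 0.3):
        print(f"correlation={r}: mean outcome of picks = "
              f"{mean_outcome_of_picks(r):.2f} standard deviations")
```

With zero correlation the picks perform like the average project. With a correlation of 0.3 the picked group lands noticeably above average, even though any single forecast is still far from accurate.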
Brian: Can you tell me a little bit about going back to the trust example, I think this would be a good—or you could use the movie one too—unless—we don’t need to go into the technical part about how you did the measurement because as you said, at that point, it becomes—it’s math, it’s data, it’s all of that, and I think my audience will know how to do that part. The previous steps is the part that I think is the kicker. How did you get to a measurement of the trust or a measurement of the—what was it with the films? It was perceived the economic—the box office receipts—talk to me about the early conversation there and what the decisions were about? What are we going to measure? What would be the things that would indicate a change? What can we see in the wild?
Doug: Well, first, my team and I, we’re always the outsiders in every industry. I mean—
Brian: Mm-hm. Yeah.
Doug: —we do consulting in aerospace, in biotech and pharma, and, you know, environmental policy, and we don’t know any of that. I can tell you a lot about potash mining in Canada now, or radar R&D in aerospace; I can tell you more about that than I knew before, but we’re always the outsiders so we end up asking questions, “Well, what things do you consider now? Why do you think that some things are higher than others, or lower than others?” Generally, they give us some indication.
So, there’s some rationale behind their current decision-making. That’s almost always true. There’s some rationale behind why they think some R&D projects are going to do better than other R&D projects, or some movies will make more money than other movies. But they’ve been doing all that math kind of in their heads, and that adds a bunch of noise and bias and error of all sorts. So, we find some things that give us the initial clue; they start us in the right direction.
There’s a few other things we tend to do; we start out with the assumption that no matter what we’re measuring, it’s probably been measured before. This is odd—I’d been having this conversation even just today—how rare it is for an analyst to consider the fact that there was prior work in a given field and there’s a bunch of articles on exactly that already, right? So, we didn’t start with the idea that we were the first people ever to try to forecast how much money a new movie would make, or what the trust of an individual is. And lo and behold, when we looked, there’s tons. People have been writing—almost anything you can think of, somebody has written a PhD dissertation on, or maybe there’s a whole journal dedicated to that for all you know.
Especially if it’s outside of your field, you don’t know if there’s a whole journal dedicated to that issue. Now, sometimes it ends up being pretty fluffy, but you’ll find out when you look, and when you do look and you find some gold nugget, you just look like a person who does their homework; you look prepared. A client appreciates that. They really appreciate it when you find articles from their own field that they hadn’t come across before. Because you’re the first person to just look, right?
So, I think that’s such a valuable thing. You know, when I started doing management consulting in the ’80s, I’d have to go to the library to look for this stuff. And it’s so much easier to do this now and still nobody does it. Almost nobody looks for academic research on almost anything they want to study. Are there previous measures of drought resilience? Are there previous measures of whether or not you should prioritize roads or schools or hospitals in Haiti? That was the second United Nations project I did.
In each of those cases now, we might find some previous studies and we may say, “Well, they’re flawed in some way,” or, “It doesn’t directly address the thing that I’m looking for,” but they almost always give us some ideas about the methods to employ. And sometimes they have data that’s directly relevant to that. When we were being asked to prioritize inner city school systems for additional assistance, these not-for-profit programs that provide assistance to inner city schools—some very large ones, actually—and they were wondering how do we prioritize schools for our assistance? Where should we send more resources to these inner city schools?
Well, some of that had been measured, actually, and parts of the question we were investigating were previously measured, so we got to use some of that data. So, this is really important. It’s been measured before; you have more data than you think, and you need less than you think. People just throw up their hands far too quickly when it comes to measurement problems. They just say, “Well, we don’t have enough data for that,” like, “Well, did you look? Tell me how much time you spent actually thinking about the problem or did you just give up too soon?”
There’s a line from Malcolm Gladwell’s book Outliers, that I tend to use a lot—and I’m paraphrasing a little bit, but he says, “A successful person is someone who’s willing to spend 20 minutes figuring out something most people give up on immediately.” Oh, yeah, that’s right. Exactly. That’s measurement problems, too. People just have to be willing to spend a little bit more time.
Assume there is a way to measure it, and the constraint is you just hadn’t thought of it yet. As opposed to starting right away. “Look, if I didn’t think of it right away, it must be impossible because I would have thought of it right away if it were possible, right?” Yeah, no, we should start with the idea that it’s not apparent to me right now, but I’m going to assume there is one, a method for measuring this, and we always find it.
Brian: I would imagine that the value—well, one of the things—corroborate this for me or disagree—the act of asking the right questions as the consultant or someone that’s in the services where we’re there to serve and assist people, which I think is true, even for W-2 employees, or wherever you’re classified working inside, if you’re in the services thing, this is the model you should have, that asking the right questions is probably more important than coming in with the answer. So, your ability to come in and speak film with the film people and use the measurements that you found in the research, like, “Have you looked at the leading actors’ names? Like, how many letters in leading actors’ names?” Or I don’t know, “Where was the shooting taking place?” Or, “What color are the background colors of the movie posters?”
Or—you know, and it’s not that you decided that those were important, but you knew to ask those questions because you did some homework on it, and it’s the act of asking the questions, letting them tell you, which of those might be interesting signals to look at. Is that more what it is than needing to be the expert on the specifics of that domain? Would you agree that the questioning is almost the more important part?
Doug: Oh, yeah. It’s the beginning. Absolutely, it’s the first link in the chain. Asking the wrong questions will—doesn’t matter how sophisticated the rest of it is, it’s going to take you nowhere. So, that’s the insight that I find it’s harder to convey to people because they need some insight, they need to engage their higher cognitive skills there.
I think most people kind of catch on to it after a while, but everything else we train is a lot more straightforward procedure. Like, we show people, here’s the spreadsheet templates, put your data here and get an answer there, that kind of stuff. It’s really cookie cutter. But how to frame the problem to begin with. That’s the bigger picture, you know, more frontal lobe type challenge, I think.
I mean, that’s the first big epiphany really, is to figure out that maybe the question isn’t even what I first thought it was, right? You should be willing to accept that sort of thing. Also—I had this conversation earlier today as well—but you’ve heard people, you know, especially in the customer service area, and so forth, you know, people in design, and so forth, they often come up with scores and scales to measure something about interaction, customer satisfaction, that kind of stuff, Net Promoter Scores and things like this, I think a lot of that is an unnecessary shortcut. It gives the illusion that you’re measuring something. Now, there’s a lot of cases where you can take that data and it actually correlates to observable outcomes in a way that’s pretty useful.
We took these CVSS scores from the National Vulnerability Database. These are scores about vulnerabilities that have been discovered in various operating systems and hardware and so forth, so there’s this giant database of these—and it’s a score of one to ten—and we did all this analysis and we found that there is a relationship between the score they have and the chance that a vulnerability would be exploited in a given environment. So, it’s not completely uninformative, but if you dig a little bit deeper, you find out that actually you didn’t need this score in the first place. Like, you have a credit score, right? Your credit score is based on a lot of data, and all of that data is boiled down to a three digit number.
Then the bank uses that three digit number, if they’re—more sophisticated banks do more than this, but in principle, they’re supposed to take this three digit number and based on that alone, they’re going to try to forecast your chance of non-repayment. Or really the future value of the loan because even chance of non-repayment is too imprecise; you could start just being late a little bit, or you could—maybe you make everything on time halfway through, and then all of a sudden, you can’t pay anymore. Or maybe you never pay a single installment. Those are all very different from a future-value-of-a-loan point of view—and they’re trying to forecast the future value of the loan—all of those possibilities are there. Well, when you take all that data that you’ve got about your history and boil it down to a three digit number, you’re kind of losing a lot, right?
And the same thing is true if you’ve heard somebody use not just Net Promoter Score, but maybe Innovation Index, or Risk Score, or things like this. My impression is more and more lately, I’m just starting to say that those are just adding more muck to the whole question. They’re not really helping you figure out what you’re measuring.
Brian: NPS is not particularly well liked in the user experience field. A lot of what I—we talk about measuring design efficacy, I’m much more of a fan of getting my students and participants and clients to go out and do qualitative research because we need to understand why the score is bad because we can’t make any change—we don’t know what to change about—we don’t know what to redesign to change the experience to make it better when we just look at the data. And we’re also only sampling the people that actually answered the survey. [laugh]. So, we want to talk to people who aren’t happy and, you know, that didn’t answer.
Doug: Yeah, there’s some math for that though. So like, for example, response rates tend to be different when you reach out to people directly. And if you don’t see a difference in response patterns when you reach out directly versus, say, emails or online surveys, that would usually build your confidence that your survey is representative of a larger population. If there’s a big difference between the two, then you might want to start thinking about, well, maybe I should sample differently. Or you can actually use that information and say, well, I know that there’s a systemic error because of our response rates, but I know the direction of the error, I know that people who are more likely to use our system, or dislike our system, or like our system are the ones more likely to respond. That’s useful information by itself. Just putting a lower bound or an upper bound on a range is very useful.
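A minimal sketch of how a response rate can turn into bounds, not the specific math used in the episode. It assumes the survey measures a share between 0 and 1 (for example, the share of satisfied users) and uses plain worst-case reasoning about non-respondents. The function names and the sample numbers are hypothetical.

```python
def worst_case_bounds(observed_share: float, response_rate: float) -> tuple[float, float]:
    """Bounds on the population share if non-respondents could answer anything."""
    lower = observed_share * response_rate       # every non-respondent says "no"
    upper = lower + (1.0 - response_rate)        # every non-respondent says "yes"
    return lower, upper


def bounds_with_known_direction(
    observed_share: float,
    response_rate: float,
    respondents_skew_high: bool,
) -> tuple[float, float]:
    """Tighter bounds when the direction of the response bias is known."""
    lower, upper = worst_case_bounds(observed_share, response_rate)
    if respondents_skew_high:
        # Non-respondents are no more favorable than respondents,
        # so the observed share is an upper bound.
        return lower, observed_share
    # Non-respondents are at least as favorable, so it is a lower bound.
    return observed_share, upper


if __name__ == "__main__":
    # Hypothetical survey: 70% of respondents are satisfied, 20% response rate.
    print(worst_case_bounds(0.70, 0.20))
    print(bounds_with_known_direction(0.70, 0.20, respondents_skew_high=True))
```

Knowing only the direction of the bias already cuts the worst-case range substantially in this example, which is the sense in which a single bound is useful information.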
Brian: Talk to me about setting those ranges. And particularly I like the example in the book where we anchor against the bias that happens when we throw out that first number and we’re anchored to that. So, you have a counter to that to prevent us from falling in love or getting affected by that first number. Talk to me a little bit about quantifying some of these things. How do we do that?
Doug: Yeah. So, we know that there’s a number of cognitive biases in people’s responses just in general, when they estimate things. Anchoring is one. So, it’s kind of interesting, but there’s a lot of times when people might have to sit down and estimate a list of things. Like, maybe they’re in software development and they’re trying to estimate, you know, the effort in a series of tasks.
And then I ask them, “Do you think if I put those tasks in a different order, that it should have any bearing on your estimates?” They go, “No, it shouldn’t.” But they realize as soon as I ask the question that they go, “Oh, but it probably does.” And it’s true. The arbitrary order of those estimates actually changes your estimates.
That’s an effect of anchoring, right? Because your answer to one question actually has a small correlation to your next question, regardless of the random order, right? If the first few questions at the top are the big task, then the rest of the tasks will tend to err on the larger side; if the first few tasks were very small ones, then the rest of them tend to be estimated to be smaller. So, if you want to manipulate somebody, that would be one way to do it. So, you have to be responsible with all these new powers I’m giving you, but that’s one thing you could do if you wanted to manipulate some of these responses.
I think other biases are obviously things like confirmation bias. People tend to seek out information that supports their previous beliefs. Even analysts who believe they’re being objective tend to do that. I assume I do it. By the way, you should assume you’re susceptible to all these biases because if you believe you’re impervious to them, then you have no defense against them.
You have to at least accept the possibility that you’re affected by them, so then you can take deliberate steps to guard against it, you can seek disconfirming data because you’re concerned about confirmation bias, et cetera. Also, here’s another bi—it’s not a bias, it’s an error. Bias is directional, right? Error can be just random.
So, inconsistency is a big part of expert judgment. They’re highly inconsistent. They just give different answers every time you ask them. And you can measure that. We’ve actually measured inconsistency for things like people who assess cybersecurity risk.
If you get a bunch of cybersecurity experts, and they’re assessing the probability that some event would occur in cybersecurity, and then you look at the variation of their responses, about 21% of the variation in their responses is just personal inconsistency, because we can measure that in their responses. So, there’s a lot of interesting biases like this. People even reengineer their own memory. They believe they used to believe something that they didn’t actually believe.
Brian: Give me an example of that.
Doug: All right, so you’ve heard somebody say, “I knew it all along.” They didn’t really. If somebody says, “Oh, this stock went down. I knew that all along.” They may honestly believe that they used to believe that, but studies have been done where people have been nailed down on original forecasts, like, that’s documented and then that’s kept from them.
And then later on after the events occurred or not, psychologists would ask them, “Well, did you anticipate this? Did you know that was going to happen?” And people say, “Oh, I knew that was going to happen. I knew that was going to happen.” Then they show their original responses that were documented; they had long forgotten those and they go, “No, they didn’t know that.”
It wasn’t always obvious; they forget in hindsight how little they knew in foresight. That’s the way Daniel Kahneman, the psychologist who won the Nobel Prize in economics, says it. Yeah, that happens quite a lot. I think we tend to remember—well, this is true, this is based on research—but we tend to remember when we’re right, and we tend to remember when other people are wrong. So, we all remember being above average forecasters.
It’s a kind of a Dunning-Kruger effect, if your listeners know that one. But the Dunning-Kruger effect is the more incompetent people are the ones that more overstate their skills and so it’s this weird paradox there.
But most people believe that they’re above average at forecasting, that they’re better than—well, obviously, that can’t be true, right? They can’t all be above average at forecasting. I had this hypothesis I always wanted to test. I think the events you’re most likely to remember are when you made a forecast that turned out to be right and it contradicted all your peers. You can bring that up and say, “Well, remember that time you all thought X and I said this, and I turned out to be right?” I think you’ll tell that story at Christmas parties for years. You’ll always remember that one time. And a couple of those kinds of things may be the basis of somebody’s entire self-image. Traders or so f—people like that who made a couple of good calls, they keep wearing that hat forever.
Brian: Well, it’s a status symbol and status is powerful. I mean, it’s a powerful thing—
Doug: Right. Right.
Brian: You know?
Doug: Well, in a sense that it’s deliberate. If somebody is deliberately using that to improve their status, that could be entirely rational. Someone could say, well, if I do X, Y, and Z that’ll improve my status with these people, and I would forecast these desirable outcomes in their behaviors. I’m not judging that. I’m talking primarily about the situations where people honestly believe that they always knew it all along, and they—or that they’re above average forecasters, and they’re really not. They just remember when they’re right more often than they remember when they’re wrong.
Brian: How do you deal with the fact when the thing that we need to measure to understand if quote, “We were right when we started out the project,” or the thing that we’re making, and we’re going to measure it later that the effort required could be significant, possibly because it’s a multivariate problem; it’s hard to know that this little particular thing we did had an overall impact on that final outcome. It could be perceived as being expensive, or it could just take a long time. So, a great example, like brand improvement, or like did the change of the interface design make a difference in the overall, you know, sales, the speed at which we closed deals on the new version of the product? Could you tease that out? Yeah, in theory, like, you probably could tease it out, but at the same time, we believe something about brand matters. That’s not even my particular area, but I’m using it because it feels like a very squishy thing that can’t be measured. Talk to me about this long period between the measurement and where we’re at in the creation of our thing.
Doug: Yeah, sure. Well, those happen in a lot of situations, there’s certainly long-term engineering projects or the development of pharmaceuticals or so forth, that go on for years, right? But that’s not the only forecast you get to check. You don’t have just the one big estimate for what are my sales for this new drug compound in 2040 going to be? No, you get data before that.
You can get observations on how long this particular phase of the R&D process is going to take and how much that cost, and the growth or change in patient populations, or the growth in sales of other products that are related to this. You get all of these little estimates, almost real time, many of them where you get lots of immediate feedback. You can take advantage of testing your estimates against things that have much more immediate feedback. Sometimes people think, “Well, I don’t really have a lot of data on how long this project is going to take.”
Well, you have a lot of data on the performance of different project forecasting methods. That’s where we’ve got tons of data, by the way. We have lots of data on that. And if we use the methods that performed best, it gives me this range. I mean, that would be tons of data.
I think people routinely believe they have a lot less data than they really do. They tend to believe that each situation is more unique than it really is. I mean, ‘more unique’ is not even grammatically correct; it’s unique or not, right, but they believe each thing is unique and because it’s unique, they believe that you can’t extrapolate anything from prior observations. If that were really true, your experience means nothing.
Brian: Do you think of these as project—I call these progress metrics versus success metrics? Like, what can we measure along the way in the world as we’re going, so we don’t, you know, over invest in something that takes a long time to measure and know? Are those two categories of things or are they all the same whether it’s a progress or success metric? Is it the same thing? Or is it a different class of things?
Doug: Do we have some reason to believe historically, based on other experiences, that progress metrics correlate with success metrics, right? So, when a doctor prescribes a drug to you, they might say, “We have no data on how well this drug is going to work with you.” They might even have to wait a long time. Maybe the drug is going to slightly reduce or greatly reduce your chance of heart disease in the future because you have a family history. So, you won’t really know whether this is working or not, if you don’t die of a heart attack in the next ten years.
You know, you might say that there’s no reason to believe that, but there’s already been longitudinal studies that go on for decades. So, you actually have a lot of data about that drug on other people, okay? Likewise, you have lots of data—I would presume in your field, whichever it is; I would be surprised if there weren’t studies on this—where people tracked past projects and their progress metrics correlated with success metrics. It’s obvious that they should because if the progress metrics are terrible, then obviously, that’s going to affect success metrics, right? That’s not proof by itself that the success metrics will be good, but it certainly is evidence, right?
Brian: Yes. Let me ask you, though, a good example of this would be, you know, when we’re designing software technology—data-driven, or otherwise—our progress metric might be we showed it to the sales team, or we showed it to leadership; they feel like it’s on track. We’re getting good verbal, qualitative feedback that, “I can sell this. I can use this. This would make my job so much easier.”
We’re going through iterations and hearing this feedback versus empirical data from studies or whatever that may be factually accurate, it’s sourced from other places, there’s a lot of it, but it would take us a lot of time to go out and do that, or we’d have to resource someone to do it versus this is what the stakeholders believe right now, which is I’m hearing good things from the head of sales. I’m hearing good things from the business sponsors that they like where this is going. If they believe that does that matter or not? Is that even more powerful than what the data would say about it, that a project is on track, let’s say, a long-term project that has, you know, high risk but high reward to, let’s say, is it on tra—this technology initiative on track—for quality; not from a time perspective, but are we headed the right way? Is continuing this investment smart?
Doug: Yeah, sure. Well, remember, again, when you have a lot of uncertainty, that’s exactly when you don’t need a lot of data to reduce it significantly. That’s the general rule of thumb here. When you—like I said, when you know almost nothing, almost anything will tell you something. If what we’re trying to improve upon is just the subjective judgment of the stakeholders, all the research today—and by the way, there’s—here’s another area where there’s tons of data; there’s tons of data where naive statistical models are compared to human experts—and—literally hundreds of studies like that—and the consistent finding is that even naive statistical models outperform human experts in a surprising variety of fields: disease prognosis, and the outcomes of sporting events, or the failure rates of small businesses, or how long married couples will stay together, that sort of stuff.
Even in areas where human experts might insist that there’s no way that an equation could outperform them, the equation does. I remember running into this when I was doing the movie forecasting, when I was working with analysts that were in the movie industry. They all perceive themselves as artists and things like this, and they had a hard time believing that a simple equation could ever capture—literally, they used this term—hundreds of variables that they consider in a holistic network. And they had—they imagined this extremely complex cognitive process going on. Apparently, my model with 11 variables was better than they were.
It was a pretty simple linear model. With 11 variables. Remember, they were the ones that had the correlation of zero between the original estimates and the actual outcomes; my model had the point three. There’s this term called the illusion of cognition. So, people imagine that they have these really elaborate cognitive processes for all their judgments and that’s why they have a hard time imagining a simpler equation, doing what they do.
No, the simpler equation doesn’t do what they do, but apparently, a model where a simple linear equation is being used outperforms them because what’s really going on in their heads are some rules of thumb and a bunch of noise. That’s what’s really happening when you’re making these subjective judgments. There’s a few areas where human judgment is better than an equation, but—I don’t know how old your child is there, but the areas where humans are better are where he or she is better already: pattern recognition, visual and audio, like being able to spot his mother’s face in a crowd or hear her voice in a crowd, even some software has a hard time doing that right now, right? Or even a robot just navigating a roomful of toys with Legos on the floor and stuff like this. A robot has a hard time doing that; your child does it much more easily than the best, you know, Boston Robotics robot, right? That’s where humans are good. So, everything that humans are good at in preschool, right? All the other stuff, when we start synthesizing lots of data, we should probably just do the math.
Brian: Tell me—and I want to hear where people can get your books and follow you here, but I wanted to ask, with the progression of machine learning today, where it’s being used so much more than it used to be, I know one of the challenges is it can surface realities that are contradictory to what people have believed for a long time, that oh, this prediction is highly correlated with these factors that we’ve never bothered to pay attention to. And it’s a hard pill to swallow sometimes. Can you talk to me about when machines come up with these factors versus our stakeholder or the person we’re interviewing or serving is the one helping us understand, well, what measurements can we see in the wild that we care about? All of a sudden the machine is finding these things? What’s different there? Just expound upon that a little bit?
Doug: Yeah. I think it needs to be an important type of input for our judgment and decision-making. That’s different than saying that I should accept it, necessarily. I think we should still be skeptical. I mean, if something just doesn’t make sense, you investigate it, but also don’t reject it out of hand, right?
Use it as a clue. Maybe there is something there that I should have been paying attention to before, right? That’s what we do with the data. We don’t necessarily accept it automatically. And even if we never really figure out what the underlying mysterious process is, we can still observe empirically, whether or not it actually is better at predicting some outcome, right?
We can start testing at that point and see, you know, now that I have this model, did it just overfit some historical data and come up with some, you know, weird parameters I’m supposed to pay attention to? If that’s the case, then it’s not going to be any better at forecasting the future than maybe something random or just me guessing or something like this. But if it’s actually better, then maybe there is something there, right? So, I say we should be, you know, hopeful for that sort of thing. Not gullible, but not too skeptical, either. That’s the balancing point.
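The check Doug describes here, whether a model forecasts better than a guess on outcomes it has not seen, can be sketched in a few lines. This is only an illustration and not a method from the conversation: the function names, the choice of mean absolute error as the score, and every number below are assumptions.

```python
from statistics import fmean


def mean_absolute_error(forecasts, actuals):
    """Average size of the miss, ignoring direction."""
    return fmean([abs(f - a) for f, a in zip(forecasts, actuals)])


def beats_baseline(model_forecasts, baseline_forecasts, actuals):
    """True if the model's held-out error is smaller than the baseline's."""
    model_error = mean_absolute_error(model_forecasts, actuals)
    baseline_error = mean_absolute_error(baseline_forecasts, actuals)
    return model_error < baseline_error


# Hypothetical held-out outcomes that were not used to build the model.
actual_outcomes = [12.0, 7.5, 9.0, 15.0]
model_forecasts = [11.0, 8.0, 10.0, 13.5]
# Naive baseline: always predict the historical average (assumed to be 10).
baseline_forecasts = [10.0, 10.0, 10.0, 10.0]

print(beats_baseline(model_forecasts, baseline_forecasts, actual_outcomes))
```

A model that only overfit its history would tend to lose this comparison on new data, which is the signal to stay skeptical of its "weird parameters".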
You’ve made a few points, though, about how difficult some things might be to measure, right? That some things might take a while to measure to get a lot of data on, and we talked a little bit about how there really is a lot more data than you think when you cast your net a little bit wider and look at lots of, you know, analogous studies and cases and situations. But also, I think it’s informative to actually compute the value of information. That’s a calculation. You can compute the economic value of uncertainty reduction.
Sometimes I’ve heard people say, “Well, X is difficult to measure,” then I compute the economic value of it. And I say, “Well, it’s worth, apparently, two-and-a-half million dollars a year to measure. Does it still sound difficult?” Well, no because if you put it in that context and you’re only talking about spending a hundred grand a year to measure something, well, now that sounds like a bargain. Previously, all they heard was the hundred grand.
So, they thought, “Well, that’s too much to spend on a measurement.” Well, no, look at the size and the uncertainty of the bets you’re making continuously here, and this equation says it’s worth two-and-a-half million dollars a year to measure. That puts a whole different spin on those things. So, how difficult something is to measure should be expressed monetarily.
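A rough sketch of the kind of calculation Doug is referring to: the expected value of perfect information is the gap between the payoff you would expect if you could pick the best action in every scenario and the payoff of the best single action chosen under current uncertainty. The go/no-go decision, the 90% interval, the normal distribution, and the function name are all hypothetical choices made for illustration; they are not figures or code from the interview.

```python
import random
from statistics import fmean


def expected_value_of_perfect_information(payoffs_by_action):
    """payoffs_by_action maps each action to its payoff in every simulated scenario."""
    actions = list(payoffs_by_action.values())
    scenario_count = len(actions[0])
    # Today: commit to the one action with the best average payoff.
    best_without_info = max(fmean(payoffs) for payoffs in actions)
    # With perfect information: pick the best action scenario by scenario.
    best_with_info = fmean(
        [max(payoffs[i] for payoffs in actions) for i in range(scenario_count)]
    )
    return best_with_info - best_without_info


# Hypothetical bet: net payoff of "go" has a 90% interval of -$2M to +$6M.
rng = random.Random(42)
lower, upper = -2_000_000, 6_000_000
mean = (lower + upper) / 2
std_dev = (upper - lower) / 3.29  # a 90% normal interval spans about 3.29 sigma
go_payoffs = [rng.gauss(mean, std_dev) for _ in range(20_000)]
simulated = {"go": go_payoffs, "no_go": [0.0] * len(go_payoffs)}

evpi = expected_value_of_perfect_information(simulated)
print(f"Upper bound on what a measurement is worth: ${evpi:,.0f}")
```

Comparing that figure with the quoted cost of a measurement is what turns "too expensive to measure" into a monetary question.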
Brian: My last question for you, what’s the role or value then of our individual expertise? I mean, I’ve been a designer for 25 years. I’d like to think that I know something about that. I also like to be objective. I love when I find out things are wrong with my design, through the process of testing and evaluating with users. That’s when I learn something, but when does my experience matter?
If I’m a CEO, I don’t have guarantees in all my decisions. “And I’ve been selling boilers and coolers for 45 years, Doug. Don’t tell me X, Y and Z.” There’s got to be some value to that intuition, but you challenge that a little bit, I think on our previous call. What’s the place to rely on my gut and my intuition? Where’s the time for that, especially in business?
Doug: Yeah. Well, first off, there’s basically two areas to rely on gut. One is when you make your initial estimate. When we build these models, these statistical models, we use Bayesian methods, and your viewers will be familiar with that, right? Which means you have to state a prior.
Well, we require subject matter experts to state their prior. The additional data just updates those priors; we need to get them calibrated because most people are terrible at quantifying their initial uncertainty, so they should be trained to do that. We’ve got methods for doing that. In fact, we’ve automated calibration training, it’s entirely automated now. You can do it self-paced.
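To make those two ideas concrete, here is a minimal sketch under stated assumptions. The first function scores how often an expert's stated 90% intervals contain the true answer, which is one simple way to check calibration. The second applies a textbook beta-binomial update as a stand-in for "data updates the prior"; it is not presented as the specific model Doug's team uses, and all names and numbers are hypothetical.

```python
def interval_hit_rate(intervals, actuals):
    """Share of stated intervals that contain the true value."""
    hits = sum(low <= actual <= high for (low, high), actual in zip(intervals, actuals))
    return hits / len(actuals)


def update_beta_prior(prior_alpha, prior_beta, successes, failures):
    """Beta-binomial update: add observed counts to the prior's pseudo-counts."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical calibration quiz: stated 90% intervals and the true answers.
stated_intervals = [(10, 20), (100, 300), (1, 5), (40, 60), (0, 2)]
true_answers = [25, 150, 4, 75, 1]
# A calibrated expert should land near 0.9; far less suggests overconfidence.
print(interval_hit_rate(stated_intervals, true_answers))

# Hypothetical prior: the expert thinks roughly 20% of projects succeed.
alpha, beta = update_beta_prior(2, 8, successes=3, failures=7)
print(beta_mean(alpha, beta))
```

The expert's judgment sets the starting point, and each new observation moves the estimate only as far as the evidence warrants.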
But the other thing is, there’s currently as far as I know, there are no algorithms for how to frame the question, or what questions to ask. Maybe AI will get there; it’s not there now. Right now, the really hard thing for algorithms to do, like, how to frame the question, what’s the nature of the problem? How do we define it? That’s where we need the human, okay?
But when you start synthesizing a lot of data, we really should just stick with the algorithms, basically. Now, that conversation that you just mentioned, “I’ve been selling, you know, water heaters for 45 years,” or whatever. Yeah, I’ve heard that plenty of times, right? So, it’s usually, those are not the people who hired me; their bosses hired me but I have to talk to them as subject matter experts, and they get, you know, “I’ve been doing this for 45 years.” Well, it turns out that experience is not a guarantee of performance improvement.
You need three things for experience to convert into learning. And they don’t often exist for a lot of areas of management. You need feedback, and that feedback needs to be consistent—meaning you get feedback almost all the time—it needs to be relatively soon—immediate even, not two years down the road—and it has to be unambiguous. Now, there’s a lot of areas of management where you never get that. In areas of oncology and disease prognosis for cancer patients, how well they will do in different treatments, the correlation between prognosis of experienced oncologists and observed outcomes is -0.01. About the same as random chance; slightly worse than random chance.
But their confidence is very high. And why is that? Well, because they get inconsistent and delayed feedback. They don’t get feedback immediately. Like, if you’re shooting baskets, or bowling, or something like this, you get immediate feedback, right, on those situations. You just wait seconds, and you got feedback.
And if you track them all, and start keeping track of your score over time, you can see that. You know, so if you hop on a scale and say is my diet working? If you’re using that you’re getting some unambiguous feedback, and you can get relatively quick feedback every time you hop on the scale, right? But if you’re in a profession where that doesn’t seem to exist—which seems to be the case more, the higher up you go in management; I mean, if people are making judgments about big projects that go on for years, by definition, they’re not going to get immediate feedback. They made all sorts of forecasts about how much money some new product is going to make.
They’re not going to get that feedback. Even if they ever do get the feedback, it’s going to be highly delayed. And sometimes, even when they could have gotten the feedback, the feedback is something like, “Project was successful,” or moderately successful. That’s not going to cut it. In order for people to improve their forecasting performance, they need unambiguous feedback as well.
So, it’s not the case that experience necessarily converts into learning. The two researchers on this were Daniel Kahneman and Gary Klein. They wrote a paper together about what is required for experience to turn into learning; that was their conclusion.
Brian: Yeah. Another good framing, I’ll leave on that topic was, you know, “Oh, you have 25 years experience? Yeah, but you have one year of experience lived 25 times over.” [laugh].
Doug: Yeah, right. I’ve heard that one lately.
Brian: Like, how much learning [crosstalk 00:43:00]—
Doug: That’s a good one. You know, people—nobody—
Brian: That’s a Jared Spool—
Doug: —wants to believe that their years of experience may not add up to that much, right? Now, so what’s happening here is there are things they did learn, but it’s not about the forecasting. They learned something about office politics; they learned something about procedures, and compliance, and how certain customers are and so forth, but when it comes to actually forecasting outcomes for long-term projects and so forth, they’re not necessarily any better than brand new, you know, graduates from college or something like this. Certainly brand new oncologists are about as good as their seniors at prognosis of cancer patients. By the way, neither the junior nor the senior ones want to believe that.
Neither group believes that. The junior group wants to believe that the senior people know more. And the senior people want to believe that they know more. But apparently they don’t. On that particular problem.
They might remember a bunch of random facts about, “Here’s a drug interaction I should think about,” “Here’s a nurse that’s very reliable.” “Here’s a procedure that, you know, seems to work well in that particular condition.” But when you isolate a particular forecast—which is what a prognosis is—well, then it’s not necessarily a given that their experience informs that task.
So, there’s a lot of tasks involved, right, and some of their tasks are forecasting. There may be a lot of things they’ve learned about other tasks, but it’s not necessarily the task related to forecasting.
Brian: Well, this has been, Doug, a great conversation. I want to invite people to check out the book, How to Measure Anything. You have other books out there, too. Where’s—and Liam even likes the book, apparently—where’s a good homebase for people to find out about your writing, and your whole Applied Information Economics, where’s the place to go?
Doug: Certainly, hubbardresearch.com where the AIE Academy is. So, AIE Academy—Applied Information Economics Academy—there’s a whole list of courses that people take. Also you can find all my books on Amazon of course, but if you buy them off our website, I’ll sign them for you.
So, I figured my signatures were worth five or ten bucks, whatever we added to that. So, the books are out there. I’ve got multiple books now. They’re in eight languages, over 150,000 copies sold. And we work in a variety of industries. I love to hear people say, “I know you’re really a measurement guru, but here’s this one thing I know you can’t measure.” I love those challenges.
Brian: [laugh]. Excellent. You should have a ‘Submit a Challenge’ button on your website, you know, [laugh] and just start a podcast and go over each one. Like, that would be a fun show to listen to.
Doug: Yeah, I’ll do one of those.
Brian: ‘Stump Doug,’ you know? [laugh].
Doug: Exactly, yes.
Brian: Excellent. Well, Doug, this has been super fun. Thank you so much for your time today.
Doug: You bet. Thanks.