Angela Bassa is the head of data science and machine learning and the director of data science at iRobot, a technology company focused on robotics (you might have heard of the Roomba). Prior to joining iRobot, Angela wore several different hats, including working as a financial analyst at Morgan Stanley, the senior manager of big data analytics and platform engineering at EnerNOC, and even a scuba instructor in the U.S. Virgin Islands.
Join Angela and me as we discuss the role data science plays in robotics and explore:
- Why Angela doesn’t believe in a division between technical and non-technical skill
- Why Angela came to iRobot and her mission
- What data breadcrumbs are and what you should know about them
- The difference between technical skills and non-technical skills and the skill Angela believes matters most when turning data science into a producer of decision support
- Why the last mile of the UX is often way longer than one mile
- The critical role expectation management plays in data science, how Angela handles delivering surprise findings to the business, and the marketing skill she taps to help her build trust
Quotes from Today's Episode
“Because these tools that we use sometimes can be quite sophisticated, it's really easy to use very complicated jargon to impart credibility onto results that perhaps aren't merited. I like to call that math-washing the result.” — Angela
“Our mandate is to make sure that we are making the best decisions—that we are informing strategy rather than just believing certain bits of institutional knowledge or anecdotes or trends. We can actually sort of demonstrate and test those hypotheses with the data that is available to us. And so we can make much better informed decisions and, hopefully, less risky ones.” — Angela
“Data alone isn't the ground truth. Data isn't the thing that we should be reacting to. Data are artifacts. They're breadcrumbs that help us reconstruct what might have happened.” — Angela
“[When getting somebody to trust the data science work], I don't think the trust comes from bringing someone along during the actual timeline. I think it has more to do with bringing someone along with the narrative.” — Angela
“It sounds like you've created a nice dependency for your data science team. You’re seen as a strategic partner as opposed to being off in the corner doing cryptic work that people can't understand.” — Brian
“When I talk to data scientists and leaders, they often talk about how technical skills are very easy to measure. You can see them on paper, you can get them in the interview. But there are these other skills that are required to do effective work and create value.” — Brian
Brian: Welcome back to Experiencing Data. Brian here, of course, and I'm happy to have the Head of Data Science, Data Engineering, and Machine Learning at iRobot on the line, Angela Bassa. How are you?
Angela: I am great, Brian. How are you?
Brian: I'm doing great. What's shaking today? You're up in northern Massachusetts, outside of Boston, is that correct?
Angela: Yep, just outside of Boston.
Brian: Yes. You're in the leaf, the leaf zone, probably.
Angela: It's gorgeous out! We're in peak foliage. It's really, really quite gorgeous out.
Brian: Nice. Excellent, excellent. I haven't had anybody on Experiencing Data yet, as far as I know, that is working in the robotics space as we talk about data science, and data products, and how we use analytics and information to improve you know end-user customer experience. And I was enjoying some of your tweets and you came into my radar. And I think one of our past guests connected us and I was excited to just get your take on this world. And so I'm really glad you could come on the show and share some of your ideas. So tell the audience, our listeners, tell us a little about iRobot and your work there, and how does a mathematician fit in with all that?
Angela: Yeah, so thank you so much for having me. I'm actually a fan of the show, I've listened to quite a few episodes, not just in prep. But I actually, because one of the things that's really big for me in terms of how does one ethically interact with data has a lot to do with how you communicate it, and how you structure that exchange of information without sort of math-washing things and just assuming truth, without really interrogating the potential biases and second and third-order effects. Which is a lot of what you discuss as well in terms of articulating that within an enterprise analytics consumption environment, so there's a lot of overlap.
Angela: In terms of the robotics space, there's not quite as much overlap.
Brian: Uh huh.
Angela: But I also don't want to be projecting as if I come from a deep robotics background; I don't.
Angela: I am sort of one of the biggest fangirls here when I see the folks who actually work on the robotics end of the technologies.
Brian: Uh huh. Mmhmm.
Angela: So my end is closer to the math end.
Brian: Uh huh.
Angela: Is helpful because that's my background.
Angela: So, I started my career-
Brian: Actually, before you go into this,
Angela: Alright, go ahead.
Brian: I have to ask you a question. I love this word and I have no idea what it means. What is math-washing? This is such an awesome term. I love it and I don't even know what it is. But I just immediately pictured Listerine, but it says "Math Wash" on it.
Angela: Yes! Right?
Brian: What is this?
Angela: And it makes you think like everything is better, and it's just cosmetics. So it's not my term, I didn't come up with it, and I want to say it's a term that Fred Benenson sort of coined, but don't quote me on it because I don't have perfect memory. But it essentially means that, because these tools that we use sometimes can be quite sophisticated, oftentimes they aren't but they can be, it's really easy to use very complicated jargon to impart credibility onto results that perhaps aren't merited. And I try, I like to call that math-washing a result.
Brian: Mm. Awesome. I mean kind of not awesome, but I get the point.
Angela: Well, it's good to label it, right? So you can call it out.
Brian: Yeah. That's, that's interesting. But I totally interrupted you. So you were kind of jumping into how you were saying you're less on the robotics end, but you're working in a robotic environment and how you fit into that. So, sorry for that, but I had to jump in there. But please continue.
Angela: Yeah, no, of course. So, in terms of how the field of data science and data analysis sort of overlaps, as I mentioned, there's a ton of really brilliant people here. And iRobot is quite a well-established company, it's you know got almost 30 years. So they have brilliant algorithmists and roboticists and technologists, but our product as a connected consumer product is a more recent development for the company.
Angela: And so when they launched, they sought me out to come in and establish sort of a data science discipline within the company to leverage that fleetwide data-ingestion point and knowledge potential that they had at their disposal. And so we, our team is sort of, our mandate is to make sure that we are making the best decisions, that we are informing strategy rather than just believing certain bits of institutional knowledge or anecdotes or trends. We can actually sort of demonstrate and test those hypotheses with the data that is available to us. And so we can make much better informed decisions and, hopefully, less risky ones.
Angela: So that's the promise, and that's how it fits into the robotics space. And it's really important to be really, really thoughtful about that analysis. Because one of the things that I often say on Twitter and to whomever will hear me, sort of on that vein of math-washing, is that data alone isn't the ground truth. Data isn't the thing that we should be reacting to. Data are artifacts. They're breadcrumbs that help us reconstruct what might have happened.
Angela: And so that's what I try to do, is make sure that we are being respectful of the biases that are potentially influencing us and making us think we're going where the data is pointing, when sometimes that's not what we're supposed to interpret.
Angela: So that's sort of a long-winded answer of how can robotics benefit from this in our space.
Brian: Right. I'm curious, you've been there about two years, is that right?
Angela: Almost three, yeah.
Brian: Almost three years. So, when you came in, and it sounded like you were kind of starting this department, so to speak?
Brian: What was the problem or the gap or the need that the business said, "We need to go out and hire an Angela, and find the Angela"? Or was it a, "We're not really sure, but we have all this telemetry, you know, coming from our devices and there's potential to do something with it and we need someone to come in and help us figure that out"? Or it's, "We know we want to do X, and we need this particular expertise." What was that setting?
Angela: I think, yes. And, like most of these sort of strategic decisions, you sort of have a vision of what you hope to accomplish. But hopefully, you'll also have the humility to know that if you're seeking somebody out, it's because you need them to help you figure out what it is that you don't know yet. So I think it's probably a blend,
Brian: Uh huh.
Angela: Appropriately so.
Angela: So they, they obviously had folks who understand data and understand science. Right? The data and science are sort of very present in most of [inaudible 00:07:37] so sometimes they are the most difficult audience because they're like, "How are you doing it differently than I do it? I look at numbers.
Angela: I test hypotheses. Right? Like what's different?"
Angela: And so I think what we bring as data professionals, and what I tried to bring here and stand up an organization that reflects these values, is we bring industry best practices. We bring rigor, and we bring certain tooling that is specific to that domain. And so if your domain is pharmaceuticals, you get really good at understanding how assay machines work, especially early in the, in the process. If your gig is marketing, you can become really good at understanding sort of surveying instruments.
Angela: And so for data science, when you're doing that at scale, and when you're doing that in a cutting-edge platform like iRobot's, I mean, iRobot is really at the tip of the spear of the serverless architectural movement. And so to build a data scientific function on top of that kind of architecture takes a certain level of experience and know-how that I think a lot of technologists... part of what makes us great, part of what makes us amazing, is this hubris of, like, "Yeah, I can invent that. How hard could that be?"
Angela: But the flip side to that can be a little bit of arrogance, of not valuing just how hard it really is. And I think that's one of the things that's amazing about iRobot, is that that hubris is couched in a lot of humility. And that's what I encountered here, which I think is why we're doing [inaudible 00:09:16].
Brian: I think at some point you self-identified as a professional nerd. And so I wanted to ask you, what's the role of a professional nerd in a company like iRobot? And are you surrounded by them? Or do you feel like, "Hey, I'm a nerd and everyone else isn't"? Or, "I'm surrounded by them and I'm with my peeps"? Like where... How does that
Brian: How does a data science role like at iRobot fit in if it's... I mean, I'm sure you have lots of different departments, right? Not everyone is a roboticist, but what's that vibe like?
Angela: Oh, yeah. So, you know it takes an entire village. So we have amazing folks with marketing understanding, and amazing folks with logistics understanding and manufacturing, and folks who really understand product and who really understand hardware and software, who understand the cloud and who understand algorithms, who understand machine learning. So it really runs the gamut.
Angela: And in terms of nerdery, I think there's seven billion types of nerds. Right, like we're all nerds. And I just get to be one who gets paid to think for a living as well, which is you know an enormous privilege.
Brian: Yeah, yeah.
Angela: But I think that's part of the reason that data science has flourished so much here at iRobot,
Angela: Is because there is sort of a game recognizes game thing that nerds do to each other, that once we started accelerating everybody else, once we started making it easier to incorporate learnings. Right, like that feedback loop can take a really long time. Or, you can shortcut that and really get feedback on how well or poorly are your decisions guiding your outcomes.
Angela: And so I think that's part of what's happened here. But, you know, it's, it's because this team has done the hard work of making sure that we are communicating what it is that we do with that humility of knowing, "This is what we do know. And this is what we're not going to pretend we know."
Angela: And that way, you can take on the right risks. Because not everything is risky; right, like you can then narrow down the playing field of which are the bets that you are willing to make, and which are the bets that are likely to pay off and how so.
Brian: You just used the word like how our decisions are facilitating outcomes. Right? Can you talk to us about, and this is something we talk about a lot on this show, but the difference between creating outputs and creating outcomes? And how do you decide how you're going to measure an outcome, and do you follow through with that evaluation, with the business sponsors that you work with? Can you kind of unpack this output versus outcome thing for us, and how that works at iRobot?
Angela: Yeah. I think now, we're in a really good place where, because we're victims of our own success, we have a lot of folks who want us to help, to play a consultative role. And in that prioritization, it's easier to say yes to things that will actually drive decision making. But that's sort of, that's the end stage you want to get to. And it's hard to get there.
Angela: And the thing that I believe works in getting a team to that point where you get to work on the 80/20, right? In the beginning, especially when you're still establishing the role, establishing the capability, it's you know it's a substantial investment. It's not cheap to stand up a team like this. And so you want to make sure that you're creating opportunities so that serendipity can cross your path, and you can find those pockets of institutional knowledge that aren't well-documented where you can really make a difference. And so those are hard to find, and you need to create space for them to bloom. But it's really important to deliver ROI really early on.
Angela: And so, in addition to making sure that there's that space for intelligent, sort of high-return bets to be placed, the big initial set of activities really needs to be about instrumenting everything, creating metadata everywhere, so that you can actually figure out who's looking at what and for how long.
Angela: And that way you can either figure out, "Okay, this is taking too many people too long," so you can optimize it. Or, you can find the pockets where nobody's looking, and you can start to go, "Okay, why? Is it too hard to understand what's happening here?" And that way, you can even guide where decision points can happen that haven't been looked at yet.
Angela: But initially, having just exploration is helpful in that it drives the effort towards a decision point. But just knowledge for knowledge's sake, if it's not going to make anybody choose different, or be more confident in the choice they make, then it sort of feels a little bit like wasted effort.
Brian: Mm-hmm. Can you talk to us a bit about showing value early? So, whether this is like prototyping or, what is it that you do to structure a project such that you can deliver the smallest amount of measurable value early to build trust? And it sounds like you've created a nice, almost like a dependency or a... You're dependable. The team is looked at as strategic and like an asset and not kind of off in the corner doing cryptic work that people can't understand.
Brian: Instead, you're like partners now. You must have had wins along the way to do that and I'm curious, how do you do that fast? And how do you do it... Or do you do it like incrementally? Because you did mention there's a sizable investment upfront, but I'm sure people are, upstairs are looking for some type of ROI soon that they can attribute and say, "Yes, this was a good investment." Can you talk about that like process?
Angela: Yeah. I think that in, and it being a substantial investment has really to do with what kind of instrumentation is already in place. So if you don't have a lot of data that needs munging, then your return is going to be really difficult to materialize if you make a large investment in talent, but you don't really have anything for them to science on. Right? Really getting a good survey of the data landscape to understand what your initial investment is going to be is really important.
Angela: But assuming you have already sort of some kind of architecture in place that's already piping in information, and you want to start extracting value from that and have it be, as you mentioned, an asset and not just an additional source of potential confusion later on when nobody knows why things are in the schema that they are and so forth, I think the first part that's important is getting as much standards, standardization, in place so that you can get your information to be apples to apples.
Angela: So, for us, one of the things that was really important was changing the culture from a culture where the only people looking at those breadcrumbs are the people who left them in the first place. So they can have a lot of shorthand and things that they have in memory, their own human memory, that somebody else looking at it won't know. And that creates all sorts of latencies and hops that you need and facilitation that you need to actually find the person who knows the answer to the definitional question that you have.
Angela: So getting really good practices in place, documentation in place, being your own training data, so that it's much, much easier to leverage that human investment once you make it. And I think that was one of the things that iRobot did that was really smart, is they, they were very diligent in how they started standing up the practice.
Brian: Are you saying that this infrastructure was well on track to be in place, or was in place when you came in the door? Or they needed you to help with... that was part of your work when you came in the door?
Angela: Yeah, some of it was already in place. I mean, the first cloud-connected robot that iRobot sold was late 2015, and I came in 2017. So there was already a team in place that was making sure that all of that was working and working well. And my role was then to come in and extract insight from what we, what we had made available to ourselves. There was an architecture in place, but what ended up happening is we were now being able to iterate and iterate towards a better solution, given what we thought we wanted to do.
Angela: When I say that our team was really diligent, even as it predated my joining, it's that we didn't over-engineer. Right? We didn't over-engineer all of our solutions for all of the possible permutations of platforms and tooling.
Angela: Because if you want to stay super flexible, then that means you add a ton of unnecessary complexity.
Brian: Mmhmm.
Angela: When I came in and the team started growing and we started making certain standardization decisions, we were able to simplify our architecture, or bring in different criteria to evaluate potential solutions or potential areas of additional investment.
Brian: Mmhmm. Can you tell us a little about the skills required that are non-technical in terms of... so you walk into this environment where it sounds like there's some rough plumbing in place, there's some data in place. You're coming in the door to deliver insights back to the business, part of which are, "We have X, Y, and Z question we want answered" and then there's like, you know, A, B, C thing, which is, "We don't know what it is, but you're wearing the problem-finder hat. Like let's go find some interesting questions to answer."
Brian: What is, what's required in that skillset, and is there something you teach or do or look for in your hiring to deliver that skill back to the business? Because it's not a natural fit for a lot of... when I talk to data scientists and leaders, they often talk about the technical skill is very easy to measure, and you can see it on paper, you can get it in the interview. But there's these other skills that are required to do effective work and create value. So what's, what do you look for there?
Angela: Yeah, I would actually push back and challenge the assertion that technical skills are easy to see.
Angela: Because I think the way we talk about, and this is just a personal pet peeve,
Angela: But the way we talk about technical versus non-technical skills, I actually I don't buy the premise. I don't even think that that distinction is really helpful.
Angela: Because if you know how to structure a particular line of code or query, but you don't take into account the fact that by doing it in one way or another, you might not have accounted for a particular assumption.
Angela: So, let's say a data set is intended for one use and you're repurposing it. And you forget to account for the fact that the way data was imputed for its original purpose invalidates the hypothesis for this repurposing. If you don't think about that, sure, your code can be pristine and the code will compile. And that's worse. Because that's the math-washing? Right,
Angela: Like that's when you get an answer that doesn't reflect what you thought you were asking.
Angela: And so I think divorcing the "technical" from the non-technical
Brian: Mmhmm.
Angela: Isn't helpful. Because it's all problem solving,
Angela: And it's all creative thinking.
Angela: So what I look for, I don't call them soft skills because I don't think they're soft, and I don't think the technical skills are hard skills. I look for curious people. I look for intellectually curious, but also just generally curious because that curiosity is what's going to make you go spelunking farther and discover the things that aren't necessarily positively labeled, but they're just missing. And I'm looking for folks who won't assume that things are editorial choices, that they understand that a lot of these systems are built on top of each other for different purposes. And that attention to detail, that attention to the humanity of the system.
Angela: So one of... As an example, right? One of our best data scientists on the team, she's a senior data scientist, Teresa Borcuch. Her background is in marine biology. Right? So like, even different from my background of mathematics, it's not a robotics background at all and it's not a "traditional" data scientific background. But the thing that's brilliant about what she brings, is she studied pods of dolphins in the wild. So she has firsthand experience gathering primary data and analyzing it in a really, quote on- like literally wild environment. Right?
Angela: So her appreciation for the fact that the data you're looking at is just breadcrumbs, it's, it's the digital exhaust of a system. It's just the artifacts that you use to try to reinterpret and build a model of that system that you then can manipulate so that you can feed it inputs and play with how those outputs vary, given the variance that you input into it. Right, like that sensibility is really important.
Angela: And it's definitely trainable, it's definitely learnable. It's not something that you either have it or you don't. But it's not something that is easy to find. And I think it's not really something that we do a great job of teaching in technical disciplines. And I say that as a huge tech nerd
Angela: Who has a math degree, right? And didn't take a ton of humanities. I think that's something that I personally had to rectify later on.
Angela: And it's something that I look for in building a team, is folks who have that appreciation for the hybrid nature of the role. Because we're therapists, right? Like we're coming to you at your worst point. You're like, "I don't know the answer to the question." Right? You're tasked with figuring something out and you're having to get external help. So we're coming in when things aren't working super well. Because if they were, you'd be using the self-service that we implemented, right?
Brian: Right, right, right.
Brian: Part of my question might have been biased towards what I had in my head, but what I was getting at mostly was the, and again I'm casting large generalizations here,
Angela: Sure, sure.
Brian: But that there's a concern that it's really hard to understand the results and the insights that are sometimes provided back at the end of a project. So it's what I call the "last mile," whether it's a software application, a PowerPoint presentation, a .pdf. Like there's, there's some... And again, we're talking about human-in-the-loop solutions, right? Not about like optimizing a machine to do something better where there's no human interaction.
Angela: Sure, sure.
Brian: It's that part of involving whoever that end-customer recipient is in the process, and making sure that they trust, believe, understand, and will react to the information provided, as opposed to, sometimes you hear, like "We did all this work, we came up with this pricing model, we gave it to them," and then onto the next project. It was never used. No one even knows where it is. It's just, like, what happened? And it's like, "Well, whose job is that to make sure it happens?" "I don't know. Next project."
Brian: That's kind of one of the worlds that I hear about, at least in my conversations, so I was curious, like how do you ensure that that doesn't happen? Like how do we ensure that the insights actually get, those decisions occur, and that value is created, right, in that last mile?
Angela: The last mile is so long. It's so long.
Brian: Yes, it is.
Angela: And it can be hugely frustrating,
Angela: Especially when there's so much work that goes into something and it doesn't move the needle. It doesn't actually make an impact.
Angela: It doesn't inform a decision. It is frustrating. And I don't want to pretend that that doesn't happen to me, or hasn't happened to me in the past.
Angela: So, before I was in technology management, I was first an investment banker, and then I was a strategy consultant. And in that strategy consulting role, one thing that was really interesting to me, I was you know doing a lot of data consulting and sophisticated data analysis. And a lot of the times, you know you parachute in when things are on fire, as an external consultant. Everybody hates you. They don't want you to be there, but you have to be there because you're the objective party. And you come in and you do this enormous analysis, and you print out 500 slides that you walk everybody through. And then you leave, and that becomes a doorstop.
Angela: And nothing comes of it because what they needed was the process.
Angela: That's very different when you're running a data scientific team within an organization,
Angela: Rather than when you're consulting and coming in and sort of helping folks work through their process so that they become better equipped to navigate it once the consultants leave.
Angela: When you're in the company, you don't leave.
Angela: Right? Your things don't move the needle, you don't get asked next time, which is worse. I was sort of being facetious but not really. I mean, you are a therapist and you are sort of there helping folks understand what their questions are, understand what's possible, understand what is reasonable to expect, and what is not reasonable to expect.
Angela: I think one of the big problems with embedded analytical functions in organizations that don't have analysis as their end product, is that it's easy to forget that that is a skillset that is available to you. And so part of what was the hard work here, and you know in my previous role where I set up a data science discipline, the hard part is getting to know everyone, getting to understand their problems, getting to understand how their problem is connected to everybody else's problems, and to the business. And really understand, almost doing like [inaudible 00:27:50] sensitivity analysis of what's actually going to impact the business.
Angela: So that as you choose which activities to prioritize, that that's part of the equation, that you're always making sure that you're working with people who will be your megaphone, right? Who will be the ones that shout in rooms you're not in that the reason they're making this decision is because they have all of this evidence, and here's why the evidence is credible. That credibility is the currency. The minute you don't have credibility, then nobody trusts the data anymore. And you know nothing good comes from when you can't even trust the information you're being given.
Brian: And does that trust stem from the method, the timing, and the frequency that you're engaging with the end, I'm going to call them a customer, whether it's a business sponsor or whatever it may be, that they've been engaged early on frequently, perhaps? Like is that correlated with the success?
Angela: I'm not sure. I don't know that, I don't know that I can generalize.
Brian: Mm-hmm. In your experience.
Angela: I think it depends... Yeah. In my experience, I think it depends who that customer is.
Brian: Mmhmm.
Angela: So some customers prefer sort of frequent touchpoints. Some customers, that's not what's going to give you credibility. I think that credibility comes from, I mean I really think that credibility comes from the humility. I think the minute you tell someone where your knowledge ends,
Angela: They're much more likely to believe that you have the knowledge you do have, rather than infinite knowledge.
Angela: And I think that's one of the curses of data science and the hype surrounding data science, is that it's really easy to feed some numbers into an algorithm that's a black box and doesn't have any interpretability built in. And it spits something out and you're like, "Eh? That's what the model says."
Brian: Right. I'm curious. It feels and... It feels to me risky if there's not this regular engagement. I mean, unless you've built up so much trust over time that someone can "throw it over the wall" to you and say, "I already know Angela kicks ass, so I don't need to be that involved because I'm going to trust what she brings back to the table." But that comes with like experience and trust, and you've delivered value. And you've, that person typically knows-
Brian: I guess I'm casting an opinion there that there's a bunch of stuff that happened prior to getting to a place where you can work like that. It feels to me like it would be pretty risky. Have you not seen a difficulty in getting people to adopt your decisions, your recommendations when they haven't been involved? Like that seems different to me than I've heard before, but I'm curious.
Angela: I agree completely. If they haven't been involved it's really, really difficult. But I think, I think what I'm perhaps inelegantly trying to say is that I don't think the trust comes from bringing someone along during the actual timeline. I think it has more to do with bringing someone along with the narrative.
Angela: I think if you have a really strong understanding of what the objective is, right? If everybody has shaken hands and agrees on what the outcome is, it's much more important that the outcome be clear, and that the assumptions and the edge cases and the ways in which it fails be transparently articulated, than to demonstrate how the sausage is made.
Angela: Because I think part of what I was saying, in terms of this all being problem solving that involves creative thinking, is that it's really hard to demand a really interesting, novel finding on a cadence.
Angela: And sometimes you're going to go two weeks and you're still, you know, muckling through dirty data and deciding whether imputation makes sense or doesn't make sense, deciding whether to leave gaps and what defines an outlier, and you need more interviews with different people that aren't the original stakeholder.
Angela: So I think making sure that the process is well-documented and well-communicated is probably more important than the frequency of contact. Which isn't to say that isn't valuable or that doesn't also impart trust, but I think being really clear about what went into an analysis and how an analysis fails probably delivers more bang for buck.
Brian: Yeah. And I should say I'm not necessarily advocating learning how the sausage is made step by step. I don't think any client or customer really wants to be that involved if it's not their core discipline, unless they have a real control issue, which signifies a trust issue, which is something else.
Brian: But what you just said about defining what the outcome would be and also, like, "Well here's what it's not going to do." And do you talk, do you have some you know planning session, or some kind of discovery where you talk about these theoretical outliers and how, like, "Just to be clear, like when we do this, here's what it's not going to do. Here's what it's not going to answer. Here's how, here are the possible outlier situations that could occur." Do you kind of tease that out or have a session about this at some point, whether it's one time, or regardless of how many times, but there's some expectation about what might be coming at the end?
Brian: And I'm not talking about the final number, "67 is the number that came out." I'm not talking about the number, but the process that you go through and what it will and won't do, whatever the solution or insight is, is that part of the project process for you early on?
Angela: That's a really good point and yes, very much so. Yeah. So, that expectation management is probably... we like to laugh that data science is 80% data cleaning and 20% analysis and sciencing.
Angela: I mean, well functioning teams don't have that. But even so, I think in a management capacity, I think like 60% of my payroll is expectation management.
Angela: I don't dislike bad news; I dislike surprises. And I think our stakeholders are the same way. They don't want to be surprised by a finding. And the way that you build trust is you minimize surprises.
Angela: And that means, if you're building somebody up to a counterintuitive result or finding, making sure that you're walking them through all of the steps. You know, sometimes it does mean you have to go into the sausage-making steps. Right?
Angela: Like for me, sometimes that does mean, "Here's where on the chip board this thing matters." Or, "Here's where on the particular line of code this thing matters. This is where on the piece of information that we're analyzing, this is why this part is sensitive."
Angela: So sometimes it does go down to that level, not often, but I think, yes, you're right. And, tipping my hat to my marketing friends, the adage goes that you need 10 touchpoints before somebody will remember. So the frequency definitely plays to that. It's definitely helpful if you get to tell somebody over and over how this analysis works, what it will and what it won't do, the fact that when you're doing a K-means clustering you're imposing a particular number of clusters, and that that's an editorial choice.
Angela: And we get to choose what we want that to be.
Angela: Making sure those kinds of things are really explicit helps in making sure that the final number is a credible number that will actually inform the last mile, which is what matters.
Angela: Right? Like if we don't make that change, if we don't make that manufacturing change, or that coding change, what's the point of looking into the data?
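To make the K-means point above concrete, here is a minimal sketch that is not from the episode. It uses only the Python standard library, a hypothetical `kmeans_1d` helper, and made-up, evenly spaced numbers with no natural grouping. The algorithm still hands back exactly as many clusters as it is asked for, so the number of clusters reflects the analyst's choice rather than something the data revealed.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain 1-D k-means (Lloyd's algorithm) with deterministic starting centroids."""
    ordered = sorted(values)
    # Start the centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Made-up, evenly spaced measurements: there is no "true" number of groups here.
data = [18, 20, 22, 24, 26, 28, 30, 32, 34, 36]

results = {k: kmeans_1d(data, k) for k in (2, 3, 5)}
for k, (centroids, clusters) in results.items():
    print(f"k={k}: {clusters}")
```

Every run produces a tidy-looking segmentation, which is why the choice of `k` is worth stating explicitly to stakeholders along with the result.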
Brian: Sure, sure. Do you ever simulate those findings? Let's say it's a domain or area in which you can predict just from experience a range of values, or what outcomes might come after you actually run it through the sausage maker. Do you ever tease this stuff out early with the stakeholders and say, "So if it's 69 and not 67, you said 67 is peak, but we think there's a chance it could be peak out," how are you going to react to that? Are you guys going to stop the project? Or are you going to do whatever?
Brian: Do you ever kind of tease out those outlier scenarios in order to figure out whether or not you need to adjust the sausage process upfront? Like, "Oh, we need some interpretability package and we're going to really need that, which means maybe less accurate prediction, but we'll have the explainability part," which means the business can then kind of live with the result no matter how much of an outlier it is. Do you ever do any type of that, or not so much?
Angela: Oh, absolutely, yeah.
Angela: So, when we're choosing what the best methodology is to answer a question, I think the precision of said methodology is one criterion, but it is not the criterion.
Angela: I think what you're saying plays a great deal into how we choose to answer a question. Right? Is this something that needs to be maintained by a different team?
Angela: Or is this something that we're going to own? Is this something that has to be repeated or is this a one-off? Is this something that has to withstand scrutiny from a regulatory body or some other external party, or not?
Angela: And so, given those constraints, different solutions might be more appropriate.
Angela: And so sometimes we will trade perceived precision, perceived accuracy, for the ability to attribute different parts of that outcome to individual inputs or to certain scenarios. So yeah, definitely.
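As a rough illustration of what attributing an outcome to individual inputs can look like (a sketch that is not from the episode, with hypothetical input names, weights, and values), a simple linear score splits exactly into one contribution per input. A more complex model might score better on paper, but it would not decompose this cleanly.

```python
# Hypothetical linear model: every name and number here is made up for illustration.
weights = {"input_a": 0.5, "input_b": -2.0, "input_c": -3.0}
intercept = 10.0


def explain(inputs):
    """Return the prediction and the share of it attributable to each input."""
    contributions = {name: weights[name] * value for name, value in inputs.items()}
    prediction = intercept + sum(contributions.values())
    return prediction, contributions


prediction, contributions = explain({"input_a": 40, "input_b": 2, "input_c": 1})
print(prediction)     # 23.0
print(contributions)  # {'input_a': 20.0, 'input_b': -4.0, 'input_c': -3.0}
```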
Angela: And sometimes we will choose a "simpler" solution because, methodologically, it's easier to nest that into another simpler solution. And with an ensemble, you actually get better than if you had started with a complex solution in the first place.
Angela: And then your end result is even better than had you just done the single step in an overcomplicated way.
Angela: So I think the constraints go beyond just what the right answer is because it's really important. For instance, if you use a super sophisticated algorithm on data that's really dirty, then you're just going to find dynamics within that system that are reflective of noise, that are just random chance.
Angela: Right, and that's usually something that very junior analysts tend to do more often than senior analysts. Right? Like they are eager to try out that really cool autoencoder.
Angela: They really want to try out a support vector machine once. And so they'll throw it at a CSV, and it'll give them like an amazing precision number. "Look at my accuracy! It's, you know, six nines. It's so great!" And I'm like, "Sure, but it's pointing left. And we know it can't turn left."
Angela: It's telling you with 99% certainty that you should turn left where there's a cliff.
Angela: That plays a huge part into how we choose to solve a problem because we're looking for the real answer. Right? We're not looking for the best-looking answer.
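The "amazing precision number" scenario above can be reproduced in a few lines. The following is a minimal sketch that is not from the episode: it uses only the Python standard library, hypothetical helper names, and synthetic data whose labels are coin flips unrelated to the features. A flexible model (here a 1-nearest-neighbor lookup, swapped in for the autoencoder or support vector machine mentioned above) scores perfectly on the data it memorized and roughly at chance on fresh data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_noise(n):
    """Synthetic rows: two random features and a label that ignores them."""
    return [
        ((random.random(), random.random()), random.choice(["left", "right"]))
        for _ in range(n)
    ]


def predict_1nn(train, point):
    """Return the label of the training row closest to `point`."""
    nearest = min(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, rows):
    """Fraction of `rows` whose label matches the 1-NN prediction."""
    hits = sum(predict_1nn(train, features) == label for features, label in rows)
    return hits / len(rows)


train = make_noise(200)
holdout = make_noise(200)

train_accuracy = accuracy(train, train)      # the model is scored on what it memorized
holdout_accuracy = accuracy(train, holdout)  # the model is scored on data it has not seen

print(f"accuracy on the data it was fit to: {train_accuracy:.2f}")
print(f"accuracy on held-out data:          {holdout_accuracy:.2f}")
```

The first number looks impressive only because the model is grading its own memorized answers. The held-out number is the one that says whether anything real was learned.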
Brian: Maybe this ties into this, but I often ask guests on the show about this. So, you've probably seen all these studies that come out, and they come out every single year, about the success rate or lack thereof of big data analytics projects. Now it's about, you know, algorithms and AI models, this type of thing. You know, 10-20% would be about the average of these studies, which means, you know, 8 out of 10 at-bats, people are striking out. Why do you think that is? What types of changes need to be made there, particularly on the non-technical side? Because there's always the reasons like, "We don't have the data to do any of this work, and so, you know, we weren't able to do a pricing model" or whatever, I don't know, whatever the example is. But, in terms of the non-technical side, do you think there are factors there that contribute to this kind of low success rate that continues to be, you know, in this 10-20% range year over year?
Angela: Yeah. So I think it's also important to note that even the technical aspects are perhaps overlooked more often than we'd like to believe.
Angela: So I think a lot of times there are projects that go into a data set that is just not robust enough.
Angela: And that's a hard truth to hear.
Angela: I think, you know, it's really difficult to step up to a CEO or a VP and say, "I can't answer that." It's a lot easier to answer, and answer in a black-box way that won't be able to be checked until you've left and joined another company or another consulting firm or started another project.
Angela: So, because the learning loop can be so long, having an appreciation for just how difficult this stuff is is really important. And I don't know that we do a great job of saying no to projects, and I say we as a profession. I think we at iRobot have done a better job, but obviously I'm biased. I'm going to think that because I'm the one who thinks they're doing it, so grain of salt and all that. But I think what's really important is making sure that we don't try to answer questions that we're not ready to answer.
Angela: And I don't know that that happens as often as it should.
Angela: But on the non-technical side, I think the diligence to really take an answer from start to finish is really important and is also difficult. So, especially when you have a really great team in place, it is extremely tempting to go, "Oh, I have this problem. Let me just ask this person to take a quick look. You know they're smart. They'll solve it faster than anybody else could." And the problem with that is the switching costs, right? And the opportunity costs, and the fact that that disruption is really difficult to stop that [inaudible 00:42:28]. Once you do it once, it's just really easy to continue to disrupt that team. And so then they can't focus and finish projects, and that can erode credibility, that can erode the reliability of the team.
Angela: So that's sort of where I think the business really needs to be careful, is allowing the analysis to complete fully before reacting to potential signals or you know. Because that's the nature of the beast. You're doing exploratory data analysis. You're going to find weird things that sometimes portend but end up meaning nothing.
Brian: I want to wrap this up real quick with a quote that you had written on your Twitter account and it said, "We're all winging it. I promise you, even the experts. Especially the experts; we're dealing with cutting edge problems and there's no playbook. We're all figuring it out as we go. Be kind to yourself. Try to figure out what it is you should do, and not what the right answer is." Can you unpack that a little bit more? What prompted you to write that?
Angela: Oh, I was probably struggling with Python or something and-
Brian: Hammer coming out.
Angela: Yeah, and trying to think, "Oh my God, what is the right answer here? What is the right answer here?" You know, so I think what that's intended to say is there's this discipline that... I grew up in the Jesuit Catholic tradition. I'm a recovering Catholic.
Angela: There's this principle of casuistry, of descending into the particular. So, when folks are having disagreements and they're having philosophical disagreements, it's really difficult to reconcile those because you're bringing up a lot of theory and emotion.
Angela: And you know what? I'm going to start again.
Angela: Because that went to a difficult place. Because what I'm trying to say is figuring out what's right is unhelpful. Right? Like trying to solve the theory is unhelpful. Trying to figure out what an appropriate next step is in this particular scenario, rather than what the correct step always is, is easier and more fruitful. Like, that's what I'm trying to get at. And I didn't get there.
Brian: No, I think you...you caught it in the end and I understand what you're saying there, so that was good.
Angela: But in essence, that's what it is. It's trying to figure out... because I think it can be paralyzing. Right? Like I'm sure you've heard of analysis paralysis.
Brian: Mm-hmm.
Angela: So rather than... I think at that moment, I was probably feeling paralyzed.
Angela: And then one step at a time. And then that's how you get to the end of an analysis.
Brian: Yeah. That's awesome. Well thanks for sharing that. Where... This has been a really fun chat. I'm sure we probably have some listeners that want to follow along to your quotes and everything else. So obviously you're on Twitter, so I can put a link to that if you want in the show notes. Are there other ways that people can follow your ideas and thinking on this?
Angela: That's probably the most direct access to my id.
Brian: Okay. Excellent.
Angela: If you're looking for something a little bit less unprocessed, I also put up my talks and articles and anything that I write or participate in on my website. So you can find that at angelabassa.com as well.
Brian: Got it. And your Twitter handle is? Just so people can hear it.
Angela: Yep, it's @angebassa, so that's A-N-G-E-B-A-S-S-A.
Brian: Awesome. And I will put a link to your website as well and, Angela, this has been super fun. Thanks for coming on Experiencing Data and sharing your ideas.
Angela: Thank you so much for the opportunity. I loved nerding out with you.
Brian: All right. Take care.