Today I’m chatting with Eugenio Zuccarelli, Research Scientist at the MIT Media Lab and Manager of Data Science at CVS. Eugenio explains how he has created multiple algorithms designed to help shape decisions made in life-or-death situations, such as pediatric cardiac surgery and during the COVID-19 pandemic. He shares the lessons he’s learned about building trust in data when the stakes are life and death. Listen and learn how culture can affect the adoption of decision-support and ML tools, the impact the delivery of information has on a user’s ability to understand and use data, and why Eugenio feels that design is more important than the inner workings of ML algorithms.
Highlights/ Skip to:
- Eugenio explains why he decided to work on machine learning models for cardiologists and healthcare workers involved in the COVID-19 pandemic (01:53)
- The workflow surgeons would use when incorporating the predictive algorithm and application Eugenio helped develop (04:12)
- The question Eugenio’s predictive algorithm helps surgeons answer when evaluating whether to use various pediatric cardiac surgical procedures (06:37)
- The path Eugenio took to build trust with experienced surgeons and drive product adoption and the role of UX (09:42)
- Eugenio’s approach to identifying key problems and finding solutions using data (14:50)
- How Eugenio has tracked value delivery and adoption success for a tool that relies on more than just accurate data & predictions, but also surgical skill and patient case complexity (22:26)
- The design process Eugenio started early on to optimize user experience and adoption (28:40)
- Eugenio’s key takeaways from a different project that helped government agencies predict what resources would be needed in which areas during the COVID-19 pandemic (34:45)
Quotes from Today’s Episode
- “So many people today are developing machine-learning models, but I truly find the most difficult parts to be basically everything around machine learning … culture, people, stakeholders, products, and so on.” — Eugenio Zuccarelli (01:56)
- “Developing machine-learning components, cleaning data, developing the machine-learning pipeline, those were the easy steps. The difficult ones were gaining trust, as you said, [and] developing something that was useful. And talking about trust, it’s especially tricky in the healthcare industry.” — Eugenio Zuccarelli (10:42)
- “Because this tennis match, this ping-pong match between what can be done and what’s [the] problem [...] thankfully, we know, of course, it is not really the route to go. We don’t want to develop technology for the sake of it.” — Eugenio Zuccarelli (14:49)
- “We put so much effort on the machine-learning side and then the user experience is so key, it’s probably even more important than the inner workings.” — Eugenio Zuccarelli (29:22)
- “It was interesting to see exactly how the doctor is really focused on their job and doing it as well as they can, not really too interested in fancy [...] solutions, and so we were really able to not focus too much on appearance or fancy components, but more on usability and readability.” — Eugenio Zuccarelli (33:45)
- “People’s ability to trust data, and how this varies from a lot of different entities, organizations, countries, [etc.] This really makes everything tricky. And of course, when you have a pandemic, this acts as a catalyst and enhances all of these cultural components.” — Eugenio Zuccarelli (35:59)
- “I think [design success] boils down to delivery. You can package the same information in different ways [so that] it actually answers their questions in the ways that they’re familiar with.” — Eugenio Zuccarelli (37:42)
Links Referenced
- LinkedIn: https://www.linkedin.com/in/jayzuccarelli
- Twitter: twitter.com/jayzuccarelli
- Personal website: https://eugeniozuccarelli.com
- Medium: jayzuccarelli.medium.com
Brian: Welcome back to Experiencing Data. This is Brian T. O’Neill. And today I have Eugenio Zuccarelli on the line, research scientist at MIT Media Lab. How are you?
Eugenio: Hi Brian. Good to be here. Really well, today.
Brian: You had reached out to me, I think on LinkedIn, and you were talking about how you had had some experiences working with design and user experience and how that was relevant to the data science work you had been doing at the Media Lab, particularly two use cases that I thought we could jump into today. One was a predictive algorithm for understanding survival rates for children undergoing heart surgery in the operating room, and the second one was a COVID-related dashboard during, kind of, the height of the pandemic, which I imagine about half this audience probably had some finger in, [laugh] including myself and you. But it sounds like you had some interesting experiences there in terms of culture, what people want to believe about the information, decision-making, and all that kind of stuff. So, I thought we’d jump in with the work you were doing on this heart surgery thing. Maybe you can give people an overview of what this model was supposed to do, how the humans in the loop actually experienced it, what the digital experience around it looks like, and maybe some of the findings that you had when you built it.
Eugenio: As you said, the reason why I contacted you is that so many people today are developing machine-learning models, but I truly find the most difficult parts to be basically everything around machine learning: culture, people, stakeholders, products, and so on. And this was a great example to me, so I really wanted to share it with everyone; I think there are a lot of learnings that we all, as data scientists or data-related people, can definitely benefit from. And the idea here is that when I was at MIT, working on this project, we had conversations with doctors. MIT, of course, has a lot of relationships with a few different hospitals, and we were able to understand the struggles that a lot of these doctors have to face.
And we all know that doctors have a lot of issues with EHR systems, the Electronic Health Record systems, and with the ability to make decisions in a split second. And so, we decided to join our efforts, trying to put technology in the hands of doctors, with this idea of making human-centered technology, which is what I always try to do: not make technology for the sake of making technology, but actually make something that builds value, and ideally value that improves people’s lives. And there are a lot of patients and a lot of doctors who have to deal with cardiovascular surgery issues, especially in children. When you are a doctor working in the cardiovascular surgery area, you have to face a lot of struggles when you deal with children that are very young, usually a few months old.
And not all the rules that apply to adults can be applied to children, especially when they are so young. So, doctors have to make decisions usually based on a few data points, mainly on experience, and in split seconds. So, we decided to make a product, and I’m not going to call it just a machine-learning system, but a product that would help them try to answer some of the questions that they usually have. And these often come up in the surgery room. It’s a matter of making faster decisions about the lives of children. So, really big deals. And one of these questions was: is this child going to survive this type of surgery, or do we have to choose another type of surgery? And with machine learning, we tried to answer that question.
Brian: I want to put the listeners in the context. Are we in the surgery room right now, where there’s a piece of software running that’s being used in the middle of something, where it’s work, work, work, work, work, stop, question, ask the software, get a recommendation, continue surgery? Or are we talking about, like, we’re getting ready to literally get in there with the scalpels and cut somebody open, but we have some questions? When is this software, this machine-learning algorithm, this predictive model, going to be used in that context? Maybe you can help paint where it sits in the workflow.
Eugenio: That’s a great question. That’s exactly what we asked ourselves, first of all. Ideally, you have a child that gets into the hospital. They know that they have to have surgery; usually these are planned surgeries, though often they’re not, unfortunately. And so, the doctor, before going to the operating room and organizing everything, often has to make a decision: which surgery am I going to do?
Some surgeries might take more time and be less risky, so to speak, even though they take a long time. Others might take less time and be slightly more invasive, for instance. So, before getting to the operating room, they have to make a decision and prepare to actually decide what to do. And that’s when we want to provide a tool to these doctors, so they can say, “I’m not just going to rely on my experience.” Sometimes they have to face unique challenges and unique cases. And so, with data, they can probably leverage other cases from all over the world. And before joining the team in the operating room, they can decide then.
Brian: Are we talking about hours, days, weeks prior to putting on scrubs and going into surgery? We’re talking about desk work at this point, is that correct?
Eugenio: That’s right. Ideally, this tool, at least as we planned it at the beginning, could be used right up until minutes before surgery. So, as long as the team is ready, as long as everything is set, this tool can provide a prediction in seconds. That was both the theoretical idea and also, pragmatically, what happened. So, even if there’s an emergency and the doctor has to jump into the operating room to do an emergency surgery, this can still be done.
Brian: The majority use case is in a planned scenario, but there was also usage of this tool in an unplanned surgery scenario, like a quote, “Emergency,” I assume?
Eugenio: That’s right. That’s right.
Brian: Okay, got it. And is the idea, like—just to simplify this since I don’t know the medical jargon—surgery A, surgery B, surgery C, and the problem is this: as a surgeon, I have surgery A, B, or C to choose from. If I run A, the machine-learning model will tell me the likelihood of survival based on past surgeries. Or is it based on the current health record of the child and what happened in the past with other patients, and the combination of those will give me a recommendation, either about which one to use or at least what the success rate is for the different options I have: A, B, and C? I’m really simplifying this, but is that the thing? Is that what it’s supposed to do?
Eugenio: That’s right. That’s exactly right. So ideally, you want to codify some of the knowledge of the doctors: they know that if a child tends to be this weight, this size, this height, and has maybe these other comorbidities, these other conditions, then they’re probably going to go the safe route with this solution; otherwise, they’re going to take another route. And so, pragmatically speaking, we were able to provide a lot of data on past surgeries to the system, and the system then picks up all of these patterns and codifies them in a way that can be applied to any case in the moment.
Brian: The essence of it is that the input is surgery methodology, as well as the health record, current health status or statistics, or whatever data we have about the child already, and the output is whether they will survive if you choose to use that method?
Eugenio: Yeah, you have data, as usually happens when you have a child or a patient getting into the hospital: you have a triage, you get some information. Instead of having that information manually written somewhere, we might just have a tool that takes in information, like a tablet, like an app. Ideally the adult, you know, the parent of the child, enters that information: demographics, diagnosis, previous procedures, and other characteristics of the patient. And that information, together with all of the experiences that other doctors had with other patients, gets ingested by the model, and this is able to provide pretty much just a binary classification of whether this child is going to survive surgery or not. But then you can also provide them with information like how long, if they survive, they’re going to stay in the hospital.
Brian: The primary thing that the doctor is looking for is to get both an overall survival rate and quality of life, or just, kind of, what to expect on the other side if we go down Route A. That’s the decision they’re trying to be more informed about?
Eugenio: Correct. It’s a decision-making tool. We want to understand, out of all of these options, can we quantify the difference? There might be a big gap in outcomes and probability of surviving surgery between two surgeries, but between another two, there might be a small difference. And so, in this way, the doctor can better understand which one would be ideal, specifically for that patient.
Brian: How did you design this in a way that got the doctors to believe it? I’m wondering, too, was this for one hospital or—because I can see how, if some of the surgeons weren’t involved with how it was made, there’s going to be a trust issue, especially with maybe more seasoned [laugh] surgeons that have done this a long time; they’re probably going to rely more on experience, right, because they’ve done it so many times. So, I’m curious, was that difficult? Did you actually track whether or not someone agreed with the recommendation and proceeded, and when they didn’t? Did they actually say, “I’m going against the recommendation and I’m still going to do Option B,” which has a lower—but we’re going to track that and then feed that back into the model to say, “Oh, that actually worked.” Talk to me about this. Were there any challenges getting people to believe and use this? And were there people that were less inclined to go with the recommendations?
Eugenio: For sure. I’m glad to hear that because you really found the biggest issue of the whole project. Developing machine-learning components, cleaning data, developing the machine-learning pipeline, those were the easy steps. The difficult ones were gaining trust, as you said, and developing something that was useful. And talking about trust, it’s especially tricky in the healthcare industry.
Of course, there are some other industries, other areas, where gaining trust with the domain experts is maybe less important. Sometimes the stakes are also lower. But when you have to deal—thank God—with amazing doctors who are extremely experienced in their field, people that have so much experience, and the stakes are so high, there’s of course a really high threshold to gain their trust. But that’s a great challenge because it really puts your skills and your experience as a data scientist or as a research scientist to the test. And in this case, the solution came from just working very closely with the doctors.
So, trust was something that we knew from the beginning would be an issue. We knew we had no problem developing a binary classification system; that’s pretty straightforward and normal in the machine-learning industry. But the issue of trust was addressed by having, of course, open conversations about the whole product from the beginning. We, of course, didn’t want to push machine learning as the solution. We wanted to talk to doctors to understand better what they could use.
Now, if it had been an Excel spreadsheet, that would have been fine. The conversations started with basically us agreeing on the fact that they needed a tool, ideally an app, and this app would need to predict something. At the end of the day, they didn’t care too much about what it was made of, but of course, they needed to trust the algorithm. And so, I’d say probably the key there was having ongoing conversations and making sure that every step was explained very simply to the doctors, but also trying to leverage, a bit more technically, what’s usually called explainable AI. So, we obviously had a whole portfolio of options: we could have gone the deep-learning route, we could have gone the linear regression route.
And we knew, from training and from experience, that doctors usually know a bit of linear regression or logistic regression; they definitely do not know about deep learning. And we also knew that a black-box model, some of which not even data scientists understand, would for sure be a red flag for them and for us. And so, we went down the path of trying to do something simple, something interpretable, where you could open the box and look into the algorithm and all of the decisions made by the system, so that we could go through the algorithm and the decision-making together with the doctor and agree that, yes, this is exactly what the doctor would do. And in this way, we gained their trust.
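For listeners who want to picture what an interpretable model in this spirit looks like, here is a minimal sketch, assuming a plain logistic regression trained from scratch; the feature names, the toy data, and the new-patient example are all invented for illustration and are not from the actual surgical tool:

```python
import math

# Invented toy data: [weight_kg, age_months, n_comorbidities] per patient,
# paired with a hypothetical binary outcome (1 = survived surgery).
X = [
    [3.2, 2, 2], [4.1, 4, 1], [2.8, 1, 3], [5.0, 6, 0],
    [3.5, 3, 2], [4.8, 5, 0], [2.9, 2, 3], [4.5, 5, 1],
]
y = [0, 1, 0, 1, 0, 1, 0, 1]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Plain logistic regression trained with stochastic gradient descent.
# Unlike a deep network, every learned coefficient maps to one clinical
# feature, so a domain expert can inspect and debate each one.
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.05
for _ in range(2000):
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        err = p - yi  # gradient of the log-loss w.r.t. the logit
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
        b -= lr * err

# The "open box": each coefficient's sign and size is readable.
for name, wj in zip(["weight_kg", "age_months", "comorbidities"], w):
    print(f"{name}: {wj:+.2f}")

# Predicted survival probability for a new, hypothetical patient.
p_new = sigmoid(sum(wj * xj for wj, xj in zip(w, [4.0, 4, 1])) + b)
print(f"predicted survival probability: {p_new:.2f}")
```

The real project reportedly walked doctors through a decision tree rather than regression coefficients, and also reported outputs like expected length of stay; the point of the sketch is only that an interpretable model exposes its reasoning, which is what made the trust conversations possible.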
Brian: Walk me back to the beginning of this project, because you said something about how they wanted to predict something. So, was it like the cardiac surgeons came knocking at your door and said, “We have this problem where we don’t know which surgery is going to work best, and we think that looking at all the world’s surgery data, or some big set of surgery data from the past, would help us decide this”? I’m guessing that is not what happened. But what was the ask? Because at some point, you generated this solution, and I’m trying to—one thing I hear a lot in the community is that, you know, in this case, it’s not a business sponsor.
I’m guessing this didn’t come from one of the business suits. It was someone on the clinical side, but they don’t know what they want. And then the data crowd says, “Well, there’s a lot of stuff we could do, so what problem do you have?” And then the business people say, “Well, what’s possible? What could we do?”
And then there’s the data tennis match [laugh]. So, talk to me about the original ask and how you arrived at, we’re going to work on predicting death rates based on surgery methods for children in a cardiac surgery scenario. That’s very specific. How did you arrive at that?
Eugenio: That’s exactly what usually happens: this tennis match, this ping-pong match between what can be done and what’s your problem, what’s your solution. And thankfully, we know, of course, that it is not really the route to go. We don’t want to develop technology for the sake of it. How it happened is that this European association of cardiovascular surgeons had some contacts, for different reasons, with people on the team I was working with at MIT.
And one of the big conversations there was, “Okay, we as doctors have a lot of decisions to make, and we know that we don’t rely on data as much as we could, for a whole series of different reasons. And we know that there are some existing technologies that can make our decision-making better, but we know that those don’t really work well.” And their issues, clearly stated, were, for instance, that what they have to do with children is really tricky. It’s tricky business. Nobody is really sure, and it’s such a high-stakes game that they wanted to try and find advanced solutions to support their decision-making.
And that married exactly with my philosophy that AI and these tools have to enhance and empower decision-makers, in this case doctors, to allow them to do their job better. And so, that’s how it started, mostly from a question of how we could answer some of their questions. And the question being: how can we ensure that more of these children survive?
Brian: The ask from this cardiac surgery society or organization was, we think we could do more successful surgeries. And it’s nerve-wracking for them because they lose patients, and no one likes to lose a patient, right? So, it’s high risk for the doctor, it’s hard on the families. The assumption was that some kind of data-informed decision-making would help improve survival rates, but we don’t know exactly what that is. Can you help us make better decisions, specifically about what types of surgeries to do, that will lead to better outcomes? That was basically the ask?
Eugenio: That’s exactly it. They knew that we had the skills and the technologies but didn’t have the data, while they had the data but didn’t have the skills and technologies. So, we had a good marriage there.
Brian: Was this prototyped with, like, a single hospital, or multiple hospitals, multiple surgeons? I’m curious if maybe the cultures are different in different organizations such that—or maybe just one person at this hospital is involved in this society, but the other 25 surgeons are not, and so they don’t know what’s happening and they have a different way of doing it, or a different belief system, or whatever it may be. Maybe you could talk a little bit about how you get to a solution that the majority of child cardiac surgeons, I guess, would actually use. Talk to me about the adoption hurdles here. And I know you talked about the model interpretability, and I want to jump into that too, but I’m curious about the buy-in piece at that earlier level.
Eugenio: Honestly, the process was, of course, to try and test it at different hospitals across this organization. So, the organization has member hospitals across the whole system, mostly across Europe, and of course, there are different cultural components to take into account, as well as data savviness and openness to new technologies. And I honestly don’t think it’s realistic to try and help everyone. My philosophy is usually to try and find the solution that has taken the most opinions into consideration and tries to do the right thing.
And of course, that’s not going to help everyone. It’s not going to answer everyone’s questions, and it’s probably not going to be used by everyone. But by having conversations with, for instance, some of the leaders in all of the hospitals across all the different verticals—technological verticals, clinical verticals, and so on—we were able to have some successes, thanks to the fact that we had conversations with the leadership across all the hospitals, and especially the leadership of the association. Their introductions and their help really helped. So, I’d say even though we might not have been extremely knowledgeable about all of the different countries or different hospitals, we could rely on the leadership of the association, who had a better understanding of each country and each hospital’s leaders, what their characteristics were, and how to better approach them.
Brian: Was there a discussion or a definition of what the success criteria would be for this? How would you know at the end of some period of time, that this application that you—this data product you built, this decision support tool, actually, quote, “Worked,” or it increased the survival rate, or whatever the very specific metrics were? Were there some? Because I’m wondering how you go to another hospital and say, “Look what we did over here. Here’s the proof. Here’s how we tested it. Here are the metrics that came out of it.” Was there a very clear definition of what that would be over a specific time period so that you guys would have some evidence for this? How did you measure that? And when did you decide to measure it?
Eugenio: That’s a great question. And honestly, it shows you how complex a machine-learning endeavor is in real life. We would have liked to have quantifiable metrics, just because I think that’s how everything should be done: you know, A/B testing since day one, where you can quantifiably measure the difference. And that’s relatively easy in a case like this.
You can just measure how many deaths there have been in a year, or in multiple years, and then compare this with the system. You also have geographical differences there, so you can do this across all the different hospitals. That’s where it gets tricky, though, because if you think about it—and this is also an issue of experience in other countries—when you start comparing hospitals, when you start comparing results on a clinical level, that’s an extremely sensitive topic. And so, it might not be something that is extremely easy to achieve in a real-life scenario. If you think about it, a tool like this can clearly spot that this doctor, maybe, is less performant than another one, or that this hospital, just because we now have the data, is less performant than another hospital, even though the demographics and the pool of people matter: the sicker the people, the more difficult the cases.
And so, of course, the worse you will perform as a doctor. It’s a conversation that’s easier to have when data is lacking, but when you are able to shed some light on this information, you get into some, let’s say, political components, which might be tricky to handle. And so, that’s something that had been initiated and started, but of course, it will take time, just because you get into all of these cultural differences and components, which are really tricky. And there’s not really a right or wrong, even on the technological side. Even if we were able to show data about it, it might not paint the right picture, and so finger-pointing, you know, at a hospital level might not even be fair, even though it might seem so.
Brian: My question wasn’t so much on—how would you know if your team at MIT did a good job on this project or not? Because simply coming up with a predictive model is not actually doing anything. If it just lives in a computer system that nobody looks at, right, you didn’t do anything except spend money and time; you didn’t create any value. So, I’m curious, was it clear how this European cardiac society, or whoever the stakeholder was, would know that this actually was useful? Like, should we keep spending more money on it? Should we write a paper on it? Should we invite other doctors to use it? Or—and I don’t know, is this, like, an old project? Is it still going on? I’m just curious, was there a discussion about that ahead of time to know that if we hit this marker, we’re going to feel like we’re on track?
Eugenio: No, for sure. Maybe it was not extremely quantifiable, but we knew that our KPIs were pretty much about adoption: you want to see the tool being used; especially with machine-learning systems, you don’t want them to sit on the shelf. So, the success rate for us was, as you said, trust: actually seeing the tool being used. You can measure this by just looking at the downloads of the app and also understanding geographically how this is split, but specifically by looking at how much the doctors trusted the tool and kept on using it. That was the best way for us to measure adoption, but also how much they stick with something over months. A dashboard is usually a tricky business to get yourself into because it might be easy to build, but it’s also difficult to ensure success in terms of adoption. The same happened here.
Brian: Were you tracking the, “Okay, the model said B; I’m going with B”? Does it track that Eugenio’s model suggested B, the doctor agreed to go with B, and someplace we’re going to record that that was the accepted recommendation, the doctor went with it, and you have some record of that information later—again, not so much for accountability on an individual level, but more to know they used it and went with the recommendation, or they didn’t and went with the other option, and we’re going to still track that. And if it’s successful, maybe use that data to retrain the system, right? Maybe they got lucky, maybe—who knows what the reason was—but for whatever reason, the survival rate was higher than expected. Was any of that factored in, or is it kind of a read-only system that doesn’t record anything about what the actual prediction was and the final decision that was made?
Eugenio: It was a read-only system. We would have liked to have that feedback loop because, as you said, when you have a feedback loop with a human in the loop, you can improve the system when you can understand where there is a difference. But we didn’t have the ability to collect that quantifiable data, to actually have the doctor saying, “I would have liked to do this and the system is saying that, so this is what’s going to happen: I’m either going to stick with my decision, or I’m going to follow the tool.” It’s mostly, probably, an implementation issue. It’s difficult, of course, to also get feedback on the spot from the doctor, but it is for sure something that I would have liked to have because that’s exactly how you improve the system.
Brian: This organization, this cardiac association, did they tell you, we want to see, like, 200 doctors in Europe using this every month, at least once a month? Did they give you any type of indication of what a good amount of usage would be, whatever that means?
Eugenio: No, not really, honestly. It’s difficult to quantify. I think it’s probably also coming from the fact that artificial intelligence is so rarely seen in hospital settings, in surgery settings, that just the fact that a doctor would have used it, knowing that even one child could have survived thanks to that app—I believe we all would have considered that a success. It’s really rare to see artificial intelligence in the operating room.
Brian: This stuff can be really hard to do. And this is—well, we’re both talking to the [laugh] audience here, but one thing I learned—I think it was from Doug Hubbard, who I had on the show here; his How to Measure Anything framing has always stuck with me—is that when this association came to you, they were feeling this pain: the surgeries are high stress, the risk is high, we do not feel like we’re doing the best work we could, we could be making better decisions. And the fact that they’re saying and feeling that means something in the wild is observable; they are observing that, whether it’s through feelings or something else. There’s a way that they can point to some information, even if it’s anecdotal. I don’t know what it is, but they’re seeing it in the real world.
So, by definition, that means it’s measurable, because they’re feeling that pain, or the challenge, or the lack of confidence, or whatever that is, and the opportunity there. That’s usually a good place to look for success metrics, right? And it might simply be that the most senior surgeon at each of the hospitals in the region mentions that they have heard one of their team members is using this and found it useful. That is the success metric. It has nothing to do with the survival rate. It’s simply, we got buy-in from a senior stakeholder who said, “We’re trying this. We’re finding it useful.” Whatever.
And that’s it. That might be all the metrics you need. And is it lots of numbers? No. Is it an analytic? No, you’re not tracking software usage, you’re not tracking any of that. It’s simply the feeling that we’re on the right track, we’re going in the right direction, this feels like it’s usable.
And I’m not saying this is what the scenario was in your case; I’m just trying to help our audience with some of these squishy situations. When it feels like it’s not quantifiable, most of the time it is; it just may not be quantifiable with analytics or a hard number, necessarily. It can be more in the feelings category, which in some ways is tougher, and in other ways is sometimes easier to measure, because you can talk to people about it and it’s qualitative feedback. But that doesn’t mean it’s not data, because qualitative feedback is still data of some kind. It’s just a different kind.
I’m always curious about how we know if we did a good job when we get to the end. Where is the end? Is the end deployment? Is the end deployment plus 60 days? How many hospitals? How many surgeons? Having some kind of conversation about that so that the design can incorporate all those factors into the final solution, right? Because as you said, it’s not enough just to build the model itself.
Going back to the interpretability, did you work with a designer or user experience person here? It sounds like maybe you had to learn a little bit about how to make the results of the prediction interpretable. Maybe they needed some explainability as well: what did the model churn through in order to generate this recommendation, and here are the factors most highly correlated with that. Did this interface go through some rounds of redesign as you got feedback on it? Tell me a little bit about how you actually designed the interface and the experience here.
Eugenio: We did, for sure. We didn’t, honestly, work with designers, just because it was a bit more of a—it was [unintelligible 00:29:15] research endeavor on the design side. But that also shows you some of the [unintelligible 00:29:20] here. We put so much effort into the machine-learning side, and yet the user experience is so key; it’s probably even more important than the inner workings. So, how that worked was that by having weekly conversations with doctors, where we showed them not just the model—the decision tree out of the model—but also how to use the app, they kept on showing us improvements.
And that was amazing feedback, because you could clearly see how, sure, we can get to some product, we can get some MVP, but that’s only something that we think is right—at least it’s right for us. It has to answer their questions. So their feedback—“Oh, we don’t use these units; here in Europe, we use another type of unit,” or, “We prefer to have the age in months, because we mostly work with children that are so young, the majority of them, so we need the age in months”—all of these small nuances were the ones that helped us build a lot of trust, answer their questions, and also make it so much easier for them to use and adopt the app. So, we went through a lot of different iterations until you get to something that you’re comfortable with and that actually helps them.
Brian: Did you guys ever prototype the interface and the experience in some lower fidelity that didn’t actually have any working model behind it, in order to see whether it was even worth trying to build the model, given their expectations from an experience standpoint? Or did you always have some kind of working prototype with an actual predictor—or a model—behind it?
Brian: Did you feel like you were able to get good feedback at that low-fidelity level?
Eugenio: Yeah, totally. I actually think that when you can iterate on those initial steps, you gain so much more time than if you had to do the same later on, of course. We saved a lot of development time by going early on with a very simple MVP. And even later on, when we moved to a bit more complex digital solutions—not just pen and paper—we also had a back-end with basically nothing working; it was just a simple logistic regression model, not a fully-fledged system. But it could show them: okay, if we move from pen and paper to a simple dummy model that doesn’t actually make a prediction, it just does something, do you like this? Do you like the output? Do you want the probabilities as percentages, or do you just want them as high and low? And this really helped to save a lot of the development time.
Brian: That’s important. I think a lot of people that don’t come from the design world don’t realize that designers work a lot in low fidelity. If you’re really talking about product and user experience design, we like to work in low fidelity to increase the learning cycles, right? The feedback loops come in faster. If you’re always working in really high fidelity, whether it’s a coded prototype or just a high-fidelity design prototype—those are nice, and sometimes you need those; when you get into really deep data visualization, the visual design starts to become more of an issue—but for the kinds of things you’re talking about, you really can get some great feedback early. One question: was there anything specific that you learned in the prototyping phase that saved you a lot of money—in terms of maybe the type of algorithm, the way you were going to develop the model, or some factor that saved you a significant amount of time or effort? Like, you know what, we don’t need to make that; let’s just not even do that right now, because they don’t want it or they won’t use it. Was there one thing like that that jumped out at you?
Eugenio: They honestly didn’t need too much in terms of, you know, fancy systems. It was interesting to see exactly how the doctors are really focused only on doing their job as well as they can, not really too interested in fancy components, fancy solutions. So from the beginning, we were able to focus not on appearance or fancy components, but more on usability and readability. And for some of the components, like, we knew that the decision tree behind it was something flexible; it could change, it could show new questions to the user depending on which path they took. And so, making sure that this component was clear to the doctors before actually building a prototype helped them understand that, yes, they only needed very little information; they didn’t have to have the whole fully fleshed-out set of information. And that saved us a lot of the structural decisions later on.
Brian: I want to jump to—I know we’re close to our time here, and we didn’t even really jump into the Covid stuff. But I wanted to fast forward to maybe a key learning that you got out of the Covid dashboarding work that you had done. And if I could just summarize quickly, it sounds like you built a decision support dashboard to help state—it sounded like state but also national level, up to the White House, actually—understand where personal protective equipment, maybe military support—where do we need resources at the zip code level in the United States?
So, you built a model, you scraped data from a bunch of different state websites, and put this together into a dashboard. You said you found something happened, though, right? In terms of the [laugh] readiness to actually believe the numbers and take action—say, “We’re actually going to buy some PPE and we’re going to send it to this zip code.” What happened in the last mile there? Any key findings that you had in that project?
Eugenio: Yeah, in the last mile, as it usually happens, we had these issues. You know, 95% of the [unintelligible 00:35:43] goes into the development and UX, and so on. All of that was to some extent easier than the final components, just because in such a crisis situation, you really see all of those other components, which are a bit less technical: people’s ability to trust data, how this varies across different entities, organizations, countries, jobs as well, and how this really makes everything tricky. And of course, when you have a pandemic, it acts as a catalyst and amplifies all of these cultural components.
And so, of course, when we tried to show this tool—and this tool went online in big publications, big newspapers too—when we tried to help some organizations, some local governments, or larger governments, not everybody had the same response to this. Some people were embracing help. Some people had, maybe, less propensity to make decisions based on data. And sometimes it’s also understandable, because that data was not perfect. You cannot rely on perfect data in a pandemic.
And other people might have had a data overload: they had way too many people providing them with information, providing them with insights. So you could clearly see this issue of, how can I drive the value home? How can I actually drive value when some people might have too much data, other people might not trust the data, and other people again might be using the data in different ways, not exactly as you planned? Especially coming from MIT, which is a very technological, data-driven university.
Brian: Was there a key change that you made in the design or something to accommodate these different perspectives that you were dealing with? Or did it go beyond anything that you could possibly address?
Eugenio: I think it boils down to delivery. You can package the same information in different ways and, basically, try to “trick” the end-user into understanding that information, because it actually answers their questions in ways that they’re familiar with. So, very simply, people might not be into dashboards; people might be used to lengthy reports. For a technical person, maybe at MIT, it’s fine to deliver data, percentages everywhere, to overwhelm them with facts, because that’s kind of the culture there. But other people might just need a few short executive summaries—bullet points, in plain text—just saying, “This is bad,” even though you know that that word is backed by data.
And even the format—paper, digital, all these different channels—or having some of the more senior people act as ambassadors delivering the message: all of these different approaches helped. And you had to tailor each message, even though the content was the same, to the audience. And it was fascinating, because you can see how the product really has to be developed for the right audience, and not for the ones who made it.
Brian: People process information differently. Obviously, you know, an analyst or someone who’s like, “Well, maybe my ass is on the line. If I’m the one who’s going to order 5 million masks and write the check, I’m going to want to know that that’s right.” Whereas maybe a different stakeholder whose job isn’t to actually go and write the check, but to make a recommendation can go with, “Okay, it seems pretty bad there. I guess we should send some materials there.” Like, “I don’t need all the proof in the world. It sounds right. It looks right. I’m getting phone calls from the governor, anyways. This is just more concrete proof. Let’s do it.”
Those are very different perspectives, right? When you’re not asking to justify the spend or resources are limited and you have to take from one hand to give it to the other hand, you can see how that would make a lot of sense. And again, this is why it’s so important to understand the downstream decision-making. What do they believe now? How do they make decisions now?
And then how do we slide our technology into the natural way they already do stuff? Are they used to getting a PowerPoint? And then they make a decision based on the summary from a system, because they don’t know how to log in? [laugh]. Or do they like going into Tableau, and they want to play with the data, and they want to look at it in a chart, and they want to dump it into Excel, and they want to combine it with their own stuff?
Those are very different approaches, and there’s not necessarily a right or a wrong one there. It really just, as you said, comes back to knowing who’s going to use it. How do they like to use stuff today? How do they do their work today? How do we slide this into the way that they’re going to be most receptive to using it?
It’s not, get them to use the thing that we made, convince them, change them. That’s such a harder thing to do, right, to get them over to your side, [laugh] and then try to tell them how it’s better for them, you know? It’s such a harder thing to do.
Eugenio: And that’s exactly it. Like, one of the other questions was, what do they actually care about? Because the situation might be dire, but as you said, each person [unintelligible 00:40:55] might see the issue through a different lens. Nursing homes [unintelligible 00:41:01]—a lot of the nursing homes might see the issue as dire if it affects the people there, if it affects the residents. But do we really care about the people living there?
And that’s one angle. If someone, as you say, is more on the financial side—on the costs, and so on—you might explain to them how the ROI there is better than what might happen otherwise. So, learning to speak all these different languages is extremely important, in addition to understanding that maybe sometimes someone wants a chart, a visual, or just, “It’s bad.”
Brian: Eugenio, thank you so much for coming on the show here. Where can people follow your work and stay in touch with you?
Eugenio: They can follow me across social media on LinkedIn, my website. It’s literally just my name: jayzuccarelli across the socials. And it’s been a pleasure.
Brian: Cool. Yeah, I will add those links. I think I saw them on your LinkedIn profile, so we’ll pop those into the show notes for people. Again, this is Eugenio Zuccarelli. He’s a research scientist at MIT Media Lab. You’re also a data science manager at CVS Health. Thank you for coming on to the show and talking about some of these experiences working with design, and users, and machine learning.
Eugenio: Thank you. It’s been a pleasure.