Azeem’s Picks: How to Practice Responsible AI with Dr. Rumman Chowdhury
Algorithms can cause unexpected harm on a massive scale. How do we make sure they don’t?
Artificial Intelligence (AI) is on every business leader’s agenda. How do you ensure the AI systems you deploy are harmless and trustworthy? This month, Azeem picks some of his favorite conversations with leading AI safety experts to help you break through the noise.
Today’s pick is Azeem’s conversation with Dr. Rumman Chowdhury, a pioneer in the field of applied algorithmic ethics. She runs Parity Consulting, the Parity Responsible Innovation Fund, and she’s a Responsible AI Fellow at the Berkman Klein Center for Internet & Society at Harvard University.
They discuss:
- How you can assess and diagnose bias in unexplainable “black box” algorithms.
- Why responsible AI demands top-down organizational change, implementing new metrics, and systems of redress.
- The emerging field of “responsible machine learning operations” (responsible MLOps).
Further resources:
- TIME100 AI: Rumman Chowdhury
- Building an AI We Can Trust with Anthropic’s Dario Amodei (Exponential View, 2023)
- “‘I do not think ethical surveillance can exist’: Rumman Chowdhury on accountability in AI” (The Guardian, May 2023)
AZEEM AZHAR: Hi there. It’s Azeem Azhar, founder of Exponential View. We are moving into an age of artificial intelligence. These tools of productivity, efficiency, and creativity are coming on in leaps and bounds even if they remain incomplete and immature today. Implementations of AI are becoming priorities amongst top execs in the largest firms all over the world. Now, one big question is how do you make sure your AI systems behave ethically and fairly? It’s a huge issue and it’s one I’ve been exploring since 2015 in my newsletter, Exponential View, and over the years I’ve hosted some of the leading experts on this subject on this very podcast. I know that ethical AI implementation is top of mind for leaders like you. So, to help you think through the questions of responsibility, accountability, and power in the context of AI development, I’m bringing back some of my previous conversations over the next five weeks. First up is my 2021 conversation with Dr. Rumman Chowdhury, a pioneer in the field of applied algorithmic ethics. Earlier in 2023, Time Magazine named her one of the 100 most influential people in AI. But at the time of our conversation, Rumman was a director of machine learning ethics, transparency, and accountability at Twitter. It was a position she held until Elon Musk’s takeover of the company. Today she runs Parity Consulting, the Parity Responsible Innovation Fund, and she’s a responsible AI fellow at the Berkman Klein Center for Internet and Society at Harvard University. Rumman is a foremost expert in the field of responsible AI, so do listen closely. Rumman, it’s fantastic to have you here. Welcome to Exponential View.
RUMMAN CHOWDHURY: Thank you so much for having me, Azeem.
AZEEM AZHAR: We’ve been talking together about the ethics issues around artificial intelligence, this question of responsible AI for quite a long time. I mean I think it’s approaching five years now. How do you feel about the fact that we’re still having this conversation?
RUMMAN CHOWDHURY: I actually think it’s a good thing. I don’t think the conversation should be going anywhere, nor will it be going anywhere anytime soon. What I have loved seeing is the arc, the evolution of the narrative. When I first started at Accenture, my early forays into responsible AI, I used to have this slide before every talk and I would say there are three things I don’t talk about, Terminator, HAL, and how Silicon Valley entrepreneurs are saving the world, because those were the three things everyone used to ask me and then it became the trolley problem. Then I had to add a no trolley problem slide. But what I like is that I fortunately no longer have to present that slide. Now we get asked very substantive questions about what is the role of artificial intelligence in societies? How do we think about the regulatory and political space? And these are the questions that we should be tackling.
AZEEM AZHAR: Some of those still seem like very large questions. Could you give me a specific example of the kind of question you’d get asked today?
RUMMAN CHOWDHURY: One of my favorite things that I have seen is just the general public becoming smarter about AI and becoming more critical and rightfully so. One of the questions I get asked a lot when I do public speaking engagements with a broader audience is, “What can we do? What can I, average person, I’m not the CEO of a big tech company, I’m not an engineer, I’m not an AI programmer, what can I do?” And that’s actually one of the questions I’m very happy to see people asking.
AZEEM AZHAR: But what do they want to take action against? What are the bad effects that people are noticing in their everyday interactions with AI?
RUMMAN CHOWDHURY: In the US, we have had people being unfairly arrested because a facial recognition system misidentified them. This is very real. We have seen algorithms fail individuals, in the US at Stanford University, and, I think infamously, in the UK with Ofqual and algorithmic grading. So people have either already experienced algorithmic failures or have seen them happen to others who they view as just like them.
AZEEM AZHAR: The Stanford one was the case of how vaccines were prioritized, and the system prioritized well-tenured medical staff who weren’t on the frontline over those actually dealing with Covid patients.
RUMMAN CHOWDHURY: Exactly, and more deeply, there was no consideration of individuals like janitorial staff, who were arguably some of the most exposed to Covid. That vaccine distribution only included medical professionals and did not even give consideration to cleaning and janitorial staff.
AZEEM AZHAR: Why are they specifically machine learning or algorithmic issues rather than general issues of poor systems design?
RUMMAN CHOWDHURY: And this is why I bring up Stanford, and this is what sparked the MIT Technology Review article that I wrote with Dr. Kristian Lum on what an algorithm is. The Stanford “algorithm” was classified as a medical algorithm. For someone like myself who’s a data scientist, it looked like a flow chart. Literally, it could fit on a spreadsheet. It was a series of actions and a weighted formula as a function of certain very limited variables. If there is one conversation that I’ve seen very little headway on in the last four years, it is literally defining what AI is and what machine learning is. And the problem with that is it leaves us in a place where we cannot define the scope of harms we should be addressing and thinking about versus what exists today to address harms.
I think that’s the gray area that we’re still trying to grapple with. So, to your question of what is possibly different about machine learning as opposed to other sorts of systems failures: nothing. That was actually our answer. It was nothing. There was literally no difference. What we really should be thinking about is the impact of algorithmic decision-making systems. Where I like to look is at how the security and privacy risk world thinks about likelihood and impact scales, rather than trying to measure, “Does this really count as security engineering?”
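To make concrete what “a weighted formula over very limited variables” can look like, here is a minimal Python sketch. The variables, weights, and threshold are invented purely for illustration; they are not the actual Stanford rules.

```python
# A hypothetical sketch of the kind of "algorithm" described above: not machine
# learning at all, just a weighted formula over a handful of variables.
def vaccine_priority_score(age: int, is_patient_facing: bool, department_covid_cases: int) -> float:
    score = 0.0
    score += 0.5 * min(age / 65, 1.0)           # weight older staff more heavily
    score += 2.0 if is_patient_facing else 0.0  # flat bump for patient-facing roles
    score += 0.1 * department_covid_cases       # crude exposure proxy at department level
    return score

# Note what is missing: there is no variable at all for roles like cleaning or
# janitorial staff, so no choice of weights can ever prioritize them.
print(vaccine_priority_score(age=70, is_patient_facing=False, department_covid_cases=0))
print(vaccine_priority_score(age=30, is_patient_facing=True, department_covid_cases=12))
```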
AZEEM AZHAR: It’s funny that you talk about the definitional problem. Yesterday I was at an event where Masayoshi Son from SoftBank was speaking, and he mentioned how he’s only backing AI companies. Then a discussion started in the online chat among the online attendees about what AI meant. For the remaining hour that he was speaking, the online attendees were having arguments that eventually ended up at Aristotle, which I guess is as far back as you can go when you’re trying to define this stuff. So definitionally, it’s kind of comedic. But part of the issue today, of course, is that we are finding that machine learning or algorithmic approaches are powerful and convenient tools to use in systems, so they are showing up all the time. And that’s why the discussion about responsible use of these particular technologies becomes so important: they’re not niche, they’re actually ubiquitous.
RUMMAN CHOWDHURY: Absolutely. And I think the difference that is important to think about is that algorithmic systems are meant to scale and they are meant to centralize. The benefit of them, yes, does lie in the efficiency of compute, rapid decision-making, et cetera. However, that rapid decision-making is both useful and potentially harmful precisely because we try to use these systems in a scaled and generalizable fashion.
AZEEM AZHAR: Right.
RUMMAN CHOWDHURY: Or to maybe make it a little less academic, this is the difference between me as a graduate student writing some code on my laptop and finding some sort of result or answer versus that being implemented by one of the biggest companies in the world.
AZEEM AZHAR: The appeal of moving some kind of system out of decisions made by a human into a decision made by an algorithmic system is partly about efficiency, but it’s also about scalability. In other words, it doesn’t cost any more to do this decision a million times using an algorithm than it does to do it a hundred times. And that’s incredibly powerful. But equally at that kind of scale, if there is some kind of irregular or systemic bias, the harms that can be done can be magnified very, very rapidly.
RUMMAN CHOWDHURY: Exactly. I think the dream of scale and generalizability assumes a ground truth and an objectivity that do not exist in today’s systems, even though those assumptions are necessary for scaling the systems that we have.
AZEEM AZHAR: What do you mean by generalizability?
RUMMAN CHOWDHURY: That’s a bit of a data science term of art that I’m using more broadly. Specifically, generalizability is critical to using an algorithmic system: an algorithmic system needs to be useful outside the specific context in which it was trained. And here is where we fall into the nomenclature trap around the word bias. Models need to have “bias,” or uncertainty, in order to be generalizable; otherwise they are what we call overfit. If I train a model to do a very specific thing and it only works in that one use case, if, for example, I train a model to identify Covid in patient data for particular kinds of patients and I cannot then take that model and use it in India, in China, in Afghanistan, use it on children, on women, on people of color, on people who’ve been smoking for most of their lives, then it is not actually a useful model. But what we do today is take an algorithm that has been trained on very limited data.
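As a rough illustration of that point, here is a minimal sketch using synthetic data and scikit-learn. “Population A” and “population B” are hypothetical; the only difference is where the model was trained and what the baseline risk looks like elsewhere.

```python
# A minimal sketch of the generalizability problem: a model trained on one narrow
# population can look accurate there and fail badly on a different one.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_population(n, mean_age, baseline):
    age = rng.normal(mean_age, 8, n)
    # same age signal in both populations, but a different baseline rate
    p = 1 / (1 + np.exp(-(0.08 * (age - 50) + baseline)))
    y = rng.binomial(1, p)
    return age.reshape(-1, 1), y

X_a, y_a = make_population(2000, mean_age=40, baseline=-1.0)  # population the model is built on
X_b, y_b = make_population(2000, mean_age=70, baseline=-3.0)  # a different, unseen population

model = LogisticRegression().fit(X_a, y_a)
print("accuracy on the population it was trained on:", model.score(X_a, y_a))
print("accuracy on a different population:          ", model.score(X_b, y_b))
```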
AZEEM AZHAR: Bias is an important part of training any kind of algorithm, because it’s the bias that allows the algorithm to discriminate between an apple and an orange, or a cat and a dog. So when is bias good, and when do we mean bias in a bad sense?
RUMMAN CHOWDHURY: The first talk I ever wrote in this field is a talk I still give today, and it is called, “What do we talk about when we talk about bias?” I have given this talk everywhere from graphic design organizations to, quite literally, the FTC, to clients, to data scientists. And it’s still a conversation we can have. For a data scientist, bias means a very specific thing. It’s a quantifiable value that exists in your algorithm. It means something very different from a legal perspective, and often in policy, bias actually has a negative connotation. It means a bad thing. But to a data scientist, it’s actually independent of any normative judgment. Saying, “My model is biased,” does not necessarily mean a bad thing. One might want to create a biased model. For example, if you are creating a recommendation system to advertise things for babies, you want it to be biased towards people who have babies. You don’t want to broadly advertise your goods to people who do not have babies. That’s an example of how the same word can be misunderstood or used in different contexts, and language and context matter very much.
AZEEM AZHAR: So the word bias, just for people who are listening, actually has these different contexts: one is a rather technical term that is almost an objective measure of the nature of the data, and the second is a term that could formally be a legal term, or, more informally, a cultural or political comment on a particular situation. So we get to this question then about the bad effects that we see with these models. You described one relating to facial recognition and misrecognizing somebody. We’ve got some examples around the COMPAS algorithm, which was used to guide sentencing by estimating the risk of re-offending. There was the example of a resume screening system which seemed to prefer male candidates over female candidates just on the basis of names. How do these problematic outcomes get built into products? It doesn’t feel like it’s done intentionally. Is it done incompetently, or is it done for some other reason?
RUMMAN CHOWDHURY: So one way to think about negative outcomes of algorithms is to divide them into two camps. We need to appreciate that all of these algorithms are probabilistic and not predictive systems. So what does that mean? That means around any given output of a model there is a degree of uncertainty. So when I say that something is 95% likely to happen, that means in all probability it will, but there’s a 5% chance it will not. And that’s usually how model output looks. And actually, 95% is rarely the case. You’re more likely looking at somewhere in the eighties for a very, very good model. So any given output from my model… Let’s say I’m deciding who should get a loan and who should not; it’s about 80% likely to be correct. And this is where data scientists really dig in. The way we think about it, and where bias shifts from being an objective thing to being a bad thing, is if that bias is systemic. If in that 20% you only have low-income people of color, because we know that, historically, banks discriminate against them, or you only have women, because historically, banks discriminate against women, then we have a problem. So, there are two ways a negative output can be seen. One can be that your model is actually okay, but in this case it was wrong, because there was some probability that it was going to be wrong. That still means you deserve to have the problem corrected, but it does not necessarily mean that the model is flawed. The second is the instance in which it is that 20% that is biased against a particular kind of person. So if there is a pattern to who that 20% is, to who the model is wrong for, then we have a systems-level problem.
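A minimal sketch of that check, on synthetic data: a model can be roughly 80% accurate overall and still be systemically biased if its errors cluster in one group. The groups and accuracy rates here are invented for illustration.

```python
# Disaggregating error rates by group: the "is there a pattern to the 20%?" question.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.choice(["group_a", "group_b"], size=n, p=[0.7, 0.3])
actual_repays = rng.binomial(1, 0.7, size=n)

# Hypothetical loan model: right ~90% of the time for group_a,
# but only ~60% of the time for group_b.
correct = np.where(group == "group_a",
                   rng.binomial(1, 0.9, size=n),
                   rng.binomial(1, 0.6, size=n))
predicted = np.where(correct == 1, actual_repays, 1 - actual_repays)

print("overall accuracy:", (predicted == actual_repays).mean())  # roughly 0.81
for g in ["group_a", "group_b"]:
    mask = group == g
    print(g, "error rate:", (predicted[mask] != actual_repays[mask]).mean())
```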
AZEEM AZHAR: Where does that systems-level problem come from?
RUMMAN CHOWDHURY: So, it can come from a bunch of different places. One, and this is a lot of the conversation on AI ethics today, is problematic data. Our data is biased and flawed and reflects the negative aspects of our society, and that is just what it is. I remember in the early days of AI, people would talk about how AI is a mirror, and sometimes you don’t want to look in the mirror, because then you see all the pockmarks and all the scars and all the bad things, and that’s sometimes what an AI system or an ML system shows us. Second can be simply flawed institutions. As individuals, it is sometimes hard to see the larger systems and institutions that we are in. The ML systems that we build aspirationally should not reflect the flaws of the larger institutions; but unfortunately, sometimes they do, because they have to live within them. So there’s this individual aspect of what is the decision-making that has been made historically? But then there’s a bigger question of what is the larger institution in which this algorithm lies, and the broader decision-making that’s happening around it that’s influencing it?
AZEEM AZHAR: Some of these appear to be rather more identifiable problems than others. There are certain cases where we can all agree that the law says that processes or systems should not discriminate against people on the basis of protected classes like gender. If that holds for a process involving many humans, it should hold for an automated process. And that feels like, well, that’s not really an issue: a system that has that discriminatory prejudice built into it is going to fall afoul of the law, and whoever’s operating it or whoever built it will have to contend with that. But it also seems, in the discussion of responsible AI and the question of machine learning ethics, that there are types of problems that don’t lend themselves so easily to existing legal frameworks, right? Perhaps where I haven’t put any gender data into my dataset, but the output of the model already seems prejudiced against men or against women. What’s common to that latter, trickier type of problem, where it’s not immediately clear that there’s been some kind of legal infringement?
RUMMAN CHOWDHURY: So, you’re tapping into the whole narrative around proxy variables. Sometimes there are variables that go into training your ML system, pieces of data that may not directly reflect a particular protected class but are biased because social outcomes have become blended with them in that way. Let’s say you look at whether or not the individual has a college degree. That’s a great one, right? In the US, that would be sharply divided along racial lines. Broadly, that phenomenon has come to be called digital redlining. Redlining was quite literally the history of bankers drawing red lines around districts that were predominantly Black and choosing not to grant them loans. So historically, Black people were directly discriminated against. However, in later iterations of algorithms and models, people would use, for example, zip code in the US to determine likelihood to, let’s say, default on a loan, or riskiness. What that ended up doing was reproducing that redlining in a roundabout manner. I’ve come to call that indirect risk. We have direct risk, where individuals are directly discriminated against, and indirect risk, which is quite hard to untangle. I can appreciate how a data scientist might throw up their hands and say, “Well, how was I supposed to know that? I don’t know the history of Chicago in the 1930s,” but I think this is where we really need to think about systems of redress. We cannot, nor will we ever be able to, address every single root cause of every problem. What we can do is, to the best of our ability, do the appropriate investigations when a model is being built and developed, which we do not have today, and on top of that, offer appropriate systems of harm escalation and redress, which is something we definitely do not have today. What I’m hearing is that people want to be able to interact with the system. They want to be able to say, “Hey, there’s something going wrong here.” Or to frame it with another example, I was once asked by somebody, “What’s the difference between a biased image detection system that may wrongly think a kid in a hoodie is always trying to steal in a store, versus a biased security officer who’s going to target teenagers?” And I told them the difference is that you can always go to that security officer’s boss. There is a management chain. You can go, you can complain. There is no way to do that with an AI system. We have no systems of redress.
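A minimal sketch of the proxy-variable dynamic described above, on synthetic data: the protected attribute is never given to the model, but zip code carries the same signal, so predicted approval rates still split along group lines. All names, coefficients, and distributions here are illustrative assumptions.

```python
# Digital redlining via a proxy: drop the protected attribute, keep the proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

group = rng.binomial(1, 0.5, n)                             # protected attribute (never a feature)
zip_code = np.where(rng.random(n) < 0.9, group, 1 - group)  # zip code correlates strongly with group
income = rng.normal(50 + 5 * (1 - group), 10, n)            # historical inequality baked into income

# Historical approvals were lower for group 1 (the redlining legacy)
p_approved = 1 / (1 + np.exp(-(0.05 * (income - 50) - 1.5 * group)))
approved = rng.binomial(1, p_approved)

X = np.column_stack([zip_code, income])                     # note: no protected attribute here
model = LogisticRegression().fit(X, approved)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted approval rate = {pred[group == g].mean():.2f}")
```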
AZEEM AZHAR: We have the companies that operate them.
RUMMAN CHOWDHURY: Yeah. But are we going to email Satya now every time we’re upset with a product that they’re building? I mean, it isn’t really obvious what people can do. I think the Apple Card example is one of the best examples that we see.
AZEEM AZHAR: This is the Apple Card, which was a credit card that Apple and Goldman Sachs launched back in 2019, I think?
RUMMAN CHOWDHURY: What happened was this one individual and his wife both applied for credit cards. Even though, if I remember the situation correctly, he had a lower credit score and made less money, he was approved for the card, and she was not.
And to sort of tell the whole story, they first went to Apple customer service, and they said, “Hey, we think there’s something wrong here. She’s not getting the credit card. I am.” And what they were told by the customer service representative is, “Well, the algorithm made this decision. I really don’t know why.” And it was then that he took to Twitter about it. It became this big story, a big scandal, and Goldman Sachs and Apple got investigated, and you can actually look at the findings.
But the pivotal part of that story to me was that he did try. He did go to Apple, he did ask them what was wrong, and he was not given an appropriate system of redress, and so he had to take to Twitter.
AZEEM AZHAR: That’s a really old problem, right? It’s a problem with credit scores, which are very common and long-established, used in the US for five decades and for a similar length of time in the UK. Credit scores have done an amazing job at broadening access to financial products, which in turn has given people the chance to own their own homes and cars, and so on. And yet those few credit agencies are extremely difficult to secure redress from when they wrongly score you. As I’m sure many listeners have been, and I certainly have been, on the wrong end of credit scores, with very, very slow, difficult, and complex ways of getting those scores attended to.
RUMMAN CHOWDHURY: I think one of the positive things about all of this conversation around algorithmic bias is that we’ve come to realize that the problem is not the algorithm, it’s the institution. The lack of explainability is not in the model. It is in the institutions that implement the models.
AZEEM AZHAR: So if AI is a mirror on society, is one beneficial outcome that it’s becoming a sandbox where we can identify societal problems that have been swept under the carpet? I mean, in a way, because of the power of some of these systems, we actually end up with statistical evidence that there are issues. So is this mirror a kind of helpful one in those discussions?
RUMMAN CHOWDHURY: I think it is a helpful mirror. The unfortunate part of it is that there have been plenty of people who have been systematically pointing out these problems for very many years, and it’s unfortunate that we need to actually have empirical evidence of its existence for it to be taken seriously. But that being said, I am grateful to have the ability to contribute to an ongoing conversation in a way that helps move it forward by providing that empirical evidence.
AZEEM AZHAR: Is this something that’s been contained within the machine learning ethics community?
RUMMAN CHOWDHURY: I think the place where I have really liked seeing the narrative evolve in a very positive way is in the medical community. There’s a lot more conversation around bias in the medical profession than I had personally ever seen before. And that came about because we started to have literal empirical evidence. So one example was a kidney transplant allocation algorithm. The algorithm did not do the allocation; it helped prioritize who got to go on the list. It was biased against Black patients, and when they dug into why, it’s because Black patients’ symptoms are taken less seriously, and that started being reflected in this algorithm. That leads to the discussion that we now have empirical evidence that doctors discriminate against people of color. That’s something the medical profession had largely denied for a very, very long time. And we have plenty of other cases. There was a really great article I was reading about COVID fund distribution; this was federal government fund distribution to different hospitals. And what it found was that there was bias against hospitals in low-income communities, because their patients tended to lack medical insurance. Therefore, they did not have as high insurance requests as other places did. And that ended up becoming ingrained in the algorithm, where affluent hospitals literally just have higher bills. Why do they have higher bills? Because they have more affluent clientele. That got reflected in some of the COVID-19 funding allocation because of the use of an algorithm.
AZEEM AZHAR: So in a lot of the cases that we’ve discussed today, we are really talking about existing problems that get magnified and scaled up using technology. So if we need to fix this, the AI part is only part of the problem, right? There’s a wider issue in the system that needs to be tackled. So how should companies think about what they need to change in order to become more responsible?
RUMMAN CHOWDHURY: One of the last things I did while I was at Accenture was work with three other brilliant folks, Henriette Cramer from Spotify Labs, Jingying Yang, formerly at Partnership on AI, and Bobby Rakova, who worked with me at Accenture, to do a study. We interviewed 25 individuals who work in responsible AI at various companies, and we asked them their perspective on what’s working, what’s not, and what companies ought to do. What’s interesting is that we tied it to research on organizational change management. So how do we drive that change? And historically, what has worked in companies trying to drive change? Because we are not just talking about opening up a new department in your company. We are talking about a change and a shift in values and ways of working, and we need to appreciate it as that. But specifically, what the rubric boiled down to was, number one, tone from the top. Leadership needs to not just talk about it, but actively implement it, which is why teams like mine need to be close to leads and have peers and counterparts who are the people who build the systems, and not be structurally placed as an aside. The second is systems of rewards and punishment, both for models and for people. So what are the metrics by which we measure success and failure of AI systems, and what are the metrics by which we measure productivity and non-productivity of individuals? The third is having systems of redress for harm: how do we react when bad things happen? Those were the primary ones for companies to think about, and it’s a lot to bite off.
AZEEM AZHAR: It’s an ambitious objective, but one without any clear incentive for the company. The company is going to respond to legal risk. So you can persuade them and say, unless you fix this system, you expose yourself to gender discrimination risk, and that could be expensive. But in many of the cases that we’re talking about, the problems with these systems don’t necessarily show up as that kind of legal contravention. So what creates the incentive for these executives to be more responsible?
RUMMAN CHOWDHURY: In most governance, risk, and compliance functions, yes, legal risk does play a role, and it plays a very big role. But I think an underexplored area with an outsized impact for algorithmic bias is actually reputational risk. Reputational risk is very much something that risk and compliance folks think about; for any good GRC work, it is part of their rubric of understanding. And reputational risk is usually a leading indicator of where legal risk is heading. All of the things that folks like myself and a lot of my counterparts were talking about from a reputational perspective a couple of years ago, we’re starting to see put into draft legislation and codified into law. So it makes me very happy to see people using the language of risk and harm in legislation when talking about it. It makes me happy to see what the FTC is saying very boldly about companies and ethical use. So I think smart companies are looking at reputational risk as a way to understand future legal risk.
AZEEM AZHAR: And when we’re looking at building these systems, there’s a tension between the performance of a machine learning system and your ability to diagnose what might go wrong. There’s this whole field that’s called explainable AI. And what we’ve seen in the last decade with the rise of this technology called deep learning is that these deep learning systems are very powerful. They’re amongst the best at making predictions of the types that you and I have discussed over the last 30 minutes or so, but they’re also very hard to inspect, which means you can’t really lift up the hood and say, because X dial was turned here and Y dial was turned there, this is why we got the result that we got, which was unfavorable, and we need to change that. They are what’s known as a black box. These black box algorithms are the ones getting all the airplay at the moment because they’re just so powerful. How big a problem is it in the field of responsible AI that these popular systems are not inspectable by a human?
RUMMAN CHOWDHURY: So there’s a lot of really interesting research that has been happening on explainability. Just to explain one of my favorite approaches: you take a black box algorithm, train an explainable algorithm on it, and any difference in the output is basically your bias, right? So you have an explainable model, you understand what’s going into it, and you compare it to the black box. So there are ways in which we are doing explainability. Now, to answer the broader question: explainability is a bit of a red herring, and there are a couple of reasons why. Number one is that most companies are not even doing deep learning in any meaningful way. They are actually relying on very blunt regression systems or very explainable data science models, in some cases because the law literally does not allow anything that fails explainability mandates. That is one reason it’s a bit of a red herring. The second reason is that I would like to ask: explainable for whom? Most of our explainability models, or methods of approaching explainability, give you an output that is understandable by a data scientist. And what people actually mean when they say explainability is more about responsibility, and not so much, “What was the thing you tweaked that led it to be like X?” Most people who interact with an algorithm in any meaningful way do not care, nor will this impact their lives as much as the holistic output of the model will. So I think there are ways in which we can induce explainability for a black box model without answering the literal, narrow technical question of, how do we make this model explainable?
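What Rumman sketches here is essentially a global surrogate model. A minimal illustration, assuming synthetic data and scikit-learn: an interpretable model is fit to the black box’s predictions rather than the true labels, and the places where the two disagree are where the simple explanation stops holding.

```python
# Global surrogate: a readable model trained to mimic a black box.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 4))
y = ((X[:, 0] + 0.5 * X[:, 1] ** 2 - X[:, 2]) > 0).astype(int)

black_box = GradientBoostingClassifier().fit(X, y)
bb_pred = black_box.predict(X)

# Shallow tree trained on the black box's predictions, not on the labels
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, bb_pred)
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))

# Fidelity: how often the readable model agrees with the black box.
print("surrogate fidelity:", (surrogate.predict(X) == bb_pred).mean())
```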
AZEEM AZHAR: The question, though, would be: if you take a look at some of the really big models that are becoming popular, there’s one called GPT-3, which is produced by OpenAI, and listeners to the podcast will have heard me talk about it with Sam Altman, the founder of OpenAI. GPT-3 is a language model. It’s one of the most complicated machines ever built by humanity: 175 billion parameters. The way I think about a parameter, just showing my age, is that it’s like a graphic equalizer of the kind you used to have on an old hi-fi. Each parameter is a little slider that you can move up and down. I had five parameters on my graphic equalizer, and my friend Manar had seven. He had a much better graphic equalizer. OpenAI’s GPT-3 has 175 billion of these individual sliders that you slide up and down to get output from the model. So it is hard for us to figure out what’s going on, but it’s a very powerful system. So how do you think about being able to check or test that something like that is a responsible piece of technology?
RUMMAN CHOWDHURY: It’s actually something I am thinking very deeply about in my role at Twitter. What I’ve come to realize is that there are two classes of problems: the what problem and the why problem. The what problem is: what are the ways in which this model could be doing something bad? So with GPT-3, that’s actually [inaudible 00:32:01], one would identify use cases that would be potentially harmful. For example, what does it say about people of color? What does it say about women? What does it say about the notion of beauty? What does it say about the notion of intelligence? One can think of a very long list of questions by which we could interrogate the model, get output, and say, okay, we can feasibly say that we’re not seeing biases, or we are seeing biases, based on these different things that we know socially can be hot-button issues.
The hard part is the why is it happening. So let’s say you go down this avenue and you ask it about women and it says something really terrible. What you’re getting at is, all right, if you then have to diagnose the problem, if you now have to solve the problem, what do we do? And I agree with you, that is an incredibly, incredibly difficult problem. But the thing is that it’s also not just limited to deep learning models with trillions of parameters, that also exists in simpler models that act in systems. For example, in hiring, you can have multiple models influencing the output of who gets hired. If you find that the process is biased, you’ve answered the what question, untangling the why question becomes very, very difficult. And you’re absolutely correct. And I would say that extends beyond having a complex model to even saying you have a simple model in a complex system.
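A purely illustrative sketch of the “what” question as template-based probing: ask the model the same thing about different groups and compare the outputs. Here `generate` is a hypothetical stand-in for whatever model is under test, and the word-list scoring is a crude placeholder, not a real bias or toxicity metric.

```python
# Interrogating a generative model with one template across many groups.
NEGATIVE_WORDS = {"weak", "stupid", "criminal", "ugly", "inferior"}

def generate(prompt: str) -> str:
    # placeholder: in practice, call the model under test here
    return "..."

def crude_negativity(text: str) -> float:
    words = text.lower().split()
    return sum(w in NEGATIVE_WORDS for w in words) / max(len(words), 1)

template = "Write a short description of a {group} software engineer."
groups = ["male", "female", "Black", "white", "older", "younger"]

for group in groups:
    completions = [generate(template.format(group=group)) for _ in range(50)]
    avg = sum(crude_negativity(c) for c in completions) / len(completions)
    print(group, round(avg, 4))

# A systematic gap between groups answers the "what"; explaining *why* is the hard part.
```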
AZEEM AZHAR: We know that these systems are statistical; they’re non-deterministic, so they don’t get it right every time. And as a product manager, you release a model into production and you ask, does it have an acceptable failure rate? We do this with all of our products, right? The failure rate on the Pratt & Whitney engines on a twin-engine jetliner is much, much lower than the failure rate on the electric motor on one of those hoverboards you buy for 75 bucks from your online retailer. And we rightly don’t get as cross about the hoverboard failing as we do about the airliner failing. But a lot of the things that we’ve talked about today seem to be about testing. Now, the thing that strikes me about that process is that it reminds me slightly of a conversation I had in a previous podcast with Bruce Schneier, where we talked about a car company, Volkswagen, which had to do some emissions testing. And what the people at Volkswagen did was say, “Well, we are not doing so well on the emissions testing. So rather than fix the emissions, we’ll create a system that figures out when it’s being tested for emissions and lowers its emissions output.” And so one of the things I’m curious about is, if we have this kind of a posteriori, after-the-fact testing for harms and then go back and squeeze the problem at the front to avoid it, do we actually ever design systems that work well, or are we designing systems that work badly and we’ve just put some kind of goalkeeper between those bad outcomes and the real world?
RUMMAN CHOWDHURY: That’s such a great question. And I think what you’re pointing to is one of the ways in which I hope to see the field of responsible AI evolve. Right now we live in a world of reactive behavior: something bad happens, we run around, we put out a fire. Where we need to move is into a world of proactive behavior, creating the right kinds of investigations, norms, et cetera, such that when something is being built, we are understanding and addressing the harms before someone is actually injured by it. So there are a lot of similarities, as you mentioned, with your engine analogy, et cetera. And to an extent it holds, because we cannot assume 100% perfection of literally anything we build, let alone technologies. Where algorithmic systems differ goes back to this idea of scaling and generalizing. People are often not aware of, nor can they consent to, an algorithm being used. And again, I go back to Ofqual and algorithmic decision-making for student test scores. Students did not know their results were going to be decided that way. Things were not changed or addressed until after the harm was found out. So that’s where it would differ. But this is also the idea of an audit, which often happens after something is built, versus being one of the in-process steps. So again, there’s a bifurcation here: there is a difference between how I understand and investigate algorithmic bias in an existing algorithm versus what we will try to do to address these harms by changing the process. So one of the things that I’m looking into and thinking about is this idea of responsible MLOps.
AZEEM AZHAR: MLOps stands for machine learning operations. What’s happened in the last seven or eight years, as machine learning has become an absolute must-have for businesses, is that it’s moved from being this thing that strange data scientists built in a corner to something that’s deeply operationalized in the everyday fabric of a business’s operations. And so the new discipline that is emerging is called MLOps, or machine learning ops. You say we need to have responsible MLOps, so what does that look like?
RUMMAN CHOWDHURY: MLOps itself is so new as a field that the term responsible MLOps doesn’t even exist. I think we just literally coined it in this podcast, or at least formalized it in this podcast. I’ve been talking about it internally at the company for a little bit. So here we are. We have made history; let’s plant a flag. It is still a very nebulous and growing field. I would think of it as almost like the industrial revolution, where we moved from custom goods being built by skilled individuals to a standardized set of processes. The Henry Fords of the world created things like assembly lines, and as a result we got faster and standardized products. It’s kind of the same thing with machine learning. Instead of it being this sort of alchemy of specialized behavior, especially because we’re demanding that it be used at scale in a generalizable fashion, how do we start creating the processes so that we know what to do at every step of the way in this giant mystery? That is actually the same problem you just asked about in your previous question: how do we move from a place where we’re not just constantly reacting to fires? Well, we have to have standard ways of looking at things from a responsibility perspective, so that if you’re a regulatory body, or an internal or external auditor, you can ask the right questions. You can say, did you do the things you were supposed to do? Because right now you can’t even ask that, because we don’t have a list of things you’re supposed to do. There is no checklist.
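There is no agreed checklist yet, but as a purely hypothetical sketch of what a machine-checkable responsible-MLOps gate could look like, the fields and the release rule below are illustrative assumptions rather than an existing standard.

```python
# A sketch of "a list of things you're supposed to do" that an auditor or CI job could verify.
from dataclasses import dataclass, fields

@dataclass
class ResponsibleReleaseChecklist:
    intended_use_documented: bool
    training_data_provenance_recorded: bool
    disaggregated_error_rates_reported: bool   # error rates broken out by relevant groups
    harms_review_completed: bool
    redress_channel_defined: bool              # how an affected person can contest a decision
    rollback_plan_defined: bool

def release_gate(checklist: ResponsibleReleaseChecklist) -> list[str]:
    """Return the unmet items; a reviewer (or CI job) can block release if any remain."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]

missing = release_gate(ResponsibleReleaseChecklist(
    intended_use_documented=True,
    training_data_provenance_recorded=True,
    disaggregated_error_rates_reported=False,
    harms_review_completed=True,
    redress_channel_defined=False,
    rollback_plan_defined=True,
))
print("blocked; unmet items:", missing)
```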
AZEEM AZHAR: Right. I think it’s important to think about what happens in other industries. So in pharmaceuticals, you end up with these clinical trials that test for the harms. But even before you get to the first clinical trial, you’ve had to go through a very, very clear process with a lot of checkpoints so that you’re not going to create something entirely crazy by the time you run your first tests. And the same is true in the financial services industry. It’s very, very heavily regulated for the kind of products that can come out. And yet despite that, we still see a lot of innovation within the financial services industry.
RUMMAN CHOWDHURY: Yes, and I would actually add that it’s not in spite of that that they have innovation; it’s sometimes because of it. I think one of the hardest things for companies to do, if they’re taking this technology seriously, is to understand what is and is not appropriate action to take. There is a lack of adoption of ML systems at scale because of this very real and correct fear that you will break the law or have a negative externality that was unforeseen. This narrative of unintended consequences has been hammered into people’s heads. And now the very real question they’re asking is, “All right, so what do I do?” Without those clearly defined lines, we cannot figure out how to innovate, because you have no idea if you’re going to be driving off the road.
AZEEM AZHAR: Is that a plea on your part for there to be a clearer set of guidelines or laws coming from regulators or from legislatures?
RUMMAN CHOWDHURY: It is a plea from me for everybody working in this field to give input that is actionable. We’ve had a lot of people contributing thoughts, opinions, perspectives. But we can’t just have lots of high-level, 25-page documents on our thoughts and feelings. We actually need someone to put pen to paper and say, “These are the things that one must do.” And yes, it’ll be problematic, and you will be yelled at, and everyone will tell you you’re wrong. But then the next person will iterate, and at some point we will get to a better place than we are at today.
AZEEM AZHAR: So you recently joined Twitter. You’d started your own company working in the field of responsible AI with some new methodologies. Twitter decided they wanted to work with you. Why did you want to work with them?
RUMMAN CHOWDHURY: I made some very bold statements in my previous answer, and I wanted to say that that was literally why I started my company. Parity was an algorithmic audit company. The intent was to combine legal and risk functions with data. Twitter has had a team since 2019 called META: ML Ethics, Transparency, and Accountability. And what I loved about Twitter was that it appeared to check off all of the things on that list I mentioned earlier that companies should do. I started talking to the person who is my boss today, Ari Font, probably in August, and the whole process took months. I asked to talk to literally everybody. I think some of them were even surprised. I asked to talk to the comms lead. I asked to talk to policy people. I think I talked to everybody in senior leadership at the company, and it was very intentional. It was really to understand: is this company quite serious about what it wants to do? And my take was yes; I would not have left what I was doing otherwise. The other thing is that my lead investor asked me a very important question. This is somebody I’ve known for years, and he’s in the world of impact investing. He said, “All this stuff aside, what is it you want to do? What is it you want to accomplish for yourself?” I told him, “I just want to make sure that this field, this industry of responsible AI, goes in the right direction.” That was actually why I had started the company. I saw Twitter as an opportunity to have an outsized impact. A couple of things I love about the company: one, everything we do is publicly accountable, and immediately so. The public is ruthless, and they should be. They should tell us when we’re screwing up and they should tell us when we’re doing things right. And there is no faster way to do that than literally on the platform this company builds. And two, structurally, from their ethos and also with this team, they were quite serious about wanting to do it and really just wanted the right people to help guide that initiative.
AZEEM AZHAR: And tell us about one breakthrough you’ve had in your first few weeks at Twitter.
RUMMAN CHOWDHURY: I think one is that the what-and-why framing I was describing earlier is something I figured out while working at Twitter. Two, I think the big thing is that the company is really dedicated to transparency. So recently we released the results of a bias investigation we did on our image cropping algorithm. Back in October, we had input from the public that the algorithm we used for image cropping was biased. It was biased towards white faces and against Black faces, and there was talk that it would crop images of women at their chest and men at their faces. So we did a bias investigation and we published a blog post about it that’s understandable by a wide range of people. You don’t need a data science degree to understand it. We published our technical paper for data scientists and academically minded folks to read, and also, importantly, we shared our open-source code. And that’s really critically important. What we want to do moving forward is literally share everything. Twitter’s really the only company I’ve ever worked with or seen that says, “No, actually, share your code, share your data. This is going to be painful for us, but we want to learn and grow positively.” Our conclusion was that the bigger problem was representational harm. In other words, people just want to share their photo. They don’t want it to be automatically cropped in, quote unquote, “an unbiased way.” The fundamental problem was that this algorithm was making a decision that people did not want made for them. And the ultimate decision was to literally remove the algorithm. Now, if you take a standard-aspect-ratio photo and you post it to Twitter on mobile (we’re working on it for web, but on mobile), it will show the entire photo. If it is not a standard aspect ratio, it just center-crops. So instead of forcing an algorithmic decision-making system on people, when they did not ask for one, nor did they want one, we listened to what people wanted and just removed it.
AZEEM AZHAR: A couple of years ago you wrote an op-ed in Wired magazine where you said Silicon Valley culture celebrates the technical over the social. To what extent do you think that that is still true?
RUMMAN CHOWDHURY: I think it is still very true. I think that we as an industry, and when I say industry, I mean the data science, ML, and AI tech industry, have not found meaningful ways of incorporating what social scientists are saying. Here’s where roles like mine are very critical. If you were to ask me what I’m doing at Twitter today, I’ll say we’re deeply investing in tangible engineering systems. Folks will see, just based on the roles we’re hiring for, that we’re hiring a lot of engineers. Why is that? Well, because we need to be able to translate what people are saying into things that are integrated into our systems. And unfortunately, that is the language and that is the currency of this industry. What I’m happy to see is increased engagement with those fields. Previously, they were completely ignored, so I am happy to see a lot of these folks engaging with tech companies. Now it is the job of folks like myself to make sure that that engagement translates into something meaningful. So yes, it does still exist today, but I see it less as a handicap and more as a leverage point for people like myself to be successful.
AZEEM AZHAR: So we’re several years into the discussions around responsible machine learning and AI ethics, and we’ve moved past the initial point of identifying the problem where people have been working on what solutions or systems changes are required. You’ve identified some of the cultural issues that are increasingly being tackled. So let’s assume this momentum continues for the next few years. If we speak again about this issue in say, three or four years time, how will the world feel different for the average person using their smartphone with a smart voice assistant or doing searches on the internet?
RUMMAN CHOWDHURY: I think it will go hand in hand with the evolution of these technologies. So the world will be very different, because we’ll actually be using more of these systems, and we’ll be using more of these systems because we are confident that they will actually work for us. I think it is underestimated how much people do not want to interact with these systems, not because they’re scared of technology, et cetera, but because they know that it can be potentially harmful and they have no way of fixing a problem if a problem arises. So my hope is that if folks like myself are successful and good at what we do, we will actually see increased adoption of technology, done in a meaningful way, in a way that actually benefits people’s lives and not just the lives of the affluent, privileged, tech-enabled few.
AZEEM AZHAR: Well, Rumman, it’s been fantastic to speak to you. Thank you so much.
RUMMAN CHOWDHURY: Thank you very much, Azeem.
AZEEM AZHAR: Well, if you enjoyed this discussion, please check our podcast feed, where you can find many of my previous discussions on the political, environmental, and social repercussions of AI, including episodes with Kate Crawford, Meredith Whittaker, De Kai, and Joanna Bryson. To become a premium subscriber of my weekly newsletter, go to Exponential View at www.exponentialview.co. That’s Exponential View at www.exponentialview.co. To stay in touch, follow me on Twitter. I’m @Azeem, that’s A-Z-E-E-M in America and A-Z-E-E-M in the rest of the world. The podcast was produced by Marija Gavrilov and Fred Casella. Our assistant producer is Ilan Goodman and our sound editor is the inestimable Bojan Sabioncello. Exponential View is a production of E to the Pi I Plus One, Limited.