In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Emerson Murphy-Hill about how measuring developer productivity is tricky, why team dynamics and psychological safety matter more than things like meeting load, the impact of systemic bias and how new AI tools are shaping equity in engineering - sometimes helping, but sometimes risking new kinds of unfairness.
Key Takeaways
- Measuring developer productivity is complex, involving both quantitative outputs (like code or features delivered) and qualitative self-reflection, with no single metric giving a complete picture.
- Research shows that team dynamics - especially psychological safety - have a stronger correlation with productivity than factors like meeting load.
- The shift to remote work introduced challenges in measuring productivity due to self-selection biases, making it hard to compare in-office and remote workers directly, as people tend to choose the environment where they feel most productive.
- Systemic bias exists in engineering processes: women, engineers of colour, and older engineers receive more pushback during code reviews and are often asked to do fewer reviews.
- Anonymous code reviews have been shown to reduce bias in feedback without harming productivity, and are now offered as options in some large tech companies.
Transcript
Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I'm sitting down with Emerson Murphy-Hill. Emerson, welcome. Thanks for taking the time to talk to us.
Emerson Murphy-Hill: Of course. Happy to be here.
Shane Hastie: My normal starting point with these conversations is who's Emerson?
Introductions [01:05]
Emerson Murphy-Hill: I'm a research scientist at Microsoft at the moment. I've been here eight months, maybe not quite a year. Mostly my career has been working on developer tools and developer productivity, and right now I'm focused on AI tools. In particular, at the moment I'm part of the Excel organization at Microsoft. And you might know Excel as the world's most popular functional programming language, sort of on the end user side. But also I've got lots of experience on the professional developer side. So before this I was working with teams working on Visual Studio and VS Code and GitHub.
And then before that I was at Google working on their internal developer tools. They have a big mono repo, so they've got some somewhat specialized developer tools there. And in particular on their engineering productivity research team. And before that I was a professor at North Carolina State University researching developer tools. So yes, that's my background.
Shane Hastie: So let's tackle the big one first, or one of the big ones first. What is developer productivity?
The Challenge of Measuring Developer Productivity [02:13]
Emerson Murphy-Hill: Yes, so this is a tough question, right? Because I think executives want to measure it. And even as individuals, I think we care about being productive. For me, when I feel like I've had a productive workday, I feel really good about myself: I can point to these things that I did, or I can just feel like I was in the zone, I was in flow. But of course executives want to know about it too. And especially in the AI world that we live in, lots of people want to know: if I invest in these AI tools, is that going to pay off in terms of developer productivity? So clearly an important topic.
There's been lots of different ways to measure it, lots of different frameworks and none of them are just that single number that you want, to say, "This thing is more productive than that thing". But I think one thing it comes down to is... I would characterize it as two different things. One is about product and one is about process. So in terms of product, what are you outputting? That could be lines of code, that could be features, that could be fixing bugs, and you can measure that in a variety of ways. So you can look at number of lines written per day or you can look at number of pull requests written or you could map that to features.
Again, not the greatest thing in the world. And when I think about what makes me productive, it's often not the number of pull requests that I'm writing. I think what's most important to me there when it comes to productivity is just self-reflection. And so at Google, for instance, we would do large scale surveys - actually, Microsoft does this too - where we ask developers how they're feeling about their own productivity. In some ways you're asking people to self-rate how productive they were. On one hand you look at that and you're like, "Well, why should we trust people to be able to self-rate? And how does that allow you to compare across people and across products?"
And definitely all those are valid concerns too. But I like to think of humans as entities that are taking in all sorts of signals, ones that are easily observable and ones that are much harder to observe, synthesizing them all together and they can take that all together and their gut feeling about how productive they are, I think all in all, is pretty reliable. And so that's typically how I think about it, those two aspects.
And some of the work that I've done in the past includes looking at the factors that drive developer productivity. There's, for instance, open offices versus closed offices. I'm in a closed office right now, but there's open offices just behind me, right? And the industry generally has been shifting to open offices over closed offices, to many people's consternation. But there's a question of how much open offices change people's productivity as opposed to closed ones. And it's not obvious, necessarily. Open offices are nice because you chat with your colleagues, and sometimes you hear conversations that are relevant to you. That's just one factor that may influence productivity.
There's other things like what programming language are you using? How competent is the team of people around you? Are you getting yourself into a lot of technical debt? And so, one of the pieces of research we've done is to try to weigh those different factors. So we ran a study at three different companies: Google, ABB, and I'll send you the link later; I'm forgetting the name of the third company that we did it with.
It'll probably occur to me in a minute, but it was a smaller company than those two. And we just asked them a variety of questions about these things that prior research suggests drive productivity, and looked at how that actually relates to their self-rated productivity, how those two correlate. And as it might not surprise you, the people factor rose to the top. Things like having psychological safety on your team turn out to be a big, strong correlate with productivity.
Other studies we've done have revealed some surprises. For instance, meetings, I think, are something that people think a lot about when they think about productivity, right? Meetings slow me down. I certainly have a lot of meetings these days, and I feel like if I had fewer, I'd be more productive. On the other hand, meetings help you get unblocked too, right? They provide you information you wouldn't have gotten otherwise.
The Impact of Team Dynamics and Psychological Safety [06:35]
And so I think, when you ask people what's slowing them down, a lot of times they'll say, "I have too many meetings". But in one study we ran at Google, we actually looked at the amount of meetings people had on their calendars, we looked at a few different ways to measure productivity, and we looked at how they correlated. In fact, we used a very clever statistical approach called diff-in-diff, difference-in-differences, where you look at changes over time and you look at how the changes correlate, right?
So what you might expect is that if my meeting load increases, my productivity would decrease, at least according to the traditional belief about meetings, and if my meeting load decreased, my productivity should increase. Well, it turns out that wasn't true. It turns out there wasn't a relationship between changes in meeting load and changes in productivity, which contradicts the belief that many people, myself included sometimes, have about the relationship between those two things.
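The intuition behind that correlate-the-changes approach can be sketched in a few lines. Everything below is a toy illustration with made-up random numbers, not the study's actual code or data: for each engineer we take the change in meeting load and the change in self-rated productivity between two periods, then correlate the changes rather than the levels.

```python
import numpy as np

# Hypothetical data: 200 engineers measured in two quarters.
rng = np.random.default_rng(42)
n = 200
meeting_hours = rng.normal(10, 3, size=(n, 2))   # weekly meeting load
productivity = rng.normal(5, 1, size=(n, 2))     # e.g. self-rated, 1-7 scale

# Correlate the *changes* rather than the levels, so stable per-person
# factors (role, seniority, team) drop out of the comparison.
d_meetings = meeting_hours[:, 1] - meeting_hours[:, 0]
d_productivity = productivity[:, 1] - productivity[:, 0]

r = np.corrcoef(d_meetings, d_productivity)[0, 1]
print(f"correlation of changes: r = {r:.3f}")
```

The design choice is the differencing step: correlating raw meeting load against raw productivity would mostly pick up stable differences between people, while correlating the deltas isolates whether a change in one tracks a change in the other. Here the data is independent noise by construction, so the resulting correlation demonstrates only the computation.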
And so this sort of research about productivity has been a big part of my work over the years. One of the things that I love about working in big tech is we've got lots of engineers doing lots of amazing work, and at scale you can study these folks and understand things about productivity. And, like I said, as engineers we're all individually interested in improving our own productivity and in what makes us productive. So it's not just interesting, it's also personally relevant to us all, I think.
Shane Hastie: Where does remote or not factor into that?
Difficulties in Comparing Remote and In-office Productivity [08:14]
Emerson Murphy-Hill: This is a good question. I think that a lot of the research that I did on this was either pre-pandemic or during the pandemic. And we did do some research on it during the pandemic. We had this shock where everybody who had been working in person had to go home - in fact, Google at the time was a very in-person company. It was tough for me personally, because I'm in the office today; I'm a big in-office person even when I don't have to be.
And I think, I couldn't tell you what the results were, whether people were more or less productive. But I can tell you what we concluded by looking at this, which was, the change wasn't just that people were working from home. The change was people's lives were upended and people were really worried about getting a disease. I mean it wasn't even the standard work from home, right? At the time, it was working from home, plus our kids are there. We don't have quite as an efficient setup as we normally would. So it was a very abnormal time.
And then post-pandemic... I haven't done any research on it since, but we can think about post-pandemic, okay, we're settling down, we've got more of a routine. My kids are back in school, so that's a plus. I've got a pretty good setup. And now if you were to compare people who are in office versus people who are out of office, it's tougher because now we have a self-selection issue. I would say that people who are working from home more often have gravitated to that, often because they feel that they're more productive from home. And people like me who are in the office tend to be there because we feel that we're more productive.
And so differences you might see might really be differences in preferences and differences between these types of people - maybe, for lack of a better term, introverts versus extroverts, although it probably doesn't divide that cleanly along those lines. But differences between the two populations who have self-selected make it quite difficult to compare the groups at the moment. Maybe there's an opportunity to look at these companies that are forcing people back to the office, because then you take out that element of choice.
Although I will say I'm curious about your experience, but what I've seen is that at lots of companies, people who have more sway at the company, people in leadership roles can often decide for themselves about whether they're going to be at home or not, independent of mandate. And also even if you mandate working from office at a high level, often individuals will make their own choices and will cover for each other. And so even if you were to study a back-to-office mandate, I'll bet actually at many companies, there's still a large amount of variation from team to team and from company to company.
Shane Hastie: Well, certainly from what little I've seen, and from discussions in the organizations that are doing the mandates, those folks who would self-select to not be there are often choosing to not be there. My take at this point is that we are going to see people who want to be in the office self-select to some organizations, and those who are more fluid self-select to others.
Emerson Murphy-Hill: Yes, it's an interesting dynamic, the self-selection too. And I feel we're also at a weird point in the industry where workers, engineers included, have less power than they used to have even four or five years ago, right? The job market for engineers has definitely tightened up. Five years ago we might've been able to strongly enforce our preferences: if we didn't want to work in the office and someone was making us, we could easily get a job somewhere else. That's definitely become more challenging now. So it's some combination of what your preferences are for working in office versus working from home, and your ability to move - and right now there don't seem to be a lot of options.
Shane Hastie: There are two points that have been made about remote work that I know dig into areas you're particularly focused on. One is that being able to work remote actually allows us to be more inclusive; the other is that working remote potentially impacts our career growth, in that we're not in front of our managers. So let's tackle the equity one first. Equity in engineering I know is an area that you are passionate about and interested in. What's happening there?
Systemic Bias and Inequality in Many Dimensions [13:13]
Emerson Murphy-Hill: Yes, for sure. And I'd love to say I got into this area because it was a deep conviction that I had, but I'll say I probably just stumbled into it. I'll give you an example. When I was a professor, I was teaching a graduate level software engineering class and we were talking about diagrams and we were talking about UML diagrams at the time. And a blind student in my class, he raises his hand and he says, "Okay, what am I supposed to do?" And I was like, "Oh no, I actually don't know what you're supposed to do. That's a really good question".
It was foolish of me to walk into that lecture without an answer for what folks who can't see the squares and the lines on the display are supposed to do. But it was a moment of reflection for me: "Well, what are these folks supposed to do?" So that student and I ended up going on and writing a research paper about this. He went out and found a bunch of blind and low vision professional engineers working in industry, and talked to them about their experience. How did they work with their colleagues? When they went to the whiteboard, what did the blind colleagues do? What sort of development environments were they using? How were they coding? Were some languages easier than others? And so that was an educational experience for me.
And then also when I was a professor, I was just sitting around, as professors do, with some colleagues. And I had been a fan of some of the social science research about equity. In particular, there are these studies where they send out resumes to a bunch of companies and systematically change the name at the top of the resume - they change a man's name to a woman's name or vice versa, or they change what sounds like a white name to what sounds like a Black name. Then they send these resumes out for a bunch of different jobs and look at how many interviews these people are offered. They're fake - they're not real people, so they don't actually interview. But you look at how many callbacks, essentially, you get from that.
And sure enough, both in tech and outside for certain types of jobs, women often will get fewer callbacks than an ostensibly identical man will. I always thought this was really interesting and very good evidence, right? I'm interested in research, I'm interested in data, and I thought it was just really convincing evidence that there's bias in hiring, for instance. Hiring is not really my research focus, but what I am interested in is developer tools.
And so anyway, this other professor and I were sitting around and we were talking about this idea and how it might relate to us. And it just occurred to us that a pull request is something like an interview, where you're making a judgment about the code or a judgment about the person or a judgment about the resume. And we said, "Well, this thing where you put different people's names at the top, could you do that with pull requests and would it make any difference?" And we didn't end up going in an experimental direction. We said, "Well, there's already a million pull requests out there on GitHub with different people's names at the top. And does it make a difference if the pull request was authored by a man or authored by a woman?"
And so we did a research project about this and we looked at millions of pull requests and we had this technique where we could connect it with gender data from social media and we could infer people's genders based on that. And that story was a little bit complicated because it turns out there's not just a raw difference between genders and acceptance of their pull requests. It turns out it depends on whether you signal your gender through your profile picture. So if you're a woman and you use a profile picture, your pull request is less likely to be accepted. But if you don't use a picture, you just use the standard avatar, so you can't really tell who the person's gender is, women actually do better than men.
And I thought that's really interesting. This is congruent with what you would expect from that social science literature, which suggests that there's some discrimination happening. And it's certainly congruent with the experience of individual people I've talked to who get their pull requests rejected. Sometimes women will say, "It seems like I got more pushback here than I might have if I were a man", or, "Why was that rejected but this other one accepted?"
And on an individual level it's really hard to tell what's going on, right? Same with job applications: are you being discriminated against or not? So the GitHub study was my second foray, and I just found engineering a very interesting process in which to study these social dynamics. What's especially interesting about engineering is that the type of work we do is, at least at a high level, very well regimented. We do code reviews of each other's code. A lot of it's public. Even if it's private at a large company, it's typically on some platform; there's a process, there's approval, there's comments, there's merging. And that sort of structure allows you to look at these social factors in a more structured way than you might be able to otherwise.
And then we did the study on GitHub and I actually joined Google to do something very similar. They were also interested in equity issues in engineering. And so we repeated that study that we did on GitHub inside of Google, and it turned out it was quite a similar thing, as you might expect. They don't exactly get their pull requests accepted or rejected, but they can have more or less pushback, more or less feedback that has to be addressed. Turns out women tend to get more feedback than men do on their pull requests and it takes longer to get them accepted. It turns out it's the same for engineers of color.
And I think the biggest surprise for me - maybe not a surprise, but it was about age. It turns out the older you are, the more pushback you get too. And it's not about level, because the higher level you are, the less pushback you get, which arguably could just be about competence: you've been promoted, so you're probably pretty good at your job. And also the longer you've been at a company, the less pushback you get, I think for similar reasons.
Although that issue of leveling or of tenure, it's interesting for me to think about, well, is it because these folks who have been there longer are more competent? Or is it just because someone at a higher level, we just defer to them more often? We assume they're more competent, so we just give them a pass. And in fact, some of the very, very senior engineers that I talk to, a lot of them will say, "I actually can't get a very fair code review often, because people just assume that I know exactly what I'm doing".
So in any case, after you control for that, it turns out the older you are, the more pushback you get. In fact, it was the strongest effect in that study: compared to a new college grad, if you're 60-plus, you have about three times the likelihood of a high-pushback code review. A very strong effect there.
I'll give you maybe one other example of a study within Google where equity in the engineering system comes up. It turns out that not only do folks from historically marginalized groups tend to get more pushback than folks from majority groups, it turns out that those folks also are asked to do fewer code reviews. So women are actually asked to do fewer code reviews than men.
Well, with these sorts of large-scale studies, it's a little hard to pin down exactly why. But the theory that's typically used here is role congruity theory: are you stereotypically the person who fits into that role? And stereotypically men are engineers. So when you want someone to review your code, who do you typically think of? Well, you'll think about people on your team, but you might be a little bit more likely to choose a man. Because if you want to think of a good engineer, you're just a little bit more likely to think of a man than a woman. That's the hypothesis, anyway.
But it turns out there are also some systemic reasons why this happens too, why people are more likely to select men than women. So one of those reasons is that, at least in the code base we were looking at, it turns out that men were more likely to be owners of the code base that they work with. So different parts of the code base, you can specify different owners, and for many different teams, that means they just choose an owner. So a lot of times people will choose the tech lead, they'll choose someone senior on the team, but it's an individual process where you make choices.
And what we know about those choices is that people's biases inevitably creep in. Any sort of decisions you're making like that, it's hard to keep biases out. So what we found is that men were just more likely to be owners. And if you're an owner, you're a more, let's say, useful code reviewer, because any pull request typically requires either the author or a reviewer to be an owner. So owners tend to be more useful reviewers. The way ownership is assigned is thus a structural way that biases creep in and have downstream consequences.
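A rough sketch shows how that ownership rule funnels review requests. The paths, names, and owners mapping below are entirely hypothetical, purely to illustrate the "author or reviewer must be an owner" mechanic described above:

```python
# Hypothetical owners mapping: path prefix -> set of owners.
OWNERS = {
    "search/": {"bob", "carol"},
    "ads/": {"dave"},
}

def useful_reviewers(path: str, author: str, team: set) -> set:
    """Reviewers who can approve a change under the rule that either the
    author or a reviewer must own the touched path."""
    owners = set()
    for prefix, people in OWNERS.items():
        if path.startswith(prefix):
            owners |= people
    if author in owners:
        return team - {author}            # author owns it: anyone can review
    return (owners & team) - {author}     # otherwise only owners are useful

team = {"alice", "bob", "carol", "dave"}
# alice doesn't own search/, so her review requests funnel to the owners.
print(sorted(useful_reviewers("search/index.cc", "alice", team)))
```

The point of the sketch is the downstream consequence: if one group is over-represented in the owners mapping, that group mechanically receives more review requests, regardless of who individual authors would otherwise prefer to ask.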
One engineer gave me an example where she was a tech lead for a team. She was working with four or five other engineers - I think they were all men. It turns out the rest of the team were reviewing each other's code and wouldn't send her a code review very often. It didn't really bother her very much; in some sense doing more code review is just more work, so she could focus on her own thing, and it wasn't such a big deal.
But when performance review time came around, her boss gave her feedback: "You should probably really do more code reviews. It would help you integrate with the team, show more technical leadership". And she was like, "Ugh, they're not even sending me the code reviews". So not doing code reviews sometimes ends up having negative downstream consequences. Anyway, those are some research insights into equity in the engineering process.
Shane Hastie: How do we overcome these, what are almost systemic structures?
Anonymous Code Reviews to Avoid Bias [23:49]
Emerson Murphy-Hill: Yes, it's a great question. So this one is about, when I'm reviewing someone's code, am I going to give them more or less feedback depending on their gender? Or am I going to give people more or less feedback based on their race or ethnicity? I think that we all consider ourselves good people; I'm not doing it intentionally. What we can do is look at parallel fields where this happens and see what they typically do about it.
And with the code review example, blind auditions were a good analogy: orchestras audition musicians from behind a curtain, so you can hear the person auditioning but you don't know anything about what they look like. When orchestras originally did this, they tended to hire more women than when they watched the person perform. So: hide people's demographics when it's appropriate to do so.
And in fact, this is something that we did at Google to help with the code review process. We implemented anonymous code review: as a reviewer, if someone sends you a code review, you wouldn't see the person's name at the top of it. Instead you'd see something like in Google Docs, where you see an anonymous animal - it would show "anonymous aardvark" or something - and you would just review the code.
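A pseudonym like that can be generated deterministically, so the same author keeps the same animal within one review but can't be tracked across reviews. The hashing scheme and animal list below are my own illustration, not the actual implementation:

```python
import hashlib

ANIMALS = ["aardvark", "badger", "capybara", "dingo", "echidna", "ferret"]

def anonymous_name(author: str, review_id: str) -> str:
    """Stable pseudonym for an author within a single review."""
    # Salt the hash with the review ID so the author->animal mapping
    # changes from review to review.
    digest = hashlib.sha256(f"{review_id}:{author}".encode()).hexdigest()
    return f"anonymous {ANIMALS[int(digest, 16) % len(ANIMALS)]}"

# Within one review, the author's pseudonym is stable...
assert anonymous_name("alice", "review-1") == anonymous_name("alice", "review-1")
# ...while the salt keeps identities unlinkable across reviews.
print(anonymous_name("alice", "review-1"))
print(anonymous_name("alice", "review-2"))
```

Salting with the review ID is the key design choice: a single global pseudonym per person would just become a second identity that reviewers could learn over time.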
And we did a study about this where we ran it with a few hundred engineers. We had them do it for a couple of weeks, and we looked at some of the engineering outcomes to see whether it was really harmful to the engineering process, whether it slowed people down. You'd imagine it might, right? Because if you know who the author is, you can make certain assumptions - whether that's a good idea or not is another question.
And what we found is that it didn't significantly slow down the engineering process. People reviewed the code with approximately the same thoroughness, and it took them about the same amount of time. There was a little evidence that they were actually a bit more thorough: we could link these changes to rollbacks - when a pull request went bad, sometimes it had to be rolled back - and it turned out the anonymously reviewed changes were a little less likely to be rolled back than the ones where people could see identities. Now, we don't know whether we solved the equity issue; we just didn't have enough people to know, necessarily. But we showed that it's pretty feasible to not show the author's name at the top of the code review.
One of the things we did learn, though, is that occasionally people do really need to know who wrote the code. For certain types of security changes, you want to make sure that people aren't giving themselves access to something that they shouldn't have. But it turned out these types of changes, where you really need to know who the author is, are pretty rare - in less than 5% of cases, I think, did people need that. And in fact, in the anonymous code review tool there was just a button you could press, two clicks, and it would show you who the person is.
And so the idea is like lots of these systems: the default way of doing the process is anonymous, but if you really need to break the glass, go ahead. It turned out that doesn't happen very often, so it's not a huge deal. As far as I know, today at Google they still offer anonymous code review as a way that you can review each other's code, and it's still being practiced in certain parts of the code base where it's a good fit, where you don't necessarily even know who the author or the reviewer is.
Shane Hastie: It's almost impossible to have a conversation today without talking about AI and equity and bias in AI. What do we need to do with that in terms of developer tools?
AI Tools and Equity in Engineering [27:47]
Emerson Murphy-Hill: Yes, I think there are some interesting opportunities and challenges here. In my own engineering, on a day-to-day basis - sorry to plug Microsoft's own product here - I'm really enjoying working in VS Code with agent mode. I can specify what I want at a high level and the agents go and do it. It doesn't always get me what I want but, back to productivity, I do feel more productive while I'm doing it. I like being able to ask essentially dumb questions to different Copilots and say, "It feels like it's too late to ask, but I don't know what this acronym means".
And so I think there are, again, some equity opportunities and challenges here. One of the opportunities comes from a study that we did at Google, where we asked people about reaching out and asking questions of others, and we asked it of folks in different groups. Folks from historically marginalized groups said that they were somewhat less likely to want to ask for help in a semi-public forum. Google has an internal system that's something like Stack Overflow: people can ask questions, and it's highly encouraged.
And what some people from marginalized groups, women for example, told us is, "It doesn't feel like people are always treated respectfully on those, and I don't want to subject myself to that. And if that does happen to me - and it may not - I don't want it to be there permanently for my manager or some other person at the company to see. I don't want it to be such a permanent fixture".
The questions and the answers are supposed to be permanent so that other people can learn. But this dynamic means that if folks from marginalized groups are less comfortable asking questions like this, then they're not as productive as they could be. I know for myself, a lot of times I don't ask questions as soon as I should, and I'm really nervous about it, nervous about looking dumb. So you really need an engineering environment where it's easy to ask questions. If the engineering systems are getting in the way or making it harder for me to ask questions, that's just going to slow down engineering.
And so, coming back to folks from historically marginalized groups: if people are less likely to ask these questions, that's going to be a problem. In terms of AI, what I love about Copilot is that my questions are not visible to other people. No one else can see them; it's just me. I feel very comfortable. So I think an opportunity here is that it will allow people who wouldn't normally be super comfortable asking questions in a public setting, or even in a private setting with just a few colleagues, to get those questions answered.
And I think there's a secondary opportunity, which is that asking those questions of a Copilot before you ask a colleague often feels like a good step, just like Googling something is a good step. It shows you put in some work. It also demonstrates that it's not a dumb question: if you can't find the answer via Googling and you can't find it via a large language model, that gives you some confidence that asking a person is not going to be embarrassing. So as the models get better and better, I think this is a great opportunity for equity to increase in our engineering environment.
Just to give you an example of something that I worry about with AI: how people who are using AI are treated. Just this week I was talking to my manager about a change that I'm making, and I said, "I don't know, this change seems to work, but not a hundred percent. Claude 3.7 helped me write it".
Now, if you were my manager and you heard me say that, what do you think about that? Does that make me a competent engineer because I'm relying on the best possible AI tools that are out there? Or does that make me lazy? Because I'm not certain, does that make me a bad engineer because I wasn't able to verify it? And maybe you're thinking I'm not competent enough to write my own code, I have to get AI to do it for me.
Those are two possible views, and your actual view might be somewhere in between. But my worry is that the way you think about my use of AI depends on my demographics and depends on what you think about me as a competent engineer. So what I worry about is if I were a woman and I said that you would think, "Oh, she doesn't know what she's doing. This is a crutch for her". Whereas if I'm a man, I think you're a little bit more likely to say, "Oh, he's just doing best practices here".
So with those sorts of inequities, it's not really the AI that's causing them, but because there's uncertainty around AI, I worry about these social phenomena that we've seen in code review and in meetings. I'm worried that they're exacerbated, and that AI is just going to produce further inequities like that.
Shane Hastie: I will confess, the first time I used AI tools to do real work, I felt like I was cheating.
Emerson Murphy-Hill: Yes, it's magic, right? Many, many times it feels like it's not real work, but you're saving time. When you can get a job done faster, it's hard to argue with that, right?
Shane Hastie: For sure. Emerson, some really interesting points, really good conversation here. If people want to continue the conversation with you, where do they find you?
Emerson Murphy-Hill: Yes, find me on LinkedIn for sure. I haven't updated my public webpage in a while, but if you find any of my papers, my email address is at the top of it. But yes, feel free to reach out, happy to talk. You can certainly find the research papers that we've talked about. Maybe we can link to them in the show notes, but you can also find them on Google Scholar, for instance.
Shane Hastie: Wonderful. Well, thank you so much for taking the time to talk to us today.
Emerson Murphy-Hill: Of course. Thanks, Shane.
Mentioned:
- Emerson Murphy-Hill on LinkedIn
- What Predicts Software Developers’ Productivity?
- What Improves Developer Productivity at Google? Code Quality
- Gender Differences and Bias in Open Source: Pull Request Acceptance of Women Versus Men
- The Pushback Effects of Race, Ethnicity, Gender, and Age in Code Review
- Engineering impacts of anonymous author code review: A field experiment
- On women being less likely to have code ownership