I sat, dumbfounded, staring at the computer screen starkly displaying my failure to find support for my study hypothesis. My first thought was, How could I admit to my supervisor and to the doctors leading the study how wrong I had been?
I had spent hundreds of hours developing the survey, attending biweekly research meetings with the doctors and nurses who tracked drug errors in two nearby hospitals, and periodically jumping on my bicycle to get to the hospital soon after a caregiver had reported a major error, to interview people to identify the error’s underlying causes. I had been entrusted with the medical-error data and permitted to ask hundreds of busy doctors and nurses to fill out my survey. I felt guilty for taking up their valuable time and ashamed of my failure.
One of the people I’d have to talk to about the failure was Dr. Lucian Leape, a pediatric surgeon who, later in his career, had shifted his professional attention to the study of medical errors. One research goal for the larger study was simple: to measure the rate of medication errors in hospitals.
Back then, little was known about how frequently errors happened, and Lucian and his colleagues had a National Institutes of Health grant to find out. Adding to that goal, inspired by some research in aviation that showed that better teamwork in the cockpit meant safer flights, Lucian had asked whether the same might be true in hospitals.
The aviation research that inspired Lucian hadn’t intended to look at teamwork, but rather at fatigue in the cockpit. It was another failed hypothesis. A team of researchers at NASA, led by human-factors expert H. Clayton Foushee, ran an experiment to test the effects of fatigue on error rates.
They had twenty two-person teams; ten were assigned to the “post-duty” or “fatigue” condition. These teams “flew” in the simulator as if it were the last segment of a three-day stint in the short-haul airline operations where they worked. The fatigued teams had already flown three eight- to ten-hour daily shifts, each including at least five takeoffs and landings, sometimes up to eight. The other ten teams (the “pre-duty,” well-rested condition) flew in the simulator after at least two days off duty. For them, the simulator was like the first segment of a three-day shift.
To his surprise, Foushee discovered that the teams who’d just logged several days flying together (the fatigued teams) performed better than the well-rested teams. As expected, the fatigued individuals made more errors than their well-rested counterparts, but because they had spent time working together through multiple flights, they’d made fewer errors as teams. Apparently, they were able to work well together, catching and correcting one another’s errors throughout the flight, avoiding serious mishaps. The fatigued pilots had essentially turned themselves into good teams after working together for a couple of days. In contrast, the well-rested pilots, unfamiliar with one another, didn’t work as well as teams.
This surprise finding about the importance of teamwork in the cockpit helped fuel a revolution in passenger air travel called crew resource management, which is partly responsible for the extraordinary safety of passenger air travel today. This impressive work is one of many examples of what I call the science of failing well.
Research on cockpit crews blossomed in the 1980s and included the work of J. Richard Hackman, a Harvard psychology professor, who studied the interplay of pilots, copilots, and navigators on both civilian and military planes to understand what effective teams had in common. His cockpit-crew research had attracted the attention of Lucian Leape. Seeing a parallel between the high-stakes work of cockpit crews and that of hospital clinicians, Lucian picked up the phone to see if Richard might be willing to help with Lucian’s medication-error study. Lacking the time to commit to the project, Richard suggested that I, his doctoral student, might be put to work instead. Which is how I found myself hunched over my findings, gripped by anxiety.
* * *
I’d hoped to build on the aviation research to add another small finding to the team-effectiveness literature. The research question was simple: Does better teamwork in the hospital lead to fewer errors? The idea was to replicate the aviation findings in this new context. So what if it would not be a major discovery? As a new graduate student, I wasn’t trying to set the world on fire, but just to satisfy a program requirement. Simple, unsurprising, would be just fine.
A small team of nurses would do the hard work of tracking error rates for six months in the hospital wards, talking with doctors and nurses and reviewing patients’ charts several times a week. All I had to do was distribute a survey to measure teamwork in these same wards in the first month of the six-month study. Then I had to wait patiently for the error data to be collected so I could compare the two data sets—connecting my team measures with the error data collected over the full six months. A healthy 55 percent of the surveys I distributed were returned, and the data showed plenty of variance across teams. Some teams appeared to be more effective than others. So far so good.
Would those differences predict the teams’ propensity to make mistakes?
At first glance, everything looked fine. I immediately saw a correlation between the error rates and team effectiveness, and better yet, it was statistically significant. For those who haven’t taken a stats course, that means the relationship was unlikely to be due to chance, which was reassuring.
But then I looked more closely! Leaning toward my computer screen, I saw that the correlation was in the wrong direction. The data were saying the opposite of what I’d predicted. Better teams appeared to have higher, not lower, error rates. My anxiety intensified, bringing a sinking feeling in my stomach.
Although I didn’t yet know it, my no longer straightforward research project was producing an intelligent failure that would lead to an unexpected discovery.
* * *
That day in William James Hall, staring at the failure displayed on my old Mac screen, I tried to think clearly, pushing aside the anxiety that only intensified as I envisioned the moment when I, a lowly graduate student, would have to tell the esteemed Richard Hackman that I had been wrong, that the aviation results didn’t hold in health care. Perhaps that anxiety forced me to think deeply. To rethink what my results might mean.
Did better teams really make more mistakes? I thought about the need for communication between doctors and nurses to produce error-free care in this perpetually complex and customized work. These clinicians needed to ask for help, to double-check doses, to raise concerns about one another’s actions. They had to coordinate on the fly. It didn’t make sense that good teamwork (and I didn’t doubt the veracity of my survey data) would lead to more errors.
Why else might better teams have higher error rates?
What if those teams had created a better work environment? What if they had built a climate of openness where people felt able to speak up? What if that environment made it easier to be open and honest about error?
To err is human. Mistakes happen—the only real question is whether we catch, admit, and correct them. Maybe the good teams, I suddenly thought, don’t make more mistakes, maybe they report more. They swim upstream against the widely held view of error as a sign of incompetence, a view that leads people everywhere to avoid acknowledging (or to deny responsibility for) mistakes and that discourages the systematic analysis that would allow us to learn from them. This insight eventually led me to the discovery of psychological safety and to why it matters in today’s world.
Having this insight was a far cry from proving it. When I brought the idea to Lucian Leape, he was at first extremely skeptical. I was the novice on the team. Everyone else had a degree in medicine or nursing and deeply understood patient care in a way that I never would. My sense of failure deepened in the face of his dismissal. That in those fraught moments Lucian reminded me of my ignorance was understandable. I was suggesting a reporting bias across teams, effectively calling into question a primary aim of the overall study—to provide a good estimate of the actual error rates in hospital care. But his skepticism turned out to be a gift. It forced me to double down on my efforts to think about what additional data might be available to support my (new and still-shaky) interpretation of the failed results.
Two ideas occurred to me. First, because of the overall study’s focus on error, when I had edited the team survey to make its wording appropriate for hospital work, I had added a new item: “If you make a mistake in this unit, it won’t be held against you.” Fortunately, the item correlated with the detected error rates; the more people believed that making a mistake would not be held against them, the higher the detected errors in their unit! Could that be a coincidence? I didn’t think so. This item, later research would show, is remarkably predictive of whether people will speak up in a team. This, along with several other secondary statistical analyses, was entirely consistent with my new hypothesis. When people believe mistakes will be held against them, they are loath to report them. Of course, I had felt this myself!
Second, I wanted to get an objective read on whether palpable differences in the work environment might exist across these work groups, despite all being in the same health-care system. But I couldn’t do it myself: I was biased in favor of finding such differences.
Unlike Lucian Leape, with his initial skepticism, Richard Hackman immediately recognized the plausibility of my new argument. With Richard’s support, I hired a research assistant, Andy Molinsky, to study each of the work groups carefully with no preconceptions. Andy didn’t know which units had more mistakes, nor which ones had scored better on the team survey. He also didn’t know about my new hypothesis. In research terminology, he was blind to both the data and the hypothesis. Andy observed each unit for several days, quietly watching how people interacted and interviewing nurses and physicians during their breaks to learn more about the work environment and how it differed across units.
Andy reported that the hospital units in the study appeared wildly different as places to work. In some, people talked about mistakes openly. Andy quoted the nurses as saying such things as a “certain level of error will occur” so a “non-punitive environment” is essential to good patient care. In other units, it seemed nearly impossible to speak openly about error. Nurses explained that making a mistake meant “you get in trouble” or you get put “on trial.” They reported feeling belittled, “like I was a two-year-old,” for things that went wrong. His report was music to my ears. It was exactly the kind of variance in work environment that I had suspected might exist.
But were these differences in climate correlated with the error rates so painstakingly collected by the medical researchers? In a word, yes. I asked Andy to rank the teams he’d studied from most to least open, the word he had used to explain his observations.
Astonishingly, his list was nearly perfectly correlated with the detected error rates. This meant that the study’s error-rate measure was flawed: when people felt unable to reveal errors, many errors remained hidden. Combined, these secondary analyses suggested that my interpretation of the surprise finding was likely correct. My eureka moment was this: better teams probably don’t make more mistakes, but they are more able to discuss mistakes.
* * *
Much later I used the term psychological safety to capture this difference in work environment, and I developed a set of survey items to measure it, thereby spawning a subfield of research in organizational behavior. Today, over a thousand research papers in fields ranging from education to business to medicine have shown that teams and organizations with higher psychological safety have better performance, lower burnout, and, in medicine, even lower patient mortality. Why might this be the case?
Because psychological safety helps people take the interpersonal risks that are necessary for achieving excellence in a fast-changing, interdependent world. When people work in psychologically safe contexts, they know that questions are appreciated, ideas are welcome, and errors and failure are discussable. In these environments, people can focus on the work without being tied up in knots about what others might think of them. They know that being wrong won’t be a fatal blow to their reputation.
Today, I don’t doubt that my failure to find support for the simple research hypothesis that guided my first study was the best thing that ever happened to my research career. Of course, it didn’t feel that way in the moment. I felt embarrassed and afraid that my colleagues wouldn’t keep me on the research team. My thoughts spiraled out to what I would do next, after dropping out of graduate school. This unhelpful reaction points to why each of us must learn how to take a deep breath, think again, and hypothesize anew.
Psychological safety plays a powerful role in the science of failing well. It allows people to ask for help when they’re in over their heads, which helps eliminate preventable failures. It helps them report—and hence catch and correct—errors to avoid worse outcomes, and it makes it possible to experiment in thoughtful ways to generate new discoveries.
From Right Kind of Wrong: The Science of Failing Well by Amy Edmondson. Copyright © 2023 by Amy Edmondson. Reprinted by permission of Atria Books, a Division of Simon & Schuster, Inc.