Paging Dr. Data: Big Data Goes to the Emergency Room


For doctors who treat trauma patients, prediction is key: Will a patient die in the next 30 minutes? Why or why not? What about the next six hours? And after that? What’s the best treatment? How might the patient respond to that treatment in the best- and worst-case scenarios? Then what? These are mortal questions: Trauma kills more people between the ages of 1 and 44 than any other cause. Decisions made in those first few minutes and hours post-injury have the potential to save lives and speed recovery. Clinicians make critical judgments informed by hard-earned experience, best practice guidelines, and intuition. But those decisions are fraught with uncertainty.

This is where Alan Hubbard comes in. Hubbard, an associate professor of biostatistics at the School of Public Health, has been collaborating with Dr. Mitchell Cohen, a trauma surgeon at San Francisco General Hospital, to develop a predictive computer model for the prognosis of trauma patients. The model will predict the answer to one question—will the patient live or die?—but it could go much deeper. Clinicians might also consult it to determine the likely outcomes for different treatment options.

The basic idea is akin to weather forecasting: Dump in all available data, and predict the immediate future. Time is of the essence to trauma patients, and that urgency is built into the model, which can adjust its predictions over time. In addition, it’s smart: The more data it analyzes, the better it learns to predict outcomes.

Such a tool could customize treatment to patients’ own characteristics, rather than how they compare to an average patient.

“That is what personalized medicine is all about,” says Hubbard.

The doctor in the data

Welcome to science in the Information Age. Projects in a wide spectrum of disciplines use statistical tools to separate value from noise in enormous data sets. Retail giants predict the buying patterns and interests of their future consumers; astronomers probe the secrets of deep space by studying the flood of data pouring in from telescopes; geneticists can home in on pathologic genetic mutations by surveying entire genomes of many people. But the work by Hubbard and his team represents the first time researchers have taken such a deep and systematic approach to the messy world of trauma care.

From the moment they’re whisked through the doors, patients are measured, tested, and watched. These nearly continuous observations produce a lot of data, and even after a particular patient’s ordeal ends, those data become valuable in the hands of statisticians like Hubbard. The numbers quantify an individual’s experience, but when combined with data from many other patients, they can reveal the critical measurements, or indicators, that most accurately predict whether a patient is improving or not.

It’s easier said than done: ER patient data are messy. A patient’s condition can change in a heartbeat; as a result, measurements become more or less important at different times. Hubbard says an accurate model of care must accommodate not only changing variables—but also changing significance.

“Practitioners shouldn’t be making the same decisions at every point in time because the dynamics of the patient, and the process, change over time,” says biostatistician Iván Diaz PhD ’13, who worked on the project as a doctoral student at Berkeley and is now a postdoctoral fellow at the Johns Hopkins Bloomberg School of Public Health.

Hubbard says the model will help doctors identify which critical measurements to watch, and when.

“We don’t believe that outcomes will depend on only one particular variable,” he says. “It’s going to be a combination of values. That’s why we’re useful. There’s information there that we don’t think the physicians are using at this point.”

The right combination of measurements that will best predict outcomes, he says, isn’t obvious. It hides in the data.

The surgeon and the statisticians

Hubbard, whose research spans a number of disparate areas, calls himself a statistical “jack of all trades.” But his projects frequently involve “high-dimensional data,” where one or more of many variables might be responsible for a particular outcome. He doesn’t take every collaboration; instead, he looks for projects where the outcomes might actually be used. When Cohen approached him in 2010, Hubbard had never thought about working on data for trauma care. But the project offered him a chance to use statistics to answer questions that clinicians really cared about.

“Care of acute trauma is one area where evidence-based medicine is difficult to do because it’s just chaos,” he says. “The practice is informed not so much by the quantitative evaluation of data using statistics, but more a priori principles. There’s little validation of the approaches taken to acute trauma, and there’s a vast difference in treatment from one trauma unit to another.”

Cohen, who is also an associate professor of surgery at the UCSF Department of Surgery, had been laying the groundwork for this project for years before he found Hubbard. “We treat patients in the Emergency Department through the Operating Room and into the Intensive Care Unit,” he says, “but we make big decisions on the sickest patients in the hospital with inadequate data.”

In a paper published in the journal Critical Care in 2010, Cohen and his colleagues analyzed high-dimensional data from patients in the ICU. Using an approach called hierarchical clustering, they found 10 patient “states,” each connected to the likelihood of certain outcomes. Patients passed through different states as treatment progressed: In one state, they were more susceptible to infections; in another, the risk of death was higher. Knowing how patients fit into these groups could help clinicians make treatment decisions.

Cohen wanted to similarly transform trauma patient data into a robust forecaster of outcome. But he knew he needed help: Trauma data were particularly unruly.

“I work in the messiest research environment,” he says. “Our research is done at three in the morning on Saturday that happens to be Christmas Eve in a place where a patient is bleeding to death on the floor.”

A number of prediction algorithms already existed, but he thought they were simplistic, failing to capture the complexity and dynamic nature of trauma care. When he sat down with Hubbard and his team, though, Cohen knew they could help.

Their “computational abilities were the perfect marriage for what we wanted to do,” Cohen remembers. “They really took the time to learn how we think, and what our problems are. They spent the time to get it, rather than just applying the statistics to the data.”

At their first meeting, Cohen outlined two goals. First, he wanted a model that could predict the likely medical outcome for a patient at a given time. Second, he wanted to identify the specific variables that, at a given time, could predict a patient’s outcome. For the second goal, identifying variables, Hubbard would call on his past work on causal inference.

For predicting outcomes, Diaz knew immediately what they needed: An algorithm called SuperLearner.

Letting the data speak

Diaz refers to professor of biostatistics Mark van der Laan, creator of the SuperLearner, as “the brain behind all these methods.” In van der Laan’s lab at Berkeley, researchers strive to improve the use of statistics to arrive at meaningful conclusions. Van der Laan has a beef with modern statistical methods: “The current practice of statistics often fails to learn the truth from data,” he wrote in his lab’s online mission statement.

Many statisticians are limited by their allegiance to particular models, he says. Conventional estimation procedures, such as least squares or linear or logistic regression, are applied erroneously to high-dimensional data sets. No one model can pull meaningful information or value from these data sets, he says.

He likens the practice of trying to apply one simple model to big data to picking the highest-achieving student in a class on the first day. SuperLearner doesn’t use one method: It consults a built-in library of different approaches to find the optimal strategy. Then, it uses some data to “train” the algorithms, and other data to evaluate the trained algorithms.


“Then you say, okay, now I know which algorithm is doing the best, so choose that one,” says van der Laan. The most successful approach might be one type of regression, or it might be a weighted average of many approaches. SuperLearner identifies and selects the best tool at a given point in time. Applied to trauma care, it trains its candidate algorithms on data from some patients, then uses the best-performing ones to make predictions for other patients.
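The train-then-evaluate idea described above can be illustrated with a toy sketch. This is not the SuperLearner software or the researchers' trauma model: the three candidate learners, the function names, and the synthetic data are all assumptions made for illustration, and the sketch only picks a single best candidate rather than building a weighted average. It also uses a numeric outcome for brevity, whereas the trauma model predicts survival.

```python
import random
from statistics import mean

# Hypothetical candidate learners (not from the article). Each takes a list of
# (x, y) training pairs and returns a function that predicts y from x.
def fit_mean(train):
    m = mean(y for _, y in train)
    return lambda x: m

def fit_linear(train):
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(train):
    # Predict the outcome of the closest training point.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

LIBRARY = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

def cv_risk(fit, data, folds=5):
    """Mean squared error on held-out folds: train on some data, test on the rest."""
    errors = []
    for k in range(folds):
        held_out = data[k::folds]
        train = [p for i, p in enumerate(data) if i % folds != k]
        predict = fit(train)
        errors.extend((predict(x) - y) ** 2 for x, y in held_out)
    return mean(errors)

def select_learner(data, library=LIBRARY, folds=5):
    """Pick the candidate with the lowest cross-validated error, then refit it on all data."""
    risks = {name: cv_risk(fit, data, folds) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, library[best](data), risks

# Synthetic data (an assumption): a noisy straight line, y = 2x + 1.
rng = random.Random(0)
data = [(i / 10, 2.0 * (i / 10) + 1.0 + rng.gauss(0, 0.1)) for i in range(50)]

best, predict, risks = select_learner(data)
print(best, {name: round(r, 4) for name, r in risks.items()})
```

On data that really do follow a straight line, the linear candidate should win. On messier data a different candidate could come out ahead, which is the reason for consulting a library instead of committing to one model in advance.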

“New techniques are born out of necessity, and they’re the only way to handle these big data sets,” says van der Laan. In a paper published earlier this year in the Journal of Trauma and Acute Care Surgery, the researchers tested the SuperLearner-based system at different time intervals on data from trauma patients. During each interval they tested, up to six hours, the system predicted patient outcomes more accurately than existing models.

The future of trauma care

SuperLearner, says Diaz, was a perfect fit for what Cohen wanted to do with trauma patient data. “Severe trauma is a process that varies over time,” he says.

“When I first started talking about this stuff and presenting it to my world, old-timer luminaries in our field and clinical trauma people said, ‘You’re trying to replace people with computers,’” recalls Cohen. “And I say, ‘No, no, no, we’re trying to model the clinical gestalt you’ve developed over 30 years of sitting at a bedside.’” Experienced clinicians can look at a patient’s vital signs and just know that something’s not right, and more often than not, they’re right, Cohen says.

Of course, it’s one thing to design a predictor in theory—and another to make it a life-saving reality. That’s why doctoral student Anna Decker MA ’11, an integral member of Hubbard’s research team, is now working with the medical staff at San Francisco General Hospital to implement the technology that can predict outcomes.

The researchers don’t yet know how clinicians will interface with the model when it’s up and running. Perhaps they’ll consult an app on a tablet computer that changes color according to predicted outcomes; perhaps patients will be hooked up to a smart monitor that automatically measures the most critical variables at that time.

But what researchers do know is that the model will continue to evolve, getting smarter and better as it acquires new data. Cohen and Hubbard both say that the tool represents the thought process of a well-trained clinician, who already knows that patients’ conditions change rapidly and need to be treated accordingly. They’re trying to quantify, and even boost, the keen sense of a well-seasoned clinician.

“What we’re trying to do,” says Cohen, “is to model that intuition.”
