How ‘Big Data’ is Transforming Today’s Healthcare Sector

Meir Rinde | October 15, 2014 | Health Care
Aneesh Chopra, former chief technology officer for the federal government, believes that the value of data sources grows exponentially as new ones are added.

The term “Big Data” seems to be everywhere these days. It’s being used to describe how marketers learn about shoppers’ preferences, security organizations pinpoint potential risks, and demographers identify major trends. But nowhere does the use of big data have more potential to impact our quality of life than in healthcare.

As electronic medical records become the norm, and computers and mobile devices become ubiquitous, crunching large volumes of digital records to enhance healthcare decision-making is now possible.

Researchers are demonstrating how inventive uses of data can reveal patterns of illness that were previously obscure. Some hospitals in New Jersey, Pennsylvania, and other states are getting better at identifying and treating the sickest members of their communities. Insurance companies are tracking patient data as part of new schemes to reward doctors financially for keeping people well.

In practice, “Big Data” is used to cover a wide range of disparate activities enabled by information technology, whether they involve sifting through hundreds of millions of records or only a few thousand. It includes the “hot spotting” of frequent emergency-room users pioneered by Dr. Jeffrey Brenner in Camden; a hospital workflow that makes sure diabetes patients get scheduled blood tests; a mapping project by Princeton economist Janet Currie that shows how home foreclosures lead to increased hospital admissions; and a smartphone app that lets users look up product recalls, among many other efforts.

To pinpoint high utilizers and focus cost-saving strategies, Dr. Jeffrey Brenner's Camden Coalition of Healthcare Providers analyzed hospital finance records and patients' home addresses to see where the most expensive patients lived. The pie charts show the percentage of receipts, visits, patients, land area, or city blocks covered by each map color. (Courtesy of Camden Coalition of Healthcare Providers)

Big data boosters say the field has great promise, with the potential to focus limited resources in ways that will improve the quality of patients’ lives, prevent needless deaths, and cut costs. At the same time, the productive use of data and analytics still faces a number of challenges, some of them unique to healthcare.

Privacy, in particular, is a concern. Current privacy laws often hamper research. Yet some of the most cutting-edge public health research efforts and commercial ventures seek to “mash up” multiple sets of health records. This can put patients’ information to uses they never envisioned, employing information in ways that make people uncomfortable.

A variety of solutions have been proposed for different kinds of privacy challenges, ranging from updated state and federal legislation to computer systems that allow data to be queried without revealing the subjects’ identities.

Patterns, Prediction, Surveillance

Healthcare organizations and researchers have been collecting and analyzing computer data for decades, but big data has gained currency as a buzzword only in the past two to three years. Experts refer to a new “volume, variety and velocity of data” that has resulted from the automated or large-scale collection of information — for example, from a wearable heart monitor — that allows real-time tracking and response.

Dr. Farzad Mostashari, the former national coordinator for health information technology at the U.S. Department of Health and Human Services, cited an early instance of relatively small “big data” from his work detecting disease outbreaks in New York 15 years ago.

While working for the Centers for Disease Control, he learned about the fire department’s records of ambulance calls, which were categorized by the problem described by the caller. While the information was scientifically unreliable “dirty data,” in the aggregate it showed “beautiful” patterns, like increases in respiratory calls at certain times.

The data turned out to reveal surges in flu cases well before individual doctors could become aware that something unusual was happening, Mostashari explained during a big data conference at Princeton University earlier this year.

“That was kind of my first exposure to this idea that you could take data, which is now electronic, because we had some sort of transactional system — and the data is being collected for some totally other purpose, right, to dispatch an ambulance — but if you could reuse and repurpose it and look for patterns within it, it might be useful,” he said.
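The kind of pattern-spotting Mostashari describes can be illustrated with a minimal sketch. It is not the method his team used; it assumes a simple rule in which a day is flagged when its call count exceeds the trailing week's average by a set number of standard deviations. The function name `flag_surges`, the window, the threshold, and the call counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_surges(daily_counts, window=7, threshold=2.0):
    """Return indexes of days whose count exceeds the trailing-window
    mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if daily_counts[i] > baseline + threshold * spread:
            flagged.append(i)
    return flagged

# Made-up daily counts of respiratory ambulance calls (illustration only).
respiratory_calls = [20, 22, 19, 21, 20, 23, 21, 22, 20, 41, 45]
surge_days = flag_surges(respiratory_calls)
print(surge_days)
```

Even a crude rule like this works on "dirty" records because it looks only at aggregate counts, not at whether any single call was categorized correctly.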

At the very least, ambulance-call data could serve as an early-warning system, allowing hospitals to prepare for higher patient volume and public officials to broadcast advice on how to avoid getting sick. But for Mostashari and many others, the greater goal of big data work is prediction. They want to know who is likely to get sick, weeks or months in advance, so that interventions can be put in place and tested for effectiveness, and causes of illness can be studied in detail.

Predictive analytics is in its infancy and its long-term utility is unclear. At the clinical level, the term has been used to describe systems that monitor a premature baby’s vital signs and give earlier warnings of a new infection, for example. In the future, a computer might automatically adjust the baby’s medicine without a nurse’s intervention.

Danish Researchers Supersize Big Data, Analyze Nation’s Full Patient Registry

Working with medical records for more than 6 million people, Danish scientists uncover unknown disease patterns that could ultimately improve healthcare worldwide

In the United States, researchers can only dream of the ultimate health database: one that contains complete electronic records spanning decades for all Americans, allowing analysis of long-term patterns of illness.

One country that already has such a database is Denmark. For the first time, researchers have analyzed the nation’s full patient registry — encompassing 6.2 million people — and spotted disease patterns that were previously unknown or not well understood. Their analysis and extensions of their work could improve diagnosis and treatment of disease in Denmark and around the world.

“This is the first in the world for analyzing an entire population,” said researcher Søren Brunak of the University of Copenhagen and Technical University of Denmark. He said the analysis published earlier this year was also the first time disease development had been tracked over time in this way, using 15 years of health records.

The researchers observed which diseases tend to occur in sequence and sketched out a set of common disease trajectories and clusters. Two of the most striking clusters show that chronic obstructive pulmonary disease (COPD) is a key diagnosis preceding a number of fatal diseases, and gout appears to be a central signal of future cardiovascular problems.
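The basic idea of finding diseases that "tend to occur in sequence" can be sketched in a few lines. This is a simplified illustration, not the statistical method of the Danish study: it only counts, for each ordered pair of diagnoses, how many patients received the first before the second. The function name `count_ordered_pairs` and the patient histories are invented for the example.

```python
from collections import Counter
from itertools import combinations

def count_ordered_pairs(histories):
    """Count in how many patients diagnosis A was first recorded
    strictly before diagnosis B."""
    pair_counts = Counter()
    for events in histories.values():
        first_seen = {}
        # Sorting by year means setdefault keeps the earliest year per diagnosis.
        for year, diagnosis in sorted(events):
            first_seen.setdefault(diagnosis, year)
        ordered = sorted(first_seen.items(), key=lambda item: item[1])
        for (dx_a, year_a), (dx_b, year_b) in combinations(ordered, 2):
            if year_a < year_b:
                pair_counts[(dx_a, dx_b)] += 1
    return pair_counts

# Invented histories: patient id -> list of (year, diagnosis).
histories = {
    "p1": [(2001, "gout"), (2005, "heart failure")],
    "p2": [(2003, "gout"), (2004, "hypertension"), (2009, "heart failure")],
    "p3": [(2002, "heart failure"), (2006, "gout")],
}
pair_counts = count_ordered_pairs(histories)
print(pair_counts[("gout", "heart failure")])
```

A real analysis would also need to test whether a given ordering occurs more often than chance would predict, which is where a population of millions becomes valuable.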

Danish researchers studying disease trajectories across the country's entire population found that a diagnosis of gout is a key indication that a patient may develop cardiovascular disease. (Source: Temporal Disease Trajectories Condensed from Population-Wide Registry Data Covering 6.2 Million Patients).

Brunak cautioned that it isn’t clear, for example, that a number of conditions that seem to precede COPD, such as angina and psoriasis, actually arise first or are just diagnosed earlier. But he said the strong statistical correlations between certain diseases could guide doctors to test for undiagnosed conditions.

“It’s actually super useful if you systematically can see that certain diseases are discovered in the wrong order,” he said. “Then when you discover the first one, you know you should also be looking for the second one.”

Brunak said this kind of large-scale meta-analysis will become particularly useful if genetic data is added, potentially allowing clinicians to predict which of a few likely trajectories a patient’s disease will take and to tailor treatment accordingly. He said hospitals and biomedical firms have shown significant interest in the findings. As a next step he will integrate medical records from as far back as the 1970s that use older disease codes, extending the analysis to cover entire adult lifespans.

He noted that the study was possible because Denmark has a registry that includes all health records unless a patient opts out, a system some are advocating for U.S. recordkeeping systems. Denmark’s government-sponsored healthcare system means there are no health insurers, simplifying data collection and privacy issues.

Brunak said he expects the study to affect healthcare by showing the importance of large analyses, with hundreds of thousands rather than a few thousand people. This has already become clear in genetics research, he said.

“In a clinical trial you might have 3,000, 5,000 people, but eventually it’s clear that it doesn’t buy you a whole lot,” he said. “Meta-analysis across many people — this is the way to go, whether you want to do genetics or clinical data analysis.”

A number of organizations are also researching ways to predict and prevent hospital readmissions, which are used as a measure of health quality. Providers with high readmission rates can be penalized by Medicare.

Geisinger Health System in Pennsylvania, an innovator in the advanced use of data, has studied the characteristics of readmitted patients and identified risk factors such as pulmonary disease, heart failure, and advanced age. Among patients with those factors, who also had a previous admission in the past year, fully half will die or end up back in the hospital within 30 days of being discharged, according to Dr. Jonathan Darer, Geisinger’s chief innovation officer.

But though Geisinger uses staff calls, robocalls, and home health visits to monitor certain sets of newly discharged patients, the organization is so far not using its findings on readmissions in a meaningful way, Darer said during a recent NJ Spotlight webinar on big data. It continues to analyze a long list of variables, including the patient’s home situation and other factors, in an effort to refine its predictive power.

Meanwhile Brenner, who has won plaudits and awards for pioneering uses of patient health records, criticizes health IT advocates whom he calls “obsessed” with prediction. Instead of focusing on possible future illness, he says healthcare organizations should get better at surveillance, drilling down into data and building systems that alert them to current patients’ problems.

“So we want to know, ‘Tell me which person is going to be hospitalized three months from now so I can call them on the phone.’ Meanwhile, the hospital is full of sick people who’ve been back over and over and over,” Brenner said during the Princeton conference. “Or, this month there’s a woman in Camden who’s been to the emergency room three times for sexually transmitted disease. No one is going to call her, no one is going to follow up, her primary care provider is unaware of it. So that’s a failure to surveil data.”

Brenner is best known for treating poor, chronically ill “super-utilizers” who generate astronomical medical costs. His organization, the Camden Coalition of Healthcare Providers, identifies them by looking at maps of ambulance calls or hospital admission records, or simply by asking doctors. Nurses and social workers visit those people and find out what they need — reminders to take medications, drug rehabilitation, or better housing, for example — and make sure they get it rather than repeatedly going to the emergency room for help.
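Hot spotting of this kind boils down to grouping billing records by location and ranking the results. The sketch below is a rough illustration under stated assumptions, not the Camden Coalition's actual analysis: the record layout, the block labels, the charges, and the function name `top_cost_blocks` are all hypothetical.

```python
from collections import defaultdict

def top_cost_blocks(visits, top_n=2):
    """Sum charges per city block and return the costliest blocks as
    (block, total_charges, distinct_patients) tuples."""
    totals = defaultdict(float)
    patients = defaultdict(set)
    for patient_id, block, charge in visits:
        totals[block] += charge
        patients[block].add(patient_id)
    ranked = sorted(totals, key=lambda b: totals[b], reverse=True)
    return [(b, totals[b], len(patients[b])) for b in ranked[:top_n]]

# Made-up records: (patient_id, city_block, billed_charge).
sample_visits = [
    ("p1", "block-A", 1200.0),
    ("p1", "block-A", 900.0),
    ("p2", "block-B", 150.0),
    ("p3", "block-A", 300.0),
    ("p4", "block-C", 2000.0),
]
hot_spots = top_cost_blocks(sample_visits)
print(hot_spots)
```

Reporting the number of distinct patients alongside total charges shows whether a costly block reflects many residents or a handful of very frequent visitors.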

Mostashari cited a similar effort at a San Diego hospital system that received a grant from the federal Beacon Community program to make better use of information technology. He said it achieved $8 million in savings by focusing on just 32 high-cost patients, including one woman who was continually calling for ambulances, according to the system’s records.

“They’d had 100 ambulance dispatches going to her house, and not a single transport,” he recalled. “No one had stopped to say, ‘And what happens when you go to her house?’ They said, ‘Usually we make her a sandwich.’ So they got her Meals on Wheels. It’s a lot cheaper than scrambling a rig.”

A year of emergency room visits and hospital visits.

Beacon hospital and others have also succeeded in improving health outcomes by installing and exploiting better communication and records systems. These may let ambulances send information about a patient ahead to the hospital, or keep a primary-care doctor in the loop when a patient sees another provider or visits the ER.

Such improvements are essential for the new accountable care organizations, or ACOs, that have sprung up since the passage of the Affordable Care Act. Hospitals and doctors in ACOs are paid for making sure members of their community undergo scheduled tests and stay well, particularly people with chronic conditions. Such systems require electronic health records, which often can be configured to send alerts to doctors, nurses, or even patients when gaps in care arise.

Digitizing ‘Bundles’ of Medical Procedures To Ensure Patients Get Complete Care

Geisinger Health System built a computer-based system that alerts nurses and other health practitioners when patients need to come in for tests, reducing so-called care gaps

Geisinger Health System began digitizing health records at its hospitals in rural Pennsylvania in the mid-1990s, well before most other providers. The system, which includes both providers and health plans, then created bundles of clinical care processes — a set of steps for every patient with a particular medical condition — and used its electronic records database as part of a reengineered workflow to make sure every step was followed.

Geisinger applied the method to a range of populations, such as patients who needed mammograms, had abnormal pap smear results, were identified as diabetic, or were at risk of dying from chronic kidney disease, said Dr. Jonathan Darer, the system’s chief innovation officer.

For kidney disease, the bundle includes annual tests of blood pressure, urine protein, cholesterol, and several other measures. Initially, only 3 percent of patients were getting all their tests. As part of the newly created workflow, the computer system alerted nurses to order the tests needed for each patient, increasing compliance to about 12 percent in two years. A diabetes bundle provided similar results, while a bundle for people with coronary artery disease boosted the percentage to 20 percent from less than 10 percent.

Nurses became overwhelmed with their increased duties, so in 2011 Geisinger began having the computer generate orders for tests automatically, reducing the nurses’ workload.

“We send physicians an enrollment order that says, these are all the patients with these conditions, and if you sign on this order, you’ll give us permission to data mine that patient’s chart every month for five years, and generate the needed orders for all those conditions,” Darer said during an NJ Spotlight webinar. “And if the physician signs on, that’s the last thing they need to do.”

One Geisinger program alerts nurses to call patients with upcoming appointments so they can schedule tests and help the patient prepare for the visit, greatly aiding doctors, he said. Another identifies care gaps, and uses autodial and nurse phone calls to remind patients to make needed appointments.
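A care-gap check of the sort described above can be sketched as a comparison between a bundle's required tests and the dates a patient last had them. This is a minimal sketch, not Geisinger's system: the one-year intervals follow the article's description of annual kidney-disease tests, while the bundle contents, the patient record, and the function name `find_care_gaps` are assumptions made for illustration.

```python
from datetime import date

# Hypothetical bundle: test name -> maximum days allowed between tests.
KIDNEY_BUNDLE = {"blood pressure": 365, "urine protein": 365, "cholesterol": 365}

def find_care_gaps(last_done, bundle, today):
    """Return the bundle items that are missing or overdue for one patient."""
    gaps = []
    for test, max_days in bundle.items():
        done_on = last_done.get(test)
        if done_on is None or (today - done_on).days > max_days:
            gaps.append(test)
    return gaps

# Invented patient record: test name -> date last performed.
patient_record = {
    "blood pressure": date(2014, 6, 1),
    "urine protein": date(2013, 1, 15),
}
gaps = find_care_gaps(patient_record, KIDNEY_BUNDLE, today=date(2014, 10, 15))
print(gaps)
```

The list returned for each patient is what a nurse, or an automated ordering step, would act on.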


Geisinger’s efforts have had real impacts on health outcomes and costs. An example is a project that identified older women at high risk for osteoporosis, which leads to bone fractures associated with greatly increased mortality.

Staff called and wrote letters to the women and arranged hospital visits with educational sessions and physical exams. As a result they were much more likely to get tested for osteoporosis and to be prescribed medication. Over a five-year period, Geisinger saw a drop in hip fractures among the patients tracked, particularly those 85 and older, preventing an estimated $7.8 million in treatment costs, according to a study of the program.

At Geisinger, doctors design care bundles for target populations, such as people with diabetes. A bundle includes specific items — vaccinations, blood-pressure readings, and glucose tests, for example — that nurses order up, or that the computer automatically turns into work orders for providers. In population after population — people with diabetes, coronary disease, osteoporosis, and other conditions — the system has resulted in better patient outcomes, Darer said.

Mashing Up Data

Beyond the clinical setting, careful analysis of large datasets can also reveal global patterns of disease and help policymakers decide how to channel resources.

Optum, a leading health analytics firm, has done large-scale hot spotting for a number of states, including Maryland, which has been working to make its Medicaid program more efficient. For example, Optum discovered a high rate of emergency-room admissions for colds, a relatively minor illness, and found that one hospital accounted for most of the visits, said Dr. Lewis Sandy, the senior vice president for parent company UnitedHealth Group.

With that information, the state could encourage the hospital and those patients to manage their colds using less expensive alternatives to the emergency room.

In New Jersey the company created a statewide map down to the level of census tracts showing the prevalence of diabetes. That could be used to identify problems such as food deserts, where healthy food is hard to find, and drive improvements in programs like Medicare and Medicaid, Sandy said.

“It’s not just data from the healthcare delivery system. You can actually use data from personal health records, patient surveys, from publicly available data, for example, from the U.S. Census, or from other government programs,” Sandy said during the NJ Spotlight webinar. “This information can be brought together to bring knowledge and insight to improve public health programs.”

At the cutting edge of big data mashups, developers combine public data with mobile devices to show where health problems are happening in real time.

To help people with respiratory conditions, the company Propeller Health created a device that attaches to an inhaler and uses publicly funded GPS signals to record where and when it is used, giving the patient a precise electronic record. In addition, officials in Louisville, Kentucky used the aggregate data to map out the worst locations for respiratory problems in their city and to examine how they corresponded to environmental factors. They then redeployed city resources to reduce air pollution.

Aneesh Chopra, the former chief technology officer for the federal government, cited the Louisville trial as an example of a project that can illuminate a health problem by generating and drawing on multiple sources of data.

“From a mathematical standpoint, the value of data isn’t one source itself — ‘Hey, this is a GPS source.’ It’s the mashup of multiple sources,” Chopra told the audience at the Princeton conference. “Adding one more data source on your proprietary data source doesn’t create value in a linear fashion, but actually creates value in an exponential fashion. So keep thinking about ways you can enhance or enrich your data with external data that is increasingly open.”

Greater openness about cost data is the goal of another growing movement within the healthcare sector. Insurance companies, either voluntarily or under legislative mandate, are increasingly releasing data on the actual amounts patients pay for different medical procedures, as well as measures of their outcomes.

More than a dozen states have or are creating all-payer claims databases (APCDs) so they can better understand the costs and quality of their healthcare systems. At the national level, three large insurers have given the Health Care Cost Institute cost data that consumers will be able to search using an online tool, and the organization recently won access to national Medicare claims data. Several universities have licensed the massive HCCI dataset so their faculty and students can use it for big data research.

Blurry Privacy Lines

The ubiquity of electronic data collection and the power of high-speed computer analysis have created a remarkably rich resource for innovation, but have also challenged established notions of privacy and even the definition of health data.

A frequently cited example of the dangers of big data comes from a New York Times article about analytics efforts at Target, the department store chain. By analyzing purchases, the company can determine with fairly high certainty if a customer is pregnant, and then will send her coupons for baby-related products. In the widely publicized incident, an angry father complained that the store should not be sending his teenage daughter such advertisements, only to apologize later after he learned she actually was pregnant.

Target did not have access to the young woman’s medical records, but did have her purchase history and potentially a wealth of financial, demographic, and other information obtained from data brokers and public sources. It was thus able to discern facts that had been known only to the woman and possibly her doctor.

“A big part of the big data project is not just analyzing information, it’s creating information,” Julie Brill, a commissioner of the Federal Trade Commission, said at the Princeton conference. “From innocuous retail purchases, health information is created.”

Brill said the federal government set up rules to protect consumers’ confidential information in the 1990s through the Health Insurance Portability and Accountability Act (HIPAA), the Fair Credit Reporting Act, and other legislation, but the laws do not address a newer generation of companies and products that have sprung up since then.

Brill raised the spectre of companies using proprietary and public data to learn whether a specific individual has diabetes, cancer, or mental illness, possibly when the person is ignorant of his or her condition. Using information on car ownership and other data, search firms have guessed that families are obese or diabetic and asked them to join medical trials, discomfiting some of those contacted, she said. Wearable devices like Fitbits record a user’s physical activity, but the person may not have complete control over how the data is used.

Even as privacy experts warn about gaps in legal protections, others chafe at existing restrictions that require patients to give explicit consent for most uses of their personal health information.

Janet Currie is the Henry Putnam Professor of Economics and Public Affairs at Princeton University and the Director of Princeton’s Center for Health and Well Being.

Princeton’s Currie has mashed up data from different sources to gain new insights. In addition to analyzing the relationship between foreclosures and poor health, she has conducted a study that shows correlations between flu season and premature birth.

In addition, the requirement that researchers use only deidentified data, from which names and other details have been removed, makes it difficult to do longitudinal studies that track super-utilizers or to review the effects of a drug over time, said Joel Cantor, director of the Center for State Health Policy at Rutgers.

For such projects a researcher needs to have all the hospital admission data for each particular person being studied. The deidentification efforts required exceed what understaffed and underfunded state agencies can do, he said.

“They said they no longer have the capacity to do that. We’re asking for too much,” Cantor said during the Princeton University conference.

A number of reforms and new systems have been suggested to ameliorate both privacy gaps and access problems.

To respond to concerns about how patient data is used, “baseline privacy legislation” is needed at the federal level, Brill said, while acknowledging such laws are not currently in the offing. HIPAA and other legislation could be amended to recognize that health data exists in places beyond clinical and insurance databases. Sound data management practices, risk analysis, privacy officers, and audits must be standard at any firm that handles sensitive information, she and other experts said.

To prevent privacy rules from handicapping data analysis, government agencies could use computer systems that let researchers submit statistical queries and get answers without possessing the data, said Edward Felten, a professor of computer science and public affairs at Princeton. Brill said the U.S. Census uses such a system, which could serve as a model for others.
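One simple way to picture such a query system is a function that holds the records internally and returns only aggregate counts, refusing to answer when so few people match that an individual might be singled out. The sketch below is an assumption-laden illustration, not a description of the Census system or of any system Felten proposed; the function name `count_query`, the minimum cell size of five, and the records are hypothetical.

```python
def count_query(records, predicate, min_cell_size=5):
    """Answer a counting query, returning None (suppressed) when fewer
    than `min_cell_size` people match."""
    matches = sum(1 for record in records if predicate(record))
    if matches < min_cell_size:
        return None
    return matches

# Invented records held by the data custodian; researchers never see these rows.
records = (
    [{"zip": "08102", "diabetic": True} for _ in range(6)]
    + [{"zip": "08103", "diabetic": True} for _ in range(2)]
    + [{"zip": "08103", "diabetic": False} for _ in range(9)]
)

large_cell = count_query(records, lambda r: r["zip"] == "08102" and r["diabetic"])
small_cell = count_query(records, lambda r: r["zip"] == "08103" and r["diabetic"])
print(large_cell, small_cell)
```

Small-count suppression alone does not stop someone from combining several permitted answers to infer a hidden one, so production systems layer on stronger safeguards.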

As for the use of personal information outside of traditional healthcare settings, Chopra argues for engaging patients and teaching them how to view their own data. He advocates a control-panel model in which people are encouraged to actively decide how they want their data used and can easily opt out of giving access.

Others argue more aggressively for releasing data, while using institutional review boards or ethics review committees to weigh the potential benefits and risks.

A recent Health Affairs article on ethical concerns in predictive analytics said patients should be included in the early stages of big data projects, but developers also “should be allowed to use already collected patient data without explicit consent, provided that they comply with federal regulations regarding research on human subjects and the privacy of health information.”

Meir Rinde is a freelance writer based in Philadelphia.