American Association for Physician Leadership

Peer-Reviewed

Questions to Ask When Considering the Use of Big Data

Kenneth G. Poole, Jr., MD, MBA, FACP, CPE


David Upjohn, MS


Eric Pool, EdD, PMP, ITIL


James S. Hernandez, MD, FCAP


Nov 2, 2023


Physician Leadership Journal


Volume 10, Issue 6, Pages 8-12


https://doi.org/10.55834/plj.5302561406


Abstract

Savvy healthcare leaders must understand big data and how invaluable big data platforms are for improving patient care and generating potential sources of revenue. Leaders must acknowledge, however, the potential financial, legal, security, and ethical pitfalls when others work with their de-identified patient healthcare data. The authors provide a guide for healthcare leaders on how to use the sensitive and important patient data they house.




Many healthcare organizations are considering how they can use big data (large volumes of patient data that require nuanced management) to help patients and the healthcare community. In doing so, questions often arise: When is the right time to engage big data? What is the responsibility of the organization’s leadership and what are the terms of engagement? What is the best way to optimize the value and use of the data?

These questions and others are addressed herein through a review of definitions and examples of big data. This information should help leaders of healthcare organizations, including those with limited digital capabilities, use their data effectively and develop platform strategies. Such organizations may need guidance to understand the implications of working with big data vendors and to determine whether to collaborate with larger healthcare organizations or other third parties. Leaders also need to understand the legal aspects of big data while recognizing that legality is not the whole story: patients may have ethical concerns about their data being included in a big data set even when the data are legally de-identified.

Healthcare leaders should ask the following questions before engaging in a big data agreement.

  • Are there rules and regulations that must be adhered to?

  • How are the practitioners in my organization collaborating with others?

  • How would alignment with other organizations help my patients?

  • How can I ensure my patients’ data are safe?

  • How can this process facilitate trust with the communities I serve?

  • Is my patient population diverse?

  • How should this process be governed?

Without answers to these important questions, conflict over data use (privacy vs. public good) may continue, making it difficult to promote big data. Promoting big data requires strong legal and ethical policies that reduce public concern while improving patient outcomes. With the answers, healthcare organizations can use big data and platforms to improve patient care and potentially improve financial performance.

FIRST, IS THIS EVEN LEGAL?

Before an organization participates in big data analytics, the threshold legal question must be answered: Is the use of big data legal? Although disclosing or commercializing identifiable patient data is generally illegal, there is a surprising amount of flexibility under current law once the data are de-identified.(1) This flexibility reflects a policy rationale that these data hold great promise for good and that, once the data are de-identified, the risk to the individual is reasonably mitigated while the benefit to society is maximized.

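De-identification is far more easily discussed than accomplished. U.S. federal and state laws typically apply only to identifiable data. Under the HIPAA Privacy Rule, data can be de-identified in one of two ways: by removing all 18 prescribed identifiers (e.g., names, geographic subdivisions smaller than a state, dates other than the year, and contact details), known as the Safe Harbor method, or by removing or masking enough data that a qualified statistician can opine that the risk of re-identification is minimal even when the data are combined with other publicly available data, known as the Expert Determination method.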
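The Safe Harbor route lends itself to a simple illustration. The sketch below is a minimal, hypothetical example, not a compliant de-identification tool: it assumes each patient record is a flat Python dictionary with made-up field names, removes a few direct identifiers, and reduces dates to the year. A real implementation would need to cover all 18 identifier categories, including rules this sketch omits (such as those for ZIP codes and for ages over 89), and would still struggle with free-text notes.

```python
from datetime import date

# Hypothetical field names; a real schema would map every field to the
# 18 Safe Harbor identifier categories, not just these few.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email", "ssn", "street_address"}
DATE_FIELDS = {"birth_date", "admission_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of record with direct identifiers removed and dates reduced to year.

    Simplification: birth year is kept for everyone, although Safe Harbor requires
    aggregating ages over 89 (and the related years) into a single category.
    """
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            cleaned[field + "_year"] = value.year  # keep only the year
        else:
            cleaned[field] = value  # clinical content passes through unchanged
    return cleaned


record = {"name": "Jane Doe", "mrn": "12345", "birth_date": date(1980, 5, 17), "diagnosis": "E11.9"}
print(strip_identifiers(record))  # {'birth_date_year': 1980, 'diagnosis': 'E11.9'}
```

Even this toy version shows the trade-off discussed next: every field removed or coarsened lowers re-identification risk but also removes information an analyst might need.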

However, removing too many patient identifiers can render the data unhelpful. The statistical approach, on the other hand, can be complex and technically difficult. De-identification is especially difficult to automate for unstructured data such as clinical notes, and the statistical analysis must be refreshed when external factors change. The law does not require, and technology cannot guarantee, perfect de-identification. That said, meeting this standard makes re-identification of patient data unlikely.
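HIPAA does not prescribe a particular statistical test for the expert route. One widely used measure, offered here as an illustrative assumption rather than what the law requires, is k-anonymity: every combination of quasi-identifiers (for example, birth year, three-digit ZIP prefix, and sex) should be shared by at least k records, so that no patient stands out. The records and field names below are hypothetical.

```python
from collections import Counter


def min_group_size(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every quasi-identifier combination appears in at least k records."""
    return min_group_size(records, quasi_identifiers) >= k


# Hypothetical, already-stripped records.
records = [
    {"birth_year": 1980, "zip3": "850", "sex": "F", "dx": "E11.9"},
    {"birth_year": 1980, "zip3": "850", "sex": "F", "dx": "I10"},
    {"birth_year": 1975, "zip3": "852", "sex": "M", "dx": "J45"},
]
qi = ["birth_year", "zip3", "sex"]
print(min_group_size(records, qi))       # 1: the 1975/852/M patient is unique
print(is_k_anonymous(records, qi, k=2))  # False
```

Raising k usually means generalizing further (for example, replacing birth year with a five-year band), which is exactly the loss of usefulness the paragraph above warns about.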

The European Union’s requirements for “anonymization” are even stricter than its requirements for de-identification. This area of law will continue to evolve and may, in the future, affect the use of de-identified data for commercial or other purposes. Here, the law trails innovation and ethics. The trend is toward more regulation of identifiable personal data through controls such as notices, consents, opt-outs, audits, and revocations. It is not yet clear to what extent de-identified data will be further regulated or made subject to individual rights.

COLLABORATING WITH OTHERS

Because big data technology is required for continued innovation and advances in healthcare, the question of collaborating with other healthcare organizations seems to be moving from “Should we?” to “How?” These collaborations are generally partnerships, and healthcare organizations can benefit from understanding how to partner with others in the use of big data and what questions to ask during this process.

One example is the Mayo Clinic Platform, an organizational initiative at Mayo Clinic. The cloud-based platform allows sharing of de-identified(2) patient data from Mayo Clinic and a network of international health systems. This collaboration requires strong security oversight and technical assurances, as well as effective strategic partnerships to meet stakeholder expectations. Participants in the platform interact with data securely, which enables unprecedented innovation under strong governance.

However, not all partnerships have met patient and public expectations regarding transparency. One example is Project Nightingale, a partnership between Google and Ascension Healthcare. The public data security concerns that arose from Project Nightingale, given the transfer of data between the two organizations, underscore the importance of data privacy.(3) The lessons from Project Nightingale and similar efforts (e.g., Google’s work with the University of Chicago) come back to the essential need for trust.(3) Because big data partnerships can succeed or fail, it is important to know what to ask when partnering and how to partner successfully.

Data-Sharing Methods

One model to consider in big data sharing is a public-private partnership (PPP). A PPP is characterized by collaboration between a government agency and the private sector. A PPP creates a cooperative agreement across parties.(4) However, the use of PPPs does not guarantee a successful partnership. Organizations still need to ask important questions and focus on a mutually beneficial partnership.

If organizations are to be successful when sharing big data, they must also answer questions concerning the privacy of patient data. In his article “Sharing Is Caring — Data Sharing Initiatives in Healthcare,” Hulsen(2) describes three options that organizations should consider when partnering: distributed learning and the platforms Personal Health Train (PHT)(5) and DataSHIELD.(6)

Distributed learning shares a statistical model rather than patient data, which creates a safer collaborative environment between organizations because sensitive data are not exchanged.(2) PHT uses algorithms that travel to the data and analyze them where they reside, rather than creating a centralized database. DataSHIELD analyzes data through commands sent from a central computer to the computers that house the needed data.(2) The partnering organizations should discuss which approach makes the most sense on the basis of their experience and specific project needs.
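The three approaches differ in their machinery, but they share one idea: the analysis goes to the data, and only aggregate, non-identifying results come back. The sketch below illustrates that idea in its simplest form, a pooled mean computed from per-site counts and sums. It is not the API of PHT or DataSHIELD; the function names, the minimum-count threshold, and the laboratory values are all hypothetical. Real distributed learning typically repeats this exchange over many rounds, passing model parameters rather than simple sums.

```python
from dataclasses import dataclass

MIN_COUNT = 5  # hypothetical disclosure rule: sites refuse to report very small groups


@dataclass
class SiteSummary:
    count: int
    total: float


def local_summary(values: list[float]) -> SiteSummary:
    """Runs inside each site; the raw values never leave this function."""
    if len(values) < MIN_COUNT:
        raise ValueError("group too small to report without disclosure risk")
    return SiteSummary(count=len(values), total=sum(values))


def pooled_mean(summaries: list[SiteSummary]) -> float:
    """Runs at the coordinating center using only site-level aggregates."""
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


# Hypothetical HbA1c values held separately at three sites.
site_a = [6.1, 7.4, 8.0, 6.8, 7.2]
site_b = [5.9, 6.5, 7.7, 6.9, 7.0, 8.3]
site_c = [6.4, 6.6, 7.1, 7.9, 6.2]

summaries = [local_summary(v) for v in (site_a, site_b, site_c)]
print(round(pooled_mean(summaries), 2))  # 7.0, the same as the mean of all 16 raw values
```

The coordinating center sees only three (count, total) pairs, yet obtains the same answer it would have gotten from a centralized database.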

Organizations partnering on the use of big data also should ask key questions about social license and community acceptance. Social license has strong ties to trust; therefore, organizations should ask how they can uphold both patient and community trust. They should consider including information on transparency, regulation, and good governance.(2) Partnering organizations should also include these areas in their PPP documents to help them communicate and follow the processes outlined.

Aligning with Larger Organizations

A collaborative approach to big data and information sharing among organizations can have substantial patient-care benefits. Big data are collected, stored, and combined at scale. Small healthcare organizations and medical practices with only a few providers often lack the volume and diversity of patient data to derive reliable population and community health trends. Furthermore, small, homogeneous data sets lack the power to drive statistically significant, generalizable research and the ability to create scalable health solutions.

A collaborative approach enables predictive analytics and data analysis for diagnostic tools, precision medicine, disease prevention, medical research, and reduction of prescription errors, which in turn can improve quality.(7)

Patient health information and other related data can come directly from traditional clinical information systems and from commercially available devices and apps. Companies with big data, such as Google and Apple, can offer partnerships and data access to large, prominent health systems, but small organizations likely will not be able to leverage such partnerships independently. Still, organizations of all sizes have valid reasons to be reluctant to share and pool data, including policy and regulatory concerns, privacy concerns, and reluctance within academia.

Fortunately, numerous solutions are available to ease healthcare organizations’ apprehension about sharing data. These include open science initiatives (which can be thought of as public open access to data), federated data systems such as PHT, standardized consent forms for data collection, medical crowdsourcing platforms, and increased development and use of data generalist experts.(2)

ENSURING YOUR DATA ARE SAFE

Once the decision is made to contribute patient data to a big data project, due diligence is needed to choose a partner and evaluate its information security practices. Of note, any organization that holds customer data should conduct its own due diligence, especially if it houses sensitive patient data. This is also an evolving area, but external standards such as those of the National Institute of Standards and Technology (NIST) at the U.S. Department of Commerce (www.nist.gov), the Health Information Trust Alliance (HITRUST) (www.wolfandco.com/hitrust-certification), and Service Organization Control 2 (SOC 2) (https://us.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report) are indicators of a quality security program.

The mission of NIST is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve quality of life. HITRUST provides a certifiable framework that helps healthcare organizations prioritize information security. SOC 2 provides independent audit reports focused on information security.

Ideally, third-party vendors implement these standards as part of organizations’ third-party risk management. Certification against these standards also allows vendors to attest to their own information security and to the extent to which they successfully safeguard the data they store.

A SOC 2 Type 2 audit, for example, takes place over a defined period, typically between three and 12 months, and reviews the controls an organization has in place to safeguard customer data. The auditor reviews the company’s written policies and procedures as well as evidence that they are being followed. The auditor also examines the company’s system integrations with other vendors to determine whether any of those connections could introduce risks.

During these audits, penetration tests are generally conducted: A third-party company runs a series of simulated attacks on an organization’s information technology (IT) systems to learn whether access can be gained to sensitive information. The results help demonstrate a company’s information security maturity and commitment to safeguarding its patient data.

Organizations are advised to hire a knowledgeable lawyer who can oversee contractual provisions for information security standards, audits, liability, indemnification, cyber insurance applicability, and breach notification. As big data grow in volume and value, increasing numbers of wrongdoers will attempt to access sensitive data, but there are also increasing numbers of information security companies and professionals who can serve as guides and gatekeepers for those beginning the big data journey. Organizations must continually audit their own systems, but they must also require evidence of information security from any company with which they partner.

BUILDING COMMUNITY TRUST AROUND BIG DATA

To build trust with a community, a company should be transparent about how data are stored and used, develop an ethical brand and company culture, offer patients informed consent and controls over their data, and provide proof of IT security and privacy. Any company holding consumer data should view itself as the custodian of those data and take responsibility for their security as the guardian entrusted with sensitive information. Good intentions around the data that a company stores must be backed by proper due diligence to ensure that the data remain secure.

Laws require companies to protect healthcare data, but trust goes beyond checking the boxes of the Health Insurance Portability and Accountability Act (HIPAA). Healthcare data are among the most sensitive and personal data created, and regulations such as the HIPAA Privacy and Security Rules, which address the privacy and security of protected health information, provide a guide for the storage and use of healthcare data. Compliance with these laws alone, however, will not create long-term trust with patients.(8)

Trust requires ongoing proof of IT security and privacy. Data breaches erode consumers’ trust in an organization; therefore, any organization that stores patient data must continuously improve its IT security to meet new threats, and companies partnering with healthcare organizations need to be transparent with their partners. Transparency can be increased by providing clearly articulated informed consent that outlines all potential data uses, along with an accessible privacy policy. Transparency is also reflected in how much information an organization’s website provides about data security and usage.

Organizational brand and culture also play a key role in developing trust. Brands can inspire patient trust by communicating the healthcare organization’s patient-centric mission and business goals. Patients understand that data are valuable, so a clear mission and transparency about how data are used can reassure them.

THE IMPORTANCE OF PATIENT DIVERSITY

For big data to have a central role in the development of medicines and therapies, data sets need to include data from diverse populations. However, promoting inclusive and diverse population health is not without challenges. Many social determinants affect this space, including internet access, life expectancy, access to education and level of education achieved, economics, and place of residence.

Racial and socioeconomic bias is another area where problems have arisen because of the way machine learning models are built and used. For example, machine learning models to detect skin cancer were trained primarily on images of light-colored skin,(9) and in a study using Framingham data, a risk algorithm underestimated cardiovascular mortality by 48% in persons who were resource-limited.(10,11) These examples highlight the need for attention to diversity in the use of big data and underscore the need for high-quality data.

Data quality also needs to be addressed. In the article “Putting the Data Before the Algorithm in Big Data Addressing Personalized Healthcare,” Cahan, et al.,(10) identify two types of bias. The first is “... sampling bias — whereby certain patient cohorts are absent from the inputs — yields nonrepresentative algorithmic outputs,” and the second is “... observation bias, denoting systematic miscalibration of measurement.”
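One practical way to look for the kind of miscalibration Cahan et al. describe, and that the Framingham example illustrates, is to compare a model’s mean predicted risk with the observed event rate separately for each subgroup. A ratio near 1 suggests good calibration; a ratio of 0.5 means the model predicts only half the events that actually occur in that group. The sketch assumes hypothetical rows of (group label, predicted probability, observed 0/1 outcome); the group names and numbers are made up for illustration.

```python
import math
from collections import defaultdict


def calibration_by_group(rows):
    """Map each group to (mean predicted risk) / (observed event rate).

    Both means share the same denominator n within a group, so the ratio of
    sums equals the ratio of means.
    """
    predicted = defaultdict(float)
    observed = defaultdict(int)
    counts = defaultdict(int)
    for group, risk, event in rows:
        predicted[group] += risk
        observed[group] += event
        counts[group] += 1
    ratios = {}
    for group in counts:
        if observed[group] == 0:
            ratios[group] = math.nan  # undefined when no events were observed
        else:
            ratios[group] = predicted[group] / observed[group]
    return ratios


# Hypothetical data: identical predictions, different observed outcomes.
rows = [
    ("group_a", 0.25, 0), ("group_a", 0.25, 1), ("group_a", 0.25, 0), ("group_a", 0.25, 0),
    ("group_b", 0.25, 1), ("group_b", 0.25, 0), ("group_b", 0.25, 1), ("group_b", 0.25, 0),
]
print(calibration_by_group(rows))  # {'group_a': 1.0, 'group_b': 0.5}
```

A check like this only detects bias in groups that are present in the data; it cannot reveal sampling bias for cohorts that are missing altogether.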

If we are to address issues with data quality and promote inclusive and diverse populations, bias must be addressed. In his book Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again, Topol(12) also discusses the negative impact that bias has on people who are resource-limited. Ignoring bias will only make the problem worse. Much like diversity and inclusion in the workplace, these important topics must be addressed at the foundation of big data.

LEADERSHIP, GOVERNANCE, AND ETHICS

After the above considerations have been addressed, leaders of healthcare organizations still need to make business and strategic decisions about whether and how to commit their patients’ data to the promise (and the attendant risks) of big data. Big data is the future for healthcare innovation, research, and, increasingly, patient care. However, if the data pool is breached, an organization’s name may be associated with that story. Whether to engage deserves strategic discussion with the organization’s board of directors and other stakeholders, including staff and, potentially, the patient community.

In addition, many important philosophical questions arise when deciding whether to engage with big data: Is this consistent with the organizational mission and values? Does the promise of progress toward cures outweigh the risks? What type and level of consent should be used even if not legally required? How granular should that consent be? Should patients benefit financially from commercial uses of their data? Which direction will the law and public opinion trend? Does the organization’s risk appetite tolerate a breach or possible re-identification of these data, however unlikely?

Key areas to consider when partnering around big data include:

  • Public-private partnerships are good for initial exploration but are not without risk.

  • Small organizations still have leverage in data-sharing agreements.

  • De-identification is essential but can be challenging.

  • Use of third-party vendors for data security is advised.

  • Transparency about data use is key to building patient and community trust.

  • Diversity should be prioritized in data collection.

  • Big data use must be formally governed by organizational leadership.

FINAL THOUGHTS

Healthcare organizations should view themselves as stewards and custodians of their patients’ data, which should be used to guide and treat patients and populations toward better health. There are many positive reasons for contributing to big data in healthcare. Big data can be used to help develop new treatments and cures and to promote health equity. To do so, developing and maintaining trust with patients is of utmost importance.

In 2021 alone, healthcare data breaches affected 45 million people.(13) Healthcare organizations must place the utmost importance on information security for their data, and when considering partnering with another organization, organizations should carry out proper due diligence to ensure that the partner organization is secure as well.

In addition, support and incentives are growing for organizations and initiatives committed to big data. The National Institutes of Health, for example, has awarded close to $75 million through 19 grants that will collectively establish a data science platform and coordinating center in Africa.(14)

Additionally, the COVID-19 pandemic and its aftermath have accelerated the development and use of data platforms in a variety of industries, particularly healthcare. One example is the increasing popularity of virtual health. Healthcare systems are now likely to rely more on artificial intelligence and big data.(15) They are thinking nontraditionally in seeking to transform care, and big data appears to be a mechanism full of untapped potential.

Big data platforms have the potential to improve patient care and to serve as a source of revenue. Technological and scientific advances have made sharing big data safer.(16) Before partnering with a big data company, however, healthcare organizations need to know how to share data safely and reliably. We hope the questions discussed here will help healthcare leaders know what to ask and what to consider when deciding whether to use big data.

Acknowledgment: The Scientific Publications staff at Mayo Clinic provided editorial consultation and proofreading, administrative, and clerical support. The authors also acknowledge Kim Otte for her initial work in drafting the manuscript and Jeff Wu for his help with final additions. The authors did not use OpenAI GPT or a similar tool to write any portion of this article.

References

  1. U.S. Department of Health and Human Services. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Accessed June 14, 2023. www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

  2. Hulsen T. Sharing Is Caring–Data Sharing Initiatives in Healthcare. Int J Environ Res Public Health. Apr 27 2020;17(9). https://doi.org/10.3390/ijerph17093046

  3. Trinidad MG, Platt J, Kardia SLR. The Public’s Comfort With Sharing Health Data With Third-Party Commercial Companies. Humanit Soc Sci Commun. 2020;7(1). https://doi.org/10.1057/s41599-020-00641-5

  4. Ballantyne A, Stewart C. Big Data and Public-Private Partnerships in Healthcare and Research: The Application of an Ethics Framework for Big Data in Health and Research. Asian Bioeth Rev. Sep 2019;11(3):315–326. https://doi.org/10.1007/s41649-019-00100-7

  5. GO:FAIR PHT Implementation Network. Personal Health Train PHT-meDIC. Accessed May 25, 2023. https://personalhealthtrain.de/

  6. DataSHIELD Secure Bioscience Collaboration. Accessed May 25, 2023. www.datashield.org/

  7. NEJM Catalyst. Healthcare Big Data and the Promise of Value-Based Care. NEJM Catal Innov Care Deliv. 2018.

  8. Solove DJ. HIPAA Turns 10. J AHIMA. Apr 2013;84(4):22–8; quiz 29.

  9. Adamson AS, Smith A. Machine Learning and Healthcare Disparities in Dermatology. JAMA Dermatol. Nov 1 2018;154(11):1247–1248. https://doi.org/10.1001/jamadermatol.2018.2348

  10. Cahan EM, Hernandez-Boussard T, Thadaney-Israni S, Rubin DL. Putting the Data Before the Algorithm in Big Data Addressing Personalized Healthcare. NPJ Digit Med. 2019;2:78. https://doi.org/10.1038/s41746-019-0157-2

  11. Brindle PM, McConnachie A, Upton MN, Hart CL, Davey Smith G, Watt GC. The Accuracy of the Framingham Risk-Score in Different Socioeconomic Groups: A Prospective Study. Br J Gen Pract. Nov 2005;55(520):838–845.

  12. Topol E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Basic Books; 2019.

  13. Landi H. Healthcare Data Breaches Hit All-time High in 2021, Impacting 45M People. Fierce Healthcare. Accessed October 17, 2022. www.fiercehealthcare.com/health-tech/healthcare-data-breaches-hit-all-time-high-2021-impacting-45m-people

  14. National Institutes of Health. NIH Awards Nearly $75M to Catalyze Data Science Research in Africa. News Release. Accessed October 17, 2022. www.nih.gov/news-events/news-releases/nih-awards-nearly-75m-catalyze-data-science-research-africa

  15. Tabata RC. The Future Challenges of Big Data in Healthcare. Forbes. June 18, 2021. Accessed October 17, 2022. www.forbes.com/sites/forbestechcouncil/2021/06/18/the-future-challenges-of-big-data-in-healthcare/?sh=67326f4346b2

  16. Shortreed SM, Cook AJ, Coley RY, Bobb JF, Nelson JC. Challenges and Opportunities for Using Big Healthcare Data to Advance Medical Science and Public Health. Am J Epidemiol. May 1 2019;188(5):851–861. https://doi.org/10.1093/aje/kwy292

Kenneth G. Poole, Jr., MD, MBA, FACP, CPE

Kenneth G. Poole, Jr., MD, MBA, FACP, CPE, is the chief medical officer for clinician and provider experience at UnitedHealth Group in Minnetonka, Minnesota. He previously was the medical director of patient experience at the Mayo Clinic in Arizona and served on the Mayo Clinic Alix School of Medicine Admissions Committee.


David Upjohn, MS

David Upjohn, MS, is the operations manager for the Department of Otolaryngology – Head & Neck Surgery at Mayo Clinic in Arizona and an instructor in healthcare systems engineering in the Mayo Clinic College of Medicine and Science.


Eric Pool, EdD, PMP, ITIL

Eric Pool, EdD, PMP, ITIL, is a lead analyst at Mayo Clinic and assistant professor of health care administration at the Mayo Clinic College of Medicine & Science. He is also an instructor at Harvard University and UC Berkeley.


James S. Hernandez, MD, FCAP

James S. Hernandez, MD, FCAP, is an emeritus associate professor of laboratory medicine and pathology and the past medical director of the laboratories, Mayo Clinic in Arizona.
