Artificial intelligence, machine learning, and large language models typically use de-identified data for development. Although there are many de-identified and anonymized datasets available that confer some knowledge to models, this type of data preparation has been shown to strip out essential information, introduce biases, and result in a homogenized data set that lacks the precision required for personalized treatment. Utilizing Microsoft Azure confidential computing, BeeKeeperAI® is enabling privacy-enhancing computing on real-world, personally identifiable information (PII) and protected health information (PHI) across the lifecycle of model development to deployment, thereby significantly shortening the time from bench to bedside.
BeeKeeperAI’s patent-protected EscrowAI™ integrates and automates the use of Azure services, including confidential computing, into a collaboration workflow for AI/ML model development that protects PII/PHI and model intellectual property at rest, in transit, and in use. This disruptive protection enables improved models, an acceleration of the current development timeline, and optimized security.
The healthcare industry generates thirty percent of the world’s data volume[1], and an average-sized hospital produces 137 terabytes of data per day[2]. The current paradigm of gaining access to data for model development is based on sharing and seeing the data, which removes the data from the data stewards' direct control. When access to PII and PHI is possible at all, it typically occurs under the auspices of a clinical trial. Trial initiation can take 12-18 months for contracting and another 3-6 months to secure the Institutional Review Board's permission to gather and use the data. Only then can the process of subject identification, recruiting, consenting, and data collection begin. Combined, it can take 18-36 months and considerable cost to secure the appropriate approvals to use the PII/PHI.
The approval process exists to minimize the risk of patient privacy breaches. However, during 2022 there was a 233 percent[3] increase in US data breaches, which cost healthcare organizations an average of $10.1M compared with $4.45M in other industries.[4]
AI technologies are predicted to deliver transformative improvement to the Health and Life Sciences (HLS) sector and generate $200-360B[5] in annual US healthcare savings. Unfortunately, most models remain in the early development phase, trained only on de-identified or synthetic data. As developers move into the pre-commercial phase, they face an almost insurmountable challenge in gaining access to the data the model will face in the treatment environment (the real-world, PII/PHI data).
BeeKeeperAI is a spin-out of the University of California, San Francisco’s (UCSF) Center for Digital Health Innovation. While at UCSF the founders worked on industry-sponsored research projects to validate and train AI models to improve outcomes. It was during this work that they realized how difficult, if not impossible, it was to gain access to PII/PHI in an era of ever-increasing cyber-attacks and heightened requirements for preserving data privacy. They realized that the paradigm had to change to enable efficient and secure direct use of PII/PHI.
Seldom secured
Within the non-confidential computing environment, it was difficult or impossible to:
- Gain access to PII/PHI to improve the sensitivity and specificity of clinical models.
- Protect the intellectual property of models as they computed within data steward organizations.
- Protect patient privacy within de-identified data due to the risk of re-identification.
- Complete the approvals and contracting process in less than a year.
- Enable the data stewards to retain control of data once it was shared with a third party.
In addition, certain data types could not be de-identified, resulting in their exclusion. These included PHI such as genomic data and PII such as social determinants of health. For certain diseases and conditions, this type of data was essential for specific models.
These challenges meant that models took far too long to validate on real-world PII/PHI data, if they were validated at all, and could not be calibrated to operate consistently in an actual clinical environment. As a result, the founders began to search for a better way to accelerate model development with a solution that would:
- Protect patient privacy across all data types.
- Enable data stewards to retain control of the data, even during computing.
- Protect the intellectual property of the model.
- Provide protection of the patient data and intellectual property at rest, in transit, and during computing.
Around that same time, Microsoft Azure confidential computing was being deployed into the fintech and government security sectors. The BeeKeeperAI founders hypothesized that confidential computing was a key component in addressing one of the greatest barriers to healthcare model development.
“We chose Azure because of its privacy and security for data and algorithms as well as the infrastructure needed to compute at scale, in the cloud, across multiple locations,” says Dr. Michael Blum, Co-Founder and CEO of BeeKeeperAI.
As various cloud environments were analyzed, the founders realized the data steward’s secure cloud environment was a critically important variable. It was imperative that BeeKeeperAI select a cloud partner with the most mature and secure confidential computing stack. It ultimately chose Microsoft Azure.
“The opportunity for AI to enable the delivery of better healthcare outcomes continues to expand exponentially, but developers are limited by access to real world clinical data to train and to deploy their algorithms. We are pleased to partner with BeeKeeperAI to help the healthcare industry develop the understanding and expertise it needs to leverage confidential computing within healthcare innovation,” says John Doyle, Global Chief Technology Officer, Healthcare & Life Sciences at Microsoft.
Thanks to a grant from Microsoft, the founders successfully demonstrated the use of confidential computing in 2020: it protected a third-party model as the model traveled to UCSF, where it computed sightlessly on data within UCSF’s secure environment. Armed with the successful proof of concept, the founders spun the intellectual property out of UCSF and established BeeKeeperAI, Inc. in February 2022.
“Through our collaboration with Microsoft and our platform’s integration of Microsoft Azure services, we provide an automated zero trust software infrastructure that addresses the key risks associated with model development, including the protection of patient privacy and model intellectual property, enabling the organizations responsible for protecting patient privacy to maintain control of the data and enhance its security while they participate in innovation,” Blum said.
Safe and secure
EscrowAI enables computing on PII/PHI within the data steward’s Azure tenant. EscrowAI has integrated and automated the use of Azure’s confidential computing virtual machines, attestation service, Blob storage, Network Address Translation (NAT) gateway for confidential compute nodes, and an isolated Azure Virtual Network (VNET) within its collaboration workflow. EscrowAI runs on Azure, taking advantage of its inherent security.
Without the above capabilities, BeeKeeperAI would have faced a much longer product development cycle due to the need to build the security infrastructure required for its platform. In addition, the use of Azure services enables data stewards to activate EscrowAI in their tenant within 6-12 hours. The software is maintained by BeeKeeperAI and requires no maintenance from the data steward.
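The role an attestation service plays in confidential computing can be illustrated in general terms. The sketch below is a simplified simulation of attestation-gated key release, a common confidential computing pattern, and is not EscrowAI's implementation or the Azure API. The names `measure`, `AttestationReport`, and `DataSteward` are hypothetical, and a plain SHA-256 digest stands in for the hardware-signed measurement that a real attestation service would verify.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


def measure(code: bytes) -> str:
    """Return a SHA-256 digest standing in for an enclave's code measurement."""
    return hashlib.sha256(code).hexdigest()


@dataclass
class AttestationReport:
    # Hypothetical report. Real attestation tokens are signed by the hardware
    # and checked by an attestation service; this sketch omits the signature.
    measurement: str


class DataSteward:
    """Holds the data key and releases it only to an approved workload."""

    def __init__(self, approved_measurement: str) -> None:
        self._approved = approved_measurement
        self._data_key = secrets.token_bytes(32)  # random 256-bit key

    def release_key(self, report: AttestationReport) -> Optional[bytes]:
        # Constant-time comparison of the reported and approved measurements.
        if hmac.compare_digest(report.measurement, self._approved):
            return self._data_key
        return None


# The steward approves one specific model build (assumed example bytes).
approved_code = b"model-v1: approved by data steward"
steward = DataSteward(approved_measurement=measure(approved_code))

# A workload running the approved code gets the key; a modified one does not.
good = steward.release_key(AttestationReport(measure(approved_code)))
bad = steward.release_key(AttestationReport(measure(b"tampered model")))

print("approved workload received key:", good is not None)
print("tampered workload received key:", bad is not None)
```

In this simulation the data key never leaves the steward unless the reported measurement matches the build the steward approved, which mirrors the idea of data stewards retaining control of data even during computing.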
“The impact of what we are enabling aligns with the goals of the ever-increasing number of developers who are creating new ways to help to detect and support the treatment of diseases,” Doyle said.
Personalized treatment for everybody
EscrowAI accelerates the pace of innovation by enabling:
- Model developers to preserve their intellectual property and gain sightless computing access to privacy-protected information across the model lifecycle.
- Data stewards to maintain data sovereignty, protect patient privacy, and secure ethical commercial arrangements to compute on real-world PII/PHI.
As a result of the added security and workflow features provided by Azure, EscrowAI accelerates the time to market for model developers as they strive to deliver transformative improvement to the Health and Life Sciences (HLS) sector.
Benefits
While AI technologies are predicted to deliver transformative improvement to the Health and Life Sciences (HLS) sector and generate $200-360B[6] in annual US healthcare savings, this will not occur if AI technologies and other types of models are not able to compute on PII/PHI in privacy-enhancing environments. Together, Microsoft Azure and BeeKeeperAI will accelerate the realization of innovation that enables personalized care on a scale that improves human life around the globe.
References:
[1] https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf