Centralized architectures are not the answer for AI in medicine
While it’s understandable to want to address healthcare problems such as inequality and lack of access with old-school methods like training more doctors, there is a more powerful treatment for what ails today’s healthcare systems: artificial intelligence (AI) applications.
Of course, not all AI is created equal. Consumer AI applications have made dramatic progress over the last decade, thanks to deep learning and the availability of large quantities of diverse data. However, the centralized architecture that consumer applications rely on will not work for AI applications in medicine.
As we discussed in the previous episode, even in adult medicine most of the data available for training AI comes from just 3 states and 2 countries. It’s not that the data isn’t out there. Take pediatric echocardiograms, for example. The world’s 500 children’s hospitals, located in at least 25 US states and at least 40 countries, produce a staggering 6,000,000 TB of pediatric echo data per year. To put that in context, 6M TB is 100,000x the amount of all the data currently available in the NIH’s centralized Imaging Data Commons. Imagine the accuracy that could be achieved in diagnosing every pediatric cardiology condition, both locally and globally, if this pediatric echo data were used to train AI applications.
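To put these figures in perspective, here is a back-of-the-envelope calculation that uses only the numbers quoted above. It assumes, purely for simplicity, that the annual volume is split evenly across the 500 hospitals, and the Imaging Data Commons size it prints is only what the stated 100,000x ratio implies, not a measured figure.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the text.
TOTAL_ECHO_TB_PER_YEAR = 6_000_000  # pediatric echo data produced per year
CHILDRENS_HOSPITALS = 500           # children's hospitals worldwide
RATIO_TO_IDC = 100_000              # "100,000x" the NIH Imaging Data Commons


def per_hospital_tb_per_year(total_tb: float, hospitals: int) -> float:
    """Average annual volume per hospital, assuming an even split (a simplification)."""
    return total_tb / hospitals


def implied_idc_tb(total_tb: float, ratio: float) -> float:
    """IDC size implied by the stated ratio; derived, not measured."""
    return total_tb / ratio


per_hospital = per_hospital_tb_per_year(TOTAL_ECHO_TB_PER_YEAR, CHILDRENS_HOSPITALS)
print(f"Average per hospital: {per_hospital:,.0f} TB/year (~{per_hospital / 365:,.1f} TB/day)")
print(f"Implied IDC size: {implied_idc_tb(TOTAL_ECHO_TB_PER_YEAR, RATIO_TO_IDC):,.0f} TB")
```

Under the even-split assumption, the average hospital would generate about 12,000 TB of echo data a year, roughly 33 TB a day.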
If you’re wondering why we couldn’t simply repeat ImageNet, the centralized database that makes more than 14 million images available for training AI algorithms, and aggregate our pediatric echo data (along with all our other pediatric imaging data) at some central site, it’s a good question. Let’s consider the possibility by first choosing a location, say a data center in Ireland, and then thinking through the realities of such a centralized approach. As you’ll see, we quickly run into at least five significant challenges.
1. Centralized architectures are not network preserving.
Centralized architectures demand higher bandwidth and more expensive networks, so one of our first considerations will be network cost. We want global participation, so we’ll need to move cardiac echo data to Ireland from Gertrude’s Children’s Hospital in Kenya and Hospital de Criança e Maternidade (HCM) in Brazil, as well as from Nemours Children’s Health in the US. We’ll also need to account for the fact that some locations have less-than-reliable network connections. And, as anyone familiar with today’s cloud service providers knows, there will be additional network costs for transferring data from our centralized site in Ireland to any other location.
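How long would it take just to move the data? This sketch estimates transfer time from first principles. The 1, 10, and 100 Gbps link speeds are illustrative assumptions, not the actual connectivity of any hospital named above, and the 12,000 TB input is the even-split average derived from the quoted 6,000,000 TB across 500 hospitals.

```python
def transfer_days(terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move `terabytes` over a link with nominal rate `link_gbps`.

    Uses decimal units (1 TB = 10**12 bytes) and assumes the link sustains
    `utilization` (a fraction in (0, 1]) of its nominal rate for the whole transfer.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and utilization must be in (0, 1]")
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / 86_400  # seconds per day


# Hypothetical example: one hospital's average annual echo volume over assumed link speeds.
for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gbps: {transfer_days(12_000, gbps):,.0f} days")
```

Under these assumptions, a single hospital’s average annual output would take about 1,111 days over a fully utilized 1 Gbps link and about 11 days at 100 Gbps, before accounting for unreliable connections, shared bandwidth, or cloud egress fees.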
2. Centralized architectures are not application friendly.
The process of aggregating data in a centralized architecture, repository, or data commons inherently requires that the data be organized. Any time a centralized database, like our hypothetical one in Ireland, is created, it comes with a schema: a designated way to organize the data. Unfortunately, a schema perfectly designed for one application is nearly impossible to use for another. Years of experience in enterprise software have taught us this lesson: locking in a schema that makes data friendly to one application makes it unwelcome (if not downright rejected) in another.
3. Centralized architectures are not real-time.
To understand why real-time access to accurate AI-generated decisions matters, imagine an autonomous car that had to query a central server to decide what to do at every turn or stop sign, or to avoid a pedestrian. The time it would take to send each request to a centralized site in Ireland (or anywhere, for that matter) and wait for the answer would be entirely unrealistic. The same challenge would hold true for any real-time pediatric application.
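The latency argument can be made concrete with a little arithmetic. This is a sketch under stated assumptions: light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), while the 10,000 km network path and the extra 200 ms for queuing, inference, and retries are hypothetical placeholders rather than measurements.

```python
FIBER_KM_PER_S = 200_000  # approximate speed of light in optical fiber (~2/3 of c)


def min_round_trip_s(route_km: float) -> float:
    """Lower bound on round-trip time: propagation delay only, no queuing or compute."""
    return 2 * route_km / FIBER_KM_PER_S


def metres_travelled(speed_kmh: float, seconds: float) -> float:
    """Distance a vehicle covers while waiting `seconds` for an answer."""
    return speed_kmh * 1000 / 3600 * seconds


# Hypothetical: a 10,000 km path to a central site plus an assumed 200 ms of
# queuing, inference, and retries on top of pure propagation delay.
wait = min_round_trip_s(10_000) + 0.200
print(f"Wait: {wait * 1000:.0f} ms -> {metres_travelled(50, wait):.1f} m at 50 km/h")
```

With these assumed numbers, the car waits about 300 ms and travels about 4.2 m before each answer arrives, and physics alone sets a floor that no faster central server can remove.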
4. Centralized architectures are not privacy preserving.
Let’s assume we were to agree to aggregate all the pediatric echo data from around the world in Ireland. How would we preserve privacy for a patient in California whose data is sent to Ireland? How would we control who is able to access that data? And how would we limit which data is shared? Someone in Ireland with no connection to the patient might have access not only to the patient’s pediatric echo data but also to their personally identifying information.
These questions bring to the forefront an important principle of data sharing and privacy known as “purpose limitation”. In a world where we increasingly expect our data to be used only with our permission, by specific people, and for clearly defined purposes, accumulating data in a centralized architecture offers the opposite: data pooled with no stated purpose other than keeping it for the future. Pooling data in this manner, in Ireland or any other country, does nothing to preserve privacy.
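Purpose limitation can be expressed as an access check that refuses any request lacking a consented role and stated purpose. This is a hypothetical sketch: `EchoRecord`, `is_access_allowed`, and the role and purpose names are illustrative, and a real system would also need authentication, auditing, and consent management.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoRecord:
    """Hypothetical record carrying the uses its patient or guardian consented to."""
    record_id: str
    # Each permitted use is a (role, purpose) pair, e.g. ("treating_cardiologist", "diagnosis").
    permitted_uses: frozenset[tuple[str, str]]


def is_access_allowed(record: EchoRecord, role: str, purpose: str) -> bool:
    """Allow access only for a consented (role, purpose) pair; no stated purpose means no access."""
    if not purpose:
        return False
    return (role, purpose) in record.permitted_uses


record = EchoRecord(
    record_id="echo-0001",
    permitted_uses=frozenset({
        ("treating_cardiologist", "diagnosis"),
        ("approved_researcher", "model_training"),
    }),
)
print(is_access_allowed(record, "treating_cardiologist", "diagnosis"))      # True
print(is_access_allowed(record, "approved_researcher", "diagnosis"))        # False
print(is_access_allowed(record, "unaffiliated_analyst", "model_training"))  # False
```

Pairing roles with purposes, rather than checking each separately, keeps a researcher approved for model training from reusing that approval to browse records for some other reason.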
5. Centralized architectures can violate data residency/sovereignty.
Organizations sometimes require that their data be stored in a specific location, such as a particular country or a region within it; this is data residency. Sovereignty takes data control a step further by subjecting data to the laws of the country in which it resides. Put more simply, and with nothing against Ireland, many countries would almost certainly object to having their citizens’ healthcare data subject to Irish law.
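A residency requirement can be checked mechanically before any data moves. In this sketch, `RESIDENCY_POLICY`, the country labels, and `residency_violations` are hypothetical placeholders, not a summary of any country’s actual law; "IE" stands in for the Irish data center in our thought experiment.

```python
from collections.abc import Iterable

# Hypothetical policy: origin country -> countries where its health data may be stored.
# Placeholder entries for illustration only; they do not describe real regulations.
RESIDENCY_POLICY: dict[str, set[str]] = {
    "country_a": {"country_a"},
    "country_b": {"country_b", "IE"},
}


def residency_violations(origins: Iterable[str], storage_country: str,
                         policy: dict[str, set[str]]) -> list[str]:
    """Return origin countries whose data may NOT be stored in `storage_country`.

    Countries absent from the policy are reported as violations (deny by default).
    """
    return sorted(
        origin for origin in set(origins)
        if storage_country not in policy.get(origin, set())
    )


print(residency_violations(["country_a", "country_b", "country_c"], "IE", RESIDENCY_POLICY))
# ['country_a', 'country_c']
```

A single central site has to satisfy every contributing country’s rule at once, so each new participant is another chance for the check to fail.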
While centralized AI architectures have done an impressive job of powering the development of many consumer AI applications to date, there are multiple reasons why this is not the right approach for training and deploying AI applications in children’s medicine. Centralized architectures are simply not network preserving, not application friendly, not real-time, not privacy preserving, and not residency preserving. So what’s the answer? How can these objections be addressed so we can build and deploy accurate AI applications for children’s medicine? What eight attributes would a decentralized architecture need to support?
If you’d like to stay up to date on our progress to the moon, register for the newsletter at www.pediatricmoonshot.com, follow us on LinkedIn, subscribe to the Pediatric Moonshot podcast, listen to the Spotify playlist, and subscribe to the Pediatric Moonshot YouTube channel.
Many thanks for extensive editing by Laura Jana, Pediatrician, Social Entrepreneur & Connector of Dots; Leanne West, Chief Engineer of Pediatric Technology at Georgia Tech. Special thanks to Alberto Tozzi, Head of Predictive and Preventive Medicine Research Unit at Ospedale Pediatrico Bambino Gesù for the translation to Italian.