Gemini: Distributed AI Lab

Timothy Chou
6 min read · Mar 16, 2023


We have the ability to transform healthcare for children who are not geographically or socially lucky using artificial intelligence applications. As we’ve discussed, traditional centralized architectures will not work for training and deploying accurate, real-time, privacy-preserving healthcare applications. That’s where federated learning comes in: it was developed for consumer AI applications on highly distributed architectures (millions of smartphones) to conserve network bandwidth and preserve privacy.

This effort to build accurate, real-time, privacy-preserving AI applications for children’s medicine will require access to data from all 1,000,000 healthcare machines in all 500 children’s hospitals in the world: a Pediatric Moonshot. To make progress towards reaching the “moon,” we are building Gemini, a distributed AI lab. Much like a pharmaceutical research lab, Gemini provides the basic tools and infrastructure for developing accurate, real-time, privacy-preserving AI applications in cardiology, orthopedics, radiology, and emergency medicine.

There is a wide variety of research questions. Can the software developed for consumer AI work in pediatric AI? What strategy should we use for learning? It is impossible to develop AI methodologies in the absence of data, and a Distributed AI Lab is how we intend to overcome that challenge.

Our Phase 1 goal is to implement a lab in 32 sites, making all real-time data from imaging machines, as well as offline data from PACS and EMRs, available to authorized applications. Furthermore, we intend to fund 4–8 AI applications and take them from the research bench to the bedside. The lab will have five components.

1. A Distributed AI Infrastructure Cloud Service

As with the original moonshot, we too need a new rocket. BevelCloud has engineered a distributed, in-the-building edge cloud service. Initially, there are five component services: edge compute, edge storage, edge network, edge data, and edge application services. These services were engineered with 37 security-by-design features, including fine-grained data sharing mechanisms. The edge servers also provide image sanitization through software partner Glendor. Much like Apple has created a way to develop and deploy consumer applications, BevelCloud is doing the same, first with pediatric cardiology and then with orthopedic, radiology, cancer, and neonatology applications.

This BevelCloud distributed AI infrastructure offers solutions to some of the challenges that have previously plagued consumer federated learning applications. First, BevelCloud edge servers have continuous high-bandwidth communication with the healthcare machines within the edge zone, as well as secure network communications outside the edge zones. Second, the edge servers are always powered on. This matters because a typical ultrasound machine’s hours of use leave at least 16 hours a day, 7 days a week, of compute power that can be dedicated to federated learning. Finally, because of the edge data services architecture, every edge cloud application accesses identically formatted data whether the edge servers are in Vatican City, Orange County, or São Paulo.

2. 2000 Servers, 2000 Twinned Imaging Machines

The second component is the deployment of the distributed AI infrastructure inside the buildings of 32 children’s hospitals on three continents. Authorized applications will have real-time access to non-compressed images from 2000+ imaging machines through 2000+ distributed cloud servers.

3. Large, Continuous Diverse Data Sharing

Twinning 2000 imaging machines will give AI researchers access to real-time, raw (non-compressed) data. Because the distributed servers are twinned directly with the machines, applications will not suffer from the current challenges of pulling compressed data from a variety of picture archiving and communication system (PACS) applications. During the first year of operation, an AI application can be trained on more than 100,000 terabytes of diverse ultrasound, CT, MRI, and X-ray data.
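To get a feel for what those round numbers imply per machine, here is a back-of-envelope sketch. It assumes the two figures quoted above are lower bounds, that data is spread evenly across machines and days, and that terabytes are decimal; the function name is purely illustrative.

```python
# Back-of-envelope check using the article's round numbers (both are lower bounds).
TOTAL_TB_PER_YEAR = 100_000   # "100,000 terabytes" in the first year of operation
IMAGING_MACHINES = 2_000      # "2000 imaging machines"
DAYS_PER_YEAR = 365           # assumption: data arrives evenly, every day
GB_PER_TB = 1_000             # assumption: decimal units


def per_machine_volume(total_tb, machines, days=DAYS_PER_YEAR):
    """Return (TB per machine per year, GB per machine per day)."""
    tb_per_machine_year = total_tb / machines
    gb_per_machine_day = tb_per_machine_year * GB_PER_TB / days
    return tb_per_machine_year, gb_per_machine_day


tb_year, gb_day = per_machine_volume(TOTAL_TB_PER_YEAR, IMAGING_MACHINES)
print(f"{tb_year:.0f} TB per machine per year, "
      f"about {gb_day:.0f} GB per machine per day")
```

Under these assumptions each twinned machine would contribute roughly 50 TB a year, which is one reason to train where the data lives rather than move it to a central site.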

4. Distributed AI Application

Depending on which specialization we begin with, the objective is to fund the development and deployment of AI applications in cardiology, radiology, oncology, orthopedics, or emergency medicine. These applications will be chosen from the many projects around the world indexed at the AI Application Commons. All of them will follow the three steps of the Chang Method described in the follow-on post.

5. Distributed AI Governance Framework

Finally, we have established a multi-site, multi-country governance framework that covers the agreements for both the BevelCloud distributed AI infrastructure and the AI applications. These include:

MSA = Master Services Agreement
ToU = Terms of Use
BAA (US) = Business Associate Agreement (US)
DUA = Data Use Agreement (US)
DPA = Data Processing Agreement (EU only)
DTA = Data Transfer Agreement (EU only)
SA = Security Addendum (attached to the DPA + DTA)

It then remains to answer the many questions, which cannot be answered without access to data.

Starting with the most basic: will the software developed for consumer AI be applicable to AI in medicine? While most work in federated learning for healthcare has occurred in the cross-silo model, Gemini will operate in the more familiar cross-device model. So how well will these software stacks work in a new domain? There are many to consider.

· Acuratio is an enterprise platform with solutions for horizontally and vertically partitioned data.

· Bitfount was founded by engineers who developed Apple Siri.

· DynamoFL simplifies model training across privacy-critical datasets using Federated Learning and Differential Privacy.

· FedML provides an open-source community, as well as an enterprise platform for open and collaborative AI.

· NVIDIA FLARE (Federated Learning Application Runtime Environment) is an open-source SDK from NVIDIA for building and running federated learning applications.

· Flower is an open-source research platform for training models in a federated manner.

· HPE Swarm Learning focuses on decentralizing the aggregation step.

· OpenFL is a Python 3 library for community-supported projects, originally developed by Intel Labs and the Intel Internet of Things Group.

· NimbleEdge focuses on hyper-personalized machine learning on mobile edge.

· PySyft is an open-source library developed for secure and private federated learning.

· TensorFlow Federated (TFF) is an open-source framework developed by Google for implementing federated learning.

Beyond the question of which federated learning software to use lie many other important questions that computer scientists actively working in this area will be able to address, including:

· What are the implications of relaxing the traditional constraints of consumer federated learning on federated learning for medicine? Does relaxing the constraints change any of the features provided by the federated learning software?

· Does one aggregation strategy work better than others? Should you aggregate neural network weights within a zone before aggregating globally?

· Should we consider split learning (learning on just half of the neural network model) before sending results to the aggregation server, as proposed by the MIT team?

· When should a model change be declared as the production version, given the potential for continuous learning?
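To make the aggregation question above concrete, here is a minimal sketch in plain Python. It assumes the standard sample-weighted federated averaging rule (FedAvg), with short lists of floats standing in for neural network weights. The function names, zone names, and sample counts are hypothetical and are not taken from any of the software stacks listed earlier.

```python
from typing import Dict, List, Tuple

Weights = List[float]
Update = Tuple[Weights, int]  # (model weights, number of local training samples)


def fedavg(updates: List[Update]) -> Update:
    """Sample-weighted average of weight vectors (the FedAvg rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_fedavg(zones: Dict[str, List[Update]]) -> Update:
    """Aggregate inside each zone first, then aggregate the zone models."""
    zone_models = [fedavg(updates) for updates in zones.values()]
    return fedavg(zone_models)


def equal_zone_average(zones: Dict[str, List[Update]]) -> Weights:
    """Alternative strategy: every zone counts equally, whatever its data size."""
    zone_models = [fedavg(updates)[0] for updates in zones.values()]
    dim = len(zone_models[0])
    return [sum(m[i] for m in zone_models) / len(zone_models) for i in range(dim)]


# Hypothetical example: two edge zones, three edge servers in total.
zones = {
    "zone_a": [([1.0, 2.0], 100), ([3.0, 4.0], 300)],
    "zone_b": [([5.0, 6.0], 600)],
}

flat, _ = fedavg([u for updates in zones.values() for u in updates])
two_level, _ = hierarchical_fedavg(zones)
equal_zones = equal_zone_average(zones)

print("flat FedAvg:       ", flat)
print("zone then global:  ", two_level)
print("equal-weight zones:", equal_zones)
```

In this sketch, aggregating within a zone and then globally gives the same model as flat averaging, as long as each zone carries its sample count forward. Weighting zones equally gives a different model that favors smaller zones. Which behavior is preferable for pediatric data remains the open research question.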

These are but a small fraction of the important research questions to be answered.

Finally, there are many isolated AI research projects around the world today. Most of this research ends with a published paper, but little else.

Read on to understand how we can move AI research from the lab bench to the bedside.

If you’d like to stay up to date on our progress to the moon, register for the newsletter at www.pediatricmoonshot.com, follow us on LinkedIn, subscribe to the Pediatric Moonshot podcast, listen to the Spotify playlist, and subscribe to the Pediatric Moonshot YouTube channel.

Many thanks for extensive editing by Laura Jana, Pediatrician, Social Entrepreneur & Connector of Dots, and Leanne West, Chief Engineer of Pediatric Technology at Georgia Tech. Special thanks to Alberto Tozzi, Head of the Predictive and Preventive Medicine Research Unit at Ospedale Pediatrico Bambino Gesù, for the translation to Italian.

Timothy Chou

www.linkedin.com/in/timothychou, Lecturer @Stanford, Board Member @Teradata @Ooomnitza, Chairman @AlchemistAcc