AI shows promise when it comes to discovering new patterns hidden in mountains of data. However, some data remains isolated across different silos for technical, ethical, and commercial reasons. A promising new AI and machine learning technique called amalgamated learning could help break down these silos to find new cures for diseases, prevent fraud, and improve industrial facilities. It can also provide a way to create digital twins from inconsistent forms of data.
At the Imec Future Summits conference, in an exclusive interview with VentureBeat, Roel Wuyts explained how amalgamated learning works and how it compares to related techniques like federated learning and homomorphic encryption. Wuyts is manager of the ExaScale Life Lab at IMEC, a cross-industry scientific collaboration in Europe, and professor at the Katholieke Universiteit Leuven in Belgium.
He leads a team focused on exploring different approaches to scaling AI to different participants to improve semiconductor manufacturing, medical research, and other fields.
“We would like to do population-level data-driven analysis to look for novel markers that nobody has seen before,” Wuyts said. “The more data we collect, the more difficult it becomes for people to see hidden patterns in that data.” Population-level privacy-enhanced computing could help.
Protecting privacy in medical research
At one end of the spectrum, computing techniques such as homomorphic encryption allow multiple participants to collaborate on new AI models using data that stays encrypted, with high confidence that it remains private. However, homomorphic encryption adds a lot of computational overhead. Older implementations ran about 10,000 times slower than comparable plaintext algorithms, and researchers now put the figure at about 1,000 times slower. Wuyts said it’s still not practical for large-scale population research.
At the other end of the spectrum, federated learning techniques allow different participants to update a machine learning model locally without sending sensitive data to others. Only the updates to the model are shared. This approach is far more efficient than homomorphic encryption. Wuyts’s team has investigated ways to predict atrial fibrillation using federated learning across multiple hospitals.
Atrial fibrillation is an irregular heart rhythm that can lead to blood clots in the heart. The hope is that better medical data and new smartwatches could provide better warning signals to reduce these risks. However, hospitals face various ethical and privacy issues when sharing this type of population-level data. The team has already achieved some promising early results from these collaborations. Looking further ahead, Wuyts predicts that we could all benefit from the data collected by our neighbors’ smartwatches.
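The update-sharing loop behind federated learning can be sketched in a few lines. This is a minimal illustration of federated averaging on a toy linear model, not the system Wuyts’s team uses. The function names, the two "hospital" data sets, and the learning-rate and round settings are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, target) pairs


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train a linear model on one site's private data and return new weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)  # mean squared error gradient
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(sizes)
    return [
        sum(w[j] * n for w, n in zip(updates, sizes)) / total
        for j in range(len(updates[0]))
    ]


def train(sites: List[Dataset], dim: int, rounds: int = 50) -> Weights:
    """Run federated rounds; only weights cross site boundaries, never records."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Hypothetical data: two "hospitals" whose records both follow y = 2*x0 + 1*x1.
hospital_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
hospital_b = [([2.0, 1.0], 5.0), ([1.0, 2.0], 4.0)]

weights = train([hospital_a, hospital_b], dim=2)
print([round(w, 3) for w in weights])
```

Note that every site runs the same model and expects features in the same format, which is the constraint discussed in the next section.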
Limits of federated learning
However, federated learning has some challenges. First, every hospital or healthcare company involved must use the same model and techniques. This could be an issue if a hospital wants to commercialize a new AI model.
“In some cases, they are reluctant to share the data or models they develop because it could provide a competitive advantage,” Wuyts said.
Another problem is that all data must be normalized. This is not a big problem in areas like heart research, where there is a consensus on how and what to measure. However, it can pose a bigger problem when teams try to bring in more data from new sources that have different processes for collecting and annotating data. Wuyts noted that even in areas like genome research, each hospital could differ in how it collects the data, affecting study results.
A related problem is how doctors code different diseases. For example, in some of the team’s research, they found regional differences in how physicians in different health systems would assess the same health conditions. This may result from differences in reimbursement for diseases that are treated with similar approaches.
Amalgamated learning
Wuyts’s team recently started experimenting with amalgamated learning for large-scale cancer research. Like federated learning, it is much faster than homomorphic encryption and does not require participants to exchange data. Another benefit is that multiple models are supported, so participants don’t have to share the intellectual property baked into them. This could encourage cross-industry medical research by competitors that improves outcomes for all while protecting commercial interests.
The technique seems to work even if each participant encodes data slightly differently. The key is that the technique takes advantage of differences detected within each local data set. This would allow everyone to learn from the experiences of others, even if their own hospital data collection procedures are different, as long as those procedures are internally consistent. “We believe that we don’t need to normalize the data between parties as much to train a local model,” Wuyts said.
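The article does not describe the internals of amalgamated learning, so the following is not that algorithm. It is only a small sketch of why internally consistent local procedures can be enough: if each site expresses readings relative to its own data, a difference in encoding between sites drops out. The function name and the temperature readings are hypothetical, and the unit mismatch stands in for any consistent difference in how a site records data.

```python
from statistics import mean, pstdev
from typing import List


def within_site_zscores(values: List[float]) -> List[float]:
    """Express each reading as its distance from the site's own mean,
    measured in units of the site's own standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no variation, so no signal
    return [(v - mu) / sigma for v in values]


# Hypothetical readings: one site records Celsius, the other Fahrenheit.
site_celsius = [36.5, 37.0, 36.8, 39.2, 36.9]
site_fahrenheit = [c * 9 / 5 + 32 for c in site_celsius]

z_c = within_site_zscores(site_celsius)
z_f = within_site_zscores(site_fahrenheit)

# The raw values disagree, but the within-site signal is the same:
# the fourth reading stands out at both sites.
for a, b in zip(z_c, z_f):
    print(round(a, 3), round(b, 3))
```

This only covers differences that are a consistent shift or rescaling within a site. Differences in what is measured, or in how diseases are coded, would need more than this.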
One concern is that amalgamated learning makes it more difficult to dispel bias or figure out how a model arrived at a particular conclusion compared to traditional approaches. Consequently, the team is focusing on more explainable AI techniques that will allow them to identify and examine the various factors that can affect the results.
“You have to build a whole stack of tools to investigate and log what’s happening so people can take a look,” Wuyts said. “They’re focusing on more explainable models so that when things go wrong, people can investigate and pinpoint what went wrong.”
Amalgamated learning will also help match digital twins of individuals, even if their local set points for things like temperature or other vital signs are slightly different. For example, some individuals are more prone to overheating than others, so it is more important to monitor changes in each individual than to compare against global set points for the population.
“If we can capture the right signal, that’s more interesting than showing the raw value,” Wuyts said.
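The difference between a raw value and a signal can be illustrated with a short sketch that compares a single population-wide limit against each person’s own baseline. This is an illustration only, not a method taken from the interview. The threshold of three standard deviations, the `GLOBAL_LIMIT` value, and the two temperature histories are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

GLOBAL_LIMIT = 38.0  # hypothetical population-wide set point, degrees Celsius


def exceeds_global_limit(latest: float) -> bool:
    """Raw-value check: the same limit for everyone."""
    return latest > GLOBAL_LIMIT


def deviates_from_baseline(history: List[float], latest: float,
                           k: float = 3.0) -> bool:
    """Signal check: flag a reading more than k standard deviations
    away from this individual's own historical mean."""
    baseline = mean(history)
    spread = max(pstdev(history), 1e-9)  # guard against a flat history
    return abs(latest - baseline) > k * spread


# Hypothetical individuals: one usually runs cool, one usually runs warm.
runs_cool = [36.0, 36.1, 35.9, 36.0, 36.1]
runs_warm = [37.3, 37.4, 37.2, 37.3, 37.4]
latest = 37.4  # the same new reading for both people

print(exceeds_global_limit(latest))               # same answer for both people
print(deviates_from_baseline(runs_cool, latest))  # checked against own baseline
print(deviates_from_baseline(runs_warm, latest))
```

The same reading is below the global limit for both people, yet it is a large change for the person who usually runs cool and an ordinary value for the person who usually runs warm.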