AI shows tremendous promise in discovering new patterns buried in mountains of data. Yet much of that data remains isolated in silos for technical, ethical, and commercial reasons. A promising AI and machine learning technique called amalgamated learning could help overcome these silos to find new cures for diseases, prevent fraud, and improve industrial equipment. It may also provide a way to construct digital twins from inconsistent forms of data.
In an exclusive interview with VentureBeat at the Imec Future Summits conference, Roel Wuyts detailed how amalgamated learning works and how it compares to related techniques like federated learning and homomorphic encryption. Wuyts manages the ExaScale Life Lab at Imec, a cross-industry scientific research collaboration in Europe, and is a professor at Katholieke Universiteit Leuven in Belgium.
He has been leading a team focused on exploring different approaches to scale AI across various participants to improve semiconductor manufacturing, medical research and other areas.
“We would like to do population-level data-driven analytics to look for novel markers no one has seen before,” Wuyts said. “As we gather more data, it becomes harder for humans to notice patterns buried in this data.” Privacy-enhanced computing at the population level might help.
Privacy-preserving medical research
At one end of the spectrum, techniques like homomorphic encryption let multiple participants collaborate on new AI models with high trust by computing directly on encrypted data. However, the encryption adds enormous computational overhead. Older implementations ran about 10,000 times slower than comparable algorithms operating on data in the clear, and researchers have now gotten this down to roughly 1,000 times slower. Wuyts said it is still not practical for large-scale population research.
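The core idea behind homomorphic encryption is that a server can compute on ciphertexts it cannot read. As a hedged illustration of the underlying property only (not the lattice-based schemes such as CKKS or BFV used in real deployments, and wildly insecure with parameters this small), textbook RSA is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product.

```python
# Toy demonstration of a homomorphic property using insecure textbook RSA.
# A party holding only the public key (e, n) can multiply ciphertexts and
# obtain an encryption of the product, without ever seeing a or b.
p, q = 61, 53
n = p * q                             # public modulus
e = 17                                # public exponent
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent (modular inverse)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n   # computed on ciphertexts only
print(dec(c))               # prints 42, i.e. a * b, recovered by the key holder
```

The roughly 1,000x slowdown Wuyts describes comes from doing every arithmetic step of a real model through operations like these, on far larger numbers.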
At the other end of the spectrum, federated learning techniques allow different participants to update a machine learning model locally without sending sensitive data anywhere. Only updates to the model are shared. This approach is far more efficient than homomorphic encryption. His team has explored ways to predict atrial fibrillation by applying federated learning across multiple hospitals.
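The pattern described above can be sketched in a few lines. This is a minimal illustration of federated averaging assuming a simple linear model and a server that averages returned weights; it is not Imec's actual pipeline, and the data here is synthetic.

```python
# Minimal federated averaging sketch: each "site" (e.g., a hospital) trains
# on its own private data; only model weights leave the site.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=20):
    """Gradient descent on local data; raw X and y never leave this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_w, sites):
    """Each site trains locally; the server only sees and averages weights."""
    updates = [local_update(global_w, X, y) for X, y in sites]
    return np.mean(updates, axis=0)

# Three sites holding private samples of the same underlying relationship.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, sites)
# w now approximates true_w, learned without pooling any raw data.
```

Real systems add secure aggregation and differential privacy on top, since even weight updates can leak information, but the division of labor is the same: data stays local, parameters travel.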
Atrial fibrillation is an irregular heart rhythm that can lead to blood clots in the heart. The hope is that better medical data and new smartwatches could provide earlier warning signs to help reduce these risks. But hospitals face various ethical and privacy issues in sharing this kind of data at the population level. His team has already seen some promising early results in these collaborations. Down the road, he predicts that we could all benefit from the data collected from our neighbor’s smartwatch.
Federated learning limits
However, federated learning has some challenges. For starters, all the hospitals or healthcare companies involved must use the same model and techniques. This could be a problem if a hospital hopes to commercialize a new AI model.
“In some cases, they are unwilling to share the data or models they develop because it could provide a competitive advantage,” Wuyts said.
Another issue is that federated learning requires all the data to be normalized. This is not a big problem in areas like heart research, where there is consensus about how and what to measure. But it becomes more of a problem as teams bring in data from new sources collected and annotated under different processes. Wuyts noted that even in areas like genomics research, each hospital might differ in how it collects data, which can affect study results.
Another issue is how doctors code different diseases. In some of their research, for example, the team found regional differences in how doctors in different healthcare systems recorded the same ailments. This may stem from differences in how reimbursement works for different diseases that are treated using similar approaches.
His team has recently begun experimenting with amalgamated learning for large-scale cancer research. Like federated learning, it is much faster than homomorphic encryption, and it does not require participants to share data. Another benefit is that it supports multiple models, so participants do not have to share the intellectual property baked into those models. This could encourage cross-industry medical research among competitors that improves outcomes for everyone while also protecting commercial interests.
The technique seems to work even when each participant encodes data slightly differently. The key is that the technique takes advantage of differences detected within each local data set. As a result, everyone could learn from the experience of others, even when their own hospital data collection procedures are different, as long as these procedures are internally consistent. “We think we won’t need to normalize data across parties as much to train a local model,” Wuyts said.
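One way to picture why internal consistency can be enough: if each site records the same quantity on its own scale but does so consistently, standardizing within each site cancels the encoding differences before anything is shared. This is only an illustrative sketch of that general idea, not the actual amalgamated learning algorithm, and the temperature data is made up.

```python
# Illustrative sketch: two sites record the same patients' readings in
# different units. Z-scoring within each site (no raw data shared) yields
# identical features, because z-scores are invariant to linear re-encoding.
import numpy as np

def standardize_locally(values):
    """Each site applies this to its own data before any model training."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()

site_a = [36.5, 37.0, 38.5, 39.0]     # Celsius
site_b = [97.7, 98.6, 101.3, 102.2]   # the same readings in Fahrenheit

za = standardize_locally(site_a)
zb = standardize_locally(site_b)
print(np.allclose(za, zb))  # True: the unit difference cancels out
```

Differences that are not a simple re-scaling (different instruments, different coding conventions) are harder, which is where techniques like amalgamated learning aim to go beyond plain normalization.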
One concern is that amalgamated learning makes it harder to tease out bias or determine how a model reached a particular conclusion compared with traditional approaches. Consequently, the team is focusing on explainable AI techniques that make it easier to identify and audit the different factors that can affect results.
“You need to build a whole stack of tools to probe and log what is happening, so people can have a look,” Wuyts said. “They are focusing on more explainable models so that if something goes wrong, people can investigate and pinpoint what went wrong.”
Amalgamated learning could also help customize digital twins of individuals, even when their personal set points for temperature or other vital signs differ slightly. For example, some individuals naturally run warmer than others. Monitoring the changes in each individual matters more than comparing against global set points across the population.
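A hedged sketch of what monitoring against a personal baseline, rather than a population set point, might look like; the function name, threshold, and readings are all illustrative.

```python
# Flag a new reading as anomalous relative to this individual's own history,
# not relative to a population-wide "normal" value.
import statistics

def personal_anomaly(readings, new_value, threshold=2.0):
    """True if new_value is more than `threshold` standard deviations
    away from this individual's historical mean."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)
    return abs(new_value - mean) > threshold * sd

# Person A runs naturally warm: 37.8 C would alarm a population threshold,
# but is ordinary against their own baseline.
history_a = [37.6, 37.7, 37.8, 37.9, 37.8, 37.7]
print(personal_anomaly(history_a, 37.8))  # within their personal baseline
print(personal_anomaly(history_a, 38.9))  # a genuine change for this person
```

This captures the change signal Wuyts describes: the deviation from an individual's own set point carries the information, not the raw value.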
“If we can capture the right signal, that is more interesting than displaying the raw value,” Wuyts said.