kMoL, a machine learning library for AI drug discovery with federated learning capabilities

TOKYO--(BUSINESS WIRE)--Elix, Inc., an AI-based drug discovery company with a mission to “Rethink Drug Discovery” (CEO: Shinya Yuki; Headquarters: Tokyo, Japan; hereinafter “Elix”), has developed kMoL, a machine learning library for AI drug discovery with federated learning functionality. The library was designed and developed in consultation with Lecturer Ryosuke Kojima and Professor Yasushi Okuno of the Kyoto University Graduate School of Medicine. It was released as open source on October 20, 2021.
kMoL is a library for building machine learning models for the drug discovery and life science fields. It builds on knowledge obtained from kGCN [1], an open-source AI library for drug discovery and life sciences developed by Lecturer Ryosuke Kojima and Professor Yasushi Okuno of the Kyoto University Graduate School of Medicine. It also includes graph neural networks capable of handling graph structures that are widely used in the life sciences, such as molecular structures and chemical pathways.
One of the most important features of kMoL is that it is the only publicly available library for AI drug discovery with a federated learning feature. Because federated learning allows models to be trained on large amounts of data held by different organizations without that data being shared, it has recently gained attention in the pharmaceutical industry as a way of handling confidential information such as compound data. Elix’s federated learning library, Elix Mila, is part of kMoL.
Because kMoL supports advanced models with a wide range of applications and can safely learn from large amounts of data, it is expected to be widely adopted by pharmaceutical and chemical companies.
Introducing kMoL: A Machine Learning Library for AI Drug Discovery with Federated Learning
Name: kMoL (Machine Learning Library for Molecular Systems)
Abstract: A machine learning library for AI drug discovery with federated learning capabilities. It has features like support for federated learning and graph-based predictive models.
Release date: October 20, 2021
Open-source URL: https://github.com/elix-tech/kmol
This library was developed by Elix in collaboration with Lecturer Ryosuke Kojima and Professor Yasushi Okuno, under contract from Kyoto University, which executed a research consignment contract with the Japan Agency for Medical Research and Development (AMED) as part of the “Development of Next-Generation Drug Discovery AI through Industry-Academia Collaboration” (DAIIA) project.
Functions and features of kMoL
1. Federated learning support
Federated learning is a machine learning method in which data is not aggregated in one place but stays distributed across the participants (i.e., the data is never shared outside the enterprise that owns it). In industries that deal with highly confidential data, sharing data is difficult, so federated learning is gaining attention as a way to train models while preserving data privacy and security.
kMoL integrates Elix Mila, a federated learning module developed by Elix, making kMoL the only machine learning library with federated learning capabilities among those published for AI drug discovery. Because machine learning models often benefit from large amounts of data, federated learning lets more data be used for training, and therefore greater accuracy be achieved, without compromising the confidentiality of each participant's compound data.
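The idea described above can be illustrated with a minimal sketch of federated averaging, one common federated learning scheme. This is not the Elix Mila implementation or the kMoL API: the function names (`local_update`, `federated_average`, `train_federated`), the toy linear model, and the three simulated sites are all assumptions made for illustration. Each site trains on its own data, and only the model weights, never the data, are sent back and averaged.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """Train a toy linear model y = w*x + b on one site's private data.

    Only the updated weights leave this function; the data stays local.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates, sizes):
    """Average site weights, weighting each site by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(site_datasets, rounds=50):
    """Hypothetical coordinator loop: broadcast, train locally, aggregate."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in site_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in site_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Three simulated sites, each holding private samples of y = 2x + 1.
rng = random.Random(0)


def make_site(n):
    return [(x, 2 * x + 1) for x in (rng.uniform(-1, 1) for _ in range(n))]


sites = [make_site(20), make_site(30), make_site(50)]
w, b = train_federated(sites)
print(f"learned w={w:.3f}, b={b:.3f}")
```

In this toy setting the shared model recovers the underlying relationship even though no site ever sees another site's samples. A production system would additionally need secure communication between the sites and the coordinator, which this sketch omits.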
2. Support for graph-based predictive models
One of the strengths of kMoL as a machine learning library for the life sciences is that it can seamlessly combine state-of-the-art graph-based predictive models with federated learning. A graph, with atoms as nodes and bonds as edges, is one of the most natural ways to represent the structure of a molecule. Graph-based predictive models can therefore use much more information about a compound’s molecular structure than architectures that work from flattened representations, and this additional information is expected to improve prediction accuracy.
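To make the graph representation concrete, the following sketch encodes the heavy atoms of ethanol (C-C-O) as a graph and runs a single neighbor-aggregation step, the basic operation behind graph neural networks. It is an illustration only, not kMoL code: the helper names and the two-element atom vocabulary are assumptions, and a real graph neural network would apply learned weights and nonlinearities at each step.

```python
ATOM_TYPES = ["C", "O"]  # toy vocabulary, assumed for this example


def one_hot(symbol):
    """Encode an atom symbol as a one-hot feature vector."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def build_graph(atoms, bonds):
    """Return per-atom features and an adjacency list built from bond pairs."""
    features = [one_hot(a) for a in atoms]
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return features, neighbors


def message_passing_step(features, neighbors):
    """Update each atom by summing its own features with its neighbors'."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbors[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum atom vectors into one fixed-size vector for the whole molecule."""
    return [sum(col) for col in zip(*features)]


# Ethanol heavy atoms: C(0) - C(1) - O(2)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
features, neighbors = build_graph(atoms, bonds)
hidden = message_passing_step(features, neighbors)
print(hidden)           # per-atom vectors that now reflect local bonding
print(readout(hidden))  # molecule-level vector
```

After one step, the middle carbon's vector reflects that it is bonded to both a carbon and an oxygen, which is structural information that a simple count of atom types would lose.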
kMoL has been validated on ADME (A: absorption, D: distribution, M: metabolism, E: excretion), toxicity, and binding affinity datasets. It can also be trained on, and make predictions for, a single task on a single dataset.
3. Other features
Another feature of kMoL is that it works with the PyTorch machine learning framework. When Elix started developing kMoL, most machine learning libraries with federated learning capabilities were based on the TensorFlow framework. PyTorch is currently one of the most popular machine learning frameworks because of the ease of model implementation [2], so kMoL supports PyTorch-based model development to make it accessible to a wider audience.
Additionally, to protect data privacy, some models also support a technique called differential privacy. This method makes it difficult to determine whether any individual data point contributed to the model, while minimizing the impact on prediction accuracy. kMoL can also run on both GPUs and CPUs, a feature that was not supported by previously released machine learning libraries with federated learning capabilities.
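One widely used way to obtain differential privacy during training is to clip each example's gradient and add Gaussian noise before averaging, as in DP-SGD. The sketch below shows that mechanism only. It is not taken from kMoL, whose differential privacy implementation is not described here, and the function names and parameters (`max_norm`, `noise_multiplier`) are assumptions for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Clipping bounds how much any single example can change the sum; the
    noise (standard deviation noise_multiplier * max_norm) masks that
    bounded contribution.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    n = len(clipped)
    dim = len(clipped[0])
    summed = [sum(g[k] for g in clipped) for k in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / n for v in noisy]


rng = random.Random(42)
grads = [[3.0, 4.0], [0.0, 0.5], [-0.2, 0.1]]  # made-up per-example gradients
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the accuracy trade-off mentioned above. Turning a given noise level into a formal privacy guarantee requires a separate accounting step that this sketch does not cover.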
Shinya Yuki, CEO of Elix, Inc., said, “We are very excited to release kMoL as an open-source library that we have jointly built based on our federated learning module, Elix Mila, and Kyoto University’s kGCN. By combining federated learning with predictive models, we will be able to achieve things that cannot be done by a single organization. We hope this library will accelerate drug discovery research and contribute to the development of this field.”
Professor Yasushi Okuno of the Kyoto University Graduate School of Medicine said, “The remarkable advances in AI over the past few years have had a powerful impact on drug development. Under the leadership of Lecturer Kojima, we have developed cutting-edge AI programs for drug discovery, including deep learning technologies that handle the chemical structures of drugs and molecular networks in living organisms. We are now collaborating with Elix to develop a federated learning package on top of the technologies we have developed so far and to release it as the AI library for drug discovery ‘kMoL’. We hope that this library will be widely applied in industry through Elix.”
He also added, “kMoL is an extension of the drug discovery AI library ‘kGCN’, which was developed by the research team of Lecturer Ryosuke Kojima and Professor Yasushi Okuno. The federated learning feature of this software was developed within the project ‘Development of a comprehensive AI platform for drug discovery combining multi-target prediction and structure generation using cutting-edge AI technology’ under the ‘Drug Discovery Support Promotion Project: Development of Next-Generation Drug Discovery AI through Industry-Academia Collaboration (DAIIA)’ of the Japan Agency for Medical Research and Development (AMED).
In addition, the multimodal neural network incorporates the knowledge accumulated through the results of the project ‘Development of AI for the design of drug formulations to improve efficiency and accelerate drug development’, organized by the New Energy and Industrial Technology Development Organization (NEDO).
The large-scale graph neural network incorporates knowledge gained from the results of the project ‘Building and expanding a case database and developing a drug target estimation algorithm to accelerate the creation of new drugs’, organized by the Public/Private R&D Investment Strategic Expansion Program (PRISM).”
References
[1] R. Kojima, S. Ishida, M. Ohta, H. Iwata, T. Honma, Y. Okuno: kGCN: A graph-based deep learning framework for chemical structures. Journal of Cheminformatics, Springer, vol. 12, p. 1–10, 2020.
[2] Excerpt from The Gradient article “The State of Machine Learning Frameworks in 2019” (October 2019). https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/. The most recent data can be viewed at http://horace.io/pytorch-vs-tensorflow/.
About Elix, Inc.
Elix, Inc. is an AI-based drug discovery company with a mission to “Rethink Drug Discovery”. To dramatically improve the time-consuming and expensive drug discovery process, Elix applies state-of-the-art deep learning and machine learning technologies in its work with a variety of clients, including pharmaceutical companies, chemical companies, and universities.
Visit https://www.elix-inc.com/ for more details.
About Lecturer Ryosuke Kojima and Professor Yasushi Okuno from Kyoto University Graduate School of Medicine
Lecturer Ryosuke Kojima and Professor Yasushi Okuno of the Kyoto University Graduate School of Medicine aim to pioneer simulation science and data science for medical and drug discovery applications. To that end, they are developing new methodologies for medical big data analysis and medical simulation using real clinical data from Kyoto University Hospital, and they are working on drug discovery simulation and big data drug discovery using the Fugaku supercomputer.
Visit http://clinfo.med.kyoto-u.ac.jp/en/ for more details.