Welcome to the Institutional Repository of the Hunan Academy of Agricultural Sciences!

MantaID: a machine learning-based tool to automate the identification of biological database IDs

Document type: Foreign-language journal article

Authors: Zeng, Zhengpeng 1; Hu, Jiamin 1; Cao, Miyuan 1; Li, Bingbing 1; Wang, Xiting 1; Yu, Feng 2; Mao, Longfei 1

Author affiliations: 1.Hunan Univ, Coll Biol, Dept Pharm, 27 Tianma Rd, Changsha 410082, Peoples R China

2.Hunan Univ, Coll Biol, State Key Lab Chemo Biosensing & Chemometr, Hunan Key Lab Plant Funct Genom & Dev Regulat, 27 Tianma Rd, Changsha 410082, Peoples R China

3.Hunan Acad Agr Sci, Hunan Agr Biotechnol Res Inst, State Key Lab Hybrid Rice, 27 Tianma Rd, Changsha 410125, Peoples R China

Journal: DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (Impact factor: 5.8; 5-year impact factor: 4.6)

ISSN: 1758-0463

Year, volume, issue: 2023, vol. 2023

Pages:

Indexed in: SCI

Abstract: The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, machine learning-based approach that automates identifying IDs on a large scale. The MantaID model's prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of ID from large quantities of databases (e.g. up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve applicability. To our knowledge, MantaID is the first tool that enables an automatic, quick, accurate and comprehensive identification of large quantities of IDs and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases.
