Evaluating Named-Entity Recognition Approaches in Plant Molecular Biology

Title: Evaluating Named-Entity Recognition Approaches in Plant Molecular Biology
Publication Type: Conference Paper
Year of Publication: Submitted
Authors: Do H, Than K, Larmande P
Secondary Authors: Kaenampornpan M, Malaka R, Nguyen DD, Schwind N
Conference Name: Multi-disciplinary Trends in Artificial Intelligence
Publisher: Springer International Publishing
ISBN Number: 978-3-030-03014-8
Abstract

Text mining is becoming an important topic in biology, with the aim of extracting biological entities from scientific papers in order to extend biological knowledge. However, few thorough studies have been conducted on plant molecular biology data, especially for rice, resulting in a lack of datasets available to exploit advanced machine learning methods able to detect entities such as genes and proteins. In this article, we first developed a dataset from Oryzabase, a database of rice genes, and used it as the benchmark. Then, we evaluated the performance of two Named Entity Recognition (NER) methods for sequence tagging: a Long Short-Term Memory (LSTM) model combined with Conditional Random Fields (CRFs), and a hybrid method that combines dictionary lookup with machine learning systems to improve the results. We analyzed the performance of these methods when applied to the Oryzabase dataset and improved the results. On average, the result from LSTM-CRF reaching 86% in