Evaluating Named-Entity Recognition Approaches in Plant Molecular Biology
Title | Evaluating Named-Entity Recognition Approaches in Plant Molecular Biology |
Publication Type | Conference Paper |
Year of Publication | Submitted |
Authors | Do H, Than K, Larmande P |
Secondary Authors | Kaenampornpan M, Malaka R, Nguyen DD, Schwind N |
Conference Name | Multi-disciplinary Trends in Artificial Intelligence |
Publisher | Springer International Publishing |
ISBN Number | 978-3-030-03014-8 |
Abstract | Text mining research is becoming an important topic in biology, with the aim of extracting biological entities from scientific papers in order to extend biological knowledge. However, few thorough studies have been conducted on plant molecular biology data, especially rice, resulting in a lack of datasets available for exploiting advanced machine learning methods able to detect entities such as genes and proteins. In this article, we first developed a dataset from Oryzabase, a database of rice genes, and used it as a benchmark. Then, we evaluated the performance of two Named-Entity Recognition (NER) methods for sequence tagging: a Long Short-Term Memory (LSTM) model combined with Conditional Random Fields (CRFs), and a hybrid method based on dictionary lookup combined with machine learning systems to improve the results. We analyzed the performance of these methods when applied to the Oryzabase dataset and improved the results. On average, the result from LSTM-CRF reaches 86% in