We propose an approach for named entity recognition in medical data, using a character-based deep bidirectional recurrent neural network. Such models can learn features and patterns based on the character sequence, and are not limited to a fixed vocabulary. This makes them very well suited for the NER task in the medical domain. Our experimental evaluation shows promising results, with a 60% improvement in F1 score over the baseline, and our system generalizes well between different datasets.
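To illustrate why a character-based input is not limited to a fixed vocabulary, here is a minimal sketch in plain Python. It is not the code used in the paper: the toy word list, the unseen word, and the helper names `encode_word` and `encode_chars` are hypothetical and chosen only for illustration. The sketch compares how a word-level lookup and a character-level lookup treat a term that was never seen during training.

```python
# Hypothetical toy data; not taken from the paper's dataset.
train_words = ["patient", "received", "paracetamol", "and", "ibuprofen"]
unseen_word = "diclofenac"  # assumed not to occur in the training data

# Word-level view: a fixed vocabulary maps every unseen word to one <UNK> index.
word_vocab = {"<UNK>": 0}
for w in train_words:
    word_vocab.setdefault(w, len(word_vocab))


def encode_word(word):
    """Return the word's index, or the <UNK> index if the word is unseen."""
    return word_vocab.get(word, word_vocab["<UNK>"])


# Character-level view: the vocabulary is the set of characters seen in training.
char_vocab = {"<UNK>": 0}
for w in train_words:
    for ch in w:
        char_vocab.setdefault(ch, len(char_vocab))


def encode_chars(word):
    """Return one index per character; these would feed a character-level model."""
    return [char_vocab.get(ch, char_vocab["<UNK>"]) for ch in word]


word_id = encode_word(unseen_word)    # collapses to <UNK>: all information is lost
char_ids = encode_chars(unseen_word)  # every character is still a known symbol

print("word-level id:", word_id)
print("char-level ids:", char_ids)
```

In this toy setting the unseen word collapses to the single `<UNK>` index at the word level, while its character sequence is fully representable. A recurrent network reading that sequence in both directions can still pick up on sub-word patterns, which is the property the abstract refers to.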
The dataset presented in this paper can be downloaded from https://github.com/olofmogren/biomedical-ner-data-swedish/. It can be freely used, but please cite our paper. See “bibtex” below.
The source code used for the experiments can be downloaded from https://github.com/withtwist/medical-ner/.
Simon Almgren, Sean Pavlov, Olof Mogren
Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016) at COLING 2016 in Osaka, December 12.