We propose an approach for named entity recognition in medical data, using a character-based deep bidirectional recurrent neural network. Such models can learn features and patterns based on the character sequence, and are not limited to a fixed vocabulary. This makes them very well suited for the NER task in the medical domain. Our experimental evaluation shows promising results, with a 60% improvement in F1 score over the baseline, and our system generalizes well between different datasets.
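To illustrate why a character-based input is not limited to a fixed vocabulary, here is a minimal sketch of the input encoding only, not of the model from the paper. The example words, the function names, and the choice of index 0 for unknown symbols are all assumptions made for illustration.

```python
# Toy illustration (hypothetical names and data): word-level lookup versus
# character-level encoding of a token that was never seen during training.

def build_char_index(words):
    """Map every character seen in the training words to a positive integer."""
    chars = sorted({ch for word in words for ch in word})
    # Index 0 is reserved for characters that never occurred in training.
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode_chars(word, char_index):
    """Encode a word as a sequence of character ids (0 = unseen character)."""
    return [char_index.get(ch, 0) for ch in word]

def encode_word(word, word_index):
    """Encode a word as a single vocabulary id (0 = out of vocabulary)."""
    return word_index.get(word, 0)

# Assumed toy training vocabulary.
training_words = ["paracetamol", "ibuprofen", "huvudvärk"]
char_index = build_char_index(training_words)
word_index = {w: i + 1 for i, w in enumerate(training_words)}

# A compound that does not appear in the training vocabulary.
unseen = "paracetamoltablett"
word_id = encode_word(unseen, word_index)    # collapses to the unknown id
char_ids = encode_chars(unseen, char_index)  # every character is still known

print(word_id)
print(char_ids)
```

A word-level lookup maps the unseen compound to a single unknown id, whereas the character sequence still carries the familiar substring, which is the kind of pattern a character-based recurrent model can learn from.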
The dataset presented in this paper can be downloaded from https://github.com/olofmogren/biomedical-ner-data-swedish/. It can be freely used, but please cite our paper. See “bibtex” below.
The source code used for the experiments can be downloaded from https://github.com/withtwist/medical-ner/.
Simon Almgren, Sean Pavlov, Olof Mogren
To appear in the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016) at COLING 2016 in Osaka, December 12.
PDF Fulltext bibtex.