
Named entity recognition in Swedish health records with character-based deep bidirectional LSTMs

Biomedical NER illustration.

We propose an approach for named entity recognition in medical data, using a character-based deep bidirectional recurrent neural network. Such models can learn features and patterns based on the character sequence, and are not limited to a fixed vocabulary. This makes them very well suited for the NER task in the medical domain. Our experimental evaluation shows promising results, with a 60% improvement in F1 score over the baseline, and our system generalizes well between different datasets.
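To illustrate why character-level input is not limited to a fixed vocabulary, here is a minimal sketch of the input encoding only. It is not taken from our source code: the function names and the example sentences are made up for illustration, and the recurrent network itself is not shown. A word that never occurs in the training text has no word ID, but it can still be encoded in full as long as its characters have been seen.

```python
# Illustrative sketch only: hypothetical helper names and made-up example
# sentences, not the code or data used in the paper.

UNKNOWN = 0  # ID reserved for anything not seen in the training text


def build_char_vocab(texts):
    """Map every character seen in the training text to an integer ID >= 1."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def build_word_vocab(texts):
    """Map every whitespace-separated word seen in training to an ID >= 1."""
    words = sorted({w for text in texts for w in text.split()})
    return {w: i + 1 for i, w in enumerate(words)}


def encode_chars(text, char_to_id):
    """Encode a string as one ID per character."""
    return [char_to_id.get(ch, UNKNOWN) for ch in text]


def encode_words(text, word_to_id):
    """Encode a string as one ID per whitespace-separated word."""
    return [word_to_id.get(w, UNKNOWN) for w in text.split()]


# Made-up training sentences (assumption, for illustration only).
train = ["patienten fick paracetamol", "ont i huvudet", "ingen feber"]
char_to_id = build_char_vocab(train)
word_to_id = build_word_vocab(train)

# A drug name that does not occur in the training sentences.
unseen = "ibuprofen"
word_ids = encode_words(unseen, word_to_id)  # falls back to UNKNOWN
char_ids = encode_chars(unseen, char_to_id)  # every character is known
```

In a character-based tagger, a sequence like `char_ids` would be the input to the recurrent network, so a new drug name or a misspelling still carries usable signal instead of collapsing to a single unknown-word token.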

Dataset

The dataset presented in this paper can be downloaded from https://github.com/olofmogren/biomedical-ner-data-swedish/. It can be freely used, but please cite our paper. See “bibtex” below.

Source code

The source code used for the experiments can be downloaded from https://github.com/withtwist/medical-ner/.

Simon Almgren, Sean Pavlov, Olof Mogren

Fifth workshop on building and evaluating resources for biomedical text mining (BioTxtM 2016) at COLING 2016 in Osaka, December 12.
PDF Fulltext
bibtex.

Olof Mogren, PhD, RISE Research Institutes of Sweden.