Named entity recognition in Swedish health records with character-based deep bidirectional LSTMs

Biomedical NER illustration.

We propose an approach for named entity recognition in medical data, using a character-based deep bidirectional recurrent neural network. Such models learn features and patterns from the character sequence and are not limited to a fixed vocabulary, which makes them well suited to the NER task in the medical domain. Our experimental evaluation shows promising results, with a 60% improvement in F1 score over the baseline, and our system generalizes well between different datasets.
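To illustrate why a character-based input is not limited to a fixed vocabulary, here is a minimal sketch that encodes the same sentence at word level and at character level. The example sentences, the alphabet, and the helper names (`encode_words`, `encode_chars`) are assumptions made for illustration; they are not taken from our dataset or source code.

```python
import string

# Hypothetical toy sentences (not from the published dataset).
train_text = "patienten fick paracetamol mot feber"
test_text = "patienten fick ibuprofen mot huvudvärk"

UNK = 0  # index reserved for anything outside the vocabulary

# Word-level vocabulary: only words seen in the training text get an index.
word_vocab = {w: i for i, w in enumerate(sorted(set(train_text.split())), start=1)}

# Character-level vocabulary: a small closed alphabet (assumed here to be
# lowercase a-z, the Swedish letters å, ä, ö, and the space character).
alphabet = string.ascii_lowercase + "åäö "
char_vocab = {c: i for i, c in enumerate(alphabet, start=1)}


def encode_words(text):
    """Map each whitespace-separated word to its index, or UNK if unseen."""
    return [word_vocab.get(w, UNK) for w in text.split()]


def encode_chars(text):
    """Map each character to its index, or UNK if outside the alphabet."""
    return [char_vocab.get(c, UNK) for c in text]


word_ids = encode_words(test_text)
char_ids = encode_chars(test_text)

# Count how much of the test sentence falls outside each vocabulary.
unknown_words = sum(1 for i in word_ids if i == UNK)
unknown_chars = sum(1 for i in char_ids if i == UNK)

print("unknown words:", unknown_words)
print("unknown characters:", unknown_chars)
```

In this toy case the two drug and symptom terms that never appeared in the training text collapse to the unknown index at word level, while every character of the sentence is still represented at character level. A sequence model reading `char_ids` can therefore still pick up on spelling patterns in unseen terms.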

Dataset

The dataset presented in this paper can be downloaded from https://github.com/olofmogren/biomedical-ner-data-swedish/. It can be freely used, but please cite our paper. See “bibtex” below.

Source code

The source code used for the experiments can be downloaded from https://github.com/withtwist/medical-ner/.

Simon Almgren, Sean Pavlov, Olof Mogren

Fifth workshop on building and evaluating resources for biomedical text mining (BioTxtM) at COLING
PDF Fulltext
arxiv:
bibtex.

Olof Mogren, PhD, RISE Research Institutes of Sweden.