IEEE BigData 2021
Adversarial representation learning for synthetic replacement of private attributes: Data privacy is an increasingly important aspect of the analysis of big data for many real-world tasks. Privacy-enhancing transformations of data can help unlock the potential in data sources containing sensitive information, but finding the right balance between privacy and utility is often a tricky trade-off. In this work, we study how adversarial representation learning can be used to ensure the privacy of users, and to obfuscate sensitive attributes in existing datasets. While previous methods using this kind of approach only aim at obfuscating the sensitive information, we find that adding new information in its place strengthens the provided privacy. We propose a two-step data privatization method that builds on generative adversarial networks: in the first step, sensitive data is removed from the representation, and in the second step, a sample which is independent of the input data is inserted in its place. The result is an approach that can provide stronger privatization on image data while preserving both the domain and the utility of the inputs.
Decentralized federated learning of deep neural networks on non-iid data: We tackle the non-convex problem of learning a personalized deep learning model in a decentralized setting. More specifically, we study decentralized federated learning, a peer-to-peer setting where data is distributed among many clients and where there is no central server to orchestrate the training. In real world scenarios, the data distributions are often heterogeneous between clients. Therefore, in this work we study the problem of how to efficiently learn a model in a peer-to-peer system with non-iid client data. We propose a method named Performance-Based Neighbor Selection (PENS) where clients with similar data distributions detect each other and cooperate by evaluating their training losses on each other’s data to learn a model suitable for the local data distribution. Our experiments on benchmark datasets show that our proposed method is able to achieve higher accuracies as compared to strong baselines.
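The neighbor selection idea described above can be illustrated with a minimal sketch. It assumes each client has already evaluated its peers' models on its own local data; the function name `select_neighbors` and the parameter `top_m` are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def select_neighbors(peer_losses: Dict[str, float], top_m: int) -> List[str]:
    """Return the ids of the top_m peers whose models gave the lowest loss
    on this client's local data (hypothetical helper, illustration only)."""
    # A low loss suggests the peer was trained on a similar data distribution.
    ranked = sorted(peer_losses, key=peer_losses.get)
    return ranked[:top_m]


# Example: the models of peers "b" and "c" fit the local data best.
print(select_neighbors({"a": 0.9, "b": 0.2, "c": 0.5}, top_m=2))
```

A client would then cooperate only with the selected peers in later training rounds.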
Scaling Federated Learning for Fine-Tuning of Large Language Models: Federated learning (FL) is a promising approach to distributed compute, as well as distributed data, and provides a level of privacy and compliance to legal frameworks. This makes FL attractive for both consumer and healthcare applications. While the area is actively being explored, few studies have examined FL in the context of larger language models and there is a lack of comprehensive reviews of robustness across tasks, architectures, numbers of clients, and other relevant factors. In this paper, we explore the fine-tuning of Transformer-based language models in a federated learning setting. We evaluate three popular BERT-variants of different sizes (BERT, ALBERT, and DistilBERT) on a number of text classification tasks such as sentiment analysis and author identification. We perform an extensive sweep over the number of clients, ranging up to 32, to evaluate the impact of distributed compute on task performance in the federated averaging setting. While our findings suggest that the large sizes of the evaluated models are not generally prohibitive to federated training, we found that the different models handle federated averaging to a varying degree. Most notably, DistilBERT converges significantly slower with larger numbers of clients, and under some circumstances, even collapses to chance level performance. Investigating this issue presents an interesting perspective for future research.
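For readers unfamiliar with the federated averaging setting mentioned above, the aggregation step can be sketched as follows. This is a simplified illustration on flat parameter lists, not the code used in the paper; the function name `federated_average` is hypothetical, and weighting each client by its local dataset size is assumed.

```python
from typing import List


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by the
    number of local training examples (illustrative sketch)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two clients with 1 and 3 local examples respectively.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a full training loop, the averaged parameters would be sent back to the clients at the start of the next round.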
Federated learning using a mixture of experts: Federated learning has received attention for its efficiency and privacy benefits, in settings where data is distributed among devices. Although federated learning shows significant promise as a key approach when data cannot be shared or centralized, current incarnations show limited privacy properties and have shortcomings when applied to common real-world scenarios. One such scenario is heterogeneous data among devices, where data may come from different generating distributions. In this paper, we propose a federated learning framework using a mixture of experts to balance the specialist nature of a locally trained model with the generalist knowledge of a global model in a federated learning setting. Our results show that the mixture of experts model is better suited as a personalized model for devices when data is heterogeneous, outperforming both global and local models. Furthermore, our framework gives strict privacy guarantees, which allows clients to select parts of their data that may be excluded from the federation. The evaluation shows that the proposed solution is robust to the setting where some users require a strict privacy setting and do not disclose their models to a central server at all, opting out from the federation partially or entirely. The proposed framework is general enough to include any kind of machine learning models, and can even use combinations of different kinds.
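The balance between a local specialist and a global generalist can be pictured as a gated combination of the two models' outputs. The sketch below assumes a simple logistic gate over the input features; the names `mixture_prediction`, `gate_weights`, and `gate_bias` are hypothetical, and the gating function used in the paper may differ.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mixture_prediction(x: Sequence[float],
                       local_model: Callable[[Sequence[float]], float],
                       global_model: Callable[[Sequence[float]], float],
                       gate_weights: Sequence[float],
                       gate_bias: float) -> float:
    """Blend local and global predictions with an input-dependent gate
    (assumed logistic gate, illustration only)."""
    # g close to 1 trusts the local specialist, g close to 0 the global model.
    g = sigmoid(sum(w * xi for w, xi in zip(gate_weights, x)) + gate_bias)
    return g * local_model(x) + (1.0 - g) * global_model(x)


# With a zero gate, both experts contribute equally.
print(mixture_prediction([1.0, 2.0], lambda x: 1.0, lambda x: 0.0, [0.0, 0.0], 0.0))
```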
Adversarial representation learning for private speech generation: As more and more data is collected in various settings across organizations, companies, and countries, there has been an increase in the demand for user privacy. Developing privacy-preserving methods for data analytics is thus an important area of research. In this work we present a model based on generative adversarial networks (GANs) that learns to obfuscate specific sensitive attributes in speech data. We train a model that learns to hide sensitive information in the data, while preserving the meaning in the utterance. The model is trained in two steps: first to filter sensitive information in the spectrogram domain, and then to generate new and private information independent of the filtered one. The model is based on a U-Net CNN that takes mel-spectrograms as input. A MelGAN is used to invert the spectrograms back to raw audio waveforms. We show that it is possible to hide sensitive information such as gender by generating new data, trained adversarially to maintain utility and realism.
Blood glucose prediction with variance estimation using recurrent neural networks: Many factors affect blood glucose levels in type 1 diabetics, several of which vary largely both in magnitude and delay of the effect. Modern rapid-acting insulins generally have a peak time after 60–90 min, while carbohydrate intake can affect blood glucose levels more rapidly for high glycemic index foods, or slower for other carbohydrate sources. It is important to have good estimates of the development of glucose levels in the near future both for diabetic patients managing their insulin distribution manually, as well as for closed-loop systems making decisions about the distribution. Modern continuous glucose monitoring systems provide excellent sources of data to train machine learning models to predict future glucose levels. In this paper, we present an approach for predicting blood glucose levels for diabetics up to 1 h into the future. The approach is based on recurrent neural networks trained in an end-to-end fashion, requiring nothing but the glucose level history for the patient. Our approach obtains results that are comparable to the state of the art on the Ohio T1DM dataset for blood glucose level prediction. In addition to predicting the future glucose value, our model provides an estimate of its certainty, helping users to interpret the predicted levels. This is realized by training the recurrent neural network to parameterize a univariate Gaussian distribution over the output. The approach needs no feature engineering or data preprocessing and is computationally inexpensive. We evaluate our method using the standard root-mean-squared error (RMSE) metric, along with a blood glucose-specific metric called the surveillance error grid (SEG). We further study the properties of the distribution that is learned by the model, using experiments that determine the nature of the certainty estimate that the model is able to capture.
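Training a network to parameterize a univariate Gaussian over the output is commonly done by minimizing the negative log-likelihood of the observed value. The sketch below shows that loss for a single prediction; it assumes the network outputs a mean and a log-variance, which is one common parameterization and not necessarily the one used in the paper. The function name `gaussian_nll` is hypothetical.

```python
import math


def gaussian_nll(target: float, mean: float, log_var: float) -> float:
    """Negative log-likelihood of `target` under N(mean, exp(log_var))."""
    var = math.exp(log_var)  # predicting log-variance keeps the variance positive
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target - mean) ** 2 / var)


# A confident but wrong prediction is penalized more than an uncertain one.
print(gaussian_nll(target=6.0, mean=5.0, log_var=-2.0))
print(gaussian_nll(target=6.0, mean=5.0, log_var=0.0))
```

The predicted variance then serves as the certainty estimate that accompanies each forecast.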
Grammatical gender in Swedish is predictable using recurrent neural networks: The grammatical gender of Swedish nouns is a mystery. While there are few rules that can indicate the gender with some certainty, it does in general not depend on either meaning or the structure of the word. In this work we demonstrate the surprising fact that grammatical gender for Swedish nouns can be predicted with high accuracy using a recurrent neural network (RNN) working on the raw character sequence of the word, without using any contextual information.
Semantic segmentation of fashion images using feature pyramid networks: We approach fashion image analysis through semantic segmentation of fashion images, using both textural information and cues from shape and context, where target classes are clothing categories. Our main contributions are state-of-the-art semantic segmentation of fashion images with modest memory and compute requirements.
Generative modelling of semantic segmentation data in the fashion domain: In this work, we propose a method to generatively model the joint distribution of images and corresponding semantic segmentation maps using generative adversarial networks. We extend the Style-GAN architecture by iteratively growing the network during training, to add new output channels that model the semantic segmentation maps. We train the proposed method on a large dataset of fashion images and our experimental evaluation shows that the model produces samples that are coherent and plausible with semantic segmentation maps that closely match the semantics in the image.
Character-based recurrent neural networks for morphological relational reasoning: We present a model for predicting inflected word forms based on morphological analogies. Previous work includes rule-based algorithms that determine and copy affixes from one word to another, with limited support for varying inflectional patterns. In related tasks such as morphological reinflection, the algorithm is provided with an explicit enumeration of morphological features which may not be available in all cases. In contrast, our model is feature-free: instead of explicitly representing morphological features, the model is given a demo pair that implicitly specifies a morphological relation (such as write:writes specifying infinitive:present). Given this demo relation and a query word (e.g. watch), the model predicts the target word (e.g. watches). To address this task, we devise a character-based recurrent neural network architecture using three separate encoders and one decoder. Our experimental evaluation on five different languages shows that the exact form can be predicted with high accuracy, consistently beating the baseline methods. Particularly, for English the prediction accuracy is 95.60%. The solution is not limited to copying affixes from the demo relation, but generalizes to words with varying inflectional patterns, and can abstract away from the orthographic level to the level of morphological forms.
A preliminary version appeared at the Subword & Character Level Models in NLP (SCLeM) workshop at EMNLP 2017 in Copenhagen, Denmark, on September 7.
The source code used for the experiments can be downloaded from https://github.com/olofmogren/char-rnn-wordrelations.
Disentanglement by Penalizing Correlation: Deep neural networks have been tremendously successful in a number of tasks. One of the main reasons for this is their capability to automatically learn representations of data in levels of abstraction, increasingly disentangling the data as the internal transformations are applied. In this paper we propose a novel regularization method that penalizes covariance between dimensions of the hidden layers in a network, something that benefits the disentanglement. This makes the network learn nonlinear representations that are linearly uncorrelated, yet allows the model to obtain good results on a number of tasks, as demonstrated by our experimental evaluation. The proposed technique can be used to find the dimensionality of the underlying data, because it effectively disables dimensions that aren't needed. Our approach is simple and computationally cheap, as it can be applied as a regularizer to any gradient-based learning model.
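A covariance penalty of this kind can be sketched on a batch of hidden activations. The version below sums the absolute off-diagonal entries of the batch covariance matrix and normalizes by the squared layer width; the exact norm and normalization are assumptions made for illustration, and the function name `covariance_penalty` is hypothetical.

```python
from typing import List


def covariance_penalty(activations: List[List[float]]) -> float:
    """Mean absolute off-diagonal covariance between hidden dimensions,
    computed over a batch (rows = examples, columns = hidden units)."""
    n = len(activations)
    d = len(activations[0])
    means = [sum(row[j] for row in activations) / n for j in range(d)]
    penalty = 0.0
    for j in range(d):
        for k in range(d):
            if j == k:
                continue  # variances on the diagonal are not penalized here
            cov = sum((row[j] - means[j]) * (row[k] - means[k])
                      for row in activations) / n
            penalty += abs(cov)
    return penalty / (d * d)


# Perfectly correlated units are penalized; uncorrelated units are not.
print(covariance_penalty([[1.0, 1.0], [-1.0, -1.0]]))
print(covariance_penalty([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]))
```

In training, such a term would be added to the task loss with a scaling coefficient.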
Character-based recurrent neural networks for morphological relational reasoning: Given a demo relation (a pair of word forms) and a query word, we devise a character-based recurrent neural network architecture using three separate encoders and a decoder, trained to predict the missing second form of the query word. Our results show that the exact form can be predicted for English with an accuracy of 94.7%. For Swedish, which has a more complex morphology with more inflectional patterns for nouns and verbs, the accuracy is 89.3%.
Named entity recognition in Swedish health records with character-based deep bidirectional LSTMs: We propose an approach for named entity recognition in medical data, using a character-based deep bidirectional recurrent neural network. Such models can learn features and patterns based on the character sequence, and are not limited to a fixed vocabulary. This makes them very well suited for the NER task in the medical domain. Our experimental evaluation shows promising results, with a 60% improvement in F1 score over the baseline, and our system generalizes well between different datasets.
C-RNN-GAN: Continuous recurrent neural networks with adversarial training: Generative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and let the reader judge the quality by downloading the generated songs.
Assisting discussion forum users using deep recurrent neural networks: In this work, we present a discussion forum assistant based on deep recurrent neural networks (RNNs). The assistant is trained to perform three different tasks when faced with a question from a user: firstly, to recommend related posts; secondly, to recommend other users that might be able to help; and thirdly, to recommend other channels in the forum where people may discuss related topics. Our recurrent forum assistant is evaluated experimentally by prediction accuracy for the end-to-end trainable parts, as well as by performing an end-user study. We conclude that the model generalizes well, and is helpful for the users.
Extractive summarization by aggregating multiple similarities: Many existing methods for extracting summaries rely on comparing the similarity of two sentences in some way. In this paper, we present new ways of measuring this similarity, based on sentiment analysis and continuous vector space representations, and show that combining these together with similarity measures from existing methods helps to create better summaries. The finding is demonstrated with MULTSUM, a novel summarization method that uses ideas from kernel methods to combine sentence similarity measures. Submodular optimization is then used to produce summaries that take several different similarity measures into account. Our method improves over the state-of-the-art on standard benchmark datasets; it is also fast and scales to large document collections, and the results are statistically significant.
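The two ingredients named above, combining several similarity measures and selecting sentences by submodular optimization, can be sketched together. The example assumes the measures are combined by elementwise multiplication (one way kernels are commonly combined) and uses a facility-location coverage objective with greedy selection as a stand-in; the objective used in MULTSUM may differ, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def combine_similarities(matrices: List[Matrix]) -> Matrix:
    """Elementwise product of sentence-similarity matrices (assumed combination)."""
    n = len(matrices[0])
    combined = [[1.0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                combined[i][j] *= m[i][j]
    return combined


def greedy_summary(sim: Matrix, budget: int) -> List[int]:
    """Greedily pick sentence indices that maximize facility-location coverage."""
    n = len(sim)
    selected: List[int] = []
    best_cover = [0.0] * n  # how well each sentence is covered so far
    for _ in range(min(budget, n)):
        best_gain, best_j = 0.0, None
        for j in range(n):
            if j in selected:
                continue
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            break  # no remaining sentence adds coverage
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


lexical = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
sentiment = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(greedy_summary(combine_similarities([lexical, sentiment]), budget=2))
```

Because sentences 0 and 1 are near duplicates under the combined measure, the greedy step picks only one of them and then adds the dissimilar sentence 2.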
Visions and open challenges for a knowledge-based culturomics: A white paper outlining some ideas and challenges within the field of culturomics.
Editing simple graphs: Inspired by the word-co-occurrence graph from Wikipedia documents, this paper presents an FPT approach to cluster the words.
Extractive summarization using continuous vector space models: A workshop paper showing preliminary results on multi-document summarization with continuous vector space models for sentence representation. The experiments were performed on opinionated online user reviews.
Adaptive dynamics of realistic small-world networks: Following in the steps of Jon Kleinberg's and others' celebrated work on decentralized search in small-world networks, we conduct an experimental analysis of a dynamic algorithm that produces small-world networks. We find that the algorithm adapts robustly to a wide variety of situations in realistic geographic networks with synthetic test data and with real-world data, even when vertices are unevenly and non-homogeneously distributed.