FeatureCloud Publications (count: 41)
Relevance for FeatureCloud: For analysis of large-scale biomedical data, e.g. in the Apps of the FeatureCloud AI Store, several obstacles need to be overcome. Disease module mining methods (DMMMs), for example, often include non-robust steps in their workflows. This lack of robustness has a negative effect on the trustworthiness of the obtained subnetworks, such as protein-protein interaction networks. To overcome this problem, this publication presents a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes.
Relevance for FeatureCloud: Anomaly detection is an important task to identify rare events such as fraud, intrusions, or medical diseases. However, it often needs to be applied on personal or otherwise sensitive data, e.g. business data. This gives rise to concerns regarding the protection of the sensitive data, especially if it is to be analysed by third parties, e.g. in collaborative settings, where data is collected by different entities, but shall be analysed together to benefit from more effective models. As part of FeatureCloud WP2, this paper describes an anomaly detection task on two different benchmark datasets, in supervised, semi-supervised, and unsupervised settings. The authors federated Multi-Layer Perceptrons, Gaussian Mixture Models, and Isolation Forests, and compared them to a centralised approach. Preprint (PDF | 297 kb)
Relevance for FeatureCloud: Providing automatic compliance verification in process choreographies is crucial for any cross-organisational collaboration, including research collaborations that will use the FeatureCloud AI Store in the future. An example would be the legal necessity to check at each participating hospital whether the federated models used for the data aggregation are indeed compliant with the GDPR requirements for data consent. This work deals with the question of how to verify global compliance if affected tasks are not fully visible.
Relevance for FeatureCloud: This work applies to FeatureCloud’s WP6 (blockchain and user rights management). Over the past years, interest in blockchain technology and its applications has increased tremendously, accompanied, however, by serious threats that raised concerns over user data privacy. This resulted in a multitude of privacy-preserving techniques that offer different guarantees in terms of trust, decentralization, and traceability. CoinJoin is one of the promising techniques. Using the example of cryptocurrency, this paper provides a comprehensive usability study of three main Bitcoin wallets that integrate the CoinJoin technique. Similar privacy-preserving techniques would have to be applied if blockchains are to be used to manage patients’ rights in research centers using the FeatureCloud platform to analyse medical data.
Relevance for FeatureCloud: FeatureCloud’s WP6 recognises the importance of integrating as many federated machine learning nodes as possible, while being aware of privacy rights and regulations, especially GDPR and regulations for medical data, and conducts research into blockchain-based technologies for user rights management, consent, and data discovery mechanisms. This article studies users’ privacy perceptions of UTXO-based blockchains. It elaborates on a mental model of employing privacy-preserving techniques for blockchain transactions. Furthermore, it evaluates users’ awareness of blockchain privacy issues and examines their preferences towards existing privacy-enhancing solutions, i.e., add-on techniques to Bitcoin versus built-in techniques in privacy coins.
Relevance for FeatureCloud: This work investigates the challenges of moving principal component analysis (PCA), a widely used tool often serving as an initial step in machine learning and visualisation workflows, to the federated domain. It provides implementations of different federated PCA algorithms and evaluates them regarding their accuracy for high-dimensional biological data using realistic sample distributions over multiple data sites, and their ability to preserve downstream analyses. Complementing the simulated results, the authors used the FeatureCloud platform for a real-world implementation of the investigated algorithms. The corresponding App has multiple modes, including a batch mode and a train/test mode allowing for cross-validation splits. They also ran tests using the FeatureCloud ‘Testbed’, which allows simulating a federated setting by spawning multiple clients on the same machine while passing parameters through a remote relay server.
Relevance for FeatureCloud: In this article, we present a federated singular value decomposition (SVD) algorithm, suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost independent of the number of samples and is only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged and the vectors associated with the features only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable for both horizontally and vertically partitioned data. A corresponding federated App was produced for the FeatureCloud AI Store and is available on the platform.
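The communication pattern described above can be illustrated with a minimal sketch. It assumes horizontally partitioned data (each site holds some of the samples as rows) and uses plain power iteration for the top feature-side singular vector only; it is an illustration of the principle, not the algorithm from the article. The names `local_update`, `normalise`, and `federated_top_vector` are hypothetical.

```python
import math


def local_update(X, g):
    """Compute X^T (X g) at one site.

    The sample-side vector X g has one entry per local sample and is
    never returned, so it never leaves the site.
    """
    s = [sum(x_ij * g_j for x_ij, g_j in zip(row, g)) for row in X]
    d = len(g)
    return [sum(s_i * row[j] for s_i, row in zip(s, X)) for j in range(d)]


def normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def federated_top_vector(sites, iterations=50):
    """Power iteration in which only feature-length vectors are exchanged."""
    d = len(sites[0][0])
    g = normalise([1.0] * d)
    for _ in range(iterations):  # fixed number of communication rounds
        updates = [local_update(X, g) for X in sites]
        g = normalise([sum(u[j] for u in updates) for j in range(d)])
    return g


# Toy data (assumption): two sites, two features.
site_a = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.05]]
site_b = [[-2.2, 0.1], [1.0, 0.0]]

g_fed = federated_top_vector([site_a, site_b])
g_central = federated_top_vector([site_a + site_b])  # same data, pooled
```

In this sketch each round transmits one vector whose length equals the number of features, regardless of how many samples a site holds, and the federated and pooled runs agree up to floating-point rounding.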
Relevance for FeatureCloud: Aligned with FeatureCloud’s goal to enable decentralised analysis across medical and research institutions by using federated learning architectures and to improve the performance of predictive models in medicine and healthcare, this paper evaluates the efficacy of federated Random Forests, algorithms that support both classification and regression analysis. In fact, the FeatureCloud AI Store already contains two such Apps (“Random Forest” and “Random Survival Forest”). The authors focus particularly on the heterogeneity within and between datasets, addressing three common challenges: (i) number of parties, (ii) sizes of datasets, and (iii) imbalanced phenotypes, evaluated on five biomedical datasets.
Relevance for FeatureCloud: As part of FeatureCloud’s WP4, this work shows how the so-called “personas method” can be adapted to support the development of human-centered artificial intelligence (AI) applications, as demonstrated in the example of a medical context. This work is – to our knowledge – the first to provide personas for AI using an openly available Personas for AI toolbox. The toolbox contains guidelines and material supporting persona development for AI as well as templates and pictures for persona visualisation. It is ready to use and freely available to the international research and development community.
Relevance for FeatureCloud: This paper elucidates the opportunity that modern information fusion provides in bridging the gap between research and practical applications in the context of future trustworthy medical artificial intelligence (AI). In this context, it aligns directly with FeatureCloud’s motivation and imperative to include ethical and legal aspects as a cross-cutting discipline, because all future AI solutions must not only be ethically responsible but also legally compliant.
Relevance for FeatureCloud: This work extends into WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud as it investigates the so-called “miner extractable value (MEV)”. The term is not yet solidly defined and has so far mainly been used in the world of gaming, game theory, and cryptocurrencies, but it may in fact play an important role in assessing the security and stability of the blockchain-based user consent and user rights management methods that are planned to be used in FeatureCloud as well.
Relevance for FeatureCloud: Estimating the probability, as well as the profitability, of different attacks is of utmost importance when assessing the security and stability of prevalent blockchain-based encryption technologies. In this paper, we present a simple yet practical model to calculate the success probability of finite attacks, while considering already contributed blocks and victims that do not give up easily. Hereby, we introduce a more fine-grained distinction between different actor types and the sides they take during an attack. With regard to cryptocurrencies, the presented model simplifies assessing the profitability of forks in practical settings, while also enabling fast and more accurate estimations of the economic security guarantees in certain scenarios. In principle, our work is equally relevant for encryption methods in the medical domain and therefore applies to both WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud.
Relevance for FeatureCloud: Directly relevant for FeatureCloud, we here demonstrate the principle of federated machine learning. While the FeatureCloud prototype platform emerged, we have worked on stand-alone solutions for typical medical application scenarios, including a federated genome-wide association study (GWAS) tool, called “sPLINK”.
Relevance for FeatureCloud: This paper directly relates to WP2 (Cyber risk assessment and mitigation) as it evaluates data poisoning attacks in federated settings. By altering certain inputs that are used in the training phase with a specific pattern, an adversary may later trigger malicious behaviour in the prediction phase. The observations described in this paper are very similar for both traffic sign and face recognition data, as well as different types of backdoors, and thus likely generalize well to other domains. Considering that the federated system is a distributed one, and the multitude of participants likely offers easier options for an adversary to manipulate one node, the power that a manipulated node receives over the training process is a reason for concern. Therefore, future work needs to specifically address the issue of defending against such attacks in a federated learning setting. Preprint (PDF | 2.2 MB)
Relevance for FeatureCloud: When applying artificial intelligence (AI) methods, such as graph neural networks (GNNs), in biomedicine, major challenges with regard to practical relevance are comprehensibility, interpretability, and explainability. As part of WP4 (Supervised federated machine learning) of the FeatureCloud project, this article introduces a framework for the detection of disease subnetworks by using a simple modification of the so-called “GNNexplainer program”. An integrated protein-protein interaction (PPI) knowledge graph restricts the model to learn on more reliable and biologically more meaningful trajectories compared to classical deep learning (DL) approaches. To support the EU’s open science policy, this new method is freely available to the research community on GitHub.
Relevance for FeatureCloud: To unravel new target genes with clinical relevance, this collaborative work used in vitro models to understand the regulatory functions of the oncogene CTCFL, a transcriptional factor highly expressed in ovarian cancer. We then analysed a selection of gene candidates using de novo network enrichment analysis. The resulting mechanistic candidates were further assessed regarding their prognostic potential and druggability. FeatureCloud Apps that are already available in the AI Store can be used to complete the same tasks (in a federated, privacy-preserving manner) that were applied to the data sets of this manuscript, namely preprocessing, Kaplan–Meier estimation, Cox proportional hazard model (CPH), and random survival forest (RSF).
Relevance for FeatureCloud: FeatureCloud is about the proof of feasibility and implementation of new security and privacy techniques in the medical domain, especially federated machine learning. This paper summarises the state of the art in privacy-enhancing technology for the processing of biomedical data and provides the basis for the techniques that FeatureCloud needs to support in order to ensure privacy-preserving AI in biomedicine.
Relevance for FeatureCloud: The computational methods that are reviewed and summarised in this paper, including methods to identify repurposable drugs and examining the reliability of underlying data resources, are very relevant for some apps of the FeatureCloud platform and also in line with the general objectives of the FeatureCloud project.
Relevance for FeatureCloud: Single-cell sequencing (scRNA-seq) technologies are a very powerful tool with unprecedented spatial resolution, but accompanying analyses are extremely challenging. This paper describes the novel algorithm “Scellnetor”, a network-constraint time-series clustering algorithm that allows the extraction of temporal differential gene expression network patterns (modules) that explain the difference in the regulation of two developmental trajectories. This algorithm (in its federated form) will likely enrich the FeatureCloud App Store in the future.
Relevance for FeatureCloud: Within the scope of WP5 (Unsupervised federated machine-learning), this paper presents a user-friendly tool to promote federated learning in less technically inclined communities, namely an improved federated principal component analysis algorithm. This can, for instance, be used in federated population stratification for genome-wide association studies (GWAS). Unlike previous algorithms, the eigenvectors are not shared among the participants due to the use of fully federated QR orthonormalisation. This not only increases the scalability of the proposed approach in terms of transmission costs but also improves the privacy of the algorithm. Preprint (PDF | 682 kb)
Relevance for FeatureCloud: This paper contributes to the main objective of WP3 (guideline development for the software development process, the documentation, and the machine learning process) as it presents a guideline for quality management systems (QMS) for academic organizations regarding the successful development of reusable biomedical software for research or clinical practice. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability.
Relevance for FeatureCloud: One crucial aspect of the safe integration of artificial intelligence (AI) into medical decision-making is ensuring that a human medical expert maintains control. For FeatureCloud App development and future App contributions to the FeatureCloud AI Store, research results as described in this review article are essential. The article describes a concept of causability that is a measure of whether and to what extent humans can understand a given machine explanation. We motivate causability with a clinical case from cancer research. We argue for using causability in medical artificial intelligence (AI) to develop and evaluate future human-AI interfaces.
Relevance for FeatureCloud: In this paper we describe a novel holistic approach to an automated medical decision pipeline that builds on the latest machine learning research, integrating the human-in-the-loop via an innovative, interactive, and exploration-based explainability technique called counterfactual graphs. We outline how multi-modal representations enable joint learning of a single outcome, how embeddings can be learned in a distributed manner securely and efficiently, and how to leverage counterfactual paths for intuitive explainability and causability. This approach could be used as a basis for novel medical Apps in the FeatureCloud AI Store.
Relevance for FeatureCloud: This review covers bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets, and development of therapeutic strategies. Evaluating these tools helped us a lot in designing and optimising our own apps and tools during the development stage of the FeatureCloud AI Store.
Relevance for FeatureCloud: This method review paper is mainly relevant for systems medicine experts that want to use the FeatureCloud platform in the future or contribute an app to the platform as the paper’s key findings may have far-reaching consequences for the field of active module identification. The review paper found that, to date and in essence, active module identification methods (AMIMs) do not produce biologically more meaningful candidate disease modules on widely used protein-protein interaction (PPI) networks than on random networks with the same node degrees.
Relevance for FeatureCloud: The AIMe registry is a community-driven reporting platform for AI in biomedicine that the FeatureCloud team developed. It aims to enhance the accessibility, reproducibility, and usability of biomedical AI models and allows future revisions by the scientific community. AIMe stands for “artificial intelligence in biomedical research” and consists of a user-friendly web service, which guides authors of new AIs through the AIMe standard, a generic minimal information standard that allows reporting of any biomedical AI system. As such, the paper serves one of FeatureCloud’s main objectives, namely increasing transparency and thereby maximizing societal acceptance and patient trust.
Relevance for FeatureCloud: In this first large consortium publication, we present the FeatureCloud AI Store as an all-in-one platform for federated learning (FL) in biomedical research and other applications. The AI Store removes much complexity for developers and end-users by providing an extensible collection of ready-to-use apps. We show that the federated apps produce similar results to centralized ML, scale well for a typical number of collaborators, and can be combined with Secure Multiparty Computation (SMPC), thereby making FL algorithms safely and easily applicable in biomedical and clinical environments.
Relevance for FeatureCloud: Despite tremendous advances in next-generation sequencing technology and the accumulation of large amounts of omics data, study limitations due to small sample sizes remain an issue, especially in rare disease clinical research. Technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning (ML) analysis. Directly relevant to the ML approaches used in FeatureCloud, this paper presents a meta-learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes.
Relevance for FeatureCloud: To reach the future goal of precision medicine to best tailor medical decisions, health practices, and therapies to the individual patient, network-based algorithms in biomedicine will need to be interpretable by the “human-in-the-loop” (e.g. a medical doctor), trustworthy, and reliable. In this paper, the team around Prof. Dr. Andreas Holzinger, who leads WP4 (Supervised federated machine learning), demonstrates subnetwork detection based on multi-modal node features using a new Greedy Decision Forest. This approach allows for better interpretability, which is a crucial factor in gaining the trust of biomedical experts in such algorithms.
Relevance for FeatureCloud: Federated Learning (FL) decreases privacy risks when training Machine Learning (ML) models on distributed data, as it removes the need for sharing and centralizing sensitive data, but this learning paradigm can also influence the effectiveness of the obtained prediction models. In this paper, we specifically study Neural Networks, as a powerful and popular ML model, and contrast the impact of Federated Learning on the effectiveness compared to a centralized approach – when data is aggregated at one place before processing – to assess to what extent Federated Learning is suited as a replacement.
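As a rough illustration of how a federated model can be assembled without centralising the data, the sketch below assumes a federated-averaging (FedAvg-style) aggregation step, in which locally trained parameter vectors are averaged with weights proportional to local sample counts. Whether the paper uses exactly this scheme is an assumption; the function name `federated_average` and all numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical flattened parameters from three sites after one local round.
local_models = [
    [0.2, -1.0, 0.5],
    [0.4, -0.8, 0.1],
    [0.0, -1.2, 0.3],
]
local_sizes = [100, 300, 100]  # number of training samples per site

global_model = federated_average(local_models, local_sizes)
```

Only the parameter vectors and sample counts are sent to the aggregator in this sketch; the training data stay at each site. How close the resulting model comes to a centrally trained one is the kind of question the paper examines.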
Relevance for FeatureCloud: The term “permissionless” has established itself within the context of blockchain and distributed ledger research to characterise protocols and systems that exhibit similar properties to Bitcoin, but the technology behind it is also highly relevant for the blockchain-based user-consent-management that is planned in FeatureCloud. This paper sheds light on this topic by revising research that either incorporates or defines the term permissionless and systematically exposes the properties and characteristics that its utilisation intends to capture.
Relevance for FeatureCloud: As part of WP7, Flimma addresses the issue of patient privacy while preserving scientific accuracy when transcriptomics data from multiple hospitals are analysed by implementing the state-of-the-art workflow “limma voom” in a privacy-preserving, federated manner. Patient data never leaves its source site and results are identical to those generated by “limma voom” on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.
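Why a federated analysis can reproduce pooled results exactly, whereas meta-analysis of per-site estimates generally cannot, can be seen in a much simpler setting. The sketch below fits an ordinary least-squares line from summed per-site statistics; it is an illustration under that assumption and does not implement “limma voom” or Flimma. All function names and data are hypothetical.

```python
def local_statistics(xs, ys):
    """Summary statistics a site would share instead of its raw data."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )


def aggregate(all_stats):
    """Element-wise sum of the per-site statistics."""
    return tuple(sum(values) for values in zip(*all_stats))


def solve_line(stats):
    """Closed-form least-squares slope and intercept from the statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data (assumption): two hospitals with different sample sizes.
hospital_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
hospital_b = ([4.0, 5.0], [8.1, 9.8])

federated = solve_line(
    aggregate([local_statistics(*hospital_a), local_statistics(*hospital_b)])
)
pooled = solve_line(
    local_statistics(hospital_a[0] + hospital_b[0], hospital_a[1] + hospital_b[1])
)
```

Because the shared sums are additive across sites, `federated` and `pooled` agree up to floating-point rounding, independent of how unevenly the samples are distributed.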
Relevance for FeatureCloud: This paper investigates attack scenarios and success rates for a malicious node in federated learning settings such as in FeatureCloud, considering both sequential and parallel strategies, and thus builds a basis for estimating risks from potential adversaries participating in the federated learning.
Relevance for FeatureCloud: This paper estimates how well membership inference attacks perform, that is, attacks that determine whether a data sample was used in the training process of a machine learning model. This also translates to federated learning, for example, to the question of whether there is an increased risk to privacy if honest-but-curious participants can observe a number of exchanged model parameters. Results of this attack analysis fed into the risk analysis, will contribute to the mitigation strategies in WP2, and will directly influence the implementation of federated learning in WP7 of the FeatureCloud project.
Relevance for FeatureCloud: This paper provides the first proof of principle for AI-enhanced systems medicine prediction of drug repurposing against COVID-19. It is the first paper to systematically provide network medicine AI, at this stage still centralized, which will eventually be extended into a federated, decentralized approach that will be implemented in the FeatureCloud platform dedicated to a global anti-COVID-19 network headed by the International Network Medicine Consortium.
Relevance for FeatureCloud: Within the realm of FeatureCloud’s focus on privacy-preserving technology, this paper on k-anonymity is an important contribution. K-anonymity is an approach for enabling privacy-preserving data publishing of personal, sensitive data. As a result of the anonymisation process, however, the utility of the sanitised data is generally lower than on the original data. Quantifying this utility loss is important to estimate the usefulness of the resulting datasets. In this paper, several of these utility aspects are analysed.
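A minimal sketch of the k-anonymity property and the utility trade-off described above: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The record fields, the generalisation rule, and the function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


def generalise_age(record, width=10):
    """Replace an exact age with a range such as '30-39' (loses precision)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy records (assumption): age and zip act as quasi-identifiers.
records = [
    {"age": 34, "zip": "1010", "diagnosis": "A"},
    {"age": 36, "zip": "1010", "diagnosis": "B"},
    {"age": 41, "zip": "1020", "diagnosis": "A"},
    {"age": 47, "zip": "1020", "diagnosis": "C"},
]
sanitised = [generalise_age(r) for r in records]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
sanitised_ok = is_k_anonymous(sanitised, ["age", "zip"], k=2)
```

In this toy example the raw table is not 2-anonymous, while the generalised table is; the price is that exact ages are no longer available for analysis, which is the kind of utility loss the paper sets out to quantify.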
Relevance for FeatureCloud: In this paper, we introduce the notion of causability, which extends explainability and is of great importance for future human-AI interfaces in WP4. Such interfaces for explainable AI have to map the technical explainability (which is a property of an AI, e.g. the heatmap of a neural network produced by layer-wise relevance propagation) to causability (which is a property of a human, i.e. the extent to which the technical explanation is interpretable by a human) and to answer questions of why we need a ground truth, i.e. a technical framework for understanding. Here counterfactuals of the form $P(y_x \mid x', y')$ are important, with the typical activity of “retrospection” and questions including “what-if?”. This is highly relevant for retracing the results of FeatureCloud and making them interpretable to experts within the medical domain.
Relevance for FeatureCloud: Advancements in Artificial Intelligence (AI) and Machine Learning (ML) are enabling new diagnostic capabilities. In this paper, we argue that the very first step before introducing AI/ML into diagnostic workflows is a deep understanding of how pathologists work. We developed a visualization concept, including (a) the sequence of the views observed by the pathologist (Observation Path), (b) the sequence of the spoken comments and statements of the pathologist (Dictation Path), (c) the underlying knowledge and experience of the pathologist (Knowledge Path), (d) information about the current phase of the diagnostic process, and (e) the current magnification factor of the microscope chosen by the pathologist. This is highly important for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
Relevance for FeatureCloud: In this paper, we investigate medical decision processes and the relevance of explainability in decision making. The first step for implementing decision-paths in systems is to retrace an experienced pathologist’s diagnosis-finding process. Recording a route through a landscape composed of human tissue in terms of a roadbook is one possible approach to collecting information on how diagnoses are found. Choosing the roadbook metaphor provides a simple schema that holds basic directions enriched with metadata regarding landmarks on a rally; in the context of pathology, such landmarks provide information on the decision-finding process. This is highly relevant for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.