Access topic-related background literature as well as scientific publications of the FeatureCloud consortium itself. Once new publications are available, we list them here. Open access listings are linked directly to the full text version. All others lead to the respective preprint archive, journal repository, or publisher.
FeatureCloud Publications (count: 22)
Relevance for FeatureCloud: Despite tremendous advances in next-generation sequencing technology and the accumulation of large amounts of omics data, study limitations due to small sample sizes remain an issue, especially in rare disease clinical research. Technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning (ML) analysis. Of direct relevance for the ML approaches used in FeatureCloud, this paper presents a meta-learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes.
Relevance for FeatureCloud: The computational methods that are reviewed and summarised in this paper, including methods to identify repurposable drugs and examining the reliability of underlying data resources, are very relevant for some apps of the FeatureCloud platform and also in line with the general objectives of the FeatureCloud project.
Relevance for FeatureCloud: This paper contributes to the main objective of WP3 (guideline development for the software development process, the documentation, and the machine learning process) as it presents a guideline for quality management systems (QMS) for academic organizations regarding the successful development of reusable biomedical software for research or clinical practice. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability.
Relevance for FeatureCloud: In this paper we describe a novel holistic approach to an automated medical decision pipeline that builds on the latest machine learning research, integrating the human-in-the-loop via an innovative, interactive, and exploration-based explainability technique called counterfactual graphs. We outline how multi-modal representations enable joint learning of a single outcome, how embeddings can be learned in a distributed manner securely and efficiently, and how to leverage counterfactual paths for intuitive explainability and causability. This approach could be used as a basis for novel medical Apps in the FeatureCloud AI Store.
Relevance for FeatureCloud: This review covers bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets, and the development of therapeutic strategies. Evaluating these tools helped us considerably in designing and optimising our own apps and tools during the development stage of the FeatureCloud AI Store.
Relevance for FeatureCloud: This method review paper is mainly relevant for systems medicine experts who want to use the FeatureCloud platform in the future or contribute an app to the platform, as the paper’s key findings may have far-reaching consequences for the field of active module identification. The review found that, to date, active module identification methods (AMIMs) do not produce biologically more meaningful candidate disease modules on widely used protein-protein interaction (PPI) networks than on random networks with the same node degrees.
Relevance for FeatureCloud: The AIMe registry is a community-driven reporting platform for AI in biomedicine that the FeatureCloud team developed. It aims to enhance the accessibility, reproducibility, and usability of biomedical AI models and allows future revisions by the scientific community. AIMe stands for “artificial intelligence in biomedical research” and consists of a user-friendly web service, which guides authors of new AIs through the AIMe standard, a generic minimal information standard that allows reporting of any biomedical AI system. As such, the paper serves one of FeatureCloud’s main objectives, namely increasing transparency and thereby maximizing societal acceptance and patient trust.
Relevance for FeatureCloud: In this first large consortium publication, we present the FeatureCloud AI Store as an all-in-one platform for federated learning (FL) in biomedical research and other applications. The AI Store removes much complexity for developers and end-users by providing an extensible collection of ready-to-use apps. We show that the federated apps produce similar results to centralized ML, scale well for a typical number of collaborators, and can be combined with Secure Multiparty Computation (SMPC), thereby making FL algorithms safely and easily applicable in biomedical and clinical environments.
Relevance for FeatureCloud: Federated Learning (FL) decreases privacy risks when training Machine Learning (ML) models on distributed data, as it removes the need for sharing and centralizing sensitive data, but this learning paradigm can also influence the effectiveness of the obtained prediction models. In this paper, we specifically study Neural Networks, as a powerful and popular ML model, and contrast the impact of Federated Learning on the effectiveness compared to a centralized approach – when data is aggregated at one place before processing – to assess to what extent Federated Learning is suited as a replacement.
Relevance for FeatureCloud: The term “permissionless” has established itself within the context of blockchain and distributed ledger research to characterise protocols and systems that exhibit similar properties to Bitcoin, but the technology behind it is also highly relevant for the blockchain-based user-consent-management that is planned in FeatureCloud. This paper sheds light on this topic by revising research that either incorporates or defines the term permissionless and systematically exposes the properties and characteristics that its utilisation intends to capture.
Relevance for FeatureCloud: Directly relevant for FeatureCloud, we here demonstrate the principle of federated machine learning. While the FeatureCloud prototype platform emerged, we have worked on stand-alone solutions for typical medical application scenarios, including a federated genome-wide association study (GWAS) tool, called “sPLINK”.
Relevance for FeatureCloud: Single-cell sequencing (scRNA-seq) technologies are a very powerful tool with unprecedented spatial resolution, but the accompanying analyses are extremely challenging. This paper describes the novel algorithm “Scellnetor”, a network-constraint time-series clustering algorithm that allows extraction of temporal differential gene expression network patterns (modules) that explain the difference in the regulation of two developmental trajectories. This algorithm (in its federated form) will likely enrich the FeatureCloud App Store in the future.
Relevance for FeatureCloud: This paper investigates attack scenarios and success rates for a malicious node in federated learning settings such as in FeatureCloud, considering both sequential and parallel strategies, and thus builds a basis for estimating risks from potential adversaries participating in the federated learning.
Relevance for FeatureCloud: This paper estimates how well membership inference attacks work, i.e. attacks that determine whether a data sample was used in the training of a machine learning model. This also translates to federated learning, for example to the question of whether there is an increased risk to privacy if honest-but-curious participants can observe a number of exchanged model parameters. Results of this attack analysis fed into the risk analysis, will contribute to the mitigation strategies in WP2, and will directly influence the implementation of federated learning in WP7 of the FeatureCloud project.
Relevance for FeatureCloud: This paper provides the first proof of principle for AI-enhanced systems medicine prediction of drug repurposing against COVID-19. It is the first paper to systematically provide network medicine AI, which is still centralized here but will eventually be extended into a federated, decentralized approach to be implemented in the FeatureCloud platform dedicated to a global anti-COVID-19 network headed by the International Network Medicine Consortium.
Relevance for FeatureCloud: Within the realm of FeatureCloud’s focus on privacy-preserving technology, this paper on k-anonymity is an important contribution. K-anonymity is an approach for enabling privacy-preserving data publishing of personal, sensitive data. As a result of the anonymisation process, however, the utility of the sanitised data is generally lower than on the original data. Quantifying this utility loss is important to estimate the usefulness of the resulting datasets. In this paper, several of these utility aspects are analysed.
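To make the trade-off between anonymity and utility concrete, the following minimal Python sketch checks the k-anonymity level of a toy table before and after generalising one attribute. It is an illustration only and is not taken from the paper: the records, the choice of `age` and `zip` as quasi-identifiers, and the helper names `k_anonymity` and `generalise_age` are all assumptions made for this example.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous.

    This is the size of the smallest group of records that share the
    same combination of quasi-identifier values.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalise_age(record, width=10):
    """Replace the exact age with an age band (hypothetical generalisation)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data, invented for illustration.
records = [
    {"age": 34, "zip": "1010", "diagnosis": "A"},
    {"age": 37, "zip": "1010", "diagnosis": "B"},
    {"age": 52, "zip": "1020", "diagnosis": "A"},
    {"age": 58, "zip": "1020", "diagnosis": "C"},
]
quasi_identifiers = ["age", "zip"]

k_raw = k_anonymity(records, quasi_identifiers)
generalised = [generalise_age(r) for r in records]
k_generalised = k_anonymity(generalised, quasi_identifiers)

# A crude utility indicator: how many distinct age values survive.
distinct_ages_raw = len({r["age"] for r in records})
distinct_ages_generalised = len({r["age"] for r in generalised})
```

In this toy table, generalising ages into ten-year bands raises k from 1 to 2, while the number of distinct age values falls from 4 to 2. This illustrates the kind of utility loss that has to be quantified after anonymisation.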
Relevance for FeatureCloud: FeatureCloud is about the proof of feasibility and implementation of new security and privacy techniques in the medical domain, especially federated machine learning. This paper summarises the state of the art in privacy-enhancing technology for the processing of biomedical data and provides the basis for the techniques that FeatureCloud needs to support in order to ensure privacy-preserving AI in biomedicine.
Relevance for FeatureCloud: As part of WP7, Flimma addresses the issue of patient privacy while preserving scientific accuracy when transcriptomics data from multiple hospitals are analysed by implementing the state-of-the-art workflow “limma voom” in a privacy-preserving, federated manner. Patient data never leaves its source site and results are identical to those generated by “limma voom” on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.
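The claim that a federated analysis can reproduce pooled results exactly can be illustrated with a much simpler model than “limma voom”. The sketch below is not Flimma's implementation. It assumes a one-predictor ordinary least squares fit in which each site shares only a handful of summed statistics, and the function names and data are hypothetical.

```python
def local_stats(xs, ys):
    """Summary statistics a site would share instead of raw samples."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
    )


def aggregate(all_stats):
    """Element-wise sum of the per-site statistics."""
    return tuple(sum(values) for values in zip(*all_stats))


def fit_from_stats(stats):
    """Ordinary least squares slope and intercept from summed statistics."""
    n, sx, sy, sxy, sxx = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Two imbalanced sites with invented measurements.
site_a = ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
site_b = ([7.0, 8.0], [13.9, 16.1])

# Federated fit: only the summed statistics leave each site.
federated = fit_from_stats(aggregate([local_stats(*site_a), local_stats(*site_b)]))

# Reference fit on the combined data, which the federated setting avoids.
pooled = fit_from_stats(local_stats(site_a[0] + site_b[0], site_a[1] + site_b[1]))
```

Because the least squares solution depends on the data only through these sums, the federated and pooled fits agree up to floating-point rounding, regardless of how unevenly the samples are split across sites. A meta-analysis that averages per-site estimates does not have this property.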
Relevance for FeatureCloud: In this paper, we introduce the notion of causability, which extends explainability and is of great importance for future Human-AI interfaces in WP4. Such interfaces for explainable AI have to map technical explainability (a property of an AI, e.g. the heatmap of a neural network produced by layer-wise relevance propagation) to causability (a property of a human, i.e. the extent to which the technical explanation is interpretable by a human) and to answer questions of why we need a ground truth, i.e. a technical framework for understanding. Here counterfactuals $P(y_x \mid x', y')$ are important, with the typical activity of “retrospection” and questions including “what-if?”. This is highly relevant to re-trace and to make the results of FeatureCloud interpretable to experts within the medical domain.
Relevance for FeatureCloud: Advancements in Artificial Intelligence (AI) and Machine Learning (ML) are enabling new diagnostic capabilities. In this paper, we argue that the very first step before introducing AI/ML into diagnostic workflows is a deep understanding of how pathologists work. We developed a visualization concept, including (a) the sequence of the views observed by the pathologist (Observation Path), (b) the sequence of the spoken comments and statements of the pathologist (Dictation Path), (c) the underlying knowledge and experience of the pathologist (Knowledge Path), (d) information about the current phase of the diagnostic process and (e) the current magnification factor of the microscope chosen by the pathologist. This is highly important for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
Relevance for FeatureCloud: In this paper, we investigate medical decision processes and the relevance of explainability in decision making. The first step for implementing decision paths in systems is to retrace an experienced pathologist’s diagnosis-finding process. Recording a route through a landscape composed of human tissue in terms of a roadbook is one possible approach to collect information on how diagnoses are found. Choosing the roadbook metaphor provides a simple schema that holds basic directions enriched with metadata regarding landmarks on a rally. In the context of pathology, such landmarks provide information on the decision-finding process. This is highly relevant for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.