Access topic-related background literature as well as scientific publications of the FeatureCloud consortium itself. Once new publications are available, we list them here. Open access listings are linked directly to the full text version. All others lead to the respective preprint archive, journal repository, or publisher.
FeatureCloud Publications (count: 55)
2023
Relevance for FeatureCloud: This review paper addresses the topics of nearly all work packages within the FeatureCloud project. In the medical field, the processing of large patient data sets promises great progress in personalized health care. However, such processing is strictly regulated, for example by the General Data Protection Regulation (GDPR). These regulations mandate strict data security and data protection and thus create major challenges for collecting and using data sets. Technologies such as federated learning (FL), especially when paired with differential privacy (DP) and secure multiparty computation (SMPC), aim to solve these challenges. Here, we summarize the current discussion on the legal concerns and legal compliance related to FL systems, including privacy-enhancing technologies such as DP and SMPC, in medical research.
Relevance for FeatureCloud: Unsupervised machine learning (ML) poses many challenges but also offers privacy-related advantages, both of which work package 5 (WP5) addresses. As part of WP5, this publication highlights federated learning (FL) as a privacy-aware data mining strategy that keeps private data on the owners’ machines and thereby confidential. The clients compute local models and send them to an aggregator, which computes a global model. In hybrid FL, the local parameters are additionally masked using secure aggregation, so that only the aggregated global statistics become available in clear text, not the client-specific updates. In this context, the paper investigates the data leakage of three popular algorithms for QR decomposition, namely Gram-Schmidt orthonormalization, the Householder algorithm, and Givens rotations.
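The masking idea behind secure aggregation can be illustrated with a minimal sketch. It assumes the simplest textbook variant, pairwise additive masks on integers modulo a public modulus, which is not necessarily the scheme analysed in the paper; `pairwise_masks`, `mask_update`, and `aggregate` are hypothetical names, and the values are toy data.

```python
import random

MODULUS = 2**32  # public modulus for the masked arithmetic (assumed)


def pairwise_masks(n_clients, seed=0):
    """Draw one random mask r per client pair (i, j): client i adds r and
    client j subtracts it, so all masks cancel in the aggregate.
    Illustration only: a central seed would let its holder unmask everything;
    real protocols derive each pair's mask from a secret only that pair shares."""
    rng = random.Random(seed)
    masks = [0] * n_clients
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = rng.randrange(MODULUS)
            masks[i] = (masks[i] + r) % MODULUS
            masks[j] = (masks[j] - r) % MODULUS
    return masks


def mask_update(value, mask):
    """What a client actually sends: its local integer value hidden by its mask."""
    return (value + mask) % MODULUS


def aggregate(masked_values):
    """The aggregator only sums; the pairwise masks cancel out."""
    return sum(masked_values) % MODULUS


local_counts = [12, 7, 30]  # toy local statistics of three clients
masks = pairwise_masks(len(local_counts))
sent = [mask_update(v, m) for v, m in zip(local_counts, masks)]
print(aggregate(sent))  # 49
```

Each masked value on its own looks random to the aggregator; only the sum is revealed. Whatever the aggregate itself discloses remains visible in clear text, however, so the leakage of the algorithm built on top of the aggregation still has to be analysed separately.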
Relevance for FeatureCloud: Integrating the human into the loop of federated deep learning is a key aspect of FeatureCloud’s work package 4 (WP4). This paper elucidates how domain-knowledge graphs support the explainability of federated deep learning. Federated learning is enabled by dividing the knowledge graph into relevant subnetworks, constructing an ensemble classifier, and allowing domain experts to analyze and manipulate the detected subnetworks using a purpose-built user interface (UI). Furthermore, the human-in-the-loop principle can be applied by involving experts who interact through a UI driven by explainable artificial intelligence (xAI) methods and modify the datasets to create counterfactual explanations.
Relevance for FeatureCloud: Work package 2 (WP2) of the FeatureCloud project is concerned with cyber security, threats, and mitigation. The commercial use of machine learning (ML) is spreading, while intellectual property (IP) protection of trained models remains a pressing issue. In this article, we develop a comprehensive threat model for IP in ML, categorizing attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.
Relevance for FeatureCloud: In this major consortium publication, we present the FeatureCloud App Store as an all-in-one platform for federated learning (FL) in biomedical research and other applications. The App Store removes much of the complexity for developers and end-users by providing an extensible collection of ready-to-use apps. We show that the federated apps produce results similar to those of centralized ML, scale well for a typical number of collaborators, and can be combined with secure multiparty computation (SMPC), thereby making FL algorithms safely and easily applicable in biomedical and clinical environments.
2022
- FeatureCloud Consortium (2022). Publishable Summary of the 3rd Research Period of the Project (1st July 2021 – 30th June 2022). (PDF | 134 KB)
- Bernett J et al. (2022). Robust disease module mining via enumeration of diverse prize-collecting Steiner trees. Bioinformatics 38(6): pp. 1600–1606 (PDF | 1 MB)
Relevance for FeatureCloud: Analysing large-scale biomedical data, e.g. in the Apps of the FeatureCloud App Store, requires overcoming several obstacles. Disease module mining methods (DMMMs), for example, often include non-robust steps in their workflows. This lack of robustness undermines the trustworthiness of the subnetworks they extract, e.g. from protein-protein interaction networks. To overcome this problem, this publication presents a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes.
Relevance for FeatureCloud: This paper describes the development of dsMTL (‘Federated MultiTask Learning for DataSHIELD’), a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. As such, the work falls under work packages (WPs) 4 and 5 and the overall privacy-preserving focus of the FeatureCloud project. First, the authors derive the theoretical properties of these methods and the relevant machine-learning workflows to ensure the validity of the software implementation. Second, they implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, the applicability of dsMTL for comorbidity modeling in distributed data is demonstrated.
Relevance for FeatureCloud: Anomaly detection is an important task to identify rare events such as fraud, intrusions, or medical diseases. However, it often needs to be applied to personal or otherwise sensitive data, e.g. business data. This raises concerns about the protection of such data, especially if it is to be analysed by third parties, e.g. in collaborative settings where data is collected by different entities but is to be analysed jointly to benefit from more effective models. As part of FeatureCloud WP2, this paper describes an anomaly detection task on two different benchmark datasets, in supervised, semi-supervised, and unsupervised settings. The authors federated Multi-Layer Perceptrons, Gaussian Mixture Models, and Isolation Forests, and compared them to a centralised approach. Preprint (PDF | 297 kb)
Relevance for FeatureCloud: Providing automatic compliance verification in process choreographies is crucial for any cross-organisational collaboration, including research collaborations that will use the FeatureCloud App Store in the future. An example would be the legal necessity to check at each participating hospital whether the federated models used for the data aggregation are indeed compliant with the GDPR requirements for data consent. This work deals with the question of how to verify global compliance if affected tasks are not fully visible.
Relevance for FeatureCloud: Blockchain technologies (BT), a central part of FeatureCloud work package 6 (WP6), promise exciting research directions for improving various aspects of business processes, in particular in cross-organisational settings where participants do not fully trust each other. However, while blockchain may readily provide transparency and immutability for the processes recorded on a shared ledger, these very characteristics can be problematic with regard to privacy and data protection requirements. In this paper, we address the challenges and opportunities of using BT to secure distributed processes where participants may have an incentive to make false claims or subvert pre-agreed compliance rules in their private processes.
Relevance for FeatureCloud: Tracking consents and ensuring security against manipulation are of vital importance for any data analysis platform dealing with sensitive personal information. WP6 focuses on applying blockchain technologies to provide the FeatureCloud platform with means for facilitating audits and consent tracking. This is important not only for user privacy and the protection of patient rights but also for the replication of results. Given the variety of basic technologies and designs for blockchains, a thorough analysis of their privacy aspects is essential. This paper studies the privacy properties of open blockchains such as Bitcoin, where anyone can join, validate, access, and analyse the history of all transactions since the genesis block. Although FeatureCloud uses blockchain technology solely for managing consents and study commitments (hashes of actual data), sophisticated heuristics may still be able to reveal metadata, identify correlations between different transactions, or reveal patient identities. Therefore, we also provide an extensive evaluation and comparison of different privacy-enhancing techniques employed in UTXO-based (unspent transaction output) blockchains such as Bitcoin.
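One of the best-known heuristics of this kind is the common-input-ownership heuristic: all inputs spent in the same transaction are assumed to belong to the same owner. The minimal sketch below clusters addresses under that assumption with a union-find structure; the transactions are invented toy data, the function names are hypothetical, and real chain-analysis tools combine this with many further heuristics. CoinJoin transactions are designed precisely to break this assumption.

```python
class UnionFind:
    """Disjoint-set structure used to merge addresses into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen addresses
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_addresses(transactions):
    """Common-input-ownership heuristic: all input addresses of one
    transaction are assumed to be controlled by the same owner."""
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), []).append(addr)
    return sorted(sorted(c) for c in clusters.values())


# Toy transactions, each given as its list of input addresses
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
print(cluster_addresses(txs))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Note how the link between A and C is inferred even though they never appear in the same transaction, which illustrates how correlations can propagate across a ledger.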
Relevance for FeatureCloud: This work applies to FeatureCloud’s WP6 (blockchain and user rights management). Over the past years, interest in blockchain technology and its applications has increased tremendously, accompanied, however, by serious threats that have raised concerns over user data privacy. This has resulted in a multitude of privacy-preserving techniques that offer different guarantees in terms of trust, decentralization, and traceability. CoinJoin is one of the most promising of these techniques. Using the example of cryptocurrency, this paper provides a comprehensive usability study of three main Bitcoin wallets that integrate the CoinJoin technique. Similar privacy-preserving techniques would have to be applied if blockchains were used to manage patients’ rights in research centers that use the FeatureCloud platform to analyse medical data.
Relevance for FeatureCloud: FeatureCloud’s WP6 recognises the importance of integrating as many federated machine learning nodes as possible, while being aware of privacy rights and regulations, especially GDPR and regulations for medical data, and conducts research into blockchain-based technologies for user rights management, consent, and data discovery mechanisms. This article studies users’ privacy perceptions of UTXO-based blockchains. It elaborates on a mental model of employing privacy-preserving techniques for blockchain transactions. Furthermore, it evaluates users’ awareness of blockchain privacy issues and examines their preferences towards existing privacy-enhancing solutions, i.e., add-on techniques to Bitcoin versus built-in techniques in privacy coins.
Relevance for FeatureCloud: This work investigates the challenges of moving principal component analysis (PCA), a widely used tool often serving as an initial step in machine learning and visualisation workflows, to the federated domain. It provides implementations of different federated PCA algorithms and evaluates them regarding their accuracy for high-dimensional biological data using realistic sample distributions over multiple data sites, and their ability to preserve downstream analyses. Complementing the simulated results, the authors used the FeatureCloud platform for a real-world implementation of the investigated algorithms. The corresponding App has multiple modes, including a batch mode and a train/test mode allowing for cross-validation splits. They also ran tests using the FeatureCloud ‘Testbed’, which allows simulating a federated setting by spawning multiple clients on the same machine while passing parameters through a remote relay server.
Relevance for FeatureCloud: In this article, we present a federated singular value decomposition (SVD) algorithm suited to the privacy-related and computational requirements of genome-wide association studies (GWAS). Notably, the algorithm has a transmission cost that is independent of the number of samples and only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged, and those associated with the features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable to both horizontally and vertically partitioned data. A corresponding federated App was produced for the FeatureCloud App Store and is available on the platform.
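Why the sample-side vectors never need to leave a site can be seen in a simplified setting: horizontally partitioned data (each site holds different samples, i.e. rows) and plain power iteration for the leading singular vector only. This is a minimal sketch under those assumptions, not the paper's algorithm, which computes several singular vectors and also covers vertical partitioning; all function names and matrices are hypothetical toy data.

```python
import math


def mat_vec(m, v):
    """m @ v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_t_vec(m, u):
    """m^T @ u without forming the transpose."""
    return [sum(row[c] * u_r for row, u_r in zip(m, u)) for c in range(len(m[0]))]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def federated_top_singular_vector(site_data, iterations=100):
    """Power iteration for the leading right (feature-side) singular vector
    of the row-wise stacked matrix X = [X_1; X_2; ...]."""
    n_features = len(site_data[0][0])
    v = normalize([1.0] * n_features)  # shared feature-side vector
    for _ in range(iterations):
        partials = []
        for x_i in site_data:
            u_i = mat_vec(x_i, v)  # sample-side part: never leaves the site
            partials.append(mat_t_vec(x_i, u_i))  # feature-length message only
        # Aggregator: sum_i X_i^T X_i v equals X^T X v for the stacked matrix
        v = normalize([sum(p[k] for p in partials) for k in range(n_features)])
    return v


site_a = [[2.0, 1.0], [0.0, 1.0]]
site_b = [[2.0, 0.5]]
federated = federated_top_singular_vector([site_a, site_b])
pooled = federated_top_singular_vector([site_a + site_b])  # centralised reference
print(federated, pooled)
```

Each message has one entry per feature, regardless of how many samples a site holds, which mirrors the transmission-cost property described above.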
Relevance for FeatureCloud: Aligned with FeatureCloud’s goal to enable decentralised analysis across medical and research institutions by using federated learning architectures and to improve the performance of predictive models in medicine and healthcare, this paper evaluates the efficacy of federated Random Forests, algorithms that support both classification and regression analysis. In fact, the FeatureCloud App Store already contains two such Apps (“Random Forest” and “Random Survival Forest”). The authors focus particularly on the heterogeneity within and between datasets, addressing three common challenges: (i) number of parties, (ii) sizes of datasets, and (iii) imbalanced phenotypes, evaluated on five biomedical datasets.
Relevance for FeatureCloud: As part of FeatureCloud’s WP4, this work shows how the so-called “personas method” can be adapted to support the development of human-centered artificial intelligence (AI) applications, as demonstrated in the example of a medical context. This work is – to our knowledge – the first to provide personas for AI using an openly available Personas for AI toolbox. The toolbox contains guidelines and material supporting persona development for AI as well as templates and pictures for persona visualisation. It is ready to use and freely available to the international research and development community.
Relevance for FeatureCloud: This paper elucidates the opportunity that modern information fusion provides in bridging the gap between research and practical applications in the context of future trustworthy medical artificial intelligence (AI). In this context, it aligns directly with FeatureCloud’s motivation and imperative to include ethical and legal aspects as a cross-cutting discipline, because all future AI solutions must not only be ethically responsible but also legally compliant.
Relevance for FeatureCloud: This work extends into WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud, as it investigates the so-called “miner extractable value (MEV)”. The term is not yet firmly defined and has so far been used mainly in gaming, game theory, and cryptocurrencies, but it may in fact play an important role in assessing the security and stability of the blockchain-based user consent and user rights management methods planned for FeatureCloud.
Relevance for FeatureCloud: Estimating the probability, as well as the profitability, of different attacks is of utmost importance when assessing the security and stability of prevalent blockchain-based encryption technologies. In this paper, we present a simple yet practical model to calculate the success probability of finite attacks, taking into account already contributed blocks and victims that do not give up easily. In doing so, we introduce a more fine-grained distinction between different actor types and the sides they take during an attack. With regard to cryptocurrencies, the presented model simplifies assessing the profitability of forks in practical settings and enables fast and more accurate estimates of the economic security guarantees in certain scenarios. In principle, however, our work is equally relevant for encryption methods in the medical domain and therefore applies to both WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud.
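For context, the classic baseline that such models refine is the calculation in the Bitcoin whitepaper: an attacker controls a share q of the hash power, catching up from z blocks behind is treated as a gambler's ruin, and the attacker's progress while the victim waits is modelled as Poisson. The sketch below implements only that textbook formula, not the paper's finite-attack model; the function names are hypothetical.

```python
import math


def catch_up_probability(q, z):
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind (gambler's ruin, no limit on attack duration)."""
    p = 1.0 - q
    return 1.0 if q >= p else (q / p) ** z


def double_spend_probability(q, z):
    """Nakamoto's estimate of attacker success once the victim has waited for
    z confirmations; the attacker's progress meanwhile is modelled as Poisson."""
    p = 1.0 - q
    lam = z * q / p  # expected attacker blocks while the honest chain mines z
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - catch_up_probability(q, z - k))
    return total


for z in (0, 1, 2, 5):
    print(z, round(double_spend_probability(0.1, z), 7))
```

Because `catch_up_probability` assumes the attacker never gives up, it overstates the risk of attacks that are abandoned after a finite number of blocks, which is the gap a finite-attack model addresses.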
Relevance for FeatureCloud: This work, which is directly relevant to FeatureCloud, demonstrates the principle of federated machine learning. While the FeatureCloud prototype platform was being developed, we worked on stand-alone solutions for typical medical application scenarios, including a federated genome-wide association study (GWAS) tool called “sPLINK”.
Relevance for FeatureCloud: This paper directly relates to WP2 (Cyber risk assessment and mitigation), as it evaluates data poisoning attacks in federated settings. By altering certain training inputs with a specific pattern, an adversary may later trigger malicious behaviour in the prediction phase. The observations described in this paper are very similar for traffic sign and face recognition data as well as for different types of backdoors, and thus likely generalize to other domains. Because a federated system is distributed, its many participants likely make it easier for an adversary to manipulate a single node, and the influence that such a manipulated node gains over the training process is a reason for concern. Therefore, future work needs to specifically address defences against such attacks in federated learning settings. Preprint (PDF | 2.2 MB)
Relevance for FeatureCloud: When applying artificial intelligence (AI) methods such as graph neural networks (GNNs) in biomedicine, major challenges with regard to practical relevance are comprehensibility, interpretability, and explainability. As part of WP4 (Supervised federated machine learning) of the FeatureCloud project, this article introduces a framework for the detection of disease subnetworks using a simple modification of the so-called “GNNExplainer” program. An integrated protein-protein interaction (PPI) knowledge graph restricts the model to learning on more reliable and biologically more meaningful trajectories compared to classical deep learning (DL) approaches. To support the EU’s open science policy, this new method is freely available to the research community on GitHub.
Relevance for FeatureCloud: To uncover new target genes with clinical relevance, this collaborative work used in vitro models to understand the regulatory functions of the oncogene CTCFL, a transcription factor highly expressed in ovarian cancer. We then analysed a selection of gene candidates using de novo network enrichment analysis. The resulting mechanistic candidates were further assessed regarding their prognostic potential and druggability. FeatureCloud Apps that are already available in the App Store can be used to complete the same tasks (in a federated, privacy-preserving manner) that were applied to the data sets of this manuscript, namely preprocessing, Kaplan–Meier estimation, the Cox proportional hazards model (CPH), and random survival forests (RSF).
Relevance for FeatureCloud: Clinical time-to-event studies (e.g. survival analysis after a novel treatment) depend on large sample sizes, which are often not available at a single institution and instead require datasets from multiple institutions. Hospitals, however, are legally not allowed to share their data due to strict data protection laws. This publication presents privacy-aware, federated implementations of the most widely used time-to-event algorithms in clinical trials (survival curve, cumulative hazard rate, log-rank test, and Cox proportional hazards model), based on a hybrid approach of federated learning, additive secret sharing, and differential privacy. All algorithms are accessible through the intuitive web app Partea. Federated machine learning and differential privacy are concepts that play a central role in the FeatureCloud App Store. This work therefore extends FeatureCloud’s overall goal and ambition, namely to enable privacy-aware data analysis via apps that are available through a secure federated machine learning approach across distributed institutions.
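The additive secret-sharing step can be sketched for the survival curve: each site turns its data into per-time-point counts (events and patients at risk), the counts are summed via additive shares, and the Kaplan–Meier estimate is computed from the pooled counts only. This is a minimal illustration under stated assumptions (a public time grid agreed on in advance, `random.Random` standing in for a cryptographically secure generator, no differential-privacy noise); it is not Partea's implementation, and all names and data are hypothetical.

```python
import random

MODULUS = 2**61 - 1  # public prime for share arithmetic (assumed)


def local_counts(times, events, grid):
    """Per grid time t: number of events at t, then number still at risk at t."""
    d = [sum(1 for ti, e in zip(times, events) if ti == t and e) for t in grid]
    n = [sum(1 for ti in times if ti >= t) for t in grid]
    return d + n


def share(vector, n_parties, rng):
    """Split an integer vector into n_parties random shares summing to it."""
    shares = [[rng.randrange(MODULUS) for _ in vector] for _ in range(n_parties - 1)]
    last = [(x - sum(s[k] for s in shares)) % MODULUS for k, x in enumerate(vector)]
    return shares + [last]


def secure_sum(vectors, rng):
    """Site s sends share r of its vector to site r; each site publishes only
    the sum of the shares it received, and these sums add up to the total."""
    n, length = len(vectors), len(vectors[0])
    shares = [share(v, n, rng) for v in vectors]  # shares[sender][receiver]
    published = [[sum(shares[s][r][k] for s in range(n)) % MODULUS for k in range(length)]
                 for r in range(n)]
    return [sum(p[k] for p in published) % MODULUS for k in range(length)]


def kaplan_meier(d, n):
    """S(t) = product over grid times t_j <= t of (1 - d_j / n_j)."""
    survival, s = [], 1.0
    for d_t, n_t in zip(d, n):
        if n_t > 0:
            s *= 1.0 - d_t / n_t
        survival.append(s)
    return survival


grid = [1, 2, 3, 4]  # public time grid agreed on by all sites (assumed)
sites = [([1, 2, 2, 4], [1, 1, 0, 1]),  # (follow-up times, event flags) per site
         ([2, 3, 4], [1, 0, 1])]
totals = secure_sum([local_counts(t, e, grid) for t, e in sites], random.Random(7))
print(kaplan_meier(totals[:len(grid)], totals[len(grid):]))
```

Every individual share and published partial sum is uniformly random on its own, yet the final counts, and hence the survival curve, are exactly those of the pooled data.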
Relevance for FeatureCloud: This paper investigates fundamental properties pertaining to the security and consistency of Blockchain and Distributed Ledger Technologies that support expressive transaction semantics and, in particular, allow for stateful smart contracts. In doing so, we show that a transaction property we refer to as semantic malleability can lead to a novel form of algorithmic double-spending, which can evade common detection techniques and has other unique properties that render it particularly worrisome. While this attack is primarily a concern for forkable ledger constructs that do not offer instant finality, the described technique can also be used successfully against ledgers with finality if the underlying security assumptions do not hold in practice. This means that technical failures could have catastrophic consequences for data consistency, even if the ledger state is redundantly stored across multiple sites and governed by Byzantine fault-tolerant consensus algorithms. Hence, our work strongly relates to WP2 (Cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) and helps inform our design decisions when choosing robust ledger architectures that can achieve the desired properties and the necessary security for managing user rights and consent for such sensitive data.
Relevance for FeatureCloud: FeatureCloud is about the proof of feasibility and implementation of new security and privacy techniques in the medical domain, especially federated machine learning. This paper summarises the state of the art in privacy-enhancing technology for the processing of biomedical data and provides the basis for the techniques that FeatureCloud needs to support in order to ensure privacy-preserving AI in biomedicine.
Relevance for FeatureCloud: Two hurdles that the successful application of artificial intelligence (AI) methods in healthcare still faces are (a) access to data and (b) establishing trust in the accuracy of AI-based technologies among clinicians and researchers. This paper presents a crucial proof-of-concept study that addresses these hurdles using the example of the “CACulator” app, which has been developed for the prediction of coronary artery calcification scores (CACS). The authors demonstrated that CACS prediction is feasible via both a centralised and a federated approach and that both achieve very comparable accuracy. Because the FeatureCloud App Store is based on the concept of privacy-enhancing federated learning (FL), the FeatureCloud platform was used to implement and test the app in an FL environment.
2021
- FeatureCloud Consortium (2021). Publishable Summary of the 2nd Research Period of the Project (1st May 2020 – 30th June 2021). (PDF | 160 KB)
- Galindez G et al. (2021). Lessons from the COVID-19 pandemic for advancing computational drug repurposing strategies. Nat Comp Sci, 1: pp. 33-41 (PDF | 2.0 MB)
Relevance for FeatureCloud: The computational methods that are reviewed and summarised in this paper, including methods to identify repurposable drugs and examining the reliability of underlying data resources, are very relevant for some apps of the FeatureCloud platform and also in line with the general objectives of the FeatureCloud project.
Relevance for FeatureCloud: Single-cell RNA sequencing (scRNA-seq) technologies are a very powerful tool with unprecedented single-cell resolution, but the accompanying analyses are extremely challenging. This paper describes the novel algorithm “Scellnetor”, a network-constrained time-series clustering algorithm that allows the extraction of temporal differential gene expression network patterns (modules) that explain the difference in the regulation of two developmental trajectories. This algorithm (in its federated form) will likely enrich the FeatureCloud App Store in the future.
Relevance for FeatureCloud: Within the scope of WP5 (Unsupervised federated machine learning), this paper presents a user-friendly tool to promote federated learning in less technically inclined communities, namely an improved federated principal component analysis algorithm. This can, for instance, be used for federated population stratification in genome-wide association studies (GWAS). Unlike in previous algorithms, the eigenvectors are not shared among the participants, owing to the use of fully federated QR orthonormalisation. This not only increases the scalability of the proposed approach in terms of transmission costs but also improves the privacy of the algorithm. Preprint (PDF | 682 kb)
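The principle of orthonormalising eigenvectors without sharing them can be sketched with a federated (modified) Gram-Schmidt procedure: each site holds the rows of the sample-side eigenvector matrix that belong to its own samples, computes partial dot products and squared norms locally, and only these scalars are summed across sites. This is a minimal sketch of that idea under the stated assumptions, not the paper's exact protocol; the function name and matrices are hypothetical, and the outer sums over `local_blocks` stand in for the aggregator.

```python
import math


def federated_gram_schmidt(local_blocks):
    """local_blocks[s] is site s's block of rows of a tall matrix G
    (one row per sample at that site, one column per eigenvector).
    The columns of the stacked G are orthonormalised in place;
    only scalars (partial dot products and squared norms) leave a site."""
    k = len(local_blocks[0][0])
    for j in range(k):
        for i in range(j):
            # Each site: partial dot product of columns i and j over its rows;
            # aggregator: sum of these scalars
            dot = sum(sum(row[i] * row[j] for row in block) for block in local_blocks)
            for block in local_blocks:
                for row in block:
                    row[j] -= dot * row[i]  # local update with the broadcast scalar
        # Each site: partial squared norm of column j; aggregator sums them
        norm = math.sqrt(sum(sum(row[j] ** 2 for row in block) for block in local_blocks))
        for block in local_blocks:
            for row in block:
                row[j] /= norm
    return local_blocks


site_a = [[1.0, 1.0], [0.0, 1.0]]  # two samples at site A (toy values)
site_b = [[1.0, 0.0]]              # one sample at site B
federated_gram_schmidt([site_a, site_b])
print(site_a, site_b)
```

The number of exchanged scalars depends only on the number of eigenvectors, not on the number of samples, which is consistent with the transmission-cost and privacy benefits described above.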
Relevance for FeatureCloud: This paper contributes to the main objective of WP3 (guideline development for the software development process, the documentation, and the machine learning process), as it presents a guideline for quality management systems (QMS) that helps academic organizations successfully develop reusable biomedical software for research or clinical practice. It provides a starting point for implementing, with little effort, a QMS tailored to specific needs and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability.
Relevance for FeatureCloud: One crucial aspect of the safe integration of artificial intelligence (AI) into medical decision-making is ensuring that a human medical expert maintains control. For FeatureCloud App development and future App contributions to the FeatureCloud App Store, research results such as those described in this review article are essential. The article describes the concept of causability, a measure of whether and to what extent humans can understand a given machine explanation. We motivate causability with a clinical case from cancer research and argue for using it in medical AI to develop and evaluate future human-AI interfaces.
Relevance for FeatureCloud: In this paper we describe a novel holistic approach to an automated medical decision pipeline that builds on the latest machine learning research, integrating the human-in-the-loop via an innovative, interactive, and exploration-based explainability technique called counterfactual graphs. We outline how multi-modal representations enable joint learning of a single outcome, how embeddings can be learned in a distributed manner securely and efficiently, and how to leverage counterfactual paths for intuitive explainability and causability. This approach could be used as a basis for novel medical Apps in the FeatureCloud App Store.
Relevance for FeatureCloud: This review covers bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets, and the development of therapeutic strategies. Evaluating these tools greatly helped us in designing and optimising our own apps and tools during the development stage of the FeatureCloud App Store.
Relevance for FeatureCloud: This method review paper is mainly relevant for systems medicine experts that want to use the FeatureCloud platform in the future or contribute an app to the platform as the paper’s key findings may have far-reaching consequences for the field of active module identification. The review paper found that, to date and in essence, active module identification methods (AMIMs) do not produce biologically more meaningful candidate disease modules on widely used protein-protein interaction (PPI) networks than on random networks with the same node degrees.
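Random networks with the same node degrees, as used for the comparison above, are commonly generated by repeated double edge swaps. The sketch below is a minimal version for simple undirected graphs; the review may have used other tooling (NetworkX, for instance, provides `double_edge_swap` for this purpose), and the function name and toy graph are hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomise an undirected simple graph while keeping every node's degree.
    Picks edges (a, b) and (c, d) and rewires them to (a, d) and (c, b),
    rejecting swaps that would create self-loops or duplicate edges."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, tries = 0, 0
    while done < n_swaps and tries < 100 * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared node: swap would be a no-op or create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
shuffled = degree_preserving_shuffle(edges, n_swaps=20)
print(shuffled)
```

Each accepted swap removes one edge from and adds one edge to each of the four nodes involved, so all node degrees are preserved while the wiring, and hence any biological signal in it, is destroyed.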
Relevance for FeatureCloud: The AIMe registry is a community-driven reporting platform for AI in biomedicine that the FeatureCloud team developed. It aims to enhance the accessibility, reproducibility, and usability of biomedical AI models and allows future revisions by the scientific community. AIMe stands for “artificial intelligence in biomedical research” and consists of a user-friendly web service, which guides authors of new AIs through the AIMe standard, a generic minimal information standard that allows reporting of any biomedical AI system. As such, the paper serves one of FeatureCloud’s main objectives, namely increasing transparency and thereby maximizing societal acceptance and patient trust.
Relevance for FeatureCloud: Despite tremendous advances in next-generation sequencing technology, which have led to the accumulation of large amounts of omics data, limitations due to small sample sizes remain an issue, especially in rare disease clinical research. Technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning (ML) analysis. Of direct relevance to the ML approaches used in FeatureCloud, this paper presents a meta-learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes.
Relevance for FeatureCloud: In recent years, both next-generation sequencing and artificial intelligence (AI) have grown remarkably as research areas. Their intersection has given rise to a plethora of different algorithms and applications. This article delineates tailored machine learning and systems biology approaches, and combinations thereof, that tackle the various challenges arising in the face of big data. Moreover, it provides an overview of the numerous applications of AI that aid the analysis and interpretation of next-generation sequencing data.
Relevance for FeatureCloud: To reach the future goal of precision medicine, namely to tailor medical decisions, health practices, and therapies to the individual patient, network-based algorithms in biomedicine will need to be interpretable by the “human-in-the-loop” (e.g. a medical doctor), trustworthy, and reliable. In this paper, the team around Prof. Dr. Andreas Holzinger, who leads WP4 (Supervised federated machine learning), demonstrates subnetwork detection based on multi-modal node features using a new Greedy Decision Forest. This approach allows for better interpretability, which is a crucial factor in gaining the trust of biomedical experts in such algorithms.
Relevance for FeatureCloud: Federated Learning (FL) decreases privacy risks when training Machine Learning (ML) models on distributed data, as it removes the need for sharing and centralizing sensitive data, but this learning paradigm can also influence the effectiveness of the obtained prediction models. In this paper, we specifically study Neural Networks, as a powerful and popular ML model, and contrast the impact of Federated Learning on the effectiveness compared to a centralized approach – when data is aggregated at one place before processing – to assess to what extent Federated Learning is suited as a replacement.
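A common way to aggregate neural network parameters in FL is Federated Averaging (FedAvg): the server averages the clients' parameter vectors, weighted by their local sample counts. The paper may compare other aggregation variants; this minimal sketch, with hypothetical names and toy data, uses a one-parameter-pair linear model instead of a neural network to show one property relevant to the comparison with a centralized approach: with a single local full-batch gradient step per round, FedAvg reproduces the centralized gradient step exactly.

```python
def fedavg(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    return [sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
            for k in range(len(client_params[0]))]


def local_step(w, data, lr):
    """One full-batch gradient step on half the mean squared error
    of a 1-D linear model y ~ w[0] + w[1] * x."""
    residuals = [w[0] + w[1] * x - y for x, y in data]
    g0 = sum(residuals) / len(data)
    g1 = sum(r * x for r, (x, _) in zip(residuals, data)) / len(data)
    return [w[0] - lr * g0, w[1] - lr * g1]


clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]]
w0 = [0.0, 0.0]
federated = fedavg([local_step(w0, d, 0.05) for d in clients], [len(d) for d in clients])
centralised = local_step(w0, [pt for d in clients for pt in d], 0.05)
print(federated, centralised)
```

With several local steps per round, the two no longer coincide; this drift, which tends to grow when the clients' data differ in distribution, is one reason federated models can be less effective than centrally trained ones.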
Relevance for FeatureCloud: The term “permissionless” has established itself within the context of blockchain and distributed ledger research to characterise protocols and systems that exhibit properties similar to those of Bitcoin, but the technology behind it is also highly relevant for the blockchain-based user-consent management planned in FeatureCloud. This paper sheds light on this topic by reviewing research that either incorporates or defines the term permissionless and systematically exposes the properties and characteristics that its use is intended to capture.
Relevance for FeatureCloud: As part of WP7, Flimma addresses the issue of patient privacy while preserving scientific accuracy when transcriptomics data from multiple hospitals are analysed by implementing the state-of-the-art workflow “limma voom” in a privacy-preserving, federated manner. Patient data never leaves its source site and results are identical to those generated by “limma voom” on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.
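Why a federated analysis can reproduce the pooled result exactly, rather than approximately as a meta-analysis does, can be illustrated with ordinary least squares: the fit depends on the data only through X^T X and X^T y, which are sums over samples and can therefore be computed per site and added. This is a simplified sketch of that principle for one gene and a two-column design (intercept and group), not Flimma's implementation of limma voom, which involves further steps such as precision weights and moderated statistics; the names and values are hypothetical.

```python
def local_stats(x, y):
    """Sufficient statistics of one site: X^T X (2x2) and X^T y (length 2)."""
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(2)] for i in range(2)]
    xty = [sum(row[i] * v for row, v in zip(x, y)) for i in range(2)]
    return xtx, xty


def solve_2x2(a, b):
    """Solve a @ beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def federated_ols(sites):
    """Sum the per-site statistics, then solve the pooled normal equations."""
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for x, y in sites:
        a, b = local_stats(x, y)  # computed at the site; only these numbers leave it
        for i in range(2):
            xty[i] += b[i]
            for j in range(2):
                xtx[i][j] += a[i][j]
    return solve_2x2(xtx, xty)


# Design rows are [intercept, group]; y is a toy log-expression value of one gene
site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]], [2.0, 3.1, 3.3])
site_b = ([[1.0, 0.0], [1.0, 0.0]], [2.2, 1.9])  # group 0 only: cannot fit alone
intercept, group_effect = federated_ols([site_a, site_b])
print(intercept, group_effect)
```

Site B's local X^T X is singular, so it could not estimate a group effect on its own, yet its statistics still contribute to the exact pooled estimate. This is the kind of imbalanced scenario in which per-site meta-analysis breaks down.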
2020
- FeatureCloud Consortium (2020). Publishable Summary of the 1st Research Period of the Project (1st January 2019 – 30th April 2020). (PDF | 100 KB)
- Holzinger A (2020). Explainable AI and Multi-Modal Causability in Medicine. i-com 19(3): 171–179 (PDF | 230 KB)
Relevance for FeatureCloud: The key to future human-AI interfaces is to map explainability to causability, allowing a domain expert to ask questions in order to understand why an AI came up with a result, and also to ask “what-if” questions (counterfactuals) to gain insight into the underlying independent explanatory factors of a result. Multi-modal causability is important in the medical domain because different modalities often contribute to a result. For FeatureCloud, the aspects explored in this article are crucial, as we are creating a platform on which people working in the medical domain will interact with AI.
Relevance for FeatureCloud: This paper investigates attack scenarios and success rates of a malicious node in federated learning settings such as FeatureCloud's, considering both sequential and parallel strategies. It thus builds a basis for estimating the risks posed by potential adversaries participating in federated learning. Preprint (PDF | 485 KB)
Relevance for FeatureCloud: This paper estimates how well membership inference attacks perform, i.e. attacks that determine whether a data sample was used to train a machine learning model. The question also applies to federated learning, e.g. whether privacy risk increases if honest-but-curious participants can observe a number of exchanged model parameters. The results of this attack analysis fed into the risk analysis, will contribute to the mitigation strategies in WP2, and will directly influence the implementation of federated learning in WP7 of the FeatureCloud project.
Relevance for FeatureCloud: This paper provides the first proof of principle for AI-enhanced systems medicine prediction of drug repurposing against COVID-19. It is the first paper to systematically provide network medicine AI that, while still centralized, will eventually be extended into a federated, decentralized approach. This approach will be implemented in the FeatureCloud platform for a global anti-COVID-19 network headed by the International Network Medicine Consortium.
Relevance for FeatureCloud: Given FeatureCloud’s focus on privacy-preserving technology, this paper on k-anonymity is an important contribution. K-anonymity is an approach for enabling privacy-preserving publishing of personal, sensitive data. As a result of the anonymisation process, however, the utility of the sanitised data is generally lower than that of the original data. Quantifying this utility loss is important for estimating the usefulness of the resulting datasets. In this paper, several of these utility aspects are analysed. Preprint (PDF | 1.7 MB)
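For readers new to the concept: a table is k-anonymous if every combination of quasi-identifier values (attributes such as age or postcode that could be linked to external data) is shared by at least k records. The minimal Python sketch below checks this property before and after one simple generalisation step. The column names, the ten-year age banding, and the records are hypothetical illustrations, not the method or data analysed in the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalise_age(age, width=10):
    """Replace an exact age by a band such as '30-39' (a simple generalisation step)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy records; 'diagnosis' is the sensitive attribute.
raw = [
    {"age": 34, "zip": "10115", "diagnosis": "A"},
    {"age": 37, "zip": "10115", "diagnosis": "B"},
    {"age": 52, "zip": "10117", "diagnosis": "A"},
    {"age": 58, "zip": "10117", "diagnosis": "C"},
]
qi = ["age", "zip"]

print(is_k_anonymous(raw, qi, k=2))  # False: every exact (age, zip) pair is unique

sanitised = [{**r, "age": generalise_age(r["age"])} for r in raw]
print(is_k_anonymous(sanitised, qi, k=2))  # True: two records per (age band, zip)
```

The sanitised table satisfies 2-anonymity only because exact ages were coarsened into ten-year bands. That loss of precision illustrates the kind of utility loss the paper sets out to quantify.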
Relevance for FeatureCloud: This advanced review article discusses the types of molecular data used in molecular network analyses, the analytical methods for inferring molecular networks, and the efforts to validate and visualise molecular networks. Successful applications of molecular network analysis have been reported in pulmonary arterial hypertension, coronary heart disease, diabetes mellitus, chronic lung diseases, and drug development. However, important knowledge gaps in Network Medicine remain, including the incompleteness of the molecular interactome, challenges in identifying key genes within genetic association regions, and limited applications to human diseases. As FeatureCloud taps into the Omics era and targets international consortia and projects that will analyse large amounts of data located at many clinical sites and biomedical research institutions, this article is relevant because it elucidates the importance of analysing molecular networks to decipher the underlying mechanisms of diseases rather than solely categorising diseases by symptoms and organs.
2019
Relevance for FeatureCloud: In this paper, we introduce the notion of causability, which extends explainability and is of great importance for future Human-AI interfaces in WP4. Such interfaces for explainable AI have to map technical explainability (a property of an AI, e.g. the heatmap of a neural network produced by layer-wise relevance propagation) to causability (a property of a human, i.e. the extent to which the technical explanation is interpretable by a human), and have to answer the question of why we need a ground truth, i.e. a technical framework for understanding. Here, counterfactuals of the form P(y_x | x′, y′) are important; their typical activity is “retrospection”, with questions such as “what if?”. This is highly relevant for retracing the results of FeatureCloud and making them interpretable to experts in the medical domain.
Relevance for FeatureCloud: Advancements in Artificial Intelligence (AI) and Machine Learning (ML) are enabling new diagnostic capabilities. In this paper, we argue that the very first step before introducing AI/ML into diagnostic workflows is a deep understanding of how pathologists work. We developed a visualization concept that includes (a) the sequence of views observed by the pathologist (Observation Path), (b) the sequence of the pathologist's spoken comments and statements (Dictation Path), (c) the underlying knowledge and experience of the pathologist (Knowledge Path), (d) information about the current phase of the diagnostic process, and (e) the current magnification factor of the microscope chosen by the pathologist. This is highly important for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
Relevance for FeatureCloud: In this paper, we investigate medical decision processes and the relevance of explainability in decision making. The first step in implementing decision paths in systems is to retrace an experienced pathologist’s diagnosis-finding process. Recording a route through a landscape composed of human tissue in the form of a roadbook is one possible approach to collecting information on how diagnoses are found. The roadbook metaphor provides a simple schema that holds basic directions enriched with metadata about landmarks along a rally route; in the context of pathology, such landmarks provide information on the decision-finding process. This is highly relevant for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
Background Reading
- Alcaraz N et al. (2017). De novo pathway-based biomarker identification. Nucleic Acids Res. 45(16): e151.
- Benchoufi M, Porcher R, and Ravaud P (2017). Blockchain protocols in clinical trials: Transparency and traceability of consent. F1000Research. 6: 66.
- Brisimi TS et al. (2018). Federated learning of predictive models from federated electronic health records. Int J Med Inform. 112: 59-67.
- Chen Y, Elenee Argentinis JD, and Weber G (2016). IBM Watson: How Cognitive Computing Can Be Applied to Big Data Challenges in Life Sciences Research. Clin Ther. 38(4): 688-701.
- Holzinger A et al. (2019). Interactive machine learning: experimental evidence for the human in the algorithmic loop. Applied Intelligence. 49(7): 2401-2414. doi:10.1007/s10489-018-1361-5.
- Holzinger A et al. (2017). What do we need to build explainable AI systems for the medical domain? arXiv: 1712.09923v1.
- Holzinger A (2016). Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Brain Informatics. 3(2): 119-131. doi:10.1007/s40708-016-0042-6.
- Hustinx P (2010). Privacy by design: delivering the promises. Identity in the Information Society. 3(2): 253-255.
- Konečný J, McMahan HB, Yu FX, Richtárik P, Suresh AT, Bacon D (2017). Federated learning: Strategies for improving communication efficiency. arXiv: 1610.05492v2.
- Kuo TT, Kim HE, and Ohno-Machado L (2017). Blockchain distributed ledger technologies for biomedical and health care applications. J Am Med Inform Assoc. 24(6): 1211-1220.
- List M et al. (2016). KeyPathwayMinerWeb: online multi-omics network enrichment. Nucleic Acids Res. 44(W1): W98-W104.
- Mamoshina P et al. (2018). Converging blockchain and next-generation artificial intelligence technologies to decentralize and accelerate biomedical research and healthcare. Oncotarget. 9(5): 5665-5690.
- Schmidt HHHW et al. (2018). Expert Panel Discusses the Importance of Systems Medicine. Systems Medicine. 1(1): 3-8.
- Wiwie C, Baumbach J, and Röttger R (2015). Comparing the performance of biomedical clustering methods. Nature Methods. 12(11): 1033-1038.