Access topic-related background literature as well as scientific publications of the FeatureCloud consortium itself. Once new publications are available, we list them here. Open access listings are linked directly to the full text version. All others lead to the respective preprint archive, journal repository, or publisher.
FeatureCloud Publications (count: 51)
- FeatureCloud Consortium (2022). Publishable Summary of the 3rd Research Period of the Project (1st July 2021 – 30th June 2022). (PDF | 134 KB)
- Bernett J et al. (2022). Robust disease module mining via enumeration of diverse prize-collecting Steiner trees. Bioinformatics 38(6): pp. 1600–1606 (PDF | 1 MB)
Relevance for FeatureCloud: For analysis of large-scale biomedical data, e.g. in the Apps of the FeatureCloud App Store, several obstacles need to be overcome. Disease module mining methods (DMMMs), for example, often include non-robust steps in their workflows. This lack of robustness has a negative effect on the trustworthiness of the obtained subnetworks, such as protein-protein interaction networks. To overcome this problem, this publication presents a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes.
- Cao H et al. (2022). dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics – Data and text mining (online ahead of print): 1–8 (PDF | 3 MB)
Relevance for FeatureCloud: This paper describes the development of dsMTL (‘Federated MultiTask Learning for DataSHIELD’), a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. As such, the work falls under work packages (WPs) 4 and 5 and the overall privacy-preserving focus of the FeatureCloud project. First, the authors derive the theoretical properties of these methods and the relevant machine-learning workflows to ensure the validity of the software implementation. Second, they implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, the applicability of dsMTL for comorbidity modeling in distributed data is demonstrated.
- Cavallin F & Mayer R (2022). Anomaly Detection from Distributed Data Sources via Federated Learning. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications (AINA ’22), Lecture Notes in Networks and Systems (Springer, Cham) 450: pp. 317–328
Relevance for FeatureCloud: Anomaly detection is an important task to identify rare events such as fraud, intrusions, or medical diseases. However, it often needs to be applied on personal or otherwise sensitive data, e.g. business data. This gives rise to concerns regarding the protection of the sensitive data, especially if it is to be analysed by third parties, e.g. in collaborative settings, where data is collected by different entities, but shall be analysed together to benefit from more effective models. As part of FeatureCloud WP2, this paper describes an anomaly detection task on two different benchmark datasets, in supervised, semi-supervised, and unsupervised settings. The authors federated Multi-Layer Perceptrons, Gaussian Mixture Models, and Isolation Forests, and compared them to a centralised approach. Preprint (PDF | 297 kb)
- Fdhila W et al. (2022). Verifying Compliance in Process Choreographies: Foundations, Algorithms, and Implementation. Information Systems 108, Article: 101983 (PDF | 2 MB)
Relevance for FeatureCloud: Providing automatic compliance verification in process choreographies is crucial for any cross-organisational collaboration, including research collaborations that will use the FeatureCloud App Store in the future. An example would be the legal necessity to check at each participating hospital whether the federated models used for the data aggregation are indeed compliant with the GDPR requirements for data consent. This work deals with the question of how to verify global compliance if affected tasks are not fully visible.
- Fdhila W, Stifter N, and Judmayer A (2022). Challenges and Opportunities of Blockchain for Auditable Processes in the Healthcare Sector. In: Business Process Management (BPM 2022): Blockchain, Robotic Process Automation, and Central and Eastern Europe Forum. Lecture Notes in Business Information Processing, Vol. 459, Springer, Cham: pp. 68–83 (PDF | 630 kb)
Relevance for FeatureCloud: Blockchain technologies (BT), a central part of FeatureCloud work package 6 (WP6), promise exciting research directions for improving various aspects of business processes, in particular in cross-organisational settings where participants do not fully trust each other. However, while blockchain may readily provide transparency and immutability for the processes recorded on a shared ledger, these very characteristics can be problematic in regard to privacy and data protection requirements. In this paper, we address the challenges and opportunities of using BT to secure distributed processes where participants may have the incentive to make false claims or subvert pre-agreed compliance rules in their private processes.
- Ghesmati S, Fdhila W, and Weippl E (2022). SoK: How private is Bitcoin? Classification and Evaluation of Bitcoin Privacy Techniques. Proceedings of the 17th International Conference on Availability, Reliability and Security (ARES 2022), Article No. 5: pp. 1–14 (PDF | 857 kb)
Relevance for FeatureCloud: Tracking consents and providing manipulation security is of vital importance for any data analysis platform dealing with sensitive personal information. WP6 focuses on the application of blockchain technologies in order to provide the FeatureCloud platform with means for facilitating audits and consent tracking. This is not only important with respect to user privacy and protecting patient rights, but also allows the replication of results. With all the different basic technologies and designs for blockchains, a thorough analysis of their privacy aspects is of vital importance. This paper studies the privacy properties of open blockchains such as Bitcoin, where anyone can join, validate, access, and analyse the history of all transactions since the genesis block. Although FeatureCloud uses blockchain technology solely for managing consents and studies commitments (hashes of actual data), it can still be possible to use sophisticated heuristics to reveal meta-data, identify correlations between different transactions, or reveal patient identities. Therefore, we also provide an extensive evaluation and comparison of different privacy-enhancing techniques employed in UTXO-based blockchains such as Bitcoin.
- Ghesmati S, Fdhila W, and Weippl E (2022). Usability of Cryptocurrency Wallets Providing CoinJoin Transactions. Proceedings of the Usable Security and Privacy (USEC ’22) Symposium, Article: 2022-285 (PDF | 948 kb)
Relevance for FeatureCloud: This work applies to FeatureCloud’s WP6 (blockchain and user rights management). Over the past years, the interest in blockchain technology and its applications has tremendously increased, accompanied – however – by serious threats that raised concerns over user data privacy. This resulted in a multitude of privacy-preserving techniques that offer different guarantees in terms of trust, decentralization, and traceability. CoinJoin is one of the promising techniques. Using the example of cryptocurrency, this paper provides a comprehensive usability study of three main Bitcoin wallets that integrate the CoinJoin technique. Similar privacy-preserving techniques would have to be applied if blockchains are to be used to manage patients’ rights in research centers using the FeatureCloud platform to analyse medical data.
- Ghesmati S, Fdhila W, and Weippl E (2022). User-Perceived Privacy in Blockchain. Proceedings of the CoDecFin ’22 Workshop, Article: 2022-287 (PDF | 2.2 MB)
Relevance for FeatureCloud: FeatureCloud’s WP6 recognises the importance of integrating as many federated machine learning nodes as possible, while being aware of privacy rights and regulations, especially GDPR and regulations for medical data, and conducts research into blockchain-based technologies for user rights management, consent, and data discovery mechanisms. This article studies users’ privacy perceptions of UTXO-based blockchains. It elaborates on a mental model of employing privacy-preserving techniques for blockchain transactions. Furthermore, it evaluates users’ awareness of blockchain privacy issues and examines their preferences towards existing privacy-enhancing solutions, i.e., add-on techniques to Bitcoin versus built-in techniques in privacy coins.
- Hartebrodt A and Röttger R (2022). Federated horizontally partitioned principal component analysis for biomedical applications. Bioinformatics Advances 2(1), Article: vbac026 (PDF | 863 kb)
Relevance for FeatureCloud: This work investigates the challenges of moving principal component analysis (PCA), a widely used tool often serving as an initial step in machine learning and visualisation workflows, to the federated domain. It provides implementations of different federated PCA algorithms and evaluates them regarding their accuracy for high-dimensional biological data using realistic sample distributions over multiple data sites, and their ability to preserve downstream analyses. Complementing the simulated results, the authors used the FeatureCloud platform for a real-world implementation of the investigated algorithms. The corresponding App has multiple modes, including a batch mode and a train/test mode allowing for cross-validation splits. They also ran tests using the FeatureCloud ‘Testbed’, which allows simulating a federated setting by spawning multiple clients on the same machine while passing parameters through a remote relay server.
- Hartebrodt A, Röttger R, and Blumenthal D (2022). Federated singular value decomposition for high dimensional data. Preprint available on arXiv: 2205.12109 (PDF | 2.9 MB)
Relevance for FeatureCloud: In this article, we present a federated singular value decomposition (SVD) algorithm, suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost independent of the number of samples and is only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged and the vectors associated with the features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable for both horizontally and vertically partitioned data. A corresponding federated App was produced for the FeatureCloud App Store and is available on the platform.
- Hauschild A-C et al. (2022). Federated Random Forests can improve local performance of predictive models for various healthcare applications. Bioinformatics 38(8): pp. 2278–2286 (PDF | 794 kb)
Relevance for FeatureCloud: Aligned with FeatureCloud’s goal to enable decentralised analysis across medical and research institutions by using federated learning architectures and to improve the performance of predictive models in medicine and healthcare, this paper evaluates the efficacy of federated Random Forests, algorithms that support both classification and regression analysis. In fact, the FeatureCloud App Store already contains two such Apps (“Random Forest” and “Random Survival Forest”). The authors focus particularly on the heterogeneity within and between datasets, addressing three common challenges: (i) number of parties, (ii) sizes of datasets, and (iii) imbalanced phenotypes, evaluated on five biomedical datasets.
- Holzinger A et al. (2022). Personas for Artificial Intelligence (AI) an Open Source Toolbox. IEEE Access 10: pp. 23732-23747 (PDF | 2 MB)
Relevance for FeatureCloud: As part of FeatureCloud’s WP4, this work shows how the so-called “personas method” can be adapted to support the development of human-centered artificial intelligence (AI) applications, as demonstrated in the example of a medical context. This work is – to our knowledge – the first to provide personas for AI using an openly available Personas for AI toolbox. The toolbox contains guidelines and material supporting persona development for AI as well as templates and pictures for persona visualisation. It is ready to use and freely available to the international research and development community.
- Holzinger A et al. (2022). Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Information Fusion 79: pp. 263-278 (PDF | 2 MB)
Relevance for FeatureCloud: This paper elucidates the opportunity that modern information fusion provides in bridging the gap between research and practical applications in the context of future trustworthy medical artificial intelligence (AI). In this context, it aligns directly with FeatureCloud’s motivation and imperative to include ethical and legal aspects as a cross-cutting discipline, because all future AI solutions must not only be ethically responsible but also legally compliant.
- Judmayer A et al. (2022). Estimating (Miner) Extractable Value is Hard, Let’s Go Shopping! Proceedings of the CoDecFin ’22 Workshop, Article: 2021-1231 (PDF | 2.5 MB)
Relevance for FeatureCloud: This work extends into WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud as it investigates the so-called “miner extractable value (MEV)”. The term is not yet solidly defined and has so far mainly been used in the world of gaming, game theory, and cryptocurrencies, but it may in fact also play an important role in assessing the security and stability of the blockchain-based user consent and user rights management methods planned for FeatureCloud.
- Judmayer A et al. (2022). How much is the fork? Fast Probability and Profitability Calculation during Temporary Forks. Proceedings of the International Workshop on Cryptoasset Analytics (CAAW ’22), Article: 2022-359 (PDF | 1.8 MB)
Relevance for FeatureCloud: Estimating the probability, as well as the profitability, of different attacks is of utmost importance when assessing the security and stability of prevalent blockchain-based encryption technologies. In this paper, we present a simple yet practical model to calculate the success probability of finite attacks, while considering already contributed blocks and victims that do not give up easily. Hereby, we introduce a more fine-grained distinction between different actor types and the sides they take during an attack. With regard to cryptocurrencies, the presented model simplifies assessing the profitability of forks in practical settings, while also enabling fast and more accurate estimations of the economic security guarantees in certain scenarios. In principle, however, our work is equally relevant for encryption methods in the medical domain and therefore applies to both WP2 (cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) of FeatureCloud.
- Nasirigerdeh R et al. (2022). sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies. Genome Biology 23, Article number: 32 (PDF | 1.7 MB)
Relevance for FeatureCloud: Directly relevant to FeatureCloud, we here demonstrate the principle of federated machine learning. While the FeatureCloud prototype platform emerged, we have worked on stand-alone solutions for typical medical application scenarios, including a federated genome-wide association study (GWAS) tool, called “sPLINK”.
- Nuding F & Mayer R (2022). Data Poisoning in Sequential and Parallel Federated Learning. Proc. of the ACM International Workshop on Security and Privacy Analytics (IWSPA ’22): pp. 24–34
Relevance for FeatureCloud: This paper directly relates to WP2 (Cyber risk assessment and mitigation) as it evaluates data poisoning attacks in federated settings. By altering certain inputs that are used in the training phase with a specific pattern, an adversary may later trigger malicious behaviour in the prediction phase. The observations described in this paper are very similar for both traffic sign and face recognition data, as well as different types of backdoors, and thus likely generalize well to other domains. Considering that the federated system is a distributed one, and the multitude of participants likely offers easier options for an adversary to manipulate one node, the power that a manipulated node receives over the training process is a reason for concern. Therefore, future work needs to specifically address the issue of defending against such attacks in a federated learning setting. Preprint (PDF | 2.2 MB)
- Pfeifer B et al. (2022). GNN-SubNet: disease subnetwork detection with explainable Graph Neural Networks. Preprint available at bioRxiv: 475995 (PDF | 604 kb)
Relevance for FeatureCloud: When applying artificial intelligence (AI) methods, such as graph neural networks (GNNs) in biomedicine, major challenges with regard to practical relevance are comprehensibility, interpretability, and explainability. As part of WP4 (Supervised federated machine learning) of the FeatureCloud project, this article introduces a framework for the detection of disease subnetworks by using a simple modification of the so-called “GNNexplainer program”. An integrated protein-protein interaction (PPI) knowledge graph restricts the model to learn on more reliable and biologically more meaningful trajectories compared to classical deep learning (DL) approaches. To support the EU’s open science policy, this new method is freely available to the research community on GitHub.
- Salgado-Albarrán M et al. (2022). CTCFL regulates the PI3K-Akt pathway and it is a target for personalized ovarian cancer therapy. NPJ Systems Biology and Applications 8, Article number: 5 (PDF | 2.3 MB)
Relevance for FeatureCloud: To unravel new target genes with clinical relevance, this collaborative work used in vitro models to understand the regulatory functions of the oncogene CTCFL, a transcriptional factor highly expressed in ovarian cancer. We then analysed a selection of gene candidates using de novo network enrichment analysis. The resulting mechanistic candidates were further assessed regarding their prognostic potential and druggability. FeatureCloud Apps that are already available in the App Store can be used to complete the same tasks (in a federated, privacy-preserving manner) that were applied to the data sets of this manuscript, namely preprocessing, Kaplan–Meier estimation, Cox proportional hazard model (CPH), and random survival forest (RSF).
- Späth J et al. (2022). Privacy-aware multi-institutional time-to-event studies. PLOS Digit Health 1(9): e0000101 (PDF | 3 MB)
Relevance for FeatureCloud: Clinical time-to-event studies (e.g. survival analysis after a novel treatment) are dependent on large sample sizes that are often not available at a single institution and, instead, require datasets from multiple institutions. Hospitals are, however, legally not allowed to share their data due to strict data protection laws. This publication presents privacy-aware and federated implementations of the most used time-to-event algorithms (survival curve, cumulative hazard rate, log-rank test, and Cox proportional hazards model) in clinical trials, based on a hybrid approach of federated learning, additive secret sharing, and differential privacy. All algorithms are accessible through the intuitive web-app Partea. Federated machine learning and differential privacy are concepts that play a central role in the FeatureCloud App Store. Therefore, this work is an extension of FeatureCloud’s overall goal and ambition, namely to enable privacy-aware data analysis via apps that are available through a secure federated machine learning approach across distributed institutions.
- Stifter N, Judmayer A, Schindler P, and Weippl E (2022). Opportunistic Algorithmic Double-Spending: How I Learned to Stop Worrying and Love the Fork. Proceedings of the 27th European Symposium on Research in Computer Security (ESORICS 2022), Part I: pp. 46–66 (PDF | 557 kb)
Relevance for FeatureCloud: This paper investigates fundamental properties pertaining to the security and consistency of Blockchain- and Distributed-Ledger-Technologies that support expressive transaction semantics and, in particular, allow for stateful smart contracts. Hereby, we are able to show that a transaction property we refer to as semantic malleability can lead to a novel form of algorithmic double-spending, which can evade common detection techniques and has other unique properties that render it particularly worrisome. While in principle this attack is of primary concern for forkable ledger constructs that do not offer instant finality, the described technique can also be successfully used for attacks in ledgers with finality if the underlying security assumptions do not hold in practice. This means that technical failures could have catastrophic consequences on data consistency, even if the ledger state is redundantly stored across multiple sites and governed by Byzantine fault-tolerant consensus algorithms. Hence, our work strongly relates to WP2 (Cyber risk assessment and mitigation) and WP6 (blockchain and user rights management) and helps inform our design decisions for choosing robust ledger architectures that can achieve the desired properties and necessary security for managing user rights and consent for such sensitive data.
- Torkzadehmahani R et al. (2022). Privacy-preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 61(S 01): e12-e27 (PDF | 2.0 MB)
Relevance for FeatureCloud: FeatureCloud is about the proof of feasibility and implementation of new security and privacy techniques in the medical domain, especially federated machine learning. This paper summarises the state of the art in privacy-enhancing technology for the processing of biomedical data and provides the basis for the techniques that FeatureCloud needs to support in order to ensure privacy-preserving AI in biomedicine.
- Wolff J et al. (2022). Federated machine learning for a facilitated implementation of Artificial Intelligence in healthcare – a proof of concept study for the prediction of coronary artery calcification scores. J Integr Bioinform: 20220032 (PDF | 2 MB)
Relevance for FeatureCloud: Two hurdles that the successful application of artificial intelligence (AI) methods in healthcare still faces are a) access to data and b) the establishment of trust in the accuracy of AI-based technologies amongst clinicians and researchers. This paper presents a crucial proof-of-concept study that addresses these hurdles with the example of the “CACulator” app that has been developed for the prediction of coronary artery calcification scores (CACS). The authors demonstrated that CACS prediction is feasible via both a centralised approach and a federated approach and that both show very comparable accuracy. Because the FeatureCloud App Store is based on the concept of privacy-enhancing federated learning (FL), the FeatureCloud platform was used to implement and test the app in an FL environment.
- FeatureCloud Consortium (2021). Publishable Summary of the 2nd Research Period of the Project (1st May 2020 – 30th June 2021). (PDF | 160 KB)
- Galindez G et al. (2021). Lessons from the COVID-19 pandemic for advancing computational drug repurposing strategies. Nat Comp Sci, 1: pp. 33-41 (PDF | 2.0 MB)
Relevance for FeatureCloud: The computational methods that are reviewed and summarised in this paper, including methods to identify repurposable drugs and examining the reliability of underlying data resources, are very relevant for some apps of the FeatureCloud platform and also in line with the general objectives of the FeatureCloud project.
- Grønning AGB et al. (2021). Enabling single-cell trajectory network enrichment. Nat Comp Sci, 11: 3518 (PDF | 3.5 MB)
Relevance for FeatureCloud: Single-cell sequencing (scRNA-seq) technologies are a very powerful tool with unprecedented spatial resolution, but accompanying analyses are extremely challenging. This paper describes the novel algorithm “Scellnetor”, a network-constraint time-series clustering algorithm that allows the extraction of temporal differential gene expression network patterns (modules) that explain the difference in the regulation of two developmental trajectories. This algorithm (in its federated form) will likely enrich the FeatureCloud App Store in the future.
- Hartebrodt A et al. (2021). Federated Principal Component Analysis for Genome-Wide Association Studies. IEEE International Conference on Data Mining (ICDM) 2021: pp. 1090-1095
Relevance for FeatureCloud: Within the scope of WP5 (Unsupervised federated machine-learning), this paper presents a user-friendly tool to promote federated learning in less technically inclined communities, namely an improved federated principal component analysis algorithm. This can, for instance, be used in federated population stratification for genome-wide association studies (GWAS). Unlike previous algorithms, the eigenvectors are not shared among the participants due to the use of fully federated QR orthonormalisation. This not only increases the scalability of the proposed approach in terms of transmission costs but also improves the privacy of the algorithm. Preprint (PDF | 682 kb)
- Hauschild A-C et al. (2021). Fostering reproducibility, reusability, and technology transfer in health informatics. iScience Perspective, 24 (7): 102803 (PDF | 0.9 MB)
Relevance for FeatureCloud: This paper contributes to the main objective of WP3 (guideline development for the software development process, the documentation, and the machine learning process) as it presents a guideline for quality management systems (QMS) for academic organizations regarding the successful development of reusable biomedical software for research or clinical practice. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability.
- Holzinger A & Müller H (2021). Toward Human–AI Interfaces to Support Explainability and Causability in Medical AI. Computer, 54 (10): pp. 78-86 (PDF | 1.2 MB)
Relevance for FeatureCloud: One crucial aspect of the safe integration of artificial intelligence (AI) into medical decision-making is ensuring that a human medical expert maintains control. For FeatureCloud App development and future App contributions to the FeatureCloud App Store, research results as described in this review article are essential. The article describes a concept of causability that is a measure of whether and to what extent humans can understand a given machine explanation. We motivate causability with a clinical case from cancer research. We argue for using causability in medical artificial intelligence (AI) to develop and evaluate future human-AI interfaces.
- Holzinger A et al. (2021). Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI. Information Fusion, 71: pp. 28-37 (PDF | 1.4 MB)
Relevance for FeatureCloud: In this paper we describe a novel holistic approach to an automated medical decision pipeline that builds on the latest machine learning research, integrating the human-in-the-loop via an innovative, interactive, and exploration-based explainability technique called counterfactual graphs. We outline how multi-modal representations enable joint learning of a single outcome, how embeddings can be learned in a distributed manner securely and efficiently, and how to leverage counterfactual paths for intuitive explainability and causability. This approach could be used as a basis for novel medical Apps in the FeatureCloud App Store.
- Hufsky F et al. (2021). Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research. Brief Bioinform, 22 (2): 642-663 (PDF | 2.4 MB)
Relevance for FeatureCloud: This review covers bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets, and development of therapeutic strategies. Evaluating these tools helped us considerably in designing and optimising our own apps and tools during the development stage of the FeatureCloud App Store.
- Lazareva O et al. (2021). On the limits of active module identification. Brief Bioinform, 22 (5): pp. 1–11 (PDF | 1.16 MB)
Relevance for FeatureCloud: This method review paper is mainly relevant for systems medicine experts who want to use the FeatureCloud platform in the future or contribute an app to the platform, as the paper’s key findings may have far-reaching consequences for the field of active module identification. The review paper found that, to date and in essence, active module identification methods (AMIMs) do not produce biologically more meaningful candidate disease modules on widely used protein-protein interaction (PPI) networks than on random networks with the same node degrees.
- Matschinske J et al. (2021). The AIMe registry for artificial intelligence in biomedical research. Nature Methods, 18: pp. 1128–1131 (PDF | 0.9 MB)
Relevance for FeatureCloud: The AIMe registry is a community-driven reporting platform for AI in biomedicine that the FeatureCloud team developed. It aims to enhance the accessibility, reproducibility, and usability of biomedical AI models and allows future revisions by the scientific community. AIMe stands for “artificial intelligence in biomedical research” and consists of a user-friendly web service, which guides authors of new AIs through the AIMe standard, a generic minimal information standard that allows reporting of any biomedical AI system. As such, the paper serves one of FeatureCloud’s main objectives, namely increasing transparency and thereby maximizing societal acceptance and patient trust.
- Matschinske J et al. (2021). The FeatureCloud App Store for Federated Learning in Biomedicine and Beyond. Computer Science, Machine Learning, arXiv preprint, Article number: 2105.05734 (PDF | 1.9 MB)
Relevance for FeatureCloud: In this first large consortium publication, we present the FeatureCloud App Store as an all-in-one platform for federated learning (FL) in biomedical research and other applications. The App Store removes much complexity for developers and end-users by providing an extensible collection of ready-to-use apps. We show that the federated apps produce similar results to centralized ML, scale well for a typical number of collaborators, and can be combined with Secure Multiparty Computation (SMPC), thereby making FL algorithms safely and easily applicable in biomedical and clinical environments.
- Park Y, Hauschild A-C & Heider D (2021). Transfer learning compensates limited data, batch effects, and technical heterogeneity in single-cell sequencing. NAR Genom Bioinform, 3 (4): pp. 1-9 (PDF | 1.7 MB)
Relevance for FeatureCloud: Despite tremendous advances in next-generation sequencing technology and the accumulation of large amounts of omics data, study limitations due to small sample sizes remain an issue, especially in rare disease clinical research. Technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning (ML) analysis. Directly relevant to the ML approaches used in FeatureCloud, this paper presents a meta-learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes.
- Park Y, Heider D, and Hauschild A-C (2021). Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers 13(13), Article: 3148 (PDF | 1.4 MB)
Relevance for FeatureCloud: In recent years, the research areas of next-generation sequencing and artificial intelligence (AI) have both grown remarkably. Their intersection has given rise to a plethora of different algorithms and applications. This article delineates tailored machine learning and systems biology approaches, and combinations thereof, that tackle the various challenges arising in the face of big data. Moreover, it provides an overview of the numerous applications of AI that aid the analysis and interpretation of next-generation sequencing data.
- Pfeifer B, Saranti A & Holzinger A (2021). Network Module Detection from Multi-Modal Node Features with a Greedy Decision Forest for Actionable Explainable AI. Preprint available at arXiv: 2108.11674 (PDF | 1 MB)
Relevance for FeatureCloud: To reach the future goal of precision medicine, which is to best tailor medical decisions, health practices, and therapies to the individual patient, network-based algorithms in biomedicine will need to be interpretable by the “human-in-the-loop” (e.g. a medical doctor), trustworthy, and reliable. In this paper, the team around Prof. Dr. Andreas Holzinger, who leads WP4 (Supervised federated machine learning), demonstrates subnetwork detection based on multi-modal node features using a new Greedy Decision Forest. This approach allows for better interpretability, which is a crucial factor in gaining the trust of biomedical experts in such algorithms.
- Pustozerova A, Rauber A & Mayer R (2021). Training Effective Neural Networks on Structured Data with Federated Learning. Advanced Information Networking and Applications (AINA 2021), Lecture Notes in Networks and Systems, 226: online ISBN: 978-3-030-75075-6 (PDF | 0.9 MB)
Relevance for FeatureCloud: Federated Learning (FL) decreases privacy risks when training Machine Learning (ML) models on distributed data, as it removes the need for sharing and centralizing sensitive data, but this learning paradigm can also influence the effectiveness of the obtained prediction models. In this paper, we specifically study Neural Networks, a powerful and popular class of ML models, and compare their effectiveness under Federated Learning with that of a centralized approach, in which data is aggregated in one place before processing, to assess to what extent Federated Learning is suitable as a replacement.
- Stifter N et al. (2021). What is Meant by Permissionless Blockchains? Preprint available at Eprint/IACR: 2021/023 (PDF | 235 KB)
Relevance for FeatureCloud: The term “permissionless” has established itself within the context of blockchain and distributed ledger research to characterise protocols and systems that exhibit similar properties to Bitcoin. The technology behind it is also highly relevant for the blockchain-based user consent management that is planned in FeatureCloud. This paper sheds light on this topic by reviewing research that either incorporates or defines the term permissionless, and it systematically exposes the properties and characteristics that its use intends to capture.
- Zolotareva O et al. (2021). Flimma: a federated and privacy-aware tool for differential gene expression analysis. Genome Biology 22, Article number: 338 (PDF | 1.7 MB)
Relevance for FeatureCloud: As part of WP7, Flimma addresses the issue of patient privacy while preserving scientific accuracy when transcriptomics data from multiple hospitals are analysed by implementing the state-of-the-art workflow “limma voom” in a privacy-preserving, federated manner. Patient data never leaves its source site and results are identical to those generated by “limma voom” on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.
- FeatureCloud Consortium (2020). Publishable Summary of the 1st Research Period of the Project (1st January 2019 – 30th April 2020). (PDF | 100 KB)
- Holzinger A (2020). Explainable AI and Multi-Modal Causability in Medicine. i-com 19(3): 171–179 (PDF | 230 KB)
Relevance for FeatureCloud: The key for future human-AI interfaces is to map explainability with causability and to allow a domain expert to ask questions to understand why an AI came up with a result, and also to ask “what-if” questions (counterfactuals) to gain insight into the underlying independent explanatory factors of a result. A multi-modal causability is important in the medical domain because often different modalities contribute to a result. For FeatureCloud, the aspects explored in this article are absolutely crucial as we are creating a platform for the interaction of humans who work in the medical domain and who will interact with AI on the platform.
- Nuding F and Mayer R (2020). Poisoning Attacks in Federated Learning: An Evaluation on Traffic Sign Classification. CODASPY ’20: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA: pp. 168–170 (PDF | 485 KB)
Relevance for FeatureCloud: This paper investigates attack scenarios and success rates for a malicious node in federated learning settings such as in FeatureCloud, considering both sequential and parallel strategies, and thus builds a basis for estimating risks from potential adversaries participating in the federated learning.
- Pustozerova A & Mayer R (2020). Information Leaks in Federated Learning. Workshop on Decentralized IoT Systems and Security (DISS) 2020, San Diego, CA, USA. Article number: 2020.23004 (PDF | 921 KB)
Relevance for FeatureCloud: This paper estimates how well membership inference attacks perform, i.e. attacks that determine whether a data sample was used in the training process of a machine learning model. This also translates to federated learning, for example to the question of whether there is an increased risk to privacy if honest-but-curious participants can observe a number of exchanged model parameters. The results of this attack analysis fed into the risk analysis, will contribute to the mitigation strategies in WP2, and will directly influence the implementation of federated learning in WP7 of the FeatureCloud project.
- Sadegh S et al. (2020). Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing. Nature Communications, 11, Article number: 3518 (PDF | 1.5 MB)
Relevance for FeatureCloud: This paper provides the first proof of principle for AI-enhanced systems medicine prediction of drug repurposing against COVID-19. It is the first paper to systematically provide network medicine AI for this purpose, at this stage still centralized. This approach will eventually be extended into a federated, decentralized one and implemented in the FeatureCloud platform, dedicated to a global anti-COVID-19 network headed by the International Network Medicine Consortium.
- Šarčević T, Molnar D & Mayer R (2020). An Analysis of Different Notions of Effectiveness in k-Anonymity. International Conference on Privacy in Statistical Databases (PSD 2020), Lecture Notes in Computer Science 12276: pp. 121-135 (PDF | 1.7 MB)
Relevance for FeatureCloud: Within the realm of FeatureCloud’s focus on privacy-preserving technology, this paper on k-anonymity is an important contribution. K-anonymity is an approach for enabling privacy-preserving data publishing of personal, sensitive data. As a result of the anonymisation process, however, the utility of the sanitised data is generally lower than that of the original data. Quantifying this utility loss is important to estimate the usefulness of the resulting datasets. In this paper, several of these utility aspects are analysed.
- Silverman EK et al. (2020). Molecular networks in Network Medicine: Development and applications. WIREs Systems Biology and Medicine 12(6), Article: e1489 (PDF | 6.7 MB)
Relevance for FeatureCloud: This advanced review article discusses the types of molecular data that are used in molecular network analyses, the analytical methods for inferring molecular networks, and the efforts to validate and visualise molecular networks. Successful applications of molecular network analysis have been reported in pulmonary arterial hypertension, coronary heart disease, diabetes mellitus, chronic lung diseases, and drug development. Important knowledge gaps in Network Medicine include, however, incompleteness of the molecular interactome, challenges in identifying key genes within genetic association regions, and limited applications to human diseases. As FeatureCloud taps into the Omics era and aims to target international consortia and projects that will analyse large amounts of data located at a multitude of clinical sites and biomedical research institutions, this article elucidates the importance of analysing molecular networks to decipher the underlying mechanisms of diseases rather than solely categorising diseases by symptoms and organs.
- Holzinger A et al. (2019). Causability and Explainability of AI in Medicine. WIREs Data Mining Knowl Discov. 9, Article number: e1312 (PDF | 2.0 MB)
Relevance for FeatureCloud: In this paper, we introduce the notion of causability, which extends explainability and is of great importance for future Human-AI interfaces in WP4. Such interfaces for explainable AI have to map the technical explainability (which is a property of an AI, e.g. the heatmap of a neural network produced by layer-wise relevance propagation) with causability (which is a property of a human, i.e. the extent to which the technical explanation is interpretable by a human) and to answer questions of why we need a ground truth, i.e. a technical framework for understanding. Here counterfactuals of the form $P(y_x \mid x', y')$ are important, with the typical activity of “retrospection” and questions including “what-if?”. This is highly relevant to re-trace and to make the results of FeatureCloud interpretable to experts within the medical domain.
- Pohn B et al. (2019). Towards a Deeper Understanding of How a Pathologist Makes a Diagnosis: Visualization of the Diagnostic Process in Histopathology. 2019 IEEE Symposium on Computers and Communications (ISCC), Barcelona, Spain. pp. 1081-1086 (PDF | 2.5 MB)
Relevance for FeatureCloud: Advancements in Artificial Intelligence (AI) and Machine Learning (ML) are enabling new diagnostic capabilities. In this paper, we argue that the very first step before introducing AI/ML into diagnostic workflows is a deep understanding of how pathologists work. We developed a visualization concept, including (a) the sequence of the views observed by the pathologist (Observation Path), (b) the sequence of the spoken comments and statements of the pathologist (Dictation Path), (c) the underlying knowledge and experience of the pathologist (Knowledge Path), (d) information about the current phase of the diagnostic process and (e) the current magnification factor of the microscope chosen by the pathologist. This is highly important for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
- Pohn B et al. (2019). Visualization of Histopathological Decision Making Using a Roadbook Metaphor. 23rd International Conference Information Visualisation (IV), Paris, France. pp. 392-397 (PDF | 9.0 MB)
Relevance for FeatureCloud: In this paper, we investigate medical decision processes and the relevance of explainability in decision making. The first step for implementing decision paths in systems is to retrace an experienced pathologist’s diagnosis-finding process. Recording a route through a landscape composed of human tissue in terms of a roadbook is one possible approach to collecting information on how diagnoses are found. The roadbook metaphor provides a simple schema that holds basic directions enriched with metadata regarding landmarks on a rally. In the context of pathology, such landmarks provide information on the decision-finding process. This is highly relevant for explainable AI in the context of WP4 and hence extremely valuable for the whole FeatureCloud project.
- Alcaraz N et al. (2017). De novo pathway-based biomarker identification. Nucleic Acids Res. 45(16): e151.
- Benchoufi M, Porcher R, and Ravaud P (2017). Blockchain protocols in clinical trials: Transparency and traceability of consent. F1000Research. 6: 66.
- Brisimi TS et al. (2018). Federated learning of predictive models from federated electronic health records. Int J Med Inform. 112: 59-67.
- Chen Y, Elenee Argentinis JD, and Weber G (2016). IBM Watson: How Cognitive Computing Can Be Applied to Big Data Challenges in Life Sciences Research. Clin Ther. 38(4): 688-701.
- Holzinger A et al. (2019). Interactive machine learning: experimental evidence for the human in the algorithmic loop. Applied Intelligence. 49(7): 2401-2414, doi:10.1007/s10489-018-1361-5.
- Holzinger A et al. (2017). What do we need to build explainable AI systems for the medical domain? arXiv: 1712.09923v1.
- Holzinger A (2016). Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Brain Informatics. 3(2): 119-131, doi:10.1007/s40708-016-0042-6.
- Hustinx P (2010). Privacy by design: delivering the promises. Identity in the Information Society. 3(2): 253-255.
- Konečný J, McMahan HB, Yu FX, Richtárik P, Suresh AT, Bacon D (2017). Federated learning: Strategies for improving communication efficiency. arXiv: 1610.05492v2.
- Kuo TT, Kim HE, and Ohno-Machado L (2017). Blockchain distributed ledger technologies for biomedical and health care applications. J Am Med Inform Assoc. 24(6): 1211-1220.
- List M et al. (2016). KeyPathwayMinerWeb: online multi-omics network enrichment. Nucleic Acids Res. (W1): W98-W104.
- Mamoshina P et al. (2018). Converging blockchain and next-generation artificial intelligence technologies to decentralize and accelerate biomedical research and healthcare. 9(5): 5665-5690.
- Schmidt HHHW et al. (2018). Expert Panel Discusses the Importance of Systems Medicine. Systems Medicine. 1(1): 3-8.
- Wiwie C, Baumbach J, and Rottger R (2015). Comparing the performance of biomedical clustering methods. Nature Methods. 12(11): 1033-1038.