Nearly 800,000 people die by suicide worldwide each year, yet current clinical methods for assessing suicide risk predict it poorly. Studies indicate that nearly 90% of individuals who died by suicide had been assessed as low-risk within a month of their death, and 67% had a primary care visit in their final month. Suicide is multifactorial and imposes a significant public health burden, yet extensive research has not yielded confirmed biological targets or effective therapeutic interventions. This proposal integrates machine learning with multi-omic data to improve risk prediction and, in parallel, identifies drug repurposing candidates by seeking targets on which three approaches converge: molecular network analysis, signature-based repurposing, and genetic association-based methods.
Using data from more than 21,000 individuals with suicide-related behaviors in the All of Us Research Program and the Mass General Brigham Biobank, we will conduct an isoform-level transcriptome-wide association study (isoTWAS) to map tissue-specific regulatory mechanisms. This analysis will integrate GWAS findings with brain expression data from the dorsolateral prefrontal cortex and anterior cingulate cortex, drawn from the PsychENCODE and CommonMind consortia, and will examine both a narrow phenotype (suicide attempts) and a broad phenotype (including ideation). The isoTWAS framework uses multivariate elastic net regression to jointly model the expression of multiple isoforms of a gene from cis-window SNPs, enabling detection of associations that gene-level analyses miss.
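As an illustration of the association step, the sketch below computes a summary-statistic TWAS Z-score for a single isoform as a weighted burden of GWAS Z-scores, (w·z) / sqrt(wᵀRw), where w are the SNP weights from the expression model and R is the LD correlation matrix. It is a minimal sketch, assuming a summary-statistic test of this form; the function name `twas_z` and all input values are hypothetical toy examples, not study data or the isoTWAS software's API.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Weighted burden Z-score for one isoform: (w . z) / sqrt(w' R w).

    weights: SNP weights from the expression prediction model (hypothetical here)
    gwas_z:  GWAS Z-scores for the same SNPs, in the same order
    ld:      LD correlation matrix R for those SNPs (list of lists)
    """
    m = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    if variance <= 0:
        raise ValueError("w' R w must be positive (are all weights zero?)")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy inputs, invented purely for illustration.
weights = [0.5, 0.0, -0.3]
gwas_z = [2.0, 1.0, -1.5]
ld = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]

z = twas_z(weights, gwas_z, ld)
p = two_sided_p(z)
print(f"isoform Z = {z:.3f}, two-sided p = {p:.4f}")
```

In an isoform-level analysis, a test of this kind would be repeated for each isoform of a gene, with multiple-testing control applied afterwards.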
Building on our previous machine learning models, which achieved AUCs of 0.71 to 0.93 using EHR data, we will improve prediction accuracy by integrating molecular features. The model will sequentially incorporate polygenic risk scores, an isoTWAS-derived transcriptomic index, and OmicsPred-imputed molecular traits. This approach requires only a single historical blood draw for genotyping, from which we can impute over 17,000 molecular traits. We will develop three classifiers: one distinguishing suicide-related behaviors from population controls, one distinguishing them from psychiatric controls, and a multi-class model differentiating all three groups. We will tune hyperparameters by Bayesian optimization for 6-, 12-, and 18-month prediction windows and evaluate two modeling approaches: XGBoost with local feature selection methods, and a Super Learner (stacked ensemble) combining six base algorithms.
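To make the comparison between feature sets concrete, the sketch below computes AUC from its rank-based definition: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The labels and scores are invented toy values, not results from our models, and the variable names are hypothetical.

```python
def auc(labels, scores):
    """AUC as P(score of a random case > score of a random control).

    labels: 1 for cases, 0 for controls
    scores: predicted risk, higher meaning more likely to be a case
    Ties between a case and a control contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy predictions for six people (three cases, three controls).
# These numbers are made up to show the calculation only.
labels = [1, 1, 1, 0, 0, 0]
scores_ehr_only = [0.8, 0.4, 0.6, 0.5, 0.3, 0.7]
scores_ehr_plus_omics = [0.9, 0.6, 0.7, 0.5, 0.2, 0.65]

auc_ehr = auc(labels, scores_ehr_only)
auc_full = auc(labels, scores_ehr_plus_omics)
print(f"EHR only: {auc_ehr:.3f}; EHR + molecular features: {auc_full:.3f}")
```

Applying the same function to held-out predictions after each feature block is added (polygenic scores, transcriptomic index, imputed traits) would show how much each block contributes.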
For therapeutic discovery, we will implement three complementary drug repurposing strategies. Network-based analyses will identify drug candidates from gene modules and protein interaction networks, which we will construct using both dynamically regulated gene co-expression modules (identified through Independent Component Analysis) and protein-protein interactions (mapped using Root Noise Model). Signature-based methods will match compounds whose expression signatures oppose suicide-associated transcriptional patterns, while MAGMA will prioritize drugs targeting genes with strong genetic associations. This three-pronged approach, applied across both datasets, will identify compounds with converging evidence from network topology, transcriptional responses, and genetic associations.
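The signature-based idea can be sketched as follows: score each compound by how strongly its expression signature runs opposite to the disease-associated signature, then rank compounds from most to least reversing. This sketch substitutes a plain Pearson correlation for the enrichment-style connectivity scores that established signature-matching tools typically use, and every gene name, compound name, and value below is a hypothetical placeholder.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a signature has zero variance")
    return cov / (sd_x * sd_y)


def rank_by_reversal(disease_sig, drug_sigs):
    """Rank compounds by correlation with the disease signature, ascending.

    The most negative correlation (strongest reversal) comes first.
    Only genes present in both signatures are compared.
    """
    scored = []
    for drug, sig in drug_sigs.items():
        shared = sorted(g for g in disease_sig if g in sig)
        r = pearson([disease_sig[g] for g in shared], [sig[g] for g in shared])
        scored.append((drug, r))
    return sorted(scored, key=lambda item: item[1])


# Toy signatures (e.g. effect sizes per gene); all values are invented.
disease_sig = {"GENE_A": 1.5, "GENE_B": -0.8, "GENE_C": 0.9, "GENE_D": -1.2}
drug_sigs = {
    "compound_1": {"GENE_A": -1.4, "GENE_B": 0.7, "GENE_C": -1.0, "GENE_D": 1.1},
    "compound_2": {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 0.8, "GENE_D": -0.9},
}

ranked = rank_by_reversal(disease_sig, drug_sigs)
for drug, r in ranked:
    print(f"{drug}: r = {r:.3f}")
```

A compound ranked highly here would count as one line of evidence; convergence would then require the same compound or its targets to also appear in the network-based and genetic association-based results.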
Success will be measured by three outcomes: identification of tissue-specific regulatory mechanisms through isoTWAS, improved predictive performance through molecular feature integration, and discovery of drug repurposing candidates supported by multiple lines of evidence. The resulting molecular risk assessment tool will require only a single genetic sample and will integrate with existing clinical workflows, while the drug repurposing framework may identify compounds for therapeutic development.