Patten, Scott B.
Sanderson, Michael
2020-06-01
2020-06-01
2020-05-27
Sanderson, M. (2020). Predicting Death by Suicide with Administrative Health Care System Data (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
http://hdl.handle.net/1880/112134
Quantifying suicide risk with risk scales is common in clinical practice, but the performance of risk scales has been shown to be limited. Prediction models have been developed to quantify suicide risk and have been shown to outperform risk scales, but these models have not been commonly adopted in clinical practice. The original research in this thesis, presented as three manuscripts, evaluates the performance of suicide risk prediction models developed with administrative health care system data. The first two manuscripts were designed to determine the most promising prediction model class and temporal data requirements. The modeling dataset contained 3,548 persons who died by suicide and 35,480 persons who did not die by suicide between 2000 and 2016. A total of 101 predictors were selected and assembled for each of the 40 quarters prior to the quarter of death, resulting in 4,040 predictors for each person. Logistic regression, feedforward neural network, recurrent neural network, one-dimensional convolutional neural network, and gradient boosted trees model classes were compared. The gradient boosted trees model class achieved the best performance, and at most 8 quarters of data were required for optimal performance. The third manuscript applied the findings from the first two manuscripts to evaluate the performance of prediction models in a clinical setting. The prediction models quantified the risk of death by suicide within 90 days following an Emergency Department visit for parasuicide. The modeling dataset contained 268 persons who died by suicide and 33,426 persons who did not die by suicide between 2000 and 2017. The predictors were assembled for each of the 8 quarters prior to the quarter of death, resulting in 808 predictors for each person. Logistic regression and gradient boosted trees model classes were compared. The optimal gradient boosted trees model achieved promising discrimination and calibration. Following the manuscripts, this thesis discusses further research. At present, there is no clinical consensus on the preferred performance characteristics for quantifying suicide risk. The critical next step for further research is to discover the preferred performance characteristics for quantifying suicide risk and to discover whether the preferred performance characteristics can be achieved.
eng
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
suicide
prediction
administrative data
machine learning
Epidemiology
Predicting Death by Suicide with Administrative Health Care System Data
doctoral thesis
10.11575/PRISM/37885