Browsing by Author "O’Sullivan, Dylan E."
- Item (Open Access): Mutational signatures among young-onset testicular cancers (2021-11-24)
  Mealey, Nicole E.; O’Sullivan, Dylan E.; Peters, Cheryl E.; Heng, Daniel Y. C.; Brenner, Darren R.
  Abstract
  Background: Incidence of testicular cancer is highest among young adults and has been increasing dramatically for men born since 1945. This study aimed to elucidate the factors driving this trend by investigating differences in mutational signatures by age of onset.
  Methods: We retrieved somatic variant and clinical data pertaining to 135 testicular tumors from The Cancer Genome Atlas. We compared mutational load, prevalence of specific mutated genes, mutation types, and mutational signatures between age-of-onset groups (< 30 years, 30–39 years, ≥ 40 years) after adjusting for subtype. A recursively partitioned mixture model was used to characterize combinations of signatures among the young-onset cases.
  Results: Mutational load was significantly higher among older-onset tumors (p < 0.05). There were no highly prevalent driver mutations among young-onset tumors. Mutated genes and types of nucleotide mutations were not significantly different by age group (p > 0.05). Signatures 1, 8 and 29 were more common among young-onset tumors, while signatures 11 and 16 had higher prevalence among older-onset tumors (p < 0.05). Among young-onset tumors, clustering of signatures resulted in four distinct tumor classes.
  Conclusions: Signature contributions differ by age, with signatures 1, 8 and 29 more common among younger-onset tumors. While these signatures are connected with endogenous deamination of 5-methylcytosine, late replication errors and chewing tobacco, respectively, additional research is needed to further elucidate the etiology of young-onset testicular cancer. Large studies of mutational signatures among young-onset patients are required to understand epidemiologic trends as well as inform targeted prevention and treatment strategies.
- Item (Open Access): Text analysis framework for identifying mutations among non-small cell lung cancer patients from laboratory data (2024-03-11)
  Yusuf, Amman; Boyne, Devon J.; O’Sullivan, Dylan E.; Brenner, Darren R.; Cheung, Winson Y.; Mirza, Imran; Jarada, Tamer N.
  Abstract
  Background: Laboratory data can provide great value to support research aimed at reducing the incidence, prolonging survival and enhancing outcomes of cancer. Data is characterized by the information it carries and the format it holds. Data captured in Alberta’s biomarker laboratory repository is free-text, cluttered and rogue. This format limits the data's utility and hinders broader adoption and research development. Text analysis for information extraction from unstructured data can change this and lead to more complete analyses. Previous work on extracting relevant information from free-text, unstructured data employed Natural Language Processing (NLP), Machine Learning (ML), rule-based Information Extraction (IE) methods, or a hybrid of these.
  Methods: In our study, text analysis was performed on Alberta Precision Laboratories data, which consisted of 95,854 entries from the Southern Alberta Dataset (SAD) and 6944 entries from the Northern Alberta Dataset (NAD). The data covers all of Alberta and is completely population-based. Our proposed framework is built around rule-based IE methods. It incorporates syntax and lexical analyses to achieve deterministic extraction of data from biomarker laboratory data (i.e., Epidermal Growth Factor Receptor (EGFR) test results). Lexical analysis comprises data cleaning and pre-processing, conversion of Rich Text Format text into readable plain text, and normalization and tokenization of text. The framework then passes the text to the syntax analysis stage, which includes the rule-based method of extracting relevant data. Rule-based patterns of the test result are identified, and a Context Free Grammar then generates the rules of information extraction. Finally, the results are linked with the Alberta Cancer Registry to support real-world cancer research studies.
  Results: Of the original 5512 SAD entries and 5017 NAD entries that were filtered for EGFR, the framework yielded 5129 and 3388 extracted EGFR test results, respectively. An accuracy of 97.5% was achieved on a random sample of 362 tests.
  Conclusions: We presented a text analysis framework to extract specific information from unstructured clinical data. Our proposed framework has shown that it can successfully extract relevant information from EGFR test results.
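
To illustrate the kind of deterministic, rule-based extraction the second abstract describes, here is a minimal Python sketch. It is not the authors' framework: the abstract does not give the grammar or its rules, so the single regular-expression rule, the result vocabulary ("detected", "not detected", "positive", "negative"), and the function names `normalize` and `extract_egfr_result` are all assumptions made for illustration. A regular expression stands in for the Context Free Grammar, and only whitespace and case normalization are shown from the lexical stage.

```python
import re

# Hypothetical lexical rule: collapse runs of whitespace left over from
# report formatting. The real framework also handles RTF conversion and
# tokenization, which are not sketched here.
WHITESPACE = re.compile(r"\s+")

# Hypothetical extraction rule (an assumption, not the paper's grammar):
# the word "egfr", followed within the same sentence by a result keyword.
RESULT_RULE = re.compile(
    r"\begfr\b[^.]*?\b(?P<result>not detected|detected|positive|negative)\b"
)


def normalize(text: str) -> str:
    """Lexical stage (simplified): collapse whitespace and lowercase."""
    return WHITESPACE.sub(" ", text).strip().lower()


def extract_egfr_result(report: str):
    """Syntax stage (simplified): apply one rule to find an EGFR result.

    Returns "positive", "negative", or None when the rule does not match.
    """
    match = RESULT_RULE.search(normalize(report))
    if match is None:
        return None
    keyword = match.group("result")
    return "negative" if keyword in ("not detected", "negative") else "positive"


if __name__ == "__main__":
    # Made-up example strings, not entries from the Alberta datasets.
    for report in (
        "EGFR   mutation:\n NOT DETECTED.",
        "EGFR exon 19 deletion detected",
        "ALK rearrangement negative",
    ):
        print(repr(report), "->", extract_egfr_result(report))
```

Because every rule is explicit, the same input always produces the same output, which is the property the abstract refers to as deterministic extraction. A production system would need many more rules and a validation sample like the one reported in the Results.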