Lockyer, Jocelyn; Nasir, Mona
2014-10-07; 2014-11-17; 2014-10-07; 2014
http://hdl.handle.net/11023/1917

Abstract

Background
Multiple choice questions are used worldwide for summative assessment in undergraduate medical education, yet only a few studies have examined their reliability using both classical test theory and item response theory. The main aim of this research was to use examination data from the summative multiple choice exams at the University of Calgary to assess the reliability of scores by applying and comparing two methods of analysis, classical test theory and item response theory, to items administered three times over a six-year period. In addition, the temporal stability of the same items was analyzed using both classical test theory and item response theory.

Methods
Three courses were chosen for the item analysis. Thirty items from each course, administered over a period of three years, were scrutinized for reliability by conducting an item analysis using SPSS and Xcalibre 4.2. Item difficulty and discrimination indices were calculated using both classical test theory and the two-parameter logistic (2PL) model of item response theory (a computational sketch of these indices is given after the abstract). Correlation coefficients were calculated for all three years to examine the relationship between the two measurement methods, and inter-year correlations across the three years were computed under both classical test theory and item response theory. Cronbach's alpha was calculated to assess the reliability of the scores, and item characteristic curves were generated using Xcalibre 4.2. Repeated measures analysis of variance was conducted on the item parameters from both classical test theory and item response theory, and test characteristic curves were generated for each year under the 2PL model and compared across the years to assess the stability of the multiple choice items over time.

Results
Difficulty was found to be adequate for half the items when classical test theory was applied and for two thirds of the items when item response theory was used. Discrimination was mostly fair to adequate with classical test theory and excellent with item response theory. The standard error of measurement varied from small to large across the item parameters of different items, and the reliability index for the test scores across the years ranged from 0.56 to 0.65. Correlation coefficients were excellent between Years 1 and 3 but only fair when Year 2 was compared with the other two years. Correlation coefficients between classical test theory and item response theory were excellent. Repeated measures analysis of variance yielded small F ratios, indicating that item difficulty and discrimination were stable across Times 1, 2 and 3. Visual inspection of the test characteristic curves yielded the same findings.

Conclusion
Multiple choice questions used by the University of Calgary over a period of three years have been shown to be fairly reliable and stable over time with different samples of students. Some differences were noted in the item analyses carried out by the two methods (i.e., classical test theory and item response theory), but the two measurement methods were largely comparable. Some items need review and revision to further improve the reliability of the exam, after which the multiple choice items may be used repeatedly without affecting their psychometric properties.
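The sketch below is an illustration only, not the thesis's own analysis code: the study used SPSS and Xcalibre 4.2, whereas this minimal Python/NumPy example merely shows how the quantities named in the Methods (classical test theory difficulty and discrimination, Cronbach's alpha, and the 2PL item characteristic curve) are commonly computed. The simulated response matrix and the parameter values a and b are hypothetical.

# Hypothetical illustration of the indices named in the Methods.
import numpy as np

rng = np.random.default_rng(0)
# responses: rows = examinees, columns = items; 1 = correct, 0 = incorrect (simulated)
responses = (rng.random((200, 30)) < 0.6).astype(int)

# Classical test theory
difficulty = responses.mean(axis=0)            # p-value: proportion answering correctly
total = responses.sum(axis=1)                  # total score per examinee
discrimination = np.array([                    # corrected item-total (point-biserial) correlation
    np.corrcoef(responses[:, i], total - responses[:, i])[0, 1]
    for i in range(responses.shape[1])
])

# Cronbach's alpha for score reliability
k = responses.shape[1]
item_var = responses.var(axis=0, ddof=1).sum()
total_var = total.var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_var / total_var)

# Item response theory: two-parameter logistic item characteristic curve
def icc_2pl(theta, a, b):
    """Probability of a correct response given ability theta,
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)
curve = icc_2pl(theta, a=1.2, b=0.0)           # one illustrative item

Summing the 2PL function over all items of a form, evaluated on a grid of ability values, yields the test characteristic curve of the kind that the Results compare across years.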
Language: eng
Rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Subjects: Medicine and Surgery; Item Response Theory; Classical Test Theory; Reliability; Temporal Stability; Item Analysis
Title: Application of Classical Test Theory and Item Response Theory to Analyze Multiple Choice Questions
Type: doctoral thesis
DOI: 10.11575/PRISM/24958