Application of Machine Learning Algorithms to Actuarial Ratemaking within Property and Casualty Insurance

Arumugam, MohanaGowri

dc.contributor.advisor	Ambagaspitiya, Rohana Shantha
dc.contributor.author	Arumugam, MohanaGowri
dc.contributor.committeemember	Lu, Xuewen
dc.contributor.committeemember	Scollnik, David Peter Michael
dc.contributor.committeemember	Kopciuk, Karen A
dc.contributor.committeemember	Bae, Taehan
dc.date	2024-05
dc.date.accessioned	2023-09-27T17:15:30Z
dc.date.available	2023-09-27T17:15:30Z
dc.date.issued	2023-09-19
dc.description.abstract	A scientific pricing assessment is essential for maintaining viable customer relationship management (CRM) solutions for various stakeholders, including consumers, insurance intermediaries, and insurers. This thesis examines research problems surrounding the ratemaking process, including relaxing the conventional loss model assumptions of homogeneity and independence. It identifies three major research areas within multiperil insurance settings: heterogeneity in consumer behaviour on pricing decisions, loss trending under non-linearity and temporal dependencies, and loss modelling in the presence of inflationary pressure. Consumer heterogeneity in pricing decisions is examined using demand- and loyalty-based strategies. A hybrid decision tree classification framework is implemented that includes a semi-supervised learning model, a variable selection technique, and a partitioning approach with different treatment effects in order to achieve adequate risk profiling. The thesis also explores a supervised tree learning mechanism for highly imbalanced, overlapping classes with a non-linear response-predictor relationship. The two-phase classification framework is applied to an owner-occupied property portfolio from a personal insurance brokerage powered by a digital platform within the Canadian market. A hybrid three-phase tree algorithm, which includes conditional inference trees, random forest wrapped by the Boruta algorithm, and model-based recursive partitioning under a multinomial generalized linear model, is proposed to study the price sensitivity ranking of digital consumers. The empirical results suggest a well-defined segmentation of digital consumers with differential price sensitivity.
Further, with highly imbalanced and overlapping classes, a resampling technique was combined with the decision tree algorithm, providing a more scientific approach to overcoming classification problems than traditional multinomial regression. The resulting segmentation identified a high-sensitivity consumer group, for which premium rate reductions are recommended to reduce the churn rate. Other consumers are classified into an insensitive group, for which a premium rate increase is expected to have only a slight impact on the closing ratio and retention rate. Incurred insurance losses often exhibit abnormal characteristics such as temporal dependence, nonlinear relationships between dependent and independent variables, seasonal variation, and a mixture distribution resulting from the implicit claim inflation component. With such characteristics, the severity and frequency components may exhibit an altered trending pattern that changes over time and never repeats. This could have a profound impact on the experience rating model, where the estimates of the pure premium and the rate relativities of tariff classes are likely to be under- or over-estimated. A discussion of the pros and cons of the conventional loss trending approach leads to an alternative framework for the loss cost structure. The conventional pure premium is further split into base severity and severity deflator random variables using the do(·) operator from causal inference. The components are modelled separately, using predictors on different time bases, with a semiparametric generalized additive model (GAM) with spline curves. To maximize the calendar-year claim inflation effect and improve the efficiency of severity trending, this thesis refines the claim inflation estimation by adapting Taylor’s [86] separation method, which estimates the inflation index from a loss development triangle.
In the second phase of developing the severity trend model, we integrated both the base severity and the severity deflator under a new generalized mechanism known as Discount, Model, and Trend (DMT). The two-phase modelling was built to overcome the effect of the mixture distribution on the final trend estimates. A simulation study constructed using the paid claims development triangle from a Canadian Insurtech broker’s houseowners/householders portfolio was used in a severity trend movement prediction analysis. We found that the conventional framework understated the severity trends more than the separation-cum-DMT framework did. GAM provides a flexible and effective mechanism for modelling nonlinear time series in studies of the frequency loss trend. However, GAM assumes that residuals are independent and identically distributed (iid), whereas frequency loss time series can be correlated at adjacent time points. This thesis introduces a new model, the Generalized Additive Model with Seasonal Autoregressive term (GAMSAR), which accounts for temporal dependency and seasonal variation in order to improve prediction confidence intervals. Parameters of the GAMSAR model are estimated by maximum partial likelihood using a modified Newton’s method developed by Yang et al. [97], and the goodness of fit of GAM and GAMSAR is compared in a simulation study. Simulation results show that the mean estimates from GAM differ greatly from their true values. The proposed GAMSAR model proves superior, especially in the presence of seasonal variation. Further, a comparison study is conducted between GAMSAR and the Generalized Additive Model with Autoregressive term (GAMAR) developed by Yang et al. [97], and the coverage rate of the 95% confidence interval confirms that the GAMSAR model can incorporate nonlinear trend effects as well as capture the serial correlation between observations.
In the empirical analysis, a claims dataset of personal property insurance obtained from digital brokers in Canada is used to show that GAMSAR(1)12 captures the periodic dependence structure of the data more precisely than standard regression models. The proposed frequency and severity trend models support the thesis’s goal of establishing a scientific approach to pricing that is robust under different trending processes.
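The abstract refers to Taylor's separation method for estimating an inflation index from a loss development triangle. As an illustration only, the following is a minimal Python sketch of the textbook arithmetic separation method, not the adapted version developed in the thesis, whose details the abstract does not give. It assumes a square array of incremental payments per claim in which cell (i, j) equals r[j] * lam[i + j], where r is a payment pattern summing to one over development periods and lam is a calendar-period index. The function name `separation_method` and the toy figures are hypothetical.

```python
import numpy as np


def separation_method(tri):
    """Textbook arithmetic separation (illustrative sketch, not the thesis's variant).

    tri: (K+1) x (K+1) array of incremental payments per claim; only cells
    with origin index + development index <= K are treated as observed.
    Returns (r, lam): development pattern and calendar-period index.
    """
    tri = np.asarray(tri, dtype=float)
    k = tri.shape[0] - 1
    rows, cols = np.indices(tri.shape)
    diag = rows + cols
    observed = diag <= k

    # Diagonal sums d[h] and column sums v[j] over the observed triangle.
    d = np.array([tri[diag == h].sum() for h in range(k + 1)])
    v = np.where(observed, tri, 0.0).sum(axis=0)

    r = np.zeros(k + 1)
    lam = np.zeros(k + 1)
    # Work backwards from the latest diagonal, where the r's sum to one.
    for h in range(k, -1, -1):
        lam[h] = d[h] / (1.0 - r[h + 1:].sum())
        r[h] = v[h] / lam[h:].sum()
    return r, lam


# Toy triangle built from a known pattern and a 10% per-period index (made-up numbers).
r_true = np.array([0.5, 0.3, 0.2])
lam_true = np.array([100.0, 110.0, 121.0])
size = len(r_true)
toy = np.zeros((size, size))
for i in range(size):
    for j in range(size - i):
        toy[i, j] = r_true[j] * lam_true[i + j]

r_hat, lam_hat = separation_method(toy)
print(r_hat, lam_hat)
```

On a triangle that follows the multiplicative structure exactly, the recursion recovers the pattern and the index; ratios of successive `lam_hat` values then give the implied calendar-period inflation factors. Real triangles would also need handling of noise, zero cells, and development beyond the last observed period, none of which this sketch addresses.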
dc.identifier.citation	Arumugam, M. (2023). Application of machine learning algorithms to actuarial ratemaking within property and casualty insurance (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri	https://hdl.handle.net/1880/117197
dc.identifier.uri	https://doi.org/10.11575/PRISM/42039
dc.language.iso	en
dc.publisher.faculty	Graduate Studies
dc.publisher.institution	University of Calgary
dc.rights	University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject	Ratemaking
dc.subject	Property and Casualty Insurance
dc.subject	Machine Learning
dc.subject.classification	Artificial Intelligence
dc.subject.classification	Statistics
dc.title	Application of Machine Learning Algorithms to Actuarial Ratemaking within Property and Casualty Insurance
dc.type	doctoral thesis
thesis.degree.discipline	Mathematics & Statistics
thesis.degree.grantor	University of Calgary
thesis.degree.name	Doctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudent	I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.