A Machine Learning-Based Approach for Predictive Analysis of Cost Growth in Heavy Industrial Construction Projects
Abstract
The construction industry spends billions of dollars on large-scale projects annually. These projects typically experience cost overruns, which differ across regions. For instance, average cost growth in Alberta is much higher than that of similar projects in the United States. This study focuses on extracting features that influence project cost growth at the different phases of Alberta's construction projects, namely front-end planning, detailed engineering, procurement, construction, and commissioning. We analyzed a dataset of 281 features recorded for 139 projects based in Alberta between 2003 and 2019. The data was provided by the Construction Owners Association of Alberta (COAA), the Construction Industry Institute (CII), and the University of Calgary. Although the sample is relatively small and high-dimensional for conclusive analytics, the results are promising for developing useful methodologies. We first applied LASSO regression to reduce the number of features from 281 to 21. We then reduced this set to 16 features by calculating permutation feature importance with a random forest algorithm. Once we had identified the features impacting project cost growth, we developed an interactive tool that shows permutation feature importance and partial dependence plots, and lets users edit the value of each feature and see the resulting cost prediction. However, the extracted features come primarily from the last two phases of the projects. To cover the first three phases, the domain expert recommends adding 29 features to the tool. The tool can help practitioners predict the cost growth of a new project from the data available at each phase, and see how variations in different features affect the overall project cost. The tool also provides more information about the models and about how each feature impacts project cost growth, so practitioners can invest wisely to minimize the risk of cost overruns.
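The two-stage feature selection described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's actual implementation: the data are synthetic stand-ins with the same shape as the dataset (139 rows, 281 columns), and the helper name `select_features`, the hyperparameters, and the choice to compute importance on the training data are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def select_features(X, y, n_keep, random_state=0):
    """Hypothetical helper: LASSO screening, then permutation importance ranking."""
    # Stage 1: LASSO drives most coefficients to exactly zero;
    # the features with non-zero coefficients survive the screening.
    X_scaled = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5, random_state=random_state).fit(X_scaled, y)
    lasso_idx = np.flatnonzero(lasso.coef_)

    # Stage 2: fit a random forest on the surviving features and rank them
    # by how much the model's score drops when each one is shuffled.
    X_sub = X[:, lasso_idx]
    forest = RandomForestRegressor(n_estimators=200, random_state=random_state)
    forest.fit(X_sub, y)
    result = permutation_importance(
        forest, X_sub, y, n_repeats=10, random_state=random_state
    )

    # Keep the n_keep most important features, most important first.
    order = np.argsort(result.importances_mean)[::-1][:n_keep]
    return lasso_idx, lasso_idx[order], result.importances_mean[order]


# Synthetic stand-in data (NOT the COAA/CII dataset): only the first
# three columns actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(139, 281))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=139)

lasso_idx, final_idx, scores = select_features(X, y, n_keep=3)
print("features kept by LASSO:", len(lasso_idx))
print("top features by permutation importance:", final_idx)
```

With a sample this small, the importance scores would ideally be computed on held-out data or under cross-validation. The sketch uses the training data only to stay short.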