Optimal Route Planning for Parking Enforcement Patrol using Reinforcement Learning
Abstract
As city populations grow, the need for sustainable and feasible parking enforcement solutions becomes increasingly important. A parking enforcement solution involves finding an optimal patrol policy for enforcement agents. A patrol policy is a plan for how patrols should be conducted in areas with potential violations, with the goal of deterring violations and improving the parking agency's compliance rate. Given a comprehensive database of the distribution of violations across parking locations, an optimization model can be used to find optimal patrol policies for different agents. However, when no such complete database exists and drivers frequently change their attitude towards paying parking fees, the effectiveness of a parking enforcement solution depends on how well it addresses the uncertainty in the number of violations at each location. The effectiveness and efficiency of patrol enforcement algorithms have been discussed in the literature, but the solution proposed in this study tackles the problem with learning algorithms, which were rarely used in previous work. We consider the problem of finding an optimal routing plan for parking enforcement patrol vehicles when only partial data about the distribution of violations over the city is available. The decision maker faces the well-known exploration-exploitation trade-off: choosing the best route given the current information, or trying new routes to gather data on potentially better ones. Without a learning-based algorithm, a patrol policy is optimal only with respect to the current state of the environment's features; if those features change, the previous solution is no longer optimal. A learning-based algorithm aims to learn the dynamic features of the environment and construct the optimal patrol policy according to the learned features.
In this thesis, we first describe the problem and the different approaches to it; we then propose a multi-armed bandit formulation and use reinforcement learning to sequentially generate routes that maximize the system's expected reward. Next, we analyze the performance of our framework against the patrol policies currently used in the city. An interactive dashboard was developed and used throughout the study to analyze the spatial distribution of violations across the city. This tool can be adapted by any agency interested in the spatial analysis of violation patterns. Our analytical findings indicate that implementing this framework could increase the number of violations observed, which in turn would improve the agency's compliance rate. In the final section, we discuss the contributions and expected outcomes of the study in detail.
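
To make the multi-armed bandit view of route selection concrete, the following is a minimal sketch in which each candidate patrol route is treated as one arm. It is an illustration under stated assumptions, not the thesis's method: the abstract does not say which bandit algorithm is used, so UCB1 is swapped in here as a standard choice. The class name `UCB1RouteSelector`, the function `simulate`, the per-route violation rates, and the 0/1 reward (whether a patrol on that route observed a violation) are all hypothetical.

```python
import math
import random


class UCB1RouteSelector:
    """UCB1 bandit where each arm is one candidate patrol route (illustrative)."""

    def __init__(self, n_routes):
        self.counts = [0] * n_routes   # number of patrols run on each route
        self.means = [0.0] * n_routes  # running mean reward per route

    def select(self):
        # Exploration: try every route once before using confidence bounds.
        for route, n in enumerate(self.counts):
            if n == 0:
                return route
        total = sum(self.counts)
        # Mean reward plus an uncertainty bonus that shrinks as a route
        # is patrolled more often.
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, route, reward):
        # Incremental update of the running mean for the chosen route.
        self.counts[route] += 1
        self.means[route] += (reward - self.means[route]) / self.counts[route]


def simulate(true_rates, n_patrols, seed=0):
    """Run the selector against hypothetical, fixed per-route violation rates."""
    rng = random.Random(seed)
    selector = UCB1RouteSelector(len(true_rates))
    for _ in range(n_patrols):
        route = selector.select()
        # Assumed reward model: 1 if the patrol observed a violation, else 0.
        reward = 1.0 if rng.random() < true_rates[route] else 0.0
        selector.update(route, reward)
    return selector


selector = simulate(true_rates=[0.1, 0.5, 0.9], n_patrols=3000)
print(selector.counts)
```

In this sketch the uncertainty bonus handles the exploration-exploitation trade-off: rarely patrolled routes keep a large bonus and are revisited occasionally, while routes with a high observed violation rate are chosen most often. The fixed rates are a simplification; the setting described above, where driver behavior changes over time, would call for a variant that discounts or windows older observations.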