Simmonds, Robert William John; Unger, Brian W.; Elahi, Tanvire
2017-12-18; 2012
Elahi, T. (2012). A generic execution management framework for long running jobs in grid environments (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/4747
http://hdl.handle.net/1880/105748
Bibliography: p. 188-202

Over the last decade, the grid has emerged as a paradigm of distributed and collaborative computing that focuses on sharing computational and storage resources across geographical and organizational domains. Greater access to high-end computational facilities gives researchers from a broad spectrum of domains an inexpensive option for carrying out sophisticated computational experiments. However, the inherent dynamics and heterogeneity of grid environments make executing resource- and compute-intensive applications a challenging task. Increasing fault tolerance by checkpointing jobs and migrating them between resources requires significant expertise and intervention from users. Automating such tasks lets users focus more on the scientific results and less on the technical details. This thesis addresses the issues associated with managing the execution of long-running applications in grid environments. It presents a generic framework for automating the execution of such applications. The framework is driven by a set of information models that capture knowledge about the resources and the applications. Crucial to the functioning of the framework is information on two application characteristics: configurability and memory usage behaviour. Separate models are presented to encode knowledge of each of these characteristics. Using a common representation of knowledge abstracts away the heterogeneity of both the resources and the applications, allowing the framework to function without being tailored to any specific application.
Two important issues to consider in managing job execution are the amount of memory a job requires and the wait time it may experience on a specific resource. The framework presented in this thesis is equipped with mechanisms to address both. It can estimate the wait time for jobs with different resource requirements. A learning system designed as part of the framework characterizes the memory usage behaviour of application instances; it supports execution management operations by providing accurate estimates of a job's memory usage.

xii, 202 leaves : ill. ; 30 cm.
eng
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
A generic execution management framework for long running jobs in grid environments
doctoral thesis