Browsing by Author "Elahi, Tanvire"
Item Open Access
A generic execution management framework for long running jobs in grid environments (2012)
Elahi, Tanvire; Simmonds, Robert William John; Unger, Brian W.
Over the last decade, the grid has emerged as a paradigm of distributed and collaborative computing focusing on the sharing of computational and storage resources spanning geographical and organizational domains. Greater access to high-end computational facilities provides researchers from a broad spectrum of domains an inexpensive option for carrying out sophisticated computational experiments. However, the inherent dynamics and heterogeneity of grid environments make the execution of resource- and compute-intensive applications a challenging task. Increasing fault tolerance by checkpointing and migrating jobs between resources requires significant expertise and intervention from users. Automation of such tasks can allow them to focus more on the scientific results and less on the technical details. This thesis addresses the issues associated with managing the execution of long-running applications in grid environments. It presents a generic framework for automating the execution of such applications. The framework is driven by a set of information models that capture knowledge about the resources and the applications. Crucial to the functioning of the framework is information on two application characteristics: configurability and memory usage behaviour. Separate models are presented to encode knowledge of both of these characteristics. Use of a common representation of knowledge abstracts the heterogeneity of both the resources and the applications and makes the framework functional without the need to be tailored to any specific application. Two important issues that need to be considered in managing job execution are the amount of memory required by the job and the wait time the job may experience on a specific resource. The framework presented in this thesis is equipped with mechanisms to address both of these issues. It is able to estimate the wait time for jobs with different resource requirements. A learning system has been designed as part of the framework to characterize the memory usage behaviour of application instances. The system facilitates execution management operations by providing accurate estimates of a job's memory usage.

Item Open Access
A Generic Execution Management Framework for Scientific Applications (2010-07-09T16:17:31Z)
Elahi, Tanvire; Kiddle, Cameron; Simmonds, Rob
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolerance by checkpointing and migrating jobs between resources requires the scientist's expertise and time. Automation of such tasks can allow the scientist to focus more on the scientific results and less on the technical details. In this paper a generic framework for managing and automating the execution of jobs is presented. It uses a variety of information models describing systems, policies, and application details/requirements to make suitable decisions on where and how to run, checkpoint, migrate and reconfigure jobs as needed. To demonstrate the utility of the framework, it is used as part of a simulation study to assess the impact that the availability of application memory usage information has on meeting the QoS objectives of job submitters and on overall utilization of resources. The study shows that with greater availability of memory usage information, the execution management framework is able to better meet user objectives and improve utilization of resources, particularly when the objective is to make more efficient use of resources.