A Generic Execution Management Framework for Scientific Applications

Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolerance by checkpointing jobs and migrating them between resources demands the scientist's expertise and time. Automating such tasks allows the scientist to focus more on the scientific results and less on the technical details. This paper presents a generic framework for managing and automating the execution of jobs. It uses a variety of information models describing systems, policies, and application details and requirements to make suitable decisions on where and how to run, checkpoint, migrate, and reconfigure jobs as needed. To demonstrate the utility of the framework, it is used in a simulation study that assesses the impact that the availability of application memory usage information has on meeting the QoS objectives of job submitters and on the overall utilization of resources. The study shows that, with greater availability of memory usage information, the execution management framework is better able to meet user objectives and improve resource utilization, particularly when the objective is to use resources more efficiently.
Application Modelling, Grid Computing