Volume 4, Number 1, 19-31, DOI: 10.1007/s10723-005-9016-2

Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid

Frédéric Desprez and Antoine Vernois

Abstract

Managing large datasets has become a major application of Grids. Life science applications usually manage large databases that should be replicated so that the applications can scale. The growing number of users and the ease of access to Internet-based applications have put stress on Grid middleware. Such environments are thus asked to manage data and schedule computation tasks at the same time, and these two operations have to be tightly coupled. This paper presents an algorithm (Scheduling and Replication Algorithm, SRA) that combines data management and scheduling using a steady-state approach. Using a model of the platform, the number of requests and their distribution, and the number and size of the databases, we define a linear program that satisfies all the constraints at every level of the platform in steady state. The solution of this linear program gives a placement of the databases on the servers and, for each kind of job, the server on which it should be executed. Our theoretical results are validated using simulation and logs from a large life science application.
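To make the steady-state idea concrete, the sketch below checks whether a given database placement and job assignment respect per-server storage and compute limits. It is only an illustration under simplifying assumptions, not the SRA linear program itself, whose exact variables and constraints are not given in this abstract. The names Server, is_steady_state_feasible, the single "compute" capacity per server, the one-database-per-job-kind rule, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    storage: float  # space available for databases (hypothetical unit, e.g. GB)
    compute: float  # work the server can perform per second (hypothetical unit)


def is_steady_state_feasible(servers, db_size, job_db, job_cost, placement, rates):
    """Check a candidate solution against simplified steady-state constraints.

    placement[s] is the set of databases stored on server s.
    rates[(k, s)] is the number of jobs of kind k per second sent to server s.
    """
    # Storage constraint: the databases placed on a server must fit on it.
    for name, server in servers.items():
        used = sum(db_size[d] for d in placement.get(name, set()))
        if used > server.storage:
            return False

    # Compute constraint: accumulate the work per second sent to each server.
    load = {name: 0.0 for name in servers}
    for (kind, name), rate in rates.items():
        if rate < 0:
            return False
        # A job can only run where the database it needs is replicated.
        if rate > 0 and job_db[kind] not in placement.get(name, set()):
            return False
        load[name] += rate * job_cost[kind]
    return all(load[name] <= servers[name].compute for name in servers)


# Toy instance (all values invented for illustration).
servers = {"s1": Server(storage=10, compute=100), "s2": Server(storage=5, compute=50)}
db_size = {"genomes": 8, "proteins": 4}
job_db = {"blast_g": "genomes", "blast_p": "proteins"}
job_cost = {"blast_g": 10, "blast_p": 5}
placement = {"s1": {"genomes"}, "s2": {"proteins"}}
rates = {("blast_g", "s1"): 8, ("blast_p", "s2"): 10}

print(is_steady_state_feasible(servers, db_size, job_db, job_cost, placement, rates))
```

A linear program in the spirit of the abstract would search over the placement and the rates instead of checking one fixed candidate; this checker only shows the kind of constraints such a solution has to satisfy.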

Key words: bioinformatics applications, data management, Grid computing, scheduling

This work was supported in part by the ACI GRID and Grid5000 projects of the French Department of Research.
