Scheduling High Performance Data Mining Tasks on a Data Grid Environment
S. Orlando (5), P. Palmerini (5, 6), R. Perego (6) and F. Silvestri (6, 7)
(5) Dipartimento di Informatica, Università Ca’ Foscari, Venezia, Italy
(6) Istituto CNUCE, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy
(7) Dipartimento di Informatica, Università di Pisa, Italy
Abstract
Increasingly, the datasets used for data mining are becoming huge and physically distributed. Since the distributed knowledge
discovery process is both data- and computation-intensive, the Grid is a natural platform for deploying a high performance data mining service. The focus of this paper is on the core services
of such a Grid infrastructure. In particular, we concentrate on the design and implementation of a specialized
broker that is aware of data source locations and of the resource needs of data mining tasks. Allocation and scheduling decisions are made
on the basis of performance cost metrics and models that exploit knowledge about previous executions and use sampling to
acquire estimates of execution behavior.
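
The kind of decision described in the abstract can be illustrated with a minimal sketch. It is not the paper's cost model: the `Host` fields, the linear extrapolation from a sample run, and the function names are all assumptions made for illustration. The sketch estimates a completion time for each candidate host as data transfer time plus compute time extrapolated from a sampled run, then picks the cheapest host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # All fields are hypothetical, chosen only for this illustration.
    name: str
    relative_speed: float   # 1.0 = speed of the machine used for the sample run
    bandwidth_mb_s: dict    # MB/s from each remote data site to this host


def estimate_compute_seconds(sample_seconds, sample_fraction, relative_speed):
    # Linear extrapolation from a run on a sample of the data. This is an
    # assumption: many mining algorithms do not scale linearly with input size.
    return sample_seconds / sample_fraction / relative_speed


def estimate_total_seconds(host, data_site, dataset_mb, sample_seconds, sample_fraction):
    # No transfer cost when the dataset already resides on the host.
    if data_site == host.name:
        transfer = 0.0
    else:
        transfer = dataset_mb / host.bandwidth_mb_s[data_site]
    compute = estimate_compute_seconds(sample_seconds, sample_fraction, host.relative_speed)
    return transfer + compute


def pick_host(hosts, data_site, dataset_mb, sample_seconds, sample_fraction):
    # Choose the host with the smallest estimated completion time.
    return min(
        hosts,
        key=lambda h: estimate_total_seconds(
            h, data_site, dataset_mb, sample_seconds, sample_fraction
        ),
    )


# Usage with made-up numbers: a 1000 MB dataset stored at "venezia",
# and a sample run over 25% of the data that took 50 seconds.
hosts = [
    Host("pisa", relative_speed=1.0, bandwidth_mb_s={"venezia": 10.0}),
    Host("venezia", relative_speed=0.5, bandwidth_mb_s={"pisa": 10.0}),
]
best = pick_host(hosts, "venezia", 1000.0, 50.0, 0.25)
print(best.name)
```

With these made-up numbers the faster remote host is preferred even though the data must be moved to it, which shows why a broker of this kind needs both data location and estimated resource needs to decide.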