Book Chapter: Distributed Checkpointing on Clusters with Dynamic Striping and Staggering
Book Series: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN: 0302-9743 (Print) 1611-3349 (Online)
Volume: 2550/2002
Book: Advances in Computing Science — ASIAN 2002
DOI: 10.1007/3-540-36184-7
Copyright: 2002
ISBN: 978-3-540-00195-9
DOI: 10.1007/3-540-36184-7_4
Pages: 19-33
Subject Collection: Computer Science
SpringerLink Date: Tuesday, January 01, 2002
Distributed Checkpointing on Clusters with Dynamic Striping and Staggering
Hai Jin (5) and Kai Hwang (6)
(5) Huazhong University of Science and Technology, 430074 Wuhan, China
(6) University of Southern California, 90007 Los Angeles, USA
Abstract
This paper presents a new striped and staggered checkpointing (SSC) scheme for multicomputer clusters. We consider serverless clusters, where local disks attached to cluster nodes collectively form a distributed RAID (redundant array of inexpensive disks) with a single I/O space. The distributed RAID is used to save the checkpoint files periodically. Striping enables parallel I/O on distributed disks. Staggering avoids the network bottleneck in distributed disk I/O operations. With a fixed cluster size, we reveal the tradeoffs between these two speedup techniques. Our SSC approach allows dynamic reconfiguration to minimize message-logging requirements among concurrent software processes. We demonstrate how to reduce the checkpointing overhead by striping and staggering dynamically. For communication-intensive programs, our SSC scheme can significantly reduce the checkpointing overhead. Benchmark results demonstrate the benefits of trading between stripe parallelism and distributed staggering. These results are useful for designing efficient checkpointing schemes for fast rollback recovery from any single node (disk) failure in a cluster of computers.
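The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' SSC implementation: the XOR-parity layout is a generic RAID-style stand-in for the paper's distributed RAID, and the function names (`stripe_with_parity`, `recover`, `stagger_schedule`), the equal-sized stagger groups, and the fixed time slot are all assumptions made for illustration. Striping splits one checkpoint across several disks so they can be written in parallel, with one redundant block so that any single lost block can be rebuilt. Staggering lets groups of processes write at different times so they do not all load the network at once.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, n_disks: int):
    """Split data into n_disks - 1 equal data blocks plus one XOR parity block.

    Generic RAID-style layout, assumed here for illustration only.
    """
    if n_disks < 2:
        raise ValueError("need at least one data disk and one parity disk")
    k = n_disks - 1
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")  # zero-pad the last block
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks, lost: int):
    """Rebuild the block at index `lost` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return xor_blocks(survivors)


def stagger_schedule(n_procs: int, stripe_width: int, slot_seconds: float):
    """Map each process to a start offset; one stagger group per time slot.

    Processes are grouped stripe_width at a time (hypothetical policy), so
    for a fixed n_procs a wider stripe means fewer stagger groups.
    """
    return {p: (p // stripe_width) * slot_seconds for p in range(n_procs)}


if __name__ == "__main__":
    image = b"checkpoint image of process 0"
    blocks = stripe_with_parity(image, n_disks=4)
    # Simulate losing disk 1 and rebuilding its block from the other three.
    assert recover(blocks, lost=1) == blocks[1]
    print(stagger_schedule(n_procs=8, stripe_width=4, slot_seconds=2.0))
```

With the cluster size fixed, widening the stripe in this sketch leaves fewer stagger groups, which is one simple way to picture the tradeoff between the two techniques that the abstract mentions.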
Hai Jin, Email: hjin@hust.edu.cn