The Internet has witnessed an explosive increase in the popularity of Peer-to-Peer (P2P) file-sharing applications during
the past few years. As these applications become more popular, it becomes increasingly important to characterize their behavior
in order to improve their performance and quantify their impact on the network. In this paper, we present a measurement study
of the characteristics of available files in the modern Gnutella system. We develop two new methodologies to capture accurate
snapshots of available files in a large-scale P2P system. These methodologies were implemented in a parallel crawler that
captures the entire overlay topology of the system, in which each peer in the overlay is annotated with its available files. We
have captured more than 50 snapshots of the Gnutella system that span over a one-year period. Using these snapshots, we conduct
three types of analysis on available files: (1) static analysis, (2) topological analysis, and (3) dynamic analysis. Our results
reveal several interesting properties of available files in Gnutella that can be leveraged to improve the design and evaluation
of P2P file-sharing applications.
This paper extends and supplants an earlier version presented at MMCN 2006 [1]. This material is based upon
work supported by the National Science Foundation (NSF) under Grant No. Nets-NBD-0627202, CAREER Award CNS-0448639, and an
unrestricted gift from Cisco Systems. Any opinions, findings, and conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the views of the NSF or Cisco.