We will investigate a number of algorithmic problems that arise in large networks such as the world-wide web. We will mostly
concentrate on the following problems.
General Caching Problems: Caching is a very well-studied problem. Consider a two-level memory system consisting of a small fast memory, that can store
up to k bits, and a large slow memory, that can store potentially infinitely many bits. The goal is to serve a sequence of memory
accesses with low total cost. In standard caching it is assumed that all memory pages have a uniform size and a uniform fault
cost. In a general caching problem, however, pages or documents have varying sizes and varying costs. This problem arises, among other places, in the cache design for networked file systems
or the world-wide web. In the web, for instance, the transmission time of a web page depends on the location of the corresponding
server and also on transient conditions such as the server load or network congestion.
The offline variant of general caching problems is NP-hard. Irani [7] recently presented polynomial time algorithms that achieve an approximation ratio of O(log(k/s)), where s is the size of the smallest document ever requested. We present polynomial time constant factor approximation algorithms
that use a small amount of additional memory. The values of the approximation ratios depend on the exact cost model assumed.
In the most general setting we achieve a (4 + ε)-approximation, for any ε > 0. The main idea of our solutions is to formulate
caching problems as integer programs and solve linear relaxations. A fractional solution is transformed into a feasible solution
using a new rounding technique. Our results were published in [2].
Management of persistent TCP connections: Communication between clients and servers in the web is performed using HTTP (Hyper Text Transfer Protocol), which in turn
uses TCP (Transmission Control Protocol) to transfer data. If data has to be transmitted between two network nodes, then there
has to exist an open TCP connection between these two nodes. While the earlier HTTP/1.0 opened and closed a separate TCP connection
for each transmission request, the new HTTP/1.1. permits persistent connections, i.e. a TCP connection may be kept open and idle until a new transmission request arrive or the connection is explicitly
closed by the server or the client. The problem is to maintain a limited number of open TCP connections at each network node,
not knowing which connections will be required in the future.
Cohen et al. [4,5] recently initiated the study of connection caching in the web and presented optimal competitive online algorithms assuming
that the establishment cost is uniform for all the connections. They also analyzed various models of communication. We investigate
the setting where connections may incur different establishment costs and present online algorithms that achieve an optimal
competitiveness. Our algorithms use extra communication between network nodes while managing open connections. We can develop
refined algorithms that allow tradeoffs between extra communication and competitive performance. We also consider problem
extensions where connections may have time-out values or asymmetric establishment costs. The results appeared in [1]
Dynamic TCP acknowledgment: Consider a sequence of data packets that arrives at a network node over an open TCP connection. The node has to acknowledge
the receipt of the packets by sending acknowledgments to the sending site. Most implementations of TCP use some acknowledgment
delay mechanism, i.e. multiple incoming data packets are acknowledged with a single acknowledgment. A reduction of the number
of acknowledgments sent leads to a smaller network congestion and to a smaller overhead incurred in sending and receiving
acknowledgments. On the other hand, by sending fewer acknowledgments, we increase the latency of the TCP connection. The goal
is to acknowledge dynamically a sequence of data packets, that arrives over time, so that the number of acknowledgments and
the acknowledgment delays for the individual packets is as small as possible.
The study of dynamic TCP acknowledgment was initiated by Dooly et al. [6]. They considered the objective function of minimizing the sum of the number of acknowledgments and the total acknowledgment
delays for all the packets. They presented deterministic online algorithms that achieve an optimal competitive ratio of 2.
Recently Karlin et al. [8] gave randomized algorithms that achieve a competitiveness of e/(e™1) ≈ 1.58. We consider a different objective function that penalizes long acknowledgment delays. In practice if a data packet
is not acknowledged within a certain amount of time, the packet is resent by the sending site, which increases again the network
congestion. We investigate an objective function that minimizes the sum of the number of acknowledgments and the maximum delay
that ever occurs for any of the data packets. We present optimal online algorithms [3].