Lightweight threads can be used to cover latencies occurring in distributed-memory implementations of declarative programming
languages. To keep the costs of thread management low, this paper proposes two techniques: first, to distinguish locally scheduled
threads from globally distributed tasks; and second, to create both threads and tasks lazily. The paper focuses on the integration of these techniques into compiled
graph-reduction, which was neglected by other researchers; in particular, their approach prohibits both tail call optimization
and the use of the push-enter model of function evaluation.