Many applications in parallel processing have to traverse large, implicitly defined trees with irregular shape. The receiver-initiated load balancing algorithm random polling has long been known to be very efficient for these problems in practice. For any ε > 0, we prove that its parallel execution time is at most

with high probability, where $T_{\mathrm{rout}}$, $T_{\mathrm{split}}$, and $T_{\mathrm{atomic}}$ bound the time for sending a message, splitting a subproblem, and finishing a small unsplittable subproblem, respectively.
The maximum splitting depth h is related to the depth of the computation tree. Previous work did not prove efficiency close to one and used less accurate models. In particular, our machine model allows asynchronous communication with nonconstant message delays and does not assume that communication takes place in rounds. This model is compatible with the LogP model.
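
To make the idea of receiver-initiated random polling concrete, the following is a minimal toy simulation, not the algorithm as analyzed here. It deliberately uses synchronous rounds and unit costs for simplicity, which is exactly the kind of simplification the machine model above avoids. The tree generator `children`, the function names, and the "give away the bottom half of the stack" splitting rule are all illustrative assumptions.

```python
import random


def children(node, max_depth):
    """Hypothetical implicitly defined tree: a node is (depth, key).

    The fanout (0 to 3) depends on key and depth, so the shape is irregular.
    """
    depth, key = node
    if depth >= max_depth:
        return []
    fanout = (key * 7 + depth) % 4
    return [(depth + 1, key * 4 + i + 1) for i in range(fanout)]


def count_sequential(root, max_depth):
    """Plain depth-first traversal; returns the number of tree nodes."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(node, max_depth))
    return visited


def random_polling(root, max_depth, num_procs, seed=0):
    """Round-based toy model of random polling (an assumption, see lead-in).

    Each round, every idle processor polls one other processor chosen
    uniformly at random; every busy processor expands one node; each polled
    processor with at least two open nodes splits its stack for one requester.
    Returns (total nodes visited, number of rounds).
    """
    rng = random.Random(seed)
    stacks = [[] for _ in range(num_procs)]
    stacks[0].append(root)  # all work starts on processor 0
    visited = [0] * num_procs
    rounds = 0
    while any(stacks):
        rounds += 1
        # Idle processors issue requests (receiver-initiated).
        requests = {}
        for p in range(num_procs):
            if not stacks[p] and num_procs > 1:
                victim = rng.randrange(num_procs - 1)
                if victim >= p:  # skip over p itself
                    victim += 1
                requests.setdefault(victim, []).append(p)
        # Busy processors do one unit of sequential work.
        for p in range(num_procs):
            if stacks[p]:
                node = stacks[p].pop()
                visited[p] += 1
                stacks[p].extend(children(node, max_depth))
        # Each polled processor serves at most one request by splitting.
        for victim, askers in requests.items():
            if len(stacks[victim]) >= 2:
                half = len(stacks[victim]) // 2
                stacks[askers[0]].extend(stacks[victim][:half])
                del stacks[victim][:half]
    return sum(visited), rounds
```

Calling `random_polling((0, 1), 10, 8)` and comparing the first return value with `count_sequential((0, 1), 10)` checks that every node is visited exactly once; the second return value gives a rough feel for how quickly work spreads from a single processor in this simplified setting.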