The parallelization of MCTS across multiple machines has proven surprisingly difficult. The limitations of existing algorithms
were evident in the 2009 Computer Olympiad, where Zen, using a single four-core machine, defeated both Fuego, running on ten eight-core machines, and Mogo, running on twenty thirty-two-core machines. This paper investigates the limits of parallel MCTS in order to understand why distributed
parallelism has proven so difficult and to pave the way towards future distributed algorithms with better scaling. We first
analyze the single-threaded scaling of Fuego and find that there is an upper bound on the improvement in play quality that can come from additional search. We then analyze
the scaling of an idealized N-core shared-memory machine to determine the maximum amount of parallelism supported by MCTS.
We show that parallel speedup depends critically on how much time is given to each player. We use this relationship to predict
parallel scaling for time scales that are too computationally expensive to evaluate empirically. Our results
show that MCTS can scale nearly perfectly to at least 64 threads when combined with virtual loss, but without virtual loss
scaling is limited to just eight threads. We also find that, for competition time controls, scaling to thousands of threads
is impossible, not necessarily because MCTS itself fails to scale, but because high levels of parallelism begin to approach
the upper performance bound of Fuego itself.
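
For readers unfamiliar with virtual loss, the idea as it is commonly described is that a thread descending the tree temporarily counts each node it passes through as a lost playout, which steers concurrent threads toward other branches; the temporary loss is removed when the real result is backed up. The following is a minimal sketch under stated assumptions, not Fuego's implementation: the names `Node`, `ucb1`, `descend`, `backup`, and `TREE_LOCK` are hypothetical, selection uses plain UCB1, rewards are scored from a single player's perspective, and one tree-wide lock stands in for whatever synchronization a real engine uses.

```python
import math
import threading

# Assumption: one coarse lock for the whole tree, chosen for simplicity.
TREE_LOCK = threading.Lock()


class Node:
    """Hypothetical tree node holding playout statistics."""

    def __init__(self):
        self.visits = 0          # completed playouts through this node
        self.wins = 0.0          # accumulated reward from completed playouts
        self.virtual_losses = 0  # in-flight playouts, counted as losses
        self.children = []


def ucb1(parent, child, c=1.4):
    """UCB1 score in which in-flight playouts count as visits with zero reward."""
    n = child.visits + child.virtual_losses
    if n == 0:
        return math.inf  # unvisited children are tried first
    parent_n = max(1, parent.visits + parent.virtual_losses)
    mean = child.wins / n  # virtual losses enlarge n but add no wins
    return mean + c * math.sqrt(math.log(parent_n) / n)


def descend(root):
    """Select a root-to-leaf path, applying a virtual loss to every node on it."""
    path = [root]
    with TREE_LOCK:
        root.virtual_losses += 1
        node = root
        while node.children:
            node = max(node.children, key=lambda ch, p=node: ucb1(p, ch))
            node.virtual_losses += 1
            path.append(node)
    return path


def backup(path, reward):
    """Remove the virtual losses on the path and record the real playout result."""
    with TREE_LOCK:
        for node in path:
            node.virtual_losses -= 1
            node.visits += 1
            node.wins += reward
```

In this sketch, two calls to `descend` on a fresh root with two children return different leaves, because the first call's virtual loss lowers the score of the child it chose. Without the `virtual_losses` term, both calls would select the same child until a result was backed up.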