Coordination is an important issue in multi-agent systems in which agents want to maximize their revenue. Coordination is often
achieved through communication; however, communication has its price. We are interested in an approach that keeps communication
between the agents low while still finding globally optimal behavior.
In this paper we report on an efficient approach that allows independent reinforcement learning agents to reach a Pareto optimal
Nash equilibrium with limited communication. The communication happens at regular time steps and is essentially a signal for
the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so
as to give the team the opportunity to look for a possibly better Nash equilibrium. This technique of reducing the action space
by exclusions was only recently introduced for finding periodic policies in games of conflicting interests. Here, we explore
this technique in repeated common interest games with deterministic or stochastic outcomes.
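As an illustration only, the following minimal sketch shows how exploration phases with action exclusions might look for two independent learners in the climbing game, a well-known common interest game whose Pareto optimal Nash equilibrium (a, a) = 11 is hard for independent learners to find. Several choices are assumptions and are not taken from the approach described above: the agents are stateless epsilon-greedy Q-learners, every agent (not just some) excludes its converged action at each synchronization signal, the number of phases is fixed, and each phase ends with a short greedy window used to measure the reward of the joint action reached. The names `IndependentLearner`, `run_phase`, and `explore_with_exclusions` are hypothetical.

```python
import random

# The climbing game: both agents receive the same payoff. (a, a) = 11 is the
# Pareto optimal Nash equilibrium; independent learners often end up at
# (b, b) = 7 or (c, c) = 5 instead.
CLIMBING = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


class IndependentLearner:
    """Hypothetical stateless epsilon-greedy Q-learner: sees only its own action and the shared reward."""

    def __init__(self, n_actions, rng, alpha=0.1, epsilon=0.1):
        self.n_actions = n_actions
        self.allowed = list(range(n_actions))  # actions not currently excluded
        self.q = [0.0] * n_actions
        self.rng = rng
        self.alpha = alpha
        self.epsilon = epsilon

    def greedy(self):
        return max(self.allowed, key=lambda a: self.q[a])

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.allowed)
        return self.greedy()

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])


def sample_reward(payoff, joint, noise_sd, rng):
    """Common reward for a 2-agent joint action; noise_sd > 0 gives stochastic outcomes."""
    noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return payoff[joint[0]][joint[1]] + noise


def run_phase(agents, payoff, steps, eval_steps, noise_sd, rng):
    """Learn from scratch on the current (possibly reduced) action sets, then measure the result."""
    for agent in agents:
        agent.q = [0.0] * agent.n_actions
    for _ in range(steps):
        joint = tuple(agent.act() for agent in agents)
        reward = sample_reward(payoff, joint, noise_sd, rng)
        for agent, action in zip(agents, joint):
            agent.update(action, reward)
    joint = tuple(agent.greedy() for agent in agents)
    # Average reward of the converged joint action over a short greedy window.
    total = sum(sample_reward(payoff, joint, noise_sd, rng) for _ in range(eval_steps))
    return joint, total / eval_steps


def explore_with_exclusions(payoff, phases=3, steps=3000, eval_steps=200, noise_sd=0.0, seed=0):
    rng = random.Random(seed)
    agents = [IndependentLearner(len(payoff), rng) for _ in range(2)]
    history = []
    for _ in range(phases):
        joint, value = run_phase(agents, payoff, steps, eval_steps, noise_sd, rng)
        history.append((joint, value))
        # Synchronization signal: each agent excludes the action it just
        # converged to, as long as it keeps at least one action.
        for agent, action in zip(agents, joint):
            if len(agent.allowed) > 1:
                agent.allowed.remove(action)
    for agent in agents:  # restore the full action sets
        agent.allowed = list(range(agent.n_actions))
    best_joint, best_value = max(history, key=lambda h: h[1])
    return best_joint, best_value, history


best_joint, best_value, history = explore_with_exclusions(CLIMBING)
print(best_joint, best_value)
```

Because each agent removes the action it converged to, successive phases force the team into joint actions it would otherwise abandon early; the team then keeps the best joint action seen across phases. Passing `noise_sd > 0` gives a rough stand-in for the stochastic case.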