We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume
that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash Equilibrium
in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash
Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however can sometimes lead to a Pareto-Dominated
outcome for games like Prisoner’s Dilemma. So we prefer learning strategies that converge to a Pareto-Optimal outcome that
also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enable us to
identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL) which learns the conditional
probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically
show that under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL
learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge
to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of
other two-player general sum games. We compare the performance of CJAL with that of existing algorithms such as WOLF-PHC and
JAL on all structurally distinct two-player conflict games with ordinal payoffs.
Keywords Multiagent learning - Game theory - Prisoner’s dilemma