Team strategy acquisition is one of the most important issues in multiagent systems, especially in an adversarial environment.
RoboCup has been providing such an environment for AI and robotics researchers. A deliberative approach to team strategy
acquisition seems of little use in such a dynamic and hostile environment. This paper presents a learning method to acquire team
strategy from the viewpoint of a coach who can change the combination of players, each of whom has a fixed policy. Assuming that
the opponent has the same set of strategy choices but keeps its strategy fixed throughout a match, the coach estimates
the opponent's team strategy (its combination of players) based on game progress (goals scored and conceded) and on the notification
of the opponent's strategy just after each match. The trade-off between exploration and exploitation is handled by considering how
accurate the expectation in each mode is. A 2-vs-2 match was simulated, and the final result (a class of the strongest
combinations) was applied to the RoboCup-2000 competition.
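The coach's loop can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: goals in each pairing are modeled as Poisson arrivals; the player types (`attacker`, `midfielder`, `defender`) and all names (`Coach`, `estimate_opponent`, `choose`, `update`) are hypothetical; and the mode-selection rule (pick exploitation with probability proportional to its past prediction accuracy) is only a stand-in for the paper's mechanism, not a reproduction of it.

```python
import itertools
import math
import random

# Hypothetical player types, each with a fixed policy; a team strategy is a
# combination of two players (the 2-vs-2 case), giving 6 combinations.
PLAYER_TYPES = ("attacker", "midfielder", "defender")
COMBOS = list(itertools.combinations_with_replacement(PLAYER_TYPES, 2))


def poisson(k, lam):
    """Probability of exactly k goals when goals arrive at Poisson rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


class Coach:
    def __init__(self, prior_rate=1.0, seed=0):
        pairs = [(a, b) for a in COMBOS for b in COMBOS]
        # Goal totals per (our combo, their combo), seeded with one pseudo-match.
        self.scored = {p: prior_rate for p in pairs}
        self.conceded = {p: prior_rate for p in pairs}
        self.matches = {p: 1 for p in pairs}
        # How often each mode's expectation matched the outcome (smoothed counts).
        self.hits = {"exploit": 1, "explore": 1}
        self.trials = {"exploit": 2, "explore": 2}
        self.rng = random.Random(seed)

    def expected(self, ours, theirs):
        """Mean goals scored and conceded per match for this pairing."""
        n = self.matches[(ours, theirs)]
        return self.scored[(ours, theirs)] / n, self.conceded[(ours, theirs)] / n

    def estimate_opponent(self, ours, goals_for, goals_against, elapsed=1.0):
        """Posterior over the opponent's fixed combo from the score so far.

        elapsed in (0, 1] is the fraction of the match played; it scales the rates.
        """
        weights = {}
        for theirs in COMBOS:
            gf, ga = self.expected(ours, theirs)
            weights[theirs] = (poisson(goals_for, gf * elapsed)
                               * poisson(goals_against, ga * elapsed))
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def value(self, ours, belief):
        """Expected goal difference of our combo under a belief over opponents."""
        total = 0.0
        for theirs, p in belief.items():
            gf, ga = self.expected(ours, theirs)
            total += p * (gf - ga)
        return total

    def best_response(self, belief):
        return max(COMBOS, key=lambda ours: self.value(ours, belief))

    def choose(self, belief):
        """Pick a mode in proportion to its past accuracy, then a combo."""
        acc = {m: self.hits[m] / self.trials[m] for m in self.hits}
        p_exploit = acc["exploit"] / sum(acc.values())
        mode = "exploit" if self.rng.random() < p_exploit else "explore"
        if mode == "exploit":
            return mode, self.best_response(belief)
        # Explore: the combo least tried against the most likely opponent.
        likely = max(belief, key=belief.get)
        return mode, min(COMBOS, key=lambda ours: self.matches[(ours, likely)])

    def update(self, mode, ours, theirs, goals_for, goals_against):
        """After the match the opponent's combo is notified; update the tables."""
        gf, ga = self.expected(ours, theirs)
        self.trials[mode] += 1
        # A hit means the predicted sign of the goal difference was right.
        self.hits[mode] += int((gf - ga > 0) == (goals_for - goals_against > 0))
        self.scored[(ours, theirs)] += goals_for
        self.conceded[(ours, theirs)] += goals_against
        self.matches[(ours, theirs)] += 1
```

In use, the coach calls `estimate_opponent` as goals arrive during a match, `choose` to decide the next lineup, and `update` once the opponent's combination is revealed. Counting how often each mode's expectation was right is one simple way to let the more reliable mode dominate over time.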