Strategizing against No-regret Learners
Yuan Deng, Jon Schneider, Balasubramanian Sivan
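How should a player who repeatedly plays a game against a no-regret learner choose his strategy to maximize his utility? We study this question and show that, under some mild assumptions, the player can always guarantee himself at least the utility he would obtain in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot obtain more than the Stackelberg equilibrium utility. When the no-regret learner has more than two actions and plays a mean-based no-regret strategy, however, we show that the player can obtain strictly more than the Stackelberg equilibrium utility. We characterize the player's optimal play against a mean-based no-regret learner as the solution to a control problem. When the learner's strategy additionally guarantees him no swap regret, we show that the player cannot obtain more than the Stackelberg equilibrium utility.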
How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two action...
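The first claim, that the player can approximately secure the Stackelberg utility, can be illustrated with a small simulation. The sketch below is not taken from the paper: the 2x2 payoff matrices, the committed mixed strategy (0.4, 0.6), the round count, and the function names are all hypothetical choices made for illustration, and Hedge (multiplicative weights) stands in as one example of a mean-based no-regret learner. In this assumed game the Stackelberg utility works out to 2.5, obtained by committing to (0.5, 0.5) with ties broken in the player's favor. Committing slightly off that point makes the learner's best response strict.

```python
import math

# Hypothetical 2x2 game (an assumption, not from the paper).
# Rows: the player's actions U, D. Columns: the learner's actions L, R.
OPT_UTIL = [[1.0, 3.0], [0.0, 2.0]]  # player's payoffs
LRN_UTIL = [[1.0, 0.0], [0.0, 1.0]]  # learner's payoffs


def expected_utils(util, x):
    """Expected payoff of each column when the row player mixes with x."""
    return [sum(x[i] * util[i][j] for i in range(len(x)))
            for j in range(len(util[0]))]


def hedge_vs_commitment(x, rounds):
    """Run a Hedge learner against a player who plays mixed strategy x every round.

    Uses expected payoffs instead of sampled actions, so the run is deterministic.
    Returns the player's average per-round utility and the learner's
    action distribution in the final round.
    """
    n = len(LRN_UTIL[0])
    eta = math.sqrt(math.log(n) / rounds)  # a standard Hedge step size
    lrn_u = expected_utils(LRN_UTIL, x)
    opt_u = expected_utils(OPT_UTIL, x)
    cumulative = [0.0] * n  # learner's cumulative payoff per action
    total_opt = 0.0
    probs = [1.0 / n] * n
    for _ in range(rounds):
        shift = max(cumulative)  # subtract the max for numerical stability
        weights = [math.exp(eta * (c - shift)) for c in cumulative]
        z = sum(weights)
        probs = [w / z for w in weights]
        total_opt += sum(p * u for p, u in zip(probs, opt_u))
        for j in range(n):
            cumulative[j] += lrn_u[j]
    return total_opt / rounds, probs


avg_opt, final_probs = hedge_vs_commitment([0.4, 0.6], rounds=10_000)
print(avg_opt, final_probs)
```

Under these assumptions the learner's weight concentrates on R, its best response to the commitment, and the player's average utility approaches 2.4, the payoff of committing to (0.4, 0.6) against a best-responding learner. Moving the commitment closer to (0.5, 0.5) brings the limit closer to 2.5 but slows the learner's convergence, since the gap between its two actions shrinks.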