Paper Introduction

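Example prisoner's dilemma payoffs (each cell: (own payoff, opponent's payoff); T = temptation, R = reward, P = punishment, S = sucker's payoff, with T > R > P > S)
	Opponent C	Opponent D
Self C	(R, R) = (3, 3)	(S, T) = (0, 4)
Self D	(T, S) = (4, 0)	(P, P) = (1, 1)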
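The table can be encoded as a lookup from (own action, opponent action) to (own payoff, opponent's payoff). The sketch below is illustrative only: the names `PAYOFF`, `is_prisoners_dilemma`, and `best_response` are hypothetical, not taken from the paper. It checks the conditions commonly imposed on a prisoner's dilemma (T > R > P > S and 2R > T + S) and shows that defection is the one-shot best response whatever the opponent does, which is why mutual cooperation is hard to reach without some additional mechanism.

```python
# Payoff values from the table: T > R > P > S.
R, S, T, P = 3, 0, 4, 1

# PAYOFF[(my_action, opponent_action)] -> (my_payoff, opponent_payoff)
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

def is_prisoners_dilemma(t, r, p, s):
    """Usual dilemma conditions: T > R > P > S and 2R > T + S."""
    return t > r > p > s and 2 * r > t + s

def best_response(opponent_action):
    """Action that maximizes my one-shot payoff against a fixed opponent action."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])

if __name__ == "__main__":
    print(is_prisoners_dilemma(T, R, P, S))        # True
    print(best_response("C"), best_response("D"))  # D D
```

Because the game is symmetric, `PAYOFF[(b, a)]` is always `PAYOFF[(a, b)]` with the two entries swapped.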

Learning Transitions

What facilitates the transition to the next stage is the following: (1) agents who cooperate are selected to play more frequently than defecting agents (and, therefore, are given the opportunity to potentially receive rewards); and (2) with enough exploration, cooperation can be sufficiently rewarded and agents can start to learn to punish agents who would try to exploit them.
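Point (1) can be illustrated with a toy partner-selection rule. This is a minimal sketch under simplifying assumptions, not the paper's learned policy: every agent's last action is assumed to be visible to everyone and held fixed, and choosers follow a hypothetical epsilon-greedy rule (`choose_partner`) that picks a random partner with probability `epsilon` and otherwise picks among agents whose last action was C. Even this fixed rule makes cooperators far more likely to be chosen, and therefore to get opportunities to earn rewards.

```python
import random

def choose_partner(chooser, last_action, epsilon, rng):
    """Hypothetical epsilon-greedy partner choice based on each agent's last visible action."""
    others = [a for a in last_action if a != chooser]
    if rng.random() < epsilon:
        return rng.choice(others)  # explore: any other agent
    cooperators = [a for a in others if last_action[a] == "C"]
    return rng.choice(cooperators or others)  # exploit: prefer agents that last cooperated

def selection_counts(last_action, rounds=1000, epsilon=0.1, seed=0):
    """Count how often each agent is chosen when every agent picks one partner per round.

    last_action is not updated here: agents are assumed to keep a fixed action.
    """
    rng = random.Random(seed)
    counts = {a: 0 for a in last_action}
    for _ in range(rounds):
        for chooser in last_action:
            counts[choose_partner(chooser, last_action, epsilon, rng)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical population: agents 0-2 last cooperated, agents 3-5 last defected.
    last_action = {0: "C", 1: "C", 2: "C", 3: "D", 4: "D", 5: "D"}
    print(selection_counts(last_action))
```

With `epsilon = 0.1` and three cooperators among six agents, defectors are chosen only through exploration, so in expectation each defector is picked about 100 times over 1,000 rounds versus about 1,900 for each cooperator. Raising `epsilon` narrows this gap, which mirrors the trade-off in point (2): exploration is needed for cooperation to be tried and rewarded, but it also exposes cooperators to exploitation.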


Paper Introduction

Papers Presented

Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

Background and Research Question

Partner Selection

Environment Setup

Example Prisoner's Dilemma Payoffs

States and Actions

Reward Design

Learning Architecture

Learning Transitions

Experimental Observations (Order in Which Learning Progresses)

Results and Implications

Limitations and Future Work

Learning Partner Selection Rules that Sustain Cooperation in Social Dilemmas with the Option of Opting Out

Research Aims

Environment and Action Space

Rules That Emerged Through Learning

Challenges and Outlook

Supplement: Differences Between the Two Studies