Birth of a River-Cleaning Expert


Could the high difficulty of Cleanup's default settings be the bottleneck?
clean_up.yaml (partially omitted)
env_kwargs:
  num_agents: 7
  shared_rewards: false
  maxAppleGrowthRate: 0.05
  thresholdDepletion: 0.4
  thresholdRestoration: 0.0
  dirtSpawnProbability: 0.5
  delayStartOfDirtSpawning: 0
  observe_others_rewards: false
  map_ASCII:
    - "HFFFHFFHFHFHFHFHFHFHHFHFFFHF"
    - "HFHFHFFHFHFHFHFHFHFHHFHFFFHF"
    - "HFFHFFHHFHFHFHFHFHFHHFHFFFHF"
    - "HFHFHFFHFHFHFHFHFHFHHFHFFFHF"
    - "HFFFFFFHFHFHFHFHFHFHHFHFFFHF"
    - "==============+~FHHHHHHf===="
    - " P P P ===+~SSf "
    - " P P P <~Sf P "
    - " P P<~S> "
    - " P P <~S> P "
    - " P <~S>P "
    - " P P<~S> "
    - " P <~S> P "
    - " P P P P <~S> "
    - "^T^T^T^T^T^T^T^T^T;~S,^T^T^T"
    - "BBBBBBBBBBBBBBBBBBBssBBBBBBB"
    - "BBBBBBBBBBBBBBBBBBBBBBBBBBBB"
    - "BBBBBBBBBBBBBBBBBBBBBBBBBBBB"
    - "BBBBBBBBBBBBBBBBBBBBBBBBBBBB"
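To make the pollution-related keys above concrete, here is a minimal Python sketch of how maxAppleGrowthRate, thresholdDepletion and thresholdRestoration might combine. It assumes the linear rule we believe Melting Pot's Clean Up uses (full regrowth rate at or below thresholdRestoration, no regrowth at or above thresholdDepletion). The function name is hypothetical, and the environment used here may implement this differently.

```python
# Hedged sketch of how the config keys above could shape apple regrowth.
# ASSUMPTION: linear interpolation between thresholdRestoration (full rate)
# and thresholdDepletion (zero rate), in the style of Melting Pot's Clean Up.


def apple_growth_probability(
    dirt_fraction: float,
    max_apple_growth_rate: float = 0.05,  # maxAppleGrowthRate
    threshold_depletion: float = 0.4,     # thresholdDepletion
    threshold_restoration: float = 0.0,   # thresholdRestoration
) -> float:
    """Assumed per-step regrowth probability of one empty apple cell."""
    if threshold_depletion == threshold_restoration:
        raise ValueError("thresholds must differ")
    # 1.0 at the restoration threshold, 0.0 at the depletion threshold.
    t = (dirt_fraction - threshold_depletion) / (
        threshold_restoration - threshold_depletion
    )
    # Clamp to [0, 1]: no regrowth past thresholdDepletion, never above the max rate.
    t = max(0.0, min(1.0, t))
    return max_apple_growth_rate * t


if __name__ == "__main__":
    for frac in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"dirt_fraction={frac:.1f} -> p_apple={apple_growth_probability(frac):.4f}")
```

Under this assumed rule and the default values, the regrowth rate is already halved at a dirt fraction of 0.2 and drops to zero at 0.4.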
2026-02-13 05:18:59 | INFO | update=238 | env_step=7798784 | reward_mean=0.0002
2026-02-13 05:19:02 | INFO | update=239 | env_step=7831552 | reward_mean=0.0000
2026-02-13 05:19:05 | INFO | update=240 | env_step=7864320 | reward_mean=0.0000
2026-02-13 05:19:07 | INFO | update=241 | env_step=7897088 | reward_mean=0.0000
2026-02-13 05:19:10 | INFO | update=242 | env_step=7929856 | reward_mean=0.0015
2026-02-13 05:19:13 | INFO | update=243 | env_step=7962624 | reward_mean=0.0017
2026-02-13 05:38:37 | INFO | update=301 | env_step=9863168 | reward_mean=0.0658
2026-02-13 05:38:39 | INFO | update=302 | env_step=9895936 | reward_mean=0.0674
2026-02-13 05:38:42 | INFO | update=303 | env_step=9928704 | reward_mean=0.0681
2026-02-13 05:38:45 | INFO | update=304 | env_step=9961472 | reward_mean=0.0704
2026-02-13 05:38:47 | INFO | update=305 | env_step=9994240 | reward_mean=0.0667

2026-02-13 00:35:53 | INFO | update=1 | env_step=32768 | reward_mean=0.0421
2026-02-13 00:35:56 | INFO | update=2 | env_step=65536 | reward_mean=0.0002
2026-02-13 00:35:58 | INFO | update=3 | env_step=98304 | reward_mean=0.0000
2026-02-13 00:46:58 | INFO | update=280 | env_step=9175040 | reward_mean=0.4254
2026-02-13 00:47:00 | INFO | update=281 | env_step=9207808 | reward_mean=0.4485
2026-02-13 00:47:02 | INFO | update=282 | env_step=9240576 | reward_mean=0.3858
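The log excerpt above appears to contain two separate runs: the update counter restarts at 1 and the timestamps jump back. The following hypothetical parser splits such logs into runs and summarises late-training reward. It relies only on the `update=N | env_step=N | reward_mean=X` format shown above; the helper names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches the "update=... | env_step=... | reward_mean=..." part of each log line.
LOG_RE = re.compile(
    r"update=(?P<update>\d+) \| env_step=(?P<env_step>\d+) "
    r"\| reward_mean=(?P<reward>[-+0-9.eE]+)"
)


@dataclass
class LogPoint:
    update: int
    env_step: int
    reward_mean: float


def parse_runs(lines: Iterable[str]) -> List[List[LogPoint]]:
    """Parse log lines; start a new run whenever `update` fails to increase."""
    runs: List[List[LogPoint]] = []
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue  # not a progress line (command, blank line, ...)
        point = LogPoint(int(m["update"]), int(m["env_step"]), float(m["reward"]))
        # The update counter restarting (e.g. back to 1) marks a different run.
        if not runs or point.update <= runs[-1][-1].update:
            runs.append([])
        runs[-1].append(point)
    return runs


def tail_mean(run: List[LogPoint], k: int = 3) -> float:
    """Mean reward_mean over the last k logged updates of a run."""
    tail = run[-k:]
    return sum(p.reward_mean for p in tail) / len(tail)


if __name__ == "__main__":
    log = """\
2026-02-13 05:38:45 | INFO | update=304 | env_step=9961472 | reward_mean=0.0704
2026-02-13 05:38:47 | INFO | update=305 | env_step=9994240 | reward_mean=0.0667
2026-02-13 00:35:53 | INFO | update=1 | env_step=32768 | reward_mean=0.0421
2026-02-13 00:47:02 | INFO | update=282 | env_step=9240576 | reward_mean=0.3858"""
    for i, run in enumerate(parse_runs(log.splitlines())):
        print(f"run {i}: updates {run[0].update}..{run[-1].update}, "
              f"tail mean reward {tail_mean(run, k=2):.4f}")
```

In every line shown, env_step equals update × 32768, so the two runs can be compared directly by update number.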
uv run scripts/train.py env=clean_up algorithm=ippo env.env_kwargs.dirtSpawnProbability=0.25
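The command changes a single nested key using a dotted path. Assuming scripts/train.py is a Hydra app (the `key=value` syntax suggests this, but it is an assumption), the effect of `env.env_kwargs.dirtSpawnProbability=0.25` is roughly what the pure-Python sketch below does. `apply_override` and `parse_scalar` are hypothetical helpers. Hydra's real override grammar is much richer, and config-group selection such as `env=clean_up` is not modelled here.

```python
import copy
from typing import Any


def parse_scalar(text: str) -> Any:
    """Interpret an override value as bool, int, float, or plain string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one dotted-path `a.b.c=value` override applied."""
    key_path, _, raw_value = override.partition("=")
    *parents, leaf = key_path.split(".")
    new_config = copy.deepcopy(config)  # leave the base config untouched
    node = new_config
    for key in parents:
        node = node[key]  # KeyError if the path does not exist
    node[leaf] = parse_scalar(raw_value)
    return new_config


if __name__ == "__main__":
    # Hypothetical nested config shaped after the env.env_kwargs path in the command.
    base = {"env": {"env_kwargs": {"num_agents": 7, "dirtSpawnProbability": 0.5}}}
    tuned = apply_override(base, "env.env_kwargs.dirtSpawnProbability=0.25")
    print(tuned["env"]["env_kwargs"]["dirtSpawnProbability"])  # 0.25
```

Raising on an unknown path mirrors the safer behaviour of refusing to silently create new keys from a typo.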
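dirtSpawnProbability (probability that dirt spawns in the river)
thresholdDepletion (river-pollution level at which apples stop growing)
→ Training barely changes the amount of reward
→ Only the number of apples collected in the first few dozen steps goes up
→ (rnd gave similar results)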
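A back-of-the-envelope model suggests why a lower dirtSpawnProbability mostly lengthens the early, apple-rich phase. This is a toy under loudly stated assumptions, not the environment's code: a hypothetical river size, at most one new dirt cell per step with probability dirtSpawnProbability, and no cleaning at all.

```python
import math

# Hedged toy model (hypothetical, not the environment's implementation):
#   * the river has `n_river_cells` dirt-capable cells;
#   * each step, with probability dirtSpawnProbability, one clean cell gets dirty;
#   * nobody cleans.


def expected_steps_until_depletion(
    n_river_cells: int,
    dirt_spawn_probability: float,  # dirtSpawnProbability
    threshold_depletion: float,     # thresholdDepletion
    initial_dirty: int = 0,
) -> float:
    """Expected steps until the dirt fraction reaches thresholdDepletion."""
    if not 0.0 < dirt_spawn_probability <= 1.0:
        raise ValueError("dirt_spawn_probability must be in (0, 1]")
    # Number of dirty cells at which the threshold is reached.
    needed = math.ceil(threshold_depletion * n_river_cells)
    remaining = max(0, needed - initial_dirty)
    # Waiting for `remaining` Bernoulli(p) successes takes remaining / p steps
    # on average (mean of the negative binomial distribution).
    return remaining / dirt_spawn_probability


if __name__ == "__main__":
    for p in (0.5, 0.25):
        steps = expected_steps_until_depletion(100, p, 0.4)
        print(f"dirtSpawnProbability={p}: ~{steps:.0f} steps before apples stop")
```

With a hypothetical 100-cell river that starts clean, lowering dirtSpawnProbability from 0.5 to 0.25 stretches this window from 80 to 160 steps. If agents never clean, the river still crosses thresholdDepletion eventually, which is consistent with only the early apple count going up.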
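map_ASCII (changing the map layout)
→ Moving the river closer to the apple orchard lets agents learn what actually causes reward
→ Eases non-stationarity (which is also why lowering obs_size lowers the reward)
→ (Could be useful for curriculum learning?)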
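One way to probe the map_ASCII and obs_size observations is to check whether a single observation window can contain both a river cell and an apple cell. This hypothetical sketch assumes 'H'/'F' mark river cells and 'B' marks orchard cells (as the layout in the config suggests), and it treats obs_size as the side length of a square view, which may not match how the environment actually defines it. The maps in the example are toy maps, not the real one.

```python
from typing import List, Sequence, Tuple

# ASSUMPTIONS (hypothetical, not taken from the environment's code):
#   * 'H'/'F' mark river cells and 'B' marks apple (orchard) cells;
#   * the agent sees a square window of side `obs_size`.


def cells(ascii_map: Sequence[str], chars: str) -> List[Tuple[int, int]]:
    """(row, col) of every map cell whose character is in `chars`."""
    return [(r, c) for r, row in enumerate(ascii_map)
            for c, ch in enumerate(row) if ch in chars]


def min_river_orchard_distance(ascii_map: Sequence[str],
                               river_chars: str = "HF",
                               orchard_chars: str = "B") -> int:
    """Smallest Chebyshev distance between any river cell and any orchard cell."""
    river = cells(ascii_map, river_chars)
    orchard = cells(ascii_map, orchard_chars)
    if not river or not orchard:
        raise ValueError("map has no river or no orchard cells")
    return min(max(abs(r1 - r2), abs(c1 - c2))
               for r1, c1 in river for r2, c2 in orchard)


def fits_in_one_view(ascii_map: Sequence[str], obs_size: int) -> bool:
    """True if some obs_size x obs_size window can hold a river and an orchard cell."""
    # Two cells fit in a square window of side s iff |dr| <= s-1 and |dc| <= s-1.
    return min_river_orchard_distance(ascii_map) <= obs_size - 1


if __name__ == "__main__":
    far_map = ["HFHF", "....", "....", "....", "....", "BBBB"]  # hypothetical
    near_map = ["HFHF", "....", "BBBB"]                         # hypothetical
    for name, m in (("far", far_map), ("near", near_map)):
        print(name, min_river_orchard_distance(m), fits_in_one_view(m, obs_size=5))
```

Under these assumptions, shrinking obs_size or pulling the river away from the orchard both push the two out of a single view, so an agent can no longer see the effect of cleaning on apple growth directly.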