Commit 89d4e0c

Add MsPacman agents (#274)
* Add MsPacman agents
* Allow to force custom objects
* Update changelog
* Fix loading issues
1 parent c2f00ea commit 89d4e0c

11 files changed (+745 / -9 lines)

CHANGELOG.md

+4 -1
@@ -1,10 +1,12 @@
-## Release 1.5.1a8 (WIP)
+## Release 1.6.0 (2022-08-05)

 ### Breaking Changes
 - Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
 - Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
 - Updated default --eval-freq from 10k to 25k steps
 - Update default horizon to 2 for the `HistoryWrapper`
+- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
+- Upgrade to sb3-contrib >= 1.6.0

 ### New Features
 - Support setting PyTorch's device with thye `--device` flag (@gregwar)
@@ -14,6 +16,7 @@
 - Added `RecurrentPPO` support (aka `ppo_lstm`)
 - Added autodownload for "official" sb3 models from the hub
 - Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
+- Added MsPacman models

 ### Bug fixes
 - Fix `Reacher-v3` name in PPO hyperparameter file
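The changelog now sets a floor of 1.6.0 for both Stable-Baselines3 and sb3-contrib. As a rough illustration of what that floor means at runtime, here is a minimal sketch that checks installed distributions with the standard library's `importlib.metadata`. The helper names (`parse_version`, `meets_minimum`, `check_requirement`) are hypothetical, and the parser deliberately drops pre-release suffixes such as `a8`, so something like `1.6.0a1` would wrongly pass; `packaging.version.Version` is the more robust choice for real requirement checks.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn a release string such as '1.6.0' into a comparable tuple of ints.

    Simplifying assumption: any non-digit suffix in a component (e.g. the
    'a8' in '1.5.1a8') is dropped, so pre-releases compare like finals.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` >= `minimum`, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirement(dist_name: str, minimum: str) -> bool:
    """Look up an installed distribution; a missing package fails the check."""
    try:
        return meets_minimum(metadata.version(dist_name), minimum)
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    # PyPI distribution names for the two packages named in the changelog.
    for dist in ("stable-baselines3", "sb3-contrib"):
        print(dist, check_requirement(dist, "1.6.0"))
```

Under this sketch, the old `1.5.1a8` release string parses to `(1, 5, 1)` and fails the `1.6.0` floor.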

README.md

+5 -5
@@ -331,7 +331,7 @@ The previous command will create a `mp4` file. To convert this file to `gif` for
 python -m utils.record_training --algo ppo --env CartPole-v1 -n 1000 -f logs --deterministic --gif
 ```

-## Current Collection: 150+ Trained Agents!
+## Current Collection: 195+ Trained Agents!

 Final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute them, simply run `python -m utils.benchmark`.

@@ -354,10 +354,10 @@ Additional Atari Games (to be completed):

 | RL Algo | MsPacman | Asteroids | RoadRunner |
 |----------|-------------|-----------|------------|
-| A2C | | :heavy_check_mark: | :heavy_check_mark: |
-| PPO | | :heavy_check_mark: | :heavy_check_mark: |
-| DQN | | :heavy_check_mark: | :heavy_check_mark: |
-| QR-DQN | | :heavy_check_mark: | :heavy_check_mark: |
+| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| QR-DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |


 ### Classic Control Environments

benchmark.md

+4
@@ -39,6 +39,7 @@ and also allow users to have access to pretrained agents.*
 |a2c |LunarLanderContinuous-v2 | 84.225| 145.906|5M | 149305| 256|
 |a2c |MountainCar-v0 | -111.263| 24.087|1M | 149982| 1348|
 |a2c |MountainCarContinuous-v0 | 91.166| 0.255|100k | 149923| 1659|
+|a2c |MsPacmanNoFrameskip-v4 | 1671.730| 612.918|10M | 602450| 185|
 |a2c |Pendulum-v1 | -162.965| 103.210|1M | 150000| 750|
 |a2c |PongNoFrameskip-v4 | 17.292| 3.214|10M | 594910| 65|
 |a2c |QbertNoFrameskip-v4 | 3882.345| 1223.327|10M | 610670| 194|
@@ -77,6 +78,7 @@ and also allow users to have access to pretrained agents.*
 |dqn |EnduroNoFrameskip-v4 | 830.929| 194.544|10M | 599040| 14|
 |dqn |LunarLander-v2 | 154.382| 79.241|100k | 149373| 200|
 |dqn |MountainCar-v0 | -100.849| 9.925|120k | 149962| 1487|
+|dqn |MsPacmanNoFrameskip-v4 | 2682.929| 492.567|10M | 599952| 140|
 |dqn |PongNoFrameskip-v4 | 20.602| 0.613|10M | 598998| 88|
 |dqn |QbertNoFrameskip-v4 | 9496.774| 5399.633|10M | 605844| 124|
 |dqn |RoadRunnerNoFrameskip-v4 | 40396.350| 7069.131|10M | 603257| 137|
@@ -100,6 +102,7 @@ and also allow users to have access to pretrained agents.*
 |ppo |LunarLanderContinuous-v2 | 270.863| 32.072|1M | 149956| 526|
 |ppo |MountainCar-v0 | -110.423| 19.473|1M | 149954| 1358|
 |ppo |MountainCarContinuous-v0 | 88.343| 2.572|20k | 149983| 633|
+|ppo |MsPacmanNoFrameskip-v4 | 1754.356| 172.783|10M | 600822| 163|
 |ppo |Pendulum-v1 | -172.225| 104.159|100k | 150000| 750|
 |ppo |PongNoFrameskip-v4 | 20.989| 0.105|10M | 599902| 90|
 |ppo |QbertNoFrameskip-v4 | 15627.108| 3313.538|10M | 600248| 83|
@@ -122,6 +125,7 @@ and also allow users to have access to pretrained agents.*
 |qrdqn |EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
 |qrdqn |LunarLander-v2 | 70.236| 225.491|100k | 149957| 522|
 |qrdqn |MountainCar-v0 | -106.042| 15.536|120k | 149943| 1414|
+|qrdqn |MsPacmanNoFrameskip-v4 | 997.867| 877.130|10M | 604914| 225|
 |qrdqn |PongNoFrameskip-v4 | 20.492| 0.687|10M | 597443| 63|
 |qrdqn |QbertNoFrameskip-v4 | 14799.728| 2917.629|10M | 600773| 92|
 |qrdqn |RoadRunnerNoFrameskip-v4 | 42325.424| 8361.161|10M | 591016| 59|
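To compare the four new MsPacman rows side by side, here is a minimal sketch that parses pipe-delimited benchmark lines. The field names in the hypothetical `Row` type are an assumption: the table header is outside this diff, so reading the columns as algorithm, env ID, mean reward, reward std, training budget, evaluation timesteps and evaluation episodes is an interpretation rather than something the diff states. The four data lines are copied verbatim from the added `MsPacmanNoFrameskip-v4` rows.

```python
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    # Assumed column meanings; the real header is not shown in this diff.
    algo: str
    env_id: str
    mean_reward: float
    std_reward: float
    n_timesteps: str  # kept as text, e.g. "10M"
    eval_timesteps: int
    eval_episodes: int


def parse_row(line: str) -> Row:
    """Split one pipe-delimited benchmark line into typed fields."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    algo, env_id, mean_r, std_r, budget, eval_steps, eval_eps = cells
    return Row(algo, env_id, float(mean_r), float(std_r), budget, int(eval_steps), int(eval_eps))


def best_for_env(lines: Iterable[str], env_id: str) -> Row:
    """Return the row with the highest mean reward for one environment."""
    rows = [parse_row(line) for line in lines if line.strip().startswith("|")]
    matching = [row for row in rows if row.env_id == env_id]
    return max(matching, key=lambda row: row.mean_reward)


MSPACMAN_ROWS = [
    "|a2c |MsPacmanNoFrameskip-v4 | 1671.730| 612.918|10M | 602450| 185|",
    "|dqn |MsPacmanNoFrameskip-v4 | 2682.929| 492.567|10M | 599952| 140|",
    "|ppo |MsPacmanNoFrameskip-v4 | 1754.356| 172.783|10M | 600822| 163|",
    "|qrdqn |MsPacmanNoFrameskip-v4 | 997.867| 877.130|10M | 604914| 225|",
]

if __name__ == "__main__":
    best = best_for_env(MSPACMAN_ROWS, "MsPacmanNoFrameskip-v4")
    print(best.algo, best.mean_reward)  # dqn 2682.929
```

Under that reading of the columns, DQN has the highest mean MsPacman score in this commit, while QR-DQN has both the lowest mean and the widest spread.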

enjoy.py

+5 -1
@@ -62,6 +62,10 @@ def main(): # noqa: C901
     parser.add_argument(
         "--env-kwargs", type=str, nargs="+", action=StoreDict, help="Optional keyword argument to pass to the env constructor"
     )
+    parser.add_argument(
+        "--custom-objects", action="store_true", default=False, help="Use custom objects to solve loading issues"
+    )
+
     args = parser.parse_args()

     # Going through custom gym packages to let them register in the global registory
@@ -170,7 +174,7 @@ def main(): # noqa: C901
     newer_python_version = sys.version_info.major == 3 and sys.version_info.minor >= 8

     custom_objects = {}
-    if newer_python_version:
+    if newer_python_version or args.custom_objects:
         custom_objects = {
             "learning_rate": 0.0,
             "lr_schedule": lambda _: 0.0,
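The new `--custom-objects` flag lets a user force the override dict even on Python < 3.8. In Stable-Baselines3, `load()` accepts a `custom_objects` dict: any saved attribute whose name appears as a key is not deserialized, and the supplied value is used instead, which sidesteps unpickling failures such as schedules saved under a different Python version. Below is a stdlib-only sketch of the selection logic in the diff. `build_parser` and `select_custom_objects` are hypothetical helpers, and only the two dict entries visible in the diff are reproduced, since the real dict in `enjoy.py` is cut off here.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Parser containing only the flag added in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--custom-objects", action="store_true", default=False, help="Use custom objects to solve loading issues"
    )
    return parser


def select_custom_objects(force: bool, version_info=sys.version_info) -> dict:
    """Mirror the diff's condition: override on Python >= 3.8, or when forced.

    Only the entries visible in the diff are included; the real dict may
    contain more keys (an assumption, since the hunk is truncated).
    """
    newer_python_version = version_info[0] == 3 and version_info[1] >= 8
    custom_objects = {}
    if newer_python_version or force:
        custom_objects = {
            "learning_rate": 0.0,
            "lr_schedule": lambda _: 0.0,
        }
    return custom_objects


if __name__ == "__main__":
    # argparse maps "--custom-objects" to the attribute "custom_objects".
    args = build_parser().parse_args(["--custom-objects"])
    objs = select_custom_objects(args.custom_objects, version_info=(3, 7))
    print(sorted(objs))  # ['learning_rate', 'lr_schedule']
    # In the real script this dict would be passed on to an SB3 load call,
    # e.g. PPO.load(path, custom_objects=objs); that step is not run here.
```

Passing a fake `version_info` tuple makes it easy to check that the flag changes behavior only on Python 3.7 and older.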

@@ -0,0 +1,187 @@
+#{"t_start": 1659728336.7725544, "env_id": "MsPacmanNoFrameskip-v4"}
+r,l,t
+1730.0,3234,3.953365
+1560.0,3298,5.427331
+1100.0,2714,6.639803
+2360.0,3634,8.255448
+1950.0,3874,9.985612
+2450.0,4090,11.803499
+1070.0,3074,13.174083
+1710.0,2530,14.298565
+4470.0,4194,16.174039
+1790.0,2874,17.454673
+2030.0,3162,18.860609
+1600.0,2266,19.866352
+2440.0,3610,21.472004
+1150.0,2210,22.459134
+1530.0,3554,24.041418
+1270.0,2626,25.208103
+1280.0,2482,26.314844
+1680.0,3226,27.752215
+1100.0,2514,28.87338
+2360.0,5218,31.201759
+1200.0,2970,32.53076
+960.0,3418,34.053116
+1730.0,2722,35.263485
+1520.0,4170,37.123587
+1660.0,2682,38.316556
+1580.0,2946,39.626874
+1960.0,3530,41.199417
+1170.0,2530,42.328518
+2130.0,3890,44.061271
+1910.0,3970,45.832457
+1810.0,3050,47.194604
+2430.0,4418,49.168339
+2320.0,2938,50.474335
+2120.0,4906,52.668199
+1360.0,2850,53.939555
+1020.0,2346,54.978707
+2280.0,3698,56.63467
+1560.0,3866,58.354663
+1200.0,3082,59.732336
+1180.0,3834,61.443044
+2100.0,3586,63.045426
+1010.0,2754,64.285646
+1240.0,2674,65.485508
+2990.0,3370,66.994512
+1290.0,2466,68.149448
+1440.0,3130,69.61241
+1560.0,2458,70.763192
+1700.0,3106,72.160394
+2130.0,3522,73.813967
+4950.0,3890,75.72441
+1950.0,2978,77.180707
+2030.0,4090,79.102359
+1550.0,3322,80.647296
+1530.0,3626,82.287508
+1090.0,3162,83.792252
+2030.0,3442,85.417475
+1400.0,2906,86.786747
+1560.0,3106,88.266386
+1210.0,2658,89.512441
+2770.0,4034,91.376633
+1200.0,3242,92.889384
+3110.0,4722,95.071795
+1060.0,2738,96.366151
+1310.0,3338,97.908794
+1810.0,3218,99.350722
+2580.0,3018,100.764254
+1780.0,3906,102.538961
+1630.0,2954,103.861392
+1340.0,2978,105.20175
+1160.0,2330,106.260769
+1650.0,3818,108.327378
+1100.0,2818,109.85005
+950.0,2346,111.110954
+900.0,2234,112.313941
+1670.0,3858,114.393967
+2230.0,4506,116.827441
+1340.0,2786,118.330728
+3030.0,3738,120.362621
+2090.0,3714,122.375252
+1240.0,3218,124.109298
+1220.0,3170,125.812876
+1310.0,2634,127.262277
+1570.0,3194,128.965139
+1290.0,3058,130.362771
+1820.0,2850,131.608693
+1950.0,3842,133.281789
+1430.0,2722,134.451039
+2100.0,3586,136.02444
+1750.0,3594,137.63478
+1590.0,2706,138.844255
+1500.0,3570,140.421348
+1670.0,4362,142.362482
+1490.0,3314,143.837664
+730.0,2874,145.113736
+2590.0,4802,147.192036
+1230.0,2850,148.438278
+1800.0,3650,150.012589
+3370.0,3370,151.468725
+1190.0,2610,152.591411
+1810.0,4946,154.727196
+2520.0,4290,156.59002
+1380.0,2874,157.903495
+2490.0,3810,159.663447
+1670.0,2714,160.8516
+1500.0,3954,162.555276
+1570.0,3690,164.145591
+1690.0,3738,165.75474
+1550.0,4538,167.75505
+1650.0,3562,169.36602
+1260.0,2970,170.68833
+1670.0,2874,171.928043
+1940.0,3018,173.302168
+1030.0,2682,174.551034
+1890.0,3618,176.189759
+1160.0,3034,177.564327
+1680.0,3266,179.01814
+1840.0,3506,180.649786
+1070.0,2538,181.787977
+2030.0,3938,183.56093
+2960.0,4634,185.557277
+1920.0,3410,187.089478
+1620.0,3066,188.49645
+1260.0,2466,189.63132
+1030.0,2914,190.900608
+1740.0,2834,192.218688
+2340.0,3682,193.901766
+1110.0,2762,195.125856
+1190.0,2786,196.371832
+1820.0,3306,197.829656
+1930.0,3010,199.169468
+1200.0,2434,200.343077
+1380.0,3802,202.276492
+1790.0,2810,203.649479
+1980.0,4170,205.609311
+1990.0,3226,207.144824
+1700.0,4266,209.257385
+1010.0,2658,210.532377
+1240.0,2586,211.735485
+1580.0,3186,213.318766
+1090.0,2762,214.718967
+1370.0,3194,216.254075
+1770.0,4986,218.678091
+1140.0,2306,219.77412
+1470.0,3538,221.694823
+870.0,2026,222.677573
+1710.0,3362,224.210367
+2410.0,5522,226.616171
+1280.0,2698,227.791895
+1150.0,2714,228.979634
+1400.0,3610,230.616868
+1630.0,3730,232.327563
+1750.0,2914,233.723598
+1150.0,2626,234.917321
+1730.0,3642,236.556935
+1200.0,3098,237.904774
+1430.0,3842,239.581436
+1350.0,2730,240.753289
+1470.0,4874,242.852802
+1760.0,3170,244.236137
+1550.0,2890,245.477554
+1580.0,3298,246.899028
+1310.0,2866,248.130993
+1590.0,3066,249.452029
+480.0,1762,250.20387
+2260.0,3842,251.851681
+1930.0,4330,253.715691
+1110.0,2866,254.948357
+1120.0,2498,256.028346
+1960.0,2610,257.199669
+1030.0,2482,258.326828
+2170.0,2178,259.308928
+1890.0,3242,260.714789
+1210.0,2706,261.879737
+1670.0,3738,263.499191
+1130.0,2138,264.428535
+1840.0,3546,265.977581
+1840.0,3010,267.331547
+1400.0,3298,268.879427
+2490.0,3042,270.281823
+1180.0,2386,271.379688
+4340.0,3738,273.109107
+1220.0,3106,274.53357
+1260.0,2714,275.779621
+1310.0,2794,277.063536
+1040.0,2674,278.295469
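The new file above appears to be an episode log in the format written by SB3's `Monitor` wrapper: a first line of `#` followed by a JSON object (`t_start`, `env_id`), then a CSV header `r,l,t` for episode reward, episode length in steps, and wall-clock seconds since `t_start`. Here is a stdlib-only sketch for summarizing such a file, assuming that format. `read_monitor` and `summarize` are hypothetical helpers (SB3 also ships a pandas-based `load_results` for this). The sample reuses the first three episodes from the file, and the standard deviation is the population variant, matching NumPy's default `np.std`; which variant the benchmark table uses is not shown in this commit.

```python
import csv
import io
import json
import statistics


def read_monitor(text: str):
    """Parse Monitor-style CSV text into (metadata dict, list of episode dicts)."""
    lines = text.splitlines()
    # First line: '#' followed by a JSON object with run metadata.
    if not lines or not lines[0].startswith("#"):
        raise ValueError("expected a '#'-prefixed JSON header line")
    metadata = json.loads(lines[0][1:])
    # Remaining lines form an ordinary CSV table with header r,l,t.
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])))
    episodes = [{"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])} for row in reader]
    return metadata, episodes


def summarize(episodes):
    """Episode count, total steps, and mean/population-std of episode reward."""
    rewards = [ep["r"] for ep in episodes]
    return {
        "episodes": len(episodes),
        "total_steps": sum(ep["l"] for ep in episodes),
        "mean_reward": statistics.mean(rewards),
        "std_reward": statistics.pstdev(rewards),  # ddof=0, like np.std's default
    }


SAMPLE = """#{"t_start": 1659728336.7725544, "env_id": "MsPacmanNoFrameskip-v4"}
r,l,t
1730.0,3234,3.953365
1560.0,3298,5.427331
1100.0,2714,6.639803
"""

if __name__ == "__main__":
    meta, eps = read_monitor(SAMPLE)
    print(meta["env_id"], summarize(eps))
```

Reading the whole file rather than the three-line sample would give summary statistics over all 185 logged episodes.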

logs/benchmark/benchmark.md

+4
@@ -39,6 +39,7 @@ and also allow users to have access to pretrained agents.*
 |a2c |LunarLanderContinuous-v2 | 84.225| 145.906|5M | 149305| 256|
 |a2c |MountainCar-v0 | -111.263| 24.087|1M | 149982| 1348|
 |a2c |MountainCarContinuous-v0 | 91.166| 0.255|100k | 149923| 1659|
+|a2c |MsPacmanNoFrameskip-v4 | 1671.730| 612.918|10M | 602450| 185|
 |a2c |Pendulum-v1 | -162.965| 103.210|1M | 150000| 750|
 |a2c |PongNoFrameskip-v4 | 17.292| 3.214|10M | 594910| 65|
 |a2c |QbertNoFrameskip-v4 | 3882.345| 1223.327|10M | 610670| 194|
@@ -77,6 +78,7 @@ and also allow users to have access to pretrained agents.*
 |dqn |EnduroNoFrameskip-v4 | 830.929| 194.544|10M | 599040| 14|
 |dqn |LunarLander-v2 | 154.382| 79.241|100k | 149373| 200|
 |dqn |MountainCar-v0 | -100.849| 9.925|120k | 149962| 1487|
+|dqn |MsPacmanNoFrameskip-v4 | 2682.929| 492.567|10M | 599952| 140|
 |dqn |PongNoFrameskip-v4 | 20.602| 0.613|10M | 598998| 88|
 |dqn |QbertNoFrameskip-v4 | 9496.774| 5399.633|10M | 605844| 124|
 |dqn |RoadRunnerNoFrameskip-v4 | 40396.350| 7069.131|10M | 603257| 137|
@@ -100,6 +102,7 @@ and also allow users to have access to pretrained agents.*
 |ppo |LunarLanderContinuous-v2 | 270.863| 32.072|1M | 149956| 526|
 |ppo |MountainCar-v0 | -110.423| 19.473|1M | 149954| 1358|
 |ppo |MountainCarContinuous-v0 | 88.343| 2.572|20k | 149983| 633|
+|ppo |MsPacmanNoFrameskip-v4 | 1754.356| 172.783|10M | 600822| 163|
 |ppo |Pendulum-v1 | -172.225| 104.159|100k | 150000| 750|
 |ppo |PongNoFrameskip-v4 | 20.989| 0.105|10M | 599902| 90|
 |ppo |QbertNoFrameskip-v4 | 15627.108| 3313.538|10M | 600248| 83|
@@ -122,6 +125,7 @@ and also allow users to have access to pretrained agents.*
 |qrdqn |EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
 |qrdqn |LunarLander-v2 | 70.236| 225.491|100k | 149957| 522|
 |qrdqn |MountainCar-v0 | -106.042| 15.536|120k | 149943| 1414|
+|qrdqn |MsPacmanNoFrameskip-v4 | 997.867| 877.130|10M | 604914| 225|
 |qrdqn |PongNoFrameskip-v4 | 20.492| 0.687|10M | 597443| 63|
 |qrdqn |QbertNoFrameskip-v4 | 14799.728| 2917.629|10M | 600773| 92|
 |qrdqn |RoadRunnerNoFrameskip-v4 | 42325.424| 8361.161|10M | 591016| 59|
