
Introduce dev-tools/harness.js for testing/benchmarking #5370

Merged (17 commits, Apr 3, 2019)

Conversation

@scheibo (Contributor) commented Mar 28, 2019

I wonder though if I should just get the testing logic committed (without any of the benchmark/profile/tracing/formatting options I keep bikeshedding) so that the harness can be used by others for looking for errors/debugging etc, and followup with my performance logic later?

Originally posted by @scheibo in #5278 (comment)

I keep finding new yaks to shave with benchmarking/profiling, but this works as is for testing.

@scheibo (Contributor, Author) commented Mar 28, 2019

This could have caught the crashes from the server restarts today (provided it ran long enough and the PRNG was in its favor to give a mon 'Leech Seed', etc.). I think to be more generally useful for smoke tests we should create a random team generator which doesn't aim to build 'good' mons, but instead aims for coverage: just making sure we have the best chance of playing battles with the most Pokemon/Moves/Abilities etc. Obviously we wouldn't be able to test every combination and interplay of factors, but just making sure we've used all 1000+ Pokemon and 1000+ moves would only take < 100 custom games (1000/12 with no team validation), which takes about 15s to run and would give us some confidence that we can at least avoid crashes. We could do this for each format (not just the random formats), including OMs, and it would still take < 10 minutes. It's definitely not something to run on each commit, but it could be used to know whether restarting the server will crash ongoing battles or not.

Creating such a generator is out of scope for this PR, but that's where I think it would be useful to take this on the testing front.
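The arithmetic above can be sanity-checked with a two-line sketch (the pool size here is the round number from the comment, not an exact dex count):

```javascript
// Coverage math from the comment above: with no team validation, each
// custom game can field 12 distinct Pokemon (6 per side, 2 sides).
const POKEMON_POOL = 1000; // approximate species count
const PER_BATTLE = 6 * 2;

// Lower bound on battles needed to field every species at least once.
const battlesForCoverage = Math.ceil(POKEMON_POOL / PER_BATTLE);
console.log(battlesForCoverage); // 84, i.e. < 100 custom games
```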

@Slayer95 (Contributor)

I think to be more generally useful for smoke tests we should create a random team generator which doesn't aim to build 'good' mons, but instead aims for coverage

That's the Hackmons Cup generator.

@scheibo (Contributor, Author) commented Mar 29, 2019

Made a strawman prototype team generator inspired by Hackmons Cup (doesn't quite work, not ready for review). We want to be sure to exhaust all possible Items/Moves/Pokemon/Abilities in the pools before repeating. For each generation, we can then pre-generate a bunch of random teams and create custom games (without species clause, EV checks, etc.) to run through the harness. I'll need to add logic to the runner to allow it to play in non-random metas, but that's not very difficult and I'll do it in the followup CL with the team generation logic.

EDIT: 6e3f5e8 is my WIP smoke test using the generator and runner.
EDIT2: I realize I can go even further and hook into the RandomPlayerAI to get it to coordinate with the team generator to give preference to making choices it hasn't seen, WIP in 8961b45. It's still stochastic and not guaranteed to cover all scenarios as quickly as possible, but it's better than completely random. I'll need to adjust Runner from this PR to take a factory method for creating its AI.
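A minimal sketch of the "exhaust the pool before repeating" idea (the `Pool` name and `sample` method here are illustrative, not the actual generator's API):

```javascript
// Hypothetical sketch: sample uniformly without replacement, and only
// refill once every element has been drawn, so that every
// Pokemon/move/item/ability gets used before any repeats.
class Pool {
  constructor(elements, prng = Math.random) {
    this.elements = elements;
    this.prng = prng;
    this.remaining = [];
  }

  sample() {
    if (this.remaining.length === 0) {
      // Pool exhausted: refill with a fresh copy of every element.
      this.remaining = this.elements.slice();
    }
    const i = Math.floor(this.prng() * this.remaining.length);
    return this.remaining.splice(i, 1)[0];
  }
}
```

A generator along these lines would keep one pool per dimension (species, moves, items, abilities) per generation, so the pre-generated teams collectively touch everything.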

Also fixes an issue with error accounting in async mode.
scheibo added a commit to pkmn-archive/pokemon-showdown that referenced this pull request Mar 29, 2019
@scheibo (Contributor, Author) commented Mar 29, 2019

@Zarel - PTAL when you're feeling better. I'm worried you'll confuse this for the perma-WIP #5278 which used to have this title.


My 'smoke test project' (which will follow up on this PR) is going well - I've got it running on all the singles formats pretty well (logs). Doubles is proving to be troublesome (possibly a bug in the AI, given the logs), the pool sampling could maybe be improved (e.g. 1516-1590, where it seems stuck not using one move for a while), and I need to fix logging/repeatability (in particular, when the bug occurs at the wrong level, like it's doing now. Error logging/reproduction works well if the problem is in sim, less well if it's in my generator/AI code :P). I reverted some of the fixes from yesterday's crashes and it catches them, which seems to validate my mission.

@Slayer95 (Contributor) commented Mar 30, 2019

>[slow battle] 2204ms - >start {"formatid":"digimondigimonshowdown","seed":[27939,54995,12592,54036]}
>player p1 {"name":"Bot 1","seed":[62304,1829,61481,48193]}
>player p2 {"name":"Bot 2","seed":[22370,56598,38891,14058]}
Run node dev-tools/harness 1 --format=digimondigimonshowdown --seed=49587,60385,24801,60692 to debug (optionally with --output and/or --input for more info):
Error: Push after end of read stream
at BattleStream.resolvePush (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:544:37)
at BattleStream.pushError (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:526:8)
at BattleStream._write (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:63:10)
at BattleStream.write (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:726:15)
at ObjectReadWriteStream.write [as _write] (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:195:12)
at ObjectReadWriteStream.write (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:726:15)
at RandomPlayerAI.choose (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:300:15)
at RandomPlayerAI.receiveError (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/examples/random-player-ai.js:36:8)
at RandomPlayerAI.receiveLine (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:286:16)
at RandomPlayerAI.receive (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:274:9)

This seems to still be happening

@scheibo (Contributor, Author) commented Mar 30, 2019

@Slayer95 - that's the same error I'm seeing on #5370 (comment), and I thought I fixed it in the past on #5278. Maybe I didn't port everything relevant over from that PR.

@Slayer95 (Contributor)

Oh, right! That's a Doubles format. Just posted to make sure you didn't miss it.

@Zarel (Member) commented Mar 30, 2019

Should I wait until the Doubles issue is fixed...?

@scheibo (Contributor, Author) commented Mar 30, 2019

Should I wait until the Doubles issue is fixed...?

It's not like anything breaks if this gets committed as is, but it's OK if you want to hold off; I'm working to fix the bug now.

@scheibo (Contributor, Author) commented Mar 30, 2019

134fb8b is what fixed the push after end of stream last time. Commenting out the battle.destroy() fixes things as well (but obviously isn't the right answer).

@Slayer95 (Contributor) commented Mar 30, 2019

And this seems to still have memory issues (now running without --async after figuring out that --async wasn't actually sequential). Commit: 10d5a54 (it surely wouldn't get better with the newly added commits)

[Screenshot: memory usage history graph]

Notes

  1. First peak is after running about half an hour. By then, the output has plenty of [slow battle] warnings.
  2. Second peak is a short run.
  3. Screenshot taken during a 1000-battle run.

PS. Run actually just finished, memory usage didn't get a chance to increase anymore, and it dropped to 40 MB afterwards as expected.

@scheibo (Contributor, Author) commented Mar 30, 2019

https://gist.github.com/scheibo/84e0331f38f724646824a29776cb0f89 is a repro of the push after stream that can be fed into pokemon-showdown simulate-battle (though it materializes as a different error). The battle.inputLog annoyingly doesn't capture the issue at all, because a) the input log != what the exact inputs were and b) the battle is over when additional pushes are made.

I've also made RandomPlayerAI#receiveError log for debugging, because it seems like the AI is papering over issues with 'default' like I was worried it would (and like our tests were wont to do). I think maybe the request.active.length > 1 check, which I copied from the original RandomPlayerAI to determine whether we're not in singles (and thus whether targeting information is required), is incorrect - doesn't it fail if we're the only Pokemon left for a player? Anyway, I have plenty of fun things to debug when I wake up tomorrow.

First peak is after running about half an hour.

512 MB

OK, so these are definitely conditions I didn't account for! I've never run it for more than 5 minutes (2000 battles), and my computers have 16GB and 64GB respectively. I will look into this.

@Slayer95 (Contributor)

the request.active.length > 1 check which I copied from the original RandomPlayerAI to determine whether we're not in singles and thus targeting information is required or not is incorrect - doesn't it fail if we're the only Pokemon left for a player

No, that check is fine.

@scheibo (Contributor, Author) commented Mar 30, 2019

Ah, because it should just be [{}, null] when there's only one poke?

@Slayer95 (Contributor)

No, active Pokémon cannot contain null anymore. You might see some null checks in the code though. That enables support for bringing only 1 Pokémon in Doubles battles, which is currently forbidden in cartridge.
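In other words, a hedged sketch of the check (assuming, per the comment above, that request.active always has one non-null entry per slot in the format):

```javascript
// Assumption from the discussion above: request.active has one entry per
// slot for the format (never null), not one per living Pokemon, so its
// length distinguishes Singles from Doubles even with one Pokemon left.
function needsTargets(request) {
  return Array.isArray(request.active) && request.active.length > 1;
}
```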

@scheibo (Contributor, Author) commented Mar 30, 2019

I really need to go to bed, but #5381 was a pretty simple fix. Unfortunately, my 'push after stream' error no longer repros with that (the battles play out successfully), but I need to fix the 'push after stream' error as well. For my own reference, for this PR I want to fix:

  • remove receiveError() default fallback. I originally punted, but at the minimum the AI should more strictly check the error it's getting to detect whether it's the 'trapped' or 'disabled' case, not just any invalid choice. I think it should also do the extra legwork to save the previous request and re-choose taking the error into account, but maybe I'll continue to punt on that.
  • fix push after stream error
  • come up with a better way of saving the true input log
  • memory issues @Slayer95 has identified

The AI doesn't look like it will handle triple/rotation/multi battles very well either, but that's out of scope for now.
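The first bullet could look something like this sketch (the error-string matching is an assumption; the real simulator messages would need to be checked against SIM-PROTOCOL.md):

```javascript
// Hypothetical sketch of a stricter receiveError: only fall back to a
// different choice when the simulator reports a genuinely unavailable
// option (trapped/disabled); rethrow anything else instead of papering
// over it with `default`. The exact error strings are assumptions.
function receiveError(error, rechoose) {
  const message = String(error.message || error);
  if (/\b(trapped|disabled)\b/i.test(message)) {
    // A legal re-choice exists: pick again, avoiding the offending option.
    return rechoose(message);
  }
  // Anything else is a real bug in the AI or the simulator: surface it.
  throw error;
}
```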

In my smoke branch, I need

  • a way to repro runs better. Because of how the team generator + AI works, you have to rerun at the 'format' level instead of the 'battle' level, because it's no longer the case that the battle state and AI are fully determined by their starting seed - the state of the pools matters as well.
  • I need to special-case options that only work with certain Pokemon (Z-Crystals with zMoveUsers, Mega/Ultra Stones with megaEvolves - are there any other moves/abilities/items that only work with certain Pokemon?) because otherwise it's very unlikely the smoke tests will actually be able to test Mega Evolution etc.
  • fix team preview (need to choose 'team 2' instead of just '2', obviously)

@Slayer95 (Contributor) commented Mar 30, 2019

but at the minimum the AI should more strictly check the error its getting to detect if its 'trapped' or 'disabled' case, not just an invalid choice.

We need to listen to the |callback|cant| and |callback|trapped| messages.

I need to special case options that only work with certain pokemon (z crystals with zMoveUsers/mega or ultra stones with megaEvolves/ are there any other moves/abilities/items that only work with certain pokemon?) because otherwise its very unlikely the smoke tests will be able to actually test mega evolution etc

You don't need to worry about that. If there is a mismatch between item and Pokémon, the request has the canMegaEvo, etc. flags off. What you do have to check, however, is that no more than 1 Pokémon should try to mega-evolve/ultra-burst/z-move in the same turn.
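That per-turn constraint could be enforced with something like this sketch (`slots` and `canMegaEvo` are names assumed from the discussion, not the exact request schema):

```javascript
// Hypothetical sketch: when building a multi-slot choice, allow the
// mega/ultra/z flag for at most one slot per turn. `slots` is an array
// of per-Pokemon options with a boolean `canMegaEvo` flag (assumed name).
function pickMegaSlot(slots) {
  let used = false;
  return slots.map(slot => {
    const mega = Boolean(slot.canMegaEvo) && !used;
    if (mega) used = true; // only the first eligible slot mega-evolves
    return mega;
  });
}
```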

@scheibo (Contributor, Author) commented Mar 30, 2019

We need to listen to the |callback|cant| and |callback|trapped| messages.

Ah, I was not aware of those. I don't think they're documented in SIM-PROTOCOL.md either. :S

What you do have to check, however, is that no more than 1 Pokémon should try to mega-evolve/ultra-burst/z-move in the same turn.

We already handle this, I think? canMegaEvo, canZMove, canUltraBurst guard against this - though maybe there's an edge case where we could try to Ultra + Mega in the same turn - is that allowed?

You don't need to worry about that.

I'm not worried about the AI not handling Mega Evolution properly (it does it correctly already); I'm worried that in my smoke branch it doesn't ever get the opportunity to mega evolve, because the teams it generates will rarely if ever have the correct combinations.

@Zarel (Member) commented Mar 30, 2019

I keep on forgetting |callback|cant and |callback|trapped still exist. Shouldn't they be choice errors by now? Are they still used by anything? They should probably be removed entirely.

@scheibo (Contributor, Author) commented Apr 1, 2019

eb9e76e removes Promise.race which should remove the memory issues until the V8 regression @Slayer95 identified gets fixed (awesome work on that BTW!). The dangling promises aren't that big an issue given the harness includes a RejectionTracker which will catch those issues anyway (just in a less direct way).

Once #5394 lands, all outstanding issues will have been fixed, at which point @Zarel should be able to merge this and I can send out my next PR for the more elaborate smoke testing code (which already found an honest-to-goodness crash! Yay!)
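For reference, a minimal rejection tracker along those lines might look like this (an illustrative sketch built on Node's process events; the harness's actual implementation may differ):

```javascript
// Illustrative sketch of a RejectionTracker: record unhandled promise
// rejections so the process can log them and exit non-zero to fail tests.
const RejectionTracker = {
  unhandled: new Map(), // promise -> rejection reason

  onUnhandled(reason, promise) {
    this.unhandled.set(promise, reason);
  },
  onHandled(promise) {
    this.unhandled.delete(promise); // rejection was handled late after all
  },
  register() {
    process.on('unhandledRejection', (r, p) => this.onUnhandled(r, p));
    process.on('rejectionHandled', p => this.onHandled(p));
  },
  exitCode() {
    for (const reason of this.unhandled.values()) console.error(reason);
    return this.unhandled.size ? 1 : 0;
  },
};
```

Calling register() at startup and exiting with exitCode() at the end yields the non-zero status needed to fail a run.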

@Zarel (Member) commented Apr 1, 2019

This pull request introduces 2 alerts when merging f767526 into ac5c0a5 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@scheibo (Contributor, Author) commented Apr 1, 2019

2 for Unused variable, import, function or class

There's a /* eslint-disable no-unused-vars */ and our linter passes... why does LGTM not respect that?

@Slayer95 (Contributor) commented Apr 2, 2019

why does LGTM not respect that?

Because LGTM is not ESLint ;)
You'd need to add // lgtm at the end of the line

@scheibo (Contributor, Author) commented Apr 2, 2019

Because LGTM is not ESLint ;)

I should have rephrased that as "I think LGTM should understand ESLint directives given how widely used ESLint is, and having to ignore the same violation two different ways is obnoxious".

@Zarel (Member) commented Apr 2, 2019

This pull request introduces 2 alerts when merging f393bc7 into 2978546 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@Slayer95 (Contributor) commented Apr 2, 2019

It no longer reports the error locations, but it still counts them. LGTM

EDIT: It should probably be // lgtm[js/unused-local-variable] ?

@scheibo (Contributor, Author) commented Apr 2, 2019

What does LGTM buy us that Travis running our linter and tests doesn't?

@Zarel (Member) commented Apr 2, 2019

This pull request introduces 2 alerts when merging 3669dc1 into a53d6a8 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@Slayer95 (Contributor) commented Apr 2, 2019

Who knows!

@scheibo (Contributor, Author) commented Apr 2, 2019

OMFG ITS STILL COMPLAINING. (ノಠ益ಠ)ノ彡┻━┻

EDIT: @Zarel, if you know how to shut LGTM up, it's ready to merge now that you've committed #5394.

@Zarel (Member) commented Apr 3, 2019

This pull request introduces 2 alerts when merging f242844 into 0f82410 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

// Tracks whether some promises threw errors that weren't caught so we can log
// and exit with a non-zero status to fail any tests. This "shouldn't happen"
// because we're "great at propagating promises (TM)", but better safe than sorry.
const RejectionTracker = {
@Zarel (Member):

Hm, you recently switched this from new class { to {. Want to talk about your reasoning?

I'm still undecided which is better. I'm leaning towards new class { because it feels more natural in TypeScript.

@Zarel (Member):

There's also class with literally everything static. Which is also an option I'm undecided about.

@scheibo (Contributor, Author) commented Apr 3, 2019

#5370 (comment) is what prompted it. IMO, an object is the simplest way - static or new class just add extra noise.

@scheibo (Contributor, Author):

I'm also happy to follow up and bikeshed this to look however you want, but I have several PRs blocked on this landing so it would be awesome if we could merge first and argue over the insignificant bits later :)
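For anyone following along, the three shapes being bikeshedded, side by side (with a hypothetical Tracker; all three behave the same for a singleton):

```javascript
// 1. Object literal: least ceremony.
const TrackerLiteral = {count: 0, bump() { return ++this.count; }};

// 2. `new class { ... }`: a one-off instance of an anonymous class.
const TrackerNewClass = new class {
  constructor() { this.count = 0; }
  bump() { return ++this.count; }
}();

// 3. A class with literally everything static: no instance at all.
class TrackerStatic {
  static count = 0;
  static bump() { return ++this.count; }
}
```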

@Zarel merged commit ece3228 into smogon:master on Apr 3, 2019
@Zarel (Member) commented Apr 3, 2019

@scheibo, LGTM is a static code analyzer, it finds flaws that our linter and tests can't. I've had it dig up a few bugs that even TSLint isn't capable of seeing, bugs involving variables being guaranteed to be a specific value at a specific point.

After repeated complaints I thought I got them to support ESLint syntax for unused variables, though?

@Zarel (Member) commented Apr 3, 2019

Also, the "introduces 2 alerts" message is counting suppressed alerts. They won't show up in the main LGTM page; it's just here so you can't slip bad code past me, I think.
