
Introduce dev-tools/harness.js for testing/benchmarking #5370

Merged (17 commits, Apr 3, 2019)

Conversation

@scheibo (Contributor) commented Mar 28, 2019

I wonder though if I should just get the testing logic committed (without any of the benchmark/profile/tracing/formatting options I keep bikeshedding) so that the harness can be used by others for looking for errors/debugging etc, and followup with my performance logic later?

Originally posted by @scheibo in #5278 (comment)

I keep finding new yaks to shave with benchmarking/profiling, but this works as is for testing.

@scheibo (Contributor, Author) commented Mar 28, 2019

This could have caught the crashes from the server restarts today (provided it ran long enough and the PRNG was in its favor to give a mon 'Leech Seed', etc.). I think to be more generally useful for smoke tests we should create a random team generator which doesn't aim to build 'good' mons, but instead aims for coverage: just making sure we have the best chance of playing battles with the most Pokemon/Moves/Abilities etc. Obviously we wouldn't be able to test every combination and interplay of factors, but just making sure we've used all 1000+ Pokemon and 1000+ moves would only take < 100 custom games (1000/12 with no team validation), which takes about 15s to run and would give us some confidence that we can at least avoid crashes. We could do this for each format (not just the random formats), including OMs, and it would still take < 10 minutes. It's definitely not something to run on each commit, but it could be used to know whether restarting the server will crash ongoing battles or not.

Creating such a generator is out of scope for this PR, but that's where I think it would be useful to take this on the testing front.
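The arithmetic above can be sanity-checked with a two-line sketch (the pool size here is the round number from the comment, not an exact dex count):

```javascript
// Coverage math from the comment above: with no team validation, each
// custom game can field 12 distinct Pokemon (6 per side, 2 sides).
const POKEMON_POOL = 1000; // approximate species count
const PER_BATTLE = 6 * 2;

// Lower bound on battles needed to field every species at least once.
const battlesForCoverage = Math.ceil(POKEMON_POOL / PER_BATTLE);
console.log(battlesForCoverage); // 84, i.e. < 100 custom games
```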

@Slayer95 (Contributor)

I think to be more generally useful for smoke tests we should create a random team generator which doesn't aim to build 'good' mons, but instead aims for coverage

That's the Hackmons Cup generator.

@scheibo (Contributor, Author) commented Mar 29, 2019

Made a strawman prototype team generator inspired by Hackmons Cup (doesn't quite work, not ready for review). We want to be sure to exhaust all possible Items/Moves/Pokemon/Abilities in the pools before repeating. For each generation, we can then pre-generate a bunch of random teams and create custom games (without species clause, EV checks, etc.) to run through the harness. I'll need to add logic to the runner to allow it to play in non-random metas, but that's not very difficult and I'll do it in the followup CL with the team generation logic.

EDIT: 6e3f5e8 is my WIP smoke test using the generator and runner.
EDIT2: I realize I can go even further and hook into the RandomPlayerAI to get it to coordinate with the team generator to give preference to making choices it hasn't seen, WIP in 8961b45. It's still stochastic and not guaranteed to cover all scenarios as quickly as possible, but it's better than completely random. I'll need to adjust Runner from this PR to take a factory method for creating its AI.
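A minimal sketch of the "exhaust the pool before repeating" idea (the `Pool` name and `sample` method here are illustrative, not the actual generator's API):

```javascript
// Hypothetical sketch: sample uniformly without replacement, and only
// refill once every element has been drawn, so that every
// Pokemon/move/item/ability gets used before any repeats.
class Pool {
  constructor(elements, prng = Math.random) {
    this.elements = elements;
    this.prng = prng;
    this.remaining = [];
  }

  sample() {
    if (this.remaining.length === 0) {
      // Pool exhausted: refill with a fresh copy of every element.
      this.remaining = this.elements.slice();
    }
    const i = Math.floor(this.prng() * this.remaining.length);
    return this.remaining.splice(i, 1)[0];
  }
}
```

A generator along these lines would keep one pool per dimension (species, moves, items, abilities) per generation, so the pre-generated teams collectively touch everything.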

Also fixes an issue with error accounting in async mode.
scheibo added a commit to pkmn-archive/pokemon-showdown that referenced this pull request Mar 29, 2019
@scheibo (Contributor, Author) commented Mar 29, 2019

@Zarel - PTAL when you're feeling better. I'm worried you'll confuse this for the perma-WIP #5278 which used to have this title.


My 'smoke test project' (which will follow up on this PR) is going well - I've got it running on all the singles formats pretty well (logs). Doubles is proving to be troublesome (possibly a bug in the AI, given the logs), the pool sampling could maybe be improved (e.g. 1516-1590, where it seems stuck not using one move for a while), and I need to fix logging/repeatability (in particular, when the bug occurs at the wrong level, like it's doing now. Error logging/reproduction works well if the problem is in sim, less well if it's in my generator/AI code :P). I reverted some of the fixes from yesterday's crashes and it catches them, which seems to validate my mission.

@Slayer95 (Contributor) commented Mar 30, 2019

>[slow battle] 2204ms - >start {"formatid":"digimondigimonshowdown","seed":[27939,54995,12592,54036]}
>player p1 {"name":"Bot 1","seed":[62304,1829,61481,48193]}
>player p2 {"name":"Bot 2","seed":[22370,56598,38891,14058]}
Run node dev-tools/harness 1 --format=digimondigimonshowdown --seed=49587,60385,24801,60692 to debug (optionally with --output and/or --input for more info):
Error: Push after end of read stream
at BattleStream.resolvePush (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:544:37)
at BattleStream.pushError (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:526:8)
at BattleStream._write (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:63:10)
at BattleStream.write (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:726:15)
at ObjectReadWriteStream.write [as _write] (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:195:12)
at ObjectReadWriteStream.write (/home/ubuntu/workspace/data/Pokemon-Showdown/.lib-dist/streams.js:726:15)
at RandomPlayerAI.choose (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:300:15)
at RandomPlayerAI.receiveError (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/examples/random-player-ai.js:36:8)
at RandomPlayerAI.receiveLine (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:286:16)
at RandomPlayerAI.receive (/home/ubuntu/workspace/data/Pokemon-Showdown/.sim-dist/battle-stream.js:274:9)

This seems to still be happening

@scheibo (Contributor, Author) commented Mar 30, 2019

@Slayer95 - that's the same error I'm seeing on #5370 (comment), and I thought I fixed it in the past on #5278. Maybe I didn't port everything relevant over from that PR.

@Slayer95 (Contributor)

Oh, right! That's a Doubles format. Just posted to make sure you didn't miss it.

@Zarel (Member) commented Mar 30, 2019

Should I wait until the Doubles issue is fixed...?

@scheibo (Contributor, Author) commented Mar 30, 2019

Should I wait until the Doubles issue is fixed...?

It's not like anything breaks if this gets committed as is, but it's OK if you want to hold off; I'm working to fix the bug now.

@scheibo (Contributor, Author) commented Mar 30, 2019

134fb8b is what fixed the push after end of stream last time. Commenting out the battle.destroy() fixes things as well (but obviously isn't the right answer).

@Slayer95 (Contributor) commented Mar 30, 2019

And this seems to still have memory issues (now running without --async after figuring out that --async wasn't actually sequential). Commit: 10d5a54 (it surely wouldn't get better with the newly added commits)

[Screenshot: memory usage history graph]

Notes

  1. First peak is after running about half an hour. By then, the output has plenty of [slow battle] warnings.
  2. Second peak is a short run.
  3. Screenshot taken during a 1000-battle run.

PS. Run actually just finished, memory usage didn't get a chance to increase anymore, and it dropped to 40 MB afterwards as expected.

@scheibo (Contributor, Author) commented Mar 30, 2019

https://gist.github.com/scheibo/84e0331f38f724646824a29776cb0f89 is a repro of the push after stream that can be fed into pokemon-showdown simulate-battle (though it materializes as a different error). The battle.inputLog annoyingly doesn't capture the issue at all, because a) the input log != what the exact inputs were and b) the battle is over when additional pushes are made.

I've also made RandomPlayerAI#receiveError log for debugging, because it seems like the AI is papering over issues with 'default' like I was worried it would (and like our tests were wont to do). I think maybe the request.active.length > 1 check, which I copied from the original RandomPlayerAI to determine whether we're not in singles (and thus whether targeting information is required), is incorrect - doesn't it fail if we're the only Pokemon left for a player? Anyway, I have plenty of fun things to debug when I wake up tomorrow.

First peak is after running about half an hour.

512 MB

OK, so these are definitely conditions I didn't account for! I've never run it for more than 5 minutes (2000 battles), and my computers have 16GB and 64GB respectively. I will look into this.

@Slayer95 (Contributor)

the request.active.length > 1 check which I copied from the original RandomPlayerAI to determine whether we're not in singles and thus targeting information is required or not is incorrect - doesn't it fail if we're the only Pokemon left for a player

No, that check is fine.

@scheibo (Contributor, Author) commented Mar 30, 2019

Ah, because it should just be [{}, null] when there's only one poke?

@Slayer95 (Contributor)

No, active Pokémon cannot contain null anymore. You might see some null checks in the code though. That enables support for bringing only 1 Pokémon in Doubles battles, which is currently forbidden in cartridge.
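In other words, a hedged sketch of the check (assuming, per the comment above, that request.active always has one non-null entry per slot in the format):

```javascript
// Assumption from the discussion above: request.active has one entry per
// slot for the format (never null), not one per living Pokemon, so its
// length distinguishes Singles from Doubles even with one Pokemon left.
function needsTargets(request) {
  return Array.isArray(request.active) && request.active.length > 1;
}
```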

@scheibo (Contributor, Author) commented Mar 30, 2019

I really need to go to bed, but #5381 was a pretty simple fix. Unfortunately, my 'push after stream' error no longer repros with that (the battles play out successfully), but I need to fix the 'push after stream' error as well. For my own reference, for this PR I want to fix:

  • remove receiveError() default fallback. I originally punted, but at the minimum the AI should more strictly check the error it's getting to detect whether it's the 'trapped' or 'disabled' case, not just any invalid choice. I think it should also do the extra legwork to save the previous request and re-choose taking the error into account, but maybe I'll continue to punt on that.
  • fix push after stream error
  • come up with a better way of saving the true input log
  • memory issues @Slayer95 has identified

The AI doesn't look like it will handle triple/rotation/multi battles very well either, but that's out of scope for now.
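The first bullet could look something like this sketch (the error-string matching is an assumption; the real simulator messages would need to be checked against SIM-PROTOCOL.md):

```javascript
// Hypothetical sketch of a stricter receiveError: only fall back to a
// different choice when the simulator reports a genuinely unavailable
// option (trapped/disabled); rethrow anything else instead of papering
// over it with `default`. The exact error strings are assumptions.
function receiveError(error, rechoose) {
  const message = String(error.message || error);
  if (/\b(trapped|disabled)\b/i.test(message)) {
    // A legal re-choice exists: pick again, avoiding the offending option.
    return rechoose(message);
  }
  // Anything else is a real bug in the AI or the simulator: surface it.
  throw error;
}
```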

In my smoke branch, I need

  • a way to repro runs better. Because of how the team generator + AI works, you have to rerun at the 'format' level instead of the 'battle' level, because it's no longer the case that the battle state and AI are fully determined by their starting seed - the state of the pools matters as well.
  • I need to special-case options that only work with certain Pokemon (Z-Crystals with zMoveUsers, Mega/Ultra Stones with megaEvolves - are there any other moves/abilities/items that only work with certain Pokemon?) because otherwise it's very unlikely the smoke tests will actually be able to test Mega Evolution etc.
  • fix team preview (need to choose 'team 2' instead of just '2', obviously)

@Slayer95 (Contributor) commented Mar 30, 2019

but at the minimum the AI should more strictly check the error its getting to detect if its 'trapped' or 'disabled' case, not just an invalid choice.

We need to listen to the |callback|cant| and |callback|trapped| messages.

I need to special case options that only work with certain pokemon (z crystals with zMoveUsers/mega or ultra stones with megaEvolves/ are there any other moves/abilities/items that only work with certain pokemon?) because otherwise its very unlikely the smoke tests will be able to actually test mega evolution etc

You don't need to worry about that. If there is a mismatch between item and Pokémon, the request has the canMegaEvo, etc. flags off. What you do have to check, however, is that no more than 1 Pokémon should try to mega-evolve/ultra-burst/z-move in the same turn.
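That per-turn constraint could be enforced with something like this sketch (`slots` and `canMegaEvo` are names assumed from the discussion, not the exact request schema):

```javascript
// Hypothetical sketch: when building a multi-slot choice, allow the
// mega/ultra/z flag for at most one slot per turn. `slots` is an array
// of per-Pokemon options with a boolean `canMegaEvo` flag (assumed name).
function pickMegaSlot(slots) {
  let used = false;
  return slots.map(slot => {
    const mega = Boolean(slot.canMegaEvo) && !used;
    if (mega) used = true; // only the first eligible slot mega-evolves
    return mega;
  });
}
```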

@scheibo (Contributor, Author) commented Mar 30, 2019

We need to listen to the |callback|cant| and |callback|trapped| messages.

Ah, I was not aware of those. I don't think they're documented in SIM-PROTOCOL.md either. :S

What you do have to check, however, is that no more than 1 Pokémon should try to mega-evolve/ultra-burst/z-move in the same turn.

We already handle this, I think? canMegaEvo, canZMove, canUltraBurst guard against this - though maybe there's an edge case where we could try to Ultra + Mega in the same turn - is that allowed?

You don't need to worry about that.

I'm not worried about the AI not handling Mega Evolution properly (it does it correctly already); I'm worried that in my smoke branch it doesn't ever get the opportunity to mega evolve, because the teams it generates will rarely if ever have the correct combinations.

@Zarel (Member) commented Mar 30, 2019

I keep on forgetting |callback|cant and |callback|trapped still exist. Shouldn't they be choice errors by now? Are they still used by anything? They should probably be removed entirely.

@scheibo (Contributor, Author) commented Apr 1, 2019

eb9e76e removes Promise.race which should remove the memory issues until the V8 regression @Slayer95 identified gets fixed (awesome work on that BTW!). The dangling promises aren't that big an issue given the harness includes a RejectionTracker which will catch those issues anyway (just in a less direct way).

Once #5394 lands, all outstanding issues will have been fixed, at which point @Zarel should be able to merge this and I can send out my next PR for the more elaborate smoke testing code (which already found an honest-to-goodness crash! Yay!)
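For reference, a minimal rejection tracker along those lines might look like this (an illustrative sketch built on Node's process events; the harness's actual implementation may differ):

```javascript
// Illustrative sketch of a RejectionTracker: record unhandled promise
// rejections so the process can log them and exit non-zero to fail tests.
const RejectionTracker = {
  unhandled: new Map(), // promise -> rejection reason

  onUnhandled(reason, promise) {
    this.unhandled.set(promise, reason);
  },
  onHandled(promise) {
    this.unhandled.delete(promise); // rejection was handled late after all
  },
  register() {
    process.on('unhandledRejection', (r, p) => this.onUnhandled(r, p));
    process.on('rejectionHandled', p => this.onHandled(p));
  },
  exitCode() {
    for (const reason of this.unhandled.values()) console.error(reason);
    return this.unhandled.size ? 1 : 0;
  },
};
```

Calling register() at startup and exiting with exitCode() at the end yields the non-zero status needed to fail a run.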

@Zarel (Member) commented Apr 1, 2019

This pull request introduces 2 alerts when merging f767526 into ac5c0a5 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@scheibo (Contributor, Author) commented Apr 1, 2019

2 for Unused variable, import, function or class

There's a /* eslint-disable no-unused-vars */ and our linter passes... why does LGTM not respect that?

@Slayer95 (Contributor) commented Apr 2, 2019

why does LGTM not respect that?

Because LGTM is not ESLint ;)
You'd need to add // lgtm at the end of the line

@scheibo (Contributor, Author) commented Apr 2, 2019

Because LGTM is not ESLint ;)

I should have rephrased that as "I think LGTM should understand ESLint directives given how widely used ESLint is, and having to ignore the same violation two different ways is obnoxious".

@Zarel (Member) commented Apr 2, 2019

This pull request introduces 2 alerts when merging f393bc7 into 2978546 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@Slayer95 (Contributor) commented Apr 2, 2019

It no longer reports the error locations, but it still counts them. LGTM

EDIT: It should probably be // lgtm[js/unused-local-variable] ?

@scheibo (Contributor, Author) commented Apr 2, 2019

What does LGTM buy us that Travis running our linter and tests doesn't?

@Zarel (Member) commented Apr 2, 2019

This pull request introduces 2 alerts when merging 3669dc1 into a53d6a8 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@Slayer95 (Contributor) commented Apr 2, 2019

Who knows!

@scheibo (Contributor, Author) commented Apr 2, 2019

OMFG ITS STILL COMPLAINING. (ノಠ益ಠ)ノ彡┻━┻

EDIT: @Zarel, if you know how to shut LGTM up, it's ready to merge now that you've committed #5394.

@Zarel (Member) commented Apr 3, 2019

This pull request introduces 2 alerts when merging f242844 into 0f82410 - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

// Tracks whether some promises threw errors that weren't caught so we can log
// and exit with a non-zero status to fail any tests. This "shouldn't happen"
// because we're "great at propagating promises (TM)", but better safe than sorry.
const RejectionTracker = {
@Zarel (Member):

Hm, you recently switched this from new class { to {. Want to talk about your reasoning?

I'm still undecided which is better. I'm leaning towards new class { because it feels more natural in TypeScript.

@Zarel (Member):

There's also class with literally everything static. Which is also an option I'm undecided about.

@scheibo (Contributor, Author) commented Apr 3, 2019

#5370 (comment) is what prompted it. IMO, an object is the simplest way - static or new class just add extra noise.

@scheibo (Contributor, Author):

I'm also happy to follow up and bikeshed this to look however you want, but I have several PRs blocked on this landing so it would be awesome if we could merge first and argue over the insignificant bits later :)
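For anyone following along, the three shapes being bikeshedded, side by side (with a hypothetical Tracker; all three behave the same for a singleton):

```javascript
// 1. Object literal: least ceremony.
const TrackerLiteral = {count: 0, bump() { return ++this.count; }};

// 2. `new class { ... }`: a one-off instance of an anonymous class.
const TrackerNewClass = new class {
  constructor() { this.count = 0; }
  bump() { return ++this.count; }
}();

// 3. A class with literally everything static: no instance at all.
class TrackerStatic {
  static count = 0;
  static bump() { return ++this.count; }
}
```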

@Zarel merged commit ece3228 into smogon:master on Apr 3, 2019
@Zarel (Member) commented Apr 3, 2019

@scheibo, LGTM is a static code analyzer, it finds flaws that our linter and tests can't. I've had it dig up a few bugs that even TSLint isn't capable of seeing, bugs involving variables being guaranteed to be a specific value at a specific point.

After repeated complaints I thought I got them to support ESLint syntax for unused variables, though?

@Zarel (Member) commented Apr 3, 2019

Also, the "introduces 2 alerts" message is counting suppressed alerts. They won't show up in the main LGTM page; it's just here so you can't slip bad code past me, I think.
