-
Notifications
You must be signed in to change notification settings - Fork 49
Hidden Information
Hearthstone is a game with a lot of hidden information:
- Mulligan choices.
- Card orderings in deck. E.g., which card will be drawn next?
- Opponent's hand cards.
- Opponent's secrets.
- Several kinds of card effects. E.g., summon a random 5-cost minion.
- Etc.
This impacts several parts of the process, including board parsing, getting a deterministic starting board, MCTS with hidden information, etc. This document explains them briefly, and propose methods to deal with.
Arena play is very different from constructed play:
- (AI) needs to construct a deck by choosing cards
- Hard to guess opponent's hand & deck by what cards have been played
If it's nearly impossible to guess opponent's hand & deck, maybe we can exclude hand cards in MCTS nodes.
The rest discussions dealt with constructed play only.
We use the logging feature of Hearthstone game client to get the game state. When a card is hidden, the card id of the entity is an empty string (or, does not exist). The cards in hand and deck are all referring to the entity by its id.
If the card id of an entity is not yet revealed, we need to write down how this entity is generated, so we can know the distribution of this card id. For example, if an unknown card is from the initial deck, we know the card id is uniformly distributed across the deck cards. Another example, if an unknown card is generated by a battlecry reads Put a random 5-cost card to your hand, then we know the unknown card id is uniformly distributed across all 5-cost cards.
To start an AI run, it typically requires the hidden information is determined. That is, we need to sample the hidden information by a given distribution, to determine the hidden card ids.
We need to perform mulligan decisions, and need to guess opponent's mulligan decision as well, to gain a correct probability distribution of opponent's hand cards.
At least two ways can be used to deal with it:
- Hand-crafted rules. E.g., Mulligan out cards with cost larger than 2.
- Train a network to decide which cards are good to hold.
Ideally, the distribution of an opponent secret, is relates to the game state when opponent played the secret. We can run/write-down the MCTS scores of that time point, and record to the entity info. What's more important is, we need to update the distribution for each game move. Since we gain knowledge about the secret if the secret is not triggered by a particular action.
If it's turn 10 now, and the opponent is holding 10 cards in his hand. Then, we will get C(30,10) different start boards after determination. This will be a big problem for Multiple-Observer MCTS, which will produce an opponent tree (in fact, a forest) with C(30,10) root, and cannot easily share information among these subtrees. One idea is to limit the sample count. That is, we only sample, say 10, deterministic start boards. Then, for each MCTS iteration, we randomly choose one start board from them, and start the iteration. However, in theory, we still have lower down our strength by 10x, if there are no sharing information between these subtrees.
- Decide N to be the number of starting board.
- Decide the distribution of opponent secrets.
- Sample N determined boards to be the starting boards.
Use Multiple-Observer MCTS can handle hidden information. However, a big problem is, each determined board maps to a root node in opponent's game tree. If there are C(30,10) different combination of opponent's hand cards, then we will have C(30,10) root nodes, and we cannot share subtrees across these root nodes.
We can try to share some information using techniques like Episodic Information Capture and Reuse. Or, just reduce the number of root nodes by limiting the number of sampled boards.
Idea:
Make choices using only current board states and own hand cards. Not using any history.
One problem is to determine opponent secret cards:
- Opponent might play a specific secret cards, according to the board state at that time
- Maybe, record the probability distribution, and reflect this when determining starting board
A similar problem is to determine opponent hand cards:
- Played cards should be considered
- Mulligan strategy should be taken into account
One big question is:
Do we have same strategy if two states are with different probability distribution of opponent hand cards
Recalling that one tree node maps to one strategy, two states can map to the same node if and only if
- Same on-board features (i.e., minions, heroes, weapons, hero powers)
- Same probability distribution of opponent secret cards
- (Discussed below) Same cards hold in hand
- (Discussed below) Same cards hold in opponent's hand
Idea:
Strategy fusion: same strategy without regarding to hand cards.
Or, in other words:
- Your hand cards are always hidden.
- Choose a card to play in a choose-one menu, with #-of-choices = #-of-hand-cards
- Choose battlecry target as usual (if any)
This style basically rules out the combo strategy during one turn:
- Assume card A should be played if card B is also played in the same turn
- You see card A in the first choose-one menu. And played it
- But, in the second choose-one menu, card B might not be there.
In a constructed play, this weakens the AI strength too much. Maybe good if playing aggro v.s. aggro.
Even in an arena play, you still have synergy to play multiple cards in one turn. However, since opponent's card are nearly impossible to guess, maybe it's suitable for opponent.
It means, we perform same strategy under:
- Don't know which cards are already played by opponent
- Don't know which cards are preserved during mulligan
But, we do take above two facts into consideration at root node. We just cannot do that during our tree node.
If we do want to have a different strategy over the above two cases, we can map to a node by:
- Board feature (e.g., minions, heroes, weapons, hero powers)
- Own hand cards
- #-of-cards in opponent hand, which are reserved in mulligan
- List of opponent played cards
The number of sample count for determined starting boards should be chosen with care.
Larger number of sample count leads to a weaker opponent model:
- Each root node of opponent tree maps to one determined starting boards
- The MCTS iterations are effectively spread over different starting boards
- Each opponent tree only covers N-th iterations.
Smaller number of sample count leads to a wrong probability distribution of hidden cards:
- There are hidden cards in opponent's hand, which mainly determines what he/she will do in the next turn.
- If sample count N is too small, we cannot get a correct probability distribution of opponent's hand cards.
- This should includes the results of mulligan. If a card is with a higher probability to preserve, it should has a higher probability to be actually in opponent's hand.
An example to choose sample count N:
- About 5000 iterations per second
- Think 60 seconds per turn.
- Total 300k iterations per turn.
- If we choose N to be 1000, then each opponent tree (mapping by a determined board) covers only 0.3k iterations.
- If we choose N to be 100, then each opponent tree covers 3k iterations.
As one can see, the strength of opponent model is lowered down when the sample count is increased.
The sample count N mentioned above cares only the process to determine the start board, to prevent an excess number of root nodes in MCTS.
However, this issue happens across the entire MCTS tree. For example, when drawing a card from deck, the branching factor is typical 2x, assuming there are 2x cards in the deck.
One way to deal with this problem, is to limit the number of possible random sequences, or effectively, limit the possible random seeds. In this case, only a limited random outcomes are considered, and we can calculate the expected strength of a particular random sequence:
- Assume we are expected to run X = 300k iterations per turn.
- Define the number of random sequences to be Y = 100
- Then, on average, each random outcome is iterated over X/Y = 3K times
The biggest problem is that, a determined board maps to one root node in opponent's tree. One way to deal with this is to gather several determined boards to one root node, providing that:
- These gathered determined starting boards are with the same play strategy.