Hidden Information

Background

Hearthstone is a game with a lot of hidden information:

Mulligan choices.
Card orderings in deck. E.g., which card will be drawn next?
Opponent's hand cards.
Opponent's secrets.
Several kinds of card effects. E.g., summon a random 5-cost minion.
Etc.

This impacts several parts of the process, including board parsing, getting a deterministic starting board, MCTS with hidden information, etc. This document explains them briefly, and propose methods to deal with.

Arena

Arena play is very different from constructed play:

(AI) needs to construct a deck by choosing cards
Hard to guess opponent's hand & deck by what cards have been played

If it's nearly impossible to guess opponent's hand & deck, maybe we can exclude hand cards in MCTS nodes.

The rest discussions dealt with constructed play only.

Board Parsing

We use the logging feature of Hearthstone game client to get the game state. When a card is hidden, the card id of the entity is an empty string (or, does not exist). The cards in hand and deck are all referring to the entity by its id.

Unknown Card Distribution

If the card id of an entity is not yet revealed, we need to write down how this entity is generated, so we can know the distribution of this card id. For example, if an unknown card is from the initial deck, we know the card id is uniformly distributed across the deck cards. Another example, if an unknown card is generated by a battlecry reads Put a random 5-cost card to your hand, then we know the unknown card id is uniformly distributed across all 5-cost cards.

Get Deterministic Start Board

To start an AI run, it typically requires the hidden information is determined. That is, we need to sample the hidden information by a given distribution, to determine the hidden card ids.

Mulligan

We need to perform mulligan decisions, and need to guess opponent's mulligan decision as well, to gain a correct probability distribution of opponent's hand cards.

At least two ways can be used to deal with it:

Hand-crafted rules. E.g., Mulligan out cards with cost larger than 2.
Train a network to decide which cards are good to hold.

Unknown Card Distribution of Opponent Secret

Ideally, the distribution of an opponent secret, is relates to the game state when opponent played the secret. We can run/write-down the MCTS scores of that time point, and record to the entity info. What's more important is, we need to update the distribution for each game move. Since we gain knowledge about the secret if the secret is not triggered by a particular action.

Reduced Sample Count

If it's turn 10 now, and the opponent is holding 10 cards in his hand. Then, we will get C(30,10) different start boards after determination. This will be a big problem for Multiple-Observer MCTS, which will produce an opponent tree (in fact, a forest) with C(30,10) root, and cannot easily share information among these subtrees. One idea is to limit the sample count. That is, we only sample, say 10, deterministic start boards. Then, for each MCTS iteration, we randomly choose one start board from them, and start the iteration. However, in theory, we still have lower down our strength by 10x, if there are no sharing information between these subtrees.

Concluded Sampling Process

Decide N to be the number of starting board.
Decide the distribution of opponent secrets.
Sample N determined boards to be the starting boards.

MCTS

Use Multiple-Observer MCTS can handle hidden information. However, a big problem is, each determined board maps to a root node in opponent's game tree. If there are C(30,10) different combination of opponent's hand cards, then we will have C(30,10) root nodes, and we cannot share subtrees across these root nodes.

We can try to share some information using techniques like Episodic Information Capture and Reuse. Or, just reduce the number of root nodes by limiting the number of sampled boards.

Reduce State Space

Idea:

Make choices using only current board states and own hand cards. Not using any history.

One problem is to determine opponent secret cards:

Opponent might play a specific secret cards, according to the board state at that time
Maybe, record the probability distribution, and reflect this when determining starting board

A similar problem is to determine opponent hand cards:

Played cards should be considered
Mulligan strategy should be taken into account

One big question is:

Do we have same strategy if two states are with different probability distribution of opponent hand cards

Recalling that one tree node maps to one strategy, two states can map to the same node if and only if

Same on-board features (i.e., minions, heroes, weapons, hero powers)
Same probability distribution of opponent secret cards
(Discussed below) Same cards hold in hand
(Discussed below) Same cards hold in opponent's hand

Exclude Own Hand-Cards in Node

Idea:

Strategy fusion: same strategy without regarding to hand cards.

Or, in other words:

Your hand cards are always hidden.
Choose a card to play in a choose-one menu, with #-of-choices = #-of-hand-cards
Choose battlecry target as usual (if any)

This style basically rules out the combo strategy during one turn:

Assume card A should be played if card B is also played in the same turn
You see card A in the first choose-one menu. And played it
But, in the second choose-one menu, card B might not be there.

In a constructed play, this weakens the AI strength too much. Maybe good if playing aggro v.s. aggro.

Even in an arena play, you still have synergy to play multiple cards in one turn. However, since opponent's card are nearly impossible to guess, maybe it's suitable for opponent.

Exclude Opponent Hand Cards in Node

It means, we perform same strategy under:

Don't know which cards are already played by opponent
Don't know which cards are preserved during mulligan

But, we do take above two facts into consideration at root node. We just cannot do that during our tree node.

If we do want to have a different strategy over the above two cases, we can map to a node by:

Board feature (e.g., minions, heroes, weapons, hero powers)
Own hand cards
#-of-cards in opponent hand, which are reserved in mulligan
List of opponent played cards

Trade-off of Sample Count N

The number of sample count for determined starting boards should be chosen with care.

Larger number of sample count leads to a weaker opponent model:

Each root node of opponent tree maps to one determined starting boards
The MCTS iterations are effectively spread over different starting boards
Each opponent tree only covers N-th iterations.

Smaller number of sample count leads to a wrong probability distribution of hidden cards:

There are hidden cards in opponent's hand, which mainly determines what he/she will do in the next turn.
If sample count N is too small, we cannot get a correct probability distribution of opponent's hand cards.
This should includes the results of mulligan. If a card is with a higher probability to preserve, it should has a higher probability to be actually in opponent's hand.

An example to choose sample count N:

About 5000 iterations per second
Think 60 seconds per turn.
Total 300k iterations per turn.
If we choose N to be 1000, then each opponent tree (mapping by a determined board) covers only 0.3k iterations.
If we choose N to be 100, then each opponent tree covers 3k iterations.

As one can see, the strength of opponent model is lowered down when the sample count is increased.

More Words on Sample Count N

The sample count N mentioned above cares only the process to determine the start board, to prevent an excess number of root nodes in MCTS.

However, this issue happens across the entire MCTS tree. For example, when drawing a card from deck, the branching factor is typical 2x, assuming there are 2x cards in the deck.

One way to deal with this problem, is to limit the number of possible random sequences, or effectively, limit the possible random seeds. In this case, only a limited random outcomes are considered, and we can calculate the expected strength of a particular random sequence:

Assume we are expected to run X = 300k iterations per turn.
Define the number of random sequences to be Y = 100
Then, on average, each random outcome is iterated over X/Y = 3K times

Improve Ideas

The biggest problem is that, a determined board maps to one root node in opponent's tree. One way to deal with this is to gather several determined boards to one root node, providing that:

These gathered determined starting boards are with the same play strategy.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly