CPRG Home

  • On Strategy Stitching in Large Extensive Form Multiplayer Games
    • Authors: Richard Gibson and Duane Szafron
    • Purpose: Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS 2011)
    • Date: December 2011
    • Show Abstract.
      Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

      Hide Abstract.

    • [PDF]

  • Automated Action Abstraction of Imperfect Information Extensive-Form Games
    • Authors: John Hawkin, Robert Holte, and Duane Szafron
    • Purpose: Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11)
    • Date: August 2011
    • Show Abstract.
      Multi-agent decision problems can often be formulated as extensive-form games. We focus on imperfect information extensive-form games in which one or more actions at many decision points have an associated continuous or many-valued parameter. A stock trading agent, in addition to deciding whether to buy or not, must decide how much to buy. In no-limit poker, in addition to selecting a probability for each action, the agent must decide how much to bet for each betting action. Selecting values for these parameters makes these games extremely large. Two-player no-limit Texas Hold'em poker with stacks of 500 big blinds has approximately 10^71 states, which is more than 10^50 times more states than two-player limit Texas Hold'em. The main contribution of this paper is a technique that abstracts a game's action space by selecting one, or a small number, of the many values for each parameter. We show that strategies computed using this new algorithm for no-limit Leduc poker exhibit significant utility gains over epsilon-Nash equilibrium strategies computed with standard, hand-crafted parameter value abstractions.

      Hide Abstract.

    • [PDF]

  • Accelerating Best Response Calculation in Large Extensive Games
    • Authors: Michael Johanson, Kevin Waugh, Michael Bowling, and Martin Zinkevich
    • Purpose: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-11)
    • Date: July 2011
    • Show Abstract.
      One fundamental evaluation criteria of an AI technique is its performance in the worst-case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.

      Hide Abstract.

    • [PDF]

  • Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents
    • Authors: Nick Abou Risk and Duane Szafron
    • Purpose: Autonomous Agents and Multiagent Systems 2010 (AAMAS-10)
    • Date: May 2010
    • Show Abstract.
      Games are used to evaluate and advance Multiagent and Artificial Intelligence techniques. Most of these games are deterministic with perfect information (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with imperfect information. For two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an epsilon-Nash equilibrium. A Nash equilibrium strategy is useful in two-player games since it maximizes its utility against a worst-case opponent. However, for multiplayer (three or more player) games, we lose all theoretical guarantees for CFR. However, we believe that CFR-generated agents may perform well in multiplayer games. To test this hypothesis, we used this technique to create several 3-player limit Texas Hold'em poker agents and two of them placed first and second in the 3-player event of the 2009 AAAI/IJCAI Computer Poker Competition. We also demonstrate that good strategies can be obtained by grafting sets of two-player subgame strategies to a 3-player base strategy after one of the players is eliminated.

      Hide Abstract.

    • [PDF]

  • Strategy Grafting in Extensive Games
    • Authors: Kevin Waugh, Nolan Bard, and Michael Bowling
    • Purpose: Advances in Neural Information Processing Systems 22 (NIPS-09)
    • Date: December 2009
    • Show Abstract.
      Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.

      Hide Abstract.

    • [PDF]

  • State Translation in No-Limit Poker
    • Authors: David Paul Schnizlein
    • Purpose: M. Sc. Thesis
    • Date: Fall 2009
    • Show Abstract.
      One way to create a champion level poker agent is to compute a Nash Equilibrium in an abstract version of the poker game. The resulting strategy is then used to play in the full game. With this approach, translation is required between the full and abstract games in order to use the abstract strategy. In limit poker this translation step is defined when the abstraction is chosen. However, when considering no-limit poker the translation process becomes more complicated. We formally describe the process of translation and investigate its consequences. We examine how the current method, hard translation, can result in exploitable agents and introduce a new probabilistic method, soft translation, that produces more robust players. We also investigate how switching between strategies with different underlying abstractions affects the performance of an agent.

      Hide Abstract.

    • [PDF]

  • Using Counterfactual Regret Minimization to Create a Competitive Multiplayer Poker Agent
    • Authors: Nicholas Abou Risk
    • Purpose: M. Sc. Thesis
    • Date: Fall 2009
    • Show Abstract.
      Games have been used to evaluate and advance techniques in the field of Artificial Intelligence since before computers were invented. Many of these games have been deterministic perfect information games (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect information game, all information is visible to all players. However, many real-world scenarios involving competing agents can be more accurately modeled as stochastic (non-deterministic), imperfect information games, and this dissertation investigates such games. Poker is one such game played by millions of people around the world; it will be used as the testbed of the research presented in this dissertation. For a specific set of games, two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an -Nash equilibrium. A Nash equilibrium strategy is very useful in two-player games as it maximizes its utility against a worst-case opponent. However, once we move to multiplayer games, we lose all theoretical guarantees for CFR. Furthermore, we have no theoretical guarantees about the performance of a strategy from a multiplayer Nash equilibrium against two arbitrary opponents. Despite the lack of theoretical guarantees, my thesis is that CFR-generated agents may perform well in multiplayer games. I created several 3-player limit Texas Hold\u2019em Poker agents and the results of the 2009 Computer Poker Competition demonstrate that these are the strongest 3-player computer Poker agents in the world. I also contend that a good strategy can be obtained by grafting a set of two-player subgame strategies to a 3-player base strategy when one of the players is eliminated.

      Hide Abstract.

    • [PDF]

  • Abstraction In Large Extensive Games
    • Authors: Kevin Waugh
    • Purpose: M. Sc. Thesis
    • Date: Fall 2009
    • Show Abstract.
      Extensive games model scenarios where multiple agents interact with a potentially stochastic environment. In the case of two-player zero-sum games, we have efficient techniques for computing solutions. These solutions are optimal in that they guarantee the player the highest expected reward against a worst-case opponent. Unfortunately, there are many interesting two-player zero-sum games that are too large for the state-of-the-art solvers. For example, heads-up Texas Hold\u2019em has 10^18 game states and requires over two petabytes of storage to record a single strategy1. A popular approach for tackling these large games is to use an abstraction technique to create a smaller game that models the original game. A solution to the smaller abstract game can be computed and is presumed good in the original game. A common assumption is that the more accurately the abstract game models the original game the stronger the resulting strategy will be. There is substantial empirical evidence to support this assumption and, prior to this work, it had not been questioned. In this thesis, we show that this assumption is not true. That is, using a larger, more accurate, abstract game can lead to weaker strategies. To show this, we must first formalize what it means to abstract a game as well as what it means for an abstraction to be bigger or more informed. We then show that the assumption fails to hold under two criteria for evaluating a strategy. The first is exploitability, which measures performance against a worst-case adversary. The second is a new evaluation criterion called the domination value, which essentially measures how many mistakes (actions that one should never take) a strategy makes. Despite these abstraction pathologies, we argue that solutions to the larger abstract games tend to make fewer mistakes. This leads us to develop a new technique, called strategy grafting, that can be used to augment a strong base strategy. It does this by decomposing an extensive game into multiple sub-games. These sub-games are then solved separately, but with shared knowledge of the underlying base strategy. This technique allows us to make effective use of much larger strategy spaces than previously possible. We provide a weak theoretical guarantee on the quality of this new technique, but we show in practice that it does quite well even when the conditions on our theoretical results do not hold. Furthermore, when evaluated under our new criterion, we notice that grafted strategies tend to make fewer mistakes than their base strategy.

      Hide Abstract.

    • [PDF]

  • Probabilistic State Translation in Extensive Games with Large Action Sets
    • Authors: David Schnizlein, Michael Bowling, and Duane Szafron.
    • Purpose: Proceedings of the 2009 International Joint Conference on Artificial Intelligence (IJCAI)
    • Date: 2009
    • Show Abstract.
      Equilibrium or near-equilibrium solutions to very large extensive form games are often computed by using abstractions to reduce the game size. A common abstraction technique for games with a large number of available actions is to restrict the number of legal actions in every state. This method has been used to discover equilibrium solutions for the game of no-limit heads-up Texas Hold'em. When using a solution to an abstracted game to play one side in the un-abstracted (real) game, the real opponent actions may not correspond to actions in the abstracted game. The most popular method for handling this situation is to translate opponent actions in the real game to the closest legal actions in the abstracted game. We show that this approach can result in a very exploitable player and propose an alternative solution. We use probabilistic mapping to translate a real action into a probability distribution over actions, whose weights are determined by a similarity metric. We show that this approach significantly reduces the exploitability when using an abstract solution in the real game.

      Hide Abstract.

    • [PDF]

  • A Practical Use of Imperfect Recall
    • Authors: Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling.
    • Purpose: Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA)
    • Date: 2009
    • Show Abstract.
      Perfect recall is the common and natural assumption that an agent never forgets. As a consequence, the agent can always condition its choice of action on any prior observations. In this paper, we explore relaxing this assumption. We observe the negative impact this relaxation has on algorithms: some algorithms are no longer well-defined, while others lose their theoretical guarantees on the quality of a solution. Despite these disadvantages, we show that removing this restriction can provide considerable empirical advantages when modeling extremely large extensive games. In particular, it allows fine granularity of the most relevant observations without requiring decisions to be contingent on all past observations. In the domain of poker, this improvement enables new types of information to be used in the abstraction. By making use of imperfect recall and new types of information, our poker program was able to win the limit equilibrium event as well as the no-limit event at the 2008 AAAI Computer Poker Competition. We show experimental results to verify that our programs using imperfect recall are indeed stronger than their perfect recall counterparts.

      Hide Abstract.

    • [PDF]

  • Data Biased Robust Counter Strategies
    • Authors: Michael Johanson, Michael Bowling.
    • Purpose: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS)
    • Date: 2009
    • Show Abstract.
      The problem of exploiting information about the environment while still being robust to inaccurate or incomplete information arises in many domains. Competitive imperfect information games where the goal is to maximally exploit an unknown opponent's weaknesses are an example of this problem. Agents for these games must balance two objectives. First, they should aim to exploit data from past interactions with the opponent, seeking a best-response counter strategy. Second, they should aim to minimize losses since the limited data may be misleading or the opponent's strategy may have changed, suggesting an opponent-agnostic Nash equilibrium strategy. In this paper, we show how to partially satisfy both of these objectives at the same time, producing strategies with favourable tradeoffs between the ability to exploit an opponent and the capacity to be exploited. Like a recently published technique, our approach involves solving a modified game; however the result is more generally applicable and even performs well in situations with very limited data. We evaluate our technique in the game of two-player, Limit Texas Hold'em.

      Hide Abstract.

    • [PDF]

  • Abstraction Pathologies in Extensive Games
    • Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
    • Purpose: Autonomous Agents and Multiagent Systems 2009 (AAMAS-09)
    • Date: 2009
    • Show Abstract.
      Extensive games can be used to model many scenarios in which multiple agents interact with an environment. There has been considerable recent research on finding strong strategies in very large, zero-sum extensive games. The standard approach in such work is to employ abstraction techniques to derive a more tractably sized game. An extensive game solver is then employed to compute an equilibrium in that abstract game, and the resulting strategy is presumed to be strong in the full game. Progress in this line of research has focused on solving larger abstract games, which more closely resemble the full game. However, there is an underlying assumption that by abstracting less, and solving a larger game, an agent will have a stronger strategy in the full game. In this work we show that this assumption is not true in general. Refining an abstraction can actually lead to a weaker strategy. We show examples of these abstraction pathologies in a small game of poker that can be analyzed exactly. These examples show that pathologies arise when abstracting both chance nodes as well as a player's actions. In summary, this paper shows that the standard approach to finding strong strategies for large extensive games rests on shaky ground.

      Hide Abstract.

    • [PDF]

  • Strategy Evaluation in Extensive Games with Importance Sampling
    • Authors: Michael Bowling, Michael Johanson, Neil Burch and Duane Szafron
    • Purpose: Proceedings of the 25th Annual International Conference on Machine Learning (ICML)
    • Date: 2008
    • Show Abstract.
      Typically agent evaluation is done through Monte Carlo estimation. However, stochastic agent decisions and stochastic outcomes can make this approach inefficient, requiring many samples for an accurate estimate. We present a new technique that can be used to simultaneously evaluate many strategies while playing a single strategy in the context of an extensive game. This technique is based on importance sampling, but utilizes two new mechanisms for significantly reducing variance in the estimates. We demonstrate its effectiveness in the domain of poker, where stochasticity makes traditional evaluation problematic.

      Hide Abstract.

    • [PDF]

  • Using State Estimation for Dynamic Agent Modelling
    • Authors: Nolan Bard
    • Purpose: M. Sc. Thesis
    • Date: March 2008
    • Show Abstract.
      Agent modelling is a challenging problem in many modern artificial intelligence applications. The agent modelling task is especially difficult when handling stochastic choices, deliberately hidden information, dynamic agents, and the need for fast learning. State estimation techniques, such as Kalman filtering and particlefiltering, have addressed many of these challenges, but have received little attention in the agent modelling literature. This thesis explores the use of particle filtering for modelling dynamic opponents in Kuhn poker, a simplified version of Texas Hold'em poker. We demonstrate effective modelling both against static opponents as well as dynamic opponents, when the dynamics are known. We then examine an application of Rao-Blackwellized particle filtering for doing dual estimation, inferring both the opponents state and a model of its dynamics. Finally, we examine the robustness of the approach to incorrect beliefs about the opponent and compare it to previous work on opponent modelling in Kuhn poker.

      Hide Abstract.

    • [PDF]

  • Regret Minimization in Games with Incomplete Information
    • Authors: Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione
    • Purpose: Advances in Neural Information Processing Systems 20 (NIPS)
    • Date: 2007
    • Show Abstract.
      Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.

      Hide Abstract.

    • [PDF]

  • Computing Robust Counter-Strategies
    • Authors: Michael Johanson, Martin Zinkevich, Michael Bowling.
    • Purpose: Advances in Neural Information Processing Systems 20 (NIPS)
    • Date: 2007
    • Show Abstract.
      Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing {\em robust} counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold'em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses.

      Hide Abstract.

    • [PDF]

  • Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player
    • Authors: Michael Johanson
    • Purpose: M. Sc. Thesis
    • Date: October 2007
    • Show Abstract.
      Poker is a challenging game with strong human and computer players. In this thesis, we will explore four approaches towards creating a computer program capable of challenging these poker experts. The first approach is to approximate a Nash equilibrium strategy which is robust against any opponent. The second approach is to find an exploitive counter-strategy to an opponent. We will show that these counter-strategies are brittle: they can lose to arbitrary other opponents. The third approach is a compromise of the first two, to find robust counter-strategies. The four approach is to combine several of these agents into a team, and learn during a game which to use. As proof of the value of these techniques, we have used the resulting poker programs to win an event in the 2007 AAAI Computer Poker Competition and play competitively against two human poker professionals in the First Man-Machine Poker Championship.

      Hide Abstract.

    • [PDF]

  • Particle Filtering for Dynamic Agent Modelling in Simplified Poker
    • Authors: Nolan Bard and Michael Bowling
    • Purpose: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI)
    • Date: 2007
    • Show Abstract.
      Agent modelling is a challenging problem in many modern artificial intelligence applications. The agent modelling task is especially difficult when handling stochastic choices, deliberately hidden information, dynamic agents, and the need for fast learning. State estimation techniques, such as Kalman filtering and particle filtering, have addressed many of these challenges, but have received little attention in the agent modelling literature. This paper looks at the use of particle filtering for modelling a dynamic opponent in Kuhn poker, a simplified version of Texas Hold'em poker. We demonstrate effective modelling both against static opponents as well as dynamic opponents, when the dynamics are known. We then examine an application of Rao-Blackwellized particle filtering for doing dual estimation, inferring both the opponent's state as well as a model of its dynamics. Finally, we examine the robustness of the approach to incorrect beliefs about the opponent and compare it to previous work on opponent modelling in Kuhn poker.

      Hide Abstract.

    • [PDF]

  • A New Algorithm for Generating Equilibria in Massive Zero-Sum Games
    • Authors: Martin Zinkevich, Michael Bowling, and Neil Burch
    • Purpose: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI)
    • Date: 2007
    • Show Abstract.
      In normal scenarios, computer scientists often consider the number of states in a game to capture the difficulty of learning an equilibrium. However, players do not see games in the same light: most consider Go or Chess to be more complex than Monopoly. In this paper, we discuss a new measure of game complexity that links existing state-of-the-art algorithms for computing approximate equilibria to a more human measure. In particular, we consider the range of skill in a game, i.e. how many different skill levels exist. We then modify existing techniques to design a new algorithm to compute approximate equilibria whose performance can be captured by this new measure. We use it to develop the first near Nash equilibrium for a four round abstraction of poker, and show that it would have been able to win handily the bankroll competition from last year's AAAI poker competition.

      Hide Abstract.

    • [PDF]

  • Postgame Analysis of Poker Decisions
    • Authors: Morgan Kan
    • Purpose: M. Sc. Thesis
    • Date: January 2007
    • Show Abstract.
      In Artificial Intelligence research, evaluation is a recurring theme. A newly crafted game-playing program is interesting if it can be shown to be better by some measure. Usually, this is done by running a series of matches against other programs to generate a statistically significant result. In some games, particularly ones with imperfect information or stochastic elements, it may be infeasible to play enough games to achieve statistical significance. A new tool called DIVAT is presented for analyzing the quality of decisions in the game of poker. Instead of using the net monetary outcome of each game to evaluate the player, the tool focuses on the skill elements of the game. The expected value of the player's actions are compared to a reasonable baseline policy. The effect of stochasticity and imperfect information is greatly reduced resulting in an unbiased low-variance estimator for the game of poker.

      Hide Abstract.

    • [PDF]

  • Algorithms and Assessment in Computer Poker
    • Authors: Darse Billings
    • Purpose: Ph.D. Dissertation
    • Date: September 2006
    • Show Abstract.
      The game of poker offers a clean well-defined domain in which to investigate some truly fundamental issues in computing science, such as how to handle deliberate misinformation, and how to make intelligent guesses based on partial knowledge. In the taxonomy of games, poker lies at the opposite end of the spectrum from well-studied board games like checkers and chess. Poker is a multi-player game with stochasticity (random events occurring over a known probability distribution), imperfect information (some information is hidden from each player), and partially observable outcomes (some information might never be known). Consequently, the use of deception, opponent modeling, and coping with uncertainty are indispensable elements of high-level strategic play. Traditional methods for computer game-playing are incapable of handling these properties. Writing a program to play a skillful game of poker is a challenging proposition from an engineering perspective as well, since each decision must be made quickly (typically about one second). A major theme of this dissertation is the evolution of architectures for poker-playing programs that has occurred since the research began in 1992. Four distinct approaches we address are: knowledge-based systems, simulation, game-theoretic methods, and adaptive imperfect information game-tree search. Each phase of the research explored the strengths and weaknesses of the corresponding architectures, both in theory and in practice. The important problem of obtaining an accurate assessment of performance is also addressed. The creation of a powerful tool for this purpose, called DIVAT, is discussed in detail. The academic papers arising from each of these studies constitute the core chapters of this thesis. The conclusion discusses some of the advantages and shortcomings of each approach, along with the most valuable lessons learned in the process. Although the goal of surpassing all human players has not yet been attained, the path has been cleared. The best poker programs are beginning to pose a serious threat, even in this most ``human'' of games. As programs continue to improve, they provide new insights into poker strategy, and valuable lessons on how to handle various forms of uncertainty.

      Hide Abstract.

    • [PDF]

  • Optimal Unbiased Estimators For Evaluating Agent Performance
    • Authors: Martin Zinkevich, Michael Bowling, Nolan Bard, Morgan Kan, and Darse Billings.
    • Purpose: Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI)
    • Date: 2006
    • Show Abstract.
      Evaluating the performance of an agent or group of agents can be, by itself, a very challenging problem. The stochastic nature of the environment plus the stochastic nature of agents' decisions can result in estimates with intractably large variances. This paper examines the problem of finding low variance estimates of agent performance. In particular, we assume that some agent-environment dynamics are known, such as the random outcome of drawing a card or rolling a die. Other dynamics are unknown, such as the reasoning of a human or other black-box agent. Using the known dynamics, we describe the complete set of all unbiased estimators, that is, for any possible unknown dynamics the estimate's expectation is always the agent's expected utility. Then, given a belief about the unknown dynamics, we identify the unbiased estimator with minimum variance. If the belief is correct our estimate is optimal, and if the belief is wrong it is at least unbiased. Finally, we apply our unbiased estimator to the game of poker, demonstrating dramatically reduced variance and faster evaluation.

      Hide Abstract.

    • [PDF]

  • A Tool for the Direct Assessment of Poker Decisions
    • Authors: Darse Billings, Morgan Kan
    • Purpose: The International Computer Games Association Journal, vol 29(3), pp. 119-142, October 2006
    • Date: 2006
    • Show Abstract.
      The element of luck permeates every aspect of the game of poker. Random stochastic outcomes introduce a large amount of noise that can make it very difficult to distinguish a good player from a bad one, much less trying to quantify small differences in skill. Good methods for directly assessing the quality of decisions exist for other stochastic games, including backgammon and blackjack. However, those are perfect information games, where the notion of an objectively best move is well-defined, which unfortunately is not the case for imperfect information games, in general. The Ignorant Value Assessment Tool, DIVAT, uses perfect knowledge (hindsight) analysis to quantify the value of each player decision made in a game of two-player Limit Texas Hold'em. Comparisons are made against a realistic baseline betting sequence, which is based on quasi-equilibrium policies and game-theoretic invariant frequencies for raising and folding. Much of the relevant context involved in each decision is simply ignored, out of necessity; but enough is retained to provide a reasonably accurate yardstick, which is then applied equally to all players. The frequency and magnitude of expected value differences between player actions, relative to the baseline, provides a statistically unbiased low-variance estimate of the long-term expected outcome between the players.

      Hide Abstract.

    • [PDF]

  • The Effectiveness of Opponent Modelling in a Small Imperfect Information Game
    • Authors: Bret Hoehn
    • Purpose: M. Sc. Thesis
    • Date: Spring 2006
    • Show Abstract.
      Opponent modelling is an important issue in games programming today. Programs which do not perform opponent modelling are unlikely to take full advantage of the mistakes made by an opponent. Additionally, programs which do not adapt over time become less of a challenge to players, causing these players to lose interest. While opponent modelling can be a difficult challenge in perfect information games, where the full state of the game is known to all players at all times, it becomes an even more difficult task in games of imperfect information, where players are not always able to observe the actual state of the game. This thesis studies the problem of opponent modelling in Kuhn Poker, a small imperfect information game that contains several properties that make real-world poker games interesting. Two basic types of opponent modelling are studied, explicit modelling and implicit modelling, and their effectiveness is compared.

      Hide Abstract.

    • [PDF]

  • Opponent Modelling and Search in Poker
    • Authors: Terence Schauenberg
    • Purpose: M. Sc. Thesis
    • Date: 2006
    • Show Abstract.
      Poker is a challenging domain that contains both elements of chance and imperfect information. Though progress has been made in the domain, there is still one major stumbling block on the way to creating a world-class calibre computer player. This is the task of learning how an opponent plays (i.e., opponent modelling) and subsequently coming up with a counter-strategy that can exploit that information. The work in this thesis explores this task. A program is implemented that models the opponent through game play and then plans an exploitive counter-strategy using expectimax search. This program is evaluated using two different heads-up limit poker variations: a small-scale variation called Leduc Hold'em, and a full-scale one called Texas Hold'em.

      Hide Abstract.

    • [PDF]

  • Effective Short-Term Opponent Exploitation in Simplified Poker
    • Authors: Bret Hoehn, Finnegan Southey, Robert C. Holte, and Valeriy Bulitko
    • Purpose: AAAI'05
    • Date: 2005
    • Show Abstract.
      Uncertainty in poker stems from two key sources, the shuffled deck and an adversary whose strategy is unknown. One approach is to find a pessimistic game theoretic solution (i.e. a Nash equilibrium), but human players have idiosyncratic weaknesses that can be exploited if a model of their strategy can be learned by observing their play. However, games against humans last for at most a few hundred hands so learning must be fast to be effective. We explore two approaches to opponent modelling in the context of Kuhn poker, a small game for which game theoretic solutions are known. Parameter estimation and expert algorithms are both studied. Experiments demonstrate that, even in this small game, convergence to maximally exploitive solutions in a small number of hands is impractical, but that good (i.e. better than Nash or breakeven) performance can be achieved in a short period of time. Finally, we show that amongst a set of strategies with equal game theoretic value, in particular the set of Nash equilibrium strategies, some are preferable because they speed learning of the opponent's strategy by exploring it more effectively.

      Hide Abstract.

    • [PDF]

  • Bayes' Bluff: Opponent Modelling in Poker
    • Authors: Finnegan Southey, Michael Bowling, Bryce Larson, Carmelo Piccione, Neil Burch, Darse Billings, and Chris Rayner
    • Purpose: Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI)
    • Date: 2005
    • Show Abstract.
      Poker is a challenging problem for artcial intelligence, with non-deterministic dynamics, partial observability, and the added diculty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.

      Hide Abstract.

    • [PDF]

  • Game tree search with adaptation in stochastic imperfect information games
    • Authors: Darse Billings, Aaron Davidson, Terence Schauenberg, Neil Burch, Michael Bowling, Robert Holte, Jonathan Schaeffer, and Duane Szafron
    • Purpose: Computers and Games (CG)
    • Date: 2004
    • Show Abstract.
      Building a high-performance poker-playing program is a challenging project. The best program to date, PsOpti, uses game theory to solve a simplified version of the game. Although the program plays reasonably well, it is oblivious to the opponent's weaknesses and biases. Modeling the opponent to exploit predictability is critical to success at poker. This paper introduces Vexbot, a program that uses a game tree search algorithm to compute the expected value of each betting option, and does real-time opponent modeling to improve its evaluation function estimates. The result is a program that defeats PsOpti convincingly, and poses a much tougher challenge for strong human players.

      Hide Abstract.

    • [PDF]

  • Approximating Game-Theoretic Optimal Strategies for Full-scale Poker
    • Authors: Darse Billings, Neil Burch, Aaron Davidson, Robert Holte, Jonathan Schaeffer, Terence Schauenberg, and Duane Szafron
    • Purpose: Proceedings of the 2003 International Joint Conference on Artificial Intelligence
    • Date: 2003
    • Show Abstract.
      The computation of the first complete approximations of game-theoretic optimal strategies for full-scale poker is addressed. Several abstraction techniques are combined to represent the game of 2-player Texas Hold'em, having size O(10^18), using closely related models each having size O(10^7). Despite the reduction in size by a factor of 100 billion, the resulting models retain the key properties and structure of the real game. Linear programming solutions to the abstracted game are used to create substantially improved poker-playing programs, able to defeat strong human players and be competitive against world-class opponents.

      Hide Abstract.

    • [PDF]

  • The Challenge of Poker
    • Authors: Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron
    • Purpose: Artificial Intelligence Journal, vol 134(1-2)
    • Date: 2002
    • Show Abstract.
      Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world. Opponent modeling is another difficult problem in decision-making applications, and it is essential to achieving high performance in poker. This paper describes the design considerations and architecture of the poker program Poki. In addition to methods for hand evaluation and betting strategy, Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.

      Hide Abstract.

    • [PDF]

  • Opponent Modeling in Poker: Learning and Acting in a Hostile and Uncertain Environment
    • Authors: Aaron Davidson
    • Purpose: M. Sc. Thesis
    • Date: 2002
    • Show Abstract.
      Artificial intelligence research has had great success in many classic games such as chess, checkers, and othello. In these perfect-information domains, alpha-beta search is sufficient to achieve a high level of play. However, Artificial intelligence research has long been criticized for focusing on deterministic domains of perfect information -- many problems in the real world exhibit properties of imperfect or incomplete information and non-determinism. Poker, the archetypal game studied by game-theorists, is a challenging domain containing both imperfect information and non-deterministic elements. The random shuffling of cards provides a great deal of uncertainty. The cards held by an opponent or in the reamining deck provide the imperfect information. A poker player has only a small incomplete description of the entire game-state. Furthermore, opponents play deceptively in order to hide information about the cards they hold. These properties prevent traditional game-tree search methods from working. In order to build a high-performance poker playing agent, a multitude of difficult problems must be solved in machine learning, search and simulation, and belief management. Poki, a state-of-the-art poker playing program is discussed, with focus on the problem of opponent modelling. Several methods of opponent modelling are evaluated, including nerual networks and decision trees. Poki can currently play poker at the level of an average human player. We introduce a new method of directly searching imperfect information game trees, called miximax. Like mininmax search, we do a search of the game-tree. Since the objective best moves of the opponent cannot be determined (we do not know their cards) we cannot choose a best move in the search. Instead, we use our knowledge of the opponent to choose their mixed strategy. Instead of using a min, we use a mix of their choices. The result is a reliable estimate of the expected value of our choices.

      Hide Abstract.

    • [PDF]

  • Improved Opponent Modeling in Poker
    • Authors: Aaron Davidson, Darse Billings, Jonathan Schaeffer, and Duane Szafron
    • Purpose: Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI'2000)
    • Date: 2000
    • Show Abstract.
      The game of poker has many properties that make it an interesting topic for articial intelligence (AI). It is a game of imperfect information, which relates to one of the most fundamental problems in computer science: how to handle knowledge that may be erroneous or incomplete. Poker is also one of the few games to be studied where deriving an accurate understanding of each opponent's style is an essential element to success. In developing a strong poker program, the opponent modeling method has always been a central component of the system. As other aspects of the program were improved, the techniques for modeling once again became a limiting factor to the overall level of play. As a result, the topic has been revisited. This paper reports on recent progress achieved by improved statistical methods, which were suggested by experiments using articial neural networks.

      Hide Abstract.

    • [PDF]

  • Probabilities and Simulations in Poker
    • Authors: Lourdes Peña
    • Purpose: M. Sc. Thesis
    • Date: September 10, 1999
    • Show Abstract.
      Poker is an imperfect information game that requires decision-making under conditions of uncertainty, much like many real-world applications. Strong poker players have to skillfully deal with multiple opponents, risk management, opponent modeling, deception and unreliable information. These features make poker an interesting area for Artificial Intelligence research. This thesis describes work done on improving the knowledge representation, betting strategy, and opponent modeling of Loki, a poker-playing program at the University of Alberta. First, a randomized betting strategy that returns a probability triple is introduced. A probability triple is a probabilistic representation of betting decisions that indicates the likelihood of each betting action occurring in a given situation. Second, real-time simulations are used to compute the expected values of betting decisions. These simulations use selective sampling to maximize the information obtained with each simulation trial. Experimental results show that each of these enhancements represents a major advance in the strength of Loki.

      Hide Abstract.

    • [PDF]

  • Using Probabilistic Knowledge and Simulation to play Poker.
    • Authors: Darse Billings, Lourdes Pena, Jonathan Schaeffer, Duane Szafron
    • Purpose: Proceedings of the Sixteenth National Conference of the American Association for Artificial Intelligence (AAAI)
    • Date: 1999
    • Show Abstract.
      Until recently, artificial intelligence researchers who use games as their experimental testbed have concentrated on games of perfect information. Many of these games have been amenable to brute-force search techniques. In contrast, games of imperfect information, such as bridge and poker, contain hidden information making similar search techniques impractical. This paper describes recent progress in developing a high-performance poker-playing program. The advances come in two forms. First, we introduce a new betting strategy that returns a probabilistic betting decision, a probability triple, that gives the likelihood of a fold, call or raise occurring in a given situation. This component unifies all the expert knowledge used in the program, does a better job of representing the type of decision making needed to play strong poker, and improves the way information is propagated throughout the program. Second, real-time simulations are used to compute the expected values of betting decisions. The program generates an instance of the missing data, subject to any constraints that have been learned, and then simulates the rest of the game to determine a numerical result. By repeating this a sufficient number of times, a statistically meaningful sample is used in the program's decision-making process. Experimental results show that these enhancements each represent major advances in the strength of computer poker programs.

      Hide Abstract.

    • [PDF]

  • Learning to Play Strong Poker
    • Authors: Jonathan Schaeffer, Darse Billings, Lourdes Pena, and Duane Szafron
    • Purpose: International Conference on Machine Learning 1999 Workshop Machine Learning in Game Playing (ICML)
    • Date: 1999
    • Show Abstract.
      Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, opponent modeling, unreliable information, and deception, much like decision-making applications in the real world. Opponent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates the implicit and explicit learning in the poker program Loki. Loki implicitly "learns" sophisticated strategies by selectively sampling likely cards for the opponents and then simulating the remainder of the game. The program has explicit learning for observing its opponents, constructing opponent models and dynamically adapting its play to exploit patterns in the opponents' play. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.

      Hide Abstract.

    • [PDF]

  • Using Selective-Sampling Simulations in Poker
    • Authors: Darse Billings, Denis Papp, Lourdes Pena, Jonathan Schaeffer, and Duane Szafron
    • Purpose: roceedings of the AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information
    • Date: 1999
    • Show Abstract.
      Until recently, AI research that used games as an experimental testbed has concentrated on perfect information games. Many of these games have been amenable to so-called brute-force search techniques. In contrast, games of imperfect information, such as bridge and poker, contain hidden knowledge making similar search techniques impractical. This paper describes work being done on developing a world-class poker-playing program. Part of the program's playing strength comes from real-time simulations. The program generates an instance of the missing data, subject to any constraints that have been learned, and then searches the game tree to determine a numerical result. By repeating this a sufficient number of times, a statistically meaningful sample can be obtained to be used in the program's decision-making process. For constructing programs to play two-player deterministic perfect information games, there is a well-defined framework based on the alpha-beta search algorithm. For imperfect information games, no comparable framework exists. In this paper we propose selective sampling simulations as a general-purpose framework for building programs to achieve high performance in imperfect information games.

      Hide Abstract.

    • [PDF]

  • Opponent Modeling in Poker
    • Authors: Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron
    • Purpose: Proceedings of the Fifteenth National Conference of the American Association for Artificial Intelligence (AAAI)
    • Date: 1998
    • Show Abstract.
      Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. Agent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates Loki, a poker program capable of observing its opponents, constructing opponent models and dynamically adapting its play to best exploit patterns in the opponents' play.

      Hide Abstract.

    • [PDF]

  • Poker as a Testbed for Machine Intelligence Research
    • Authors: Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron
    • Purpose: Proceedings of AI'98, (Canadian Society for Computational Studies in Intelligence)
    • Date: 1998
    • Show Abstract.
      For years, games researchers have used chess, checkers and other board games as a testbed for machine intelligence research. The success of world-championship-caliber programs for these games has resulted in a number of interesting games being overlooked. Specifically, we show that poker can serve as a better testbed for machine intelligence research related to decision making problems. Poker is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. The heuristic search and evaluation methods successfully employed in chess are not helpful here. This paper outlines the difficulty of playing strong poker, and describes our first steps towards building a world-class poker-playing program.

      Hide Abstract.

    • [PDF]

  • Dealing with Imperfect Information in Poker
    • Authors: Denis Papp
    • Purpose: M. Sc. Thesis
    • Date: November 30, 1998
    • Show Abstract.
      Poker provides an excellent testbed for studying decision-making under conditions of uncertainty. There are many benefits to be gained from designing and experimenting with poker programs. It is a game of imperfect knowledge, where multiple competing agents must understand estimation, prediction, risk management, deception, counter-deception, and agent modeling. New evaluation techniques for estimating the strength and potential of a poker hand are presented. This thesis describes the implementation of a program that successfully handles all aspects of the game, and uses adaptive opponent modeling to improve performance.

      Hide Abstract.

    • [PDF]

  • Computer Poker
    • Authors: Darse Billings
    • Purpose: M. Sc. Thesis
    • Date: August 30, 1995
    • Show Abstract.
      Games are an interesting and challenging domain for computer science research, having the nice characteristics of a clearly defined set of rules and a specific goal. Developing a program to play a strategic game well often involves the application of theoretical concepts to practical situations, and the relative success of that endeavour can be measured with quantifiable results. The game of poker is logistically simple yet strategically complex, and offers many properties not exhibited by chess, checkers, and most other well-studied games. Most importantly, poker is a non-deterministic game with imperfect (hidden) information. Handling unreliable or incomplete information is a fundamental problem in computer science, and poker provides an excellent domain for investigating problems of decision making under conditions of uncertainty. Somewhat surprisingly, the potential benefits of studying poker have been largely overlooked by computer scientists and game researchers. In this essay we survey what resources are available for academic researchers, and lay some foundations for the scientific exploration of this fascinating game.

      Hide Abstract.

    • [PDF]

Copyright © 2011