#cards #poker #mcts #artificial-intelligence #hold-em #cfr

bin+lib robopoker

Implementations of Meta's Pluribus No-Limit Texas Hold-Em solution

2 releases

0.1.1 Oct 7, 2024
0.1.0 Oct 7, 2024

#141 in Games

MIT license

160KB
3.5K SLoC

robopoker

license build

robopoker is a Rust library to play, learn, analyze, track, and solve No-Limit Texas Hold'em.

The guiding philosophy of this package was to use very tangible abstractions to represent the rules, mechanics, and strategies of NLHE. Every module is modeled as closely as possible to its real-world analogue, while also utilizing clever representations and transformations to be as memory- and compute-efficient as possible.

Modules

cards

This module offers basic data structures and algorithms for a standard 52-card French deck. We define cards at a very low level while still benefiting from high level abstractions, due to Rust's support of type-safe transformations between various representations of isomorphic data types.

For example, the Rank, Suit, Card, Hand, and Deck types encapsulate all the high-level features you expect: order, equality, shuffling, dealing, and hand evaluation are all derived from precise representations and efficient manipulations of memory that enable high-performance while preserving zero-cost abstractions. Isomorphic representations of these data structures across u8, u16, u32, u64 types provide flexibility that is useful for random deck generation, hand evaluation, set-theoretic operations, and (de)serialization across network boundaries.

Hand as u64 ends up allowing for very efficient bit manipulations, since all unordered subsets of cards are uniquely represented. Utility methods for iterating, drawing, inserting, union, and counting all emerge from natural bitwise equivalents.

Evaluator provides ranking poker hands while avoiding the high cost of explosive combinatorics. We lean heavily into idiomatic Rust by using lazy evaluation over the Option<HandStrength> monad to implement priority search of card rankings. In the future, it may be worth considering the time-space tradeoff between the current lazy implementation and a possible eager one to do lookups.

clustering

The literature [2, 3] suggests hierarchical k-means clustering for information absraction. This is the dimensionality reduction that we apply by grouping similar observed chance outcomes while being completely agnostic to any strategy. The main idea is to recursively, from the outer (river) up, decompose observations into the space of their lower-level abstractions. We build up each (non-river) layer as a distribution of distributions. By equipping the dimensionally-sparce space of information abstractions with an Earth mover's distance metric, we can learn clusters via k-means.

The outer (river) layer is clustered uniquely. Each River runout of 2 private and 5 public cards is evaluated against all possible 2-card villain hands to compute equity. Equities, measured as float percentage, are converted into percentile buckets, yielding the only variant of Abstraction that is naturally equipped with a distance metric. Perhaps a 7-card lookup table or a stochastic villain sampling would be orders of magnitude more performant, but given the amortized one-time cost of these calcuations, we can trade off convenience for accuracy.

Ultimately, we are brute forcing this step by iterating over the entire space of 3.1B distinguishable game states, which requires enumerating all 3T possible NLHE showdowns. An impending optimization will be to reduce the space of Observation by enforcing strategic isomorphism. Even at this first preprocessing step, scale becomes a massive bottleneck.

We benchmarked three persistence mechanisms for calculating River equities on a (M1 CPU; 16GB RAM; 2TB DISK) machine, with hand-optimized Postgres configuration settings in postgres.conf. However, it's worth noting that any practical attempt to run this library will require ~1TB of RAM. Each CPU core can calculate River equities at approximately 1k Observations / second, and multithreading is set up for this embarassingly parallel task.

cfr

Traits and implementation of a counter-factual regret minimization engine, automated range generation, and parametrized reinforcement learning. Currently, we implement a variant of CFR+ which uses positive regrets and updates strategies between players in distinct iteration steps. The algorithm is almost agnostic to the rules of whatever imperfect-information game is being solved, as long as we implement traits that allow for local

play

This module offers an out-of-the-box way to play NLHE without crossing any network boundaries. The hierarchy of data structures follows from a tree-based abstraction of the game.

  • Seat encapsulates public information related to a player.
  • Action is an enum that transitions the state of the game. Chance actions (card draws) and choice actions (player decisions) are both edges between nodes of possible game states.
  • Rotation is a path-invariant representation of the current game state. It is a minimal data structure, with most relevant information exposed via pure methods.
  • Game is a path-aware representation of the current game hand. It is the smallest solvable subset of NLHE, and a direct representation of the game tree abstraction that is used by solvers and players alike.
  • Table is a mechanism to advance that game state encapsulated by Game and Rotation according to the rules of the game, assuming input comes from a synchronized subroutine. For games to be played across a network boundary, custom implementation of the game loop must be used to account for distributed game state and fault tolerance.

api

Coming soon. A distributed and scalable single-binary WebSocket-based HTTP server that allows players to play, learn, analyze, and track hands remotely.

[1] Regret Minimization in Games with Incomplete Information. Advances in Neural Information Processing Systems, 20. (https://proceedings.neurips.cc/paper/2007/file/08d98638c6fcd194a4b1e6992063e944-Paper.pdf) In NIPS. [2] Discretization of Continuous Action Spaces in Extensive-Form Games. (http://www.cs.cmu.edu/~sandholm/discretization.aamas15.fromACM.pdf) In AAMAS. [3] Potential-Aware Imperfect-Recall Abstraction with Earth Mover’s Distance in Imperfect-Information Games. (http://www.cs.cmu.edu/~sandholm/potential-aware_imperfect-recall.aaai14.pdf) In AAAI.

Dependencies

~12–21MB
~308K SLoC