Project Proposal by Martin Stacey


Machine Poker

Software

OO language

Covers

Programming, artificial intelligence, probability theory.

Skills Required

Programming, interest in artificial intelligence.

Challenge

Conceptual *** Technical ** Programming ****

Brief Description

The aim of the project is to build intelligent agents that play poker against each other, and learn from experience how to play better poker.

This will require a game mechanism that runs games between the different intelligent agents, and a multi-game framework that uses object persistence to maintain the states of agents between games and between runs of the system; each run will play a very large number of games. There will need to be a simple user interface for looking at what happens in individual poker games, showing summary statistics, and inspecting the states of the agents; however, putting time into a glossy interface is likely to prove more of a distraction than it is worth. Ideally the system would also allow humans to take part in the virtual poker games, but this should not take precedence over the core of the project.
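As a rough illustration of the multi-game framework, the Python sketch below runs a batch of games and keeps agent state in a file between runs. Everything in it is an assumption made for illustration: the Agent class, the function names, the JSON file format, and above all play_game, which is a random stand-in for a real poker hand. The proposal does not prescribe a persistence mechanism; an object database or a serialization library would serve equally well.

```python
import json
import os
import random


class Agent:
    """Hypothetical agent holding only the state that must survive between runs."""

    def __init__(self, name, games_played=0, chips_won=0):
        self.name = name
        self.games_played = games_played
        self.chips_won = chips_won

    def record(self, chip_delta):
        self.games_played += 1
        self.chips_won += chip_delta

    def to_dict(self):
        return {"name": self.name,
                "games_played": self.games_played,
                "chips_won": self.chips_won}


def play_game(agents, rng):
    """Placeholder for a real poker hand (assumption): every agent antes
    one chip and a randomly chosen winner takes the whole pot."""
    winner = rng.choice(agents)
    for agent in agents:
        agent.record(len(agents) - 1 if agent is winner else -1)


def load_agents(path, names):
    """Restore agents saved by an earlier run, or create fresh ones."""
    if os.path.exists(path):
        with open(path) as f:
            return [Agent(**record) for record in json.load(f)]
    return [Agent(name) for name in names]


def save_agents(agents, path):
    with open(path, "w") as f:
        json.dump([agent.to_dict() for agent in agents], f)


def run(path, names, n_games, seed=None):
    """One run of the system: load state, play many games, save state."""
    agents = load_agents(path, names)
    rng = random.Random(seed)
    for _ in range(n_games):
        play_game(agents, rng)
    save_agents(agents, path)
    return agents
```

Calling run("agents.json", ["a", "b", "c"], 10000) twice in a row would leave each agent with 20000 recorded games, because the second call starts from the saved state rather than from scratch.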

Different approaches to learning to play poker might be tried, and what is feasible may depend on which version of poker is used. Probably the best, or at least the easiest, approach is to define a way to recognize particular game states and a way to write rules that specify which action to take in each game state, and then to learn the probabilities with which particular game states trigger each action. Whether a genetic algorithm or a neural network would work will depend on how the game state descriptions are formulated.
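One way to picture the rule-based approach is a table that maps each recognized game state to a weight per action, with actions sampled in proportion to those weights. The Python sketch below assumes this representation; the RulePolicy class, the three-action set, the state labels, and the additive update rule are all hypothetical choices made for illustration, not a method the proposal specifies.

```python
import random

# Assumed action set; a real poker variant may need bet sizes as well.
ACTIONS = ("fold", "call", "raise")


class RulePolicy:
    """Hypothetical rule table: game state -> weight for each action."""

    def __init__(self, rng=None):
        self.weights = {}
        self.rng = rng or random.Random()

    def _row(self, state):
        # Unseen states start with equal weights, i.e. uniform probabilities.
        return self.weights.setdefault(state, {a: 1.0 for a in ACTIONS})

    def probabilities(self, state):
        row = self._row(state)
        total = sum(row.values())
        return {action: w / total for action, w in row.items()}

    def choose(self, state):
        probs = self.probabilities(state)
        return self.rng.choices(list(probs), weights=list(probs.values()))[0]

    def learn(self, history, payoff, rate=0.1, floor=0.01):
        """Simple reinforcement update (an assumption): shift the weight of
        every (state, action) pair used in a game by rate * payoff, and keep
        weights above a small floor so no action becomes impossible."""
        for state, action in history:
            row = self._row(state)
            row[action] = max(floor, row[action] + rate * payoff)


# Usage: states are any hashable description, here made-up labels.
policy = RulePolicy(random.Random(0))
state = ("strong_hand", "facing_bet")
action = policy.choose(state)
policy.learn([(state, action)], payoff=5)
```

A table like this only works if the game state descriptions are coarse enough for each state to recur many times; a finer-grained description would call for some form of generalization across states.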

It would be very interesting to find out whether the agents converge to the same set of rules as each other, or to the same set of rules in every run, and whether there is ever a stable best ruleset. Ideally, the system would compare different learning algorithms, and show what happens if a successful agent stops learning.
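To test for convergence, some numeric measure of how far apart two rulesets are would be needed. The Python sketch below shows one possible measure, assuming each ruleset is stored as a mapping from game state to action probabilities: the total variation distance between the two action distributions, averaged over the states both agents have seen. The function names and the choice of distance are assumptions made for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two action distributions,
    each given as a dict of action -> probability."""
    actions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in actions)


def ruleset_distance(rules_a, rules_b):
    """Mean total variation distance over the game states both rulesets
    cover. Returns None when they share no states."""
    shared = set(rules_a) & set(rules_b)
    if not shared:
        return None
    return sum(total_variation(rules_a[s], rules_b[s]) for s in shared) / len(shared)
```

A distance near 0 between agents at the end of a run, or between the same agent at the end of different runs, would suggest convergence to a common ruleset; tracking the distance between successive snapshots of one agent would show whether its ruleset ever becomes stable.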

