Wednesday, August 18, 2021

The Original Alpha-Beta Search Paper

 I have an interesting story about trying to find a copy of the original paper describing alpha-beta search.  Back in 1992 I was doing research on massively-parallel computer chess for my Ph.D. thesis, and I was trying to do a literature review.  According to several sources, the paper was

  • Timothy P. Hart and Daniel J. Edwards, "The Tree Prune (TP) Algorithm", Artificial Intelligence Project Memo 30, RLE And Computation Center, Massachusetts Institute of Technology, Cambridge, MA, USA, December 4, 1961.   (Revised as D. J. Edwards and T. P. Hart, "The α-β Heuristic", AI Memo 30 (Revised), October 28, 1963.)
I was pleased that the original algorithm had been developed at MIT, but the problem was that I couldn't find the paper in the MIT libraries!  The MIT libraries had lost the paper.  It turned out that Tom Knight had  the first first 4 pages (out of 6) of the 1961 version, but not the 1963 version.

I emailed everyone I could find who had cited the paper (Berliner, Karp, Knuth, McCarthy, Nilsson,  and Slagel).  No one had a copy.  The email from Knuth bounced:  He had already abandoned email, and required a real hardcopy letter if you wanted a response.  I couldn't find the original authors (the web was just getting started), and was about to give up, but then I tried the white pages (the hardcopy phonebook), where I found Tim Hart in Lexington, Massachusetts.  I sent him a letter, and he responded with original mimeographed copies of both papers.


The history of alpha-beta search is murky.  The rest of this post comprises an incomplete review of that history.  (See Knuth and Moore for a more careful treatment.)
  • Slagle and Dixon (1969) wrote that Hart and Edwards provided the first detailed description of alpha-beta search (in 1961) and the first published use of the term "alpha-beta" (in the 1963 revision).  Slagel and Dixon provided the first proof about the effectiveness of alpha-beta search on best-ordered trees, whereas Hart and Edwards stated the theorem without proof.
  • Nils J. Nilson's 1971 book Problem Solving Methods in Artificial Intelligence cites Hart and Edwards as having done extensive investigation of alpha-beta search.   Nilson cites, on page V-49, a personal communication with A. L. Samuel who claimed that he had implemented alpha-beta search in his 1959 checkers program but that Samuel didn't realize the significance of the algorithm and hadn't included a description of it in his paper.   Samuel's work was actually pretty amazing: The entire checkers program was written in only 6800 instructions on an IBM 704, and included a transposition table.
  • Knuth and Moore (1975) is the standard reference for a mathematical treatment of alpha-beta pruning.  Knuth and Moore also cite (on page 303) that Samuel stated that alpha-beta search was present in his checker-playing programs.  Knuth and Moore state that McCarthy thought of alpha-beta search in 1956.  Knuth and Moore say that the first published account of alpha-beta search appeared in Russia (Brudno 1963), based on the fact that the  Hart and Edwards AI memo was "unpublished". Knuth and Moore note how early authors had a lot of trouble explaining alpha-beta search: the idea was difficult to express in the notation of the time.
Hart and Edwards write the algorithm in old McCarthy-style LISP (M expressions rather than the more modern S expressions).  

 

I think it's interesting that in 1961 they called in an algorithm, but called it an "heuristic" in 1963.  Today, we would consider an algorithm to be more "real" than a heuristic.  Maybe in 1963, when doing AI research, it was better to develop heuristics.   Knuth points out that McCarthy didn't realize that alpha-beta search always gives the right answer, and thought it was a possibly inaccurate heuristic. Today, we consider alpha-beta search to be an algorithm that relies on heuristics to get good performance. The important heuristics are the ones that guess what the best search order is for subtrees.  The algorithm works right without the heuristics, but the heuristics are important.  

Virtually all the analysis of the runtime discusses the case of searching a uniform best-ordered tree, but the whole point of the search is that you don't know the best move.  So we use heuristics to guess the best move from any position.  A simple heuristic in chess , which was explained to me by Berliner, is "try capturing the most valuable piece with the least valuable piece";  if you can capture a queen with a pawn, that's probably the best move, or at least one that's good enough to allow a lot of pruning.  Another important heuristic is "iterative deepening" : when searching to depth k, guess that the best move is the one that was best according to a search to depth k-1.

The previous work to Hart and Edwards was Newell, Shaw, and Simon (1958), which provided the first published description of alpha-beta-like tree pruning. Their algorithm does only alpha pruning, without deep cutoffs.