Wednesday, August 18, 2021

The Original Alpha-Beta Search Paper

 I have an interesting story about trying to find a copy of the original paper describing alpha-beta search.  Back in 1992 I was doing research on massively-parallel computer chess for my Ph.D. thesis, and I was trying to do a literature review.  According to several sources, the paper was

  • Timothy P. Hart and Daniel J. Edwards, "The Tree Prune (TP) Algorithm", Artificial Intelligence Project Memo 30, RLE and Computation Center, Massachusetts Institute of Technology, Cambridge, MA, USA, December 4, 1961.   (Revised as D. J. Edwards and T. P. Hart, "The α-β Heuristic", AI Memo 30 (Revised), October 28, 1963.)
I was pleased that the original algorithm had been developed at MIT, but the problem was that I couldn't find the paper in the MIT libraries!  The MIT libraries had lost the paper.  It turned out that Tom Knight had the first 4 pages (out of 6) of the 1961 version, but not the 1963 version.

I emailed everyone I could find who had cited the paper (Berliner, Karp, Knuth, McCarthy, Nilsson, and Slagle).  No one had a copy.  The email from Knuth bounced: he had already abandoned email, and required a real hardcopy letter if you wanted a response.  I couldn't find the original authors (the web was just getting started), and was about to give up, but then I tried the white pages (the hardcopy phonebook), where I found Tim Hart in Lexington, Massachusetts.  I sent him a letter, and he responded with original mimeographed copies of both papers.


The history of alpha-beta search is murky.  The rest of this post comprises an incomplete review of that history.  (See Knuth and Moore for a more careful treatment.)
  • Slagle and Dixon (1969) wrote that Hart and Edwards provided the first detailed description of alpha-beta search (in 1961) and the first published use of the term "alpha-beta" (in the 1963 revision).  Slagle and Dixon provided the first proof of the effectiveness of alpha-beta search on best-ordered trees, whereas Hart and Edwards stated the theorem without proof.
  • Nils J. Nilsson's 1971 book Problem Solving Methods in Artificial Intelligence cites Hart and Edwards as having done extensive investigation of alpha-beta search.   Nilsson cites, on page V-49, a personal communication with A. L. Samuel, who said that he had implemented alpha-beta search in his 1959 checkers program but hadn't realized the significance of the algorithm and so hadn't described it in his paper.   Samuel's work was actually pretty amazing: the entire checkers program was written in only 6800 instructions on an IBM 704, and included a transposition table.
  • Knuth and Moore (1975) is the standard reference for a mathematical treatment of alpha-beta pruning.  Knuth and Moore also note (on page 303) that Samuel stated that alpha-beta search was present in his checker-playing programs.  Knuth and Moore state that McCarthy thought of alpha-beta search in 1956.  Knuth and Moore say that the first published account of alpha-beta search appeared in Russia (Brudno 1963), because the Hart and Edwards AI memo was "unpublished".  Knuth and Moore note how much trouble early authors had explaining alpha-beta search: the idea was difficult to express in the notation of the time.
Hart and Edwards write the algorithm in old McCarthy-style LISP (M-expressions rather than the more modern S-expressions).  

 

I think it's interesting that in 1961 they called it an algorithm, but called it a "heuristic" in 1963.  Today, we would consider an algorithm to be more "real" than a heuristic.  Maybe in 1963, when doing AI research, it was better to be developing heuristics.  Knuth points out that McCarthy didn't realize that alpha-beta search always gives the right answer, and thought it was a possibly inaccurate heuristic.  Today, we consider alpha-beta search to be an algorithm that relies on heuristics to get good performance.  The important heuristics are the ones that guess the best search order for the subtrees.  The algorithm gives the right answer without the heuristics, but the heuristics are what make it fast.
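
For concreteness, here is a minimal sketch of the algorithm in a modern notation (my own restatement of the textbook algorithm, not Hart and Edwards' M-expression version).  The children() and evaluate() functions are assumed to be supplied by the game; with no move-ordering heuristic at all the code still returns the correct minimax value, just with fewer cutoffs:

    import math

    # Plain alpha-beta search (a sketch; assumes the game supplies
    # children(pos) and evaluate(pos)).
    def alphabeta(pos, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
        kids = children(pos)
        if depth == 0 or not kids:
            return evaluate(pos)
        if maximizing:
            best = -math.inf
            for child in kids:
                best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:      # cutoff: the opponent will never allow this line
                    break
            return best
        else:
            best = math.inf
            for child in kids:
                best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, best)
                if alpha >= beta:      # cutoff
                    break
            return best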

Virtually all the analysis of the runtime discusses the case of searching a uniform best-ordered tree, but the whole point of the search is that you don't know the best move.  So we use heuristics to guess the best move from any position.  A simple heuristic in chess, which was explained to me by Berliner, is "try capturing the most valuable piece with the least valuable piece"; if you can capture a queen with a pawn, that's probably the best move, or at least one that's good enough to allow a lot of pruning.  Another important heuristic is "iterative deepening": when searching to depth k, guess that the best move is the one that was best according to a search to depth k-1.  Both heuristics are sketched below.
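
Here is a sketch of both heuristics, building on the alphabeta() function above.  The helpers captured_value(), attacker_value(), legal_moves(), and make_move() are assumed game-specific functions (with captured_value() returning 0 for non-captures):

    import math

    # Move ordering: "capture the most valuable piece with the least valuable
    # piece", and try last iteration's best move first.
    def order_moves(moves, guess=None):
        ordered = sorted(moves, key=lambda m: (-captured_value(m), attacker_value(m)))
        if guess in ordered:
            ordered.remove(guess)
            ordered.insert(0, guess)    # best move from the depth k-1 search goes first
        return ordered

    # Iterative deepening: the depth k-1 result seeds the move ordering at depth k.
    def iterative_deepening(pos, max_depth):
        best_move = None
        for depth in range(1, max_depth + 1):
            best_score = -math.inf
            for move in order_moves(legal_moves(pos), best_move):
                score = alphabeta(make_move(pos, move), depth - 1, maximizing=False)
                if score > best_score:
                    best_score, best_move = score, move
        return best_move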

The work preceding Hart and Edwards was Newell, Shaw, and Simon (1958), which provided the first published description of alpha-beta-like tree pruning.  Their algorithm does only alpha pruning, without deep cutoffs.


Tuesday, May 11, 2021

Reviews of Ancient Systems Papers: "Distributed Operating Systems" by Tanenbaum and Van Renesse, 1985.

 I've been reading some of the papers that were assigned reading in MIT class 6.033 Computer Systems Engineering, both when I took it in about 1984 and when I helped teach it, as a teaching assistant, in about 1992.  This class is aimed at MIT seniors, and involves a large writing component. I think there is a lot of value in those old papers.  Some of it hasn't aged very well, however.  Here I'll review some of those old papers, 37 years after I originally read most of them.

I'll start this series (note the optimism!  A series!  So far, containing one element) by reviewing item number 28 from the 1991 6.033 reading list: Tanenbaum and Van Renesse, "Distributed Operating Systems", Computing Surveys, 17:4, December 1985.

The authors are interested in "distributed operating systems".  On such a system, the users can be largely unaware "of the fact that multiple machines are being used."   As far as I know, no such operating system has gained traction.  We are still using "network operating systems" today.  I suspect that's for good reason: failures in distributed systems are too hard to hide.

Keep in mind that this paper was written in 1985, before Paxos (or Viewstamped Replication) was invented.  So the problem of coping with errors was wide open.   No one knew how to make a distributed transaction that wouldn't get stuck.  In 1988, Dwork, Lynch, and Stockmeyer showed that consensus was possible, in theory, in a distributed system.  In 1988 Oki and Liskov described Viewstamped Replication, and in 1989 Lamport described Paxos, both of which can be used to handle errors.   In some sense, those consensus algorithms, which I'll call "Paxos" here, are the beginning of modern distributed systems: I might even argue that every correct distributed algorithm is built starting with Paxos.  Feel free to prove me wrong with examples!

This paper, in Section 2.1.3, "error handling", alludes to the problem.  It points out that "in a centralized system, a system crash means that the client, server, and communication channel are all completely destroyed, and no attempt is made to revive them.  In a distributed system, matters are more complex."  They go on to write about how if one machine fails, the other may end up waiting forever, and then if it has a timeout, it may time out just when the failing machine comes back online.   "Simply waiting until the server is rebooted and trying again sometimes works and sometimes does not".   They conclude that doing an operation exactly once "is probably impossible".  Today we know better.
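
The standard answer today (a sketch of my own, not from the paper) is to make the effect exactly-once even though the messages are at-least-once: the client retries with a fixed request id, and the server remembers completed requests and replays the earlier reply for duplicates.  In a real system the table of completed requests lives in the replicated, Paxos-backed state so that it survives server crashes.

    import uuid

    class Server:
        """Deduplicating server: applies each request id at most once."""
        def __init__(self):
            self.completed = {}                    # request_id -> cached reply

        def handle(self, request_id, op):
            if request_id in self.completed:
                return self.completed[request_id]  # duplicate: replay the earlier reply
            result = op()                          # apply the operation exactly once
            self.completed[request_id] = result
            return result

    def call_with_retries(server, op, attempts=5):
        request_id = uuid.uuid4().hex              # the same id is reused on every retry
        for _ in range(attempts):
            try:
                return server.handle(request_id, op)   # imagine this call crossing the network
            except ConnectionError:                    # lost message or crashed server
                continue
        raise RuntimeError("server unavailable")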

Their section on scheduling feels out of date.  These days we know much more about scheduling.

They spend a lot of time on the details of remote procedure calls, naming, and protection.  As far as I can tell, these issues haven't changed much in 37 years.

When this paper was written, Sun's NFS was brand new.  NFS doesn't handle failures very well, and sadly it's still almost the state of the art for filesystems.   I don't know any distributed filesystem protocols that actually work in the face of failures.  (There are plenty of distributed filesystems built on top of Paxos, such as Amazon's EFS and Oracle's File Storage Service.  But they all talk NFS to the clients, and it's really tough to write applications that cope with NFS failures.)

I find the paper's discussion of how to build distributed filesystems to be painful: they didn't really get at the difficult issues of distributed filesystems.  The problem isn't the filesystem (at least not when you have Paxos); the problem is that the programming interface that everyone is using just doesn't work well in a distributed environment.  For example, providing the Unix open(), read(), write(), fsync(), and close() semantics in a distributed setting is an unsolved problem.   What's needed is an API that both permits caching (so that the filesystem will be fast enough) and understands failures (so the application can cope with things like network partitions).
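
To make that concrete, here is a hypothetical sketch (entirely my own invention, not a real protocol or library) of what such an API might look like: caching is permitted only under an explicit lease, and failures surface to the application as exceptions rather than being hidden behind POSIX semantics.

    class LeaseExpired(Exception):
        """Raised when cached data can no longer be trusted (e.g. a network partition)."""

    class RemoteFile:
        # Hypothetical API: `client` is an imagined filesystem client object.
        def __init__(self, client, path):
            self.client = client
            self.path = path
            self.lease = client.acquire_lease(path)    # the lease is what permits local caching

        def read(self, offset, length):
            if not self.lease.still_valid():
                raise LeaseExpired(self.path)           # the application decides: re-acquire, retry, or fail
            return self.client.cached_read(self.path, offset, length, self.lease)

        def write(self, offset, data):
            # Writes carry the lease epoch so the server can reject a stale
            # writer after a partition heals, instead of silently losing data.
            return self.client.write(self.path, offset, data, epoch=self.lease.epoch)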

They conclude that distributed operating systems "will be an interesting and fruitful area of research for a number of years to come".  It seems like that may still be true.