Tuesday, May 11, 2021

Reviews of Ancient Systems Papers: "Distributed Operating Systems" by Tanenbaum and Van Renesse, 1985.

I've been reading some of the papers that were assigned reading in MIT class 6.033 Computer Systems Engineering, both when I took it in about 1984 and when I helped teach it, as a teaching assistant, in about 1992.  This class is aimed at MIT seniors and involves a large writing component.  I think there is a lot of value in those old papers.  Some of them haven't aged very well, however.  Here I'll review some of those old papers, 37 years after I originally read most of them.

I'll start this series (note the optimism!  A series!  So far, containing one element) by reviewing item number 28 on the 1991 6.033 reading list: Tanenbaum and Van Renesse, "Distributed Operating Systems", Computing Surveys, 17:4, December 1985.

The authors are interested in "distributed operating systems".  On such a system, the users can be largely unaware "of the fact that multiple machines are being used."  As far as I know, no such operating system has gained traction.  We are still using "network operating systems" today.  I suspect that's for good reason: failures in distributed systems are too hard to hide.

Keep in mind that this paper was written in 1985, before Paxos (or Viewstamped Replication) was invented.  So the problem of coping with errors was wide open.  No one knew how to make a distributed transaction that wouldn't get stuck.  In 1988, Dwork, Lynch, and Stockmeyer showed that consensus was possible, in theory, in a distributed system.  In 1988 Oki and Liskov described Viewstamped Replication, and in 1989, Lamport described Paxos, both of which can be used to handle errors.  In some sense, those consensus algorithms, which I'll call "Paxos" here, are the beginning of modern distributed systems: I might even argue that every correct distributed algorithm is built starting with Paxos.  Feel free to prove me wrong with examples!
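To make that concrete, here's a minimal sketch of single-decree Paxos in Go: one proposer round run against a set of in-memory acceptors.  This is my own illustration of the shape of the protocol (the names, the single-threaded setup, and the fixed ballot number are all simplifications), not Lamport's presentation and certainly not production code.

```go
// A minimal single-decree Paxos sketch: one proposer round against in-memory
// acceptors.  The names and single-threaded setup are my own simplifications.
package main

import "fmt"

// Acceptor holds the state a Paxos acceptor must keep (durably, in real life).
type Acceptor struct {
	promised  int    // highest ballot number promised in phase 1
	acceptedN int    // ballot of the last accepted proposal (0 = none yet)
	acceptedV string // value of the last accepted proposal
}

// Prepare is phase 1: promise to ignore ballots below n, and report any
// previously accepted proposal so the proposer can preserve it.
func (a *Acceptor) Prepare(n int) (ok bool, accN int, accV string) {
	if n <= a.promised {
		return false, 0, ""
	}
	a.promised = n
	return true, a.acceptedN, a.acceptedV
}

// Accept is phase 2: accept proposal (n, v) unless a higher ballot was promised.
func (a *Acceptor) Accept(n int, v string) bool {
	if n < a.promised {
		return false
	}
	a.promised, a.acceptedN, a.acceptedV = n, n, v
	return true
}

// propose runs both phases with ballot n and returns the value that was
// chosen, if any.  Each phase needs replies from only a majority of acceptors.
func propose(acceptors []*Acceptor, n int, v string) (string, bool) {
	promises, bestN, bestV := 0, 0, ""
	for _, a := range acceptors {
		if ok, accN, accV := a.Prepare(n); ok {
			promises++
			if accN > bestN {
				bestN, bestV = accN, accV
			}
		}
	}
	if promises <= len(acceptors)/2 {
		return "", false // no majority of promises; retry with a higher ballot
	}
	if bestN > 0 {
		v = bestV // must adopt any previously accepted value
	}
	accepts := 0
	for _, a := range acceptors {
		if a.Accept(n, v) {
			accepts++
		}
	}
	return v, accepts > len(acceptors)/2
}

func main() {
	acceptors := []*Acceptor{{}, {}, {}}
	if v, ok := propose(acceptors, 1, "commit transaction 42"); ok {
		fmt.Println("chosen:", v)
	}
}
```

The thing to notice is that each phase needs only a majority of acceptors, so the protocol keeps making progress as long as a majority of machines is up: that's exactly the "getting stuck" problem that a 1985 reader had no answer for.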

This paper, in Section 2.1.3, "error handling", alludes to the problem.  It points out that "in a centralized system, a system crash means that the client, server, and communication channel are all completely destroyed, and no attempt is made to revive them.  In a distributed system, matters are more complex."  They go on to write about how if one machine fails, the other may end up waiting forever, and then if it has a timeout, it may time out just when the failing machine comes back online.  "Simply waiting until the server is rebooted and trying again sometimes works and sometimes does not".  They conclude that doing an operation exactly once "is probably impossible".  Today we know better.
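What "knowing better" mostly amounts to is a recipe rather than magic: the client attaches a stable ID to each request and retries on timeout, and the server remembers which IDs it has already executed, replaying the saved reply instead of repeating the side effect.  Here's a minimal in-memory sketch in Go; the types and names are my own illustration, not anything from the paper.

```go
// Exactly-once-style request handling via deduplication.  The Request and
// Server types here are hypothetical illustrations.
package main

import "fmt"

type Request struct {
	ID string // chosen once by the client, reused verbatim on every retry
	Op string
}

type Server struct {
	done map[string]string // request ID -> reply already sent
}

// Handle executes each request ID at most once; duplicates get the old reply.
func (s *Server) Handle(r Request) string {
	if reply, seen := s.done[r.ID]; seen {
		return reply // a retry of a request we already did: replay, don't redo
	}
	reply := "did " + r.Op // the side effect happens exactly once
	s.done[r.ID] = reply
	return reply
}

func main() {
	s := &Server{done: map[string]string{}}
	r := Request{ID: "req-7", Op: "debit $5"}
	fmt.Println(s.Handle(r)) // first attempt executes the operation
	fmt.Println(s.Handle(r)) // a timed-out client retries; the server replays
}
```

For the guarantee to survive server crashes, the done table has to live in storage that itself survives failures, which is where Paxos-replicated state comes back into the picture.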

Their section on scheduling feels out of date.  These days we know much more about scheduling.

They spend a lot of time on the details of remote procedure calls, naming, and protection.  As far as I can tell, these issues haven't changed much in 37 years.

When this paper was written, Sun's NFS was brand new.  NFS doesn't handle failures very well, and sadly it's still almost the state of the art for filesystems.  I don't know of any distributed filesystem protocols that actually work in the face of failures.  (There are plenty of distributed filesystems built on top of Paxos, such as Amazon's EFS and Oracle's File Storage Service.  But they all talk NFS to the clients, and it's really tough to write applications that cope with NFS failures.)

I find the paper's discussion of how to build distributed filesystems to be painful: they didn't really get at the difficult issues of distributed filesystems.  The problem isn't the filesystem (at least not when you have Paxos); the problem is that the programming interface everyone is using just doesn't work well in a distributed environment.  For example, coping with failures through Unix open(), read(), write(), fsync(), and close() is an unsolved problem.  What's needed is an API that both permits caching (so that the filesystem will be fast enough) and understands failures (so the application can cope with things like network partitions).
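To show what I mean, here's a Go sketch of the hole in that interface.  The NFS mount path is hypothetical; the point is how little the program can learn when fsync (Sync, in Go) fails.

```go
// What the classic Unix file API tells you about a durability failure: not much.
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical file on an NFS mount; adjust the path to run this.
	f, err := os.OpenFile("/mnt/nfs/data.log", os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Write can succeed against the local cache even if the server is gone.
	if _, err := f.Write([]byte("record 1\n")); err != nil {
		panic(err)
	}

	if err := f.Sync(); err != nil {
		// Here is the hole: some prefix of the writes may be durable and some
		// may be lost, and no call in this API says which.  On many systems a
		// second Sync returns nil without making the lost data durable, so
		// "just retry" is not a recovery strategy.
		fmt.Println("sync failed; file state unknown:", err)
	}
}
```

An API that understood failures would have to tell the application which writes survived, or at least give it a clean point to roll back to.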

They conclude that distributed operating systems "will be an interesting and fruitful area of research for a number of years to come".  It seems like that may still be true.
