Saturday, January 17, 2009

Quote of the day

It should be obvious that while garbage collection (or compaction) is going on, the execution of the user's program must be suspended. Hence out of common courtesy to the user, we must provide for some notification--perhaps a flashing message in the corner of the screen--that garbage collection is in progress. Otherwise when the program stops running, the user may suspect a bug, and an inexperienced user may panic.

--Introduction to Compiler Construction by Thomas Parsons, Chapter 8.2, "Runtime memory management"
(The book spends about half a page on garbage collection, mostly to describe the mark-sweep algorithm.)


With Lisp machines, it used to be that people would talk about things like running the garbage collector overnight, when the machine was not in use. Virtual memory in use was typically several times the size of the physical memory. Within its context, the quote is accurate (or at least 10 years earlier it was): running mark-sweep on mostly virtual memory with a slow disc will take a lot of time.

I'm happy that this concern is very outdated. If it weren't outdated, Factor (or Java) could never be a reasonable choice of programming language. There are two main reasons for the improvement.
  1. Memory size has increased. Programs that would use tons of swap space 15 years ago now fit comfortably on the RAM.
  2. Algorithms have improved. Not only do we have generational garbage collection to improve throughput, but there are also incremental techniques to spread out the pause, and various methods to take advantage of multiple processors.

So now, the idea of a flashing message for garbage collection in progress is hilariously quaint. If I were going to write a compiler textbook, I'd spend a whole chapter on garbage collection. It is essential for any modern programming language, and a good algorithm is essential.

5 comments:

Adelle said...

It doesn't seem that quaint when a poorly written system tray app that got installed with my display driver uses 2xPhysical memory and then freezes the PC for 30 seconds when it does a forced GC due to virtual memory exhaustion (and after most other running apps have been paged-out). Maybe Java does it better, and maybe .Net 1.0 did it particularly badly, but I cannot be convinced that GC is ever a good idea on a PC that has more virtual memory than physical memory, and I am far from certain that it could be a good idea ever. Reference counting is a perfectly acceptable solution in wide range of circumstances - sometimes suboptimal but always predictable (and with well-known optimizations for the suboptimal cases).

Daniel Ehrenberg said...

Adelle,
I don't know what application you're talking about, but it's a pretty easy change to the program to force a GC earlier than virtual memory exhaustion. Most GCs force a collection before growing the amount of memory available. A runtime would have to be hardly tested at all to not notice an error like this. I'd blame this on the GC implementation rather than GC itself.

Anonymous said...

Disclaimer: I don't specialize in garbage collection, but once every few years I have reason to review the state of the art, so I'm reasonably up to date for someone who doesn't spend a significant percentage of their life on GC.

So...my current understanding is that a really good GC can have almost zero overhead so long as no more than 1/7 of total RAM is allocated at any given time (that precise ratio appears in the research), and that GC overhead grows to the majority of overhead as that fraction grows to 1.0.

Further, I do not recall any general GC algorithms that avoid a cliff (many orders of magnitude of slowdown) once allocated memory grows beyond physical RAM and spills out into virtual pages.

(All of the ones I've seen that make more grandiose claims turn out to be totally vague in their support of such claims.)

If anyone has citations that have firm claims that do significantly better than the above for normal use cases, I'd be very interested.

P.S. Although, as is well known, copying generational GC has big wins on many kinds of apps, it is well-documented in the literature that other kinds of apps (arguably a smaller class of apps) get a bigger win from simple reference count GC.

Therefore, people who choose to reference count for simplicity needn't feel ashamed and inadequate; it can be argued to be a YAGNI issue: implement e.g. copying generational GC if and only if it becomes proven to justify its considerable implementation effort.

-- Doug Merritt

Chris Forbes said...

Adelle:

The particular app you're talking about goes crazy and allocates out the wazoo - and most of that memory cannot be reclaimed by ANY GC technique, since it's still referenced.

Btw, .NET 1.0's GC isn't exactly state-of-the-art anymore, but it's no worse than Java 5 - probably better actually.

Markus Kohler said...

I think the author of the book still has some point.

I know many cases where users would say that "This Java application is so slow" and even generalize that to "Java is slow" because the memory configured was too small.
In general most GC's used today don't work well when physical memory is exceeded. This can lead to long pauses, and the application typically (in Java atels ) has know way to inform the user about the problem.

Yes, today GC's usually don't stop the the world for a long time, but most do at least sometimes.

So IMHO it would make sense if there would be standard API's that can be used to figure out whether there was enough memory configured.