Why garbage collect?

Language requirements

Garbage collection may be essential or merely highly desirable. It may be a language requirement: heap allocation is required for data structures that may survive the procedure that created them. If these data structures re then passed to further procedures or functions, it may be impossible for the programmer or compiler to determine the point at which it is safe to deallocate them. The prevalence of sharing and delayed execution of suspensions means that functional languages often have particularly unpredictable execution orders. Garbage collection is mandatory.

Problem requirements

Garbage collection may be a problem requirement. Boehm and Chase offer a helpful illustration [Boehm and Chase, 1992]. Suppose a general stack data type is to be implemented is C as A linked list. Each node on the stack contains two pointer fields: data and next. The Pop operation is to deallocate the top of the stack, call it first, and return a pointer to the remainder of the stack. Should Pop deallocate the data referenced from the top element, first->data? If the data is statically allocated, the answer is ‘no’. Otherwise, if this is the last reference to the data, the answer is ‘yes’. If the data may be pushed on to more than one stack --- it is in Diagram 1.5 on the preceding page --- the answer is ‘maybe’. Some convention is required for deallocation even for such a simple abstraction. This will either complicate the interface to the stack, reduce its applicability or force unnecessary copying (so that deallocation decisions can be made locally).

Should the data object be deallocated when the stack is popped? Diagram 1.5 Should the data object be deallocated when the stack is popped?

Software engineering issues

Software engineering is most succinctly described as the management of complexity in large scale software systems. Two of the most powerful tools available to the software engineer are abstraction and modularity. We strongly believe that explicit memory management cuts against these principles. Automatic memory management gives increased abstraction to the programmer. The model of memory allocation is less low-level and programmers are relieved of the burden of book-keeping detail: their time is better spent on higher-level details of the design, and implementation of the programming problem at hand. Memory management by the run- time system is adopted by all high-level programming languages for static and stack-allocated data. Abstracting away from such low-level issues is universally recognized by designers of high-level programing languages to be essential for global and lexically-scoped data. Programmers do not have to worry where to place global data, or how to set up or take down procedure activation frames on the stack. We believe that the case for abstraction applies equally strongly to heap-allocated data in complex programs.

Reliable code is understandable code. At the level of the module, this means that a programmer should be able to understand its behavior from the module itself, or, in the worst case, a few neighbouring modules. It should not be necessary to understand an entire program before being able to develop a single module. This is clearly essential for large-scale projects involving teams of developers. In contrast, explicit allocation can allow one module to cause the failure of another through space leaks or premature reclamation of storage. The behavior of the module is no longer independent from the context in which it is used.

The oft-cited goal of allowing software components to be combined in the same way as hardware components requires that interfaces should be simple and well-defined. Modules that are extensible may be composed more easily with other modules: the module is reusable in different contexts. Increasing module cohesion also makes programs easier to maintain. Meyer suggests that every module should communicate with as few others as possible, and if any two modules do communicate, they should exchange as little information as possible [Meyer, 1988]. Wilson correctly observes that ‘liveness is a global property’ [Wilson, 1994]. Adding book-keeping detail to module interfaces weakens abstractions and reduces the extensibility of modules. Modifications to the functionality of a module might entail alteration of its memory management code. Since liveness is a non-local matter, changes to book-keeping code might radiate beyond the module being developed.

While global explicit dynamic memory management may be efficient and appropriate for monolithic systems built from hierarchical designs by stepwise refinement, this approach to design seems at odds with the philosophy of object-orientation. It conflicts with the principle of minimal communication and clutters interfaces. If objects are to be reused in different contexts, the new context must understand these rules of engagement, but this reduces the freedom of composition of objects. One author has suggested that the problem of memory management in complex systems may only be solvable without garbage collection if programs are designed with correct memory management as their prime goal [Nagle, 1995]. Garbage collection, on the other hand, uncouples the problem of memory management from class interfaces, rather than dispersing it throughout the code. This is why it has been a fundamental component of many object-oriented languages.

A further indication of the extent of this problem is the range of tools available to assist with checking correct usage of heap memory: the best-known examples include CenterLine [CenterLine, 1992] and Purify [Purify, 1992]. The very existence of tools of this kind reveals the importance of correct memory management and the difficulty of getting it right. However, such tools are only practically useful as debugging aids since they impose a considerable run-time overhead on programs (the CenterLine interpreter by a factor of fifty, the Puriy link-time library by a factor of two to four [Ellis, 1993]).

Although these tools are often very useful for tracking down programming errors, they do not address the heart of the problem. Debugging tools do nothing to simplify the interfaces of complicated systems, nor do they enhance the reusability of software components. Considerable effort still must be devoted to correcting an implementation or, even worse, a design after a leak or a dangling reference is discovered. Debugging tools tackle the symptoms rather than the disease itself. Garbage collection, on the other hand, is an effective software engineering tool because it relieves the programmer from the burden of discovering memory management errors by ensuring that they cannot arise.

Work by Rovner suggests that a considerable proportion of development time may be spent on memory management bugs [Rovner, 1985]. He estimated that forty percent of the time developing the Mesa system was spent on memory management[3]. Today, object-oriented programming languages are increasingly commonly used. Programs written in these languages typically allocate a greater proportion of their data on the heap than their conventional procedural counterparts. The data structures generated, and the problems tackled, by object-oriented programs are often more complex. These factors can only increase the intricacy of explicit storage management.

Designers and programmers are tempted to be over-defensive in order to overcome the complexities of explicit dynamic memory management. Data is allocated statically or copied between modules rather than being shared: each module is then free to destroy its copy of the object at will --- the global liveness decision is transformed into a local one. Unnecessary copying and static allocation are, at best, wasteful of space since cautions overestimates of memory requirements must be made. If used on large problems, however, static limits may prove inadequate and the programs will fail.

A commonly used alternative is to build a domain-specific garbage collector. Domain-specific collection often fails to take advantage of advances in garbage collection techniques. Because their applicability is by definition limited, the costs of development of such collectors cannot be amortized over a wide set applications. This means that testing is likely to be less thorough. Wilson notes that the very existence of such weakly engineered collectors is testimony to the importance of garbage collection [Wilson, 1994]. The solution is to make garbage collection part of system rather than a ‘bolt-on’ extra.


[3] There is a real need for more research to be published on the cost of memory management bugs to development time.

results matching ""

    No results matching ""