Memory Allocation and Garbage Collection

One strength of the J2SE platform is that it shields the complexity of memory allocation and garbage collection from the developer. However, once garbage collection is the principal bottleneck, it is worth understanding some aspects of this hidden implementation. Garbage collectors make assumptions about the way applications use objects, and these are reflected in tunable parameters that can be adjusted for improved performance without sacrificing the power of the abstraction.

An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.

Beginning with the J2SE platform, version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.

The most important of these observed properties is infant mortality. The blue area in the diagram below is a typical distribution for the lifetimes of objects. The X axis is object lifetimes measured in bytes allocated. The byte count on the Y axis is the total bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects that can be reclaimed (i.e., have “died”) shortly after being allocated. Iterator objects, for example, are often alive for the duration of a single loop.

Some objects do live longer, and so the distribution stretches out to the the right. For instance, there are typically some objects allocated at initialization that live until the process exits. Between these two extremes are objects that live for the duration of some intermediate computation, seen here as the lump to the right of the infant mortality peak. Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects “die young”.

To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to an tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.

The diagram below shows minor collections occurring at intervals long enough to allow many of the objects to die between collections. It is well-tuned in the sense that the young generation is large enough (and thus the period between minor collections long enough) that the minor collection can take advantage of the high infant mortality rate. This situation can be upset by applications with unusual lifetime distributions, or by poorly sized generations that cause collections to occur before objects have had time to die.

The default arrangement of generations looks something like this.

  • At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations.
  • The young generation consists of eden plus two survivor spaces . Objects are initially allocated in eden. One survivor space is empty at any time, and serves as a destination of the next, copying collection of any live objects in eden and the other survivor space. Objects are copied between survivor spaces in this way until they old enough to be tenured, or copied to the tenured generation.
  • One portion of the tenured generation called the permanent generation is special because it holds all the reflective data of the virtual machine itself, such as class and method objects.

The default garbage collector is meant to be used by applications large and small. Its default parameters were designed to be effective for most small applications. The default parameters aren’t optimal for many server applications.

Sizing the generation


Credits: Tuning Garbage Collection

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.