How the Mark-Sweep-Compact Algorithm Works

Page Contents

The JVM has a choice of several different garbage collection algorithms. With the wide range of applications in modern computing, one size definitely doesn’t fit all. Most of these use the underlying Mark-Sweep-Compact algorithm, at least for some aspects of garbage collection.

To effectively monitor and tune GC, it’s worth taking the time to understand how the Mark-Sweep-Compact algorithm in Java actually works.

What is Garbage Collection and Why is it Important to Monitor its Performance?

Most modern languages use some form of garbage collection to free up memory that’s no longer in use. Earlier programming languages left memory management to the developer, and were often prone to memory leaks and out-of-memory errors.

In Java, the GC releases any memory that no longer has valid references pointing to it. This can happen for various reasons, including:

A method completes, so its local variables are no longer in scope;
Resources such as streams are closed;
The programmer explicitly sets a variable to null.

GC can become heavy on system resources, especially if it’s poorly tuned. This can cause unacceptable application performance. For critical applications, it’s important to monitor GC performance to pick up any issues before they become a real problem.

For more information about GC, you may like to read: What is Java Garbage Collection?

Garbage Collection Algorithms in Java.

Java has seven different GC collection algorithms: Serial, Parallel, Concurrent Mark Sweep, G1 GC, ZGC, Shenandoah and Epsilon. You can learn more about these, with their strengths and weaknesses, in this article dealing with Garbage Collection Algorithms in Java.

Each of these uses the basic computer science concepts of garbage collection in different forms and combinations.

Some of the basic GC algorithms used in computing are:

Mark-Sweep: Marks objects currently in use, and sweeps unmarked objects from memory;
Mark-Sweep-Compact: Marks objects currently in use, sweeps away unmarked objects and defragments the memory space;
Mark-Copy: Marks objects currently in use and copies them to a new space, freeing the original space for re-use;
Mark-Compact: Marks objects in use and compacts them.

In this article, we’re primarily interested in the Mark-Sweep-Compact algorithm, and how it’s applied in Java.

The Mark-Sweep-Compact Algorithm in Java: How it Works

For each object in the heap, the GC maintains an allocation list, which is also known as the OOP (Ordinary Object Pointer) table. Every object has a flag, initially cleared, which indicates if it is in use.

The Mark-Sweep-Compact algorithm has three phases, as listed below.

1. Mark Phase

This phase begins by finding ‘garbage roots’: objects that are known to be in use. It identifies these by searching for:

Methods and variables held in the stack;
Static Variables and methods;
Constants;
Objects used by the native stack.

GC then checks each root to see what objects in the heap it refers to. It recursively checks each one to find all objects that still have a live reference. The path from the root to each live object is known as the reference chain.

The diagram below illustrates this:

Fig: Garbage Roots Showing Reference Chain

For example, Object 1 is a root, which holds references to Objects 4 and 7. Object 4 has a reference to Object 5, which in turn has a reference to Object 9.

If Object 1 in the previous diagram no longer exists, possibly because it was popped from the stack, all the objects in its reference chain would no longer have valid references, unless they were also referred to by other live objects. This is shown in the diagram below::

Fig: Reference Chain after Object 1 is removed

Objects 4, 5, 6,7 and 9 become eligible for garbage collection.

For each object in the reference chain, the Mark phase sets its flag in the allocation list to indicate that it’s in use. Once the phase has completed, any item in the allocation list that does not have its flag set is deemed to be garbage.

Sweep Phase

The sweep phase checks each entry in the allocation list. If the flag is not set, it adds the memory previously held by that object to the JVM’s free list, so the memory can be reused. It also removes the object from the allocation list.

GC then clears the flag on all items in the allocation list ready for the next cycle.

Compact Phase

After the sweep, memory will have been released, but it won’t always be in a contiguous block. This means that the JVM may still have problems allocating memory for large objects, even though there is enough free space in total.

The purpose of the Compact phase is to gather all the used memory together at one end of the heap space, leaving a large block of free memory available for allocating new objects. This is illustrated in the diagram below. It shows fragmented memory after the Sweep phase, and contiguous memory after the Compact phase.

Fig: Fragmented vs Compacted Memory

This phase is expensive both time-wise and resource-wise. All objects must be relocated, and every reference to each object must be updated to point to its new location.

Different algorithms may be used to achieve compaction, including:

Table-based: The GC keeps a table of old and new locations, and all objects in the heap space are relocated.
Compression: This algorithm attempts to keep as many objects as possible in their original location.

How the Mark-Sweep-Compact Algorithm is Used in Java

Each of the GC algorithms in Java approaches the task differently.

Serial GC: Uses the Mark-Sweep-Compact algorithm when cleaning the Old Generation, but uses the Mark-Copy algorithm for the Young Generation.
Parallel GC: Works in the same way as Serial, but several worker threads operate together to speed up the tasks.
CMS: Uses the Mark-Sweep algorithm working concurrently as far as possible with application threads. Compaction is only done as a last resort, if the system is unable to allocate an object due to insufficient contiguous space. In practice, this resulted in several problems. Compaction, if it happened, led to long pauses where application threads couldn’t run. Additionally, fragmented memory impacted on system performance. CMS was discontinued in later versions of Java.
Newer Algorithms: These include G1, Z and Shenandoah. They use a combination of algorithms for different scenarios. Memory is split into much smaller regions, and GC works on only one region at a time. Compaction can then be done really quickly when needed. Much work is done concurrently with application threads, using load barriers to prevent conflicts.

Monitoring Garbage Collection

Enabling GC logs allows us to monitor various aspects of GC, including time spent in the Mark, Sweep and Compact phases.

To enable the log, we use command line switches when invoking the JVM.

For Java 9 and later, the switch is: -Xlog:gc*:file=<gc-log-file-path>
For Java 8 and earlier, the switches are: -XX:+PrintGCDetails -Xloggc:<gc-log-file-path>

The easiest way to analyze the logs is to upload them to GCeasy, which produces a range of diagnostic graphs and reports. The image below shows a section of a GCeasy report showing graphs of time spent by GC task:

Fig: Section of GCeasy Report Showing Time Spent by Task

In this example, the program ran using the G1 GC algorithm.

You can use the following link to see the full GCeasy report.

Conclusion

The Mark-Sweep-Compact algorithm in Java is used in most of the GC algorithms for some of their tasks. The newer algorithms have adapted and refined this strategy to avoid long GC pause times.

Using GCeasy, we can monitor and analyze GC performance to pre-empt major issues before they become a problem in production,

GC easy – Universal Java GC Log Analyser