We often hear the term ‘Stop-the-World Event’ when learning about Java Garbage Collection. What does this term mean? What causes it? What is its performance impact? Is there a way to optimize it? Let’s discuss these questions in this post.

Garbage Collection Evolution over decades

About three decades ago, before Java came into existence, developers were required to do three things in their code:

  1. Create new objects in memory
  2. Write the Business Logic 
  3. De-allocate the objects from memory

This approach had a challenge: business applications are quite complex, with multiple workflows and transaction types. If a developer accidentally forgot to deallocate objects in even one workflow, those objects would accumulate in memory and eventually result in an OutOfMemoryError. Thus, OutOfMemoryError was pretty pervasive back in the day. When Java entered the programming language market in 1995, it asked developers to do only two things:

  1. Create new objects in memory 
  2. Write the Business Logic code 

Java promised to take care of deallocating objects from memory automatically. Developers loved it because they could focus on writing business code. Management liked it because there would be fewer OutOfMemoryErrors. Since then, due to the popularity of this feature, most programming languages that have come to market offer automatic garbage collection.

Side Effects of Automatic Garbage Collection

Even though automatic garbage collection improves developer productivity and application availability, it comes at a price. It has a couple of undesirable side effects:

a. GC Pauses (i.e. Stop-the-World Events)

b. High CPU Consumption

Modern applications tend to create tons of new objects. Even servicing a simple sign-on request creates thousands of objects today. To collect garbage automatically, the JVM has to work out which objects in memory are still referenced. It starts from the root-level objects, follows their references to the objects they point to, then follows the references held by those objects, and so on through multiple levels until every reachable object has been visited. If our application modified object references while this GC scan was in progress, all the scanning done by the Garbage Collector would be wasted. To avoid this problem, the JVM pauses our application whenever a GC event runs. This pause is called the ‘Stop-the-World’ event, i.e. the JVM’s world has stopped. No application threads are allowed to run, and no customer transactions are processed while the GC event is running.
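To make the scanning described above more concrete, here is a toy sketch of a reachability walk. It is not how the JVM's collectors are actually implemented; the `Node` class, the `mark` method, and the object names are hypothetical and exist only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of a heap object: a name plus its outgoing references (hypothetical).
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

final class ReachabilitySketch {
    // Walks the object graph from the roots and returns every object still reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {       // first visit: queue the objects it references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node session = new Node("session");
        Node user = new Node("user");
        Node orphan = new Node("orphan");    // nothing references this object
        root.refs.add(session);
        session.refs.add(user);

        Set<Node> live = mark(Collections.singletonList(root));
        System.out.println("live objects: " + live.size());
        System.out.println("orphan is garbage: " + !live.contains(orphan));
    }
}
```

If another thread added or removed entries in `refs` while `mark` was running, the returned set could be stale by the time it was used. That is the problem the pause avoids.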

GC Myths Busted: Both Young GC and Full GC Pause the JVM

There are two types of GC events:
a. Young GC
b. Full GC

Young GC runs on the Young Gen of the memory, whereas Full GC runs on all the memory regions (i.e. Young Gen, Old Gen and Metaspace).

The industry often refers only to Full GC as the Stop-the-World GC event. This is misleading and often confusing to engineers, because it gives the impression that Young GC doesn’t stop the JVM. Both Young GC and Full GC stop the JVM. Since the Young Gen is a small region, a Young GC typically stops the JVM for a shorter period than a Full GC. But it does stop the JVM.
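One way to see that young collections also cost time is to ask the running JVM. The sketch below uses the standard `java.lang.management` API to print each collector's run count and accumulated collection time. The class name `GcPauseReport` and the allocation loop are made up for this example. Collector names vary by GC algorithm and JDK version, and for concurrent collectors the reported time may include work done while the application was running, so treat the numbers as an approximation of pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcPauseReport {
    // Sums the accumulated collection time (ms) reported by all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so that a young collection is likely (not guaranteed).
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

On a typical generational collector, the output lists a young-generation collector and an old-generation collector separately, each with its own count and time.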

Another undesirable side effect of automatic garbage collection is high CPU consumption. Since the JVM has to continuously scan the millions of objects that our application creates in memory, it consumes an enormous number of CPU cycles. For example, if your application’s CPU consumption is 60%, a large portion of it, possibly as much as 40%, may be consumed by the garbage collector. Uber, a large ride-sharing application, recently optimized their GC settings and saved millions of dollars in computing cost. You can learn more about the Uber GC tuning case study from this post.
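As a rough way to put a number on this overhead, the sketch below compares accumulated GC time with JVM uptime. It measures the share of elapsed wall-clock time spent in collections, which is only a proxy for CPU consumption, not an exact CPU figure. The class name `GcOverheadSketch` and the helper `overheadPercent` are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverheadSketch {
    // Percentage of elapsed time spent in GC; returns 0 when no time has elapsed.
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime());   // ignore -1 (not reported)
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.2f%%)%n",
                gcMillis, uptimeMillis, overheadPercent(gcMillis, uptimeMillis));
    }
}
```

For a long-running service, sampling this value periodically and watching the trend is usually more informative than a single reading.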

How These Freezes Impact Your Application

When an application is stopped by frequent ‘Stop-the-World’ GC events, it results in two major problems:

Unpleasant User Experience: Since the application is stopped, all the customer transactions that it was processing are paused. Customers therefore get delayed responses, which can lead to an unpleasant user experience. If you are building real-time applications such as trading or rocket systems, or highly time-sensitive applications such as e-commerce sites and banking applications, every second matters. Long ‘Stop-the-World’ GC events are not acceptable there.

Increase in Computing Cost: Frequent ‘Stop-the-World’ GC events also drive up your computing cost, because of the enormous CPU consumption and the pauses that GC events incur. For more details, please read this white paper, which explains how enterprises waste millions of dollars in computing cost due to garbage collection.

How to Optimize the GC Performance?

You can optimize the GC pause time, i.e. the duration of ‘Stop-the-World’ events, by tuning your JVM memory and GC settings. Here is a blog post that shares 9 tips to optimize the GC pause time. In a nutshell, they are:

1. Start Tuning from Scratch: Several applications contain outdated JVM arguments, degrading the application performance. Configure only those relevant to your application’s current needs.

2. Resize Your Heap: Increasing or decreasing your heap size (-Xmx) has good potential to avoid frequent and/or long GC pauses.

3. Choose the Right GC Algorithm: Select a garbage collection algorithm that best matches your application’s workload and performance requirements.

4. Adjust Internal Memory Region Sizes: Fine-tune the sizes of the young, old, and metaspace memory regions to reduce GC pause times.

5. Tune GC Algorithm Settings: Configure JVM arguments specific to the chosen GC algorithm for optimal performance.

6. Address the Causes of GC Events: Analyze GC log files to identify and mitigate specific triggers for GC events.

7. Disable Explicit GC: Prevent or minimize the impact of System.gc() calls to reduce unnecessary pauses.

8. Allocate Sufficient System Capacity: Ensure your application has enough CPU, memory, and I/O resources to support efficient GC operations.

9. Reduce Object Creation Rate: Lower the frequency of GC events by minimizing memory allocation and object creation in your application.
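As a small illustration of tip 9, the sketch below contrasts a boxed accumulator with a primitive one. The class and method names are made up for this example. How many objects the boxed version really allocates depends on the JVM, since the JIT compiler may optimize some of them away, so treat it as a pattern to look for rather than a measured result.

```java
final class AllocationSketch {
    // Boxed accumulator: each `+=` unboxes, adds, and may allocate a new Long object.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: same result without creating an object per iteration.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("boxed:     " + sumBoxed(1_000_000));
        System.out.println("primitive: " + sumPrimitive(1_000_000));
    }
}
```

Both methods return the same value. The difference is how much short-lived garbage the Young Gen may have to absorb along the way.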

Conclusion

As the famous saying goes: “There is no free lunch”. This is very true of automatic garbage collection as well. Yes, it has improved developer productivity and reduced the number of OutOfMemoryErrors. However, it comes with ‘Stop-the-World’ GC pause times and high CPU consumption. When GC settings are properly tuned, you can bring down the GC overhead and enjoy its benefits. If you would like to learn more about GC tuning, please check out my online ‘JVM Performance Engineering & Troubleshooting Master Class’.