We studied the behavior of different GC algorithms on a major B2B application. The application is a web service provider that serves SOAP and REST requests from its clients.

The application has no web browser interactions. It runs on an 8-core CPU under Red Hat Linux 6.9, and uses Java 7, Tomcat 7 and other popular Java frameworks.

The study was conducted over a 3 hour period in the production environment during off-peak hours. The application runs on multiple JVM instances across multiple servers. We configured 4 different JVM instances with the settings listed below. The remaining JVM instances kept running with their old settings (which I can't disclose, and which are not of interest to this article). Traffic was evenly distributed across all JVM instances by the load balancer's round-robin algorithm.

G1 GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m 
-XX:+UseG1GC -XX:MaxGCPauseMillis=500

CMS GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m
-XX:NewRatio=1 -XX:+UseConcMarkSweepGC

Parallel GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m
-XX:NewRatio=1 -XX:+UseParallelOldGC

Serial GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m
-XX:NewRatio=1 -XX:+UseSerialGC

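Note that in these JVM settings the heap size (-Xmx, -Xms) and Perm size (-XX:MaxPermSize, -XX:PermSize) are identical across all four instances. -XX:NewRatio=1 is set for CMS, Parallel and Serial, while G1 is given a pause time goal (-XX:MaxGCPauseMillis=500) instead. Apart from that, only the GC algorithm varies.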
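One way to confirm that each instance really picked up the intended collector is to ask the JVM itself. The sketch below was not part of the study; it uses the standard java.lang.management API, and the class name ActiveCollectors is hypothetical. The names reported differ per collector and per JVM vendor, so treat the exact strings as JVM-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the collectors registered by the running JVM.
public class ActiveCollectors {

    // Returns the collector names exposed through the platform MXBeans.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters are cumulative since JVM start.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time (ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running this once on each of the 4 instances would show whether the flags took effect before any KPIs are compared.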

Key Performance Indicators

In any study, key performance indicators should be carefully identified. As far as a Garbage Collection study is concerned, (in my humble opinion) key performance indicators are:

  1. Memory & CPU Utilization
  2. Latency
  3. Throughput

Latency and throughput can be confusing terms. Let me try to clarify them with an example. Let's say your application runs for a 1 minute period (i.e. 60 seconds). In this 1 minute period, 5 GCs run.

  • 1st GC took: 1 second
  • 2nd GC took: 2 seconds
  • 3rd GC took: 1 second
  • 4th GC took: 1 second
  • 5th GC took: 1 second

Latency

Latency is the maximum GC pause time. In this example, the maximum GC pause time is 2 seconds, so latency is 2 seconds. Latency is an important KPI because the application freezes during GC pauses. Let's say your application's SLA commitment is 600 ms and your average response time is 500 ms. Then you are within the SLA limits, which is a good thing. Now suppose a GC runs and takes 2 seconds to complete. A request caught by that pause can take up to 2.5 seconds (the 2 second pause plus the usual 500 ms), which breaches the SLA commitment. Latency has a direct impact on your end user's experience.
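The SLA arithmetic above can be sketched in a few lines. This is an illustration only: the class and method names are hypothetical, and it assumes the simplest worst case, where a request arrives just as the pause begins and so waits out the whole pause before being processed.

```java
// Hypothetical helper for the SLA example: 600 ms SLA, 500 ms normal response.
public class SlaCheck {

    // Worst case assumed here: the full GC pause is added to the normal response time.
    static long worstCaseResponseMillis(long normalResponseMillis, long gcPauseMillis) {
        return normalResponseMillis + gcPauseMillis;
    }

    static boolean breachesSla(long normalResponseMillis, long gcPauseMillis, long slaMillis) {
        return worstCaseResponseMillis(normalResponseMillis, gcPauseMillis) > slaMillis;
    }

    public static void main(String[] args) {
        long slaMillis = 600;
        long normalMillis = 500;
        long pauseMillis = 2000;

        System.out.println(worstCaseResponseMillis(normalMillis, pauseMillis)); // 2500
        System.out.println(breachesSla(normalMillis, pauseMillis, slaMillis));  // true
        System.out.println(breachesSla(normalMillis, 0, slaMillis));            // false
    }
}
```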

Throughput

Throughput is the amount of productive work done per unit of time. For a GC study it is expressed as the percentage of time the application spends doing real work rather than garbage collection. In this example, the total time spent on GC is 6 seconds (i.e. the sum of the 1st, 2nd, 3rd, 4th and 5th GC times). It means 10% of the time is spent in GC (i.e. 6 / 60), so throughput is 90% (i.e. 100% - 10%). High throughput means your application is doing more useful work with less GC overhead. In this example, 90% is a poor throughput.
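Both KPIs can be computed directly from the list of pause times. The following is a minimal sketch of the worked example (five pauses in a 60 second window); the class and method names are hypothetical and are not taken from any GC analysis tool.

```java
// Hypothetical helper that derives latency and throughput from GC pause times.
public class GcKpi {

    // Latency as defined in this article: the longest single GC pause.
    static long maxPauseMillis(long[] pausesMillis) {
        long max = 0;
        for (long pause : pausesMillis) {
            if (pause > max) {
                max = pause;
            }
        }
        return max;
    }

    // Throughput: percentage of the observation window not spent in GC.
    static double throughputPercent(long[] pausesMillis, long windowMillis) {
        long totalPause = 0;
        for (long pause : pausesMillis) {
            totalPause += pause;
        }
        return 100.0 * (windowMillis - totalPause) / windowMillis;
    }

    public static void main(String[] args) {
        long[] pauses = {1000, 2000, 1000, 1000, 1000}; // the 5 GCs from the example
        long window = 60000;                            // 60 seconds

        System.out.println("Latency (ms): " + maxPauseMillis(pauses));            // 2000
        System.out.println("Throughput (%): " + throughputPercent(pauses, window)); // 90.0
    }
}
```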

One should aim for low latency and high throughput. Now a question might be, "What is the acceptable latency and throughput?" The answer is: it depends. It depends on the nature of your application; it depends on your SLA agreements with your clients; it depends on the price you are willing to pay for your compute power; it depends on your competitors' response time, etc.

Tools

The following tools were used for this study:

  1. The CPU utilization metric was captured from the application performance monitoring tool New Relic.

  2. Throughput and Latency metrics were captured from the universal garbage collection analysis tool GCEasy.

Performance Summary

The table below summarizes all the KPIs gathered from this study:

GC Algorithm | CPU Utilization | Max Latency  | Throughput | Complete Report
G1           | 9.80%           | 780 ms       | 96.96%     | G1 GC Report
CMS          | 8.50%           | 3 sec 100 ms | 97.29%     | CMS GC Report
Parallel     | 7.60%           | 4 sec 560 ms | 97.02%     | Parallel GC Report
Serial       | 7.10%           | 6 sec 500 ms | 96.86%     | Serial GC Report

Here are some key observations from this study:

  • CPU utilization was comparable across all GC settings, ranging from 7.10% to 9.80%. G1 GC consumed the most CPU at 9.80%. The Serial GC setting consumed the least at 7.10%.

  • Irrespective of the GC algorithm, throughput remained fairly consistent. CMS GC had slightly better throughput (97.29%) than the other GC algorithms.

  • G1 GC produced the best latency, which can be attributed to setting the -XX:MaxGCPauseMillis JVM option.

  • -XX:MaxGCPauseMillis was set to 500 ms. This is a goal that the JVM tries to meet rather than a hard limit, which is why the observed max GC pause time of 780 ms is reasonably close to, but above, the target.

  • Serial GC had the worst latency among all GC algorithms, at 6 sec 500 ms.
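The observations above can be reproduced from the summary table with a few lines of code. The sketch below hard-codes the table values; the class and method names are hypothetical. The "implied GC seconds" figure rests on an assumption not stated in the study, namely that each throughput percentage covers the whole 3 hour (10,800 second) window.

```java
// Hypothetical helper that ranks the four collectors using the table values.
public class GcStudySummary {

    // Values copied from the summary table; order is G1, CMS, Parallel, Serial.
    static final String[] NAMES = {"G1", "CMS", "Parallel", "Serial"};
    static final double[] CPU_PERCENT = {9.80, 8.50, 7.60, 7.10};
    static final double[] MAX_PAUSE_MS = {780, 3100, 4560, 6500};
    static final double[] THROUGHPUT_PERCENT = {96.96, 97.29, 97.02, 96.86};

    static int indexOfMin(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        return best;
    }

    static int indexOfMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // Assumption: the throughput figure applies to the entire window.
    static double impliedGcSeconds(double throughputPercent, long windowSeconds) {
        return windowSeconds * (100.0 - throughputPercent) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println("Lowest max pause:   " + NAMES[indexOfMin(MAX_PAUSE_MS)]);
        System.out.println("Highest throughput: " + NAMES[indexOfMax(THROUGHPUT_PERCENT)]);
        System.out.println("Lowest CPU:         " + NAMES[indexOfMin(CPU_PERCENT)]);

        // How far G1's worst pause exceeded the 500 ms goal.
        System.out.println("G1 over goal (ms):  " + (MAX_PAUSE_MS[0] - 500));

        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(NAMES[i] + " implied GC seconds over 3 hours: "
                    + impliedGcSeconds(THROUGHPUT_PERCENT[i], 10800));
        }
    }
}
```

Seen this way, the throughput gap between the best and worst collector amounts to well under a minute of GC time over the 3 hour window, while the gap in max pause is several seconds per incident.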