At my work we have multiple environments in our Application Development department.  Each environment is split into clusters behind a load balancer.  Most of the individual servers run httpd with mod_jk to talk to Tomcat.  One of the issues we experience is the PermGen space running out and all the cores hitting 100%.  Recently, this happened on our production server, and we realized a few things.

Our Tomcat servers host several Grails applications.  Our JDBC connections through Tomcat use a single entry in server.xml for each database.  This means that all connections to the Tomcat server are funneled through this one entry back to the database.  As people open and close their connections, the garbage collector sees the resulting objects and does not collect them.  Moreover, Grails tends not to optimize the queries it runs against the database, and these “ad-hoc” queries are seen as new objects in Grails.  They are not collected either.  Another thing to note is builds deployed from Jenkins or another build product.  Each new build leaves behind classes that the garbage collector does not completely collect.  Over time, the PermGen space gets used up.

On our servers we noticed that when enough people were logged on to the application, the CPU would run high and eventually no more people could log on.  The best way to solve this issue is to add the following to the startup environment settings in Tomcat:

-XX:+EliminateLocks -XX:+UseBiasedLocking
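One way to apply these two flags, as a minimal sketch: Tomcat's catalina.sh sources a bin/setenv.sh file at startup if one exists, so the flags can be appended to CATALINA_OPTS there.  The file location and the choice of CATALINA_OPTS over JAVA_OPTS are my assumptions, not something specific to our setup, and you should confirm that your JVM version still accepts both flags.

```shell
#!/bin/sh
# Hypothetical $CATALINA_BASE/bin/setenv.sh (assumed location; create it if missing).
# The two flags are the ones from the post; check them against your JVM version.
JVM_FLAGS="-XX:+EliminateLocks -XX:+UseBiasedLocking"

# Append to whatever CATALINA_OPTS already holds instead of overwriting it.
CATALINA_OPTS="${CATALINA_OPTS:-} $JVM_FLAGS"
export CATALINA_OPTS
```

Restart Tomcat afterwards so the new options are picked up.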

As a side note, always make sure you put the following in your startup as well:

-server -XX:+UseG1GC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

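These settings help make better use of memory and move garbage collection to the newer collectors.  If you look closely, you will see that CMSPermGenSweepingEnabled is actually an older, deprecated entry.  It is still useful on older JVMs, and since Tomcat doesn’t complain, I like to leave it in there for peace of mind.  Keep in mind that the two CMS flags belong to the CMS collector rather than G1, and that PermGen itself only exists on Java 7 and earlier (Java 8 replaced it with Metaspace).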

However, other tuning adjustments should be made to both the system and the server.xml file to actually affect the PermGen space issue.  First, let’s look at how Tomcat uses memory.  I won’t purport to be an expert at memory utilization, but on our servers Tomcat tends to end up in swap space quite a bit, and that can greatly slow down thread execution.  On our Oracle systems, we try to adjust swappiness so that the server uses physical memory first.  It’s faster.  To that end, the vm.swappiness setting tells the kernel how readily to swap out application memory as opposed to dropping page cache.  To see what yours is, type this:

cat /proc/sys/vm/swappiness

On my systems the value is 30.  The stock kernel default is 60, though some tuned profiles lower it.  Either way, we want to reduce it.  Simply type:

echo 5 | sudo tee /proc/sys/vm/swappiness

This lowers swappiness to 5, so the kernel strongly prefers keeping application memory in RAM over swapping it out.  Now run:

cat /proc/sys/vm/swappiness

again to double-check the change.  Finally, we will need to open up /etc/sysctl.conf and add the following to it so the setting holds on reboot:

vm.swappiness=5
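If you manage several servers, the check above can be wrapped in a small function.  This is only a sketch: the function name check_swappiness and its optional file argument are hypothetical, added so it can be tried against a scratch file instead of the live kernel setting.

```shell
# check_swappiness TARGET [FILE]: compare the current swappiness to a target value.
# FILE defaults to the real kernel path; pass another path to try it out safely.
check_swappiness() {
    target="$1"
    file="${2:-/proc/sys/vm/swappiness}"
    current="$(cat "$file")" || return 2
    if [ "$current" -le "$target" ]; then
        echo "ok: swappiness is $current (target <= $target)"
    else
        echo "high: swappiness is $current (target <= $target)"
        return 1
    fi
}
```

Calling check_swappiness 5 on a server reports whether the value is already at or below 5, and the return code makes it easy to use in a larger script.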

We also want to change the server.xml.  There are two settings that I directly look at on ours.  First are the maxThreads and the minSpareThreads.  The default looks like the following:

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="150" minSpareThreads="4"/>

The rule of thumb is to allow 200 threads per core.  If you have two cores, use 400.  If you have four cores, use 800.  For our system, we will now make the entry look like this:

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="400" minSpareThreads="4"/>

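The arithmetic behind that rule of thumb can be sketched as a small helper.  The function name suggest_max_threads is hypothetical, and falling back to nproc (from GNU coreutils) for the core count is my assumption about how you would look it up.

```shell
# suggest_max_threads [CORES]: apply the 200-threads-per-core rule of thumb.
# CORES defaults to what nproc reports when no argument is given.
suggest_max_threads() {
    cores="${1:-$(nproc)}"
    echo $((cores * 200))
}

suggest_max_threads 2   # prints 400
suggest_max_threads 4   # prints 800
```

Treat the result as a starting point for maxThreads and adjust it after watching the server under real load.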
You may also notice that under the <GlobalNamingResources> section, each JNDI resource has certain pool parameters set.  Below is a typical line for Tomcat 8:

maxTotal="100" maxIdle="75" maxWaitMillis="30000"

Below is the old setup for the same line:

maxActive="100" maxIdle="75" maxWait="30000"

I’ve mixed up maxTotal and maxActive I don’t know how many times.  If your Tomcat is earlier than 8, use maxActive and maxWait.  We want to raise maxTotal to accommodate more connections.  I would do this in increments based on the amount of memory you have dedicated to Tomcat.  For now, a good setting might be:

maxTotal="400" maxIdle="75" maxWaitMillis="30000"

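Since the two sets of attribute names are so easy to confuse, a quick grep can show which ones a given server.xml still uses.  This is a rough sketch; the function name find_old_pool_attrs is hypothetical, and it only matches the attribute names as plain text rather than parsing the XML.

```shell
# find_old_pool_attrs FILE: list lines that still use the pre-Tomcat-8 attribute names.
# Returns 0 when old names are found and 1 when none are found (plain grep semantics).
# The trailing "=" keeps maxWait from matching the newer maxWaitMillis.
find_old_pool_attrs() {
    grep -nE '(maxActive|maxWait)=' "$1"
}
```

For example, find_old_pool_attrs "$CATALINA_BASE/conf/server.xml" prints the matching lines with their line numbers, or nothing if the file already uses the Tomcat 8 names.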
And finally, if you want to diagnose issues with heap and PermGen faster, I cannot recommend enough using VisualVM and installing sysstat on your CentOS 6+ servers.  VisualVM is a Java-based heap and thread monitor.  It requires JMX and some configuration changes to monitor a server.  I will try to set up a tutorial and install guide in one of my next posts.  Sysstat will give you several tools, not the least of which are sar and iostat.  Use them, and good hunting!