Scalable NIO Servers – Part 2 – Memory
As part of my ongoing investigation into picking a suitable, highly scalable server for my future applications, today I will be talking about memory. Yesterday, we discussed performance and saw in general how Netty and Grizzly compared to one another. Today, we will extend that by comparing the memory footprint at startup and over time, and how memory usage trends with respect to garbage collection. As GC is one of the biggest culprits behind performance problems, the more efficiently a server conserves memory, the better it should perform. So, without further ado, let’s get to the testing.
This data was once again collected using a simple echo client/server program with a varying number of client threads sending simple messages as fast as possible. As the client and server were run on the same machine, CPU contention may skew certain data points. This data should only be treated as a preliminary indication; others should run their own tests to ensure their expectations are met.
The following data represents memory usage when testing JBoss Netty. When idle, memory usage after startup was around 1.5 – 2.0 MB. During a 5 minute test run, the following graph was observed in JConsole. Notice that it underwent 430 minor collections taking 0.346 seconds in total (or on average about 0.8 ms per collection). Memory usage ranged from its idle point to a high of nearly 40 MB.
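JConsole gets its numbers over JMX, and the same collection counters are available in-process through the standard `java.lang.management` API. The following is a minimal sketch, not the code used for these tests; the class name `GcTelemetry` and the helper `averageMillis` are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the collection count, total collection time,
// and average time per collection for every collector in the running JVM.
class GcTelemetry {

    // Average milliseconds per collection; 0 when no collections were
    // recorded (the MXBean reports -1 when a value is undefined).
    static double averageMillis(long collections, long totalMillis) {
        return collections <= 0 ? 0.0 : (double) totalMillis / collections;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // collections so far
            long millis = gc.getCollectionTime();  // approximate accumulated time in ms
            System.out.printf("%s: %d collections, %d ms total, %.2f ms avg%n",
                    gc.getName(), count, millis, averageMillis(count, millis));
        }
    }
}
```

Called from inside the server at the end of a run, something like this would give the same count and total time that JConsole displays, without depending on the JConsole refresh interval.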

The next test was Grizzly. Grizzly had similar idle memory usage, in the 1.5 MB range. Over a 5 minute period, Grizzly underwent 100 collections taking 0.178 seconds in total (or about 1.8 ms per collection). Grizzly performed fewer collections than Netty, but Netty’s individual collections were faster. Overall, however, Grizzly spent only 0.178 seconds in GC during the 5 minute period, almost 50% less than Netty’s 0.346 seconds. In terms of memory usage, Netty peaked at about 40 MB, while Grizzly peaked at a little over 70 MB.

Up next was Apache Mina. Once again, idle memory in Mina was less than 2 MB. Over the 5 minute testing span, Mina did very well, invoking minor GC cycles only 143 times for a total of 0.137 seconds. This was even better than Grizzly’s 0.178 seconds. The average collection took just under 1.0 ms, slightly higher than Netty. Mina was also very similar to Netty in memory usage, maxing out at 40 MB. Overall, Mina seemed to perform very well.

Finally, our last test was against xSocket. xSocket demonstrated the worst memory behavior of the four. It underwent 159 minor collections, but took 3.5 seconds to perform them. That is an average of 22 ms per collection (more than 20x worse than Mina and Netty). xSocket also peaked at around 55 MB, compared to 40 MB for Mina and Netty. The slow collection times of xSocket most likely help explain its slower performance as well.
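For anyone who wants to check the per-collection averages, here is a small sketch that recomputes them from the collection counts and total GC times reported above. The class name `GcAverages` and method `avgMillis` are only for illustration.

```java
import java.util.Locale;

// Recomputes average time per minor collection from the figures in this post.
class GcAverages {

    // Total GC time in seconds divided by the number of collections, in ms.
    static double avgMillis(int collections, double totalSeconds) {
        return totalSeconds * 1000.0 / collections;
    }

    public static void main(String[] args) {
        String[] names = {"Netty", "Grizzly", "Mina", "xSocket"};
        int[] collections = {430, 100, 143, 159};        // minor collections in 5 minutes
        double[] totalSeconds = {0.346, 0.178, 0.137, 3.5}; // total time spent collecting

        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-8s %6.2f ms per collection",
                    names[i], avgMillis(collections[i], totalSeconds[i])));
        }
    }
}
```

This works out to roughly 0.80 ms for Netty, 1.78 ms for Grizzly, 0.96 ms for Mina, and 22.01 ms for xSocket.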

Note, however, that the JVM arguments were not tuned for any of these tests, so the results do not reflect optimal settings. The following JVM args were used in each test: -server -Xms512m -Xmx512m -XX:+UseParallelGC
Given the test environment, the hardware used, and the lack of a full tuning analysis, the most that can be said is that Netty, Mina, and Grizzly are all very competitive with each other in terms of memory. Netty and Mina may be slightly better than Grizzly in certain aspects of memory, though. Next, we will look at the feature sets of the different libraries, including the pros, cons, and ease of use.

I think it’s great that you not only looked at response time, but also at memory usage.
It would be interesting to get the numbers for how much memory was freed by the GC on average per message sent.
http://www.tagtraum.com/gcviewer.html might help you to do this.
Or just post the GC log files :)
I don’t think that you can measure the memory consumption by looking at jconsole stats. There are so many factors (jconsole refresh time, OS scheduling, GC scheduling, etc.) involved that comparing numbers obtained in this way is questionable to say the least.
Maybe triggering a full GC while the server is under load would give you a better picture of the retained memory. Taking heap dumps and analyzing those would be the best option.
Also, comparing GC telemetry of an app that runs for only 5 minutes is IMO too inaccurate. The JVM (especially the server VM) takes a good amount of time (tens of seconds to minutes) to warm up.
It’s really interesting to see these numbers, but I’m afraid that we can’t put too much weight on them. I’d actually argue that you tested the HotSpot JVM more than the NIO frameworks and I wouldn’t be surprised if the results under other VMs were very different.
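The suggestion above of triggering a full GC under load and looking at what is retained could be sketched with the standard `MemoryMXBean`. This is only an illustration, with a made-up class name `RetainedHeap`. Note that `gc()` is a request, equivalent to `System.gc()`, and the JVM may ignore it (for example when run with -XX:+DisableExplicitGC).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper: requests a collection and reports the heap still in
// use afterwards, as a rough approximation of retained memory.
class RetainedHeap {

    // Returns used heap in bytes after asking the JVM to collect.
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively System.gc(); a hint, not a guarantee
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long usedBytes = usedHeapAfterGc();
        System.out.printf("Heap in use after GC request: %.2f MB%n",
                usedBytes / (1024.0 * 1024.0));
    }
}
```

Calling this a few times while the echo clients are running, after a warm-up period, would give numbers that are less sensitive to where in the allocation sawtooth a sample happens to fall. A heap dump remains the more thorough option.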