Java: String Concatenation

April 6, 2009

So, after spending a couple of hours trying to figure out why a particular function was so much slower than I expected, I realized the issue was that I had forgotten about String.concat. For the past year I had drilled into my head that I should always use StringBuilder when concatenating strings. After figuring out why another method was considerably faster, I started to dig into the performance, and that is where I realized that String.concat exists and is actually faster in certain cases. Wanting to understand this better, I did a thorough analysis and comparison. I realize that several others have already done this and produced similar results, so you may ask: why go through it all again? Well, I wanted to (A) verify it myself to learn and understand more, (B) dig into the byte code and Java source to understand exactly why, and (C) provide more concrete examples and use cases. That being said, read on for my full analysis, including my list of DO’s.

If you choose not to read any further, at least read the following DO’s:

  • DO use the plus operator if all operands are constants
  • DO use String.concat if only concatenating 2 strings
  • DO use StringBuilder within a for loop when updating values
  • DO use StringBuilder if concatenating more than 2 strings
  • DO use StringBuilder rather than re-assigning variables multiple times
  • DO ensure StringBuilder has proper sizing

Before we start, let’s compare the three primary ways to concatenate strings in Java:

Method 1: Plus Operator

String temp = "alpha" + "beta";

Method 2: StringBuilder

String temp = new StringBuilder(32).append("alpha").append("beta").toString();

Method 3: String.concat

String temp = "alpha".concat("beta");

For new developers, method 1 always seems like a great, easy way to do things because it is cleaner and requires less code (I know I always used it when I first began Java development). However, as we will see, shorter and cleaner code is not necessarily better or faster.

To learn more, let’s dig into the byte code and compare the methods against each other. Let’s start with method 1, the plus operator. There are several ways this syntax gets optimized, as we will see in a bit, but the basic byte code generally resembles the following (the exact sequence varies between compilers and versions):


NEW StringBuilder
DUP
GETSTATIC Test.value1 : LString;
INVOKESTATIC String.valueOf(LObject;)LString;
INVOKESPECIAL StringBuilder.<init>(LString;)V
GETSTATIC Test.value2 : LString;
INVOKEVIRTUAL StringBuilder.append(LString;)LStringBuilder;
INVOKEVIRTUAL StringBuilder.toString()LString;
ASTORE 0

In other words, Java uses StringBuilder behind the scenes to concatenate the operands. It converts the leftmost operand into a String (i.e., converts an integer, boolean, object, etc. to its String form), then creates a new StringBuilder, passing that String as the constructor argument. Each subsequent operand is appended with the append method. Finally, StringBuilder.toString is invoked to create the resulting String. That seems pretty straightforward, right?

Well, there are several caveats you should be aware of. First, look at the single-String-argument constructor: it creates a new StringBuilder whose internal char[] array has room for the length of the specified string plus 16 additional elements (I would say bytes, but remember that a char is a 2-byte value in Java). So why bring that up? Whenever an append would overflow the internal char buffer, the StringBuilder allocates a new, larger char[] array (in JDK 6 it roughly doubles the capacity, or grows to the required size if that is larger) and copies the existing contents into it. Since the constructor pads the buffer with only 16 extra elements, if the operands appended after the first one total more than 16 characters, the buffer is forced to grow. Allocating new objects and copying data is not free, and it adds directly to memory use and garbage-collection pressure. Needless to say, if you are not careful when concatenating multiple strings together, you could end up growing and reallocating the char buffer on several of the concatenations…definitely an expensive and inefficient task.
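To see the padding and growth described above, the following sketch builds a StringBuilder the way the listing does (seeded from the leftmost operand) and reports its capacity before and after appending a 20-character string. It relies only on the public StringBuilder(String) constructor, whose documented initial capacity is 16 plus the length of the argument, and on capacity(). The class name CapacityDemo and the sample strings are hypothetical, and the exact capacity after growth is an implementation detail that can differ between JDK versions, so the sketch only reports it.

```java
public class CapacityDemo {

    // Capacity of a builder created the way the listing above creates one:
    // from the leftmost operand via the StringBuilder(String) constructor.
    static int initialCapacity(String first) {
        return new StringBuilder(first).capacity();
    }

    // Capacity after appending a second operand; if the appended text does not
    // fit in the 16 spare slots, the builder must allocate a larger array.
    static int capacityAfterAppend(String first, String second) {
        StringBuilder sb = new StringBuilder(first);
        sb.append(second);
        return sb.capacity();
    }

    public static void main(String[] args) {
        String first = "alpha";                 // 5 chars, so initial capacity is 5 + 16 = 21
        String second = "abcdefghijklmnopqrst"; // 20 chars, more than the 16 spare slots
        System.out.println("initial capacity: " + initialCapacity(first));
        System.out.println("after append:     " + capacityAfterAppend(first, second));
    }
}
```

Running it on your own JDK shows how far the buffer grows; the point is that a single 20-character append was enough to force a reallocation and copy.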

Another issue with the plus operator and its use of StringBuilder is that it initially creates its buffer from the leftmost operand. Because it always invokes the single-String-argument constructor, it must make sure the leftmost operand is a String. That has implications when the first operand is not a String. For example, if you use the statement String temp = 5 + value, Java must convert 5 to a String first. If you look at the byte code for this, Java actually invokes String.valueOf(5). Looking into that method, you find that it creates a char array and a String (2 objects) to build that value. Compare that to the statement String temp = value + 5: now the leftmost operand is already a String, so no conversion is required. When Java appends 5, it merely invokes append(int) on the StringBuilder, which writes the digits directly into the internal character buffer without creating any extra objects.
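Writing the two orderings out by hand makes the difference visible. This is a hedged sketch of roughly what the byte code listing above corresponds to for each statement; the exact instructions depend on your compiler and JDK version (newer javac releases emit different code entirely), and the class and method names are hypothetical.

```java
public class LeftOperandDemo {

    // Roughly what "5 + value" becomes in the listing's style: the int on the
    // left is first turned into a String with String.valueOf(int), which
    // allocates a String (and its char array) before the builder even exists.
    static String intFirst(String value) {
        return new StringBuilder(String.valueOf(5)).append(value).toString();
    }

    // Roughly what "value + 5" becomes: the String seeds the builder directly,
    // and append(int) writes the digits into the builder's existing buffer.
    static String stringFirst(String value) {
        return new StringBuilder(value).append(5).toString();
    }

    public static void main(String[] args) {
        String value = "abc";
        System.out.println(intFirst(value));    // 5abc, same as 5 + value
        System.out.println(stringFirst(value)); // abc5, same as value + 5
    }
}
```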

You are probably asking by now: if the plus operator has these issues, why ever use it? In fact, many believe that using the plus operator is always a bad idea. However, that is not entirely true. As we will show momentarily, the plus operator benefits from a compile-time optimization. Whenever all operands are constants (i.e., a number, boolean, or string literal, or any String/int/boolean variable declared static final and initialized with a constant), the compiler pre-concatenates them into a single string during compilation. In other words, the statement String temp = "test" + CONSTANT + 5 + true + INT_CONSTANT; results in the single byte code instruction LDC "testVALUE5true10" if CONSTANT was defined as "VALUE" and INT_CONSTANT was defined as 10. So, when you have constants, you should always use the plus operator to get this optimization (check the byte code produced by your own compiler to verify that it does indeed perform it).
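A quick way to observe the folding without reading byte code is an identity comparison: the Java Language Specification requires strings produced by constant expressions to be interned, so a folded result is the very same object as the equivalent literal, while a concatenation evaluated at run time produces a new String. The constant names mirror the example above; the class name and the non-final field notConstant are hypothetical, and == is used here only as a diagnostic, never as a way to compare string contents.

```java
public class ConstantFoldingDemo {

    static final String CONSTANT = "VALUE"; // compile-time constant
    static final int INT_CONSTANT = 10;     // compile-time constant
    static String notConstant = "VALUE";    // not final, so not a constant

    // Every operand is a constant, so the compiler folds the whole
    // expression into the single literal "testVALUE5true10".
    static String folded() {
        return "test" + CONSTANT + 5 + true + INT_CONSTANT;
    }

    // One operand is an ordinary variable, so concatenation happens at run time.
    static String atRuntime() {
        return "test" + notConstant + 5 + true + INT_CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println(folded() == "testVALUE5true10");    // true: the same interned literal
        System.out.println(atRuntime() == "testVALUE5true10"); // false: a new String built at run time
        System.out.println(atRuntime().equals(folded()));      // true: same contents
    }
}
```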

So, now that we know how the plus operator fundamentally works, let’s dig into how to use StringBuilder best. First, if you have only ever used StringBuffer, look at using StringBuilder from now on (except when multiple threads modify the same instance concurrently). StringBuffer and StringBuilder are the same except that StringBuilder does not synchronize its methods. Because synchronization and locking add overhead, StringBuilder is generally faster. The advantage StringBuilder provides is that rather than creating string after string, along with more and more garbage character arrays, it uses a single char array and appends to it. Where StringBuilder breaks down is when you do not allocate enough space for the char array, forcing it to expand over and over and produce more garbage. We saw this same behavior in how the plus operator uses StringBuilder behind the scenes. So, in general, ensure the char array is sized properly by using the int-argument constructor. Aside from that, StringBuilder functions exactly the same as the plus operator; the reason it is often faster is that its buffer is pre-allocated with the proper size.
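One way to check whether a builder is sized properly is to watch capacity() while appending: every increase means the builder allocated a new array and copied its contents. The helper below is a hypothetical diagnostic, not a standard API. It compares a builder created with the int-argument constructor and an exact size against one created with the no-argument constructor, whose documented default capacity is 16.

```java
public class SizingDemo {

    // Appends every part to the builder and counts how many times the
    // capacity increased, i.e. how often the internal array was reallocated.
    static int countGrowths(StringBuilder sb, String... parts) {
        int growths = 0;
        int capacity = sb.capacity();
        for (String part : parts) {
            sb.append(part);
            if (sb.capacity() != capacity) {
                growths++;
                capacity = sb.capacity();
            }
        }
        return growths;
    }

    // Exact number of characters needed to hold all parts.
    static int totalLength(String... parts) {
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] parts = {"alpha", "beta", "gamma", "delta", "epsilon", "zeta"}; // 30 chars total
        int presized = countGrowths(new StringBuilder(totalLength(parts)), parts);
        int unsized = countGrowths(new StringBuilder(), parts);
        System.out.println("growths with exact sizing:   " + presized); // 0
        System.out.println("growths with default sizing: " + unsized);
    }
}
```

The presized builder never reallocates; the default one has to grow once its 16 slots run out.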

That brings us to the final method: String.concat. The concat method on the String class takes the current string, appends the specified string, and returns a new String. Internally, String.concat creates a new char array large enough for both strings, copies both strings into it, and then creates a new String from that array. However, it uses a package-private constructor that adopts the specified array as the new String’s internal array. This is different from the public String constructor that takes a char array, which creates a duplicate array and copies the characters into it. So, because of the package-private constructor, String.concat is optimized: it hands its freshly built char array directly to the new String, and only requires the creation of one new String and one new char array.
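The steps String.concat performs can be sketched with public APIs. The method below is a hypothetical re-creation for illustration only: it allocates one char array sized for both strings and copies each string into it with getChars. Because application code cannot reach the package-private constructor, the final new String(char[]) call here copies the array once more, which is exactly the extra work the real implementation avoids. Newer JDKs store String contents differently internally, so their concat code differs from the JDK 6 version described above.

```java
public class ConcatSketch {

    // Mirrors the steps described above for String.concat.
    static String concatSketch(String self, String other) {
        if (other.length() == 0) {
            return self; // String.concat likewise returns this for an empty argument
        }
        // One array large enough for both strings.
        char[] buf = new char[self.length() + other.length()];
        // Copy the receiver, then the argument right after it.
        self.getChars(0, self.length(), buf, 0);
        other.getChars(0, other.length(), buf, self.length());
        // Public constructor: copies buf into a new internal array.
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(concatSketch("alpha", "beta")); // alphabeta
        System.out.println("alpha".concat("beta"));        // alphabeta
        String s = "alpha";
        System.out.println(s.concat("") == s);             // true: no new String for ""
    }
}
```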

Now that we know about how the various methods function, let’s begin to analyze various use cases, how they compare and which to use as a best practice.

Use Case 1: Concatenating String Constants


// METHOD 1
String temp = "alpha" + "beta";
// METHOD 2
String temp = new StringBuilder(16).append("alpha").append("beta").toString();
// METHOD 3
String temp = "alpha".concat("beta");

Using our newfound knowledge, we can probably guess that method 1 will be fastest thanks to the compile-time folding of constants. The following table shows this to be true. Note that this test and all other tests invoke each method 1,000,000 times, recording the system time (System.currentTimeMillis) immediately before and after the loop. The “# Objects” column is the number of objects actually created (via the new operator in the byte code).

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   0           11.2           16            0              16
StringBuilder   4           385.9          390           375            391
String.concat   2           109.3          109           109            110

As we can see, the plus operator is extremely fast. More interestingly, String.concat beats StringBuilder, a trend we will keep seeing below in certain situations. This is mainly because StringBuilder creates a total of 4 objects (1 for the StringBuilder, 1 for its char array, 1 for the resulting String, and 1 for the resulting String’s char array, since StringBuilder.toString makes the new String create its own copy of the characters), whereas String.concat creates only 2.
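For reference, the measurement procedure described above can be sketched as follows. This is a hypothetical reconstruction, not the original test code: it times a fixed number of invocations with System.currentTimeMillis, as described. Naive loops like this are easily distorted by JIT warm-up and dead-code elimination, so the sketch keeps a running total of result lengths to make the work observable; a dedicated harness such as JMH is the more reliable tool for this kind of comparison.

```java
import java.util.function.Supplier;

public class ConcatTiming {

    static String value1 = "alpha";
    static String value2 = "beta";

    // Runs the task the given number of times and returns the elapsed
    // wall-clock milliseconds, measured immediately before and after the loop.
    static long timeMillis(Supplier<String> task, int iterations) {
        long sink = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length(); // use the result so it is not optimized away
        }
        long elapsed = System.currentTimeMillis() - start;
        if (sink < 0) {
            System.out.println(sink); // never true; keeps sink observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        System.out.println("plus:          "
                + timeMillis(() -> value1 + value2, iterations) + " ms");
        System.out.println("StringBuilder: "
                + timeMillis(() -> new StringBuilder(32).append(value1).append(value2).toString(), iterations) + " ms");
        System.out.println("String.concat: "
                + timeMillis(() -> value1.concat(value2), iterations) + " ms");
    }
}
```

Absolute numbers will differ from the tables here depending on hardware, JDK version, and JIT behavior; run each variant several times and compare medians rather than single runs.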

Use Case 2: Concatenating String Constants and Numbers


// METHOD 1
String temp = CONSTANT + 5;
// METHOD 2
String temp = new StringBuilder(32).append(CONSTANT).append(5).toString();
// METHOD 3
String temp = CONSTANT.concat(String.valueOf(5));

Again, the compile-time optimization should win out here (assuming CONSTANT is a static final String initialized with a literal). But what about the other two methods? String.concat won previously because it only created two objects. However, in this case we also have to create a new String and char array for the integer via String.valueOf(5). Nonetheless, let’s see the results:

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   0           3.2            0             0              16
StringBuilder   4           362.5          359.5         359            375
String.concat   4           161.1          157           156            172

Interestingly enough, String.concat is still considerably faster than StringBuilder. Even though both create the same number of objects, String.concat is most likely faster because it executes fewer method calls, whereas StringBuilder performs two append operations, each with its own execution path.

Use Case 3: Concatenating String Variables (Non-Constants)


// METHOD 1
String temp = value1 + value2;
// METHOD 2
String temp = new StringBuilder(32).append(value1).append(value2).toString();
// METHOD 3
String temp = value1.concat(value2);

This is where it gets interesting…before looking below, think about everything we have learned so far. Which method do you think will win in this case? Because we are using non-constants, we do not expect any compile-time folding, so we can expect methods 1 and 2 to behave very similarly, since both use StringBuilder. But how do you think String.concat will perform?

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   4–5         381            375           375            390
StringBuilder   4           382.8          382.5         375            391
String.concat   2           107.9          109.5         93             110

Did you get the results you expected? String.concat is again considerably faster, since the compiler cannot fold ordinary non-constant variables. As expected, methods 1 and 2 produce similar results. However, consider the case where the second string being concatenated is 20 characters long. Now which do you think will be faster, method 1 or method 2? If you said method 2, you are right. The plus operator pads its buffer by only 16 characters, so the 20-character string overflows it, requiring a new char array and a copy operation to grow the buffer. Because we gave the StringBuilder an initial size of 32 characters, it does not need to expand its buffer (as long as the two strings total at most 32 characters). This is the most important point to remember: always size your buffers appropriately to get the best performance.

Use Case 4: Concatenating Multiple String Variables

So far, String.concat is looking pretty good. So why don’t we always use it? Well, let’s consider a case where we concatenate more than 3 values.


// METHOD 1
String temp = s1 + s2 + s3 + s4;
// METHOD 2
String temp = new StringBuilder(64).append(s1).append(s2).append(s3).append(s4).toString();
// METHOD 3
String temp = s1.concat(s2).concat(s3).concat(s4);

Do you expect String.concat to still be best? Let’s look at the results and find out:

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   4–7         511            515           500            532
StringBuilder   4           417.3          422           406            422
String.concat   10          668.9          672           656            688

In this case, StringBuilder is actually faster. But why? Look at the number of objects created: String.concat had to create 10 objects compared to just 4 for StringBuilder. Remember that String.concat creates a new String and a new char array with each concat call. Thus, with 5 concatenations, we get 5 concatenations times 2 objects per concatenation, or 10 objects (the four-string snippet above, with its three concat calls, would create 6). With StringBuilder, on the other hand, we size the array properly up front and need only the four objects, without growing the array. By the way, notice how the plus operator, despite using StringBuilder internally, is slower. This is precisely because its 16-character padding is not enough for all the concatenations, so the array must grow, creating extra arrays and copies. In general, then, when concatenating many strings, a properly sized StringBuilder is the more efficient choice.
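When the number of pieces grows, it pays to compute the exact length once and allocate a single buffer. The helper below is a hypothetical sketch of that idea for any number of strings: it makes one pass to sum the lengths, then appends everything into a StringBuilder that never needs to grow, so the only allocations are the builder, its array, and the final String with its array.

```java
public class PresizedConcat {

    // Concatenates all parts using one exactly sized StringBuilder, so the
    // internal array is allocated once and never has to grow.
    static String concatAll(String... parts) {
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        StringBuilder sb = new StringBuilder(total);
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = "one", s2 = "two", s3 = "three", s4 = "four";
        // Same result as s1 + s2 + s3 + s4 and s1.concat(s2).concat(s3).concat(s4).
        System.out.println(concatAll(s1, s2, s3, s4)); // onetwothreefour
    }
}
```

The extra pass over the lengths is cheap compared with reallocating and copying the character data, which is why the sizing step is worth it when there are many pieces.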

Use Case 5: String Concatenation in a For Loop


// METHOD 1
String temp = "";
for (int i = 0; i < count; i++) { temp += "test"; }

// METHOD 2
StringBuilder buffer = new StringBuilder(count * 16);
for (int i = 0; i < count; i++) { buffer.append("test"); }
String temp = buffer.toString();

// METHOD 3
String temp = "";
for (int i = 0; i < count; i++) { temp = temp.concat("test"); }

These methods concatenate a string multiple times within a for loop. Based on what we know, method 2 should be considerably faster, but how much faster? With count set to 10, we get the following results:

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   40–50       2586.1         2586.5        2562           2609
StringBuilder   4           687.7          688           687            688
String.concat   20          1345.4         1344          1328           1360

This shows that StringBuilder is roughly 4x faster than the plus operator and roughly 2x faster than String.concat. But wait: if the plus operator is based on StringBuilder, how can it possibly be nearly 4x slower? The answer lies in the byte code we saw earlier for the plus operator. Whenever Java encounters a concatenation expression, it creates a StringBuilder, appends all the operands of that expression, and then, after the last operand, invokes toString to create the resulting String. Inside the for loop, we reassign the variable with a plus operator on every iteration, so with count set to 10 we create 10 of these StringBuilder/append/toString sets. Each set creates a StringBuilder, its char array, the resulting String, and that String’s char array (4 objects), so we create 10 * 4, or 40+, objects. Depending on the length of the concatenated strings, the array may also have to grow, creating even more garbage. All in all, the plus operator is extremely slow within for loops, so always remember never to use it to reassign a value inside a for loop. A StringBuilder, by contrast, keeps appending to its internal char array without creating extra garbage (assuming it is sized appropriately). String.concat suffers from a similar problem, since it creates two new objects on every iteration of the loop.
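The cost difference is easier to see with a little arithmetic. Each temp = temp.concat("test") copies the entire string built so far into a new array, so iteration i copies about 4i characters and the total grows quadratically with count, while a presized builder copies each piece once and then the whole result once more in toString. The sketch below is a hypothetical model that counts characters copied under those assumptions (it ignores the additional copy the plus operator makes when seeding each new builder, which only makes that variant worse). It also shows a loop whose builder is sized to exactly count * piece.length(), rather than the more generous count * 16 used above.

```java
public class LoopConcat {

    // Builds piece repeated count times with one exactly sized StringBuilder.
    static String repeat(String piece, int count) {
        StringBuilder buffer = new StringBuilder(count * piece.length());
        for (int i = 0; i < count; i++) {
            buffer.append(piece);
        }
        return buffer.toString();
    }

    // Model: characters copied by temp = temp.concat(piece) in a loop.
    // Iteration i builds a new array of i * pieceLength chars and fills all of it.
    static long charsCopiedByConcatLoop(int pieceLength, int count) {
        long total = 0;
        for (int i = 1; i <= count; i++) {
            total += (long) i * pieceLength;
        }
        return total;
    }

    // Model: characters copied by the presized builder: each append copies the
    // piece once, then toString copies the finished buffer once.
    static long charsCopiedByBuilder(int pieceLength, int count) {
        long appended = (long) count * pieceLength;
        return appended + appended;
    }

    public static void main(String[] args) {
        System.out.println(repeat("test", 3));                // testtesttest
        System.out.println(charsCopiedByConcatLoop(4, 10));   // 220
        System.out.println(charsCopiedByBuilder(4, 10));      // 80
        System.out.println(charsCopiedByConcatLoop(4, 1000)); // 2002000
        System.out.println(charsCopiedByBuilder(4, 1000));    // 8000
    }
}
```

At count = 10 the gap is under 3x, but at count = 1000 the concat loop copies about 250 times as many characters as the presized builder.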

Use Case 6: Assigning Variables with String Concatenation

Let’s complete our evaluation by looking at another common use case: assigning and reassigning a variable (but this time not within a for loop). Sometimes this is done purely for convenience or aesthetics.


// METHOD 1
String temp = "alpha";
temp += "beta";
temp += "chi";

// METHOD 2
StringBuilder buffer = new StringBuilder(32);
buffer.append("alpha");
buffer.append("beta");
buffer.append("chi");
String temp = buffer.toString();

// METHOD 3
String temp = "alpha";
temp = temp.concat("beta");
temp = temp.concat("chi");

If we remember our previous use case and the repercussions of assigning variables multiple times, we can pretty easily guess the outcome here as well. So, without further ado:

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   8–10        448.4          445.5         437            469
StringBuilder   4           284.3          281           281            297
String.concat   4           243.8          242.5         234            266

This is actually pretty close between String.concat and StringBuilder, with String.concat slightly ahead. But why, when String.concat did so much worse in the previous use case? Well, let’s see what happens if we increase the number of concatenations from 2 to 5:

Method          # Objects   Average (ms)   Median (ms)   Minimum (ms)   Maximum (ms)
Plus Operator   20–25       1240.6         1235          1234           1265
StringBuilder   4           476.4          476.5         468            485
String.concat   10          660.8          656.5         656            672

Now the trend starts to show. Remember that in the for loop use case we had 10 iterations; five reassignments behave essentially like a loop with five iterations. When assigning a variable multiple times, prefer StringBuilder to reduce the number of object instances, cut down on garbage, and use a single char buffer. However, for simple cases with just one or two quick reassignments, String.concat is faster because it does not require the extra StringBuilder instance…instead it just creates a String and char array and copies the data.
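Often the reassignments can simply be merged. In the byte code shown earlier, a single expression with several plus operators uses one StringBuilder for all of its operands (or is folded entirely when every operand is a constant), whereas each separate += starts over with a new builder and a new intermediate String. A minimal sketch with hypothetical method names, assuming the pieces arrive as ordinary non-constant variables:

```java
public class ReassignDemo {

    // Three statements: each += creates its own builder and intermediate String.
    static String reassigned(String a, String b, String c) {
        String temp = a;
        temp += b;
        temp += c;
        return temp;
    }

    // One expression: all operands go through a single concatenation.
    static String singleExpression(String a, String b, String c) {
        return a + b + c;
    }

    // Explicit builder, sized from the actual lengths.
    static String withBuilder(String a, String b, String c) {
        return new StringBuilder(a.length() + b.length() + c.length())
                .append(a).append(b).append(c).toString();
    }

    public static void main(String[] args) {
        String a = "alpha", b = "beta", c = "chi";
        System.out.println(reassigned(a, b, c));       // alphabetachi
        System.out.println(singleExpression(a, b, c)); // alphabetachi
        System.out.println(withBuilder(a, b, c));      // alphabetachi
    }
}
```

All three produce the same string; they differ only in how many intermediate objects are created along the way.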

So, now that we have all of this in our programming arsenal, let me reiterate the list of DO’s with more detail:

  • DO use the plus operator if all operands are constants
    • This lets the compiler pre-concatenate the strings at compile time
  • DO use String.concat if only concatenating 2 strings
    • This only requires creating a new String rather than a new StringBuilder and a resultant String
  • DO use StringBuilder within a for loop when updating values
    • StringBuilder is very efficient when you just keep appending data to its internal buffer rather than having to create String after String
  • DO use StringBuilder if concatenating more than 2 strings
    • Even though String.concat is faster when concatenating 2 strings, it is not efficient with multiple concatenations because each concatenation creates a new String (and char array)…StringBuilder just appends to its internal buffer
  • DO use StringBuilder rather than re-assigning variables multiple times
    • Reassigning creates extra garbage each time with both the plus operator and String.concat, whereas StringBuilder avoids that extra garbage
  • DO ensure StringBuilder has proper sizing
    • If you fail to size the initial char array properly, your StringBuilder will have to grow dynamically, which means creating new char arrays and copying data [imagine this with really large strings, with the JVM trying to find enough free memory for a new char array at least that large]

The most important lesson of this entire exercise is to understand how byte code works and how the core classes (String, Integer, etc.) work internally, so that you can optimize your own code by knowing the end result. By understanding how these three string concatenation methods work fundamentally, we can take almost any use case and predict how to optimize it without having to test it first. The real moral of the story, however, is to know how Java compiles and executes your code. These tests were based on JDK 1.6, so your compiler and JVM may behave differently; for example, since JDK 9, javac compiles non-constant string concatenation with the plus operator to an invokedynamic call (JEP 280) instead of an explicit StringBuilder chain.

15 Responses to “Java: String Concatenation”

  1. Incredible! It’s a great entry.

    I’ve been looking for something like this for hours.
    If you don’t care and if I have enough time, I’d like to translate it into Spanish.

    Thanks for all.

  2. Very useful. Thanks.

  3. Really good, helpful and most complete account to String concatenation mystery.

    This must be since Java 5. In previous versions, was StringBuffer used instead of StringBuilder, or was it merely creating new Strings?

  4. @akhiriya – StringBuffer was prior to JDK 1.5. StringBuilder was introduced in 1.5. The two are equivalent except that StringBuffer synchronizes all operations while StringBuilder does not…as a result, it is much faster by avoiding the synchronized block overhead. StringBuffer should only be used when concurrently accessing the buffer between multiple threads to ensure appends happen properly.

  5. Great blog! I always wondered why some people bothered to use String.concat in their code, now I know. Actually, for anything non-performance sensitive, I will continue using the + operator since the code is easier to read. But still, you expanded my knowledge. Thanks.

  6. Great!!.
    It is good to find such excellent explanation on the StringBuilder.

  7. Thank you for this incredibly helpful and thorough explanation!

  8. (I know this is an old post, so maybe I won’t get a response, but I thought I would try)

    Since it seems that StringBuilder’s toString will output a string with spaces filling whatever the capacity of the StringBuilder was, does having to call trim() on the string greatly affect any of these tests one way or another? trim() adds one more object and some time to the process.

  9. Thank you for such a valuable piece!

  10. [...] can find a thorough explanation of how string concatenation works in Java in this blog post: http://znetdevelopment.com/blogs/2009/04/06/java-string-concatenation [...]

  11. Please help… how should I concatenate the following, present in a file in the form of

    value = R_make+R_model+R_year+R_submodel only and don’t concatenate others.

    R_index_code||V
    R_make||Acura
    R_model||CL
    R_makeid||10
    R_modelid||110
    R_year||2002
    R_submodelid||1
    R_submodelid||2
    R_submodel||Base
    R_engineid||1

  12. please fetch the value like
    Value=Acura+CL+2002+Base

  13. Very useful, thanks a lot!

  14. can anyone explain this…
    class B
    {
    public static void main(String args[])
    {
    int a=6;
    String s=a+" ";
    System.out.println(s);
    }
    }
    o/p is 6

  15. Thanks…Great explanation….