A String in Java is a sequence of characters. The String class is more like a utility class that works on that character sequence. The sequence is maintained in an array called value[], for example:

private final char value[];

We have previously discussed some facts about String in Java and why String is immutable in Java. We suggest going through those articles first: Facts about String in Java and Why is String immutable in Java?

The substring method is used to get part of a String in Java. It is defined in the java.lang.String class.

Let’s see how substring is implemented internally, that is, what backs a substring.

Types of Available Methods

There are two overloaded methods available:

  1. Accepts one argument, beginIndex, and returns the part of the String from beginIndex to the end.
  2. Accepts two arguments, beginIndex and endIndex, and returns the part of the String from beginIndex to endIndex-1.
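The two overloads can be tried out with a short example. This is only a minimal sketch; the class name SubstringOverloadsDemo and the sample text are made up for illustration.

```java
// Illustrative demo class (hypothetical name), using only java.lang.String.
class SubstringOverloadsDemo {
    public static void main(String[] args) {
        String text = "HelloWorld";

        // One-argument overload: from beginIndex to the end of the string.
        String fromIndex = text.substring(5);

        // Two-argument overload: from beginIndex up to endIndex - 1.
        String range = text.substring(0, 5);

        System.out.println(fromIndex); // World
        System.out.println(range);     // Hello
    }
}
```

Note that endIndex is exclusive, so text.substring(0, 5) covers the characters at indexes 0 through 4.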

SubString implementation in Java

The substring() method is implemented differently before and after Java 7 update 6 (7u6).

Until Java Version 7u6

Every time you call substring() method in Java, it will return a new String because String is immutable in Java.

A String is basically a char[] that contains the characters of the string with an offset and a count (i.e. the string is composed of count characters starting from the offset position in the char[]).

How SubString works till Java 7?

When substring is called, a new String is created that shares the same char[] but has a different offset and count, effectively creating a view on the original string. The one exception is when offset = 0 and count equals the full length, in which case the original String itself is returned.
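The sharing behaviour described above can be modelled with a small class. This is a sketch under assumptions, not the JDK source: SharedString and backingArrayLength() are hypothetical names introduced only to make the offset / count idea visible.

```java
// Hypothetical model of the pre-7u6 String layout; NOT the actual JDK code.
final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first character inside value
    private final int count;    // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Whole-string request: return the same instance.
        if (beginIndex == 0 && endIndex == count) {
            return this;
        }
        // Otherwise share the array and only adjust offset and count.
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Illustration helper (assumption): exposes how large the retained array is.
    int backingArrayLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString("0123456789");
        SharedString small = big.substring(2, 5);
        System.out.println(small);                      // 234
        System.out.println(small.backingArrayLength()); // 10
    }
}
```

In this model the three-character substring still points at the full ten-character array, which is why the operation is constant time but keeps the whole array reachable.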

It was found that substring could cause a memory leak. Consider a scenario where the original string is very long and is backed by an array of 1GB: no matter how small the substring is, it holds a reference to the whole 1GB array. This also prevents the original character array from being garbage collected, even when the original string no longer has any live reference. This is a clear case of a memory leak in Java, where memory is retained even though it is no longer required. That is how the substring method created a memory leak.

This issue was reported as a bug (http://bugs.sun.com/view_bug.do?bug_id=6294060) and was fixed in the substring implementation in Java 7 (update 6).
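On those older JVMs, a commonly cited workaround was to wrap the result in new String(...), since that constructor copied only the characters actually used. The sketch below shows the idiom; the class and method names (SubstringCopyWorkaround, detachedSubstring) are hypothetical. On Java 7u6 and later the extra copy is harmless but redundant.

```java
// Illustrative helper (hypothetical names) showing the old copy idiom.
class SubstringCopyWorkaround {

    // Returns the requested range as a String that does not depend on
    // the (possibly huge) backing array of the source on pre-7u6 JVMs.
    static String detachedSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String large = "a very long string whose array we do not want to retain";
        String small = detachedSubstring(large, 2, 6);
        System.out.println(small); // very
    }
}
```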

Since Java Version 7u6

To resolve the memory leak in older Java versions, the substring method no longer shares the original character array; instead, it copies the required range into a new array.

A new char[] is created every time, because the String class no longer has offset or count fields.

How SubString works after Java 7?

Arrays.copyOfRange is used to create a completely new copy of the substring's characters.

This may save some bytes in each String instance, since the offset and count fields are gone, but because the original array is no longer shared, substring now runs in time linear in the length of the substring, compared with constant time previously.
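The copying approach can be modelled in the same spirit. Again, this is a sketch and not the JDK source: CopyingString and backingArrayLength() are hypothetical names, and only Arrays.copyOfRange is taken from the real library.

```java
import java.util.Arrays;

// Hypothetical model of the post-7u6 approach; NOT the actual JDK code.
final class CopyingString {
    private final char[] value; // holds exactly this string's characters

    CopyingString(String s) {
        this.value = s.toCharArray();
    }

    private CopyingString(char[] value) {
        this.value = value;
    }

    CopyingString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Whole-string request: return the same instance.
        if (beginIndex == 0 && endIndex == value.length) {
            return this;
        }
        // Copy only the requested range; cost grows with the substring length.
        return new CopyingString(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    // Illustration helper (assumption): exposes the size of the retained array.
    int backingArrayLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        CopyingString big = new CopyingString("0123456789");
        CopyingString small = big.substring(2, 5);
        System.out.println(small);                      // 234
        System.out.println(small.backingArrayLength()); // 3
    }
}
```

Here the substring owns a three-element array of its own, so the large original array can be collected once nothing else refers to it.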

If you have not yet upgraded your service JVM to Java 7 and are still running Java 6 or an older version, it’s time to upgrade. Happy learning!! :)
