Hello,

I have a question about comparing a string with the empty string in Java. Is there a difference between comparing a string with the empty string using == and using equals? For example:

String s1 = "hi";

if (s1 == "")

or

if (s1.equals(""))

I know that one should compare strings (and objects in general) with equals, and not ==, but I am wondering whether it matters for the empty string. Thank you.

+18  A: 

It's going to depend on if the string is a literal or not. If you create the string with

new String("")

then it will never match "" with the == operator, as shown below:

 String one = "";
 String two = new String("");
 System.out.println("one == \"\": " + (one == ""));
 System.out.println("one.equals(\"\"): " + one.equals(""));
 System.out.println("two == \"\": " + (two == ""));
 System.out.println("two.equals(\"\"): " + two.equals(""));

--

one == "": true
one.equals(""): true
two == "": false
two.equals(""): true

Basically, you always want to use equals().

MrWiggles
May be worth showing a code example of equals() too.
Kezzer
+4  A: 

A string, is a string, is a string, whether it's the empty string or not. Use equals().

Rob
+26  A: 
s1 == ""

is not reliable as it tests reference equality not object equality (and String isn't strictly canonical).

s1.equals("")

is better but can suffer from null pointer exceptions. Better yet is:

"".equals(s1)

No null pointer exceptions.
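To make the difference concrete, here is a minimal sketch. The class name EmptyCheckDemo and its two helper methods are hypothetical, introduced only for illustration; it simply contrasts calling equals() on the variable with calling it on the literal.

```java
// Hypothetical demo class; the name and helper methods are illustrative only.
class EmptyCheckDemo {
    // Invokes equals() on s itself, so a null s causes a NullPointerException.
    static boolean unsafeIsEmpty(String s) {
        return s.equals("");
    }

    // Invokes equals() on the literal "", which is never null;
    // equals(null) simply returns false.
    static boolean safeIsEmpty(String s) {
        return "".equals(s);
    }

    public static void main(String[] args) {
        System.out.println(safeIsEmpty(""));    // true
        System.out.println(safeIsEmpty("hi"));  // false
        System.out.println(safeIsEmpty(null));  // false
        try {
            unsafeIsEmpty(null);
        } catch (NullPointerException e) {
            System.out.println("s.equals(\"\") threw NullPointerException for null");
        }
    }
}
```

Note that "".equals(s1) treats null as "not empty"; whether that is what you want depends on the caller.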

EDIT: Ok, the point was asked about canonical form. This article defines it as:

Suppose we have some set S of objects, with an equivalence relation. A canonical form is given by designating some objects of S to be "in canonical form", such that every object under consideration is equivalent to exactly one object in canonical form.

To give you a practical example: take the set of rational numbers (or "fractions", as they're commonly called). A rational number consists of a numerator and a denominator (divisor), both of which are integers. These rational numbers are equivalent:

3/2, 6/4, 24/16

Rational numbers are typically written such that the gcd (greatest common divisor) of the numerator and denominator is 1. So all of them will be simplified to 3/2. 3/2 can be viewed as the canonical form of this set of rational numbers.

So what does it mean in programming when the term "canonical form" is used? It can mean a couple of things. Take for example this imaginary class:

public class MyInt {
  private final int number;

  public MyInt(int number) { this.number = number; }
  public int hashCode() { return number; }
}

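The hash code of the class MyInt is a canonical form of that class because, assuming equals() is overridden to compare number (the class as shown would otherwise inherit the identity-based equals() from Object), you can take any two instances m1 and m2 of MyInt and they will obey the following relation: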

m1.equals(m2) == (m1.hashCode() == m2.hashCode())
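The relation can be checked with a small sketch. The equals() override and the main method below are assumptions added for illustration; they are not part of the class as originally shown.

```java
// Sketch of MyInt with an equals() override added (the override is an assumption).
final class MyInt {
    private final int number;

    MyInt(int number) { this.number = number; }

    // Two MyInt instances are equal exactly when they wrap the same number.
    @Override
    public boolean equals(Object o) {
        return o instanceof MyInt && ((MyInt) o).number == number;
    }

    @Override
    public int hashCode() { return number; }

    public static void main(String[] args) {
        MyInt m1 = new MyInt(7);
        MyInt m2 = new MyInt(7);
        MyInt m3 = new MyInt(8);
        // The relation holds for equal and for unequal pairs.
        System.out.println(m1.equals(m2) == (m1.hashCode() == m2.hashCode())); // true
        System.out.println(m1.equals(m3) == (m1.hashCode() == m3.hashCode())); // true
        // Reference equality is a different question: these are distinct objects.
        System.out.println(m1 == m2); // false
    }
}
```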

That relation is the essence of canonical form. A more common way this crops up is when you use factory methods on classes such as:

public class MyClass {
  private MyClass() { }

  public static MyClass getInstance(...) { ... }
}

Instances cannot be directly instantiated because the constructor is private. This is just a factory method. What a factory method allows you to do is things like:

  • Always return the same instance (abstracted singleton);
  • Just create a new instance with every call;
  • Return objects in canonical form (more on this in a second); or
  • whatever you like.

Basically, the factory method abstracts object creation. Personally I think it would be an interesting language feature to force all constructors to be private to enforce the use of this pattern, but I digress.

What you can do with this factory method is cache your instances that you create such that for any two instances s1 and s2 they obey the following test:

(s1 == s2) == s1.equals(s2)
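A minimal sketch of such a caching factory follows. The class name Label, its text field, and the HashMap-based cache are all hypothetical choices made for illustration; it ignores serialization and class-loader concerns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class whose factory hands out one instance per distinct text.
final class Label {
    private static final Map<String, Label> CACHE = new HashMap<>();
    private final String text;

    // Private constructor: instances can only be obtained through getInstance().
    private Label(String text) { this.text = text; }

    // Returns the cached instance for this text, creating it on first use.
    static synchronized Label getInstance(String text) {
        Objects.requireNonNull(text, "text");
        return CACHE.computeIfAbsent(text, Label::new);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Label && ((Label) o).text.equals(text);
    }

    @Override
    public int hashCode() { return text.hashCode(); }

    public static void main(String[] args) {
        Label a = Label.getInstance("blah");
        Label b = Label.getInstance(new String("blah"));
        Label c = Label.getInstance("other");
        System.out.println((a == b) == a.equals(b)); // true
        System.out.println((a == c) == a.equals(c)); // true
    }
}
```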

So when I say String isn't strictly canonical it means that:

String s1 = "blah";
String s2 = "blah";
System.out.println(s1 == s2); // true

But as others have pointed out you can change this by using:

String s3 = new String("blah");

and possibly:

String s4 = "blah".intern();

You can't rely on reference equality completely, so you shouldn't rely on it at all.
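Putting the cases side by side, as a small sketch: the class name StringIdentityDemo and the sameObject helper are hypothetical. The printed results rely only on behavior the language guarantees: identical literals refer to the same interned instance, new String always creates a distinct object, and intern() returns the interned instance.

```java
// Hypothetical demo class; names are illustrative only.
class StringIdentityDemo {
    // Reference comparison, wrapped in a method so the intent is explicit.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "blah";
        String s2 = "blah";
        String s3 = new String("blah");
        String s4 = s3.intern();

        System.out.println(sameObject(s1, s2)); // true: both literals are interned
        System.out.println(sameObject(s1, s3)); // false: new String creates a new object
        System.out.println(s1.equals(s3));      // true: same contents
        System.out.println(sameObject(s1, s4)); // true: intern() returns the interned instance
    }
}
```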

As a caveat to the above pattern, I should point out that controlling object creation with private constructors and factory methods doesn't guarantee that reference equality means object equality, because of serialization. Serialization bypasses the normal object creation mechanism. Josh Bloch covers this topic in Effective Java (originally in the first edition, when he talked about the typesafe enum pattern, which later became a language feature in Java 5), and you can get around it by implementing the (private) readResolve() method. But it's tricky. Class loaders will affect the issue too.
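As a rough sketch of the readResolve() workaround: the class name Registry, the INSTANCE field, and the roundTrip helper are hypothetical. It covers only a single-instance class within one class loader, not the general case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical single-instance class that stays canonical across serialization.
final class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();

    private Registry() { }

    // Invoked by the serialization machinery after deserialization;
    // the freshly created copy is discarded in favor of the canonical instance.
    private Object readResolve() {
        return INSTANCE;
    }

    // Serializes an object to a byte array and reads it back.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(INSTANCE) == INSTANCE); // true
    }
}
```

Without the readResolve() method, the round trip would produce a second Registry object and the comparison would print false.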

Anyway, that's canonical form.

cletus
+1 for a great explanation. But you may want to explain what "canonical" means in this context -- that the JVM holds a table of created strings so it doesn't need to create the same string repeatedly. One of the few places Java has a clear performance advantage over C and C++.
rtperson
For C/C++, GCC at least has a compile-time option to reuse string constants. It's smart enough to support substrings where possible, too. It doesn't do this at runtime, though (unless you have a library which does this, e.g. using implicit sharing like Qt).
strager
+4  A: 
"".equals(s)

seems to be the best option, but there is also StringUtils.isEmpty(s), contained in the Apache Commons Lang library.

Raibaz
Interesting, a library call which is longer and more obscure than the code it replaces. ;)
Peter Lawrey
I wouldn't add commons-lang just for this if I wasn't already using it for something else, but the chances are I *would* be using it. The null-pointer-avoiding version of the statement has always given me hives, and I hate the check-for-null way as well, so I prefer the StringUtils version.
Michael Rutherfurd
While it's longer, it's not more obscure to use a library call like StringUtil.isNullOrEmpty(someString); the reason is that having a name there makes it much more readable. The performance hit is negligible, so it's a win.
Chii
I guess this falls into the subjectivity of readability :) To me, StringUtils.isEmpty(foo) is more readable than "".equals(foo), but I can understand Peter Lawrey's point of view.
Raibaz
+3  A: 

It's a bit sideways from your original question, but there's always

if(s1.length() == 0)

I believe this is equivalent to the isEmpty() method added in Java 1.6.
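A quick sketch comparing the two checks; the class name LengthCheckDemo and the helper method are hypothetical. Both forms call a method on the string itself, so neither is safe for null.

```java
// Hypothetical demo class; names are illustrative only.
class LengthCheckDemo {
    // Same result as s.isEmpty() for any non-null s;
    // throws NullPointerException when s is null.
    static boolean isEmptyByLength(String s) {
        return s.length() == 0;
    }

    public static void main(String[] args) {
        String[] samples = { "", "hi", " " };
        for (String s : samples) {
            // Each line prints true: the two checks agree on every sample.
            System.out.println(isEmptyByLength(s) == s.isEmpty());
        }
    }
}
```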

eaolson
rtperson
Yes, but so will s1.isEmpty() or s1.equals("").
eaolson
A: 

Given two strings:

String s1 = "abc";
String s2 = "abc";

- or -

String s1 = new String("abc");
String s2 = new String("abc");

The == operator performed on two Objects checks for object identity (it returns true if the two operands refer to the same object instance). The actual behavior of == applied to java.lang.Strings does not always appear to be consistent because of String interning.

In Java, Strings are interned (at least partly at the discretion of the JVM.) At any point in time, s1 and s2 may or may not have been interned to be the same object reference (supposing they have the same value.) Thus s1 == s2 may or may not return true, based solely on whether s1 and s2 have both been interned.

Making s1 and s2 equal to empty Strings has no effect on this - they still may or may not have been interned.

In short, == may or may not return true if s1 and s2 have the same contents. s1.equals(s2) is guaranteed to return true if s1 and s2 have the same contents.

Jared