I do not understand why Java's String.substring() method is specified the way it is. I can't tell it to start at a numbered-position and return a specified number of characters; I have to compute the end position myself. And if I specify an end position beyond the end of the String, instead of just returning the rest of the String for me, Java throws an Exception.

I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it? What's the best language specification for substring() that you have seen, and when if ever would it be a good idea for a language to do things differently? Is that IndexOutOfBoundsException that Java throws a good design idea, or not? Does all this just come down to personal preference?

A: 

If you leave off the 2nd parameter it will go to the end of the string for you without you having to compute it.

great_llama
Sometimes you want from x until 20 characters later or the end of the string, whichever is shorter.
Yishai
And sometimes you want from x up to half the rest of the string, excluding digits. No API can fulfill all possible requirements.
Michael Borgwardt
No API can fulfill all possible requirements, but I think some APIs may demonstrably fulfill more requirements than others. I'm still musing on that, and watching responses. :)
skiphoppy
+2  A: 

The second parameter should be optional, and the first parameter should accept negative values.
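A rough sketch of what that might look like in Java, assuming a negative start is meant to count back from the end of the string (as substr() does in PHP and Perl). The class name FlexibleSubstr and both substr() helpers are hypothetical; nothing like them exists on java.lang.String.

```java
// Hypothetical helpers; not part of the Java standard library.
final class FlexibleSubstr {

    // Assumption: a negative start counts back from the end of the string.
    private static int resolveStart(String s, int start) {
        return start < 0 ? s.length() + start : start;
    }

    // "Optional" second parameter: omit the length to read to the end.
    static String substr(String s, int start) {
        return s.substring(resolveStart(s, start));
    }

    // Start plus length; still throws if the range does not fit.
    static String substr(String s, int start, int length) {
        int begin = resolveStart(s, start);
        return s.substring(begin, begin + length);
    }

    public static void main(String[] args) {
        System.out.println(substr("hello", -3));    // llo
        System.out.println(substr("hello", -3, 2)); // ll
        System.out.println(substr("hello", 1));     // ello
    }
}
```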

Tolgahan Albayrak
+4  A: 

I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it?

No, it's not objectively better. It all depends on the context in which you want to use it. If you want to extract a substring of a specific length, it's bad, but if you want to extract a substring that ends at, say, the first occurrence of "." in the string, it's better than if you first had to compute a length. The question is: which requirement is more common? I'd say the latter. Of course, the best solution would be to have both versions in the API, but if you need the length-based one all the time, using a static utility method isn't that horrible.

As for the exception, yeah, that's definitely good design. You asked for something specific, and when you can't get that specific thing, the API should not try to guess what you might have wanted instead - that way, bugs become apparent more quickly.

Also, Java DOES have an alternative substring() method that returns the substring from a start index until the end of the string.
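To make both points concrete, here is a small sketch of the two overloads. The class and helper names (SubstringOverloads, tail, rejects) are made up for illustration; only String.substring(int) and String.substring(int, int) are real API.

```java
final class SubstringOverloads {

    // One-argument form: everything from beginIndex to the end of the string.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    // Reports whether the two-argument form rejects the requested range
    // instead of silently truncating it.
    static boolean rejects(String s, int beginIndex, int endIndex) {
        try {
            s.substring(beginIndex, endIndex);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(tail("hello world", 6));  // world
        System.out.println(rejects("hello", 2, 10)); // true
        System.out.println(rejects("hello", 2, 5));  // false
    }
}
```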

Michael Borgwardt
+5  A: 

There are times when the second parameter being a length is more convenient, and there are times when the second parameter being the "offset to stop before" is more convenient. Likewise there are times when "if I give you something that's too big, just go to the end of the string" is convenient, and there are times when it indicates a bug and should really throw an exception.

The second parameter being a length is useful if you've got a fixed length of field. For instance:

// C#
String guid = fullString.Substring(offset, 36);

The second parameter being an offset is useful if you're going up to another delimiter:

// Java
int nextColon = fullString.indexOf(':', start);
if (nextColon == -1)
{
    // Handle error
}
else
{
    String value = fullString.substring(start, nextColon);
}

Typically, the one you want to use is the opposite to the one that's provided on your current platform, in my experience :)

Jon Skeet
+1 for that last sentence. :)
jmucchiello
+1, for the same reason.
Carl Manaster
+1. Same reason.
Grant Wagner
A: 

Having gotten some feedback, I see when the second-parameter-as-index scenario is useful, but so far all of those scenarios seem to be working around other language/API limitations. For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring(). (And this explains why the second position parameter in substring() overshoots the desired index by 1, IMO.)

It seems to me that with a more comprehensive set of string-processing functions in the language's toolkit, the second-parameter-as-index scenario loses out to second-parameter-as-length. But somebody please post me a counterexample. :)
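For the colon case specifically, it may be worth noting that String.split(String regex, int limit) with a limit of 2 comes fairly close to such a routine. A minimal sketch, where the class name SplitAtFirstColon and the splitOnce() helper are just illustrative names:

```java
import java.util.Arrays;

final class SplitAtFirstColon {

    // With a limit of 2 the separator is applied at most once, so any later
    // colons stay in the second element. If there is no colon at all, the
    // result is a one-element array holding the whole input.
    static String[] splitOnce(String s) {
        return s.split(":", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnce("name:value:extra"))); // [name, value:extra]
        System.out.println(Arrays.toString(splitOnce("no colon here")));    // [no colon here]
    }
}
```

Note that the first argument is a regular expression, so a separator such as "." or "|" would need escaping.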

skiphoppy
"For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring()." - If you wanted an API to do everything, we'd have huge APIs to learn and Stack Overflow would consist of answers asking why someone didn't use one of the 30000 methods on the String object instead of spending 30 seconds writing it themselves. That's why we're programmers, because not everything has been written yet. If you need to do task X a lot, write your own library to do it.
Grant Wagner
Why in the world would everybody want to write their own library when they could have one implementation debug all the entailed fencepost errors?
skiphoppy
A: 

If you store this away, the problem should stop plaguing your dreams and you'll finally achieve a good night's rest:

public String skipsSubstring(String s, int index, int length) {
    return s.substring(index, index+length);
}
Bill K
That'll simply blow up with an IndexOutOfBoundsException. :) I'd need to add a check to make the second parameter to substring() be min(index + length, s.length()).
skiphoppy
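The clamping check described in the comment above could be sketched roughly like this, next to the strict version for comparison. The names SubstrSketch, substrStrict and substrLenient are hypothetical, and the sketch assumes index + length does not overflow an int.

```java
final class SubstrSketch {

    // Strict: start plus length, throwing if the range runs past the end.
    static String substrStrict(String s, int index, int length) {
        return s.substring(index, index + length);
    }

    // Lenient: the end is clamped to the string length, so asking for too
    // many characters returns whatever is left. A start index beyond the
    // end of the string still throws.
    static String substrLenient(String s, int index, int length) {
        int end = Math.min(index + length, s.length());
        return s.substring(index, end);
    }

    public static void main(String[] args) {
        System.out.println(substrStrict("hello", 1, 3));   // ell
        System.out.println(substrLenient("hello", 2, 20)); // llo
    }
}
```

Which of the two is the right default is exactly what this thread is arguing about; the lenient one hides an out-of-range request, the strict one surfaces it.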
Having looked at the implementation of String, it seems like it wouldn't be too hard for Java to provide both paradigms: substring() can continue as is, and substr() could follow semantics more common in other languages. Internally the implementation seems like it would be straightforward.
skiphoppy
I think it would be awful to have both substring() and substr(), with different semantics. I can pretty much guarantee that I would be using the wrong one more than half the time - and I think most of my fellow programmers would, too.
Carl Manaster
Maybe if the names were better distinguished, then?
skiphoppy
"That'll simply blow up with an IndexOutOfBoundsException. :) I'd need to add a check to make the second parameter to substring() be min(index + length, s.length())." - Yes, and when you catch that, be sure to throw an IndexOutOfBoundsException. Do NOT fail silently or guess; that's horrific design. Fail early and fail hard.
Bill K