views:

113

answers:

6

So in Java, whenever an indexed range is given, the upper bound is almost always exclusive.

From java.lang.String:

substring(int beginIndex, int endIndex)

Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1

From java.util.Arrays:

copyOfRange(T[] original, int from, int to)

from - the initial index of the range to be copied, inclusive
to - the final index of the range to be copied, exclusive.

From java.util.BitSet:

set(int fromIndex, int toIndex)

fromIndex - index of the first bit to be set.
toIndex - index after the last bit to be set.

As you can see, it does look like Java tries to make it a consistent convention that upper bounds are exclusive.

My questions are:

  • Is this the official authoritative recommendation?
  • Are there notable violations that we should be wary of?
  • Is there a name for this system? (ala "0-based" vs "1-based")

CLARIFICATION: I fully understand that a collection of N objects in a 0-based system is indexed 0..N-1. My question is that if a range (2,4) given, it can be either 3 items or 2, depending on the system. What do you call these systems?

AGAIN, the issue is not "first index 0 last index N-1" vs "first index 1 last index N" system; that's known as the 0-based vs 1-based system.

The issue is "There are 3 elements in (2,4)" vs "There are 2 elements in (2,4)" systems. What do you call these, and is one officially sanctioned over the other?

+1  A: 

Its just 0 to n-1 based.

A list/Array contains 10 items 0-9 indexed.

You cannot have a 0 indexed based list that is 0-n where the cout is n, that includes an item that does not exists...

This is the typical way things work.

  1. Yes.
  2. Excel Ranges/Sheets/Workbooks.
  3. Index (information technology)
astander
I understand that a collection of `N` objects in a 0-based system is indexed 0..N-1. My question is that if a range (2,4) given, is that 3 items or 2?
polygenelubricants
+1 for a good answer of OP
stacker
That will depend on the context of the object list you refer to. As mentioned previously, the documentation *should* help you with this. More likely than not it is 0 based, but as I mentioned, there are deviations...
astander
+4  A: 

In general, yes. If you are working in a language with C-like syntax (C, C++, Java), then arrays are zero-indexed, and most random access data structures (vectors, array-lists, etc.) are going to be zero-indexed as well.

Starting indices at zero means that the size of the data structure is always going to be one greater than last valid index in the data structure. People often want to know the size of things, of course, and so it's more convenient to talk about the size than to talk about the the last valid index. People get accustomed to talking about ending indices in an exclusive fashion, because an array a[] that is n elements long has its last valid element in a[n-1].

There is another advantage to using an exclusive index for the ending index, which is that you can compute the size of a sublist by subtracting the inclusive beginning index from the exclusive ending index. If I call myList.sublist(3, 7), then I get a sublist with 7 - 3 = 4 elements in it. If the sublist() method had used inclusive indices for both ends of the list, then I would need to add an extra 1 to compute the size of the sublist.

This is particularly handy when the starting index is a variable: Getting the sublist of myList starting at i that is 5 elements long is just myList.sublist(i, i + 5).

All of that being said, you should always read the API documentation, rather than assuming that a given beginning index or ending index will be inclusive or exclusive. Likewise, you should document your own code to indicate if any bounds are inclusive or exclusive.

Joe Carnahan
+1 for "you should always read the API documentation" and "you should document your own code to indicate"
matt b
Just to clarify relevance to the OP, I believe that the popularity of half-open ranges in Java came directly from the use of half-open ranges in C, which in turn came as a natural extension of zero-based indexing. So, I think that a discussion of zero-based indexing *is* relevant to the original question. (That being said, it's my fault if I didn't make that connection between zero-based indexing and half-open ranges explicit in my original answer.)
Joe Carnahan
A: 

This practice was introduced by Josh Bloch to Collections API as a contract.

After that it became a standard in java and when anybody dicide to create a public library he assumes that he should keep the contract because users expect to see already known behavior in new libraries.

Roman
So this is the "Bloch system", then? Surely this must've had historical use way before Java/Java Collections Framework?
polygenelubricants
I don't know its name and I'm not sure that it exists. I watched a video on youtube where Josh Bloch was talking about good principles in API design. And there he said that *inclusive lower bound and exclusive upper bound principle* is actually a standard and shouldn't be ever violated when you're developing public libraries. He also mentioned about he was the first (or one of the first, I don't remember) who introduced it in java.
Roman
@Downvoter: with which part of my answer do you disagree?
Roman
I'm confused why people downvoted you, because unlike some of the others here, you actually "get" what I'm asking.
polygenelubricants
I'm not the downvoter ;-), but I'm curious where this contract is documented. It's a pattern that is followed consistently across the Collections API, and of course everyone has noticed it, but I've never seen it named nor centrally documented. Regardless, the pattern of passing the size of an array or string (which is the same as the exclusive ending index if the starting index is zero) dates back long before Java, right?
Joe Carnahan
@Joe Carnahan: I'm not sure if it's documented anywhere (but if I don't know it and you don't know it then it doesn't mean that it's not documented) but every middle or higher java developer knows that this principle shouldn't be violated if you want other people to use your product. Apache Commons and Google Collections and numerous other libraries (Google Data API for example) comply with this contract.
Roman
Interesting theory, but the array in computer science and the `Vector`, `String`, etc in Java API were already 0 based long before the Bloch era.
BalusC
@BalusC: I don't think that @polygenelubricants agrees that there is a connection between zero-based indexing and the custom of using exclusive ending indices when describing ranges. I thought the connection was clear (size of a zero-based array = exclusive ending index for that array), but apparently that connection isn't as clear as you and I thought it was.
Joe Carnahan
A: 

The indexes in array like datastructures are indeed always 0-based. The String is basically backed by a char[]. The Collections framework is under the hood based on arrays and so on. This makes designing/maintaining/using the API easier without changing the "under-the-hood" way to access the desired element(s) in the array.

There are however some "exceptions", such as the parameterindex-based setter methods of PreparedStatement and the columnindex-based getter methods of ResultSet. They are 1-based. Behind the scenes they does also not really represent an array of values.

This would probably bring up a new question: "Why are array indexes zero based?". Now, our respected computer programming scientist E.W. Dijkstra explains here why it should start with zero.

BalusC
+1  A: 

Credit goes to FredOverflow in his comment saying that this is called the "half-open range". So presumably, Java Collections can be described as "0-based with half-open ranges".

I've compiled some discussions about half-open vs closed ranges elsewhere:


siliconbrain.com - 16 good reasons to use half-open ranges (edited for conciseness):

  • The number of elements in the range [n, m) is just m-n (and not m-n+1).
  • The empty range is [n, n) (and not [n, n-1], which can be a problem if n is an iterator already pointing the first element of a list, or if n == 0).
  • For floats you can write [13, 42) (instead of [13, 41.999999999999]).
  • The +1 and -1 are almost never used, when handling ranges. This is an advantage if they are expensive (as it is for dates).
  • If you write a find in a range, the fact that there was nothing found can easily indicated by returning the end as the found position: if( find( [begin, end) ) == end) nothing found.
  • In languages, which start the array subscripts with 0 (like C, C++, JAVA, NCL) the upper bound is equal to the size.

Half-open versus closed ranges

Advantages of half-open ranges:

  • Empty ranges are valid: [0 .. 0]
  • Easy for subranges to go to the end of the original: [x .. $]
  • Easy to split ranges: [0 .. x] and [x .. $]

Advantages of closed ranges:

  • Symmetry.
  • Arguably easier to read.
  • ['a' ... 'z'] does not require awkward + 1 after 'z'.
  • [0 ... uint.max] is possible.

That last point is very interesting. It's really awkward to write an numberIsInRange(int n, int min, int max) predicate with a half-open range if Integer.MAX_VALUE could be legally in a range.

polygenelubricants
A: 

The easy way to think of half-open ranges is this: the first term identifies the start of elements within the range, and the second term identifies the start of elements after the range. Keep that in mind, and it all makes a good deal more sense. Plus the arithmetic works out better in many cases, per @polygenelubricants' answer.

Carl Manaster