I came upon a strange behavior that has left me curious and without a satisfactory explanation as yet.

For simplicity, I've reduced the symptoms I've noticed to the following code:

import java.text.SimpleDateFormat;
import java.util.GregorianCalendar;

public class CalendarTest {
    public static void main(String[] args) {
        System.out.println(new SimpleDateFormat().getCalendar());
        System.out.println(new GregorianCalendar());
    }
}

When I run this code, I get something very similar to the following output:

java.util.GregorianCalendar[time=-1274641455755,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=1929,MONTH=7,WEEK_OF_YEAR=32,WEEK_OF_MONTH=2,DAY_OF_MONTH=10,DAY_OF_YEAR=222,DAY_OF_WEEK=7,DAY_OF_WEEK_IN_MONTH=2,AM_PM=1,HOUR=8,HOUR_OF_DAY=20,MINUTE=55,SECOND=44,MILLISECOND=245,ZONE_OFFSET=-28800000,DST_OFFSET=0]
java.util.GregorianCalendar[time=1249962944248,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2009,MONTH=7,WEEK_OF_YEAR=33,WEEK_OF_MONTH=3,DAY_OF_MONTH=10,DAY_OF_YEAR=222,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=2,AM_PM=1,HOUR=8,HOUR_OF_DAY=20,MINUTE=55,SECOND=44,MILLISECOND=248,ZONE_OFFSET=-28800000,DST_OFFSET=3600000]

(The same thing happens if I provide a valid format string like "yyyy-MM-dd" to SimpleDateFormat.)

Forgive the horrendous non-wrapping lines, but it's the easiest way to compare the two. If you scroll to about 2/3rds of the way over, you'll see that the calendars have YEAR values of 1929 and 2009, respectively. (There are a few other differences, such as week of year, day of week, and DST offset.) Both are obviously instances of GregorianCalendar, but the reason why they differ is puzzling.

From what I can tell, the formatter produces accurate output when formatting Date objects passed to it. Obviously, correct functionality is more important than the correct reference year, but the discrepancy is disconcerting nonetheless. I wouldn't think that I'd have to set the calendar on a brand-new date formatter just to get the current year...

I've tested this on Macs with Java 5 (OS X 10.4, PowerPC) and Java 6 (OS X 10.6, Intel) with the same results. Since this is a Java library API, I assume it behaves the same on all platforms. Any insight on what's afoot here?

(Note: This SO question is somewhat related, but not the same.)


Edit:

The answers below all helped explain this behavior. It turns out that the Javadocs for SimpleDateFormat actually document this to some degree:

"For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created."

So, instead of getting fancy with the year of the date being parsed, they just set the internal calendar back 80 years by default. That part isn't documented per se, but when you know about it, the pieces all fit together.
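A minimal sketch of that window, for anyone who wants to check it through the public API. The class and method names here are made up for illustration, and Locale.US is chosen only so the formatter is backed by a Gregorian calendar. It reads the window start with get2DigitYearStart() and compares it with the current year. The last line in main() prints the year of the formatter's internal calendar, which is an implementation detail and may not hold on every JDK.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DefaultCenturyDemo {
    /** Year in which a freshly created formatter's two-digit-year window starts. */
    static int windowStartYear() {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        Calendar cal = new GregorianCalendar();
        cal.setTime(sdf.get2DigitYearStart()); // documented: 80 years before creation time
        return cal.get(Calendar.YEAR);
    }

    /** Current year according to a brand-new GregorianCalendar. */
    static int currentYear() {
        return new GregorianCalendar().get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println("current year:      " + currentYear());
        System.out.println("window start year: " + windowStartYear());
        // Implementation detail (assumption): the internal calendar is left
        // positioned at the window start until the formatter is first used.
        SimpleDateFormat fresh = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        System.out.println("internal calendar: " + fresh.getCalendar().get(Calendar.YEAR));
    }
}
```

The first two numbers should differ by 80, which matches the 1929 versus 2009 seen in the output above.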

+1  A: 

Looking through SimpleDateFormat it seems like it's something to do with serialization:

/* Initialize the fields we use to disambiguate ambiguous years. Separate
 * so we can call it from readObject().
 */
private void initializeDefaultCentury() {
    calendar.setTime( new Date() );
    calendar.add( Calendar.YEAR, -80 );
    parseAmbiguousDatesAsAfter(calendar.getTime());
}
Tom
The comment implies that readObject() also calls this method, but it doesn't explain why...
Quinn Taylor
A: 
System.out.println(new SimpleDateFormat().getCalendar());
System.out.println(new GregorianCalendar());

Comparing the two lines above is comparing apples and pears.

The first gives you a tool to parse Strings into Dates and vice versa. The second is a date utility that allows you to manipulate Dates.

There is not really a reason why they should provide similar output.

Compare it with the following:

System.out.println(new String() );
System.out.println(new Date().toString() );

Both lines will output a String, but logically you wouldn't expect the same result.

Peter
Actually, your contrived code example makes no sense — I'm actually comparing two instances of GregorianCalendar (note the call to getCalendar() on the first line), not something so disparate as Date and String. I know what SimpleDateFormat and Calendar do.
Quinn Taylor
Peter's point is that his code example is also comparing two instances of the same thing -- java.lang.String -- which are obtained in two different ways: one's a "top-level" string, the other is produced by a Date object (and represents that Date's internals). The only difference between his example and yours is that the SimpleDateFormat's calendar /is/ part of its internal state, and the String is probably just returned -- that is, the class probably doesn't keep a reference to it after toString() is called. But it might, and without looking inside, we don't know that it does not.
tpdi
I appreciate the clarification, but I disagree that the /only/ difference is whether a printed value is part of another object's internal state. By definition, the two strings he provides are different — one is the current date. I understand that the two calendars may not be identical, but I'm only focusing on a few small differences, and everything else is (largely) identical.
Quinn Taylor
+2  A: 

You are investigating internal behaviour. If this goes outside the published API then you are seeing undefined stuff, and you should not care about it.

Other than that, I believe that the year 1929 is used when deciding whether to interpret a two-digit year as being in 19xx instead of 20xx.

Thorbjørn Ravn Andersen
Actually, now that I know what I'm looking for, I find it **is** published in the API. Under the table of format pattern letters, the bullet for _Year_ documents the default skewed century of "80 years before and 20 years after the time the SimpleDateFormat instance is created". This is as reasonable a default as any, and useful to know. (Now I don't have to worry about whether it's a bug!) It would be nice if `get2DigitYearStart()` at least mentioned this in its own documentation... :-)
Quinn Taylor
+1  A: 

SimpleDateFormat has mutable internal state. This is why I avoid it like the plague (I recommend Joda Time). This internal calendar is probably used during the process of parsing a date, but there's no reason it would be initialized to anything in particular before it has parsed a date.

Here's some code to illustrate:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;

public class DateTest {
    public static void main(String[] args) {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat();
        System.out.println("sdf cal: " + simpleDateFormat.getCalendar());
        System.out.println("new cal: " + new GregorianCalendar());
        System.out.println("new date: " + simpleDateFormat.format(new Date()));
        System.out.println("sdf cal: " + simpleDateFormat.getCalendar());
    }
}
Kevin Peterson
Good feedback. In my case, mutable internal state is of no concern, since I'm using it as a private variable and never allowing any of its state to escape (it's used purely for formatting Unix timestamps). I'm also looking forward to Joda Time being integrated into Java 7, but for the code at hand, as simple as possible (fewer external JARs) is preferable. Thanks!
Quinn Taylor
+4  A: 

I'm not sure why Tom says "it's something to do with serialization", but he has the right line:

private void initializeDefaultCentury() {
    calendar.setTime( new Date() );
    calendar.add( Calendar.YEAR, -80 );
    parseAmbiguousDatesAsAfter(calendar.getTime());
}

It's line 813 in SimpleDateFormat.java, which is very late in the process. Up to that point, the year is correct (as is the rest of the date part), then it's decremented by 80.

Aha!

The call to parseAmbiguousDatesAsAfter() is the same private function that set2DigitYearStart() calls:

/* Define one-century window into which to disambiguate dates using
 * two-digit years.
 */
private void parseAmbiguousDatesAsAfter(Date startDate) {
    defaultCenturyStart = startDate;
    calendar.setTime(startDate);
    defaultCenturyStartYear = calendar.get(Calendar.YEAR);
}

/**
 * Sets the 100-year period 2-digit years will be interpreted as being in
 * to begin on the date the user specifies.
 *
 * @param startDate During parsing, two digit years will be placed in the range
 * <code>startDate</code> to <code>startDate + 100 years</code>.
 * @see #get2DigitYearStart
 * @since 1.2
 */
public void set2DigitYearStart(Date startDate) {
    parseAmbiguousDatesAsAfter(startDate);
}

Now I see what's going on. Peter, in his comment about "apples and oranges", was right! The year in SimpleDateFormat is the first year of the "default century", the range into which a two-digit year string (e.g., "1/12/14") is interpreted to be. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html#get2DigitYearStart%28%29 .

So in a triumph of "efficiency" over clarity, the year in the SimpleDateFormat is used to store "the start of the 100-year period into which two digit years are parsed", not the current year!
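To see the window in action, here is a small sketch that parses the same two-digit-year string with two different window starts set through set2DigitYearStart(). The class name and the parseYear helper are hypothetical and only there for illustration. Locale.US is an assumption made so the formatter uses a Gregorian calendar.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class TwoDigitYearWindow {
    /**
     * Parses text such as "1/12/14" and returns the four-digit year it resolves to
     * when the 100-year window starts on January 1 of windowStartYear.
     */
    static int parseYear(String text, int windowStartYear) {
        SimpleDateFormat sdf = new SimpleDateFormat("M/d/yy", Locale.US);
        Date windowStart = new GregorianCalendar(windowStartYear, Calendar.JANUARY, 1).getTime();
        sdf.set2DigitYearStart(windowStart);
        try {
            Calendar cal = new GregorianCalendar();
            cal.setTime(sdf.parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseYear("1/12/14", 1900)); // window 1900-1999, prints 1914
        System.out.println(parseYear("1/12/14", 1950)); // window 1950-2049, prints 2014
    }
}
```

With the default window (80 years before the formatter was created), the same string lands in whichever century keeps it within that range, which is why the stored year moves with the creation time.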

Thanks, this was fun -- and finally got me to install the jdk source (I only have 4GB total space on my / partition.)

tpdi
Tom was tipped off to serialization because the comment mentions readObject(), a method used for deserializing objects. However, the comment means that initializeDefaultCentury() was made a separate method so it could **also** be called from readObject() — the behavior is not strictly tied to serialization.
Quinn Taylor
Thanks for the in-depth answer. @Peter's "apples and oranges" claim was completely different, though. What actually happens is that SimpleDateFormat intentionally modifies the year of the calendar it stores internally — otherwise, its calendar would be identical to a just-created GregorianCalendar. (Also, the line position of a method in the source code is not "early" or "late" — what matters is the order in which they're called.) I've found that the SimpleDateFormat docs actually explain this, so I disagree with "efficiency over clarity", but I'm accepting this as the most complete answer.
Quinn Taylor
By late I meant "late" in the sequence of calls, not the line number; or more clearly: SimpleDateFormat uses a Locale object to find the calendar to construct, constructs (for most Locales) a GregorianCalendar, and then modifies it. What Peter was saying is that SimpleDateFormat has-a GregorianCalendar that it uses to store its internal state, and that GregorianCalendar, while get-able, isn't necessarily going to have the same state as one we new up independently.
tpdi
And yes, I agree with you on the serialization thing -- the code was refactored for serialization, but serialization isn't /why/ the year gets set to (now - 80 years).
tpdi