I'm using SimpleDateFormat with the pattern "EEE MM/dd hh:mma", passing in the date String "Thu 10/9 08:15PM" and it's throwing an Unparseable date exception. Why? I've used various patterns with SimpleDateFormat before so I'm fairly familiar with its usage. Maybe I'm missing something obvious from staring at it too long.

The other possibility is funky (technical term) whitespace. The context is a screen-scraping app, where I'm using HtmlCleaner to tidy up the messy html. While I've found HtmlCleaner to be pretty good overall, I've noticed strange issues with characters that look like whitespace but aren't recognized as such with a StringTokenizer, for example. I've mostly worked around it and haven't dug into the character encoding or anything like that but am starting to wonder.

+1  A: 

Try this instead for your pattern:

EEE MM/d hh:mma

The difference is the single d instead of double dd, since your date is for 10/9 instead of 10/09.

Joel Coehoorn
Thanks for the reply, but that didn't work either.
+1  A: 

To test if it's the date format, write a test class to prove it out. For these types of things, I like to use bsh (beanshell). Here was my test:

sdf = new java.text.SimpleDateFormat("EEE MM/dd hh:mma");
System.out.println(sdf.format(sdf.parse("Thu 10/9 08:15PM")));

This printed: Fri 10/09 08:15PM

So, at least with my JDK/JRE version (1.6), the format string seems to work just fine. I think the next step is to make sure the string you're dealing with is exactly what you think it is. Can you add logging to your code and dump the input string to a log file? Then you could look at it in a nice text editor, run it through your test class, or look at it in a hex editor to make sure that it's just normal text.
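As a rough sketch of that kind of check in plain Java, the snippet below prints the code point of every character in a string, so anything that only looks like a space stands out. The class name `DumpChars`, the helper `describe`, and the sample input are made up for illustration; they are not from the original code.

```java
final class DumpChars {
    /** Returns every char of the input as a U+XXXX code, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Cast to int so the char is formatted as a number, not as a character.
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical scraped input: the gap between date and time is NOT a plain space.
        String scraped = "Thu 10/9\u00A008:15PM";
        System.out.println(scraped);
        System.out.println(describe(scraped));
    }
}
```

A plain space shows up as U+0020 in the dump; any other code in a position where you expect a space is worth a closer look.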

Good luck!

Eric Tuttleman
Friday rather than Thursday? What year is this for? It's never specified, and that might be our problem.
Joel Coehoorn
I hadn't noticed that! Good catch. However, it's not why the exception is being thrown. I believe his note about odd characters in the string is probably closer to the issue. With logging, he should be able to figure out what additional filters to put on the input.
Eric Tuttleman
I'm using the hex viewer/editor of TextPad. 'Normal' spaces have a hex value of '20', while the spaces between the date and time in my datetime string have a hex value of 'A0'. What's the difference in these characters?
Supposing it is encoded as ISO-8859-1 (or ISO-8859-15), 0xA0 is a non-breaking space.
WMR
Sounds like you've found your issue. Can you run through your string - maybe as a char array - and replace characters you don't understand with spaces?
Eric Tuttleman
Yup, that's what I had already done, re: the char array. Fortunately, the A0 char returned true for Character.isSpaceChar(), so I replaced them with spaces. Strangely, they returned false for isWhitespace(). Still not sure how to get the hex value for a char in Java.
I believe it's in the Integer.toString methods. But I wouldn't convert the character to a hex string to compare it, unless that makes the code a lot clearer for you. I'd probably just find the decimal value of A0 (160) and match against that. You could also reverse the logic to accept chars instead.
Eric Tuttleman
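For the hex question in the comments above, here is a small sketch of one way to do it, assuming `Integer.toHexString` is acceptable (it is the hex counterpart of the `Integer.toString` methods). The class name `NbspCheck` and the helper `hex` are illustrative only. The output comments match the behavior described in the comments: 0xA0 counts as a space character but not as Java whitespace, because `Character.isWhitespace()` deliberately excludes non-breaking spaces.

```java
final class NbspCheck {
    /** The non-breaking space, hex A0, decimal 160. */
    static final char NBSP = '\u00A0';

    /** Returns the lowercase hex value of a char, for example "a0". */
    static String hex(char c) {
        return Integer.toHexString(c);
    }

    public static void main(String[] args) {
        System.out.println(hex(NBSP));                     // a0
        System.out.println((int) NBSP);                    // 160
        System.out.println(Character.isSpaceChar(NBSP));   // true
        System.out.println(Character.isWhitespace(NBSP));  // false
        System.out.println(hex(' '));                      // 20
    }
}
```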
A: 

First question here on Stack Overflow, so I'm not sure what the proper way to mark this resolved is. Most of the answers are in the comments on Eric's answer.

The root cause was a 'space' character in the date string that was not recognized as such. It was a hex char of 'A0', which is a non-breaking space. I ended up converting the date string to a char array, checking the characters with Character.isSpaceChar(), and replacing those that returned true with a " " char.
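The fix described above might look roughly like the following. This is a reconstruction, not the original code: the class name `ScrapedDateParser`, the method names, and the sample input are hypothetical, and `Locale.ENGLISH` is an added assumption so that "Thu" and "PM" parse regardless of the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class ScrapedDateParser {
    /** Replaces every Unicode space character (including U+00A0) with a plain ' '. */
    static String normalizeSpaces(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (Character.isSpaceChar(chars[i])) {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }

    /** Normalizes the spaces, then parses with the pattern from the question. */
    static Date parse(String raw) throws ParseException {
        // Assumption: English day names and AM/PM markers in the scraped text.
        SimpleDateFormat sdf = new SimpleDateFormat("EEE MM/dd hh:mma", Locale.ENGLISH);
        return sdf.parse(normalizeSpaces(raw));
    }

    public static void main(String[] args) throws ParseException {
        // Hypothetical scraped value with a non-breaking space before the time.
        String scraped = "Thu 10/9\u00A008:15PM";
        System.out.println(parse(scraped));
    }
}
```

Since the pattern has no year, the parsed Date falls in 1970, which is why the test in Eric's answer printed Fri rather than Thu.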