I have this sample string:

Sample string 1:
A^1.1#B^1#I^2#f^0#p^1#d^2010-07-21T08:52:05.222ZKHBDGSLKHFBDSLKFGNIF#%$%^$#^$XLGCREWIGMEWCERG

Sample string 2:
A^1.1#B^1#f^0#p^1#d^2010-07-22T07:02:05.370ZREGHCOIMIYR$#^$#^$#^EWMGCOINNNNNNVVVRFGGYVJ667VTG

So, from these strings, I need to take out the time stamp:

2010-07-21T08:52:05.222 or
2010-07-22T07:02:05.370

Basically, I need the value between d^ and Z.

What is the best ("smartest") way to do this? substring(), regex?

+1  A: 
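import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Strict pattern: matches the exact timestamp shape.
Pattern p = Pattern.compile("(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3})");
// You could also use "d\\^(.*?)Z" as your regex pattern (reluctant, so it stops at the first Z).
Matcher m = p.matcher("your string here");

if (m.find()) {
    System.out.println(m.group(1)); // print out the timestamp
}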

Taken from here

Also, make sure to reuse the Pattern p object if you're looping through a series of strings, so the regex is compiled only once.
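A minimal sketch of reusing one compiled Pattern across several inputs. The class name TimestampExtractor and the method extract are hypothetical names chosen for illustration, and the combined pattern (strict timestamp shape anchored between d^ and Z) is one possible choice rather than the only one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampExtractor {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern TIMESTAMP = Pattern.compile(
            "d\\^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3})Z");

    // Returns the timestamp found between "d^" and "Z", or null if there is none.
    static String extract(String input) {
        Matcher m = TIMESTAMP.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "A^1.1#B^1#I^2#f^0#p^1#d^2010-07-21T08:52:05.222ZKHBDGSLKHFBDSLKFGNIF#%$%^$#^$XLGCREWIGMEWCERG",
            "A^1.1#B^1#f^0#p^1#d^2010-07-22T07:02:05.370ZREGHCOIMIYR$#^$#^$#^EWMGCOINNNNNNVVVRFGGYVJ667VTG"
        };
        // Only a new Matcher is created per string; the Pattern is not recompiled.
        for (String s : inputs) {
            System.out.println(extract(s));
        }
    }
}
```

Returning null for "no match" keeps the sketch short; an Optional<String> would work just as well if you prefer to avoid nulls.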

sigint
Since it's always between the "d^" and "Z" wouldn't it be easier just to do d\^(.*?)Z and use the capture group?
"d\\^(.*?)Z" also works
zengr
@fy-tide Ah, you are correct. Edited to show both strict and simple regex patterns.
sigint
+1  A: 

With two small assumptions you can do it without a regex.

  1. The d^ right before the date string is the first one that appears in the text. I assume that delimiter always means "A date follows."
  2. That date format looks pretty regular, so I'm assuming the length won't change.

Just get the index of the d^ delimiter to find where the date starts, then add the fixed length to get the ending index.

public static void main(String[] args) {
    String s1 = "A^1.1#B^1#I^2#f^0#p^1#d^2010-07-21T08:52:05.222ZKHBDGSLKHFBDSLKFGNIF#%$%^$#^$XLGCREWIGMEWCERG";
    String s2 = "A^1.1#B^1#f^0#p^1#d^2010-07-22T07:02:05.370ZREGHCOIMIYR$#^$#^$#^EWMGCOINNNNNNVVVRFGGYVJ667VTG";

    System.out.println( parseDate(s1) );
    System.out.println( parseDate(s2) );
}

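public static String parseDate(String s) {
    // Assumes "d^" is present; indexOf returns -1 otherwise, which would give a wrong start.
    int start = s.indexOf("d^") + 2;
    // "yyyy-MM-ddTHH:mm:ss.SSS" is always 23 characters long.
    int length = 23;

    return s.substring(start, start + length);
}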

Output:

2010-07-21T08:52:05.222
2010-07-22T07:02:05.370
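If the fixed length can't be verified, a variant of the same idea could search for the terminating Z instead of counting characters. This is only a sketch: the class name DelimitedDateParser is hypothetical, and it assumes the date itself never contains a Z and that a missing delimiter should yield null.

```java
class DelimitedDateParser {
    // Returns the text between the first "d^" and the next 'Z' after it,
    // or null if either delimiter is missing.
    static String parseDate(String s) {
        int marker = s.indexOf("d^");
        if (marker < 0) {
            return null; // no "d^" delimiter in the input
        }
        int start = marker + 2;          // skip past "d^"
        int end = s.indexOf('Z', start); // first 'Z' at or after the date start
        if (end < 0) {
            return null; // no terminating 'Z'
        }
        return s.substring(start, end);
    }
}
```

This drops the length assumption but still relies on the first assumption, that the first d^ in the text is the one that introduces the date.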

Bill the Lizard
OK, so as a matter of good programming practice, isn't assuming the length of a string bad?
zengr
@zengr: I have to assume it because I only have 2 samples to look at. You shouldn't assume it and you shouldn't have to. You should be able to find out if that date format is fixed length. The regular expression you accepted is assuming a known format too.
Bill the Lizard
@zengr: I will add though, if you can't verify that those two assumptions will be true for any input, you should definitely stick with the regex solution.
Bill the Lizard
A: 

I would go with a regex, something like (\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}).

You might want to get fancier and reject months outside the range 01-12, days outside 01-31, and likewise out-of-range hours, but it should be good enough as is, given the sample data that you've provided.
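One way to get the range checks without a more elaborate regex is to let the regex find the shape and then hand the captured text to a date parser. The sketch below assumes Java 8 or later (java.time did not exist when this was asked); the class name ValidatedTimestamp and the choice to anchor the pattern between d^ and Z are illustrative assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValidatedTimestamp {
    // The regex only checks the shape; range checks are left to LocalDateTime.parse.
    private static final Pattern SHAPE = Pattern.compile(
            "d\\^(\\d{4}-\\d\\d-\\d\\dT\\d\\d:\\d\\d:\\d\\d\\.\\d{3})Z");

    // Returns the parsed timestamp, or empty if it is missing or out of range.
    static Optional<LocalDateTime> parse(String input) {
        Matcher m = SHAPE.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // Uses the ISO local date-time format, which rejects values
            // such as month 13 or hour 25.
            return Optional.of(LocalDateTime.parse(m.group(1)));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

For example, ValidatedTimestamp.parse("x#d^2010-13-21T08:52:05.222Zrest") comes back empty because month 13 is rejected, even though the text matches the regex.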

Substrings might work if the date is always prefixed with d^, but I still think the regex is cleaner.

crowne