I get travel confirmations that look like this: "SQ 966 E 27JUL SINCGK" = "Airline Space Flight Space BookingClass Space Date_with_Month_as_name Space 3LetterFrom 3LetterTo". I can chop all this into pieces using a regex and submit the pieces to a website. But instead of 27JUL the site expects 27/07/2009, or at least 27/07. Is there a way to transform a regex result based on a piece of the input? Jan -> 01, Feb -> 02 ... Dec -> 12.

(Regex flavour is Java)

+5  A: 

DateFormat is a more appropriate class:

DateFormat output = new SimpleDateFormat("dd/MM", Locale.US);
DateFormat input = new SimpleDateFormat("dd MMM", Locale.US);
System.out.println(output.format(input.parse("24 Dec")));

output:

24/12
dfa
I'm aware of Java's DateFormat. Unfortunately I can't change the Java parts of the system; all I can manipulate is the regex that extracts the data.
stwissel
As you can see, I've used DateFormat to do the parsing (in place of a regex).
dfa
A: 

It isn't a regex solution, but you could use SimpleDateFormat to help you with your final formatting. Note that, according to the Javadoc, SimpleDateFormat is not thread-safe out of the box.

Alternatively, you could use DateFormatSymbols.getShortMonths() and iterate over the months to identify the index* and format your string manually.

*Don't forget to add 1, since the array is zero-based ;)
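
A minimal sketch of that lookup, assuming the date token always has the shape "27JUL" (two digits followed by a three-letter English month). The class and method names here are made up for illustration; only DateFormatSymbols and its getShortMonths() come from the JDK.

```java
import java.text.DateFormatSymbols;
import java.util.Locale;

class MonthLookup {
    // Returns the 1-based month number for an English abbreviation, or -1 if unknown.
    static int monthNumber(String abbrev) {
        String[] shortMonths = new DateFormatSymbols(Locale.US).getShortMonths();
        for (int i = 0; i < shortMonths.length; i++) {
            // The array may end with an empty entry (13th month of some calendars); skip it.
            if (!shortMonths[i].isEmpty() && shortMonths[i].equalsIgnoreCase(abbrev)) {
                return i + 1; // array index is zero-based
            }
        }
        return -1;
    }

    // Assumed input shape: two-digit day followed by the month abbreviation, e.g. "27JUL".
    static String toDayMonth(String token) {
        String day = token.substring(0, 2);
        int month = monthNumber(token.substring(2));
        if (month < 0) {
            throw new IllegalArgumentException("Unknown month in: " + token);
        }
        return day + "/" + String.format(Locale.ROOT, "%02d", month);
    }

    public static void main(String[] args) {
        System.out.println(toDayMonth("27JUL"));
    }
}
```

The comparison ignores case because the confirmations use "JUL" while the JDK's US symbols are spelled "Jul".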

edit:

I am not sure that what you are looking for is possible in Java regex without the ability to make code changes. The conditional constructs that Perl supports are not supported by Java because Java provides if-then-else support as a language feature.

akf
Sorry, I can't touch the Java parts of that system; only the regex can be configured.
stwissel
Pity. Unless this is an application with sealed jars, you should be able to put a jar in front of the classpath, with a redefined version of the troublesome class. Just to give you one more weapon against the bad data :)
Thorbjørn Ravn Andersen
A: 

I would be very careful with doing this with regular expressions as they don't tell you how the conversion went.

Extract every bit of information manually. Sanity check everything, and then use the SimpleDateFormat parser to get a Date object you can use from there on.
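
A rough sketch of what that could look like, for anyone who can run code. The pattern is only a guess at the format, based on the single sample line in the question, and the year is passed in as a parameter because the confirmation itself does not contain one. ConfirmationParser and travelDate are illustrative names, not part of any existing system.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfirmationParser {
    // Groups: airline, flight, booking class, day+month, origin, destination.
    // Assumed layout, derived from the one sample "SQ 966 E 27JUL SINCGK".
    private static final Pattern LINE = Pattern.compile(
        "^([A-Z0-9]{2}) (\\d{1,4}) ([A-Z]) (\\d{2}[A-Z]{3}) ([A-Z]{3})([A-Z]{3})$");

    // Returns the travel date as dd/MM/yyyy; the year is supplied by the caller.
    static String travelDate(String line, int year) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised confirmation: " + line);
        }
        // SimpleDateFormat is not thread-safe, so create a fresh instance per call.
        SimpleDateFormat in = new SimpleDateFormat("ddMMMyyyy", Locale.US);
        in.setLenient(false); // reject impossible dates such as 31FEB
        try {
            Date date = in.parse(m.group(4) + year);
            return new SimpleDateFormat("dd/MM/yyyy", Locale.US).format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date in: " + line, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(travelDate("SQ 966 E 27JUL SINCGK", 2009));
    }
}
```

Unlike a bare regex, this fails loudly when the line does not match or the date is impossible, which is the point of the sanity check.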

Thorbjørn Ravn Andersen
Yep. Regexes are fun. They could probably solve world hunger if enough people understood them. My challenge: the regex is a configuration setting of a system I otherwise can't touch.
stwissel
Regexps cannot keep recursion state. Just try doing HTML validation with one. But to answer your question properly you must be very specific about WHAT you have available, and how this can be fixed. Notably, the regexp engine implementation is important: the one in the Java runtime library is a lot less capable than a full-blown Perl regexp engine.
Thorbjørn Ravn Andersen
@Thorbjørn: It is a product written in Java (and I don't have access to the source code). What I can do is configure a regex and map the resulting groups to fields in the target systems (e.g. a web form or a database table). So everything I do needs to "happen" in the regex string... and thanks for spending time on this.
stwissel
Can you change the URL of the remote system? Then you could point it at your own implementation and get full control.
Thorbjørn Ravn Andersen
+1  A: 

In Perl syntax (s{pattern}{replacement}):

s{([0-9][0-9])JAN}{$1/01}
s{([0-9][0-9])FEB}{$1/02}
s{([0-9][0-9])MAR}{$1/03}
s{([0-9][0-9])APR}{$1/04}
s{([0-9][0-9])MAY}{$1/05}
s{([0-9][0-9])JUN}{$1/06}
s{([0-9][0-9])JUL}{$1/07}
s{([0-9][0-9])AUG}{$1/08}
s{([0-9][0-9])SEP}{$1/09}
s{([0-9][0-9])OCT}{$1/10}
s{([0-9][0-9])NOV}{$1/11}
s{([0-9][0-9])DEC}{$1/12}

(Yes, this is long and ugly, but it would probably work.)
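
Since the question asks about the Java flavour: the same twelve substitutions could be sketched in Java with String.replaceAll, one pass per month. This is only an illustration under the assumption that a search-and-replace step can be run somewhere; a configuration that only matches and captures groups cannot produce text that is not in the input. MonthRewrite and rewrite are made-up names.

```java
class MonthRewrite {
    private static final String[] MONTHS = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    };

    // Applies twelve search-and-replace passes, one per month name.
    static String rewrite(String input) {
        String result = input;
        for (int i = 0; i < MONTHS.length; i++) {
            // Zero-padded month number: 01 .. 12
            String number = (i + 1 < 10 ? "0" : "") + (i + 1);
            // $1 refers to the captured two-digit day.
            result = result.replaceAll(
                "\\b([0-3][0-9])" + MONTHS[i] + "\\b", "$1/" + number);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("SQ 966 E 27JUL SINCGK"));
    }
}
```

The word boundaries keep the pattern from firing inside longer tokens, and the text after the date is left untouched.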

user9876
Wouldn't it rather be s{([0-3][0-9])Monthname}{.... ?
stwissel
The s{ }{ } is from Perl? (I don't know Perl.) So there wouldn't be a "native" masking/replacing in regex?
stwissel
Yes, this is 12 separate search-and-replace operations. Each one does one month.
user9876