This has probably been answered dozens of times, but I couldn't find the answer.

Anyway, working in a Smalltalk environment, I have a string composed of three parts like this: "ttm 4/6/97 00:08". The format varies a lot: some parts may be missing (so the string would have fewer than three parts, like "ttm 4/6/97") and the order might be different ("00:08 ttm 4/6/97"). The number of digits in the date also varies ("04/6/1997", "4/06/97", etc.). Luckily, spaces are always used as separators, and there are no leading or trailing spaces.

What I need is to find the date in the string and match two portions like so: (4/6/)(97). So far I've only had success for certain combinations, with something like this:

^\S*\s?(\d\d?/\d\d?/)(\d\d\d?\d?)\s?\S*

Could somebody please point me in the right direction?

Edit: To make the problem simpler I tried to get a match for the entire date using the following expression: ^.*(\d\d?/\d\d?/\d\d\d?\d?).* (the matcher expects that the expression match the entire string). Applied to the string "10/4/97 00:08 di" the match returned is "0/4/97" (maybe the matcher isn't the greatest ever built...). Could the regex string somehow be modified to ensure that the matcher returns all of the date? (Thanks to SilentGhost for the editing).

+2  A: 

It seems you're trying to match too many elements. If the date is the only thing you're interested in, why not go with the following:

(\d{1,2}/\d{1,2}/)(\d{2,4})

Note that it would, in theory, match a 3-digit year too. I assume you won't have such input in practice.
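As a rough illustration of searching for the date alone instead of matching the whole string, here is a minimal sketch. It uses Python's `re` module as a stand-in, which is an assumption: the asker's Smalltalk library is not shown and may behave differently. The pattern avoids brace quantifiers, and the function name `find_date_parts` is hypothetical.

```python
import re

# Same shape as the asker's date pattern: no brace quantifiers needed.
DATE_PARTS = re.compile(r"(\d\d?/\d\d?/)(\d\d\d?\d?)")

def find_date_parts(text):
    """Return ("day/month/", "year") for the first date found, or None."""
    # search() scans for the pattern anywhere in the string,
    # so the surrounding parts never have to be described.
    match = DATE_PARTS.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this approach the order of the parts and any missing parts no longer matter, because only the date itself is described by the pattern.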

SilentGhost
Thanks mate. Unfortunately the regex library doesn't provide a mechanism like the one with the curly brackets. I've thought about the problem a bit more and I think I might be able to first extract the entire date and then just split that string at the delimiters. I'll give that a try.
theseion
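The extract-then-split idea can be sketched without any regex at all, relying on the stated guarantee that spaces are the only separators. This is a minimal Python sketch, not the asker's actual Smalltalk code; the function name `split_date` is hypothetical.

```python
def split_date(text):
    """Find the space-separated token that looks like a date and split it.

    Returns ("day/month/", "year"), or None if no token qualifies.
    """
    for token in text.split(" "):
        parts = token.split("/")
        # A date token has exactly three all-digit fields.
        if len(parts) == 3 and all(part.isdigit() for part in parts):
            return parts[0] + "/" + parts[1] + "/", parts[2]
    return None
```

This sidesteps the whole-string matching requirement entirely, at the cost of not checking how many digits each field has.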
@theseion: you don't have to use brace quantifiers. You're fine with the way you were doing it. The point of my answer was that you don't need to try to match anything else but the date. `(\d+/\d+/)(\d+)` will also do just fine.
SilentGhost
Ah, ok. No expert on regex :) Anyway, with that expression I still have the same problem described in the edit: the matcher needs the regex string to match the entire string, so I used `^.*(\d+/\d+/\d+).*`; the `1` is not part of the resulting match (otherwise your expression does of course match exactly the same as mine). This is driving me nuts...
theseion
@theseion: using `^(\d+/\d+/)(\d+).*` would match `"10/4/97 00:08 di"`, returning `"10/4/"` and `"97"`. But you don't have to match the whole string, that's the point.
SilentGhost
@SilentGhost: The method I used expects that the expression match all of the string. There seems to be a bug in the matcher, as the only match returned for the first pair of brackets is `"0/4/97"`. I was able to use your regex string using another, similar method to extract the full date string though (sadly without the extra matches. There's something seriously wrong with the library...) and that lets me continue with my work. Thanks for bearing with me!
theseion
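For what it's worth, the `"0/4/97"` result is what a backtracking matcher would be expected to return for a greedy leading `.*`: it first consumes the whole string, then gives characters back only until the group can match, and `"0/4/97"` is the rightmost position where it can. The sketch below reproduces this with Python's `re` module as a stand-in (an assumption; the Smalltalk library may differ). It also shows one possible whole-string pattern that requires the date to sit at the start of the string or right after a space. The function names are hypothetical.

```python
import re

# The pattern from the question's edit: a greedy ".*" in front of the group.
GREEDY = re.compile(r"^.*(\d\d?/\d\d?/\d\d\d?\d?).*")

# Whole-string variant: the optional prefix must end in whitespace and the
# optional suffix must start with whitespace, so the date cannot be cut short.
ANCHORED = re.compile(r"^(.*\s)?(\d\d?/\d\d?/)(\d\d\d?\d?)(\s.*)?$")

def greedy_date(text):
    """Date as captured by the greedy pattern (may lose leading digits)."""
    match = GREEDY.match(text)
    return match.group(1) if match else None

def anchored_date_parts(text):
    """("day/month/", "year") from the whole-string pattern, or None."""
    match = ANCHORED.match(text)
    # Groups 1 and 4 are the prefix and suffix; 2 and 3 hold the date parts.
    return (match.group(2), match.group(3)) if match else None
```

Here `greedy_date("10/4/97 00:08 di")` gives `"0/4/97"`, while `anchored_date_parts` on the same input gives `("10/4/", "97")`. The second pattern uses only plain groups, `?`, `\s` and `\d`, in the hope that a library without brace quantifiers still accepts it.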