ansaurus

Question

Regex, how to remove all non-alphanumeric except colon in a 12/24 hour timestamp?

Answer 1

+1 A:

I assume you'd like to keep spaces as well. This implementation is in Python, but the pattern uses only a basic negated character class, so it should be portable to PCRE and most other regex flavors.

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
re.sub(r'[^a-zA-Z0-9: ]', '', x)

Output: 'Today 3:30pm  Group Meeting to discuss big idea'

For a slightly cleaner answer (no double spaces):

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
re.sub(r'[ ]+', ' ', tmp)

Output: 'Today 3:30pm Group Meeting to discuss big idea'

Bryan McLemore 2009-11-17 02:14:09

What about "Today, 3:30pm - Group meeting: discuss big idea" - the colon after "meeting" won't be removed.

Greg Hewgill 2009-11-17 02:19:42

@Cadwag, this solution removes colons even when they are outside of timestamps. Surely you don't want this?

J-P 2009-11-17 02:26:59

Yea, in my excitement I seem to have acted prematurely. But it seems to act as Greg Hewgill says - leaving colons that are outside of timestamps

cadwag 2009-11-17 02:30:37

Not sure about Python, but my C# solution below might solve this problem with negative lookahead / lookbehind. Also check Rubens Farias's solution, which should work with Python too.

Abel 2009-11-17 02:35:22

Answer 2

+4 A:

Abel 2009-11-17 02:16:29

@Cadwag, you said you got an error saying that negative lookahead/lookbehind must be fixed width only. That's a restriction on lookbehind in many regex flavors (not .NET though). I'll update my answer with that in mind.

Abel 2009-11-17 02:43:39
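
For anyone reading along, here is a minimal sketch of the fixed-width restriction as it shows up in Python's `re` module. The patterns are illustrative only and are not Abel's actual answer; `variable_width_ok` and `fixed` are names made up for this example.

```python
import re

# Variable-width lookbehind: "[012]?\d" can match one or two characters,
# which Python's re module refuses to compile.
try:
    re.compile(r'(?<![012]?\d):')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)

# Fixed-width lookbehind: exactly one character, so it compiles.
fixed = re.compile(r'(?<!\d):')

# Removes only colons that are not preceded by a digit.
print(fixed.sub('', 'meeting: at 3:30'))  # -> meeting at 3:30
```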

Thanks a lot for your help. I am trying your solution in plain Python on Unix, using the Python example given by 'Bryan McLemore'. Using your solution, the 3rd line looks like `re.sub(r'(<![012]?\d):(>!\d\d(?:[ap]m)?)|[^A-Za-z\d: ]', '', x)`. But it doesn't seem to do anything when I run it. I'm sorry for all the hassle. I'm just starting out with Python and have never been very good with regex. Thanks again.

cadwag 2009-11-17 02:45:59

Nice work Abel, just testing it though, it doesn't seem to match the colon in "3:3".

J-P 2009-11-17 02:48:08

@J-P: is that testing in Python or in .NET? And did you use my original design, because then: indeed, it would consider 3:3 as a time.

Abel 2009-11-17 02:51:00

@Cadwag: if you've trouble with regexes, check this list of online regex testers: http://www.undermyhat.org/blog/2009/09/overview-of-online-regular-expression-testers/. PCRE is what I believe Python uses internally.

Abel 2009-11-17 02:54:59

Hmm, I thought PCRE lookbehinds were constructed like `(?<!` ... not `(<!`

J-P 2009-11-17 03:02:24

Thanks, J-P, that was exactly my mistake! (and a few others, updating now to correct them)

Abel 2009-11-17 03:16:11
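
Since the thread settles on the `(?<!` ... `)` / `(?!` ... `)` syntax, here is a rough Python sketch of how a time-aware pattern could look with fixed-width lookarounds only. It is not Abel's updated answer (which is not reproduced above); the pattern, `TIME_SAFE` and `strip_punctuation` are assumptions made for illustration. A colon survives only when it has a digit before it and two digits after it.

```python
import re

# Three alternatives, all of which get deleted:
#   (?<!\d):         a colon NOT preceded by a digit
#   :(?!\d\d)        a colon NOT followed by two digits
#   [^A-Za-z0-9: ]   anything that is not alphanumeric, a colon or a space
TIME_SAFE = re.compile(r'(?<!\d):|:(?!\d\d)|[^A-Za-z0-9: ]')


def strip_punctuation(text):
    """Drop punctuation but keep colons that look like part of a time."""
    cleaned = TIME_SAFE.sub('', text)
    # Collapse the runs of spaces left behind by removed characters.
    return re.sub(r' +', ' ', cleaned)


print(strip_punctuation('Today, 3:30pm - Group meeting: discuss "big idea"'))
# -> Today 3:30pm Group meeting discuss big idea
```

With this sketch "3:3" loses its colon while "16:47" keeps it. It only checks the digits around the colon, so it does not validate that the hour or minute is in range.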

Abel, hats off to you. Really went above and beyond the call of duty there. Your latest solution seems to work perfectly! Seriously, thanks so much for all your help.

cadwag 2009-11-17 03:33:09

You're welcome, glad to be of help. Make sure to check the explanation, it may help ;-) This visualizer proves very helpful when dealing with this kind of stuff: http://regex.powertoy.org/ (turn it into Perl mode)

Abel 2009-11-17 03:39:14

Answer 3

+1 A:

You can try, in Javascript:

var re = /(\W+(?!\d{2}[ap]m))/gi;
var input = 'Today, 3:30pm - Group Meeting to discuss "big idea"';
alert(input.replace(re, " "))

Rubens Farias 2009-11-17 02:16:44

Interesting how many solutions are given. You replace any non-word character with a space, that means `discuss "big idea"` becomes `discuss big idea ` (i.e., extra spaces). Use something like `/(( )|\W)(?!\d{2})/g;` and `.replace(re, "$2")` (or was it `\1` in JS?). This will leave the spaces and remove the rest. I call this "conditional replacement".

Abel 2009-11-17 02:23:48
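
A possible Python rendering of the "conditional replacement" idea from the comment above, as a sketch only: the regex is the one suggested in the comment, while the function name `conditional_strip` and the use of a replacement callback are assumptions for illustration. The callback returns group 2 (the space) when it matched and an empty string otherwise.

```python
import re

# Group 2 captures a literal space; any other non-word character matches
# group 1 only. The lookahead protects a character followed by two digits,
# which is what keeps the colon in "3:30".
PATTERN = re.compile(r'(( )|\W)(?!\d{2})')


def conditional_strip(text):
    """Keep spaces, delete other non-word characters (except before 2 digits)."""
    return PATTERN.sub(lambda m: m.group(2) or '', text)


print(conditional_strip('Today, 3:30pm - Group Meeting to discuss "big idea"'))
# -> Today 3:30pm  Group Meeting to discuss big idea   (note the double space)
```

Like the JavaScript original, this looks only at what follows the character, so any punctuation directly before two digits is kept, not just colons inside a time.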

hmm, yet another Markdown in comments bug: the extra space in `discuss big` got lost...

Abel 2009-11-17 02:25:13

interesting approach, Abel, ty

Rubens Farias 2009-11-17 02:28:32

Answer 4

+1 A:

Python.

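import string
import time

punct = string.punctuation
s = 'Today, 3:30pm - Group Meeting:am to discuss "big idea" by our madam'
for item in s.split():
    try:
        # Keep the word untouched if it parses as a time such as "3:30pm".
        time.strptime(item, "%H:%M%p")
    except ValueError:
        # Not a time: strip every punctuation character from the word.
        item = ''.join(i for i in item if i not in punct)
    print(item, end=' ')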

output

$ ./python.py
Today 3:30pm  Group Meetingam to discuss big idea by our madam

# change to s='Today, 15:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good'

$ ./python.py
Today 15:30pm  Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 1647 is also good

NB: The method should be improved to check for a valid time only when necessary (by imposing conditions), but I will leave it as it is for now.

ghostdog74 2009-11-17 02:56:19

Nice approach, but you need a few more tweaks to handle the 24-hour time stamp requirement ("15:30" instead of "3:30pm")

Ned Deily 2009-11-17 03:32:10

What do you mean? It doesn't matter, right? %H covers 00 to 23.

ghostdog74 2009-11-17 03:38:27

`16:47` becomes `1647` in your example, I think that's what Ned means.

Abel 2009-11-17 03:44:24
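
One way the split-and-parse approach might be extended so that both "3:30pm" and "16:47" survive, sketched under a few assumptions: the two format strings, the names `TIME_FORMATS`, `is_timestamp` and `clean`, and the choice to trim surrounding punctuation before parsing are all illustrative and not part of the original answer. `%p` is locale-dependent, so this assumes a locale where "am"/"pm" are recognised.

```python
import string
import time

# Formats tried in order: 12-hour with am/pm, then plain 24-hour.
TIME_FORMATS = ("%I:%M%p", "%H:%M")


def is_timestamp(token):
    """Return True if the token parses with any of the known time formats."""
    for fmt in TIME_FORMATS:
        try:
            time.strptime(token, fmt)
            return True
        except ValueError:
            continue
    return False


def clean(text):
    """Strip punctuation from every word except recognised timestamps."""
    words = []
    for token in text.split():
        # Trim leading/trailing punctuation first so "16:47," still parses.
        core = token.strip(string.punctuation)
        if not is_timestamp(core):
            core = "".join(ch for ch in core if ch not in string.punctuation)
        if core:  # skip tokens that were pure punctuation, such as "-"
            words.append(core)
    return " ".join(words)


print(clean('Today, 3:30pm - Group Meeting at 16:47, ok'))
# -> Today 3:30pm Group Meeting at 16:47 ok
```

As noted in the comments, a time glued to other text ("This12:40") is still treated as an ordinary word by this approach and loses its colon.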

Btw, though it isn't specified in the q., my solution allows time in text, yours splits on word boundaries prior to that: "This12:40 is late" is silly of course, not sure how the OP would want to deal with that (my solution leaves the colon, yours will delete it).

Abel 2009-11-17 03:47:40

@abel, I see. Anyway, there is much to take care of, since we are only working on limited data in this case. I will just leave it as it is.

ghostdog74 2009-11-17 03:53:23

+1 for the alternative non-regex approach anyway! Nice example.

Abel 2009-11-17 04:00:25

Answer 5

A:

import re
s = "Call me, my dear, at 3:30"

re.sub(r'[^\w :]', '', s)

'Call me my dear at 3:30'

Jyotirmoy Bhattacharya 2009-11-17 04:39:23
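
A small caveat on the `\w` shorthand, shown as a sketch with a made-up sample string: `\w` also matches the underscore (and, for Python 3 `str` patterns, non-ASCII letters and digits), so it is slightly broader than "alphanumeric". An explicit character class avoids that.

```python
import re

s = 'Call me_now, my dear, at 3:30'

# \w treats "_" as a word character, so the underscore survives.
with_w = re.sub(r'[^\w :]', '', s)

# An explicit ASCII class removes it.
explicit = re.sub(r'[^a-zA-Z0-9 :]', '', s)

print(with_w)    # -> Call me_now my dear at 3:30
print(explicit)  # -> Call menow my dear at 3:30
```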
