ansaurus

Question

Extracting email addresses in an html block in ruby/rails

Answer 1

A:

Would this work?

/\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

The (?<!mailto:) is a negative lookbehind: it rejects any match that is immediately preceded by mailto:.

I don't have Ruby set up at work, unfortunately, but it worked with PHP when I tested it...
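For reference, here is a minimal sketch of how the lookbehind version might be used from Ruby. It assumes Ruby 1.9 or later, whose regex engine accepts (?<!...); the constant name and the sample HTML are made up for illustration.

```ruby
# Assumption: Ruby 1.9+ (older 1.8 interpreters reject the lookbehind syntax).
# EMAIL_NOT_AFTER_MAILTO and the sample HTML are hypothetical names/data.
EMAIL_NOT_AFTER_MAILTO = /\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

html = '<a href="mailto:bob@example.com">mail me</a> or write to alice@example.com'

# With no capture groups, String#scan returns the matched strings themselves.
found = html.scan(EMAIL_NOT_AFTER_MAILTO)
puts found.inspect
```

One caveat with this pattern: if the local part after mailto: contains a dot (say mailto:john.doe@example.com), the regex can still match the tail starting at the dot, so the capture-group approach may be the more robust choice.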

John Yeates 2010-05-06 15:21:45

I tried it using Rubular, but it says "Undefined (?...) sequence". I think the < is the culprit. What does it stand for again?

corroded 2010-05-06 15:25:44

Hmm, looks like Ruby doesn't support lookbehind, according to http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ. That's annoying. The ?<! means that the string you're matching (the email address) mustn't be preceded by the lookbehind string (mailto:) in order for the match to succeed. In this case you'd probably be best off with serg555's suggestion.

John Yeates 2010-05-07 08:04:07

I'd upvote this since it is also helpful, but I don't have the right privileges. Anyway, thanks for the help!

corroded 2010-05-11 08:05:43

Answer 2

A:

Another option if lookbehind doesn't work:

/\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

This would match all emails. You can then check whether the first captured group is "mailto:" and, if it is, skip that match.
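A rough sketch of that check in Ruby, assuming the regex above is used with String#scan (the constant and variable names here are hypothetical):

```ruby
# Hypothetical constant name; the pattern is the one given in this answer.
EMAIL_WITH_OPTIONAL_MAILTO = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

text = 'mailto:bob@example.com and alice@example.com'

# With capture groups, scan returns one [group1, group2] pair per match.
# group1 is nil when the optional "mailto:" prefix was absent.
pairs = text.scan(EMAIL_WITH_OPTIONAL_MAILTO)

# Keep only the addresses that were not preceded by "mailto:".
plain = pairs.reject { |prefix, _email| prefix }.map { |_prefix, email| email }
puts plain.inspect
```

Because the prefix group is optional, testing it for nil is enough to tell a hyperlink address from a plain one.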

serg 2010-05-06 15:29:16

It works using Rubular, but just another question: how do I check if the first captured group is mailto? Do I pass it to the function again? Here is my current code for the obfuscator: (see above)

corroded 2010-05-06 15:44:24

Sorry, I am not familiar with Ruby. Usually when you do a regexp search it returns an array of matches, each split into its captured groups.

serg 2010-05-06 16:31:52

I researched that too, but then again, you will have to know which group to pick. What I'm aiming at here is to 'replace on the fly', where something like this could happen:
1. Start parsing the block of text.
2. Oh, I see an email address, let me invert that.
3. Oh, I see another email address, but this one has a mailto: before it, so it must be a hyperlink. Move on.
4. I see an email again, this time with no mailto:, so invert it again.
5. Back to step 2, and so on.
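The steps above could be sketched with String#gsub and a block, which lets each match decide its own replacement. This is only an illustration: obfuscate_emails is a hypothetical helper, and "invert" is assumed to mean reversing the address string.

```ruby
# Hypothetical helper; pattern taken from serg's answer above.
EMAIL_OR_MAILTO = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

def obfuscate_emails(text)
  text.gsub(EMAIL_OR_MAILTO) do
    match  = Regexp.last_match
    prefix = match[1] # "mailto:" or nil
    email  = match[2]
    # Leave hyperlink targets untouched; reverse plain addresses (assumed meaning of "invert").
    prefix ? match[0] : email.reverse
  end
end

sample = '<a href="mailto:bob@example.com">x</a> alice@example.com'
puts obfuscate_emails(sample)
```

Since the block runs once per match, in order, there is no need to remember where in the text each address came from.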

corroded 2010-05-06 17:23:20

So are you able to check every matched email and act differently based on what it contains?

serg 2010-05-06 17:42:35

I think so. Does that mean each email I get should be checked against another regex? Or maybe I can write a regex that returns either an email address or an email address with mailto: prepended, then use an if statement to decide whether or not to reverse it?

corroded 2010-05-07 04:58:16

I think this is what you were suggesting yesterday and I kind of got lost (maybe because I had been at it for hours). I got back to this today and tried your regex with an if-else statement that checks whether the string has a mailto:, and voila! Thanks!

corroded 2010-05-07 05:31:09

Answer 3

A:

Why not just store all the matched emails in an array and remove any duplicates? You can do this easily with the Ruby standard library, and (I imagine) it's probably quicker and more maintainable than adding more complexity to your regex.

emails = ["[email protected]", "[email protected]", "[email protected]"]
emails.uniq # => ["[email protected]", "[email protected]"]
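As a small end-to-end sketch of the same idea, assuming the addresses are first collected with String#scan (the pattern, constant name, and sample addresses below are placeholders, not taken from the question):

```ruby
# Hypothetical pattern and sample text, for illustration only.
SIMPLE_EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

text = 'Write to alice@example.com or bob@example.com; alice@example.com replies fastest.'

# scan collects every match in order; uniq drops repeats, keeping the first occurrence.
unique_emails = text.scan(SIMPLE_EMAIL).uniq
puts unique_emails.inspect
```

Note that uniq compares strings exactly, so addresses differing only in letter case would be kept as separate entries.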

Damien Wilson 2010-05-06 16:51:48

As the function above shows, I just replace the emails with their inverted counterparts, meaning that if I put them in an array I will have to remember which part of the text block I got them from.

corroded 2010-05-06 17:20:19
