I need to hide phone numbers (and maybe other contact details) in user-generated content to protect my users from the anonymous web. The input is very random, so I'm looking to replace anything that looks like a phone number (e.g. a string of 3 or more digits) with just dots, and perhaps also remove some exotic notations of e-mail addresses.

What is the best way to do this? Nice and slick, reusable. Give away your secret regexes. Write in any language. Except perhaps COBOL :)

function privacy($str){
    // protect phone numbers

    // protect e-mail addresses

    // protect web addresses

}
+2  A: 

In Python, to replace three or more digits with three dots in string s:

import re
s = re.sub(r'\d{3,}', '...', s)

"Exotic notations of e-mail addresses" is hard for me to parse; maybe you mean something like

s = re.sub(r'[\w.]+@[\w.]+', '<email redacted>', s)
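
As a rough sketch, the two substitutions can be wrapped in one reusable function. The name `redact` and the sample text are made up for illustration; the patterns are the same two as above.

```python
import re

# Hypothetical helper (name chosen for illustration) combining both substitutions.
def redact(s):
    # Replace e-mail-like tokens: word characters and dots around an '@'.
    s = re.sub(r'[\w.]+@[\w.]+', '<email redacted>', s)
    # Replace every run of three or more digits with three dots.
    s = re.sub(r'\d{3,}', '...', s)
    return s

print(redact("Call 5551234 or write to joe.bloggs@example.com"))
# prints: Call ... or write to <email redacted>
```

Note that digit runs shorter than three, such as "room 12", are left alone, and phone numbers written in groups of two digits would slip through.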
Alex Martelli
Thank you, short and sweet, now just something that will nicely hide sloppy human web addresses. Hungry, but intelligent
Frank Malina
A: 

You can create a simple function that just replaces any alphanumeric character with a "." or any other character you want.

For example:

function HideInput($input) {
    $input = preg_replace("([a-zA-Z0-9])", "*", $input);
    return $input;
}
Cory
I think the problem is to actually **find** that part in the user submitted content. Every text consists of alphanumeric characters.
Felix Kling
+1  A: 

By web addresses I'm guessing you mean URLs. You could create an array that contains all possible domains (".ca",".com",".uk"....). You can then run a regex replace on any 'word' that contains one of the domains.

To do the replacement you can use Alex Martelli's code, and instead of the '@' in the pattern put the join of the array of domains. The join function is explained on this site.

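In Perl I would do the match like this:

# Escape each domain so its dots match literally, then join them as alternatives.
my $domainsString = join("|", map { quotemeta } @arrayOfPossibleDomains);
# A group (not a character class) is needed so whole domains are matched.
$s =~ s/[\w.-]+(?:$domainsString)\b\S*/......./g;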
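
The same idea sketched in Python, for comparison with the other answer. The domain list here is only a tiny illustrative sample, and `hide_urls` is a made-up name; the pattern is an assumption about what counts as a "word" containing a domain.

```python
import re

# Illustrative list only; a real list of domain suffixes would be much longer.
domains = [".com", ".ca", ".co.uk", ".uk"]

# re.escape makes the dots literal; "|" turns the list into alternatives.
domain_pattern = "|".join(re.escape(d) for d in domains)

# Host-like characters, then one of the domains ending at a word boundary,
# then whatever non-space characters follow (path, query string).
url_re = re.compile(r"[\w.-]+(?:" + domain_pattern + r")\b\S*")

def hide_urls(s):
    return url_re.sub(".......", s)

print(hide_urls("see www.example.com/page or mail me"))
# prints: see ....... or mail me
```

The word boundary after the domain keeps ordinary words such as "foo.commerce" from being treated as addresses.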
Kyra
Yup, just what I needed to hide those "web addresses".
Frank Malina
A: 

Actually this was a terrible idea.

I ended up with something more elegant: on each save, check whether the input contains a lot of digits within a short shingle (a phone number) or anything on a blacklist ('@', '[at]', '.com', '.co.uk'...), and if so flag it for the admin to have a look.
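
A minimal sketch of such a check, in Python. The window size, the digit threshold and the name `needs_review` are assumptions picked for illustration, not the values actually used; here "a lot of digits in a short shingle" is read as "at least 6 digits in any 12-character window", so digits separated by spaces or dashes still count.

```python
# Illustrative blacklist taken from the examples above; extend as needed.
BLACKLIST = ["@", "[at]", ".com", ".co.uk"]

# Assumed thresholds: at least MIN_DIGITS digits inside any WINDOW-character shingle.
WINDOW = 12
MIN_DIGITS = 6

def needs_review(text):
    lowered = text.lower()
    # Any blacklisted token anywhere in the text flags it.
    if any(token in lowered for token in BLACKLIST):
        return True
    # Slide a fixed-size window (shingle) over the text and count digits.
    for start in range(max(1, len(text) - WINDOW + 1)):
        window = text[start:start + WINDOW]
        if sum(ch.isdigit() for ch in window) >= MIN_DIGITS:
            return True
    return False

print(needs_review("call 555 12 34 56"))          # True
print(needs_review("I have 3 cats and 2 dogs"))   # False
```

Nothing is rewritten here; the function only decides whether a human should look at the text.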

Frank Malina