Hi,

How can I replace special characters using regular expressions? By special, I mean the symbolic characters that sometimes appear in text.

For example, in the text below, I want to remove the bullet at the start of each line.

Passport Details

Name as on passport
Relationship
Passport Number
Date of Issue
Expiry Date
Place of Issue

Question edited: Sorry, the bullet at the start of each line is no longer visible. After I submitted the question, Stack Overflow removed that special character.

Does anyone know how to replace those special characters? I don't want to replace characters like #, @ or !. These are trivial and can be typed on a keyboard.

Sorry, I don't know how to put those special characters in my question. I will try to explain. In a Word file, we put bullets before text. I want to replace the characters representing such bullets. I have some text files that contain characters that look like a bullet.

Finally, I found a solution. This regular expression works for me:

([^(A-Za-z0-9)+|\r|\n|\t|'|"|#|;|:|/|\|.|,| ])
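A note for anyone adapting this pattern: inside a character class the | separators, the parentheses and the + are literal characters, not alternation, grouping or repetition, so the same set can be written more compactly. Below is a minimal C# sketch, assuming the allowed set is exactly the one in the pattern above. The class and method names are hypothetical, and removing (rather than replacing with a space) is my assumption.

```csharp
using System.Text.RegularExpressions;

public static class AllowListCleaner
{
    // Same allowed set as the pattern above, without the redundant '|'
    // separators and the outer group. '(', ')', '+' and '|' stay in the
    // class because the original class already allowed them as literals.
    private static readonly Regex NotAllowed =
        new Regex(@"[^A-Za-z0-9()+|\r\n\t'""#;:/., ]");

    // Removes every character that is not in the allowed set.
    public static string Clean(string input)
    {
        return NotAllowed.Replace(input, string.Empty);
    }
}
```

Calling AllowListCleaner.Clean on a line that starts with a bullet (U+2022) drops the bullet and keeps the rest. Note that @ and ! are not in this set, so they would be removed as well; add them to the class if they should stay.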

+1  A: 

It would be possible to find all "special" characters with this regular expression and then just replace them with a space character:

/[<special_characters_here>]/

However, it is usually better to use whitelisting, that is, listing all allowed characters and replacing everything else with a space character:

/[^<allowed_characters_here>]/
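To make the two placeholders concrete, here is a rough C# sketch of both approaches. The character sets are only assumptions for illustration (a few bullet-like code points for the blacklist, ASCII letters, digits and basic whitespace for the whitelist), and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SpecialCharFilter
{
    // Blacklist: name the unwanted characters explicitly.
    // Assumed set: bullet (U+2022), white bullet (U+25E6), black circle (U+25CF).
    public static string ReplaceListed(string input)
    {
        return Regex.Replace(input, @"[\u2022\u25E6\u25CF]", " ");
    }

    // Whitelist: name the allowed characters and replace everything else.
    // Assumed set: ASCII letters, digits, space, tab, CR, LF.
    public static string ReplaceUnlisted(string input)
    {
        return Regex.Replace(input, @"[^A-Za-z0-9 \t\r\n]", " ");
    }
}
```

The blacklist only catches the characters it names, while the whitelist also catches symbols nobody thought of in advance, which is why it tends to be the safer default. The whitelist above is deliberately narrow, so it would also replace characters such as # unless they are added to the class.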
Franz
But there are lots of special characters, and it will be difficult to find the ASCII codes of those characters and then insert those codes into the regular expression. Is there any class for such characters?
Shekhar
You could use something like Kinopiko mentioned. However, I can't tell you more, because I don't know which (kind of) characters you want to allow or prohibit...
Franz
Thanks Franz for help.
Shekhar
+1  A: 

(This was posted before the language had been specified.)

To replace non-ASCII characters with a space in Perl:

 $string =~ s/[^[:ascii:]]/ /g;

See http://codepad.org/KTMvQiOz. Here, [^[:ascii:]] is a regex that matches any non-ASCII character.
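Since the question turned out to be about C#, a rough equivalent might look like the sketch below. It expresses "non-ASCII" as a negated code point range (U+0000 to U+007F) rather than a POSIX class; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NonAsciiCleaner
{
    // [^\u0000-\u007F] matches any character outside the ASCII range.
    public static string ReplaceNonAscii(string input)
    {
        return Regex.Replace(input, @"[^\u0000-\u007F]", " ");
    }
}
```

This keeps characters such as #, @ and ! because they are ASCII, but it also replaces accented letters and any other non-ASCII text, so it only fits input that is expected to be plain ASCII.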

Kinopiko
Oh, thanks Kinopiko. I will try to find out how to do the same in C#.
Shekhar
Thanks Kinopiko for the help.
Shekhar
A: 

Do you mean replacing the carriage return and new line characters?

If that's what you're after, this would do it:

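// Requires: using System.Text.RegularExpressions;
// Assert comes from a unit-test framework such as NUnit.
var source = "once\r\ntwice\r\nthrice";
var pattern = new Regex(@"\r\n");           // matches each CR+LF pair
var result = pattern.Replace(source, ",");  // replaces each pair with a comma
Assert.AreEqual("once,twice,thrice", result);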
Adam
Sorry Adam, I don't want to replace new lines or carriage returns. I don't know how to put those special characters in my question. I will try to explain. In a Word file, we put bullets before text. I want to replace the characters representing such bullets. I have some text files that contain characters that look like a bullet.
Shekhar
A: 

I don't have enough time to flesh out a full example. But since you're using .NET you can match on any number of these character classes:

http://msdn.microsoft.com/en-us/library/20bw873z.aspx

Choose what you want to accept and replace anything that is not in that set.
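As a rough illustration of that idea, the sketch below builds the accepted set from Unicode categories plus the printable ASCII symbols and removes everything else. The choice of categories is my assumption about what the asker wants to keep, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CategoryCleaner
{
    // Accepted: any letter (\p{L}), decimal digit (\p{Nd}), whitespace (\s),
    // and printable ASCII symbols (U+0021 to U+007E).
    // Anything outside that set is removed.
    private static readonly Regex NotAllowed =
        new Regex(@"[^\p{L}\p{Nd}\s\u0021-\u007E]");

    public static string Clean(string input)
    {
        return NotAllowed.Replace(input, string.Empty);
    }
}
```

Unlike a pure ASCII filter, this keeps letters from other scripts (for example the é in José) while still dropping bullets, which fall under punctuation outside the ASCII range.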

Andrew Barrett
Thank you Andrew for help.
Shekhar