Hi,

Is it possible to replace all the special characters in a matlab vector through a regular expression?

Thank you

*EDIT:*

Thank you for your responses. I'm trying to achieve the following. I have a text file that contains a few paragraphs from a novel. I have read this file into a vector.

fileText = ['Token1,' 'token_2' 'token%!3'] etc.

In this case the special characters are `,`, `_`, `%` and `!`, and I would like to replace them with blanks (''). Can this be achieved through regular expressions? I can do this in JavaScript, but can't get it to work in MATLAB.

Thank you

+5  A: 

If by "special characters" you mean less frequently used Unicode characters like ¥ or ¼, then you can use either the function REGEXPREP or a set-membership function like ISMEMBER (and you can convert the character string to its equivalent integer codes first using the function DOUBLE if needed). Here are a couple of examples where everything except the standard English alphabet characters (lower and upper case) is removed from a string:

str = ['ABCDEFabcdefÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐ'];   %# A sample string
str = regexprep(str,'[^a-zA-Z]','');      %# Remove characters using regexprep
str(~ismember(str,['A':'Z' 'a':'z'])) = '';  %# Remove characters using ismember
                                             %#   (as suggested by Andrew)
str(~ismember(double(str),[65:90 97:122])) = '';  %# Remove characters based on
                                                  %#   their integer code

All of the options above produce the same result:

str =

ABCDEFabcdef
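
Note that the three statements above are applied one after another to the same variable, so once the first has run, the other two have nothing left to remove. A minimal sketch that checks each approach independently on the original sample (the variable names `viaRegexp`, `viaChars` and `viaCodes` are illustrative, not from the answer):

```matlab
str = 'ABCDEFabcdefÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐ';   % same sample string as above

% Each approach starts from the untouched sample string
viaRegexp = regexprep(str, '[^a-zA-Z]', '');             % regex-based removal
viaChars  = str(ismember(str, ['A':'Z' 'a':'z']));       % keep members of the letter set
viaCodes  = str(ismember(double(str), [65:90 97:122]));  % same test on integer codes

% True when all three approaches agree
allAgree = isequal(viaRegexp, viaChars, viaCodes);
```

Here the logical index selects the characters to keep, rather than deleting the ones that fail the test; the result is the same.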


EDIT:

In response to the specific example in the updated question, here's how you can use REGEXPREP to remove all characters that aren't a-z, A-Z, or 0-9 (that is, replace them with the empty string ''):

str = regexprep(str,'[^a-zA-Z0-9]','');

This may be easier than trying to write a regex to match each individual "special" character, since there could potentially be many of them. However, if you were certain that the only special characters would be the ones listed in the question (the comma, _, %, and !), this should achieve the same as the above:

str = regexprep(str,'[,_%!]','');
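
An empty replacement string removes the matched characters outright. If a literal space is wanted in their place, a minimal sketch, assuming the `fileText` sample from the question (the names `removed`, `spaced` and `collapsed` are illustrative):

```matlab
% The bracketed literals concatenate into one character vector:
% 'Token1,token_2token%!3'
fileText = ['Token1,' 'token_2' 'token%!3'];

% Empty replacement: non-alphanumeric characters are deleted
removed = regexprep(fileText, '[^a-zA-Z0-9]', '');
% removed is 'Token1token2token3'

% Single-space replacement: each special character becomes one space
spaced = regexprep(fileText, '[^a-zA-Z0-9]', ' ');
% spaced is 'Token1 token 2token  3' (two spaces where '%!' was)

% Adding '+' collapses each run of special characters into one space
collapsed = regexprep(fileText, '[^a-zA-Z0-9]+', ' ');
% collapsed is 'Token1 token 2token 3'
```

Which variant fits depends on whether the tokens should stay separated after cleaning.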

Also, as mentioned in the comment by Amro, you could use the function ISSTRPROP to remove all non-alphanumeric characters like so:

str(~isstrprop(str,'alphanum')) = '';
gnovice
+1. Note that 65:122 includes non-alpha characters like [ \ ] `. (Do "disp(char(65:122))" to confirm.) No need to convert to double: ismember() operates characterwise if both inputs are chars and not cellstrs. So "ismember(str, ['A':'Z' 'a':'z'])" works too and IMO is a little more readable than using numeric character codes.
Andrew Janke
@Andrew: You're right. I forgot to break the range into parts. Also, good suggestion on using ISMEMBER directly on the character strings. I didn't realize something like `'A':'Z'` would remain a character array instead of being converted automatically to double.
gnovice
I think that regexprep is what I'm after. Instead of doing '[A-Za-z]' I built a regexp for all special characters. This didn't work for me. I will run your code in a minute.
vikp
Thank you for your help!
vikp