I have a function that I have called many times across various files, with a signature like:

Translate("English Message", "Spanish Message", "French Message")

I want to pull out the English, Spanish and French messages and write them to a CSV file, so that people who actually know these languages can tell me what I SHOULD have put there.

The problem I am running into is that some French and Spanish messages don't show up because of the accented characters and single quotes.

This is a VB.NET program.

Edit

There was no problem with the language; my issue was actually the regular expression, and my complete lack of understanding of regular expressions.
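For reference, a minimal VB.NET sketch of the extraction described above, assuming every call passes three plain string literals (calls built from variables or concatenation are skipped) and that the source files are UTF-8. The module name, the translations.csv output file and the helper names are hypothetical. The important part is the string-literal pattern: it matches "anything but a quote, or a doubled "" escape", so accented letters and apostrophes need no special treatment.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module TranslateExtractor
    ' One VB string literal: a quote, then any run of non-quote characters
    ' or doubled "" escapes, then a closing quote. Accented letters and
    ' apostrophes are ordinary "not a quote" characters here.
    Private Const StringLiteral As String = """((?:[^""]|"""")*)"""

    ' Whitespace or VB line-continuation underscores between arguments.
    Private Const Sep As String = "[\s_]*"

    ' Translate("English", "Spanish", "French")
    Private ReadOnly TranslateCall As New Regex(
        "Translate\(" & Sep & StringLiteral & Sep & "," & Sep &
        StringLiteral & Sep & "," & Sep & StringLiteral & Sep & "\)")

    ' Undo VB's "" escape so the CSV shows the text as it appears at runtime.
    Private Function Unescape(literal As String) As String
        Return literal.Replace("""""", """")
    End Function

    ' Returns one {English, Spanish, French} array per Translate call found.
    Public Function ExtractMessages(source As String) As List(Of String())
        Dim results As New List(Of String())
        For Each m As Match In TranslateCall.Matches(source)
            results.Add(New String() {Unescape(m.Groups(1).Value),
                                      Unescape(m.Groups(2).Value),
                                      Unescape(m.Groups(3).Value)})
        Next
        Return results
    End Function

    ' Quote a CSV field, doubling any embedded quotes (RFC 4180 style).
    Public Function CsvField(value As String) As String
        Return """" & value.Replace("""", """""") & """"
    End Function

    ' Pass the .vb files to scan as command-line arguments.
    Sub Main(args As String())
        Dim rows As New List(Of String) From {"English,Spanish,French"}
        For Each fileName As String In args
            ' Assumes UTF-8 source files; pass a different Encoding if not.
            Dim contents As String = File.ReadAllText(fileName, Encoding.UTF8)
            For Each triple As String() In ExtractMessages(contents)
                rows.Add(String.Join(",", Array.ConvertAll(Of String, String)(triple, AddressOf CsvField)))
            Next
        Next
        ' A UTF-8 byte-order mark helps Excel detect the encoding of the accents.
        File.WriteAllLines("translations.csv", rows, New UTF8Encoding(True))
    End Sub
End Module
```

Each Translate call becomes one quoted CSV row, so commas, quotes and apostrophes inside the messages survive the round trip to the translators.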

+1  A: 

It depends on the regex library you are using. Sane regex implementations are Unicode-aware (for example, they handle UTF-8 input) and have no such problems, but more details would help: which language you are using, which regex library, etc.
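A small sketch of that point in .NET terms, assuming a source file saved as ISO-8859-1 (Latin-1): Regex itself treats é as a word character, and the accents only get lost if the file's bytes are decoded with the wrong Encoding. EncodingDemo and its helpers are made-up names for illustration.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module EncodingDemo
    ' .NET's \w is Unicode-aware, so a correctly decoded accented word matches.
    Public Function IsSingleWord(text As String) As Boolean
        Return Regex.IsMatch(text, "^\w+$")
    End Function

    ' Simulate a file saved as Latin-1, then decode its bytes with readAs.
    Public Function DecodeLatin1Bytes(word As String, readAs As Encoding) As String
        Dim bytes As Byte() = Encoding.GetEncoding("iso-8859-1").GetBytes(word)
        Return readAs.GetString(bytes)
    End Function

    Sub Main()
        Dim word As String = "élément"
        ' Right encoding: the accented letters reach the regex intact.
        Console.WriteLine(IsSingleWord(DecodeLatin1Bytes(word, Encoding.GetEncoding("iso-8859-1"))))  ' True
        ' Wrong encoding: each é becomes U+FFFD, which \w does not match.
        Console.WriteLine(IsSingleWord(DecodeLatin1Bytes(word, Encoding.UTF8)))  ' False
    End Sub
End Module
```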

anselm
+1  A: 

If your language's regex implementation has a DOTALL flag (which makes the dot match newline characters too), you might want to set it.
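In .NET the DOTALL equivalent is RegexOptions.Singleline. A minimal sketch, assuming a hypothetical Translate call split across two lines with the _ continuation character, shows the difference it makes to a pattern built on the dot:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SinglelineDemo
    ' Hypothetical input: one Translate call continued onto a second line.
    Public ReadOnly Sample As String =
        "Translate(""Save"", _" & Environment.NewLine &
        "          ""Guardar"", ""Enregistrer"")"

    ' Returns the text between "Translate(" and the first ")", or Nothing.
    Public Function FindArguments(options As RegexOptions) As String
        Dim m As Match = Regex.Match(Sample, "Translate\((.*?)\)", options)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Without Singleline, "." cannot cross the line break, so nothing matches.
        Console.WriteLine(FindArguments(RegexOptions.None) Is Nothing)  ' True
        ' With Singleline, "." also matches newlines and the whole call is found.
        Console.WriteLine(FindArguments(RegexOptions.Singleline))
    End Sub
End Module
```

The lazy .*? still stops at the first ")", so a message that itself contains a parenthesis would be cut short; the negated character class below avoids that.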

Alternatively, change the regex to capture a negated character class instead, like so:

([^your_delimiter]*?)

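with your_delimiter being the character(s) immediately following the string that you want to capture. A character class excludes each listed character individually, not a sequence. For a string enclosed in double quotes, for example:

"([^"]*)"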
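A minimal sketch of the negated-class approach in VB.NET, using a hypothetical sample line. The delimiter is the double quote, so the pattern is "([^"]*)", written """([^""]*)""" in VB because quotes are doubled inside string literals; a greedy "(.*)" is shown for contrast. The lazy *? is not needed once the class already excludes the delimiter.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NegatedClassDemo
    ' Hypothetical line containing a Translate call.
    Public Const SampleLine As String = "Translate(""Don't save"", ""No guardar"", ""Ne pas enregistrer l'élément"")"

    ' Collects the group 1 capture of every match of the given pattern.
    Public Function Captures(pattern As String) As List(Of String)
        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(SampleLine, pattern)
            found.Add(m.Groups(1).Value)
        Next
        Return found
    End Function

    Sub Main()
        ' Greedy dot: a single capture running from the first quote to the last.
        For Each s As String In Captures("""(.*)""")
            Console.WriteLine(s)
        Next
        ' Negated class: [^"]* stops at the next quote, giving three messages.
        ' Accents and apostrophes are simply "not a quote" and pass through.
        For Each s As String In Captures("""([^""]*)""")
            Console.WriteLine(s)
        Next
    End Sub
End Module
```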

See this for further discussion:

http://en.wikipedia.org/wiki/Regular_expression#Unicode

prometheus