My earlier post (linked below) asked what curly quotation marks are and why my app wouldn't work with them. My question now is: how can I replace them when my program comes across them? How can I do this in C#? Are they special characters?

curly-quotation-marks-vs-square-quotation-marks-what-gives

Thanks

+2  A: 

According to the Character Map application that comes with Windows, the Unicode values for the curly quotes are 0x201c and 0x201d. Replace those values with the straight quote 0x0022, and you should be good to go.

// s is the string being cleaned; Replace returns a new string, so assign the result
s = s.Replace('\u201c', '"');
s = s.Replace('\u201d', '"');
Mark Ransom
+1  A: 

Note that what you have is inherently a corrupt CSV file. Indiscriminately replacing all typographer's quotes with straight quotes won't necessarily fix your file. For all you know, some of the typographer's quotes were supposed to be there, as part of a field's value. Replacing them with straight quotes might not leave you with a valid CSV file, either.

I don't think there is an algorithmic way to fix a file that is corrupt in the way you describe. Your time might be better spent investigating how you come to have such invalid files in the first place, and then putting a stop to it. Is someone using Word to edit your data files, for instance?
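To make the concern concrete, here is a minimal sketch. The row is made up for illustration: a correctly quoted field whose value legitimately contains typographer's quotes. `CsvQuoteDemo` and `ReplaceCurlyDoubleQuotes` are hypothetical names, not anything from the OP's program.

```csharp
using System;

public static class CsvQuoteDemo
{
    // Blindly maps typographer's double quotes to straight ones.
    public static string ReplaceCurlyDoubleQuotes(string line)
    {
        return line.Replace('\u201c', '"').Replace('\u201d', '"');
    }

    public static void Main()
    {
        // Hypothetical row: the first field is quoted correctly and
        // contains curly quotes as part of its value.
        string valid = "\"She said \u201chi\u201d\",42";
        string replaced = ReplaceCurlyDoubleQuotes(valid);

        Console.WriteLine(valid);
        Console.WriteLine(replaced);
    }
}
```

After the replacement, the inner quotes look exactly like field delimiters and are not doubled, so a strict CSV parser would be expected to reject the row or split it incorrectly.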

Rob Kennedy
Erm, I don't think this question has anything to do with CSV files... :)
AR
No, but the question the OP referenced does.
GalacticCowboy
+3  A: 

When I encountered this problem I wrote an extension method to the String class in C#.

public static class StringExtensions
{
    public static string StripIncompatableQuotes(this string s)
    {
        if (!string.IsNullOrEmpty(s))
            return s.Replace('\u2018', '\'').Replace('\u2019', '\'').Replace('\u201c', '\"').Replace('\u201d', '\"');
        else
            return s;
    }
}

This simply replaces the silly 'smart quotes' with normal quotes.

[EDIT] Fixed to also support replacement of 'double smart quotes'.
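For anyone unsure how to call it, here is a small usage sketch. The extension class is repeated so the snippet compiles on its own; `ExtensionDemo` and the sample strings are made up for illustration.

```csharp
using System;

public static class StringExtensions
{
    public static string StripIncompatableQuotes(this string s)
    {
        if (!string.IsNullOrEmpty(s))
            return s.Replace('\u2018', '\'').Replace('\u2019', '\'').Replace('\u201c', '\"').Replace('\u201d', '\"');
        else
            return s;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        // Curly double quotes around the text, curly apostrophe inside it.
        string input = "\u201cIt\u2019s here\u201d";
        Console.WriteLine(input.StripIncompatableQuotes()); // prints "It's here" with straight quotes

        // Extension methods are static calls, so a null receiver does not throw here.
        string missing = null;
        Console.WriteLine(missing.StripIncompatableQuotes() == null); // prints True
    }
}
```

Because `Replace` returns a new string, the original `input` is left untouched; assign the result if you want to keep it.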

Matthew Ruston
Your code works with single quotes; you need to use \u201c and \u201d for double quotes.
Mark Ransom
Fixed. Thanks man.
Matthew Ruston
+2  A: 

I have a whole great big... program... that does precisely this. You can rip out the script and use it at your leisure. It does all sorts of replacements, and is located at http://www.codeplex.com/typografix

Dmitri Nesteruk
A: 

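If the above doesn't work, the text was presumably read with the wrong encoding, so each smart quote shows up as its three UTF-8 bytes (octal 342 200 230 and so on). C# has no octal escapes, so the bytes are written as \u escapes here, assuming they were decoded as Latin-1. Try this for smart single quotes:
s = s.Replace("\u00e2\u0080\u0098", "'").Replace("\u00e2\u0080\u0099", "'");

Try this as well for smart double quotes:
s = s.Replace("\u00e2\u0080\u009c", "\"").Replace("\u00e2\u0080\u009d", "\"");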

flurb
+3  A: 

A more extensive list of problematic Word characters:

if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
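The same mapping can also be kept in a lookup table and applied in a single pass. This is only a sketch: `WordCharacterCleaner`, `Clean` and the sample string are hypothetical names, and the pairs simply mirror the replacements listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordCharacterCleaner
{
    // Hypothetical lookup table; each pair mirrors one line of the list above.
    private static readonly Dictionary<char, string> Replacements = new Dictionary<char, string>
    {
        { '\u2013', "-" },   // en dash
        { '\u2014', "-" },   // em dash
        { '\u2015', "-" },   // horizontal bar
        { '\u2017', "_" },   // double low line
        { '\u2018', "'" },   // left single quote
        { '\u2019', "'" },   // right single quote
        { '\u201a', "," },   // single low-9 quote
        { '\u201b', "'" },   // single high-reversed-9 quote
        { '\u201c', "\"" },  // left double quote
        { '\u201d', "\"" },  // right double quote
        { '\u201e', "\"" },  // double low-9 quote
        { '\u2026', "..." }, // horizontal ellipsis
        { '\u2032', "'" },   // prime
        { '\u2033', "\"" },  // double prime
    };

    // Walks the input once and swaps every character found in the table.
    public static string Clean(string buffer)
    {
        if (string.IsNullOrEmpty(buffer)) return buffer;

        var sb = new StringBuilder(buffer.Length);
        foreach (char c in buffer)
        {
            string replacement;
            if (Replacements.TryGetValue(c, out replacement)) sb.Append(replacement);
            else sb.Append(c);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample text with curly quotes, an ellipsis, an em dash and a curly apostrophe.
        Console.WriteLine(WordCharacterCleaner.Clean("\u201cWait\u2026\u201d \u2014 it\u2019s fine"));
    }
}
```

The table makes it easy to add or remove characters without touching the loop, and the string is scanned once instead of once per character type.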
Nick van Esch
A: 

How would those escape characters be handled in VB.NET?

Corey