I have an existing ASP.NET 2.0 website, stored in Team Foundation Server 2005. Some of the pages/controls are encoded as ANSI (according to Notepad++) and the Content-Type header is set to:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"/>

I would like to change all pages to UTF-8, and therefore the Content-Type header to:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Other than changing the meta element, I assume I also need to change the encoding of all the files. I can do this in Notepad++, but if anyone knows a quicker method, please mention it.

What sort of problems might I face when it comes to merging/comparing in TFS?

+1  A: 

It depends on how much of the text in your codebase is using characters outside the ASCII range of 0..127.

You might want to scan for those first, to see how much impact it will have. If your codebase is primarily in English, then you probably don't have much to worry about.
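A minimal sketch of such a scan in Python, assuming the site is checked out to a local folder; the function name `find_non_ascii_files` is made up for this illustration. It reads every file as raw bytes and counts those above 127, which are the only bytes whose encoding differs between windows-1252 and UTF-8.

```python
import os

def find_non_ascii_files(root):
    """Return (path, count) pairs for files containing bytes above 127."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                raw = f.read()
            # bytearray yields integers on both Python 2 and Python 3
            count = sum(1 for b in bytearray(raw) if b > 127)
            if count:
                hits.append((path, count))
    return hits
```

Calling `find_non_ascii_files(".")` from the root of the working copy lists the files that a conversion would actually alter. It also walks binary files such as images, so you may want to filter by extension first.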

Barry Kelly
Just to point out that it's not just the codebase he needs to worry about; if there's any dynamic content in the database, that would need to be converted as well.
Simon Howard
That's not going to affect merging / comparing in TFS, though; however, you're quite correct wrt. the composition of pages using data from the DB etc.
Barry Kelly
+2  A: 

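I would write a Python script:

import os
srcdir = "."  # assumed: the directory holding the files to convert
for fn in os.listdir(srcdir):
    path = os.path.join(srcdir, fn)
    if not os.path.isfile(path): continue  # skip subdirectories
    data = open(path, "rb").read().decode("windows-1252")
    data = data.replace("charset=windows-1252", "charset=utf-8")
    open(path, "wb").write(data.encode("utf-8"))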

The update of the charset assumes that this specific string won't occur elsewhere; you can make it more robust by checking for a longer string, checking whether the old text actually exists in the file, doing proper XML parsing, etc.
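One way to make the replacement stricter, sketched under the assumption that the meta element looks like the one in the question (same attribute order and quoting). The names `META_CHARSET` and `update_meta_charset` are invented for this example. It only touches the charset inside a Content-Type meta element and reports how many elements it changed, so files without the old text can be detected.

```python
import re

# Matches the charset value only inside a Content-Type meta element
# written in the same attribute order as the one in the question.
META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)'
    r'windows-1252(")',
    re.IGNORECASE,
)

def update_meta_charset(text):
    """Return (new_text, replaced), where replaced counts the elements changed."""
    return META_CHARSET.subn(r"\g<1>utf-8\g<2>", text)
```

A result of `replaced == 0` means the file had no matching meta element, which is worth logging rather than silently skipping.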

You might need to put a UTF-8 signature (byte order mark) in front of the UTF-8-encoded data; you can find one in codecs.BOM_UTF8.
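A short sketch of prepending the signature; the helper name `to_utf8_with_bom` is hypothetical. Python's "utf-8-sig" codec produces the same bytes when encoding, so either form should work.

```python
import codecs

def to_utf8_with_bom(raw_cp1252):
    """Re-encode Windows-1252 bytes as UTF-8, prefixed with the UTF-8 signature."""
    text = raw_cp1252.decode("windows-1252")
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

Whether your tools want the signature is something to test on one file before converting the rest.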

I don't know what consequence this change has for TFS.

Martin v. Löwis
A: 

Something useful I just discovered is that you can right-click a file in Source Control Explorer and choose Properties. You can then see and modify the encoding as far as TFS is concerned.

tjrobinson
A: 

Pick a file that has a character above the 0-127 ASCII range. Open it in Notepad, choose Save As, and pick UTF-8 as the encoding. Then check whether the character was converted correctly.

To automate the procedure, you could write an application that converts all the files from ANSI to UTF-8, using 1252 as the source code page. If you don't have any characters above 127, you don't need to worry about any of this, because bytes 0-127 are encoded identically in both.
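To illustrate the point about characters above 127, here is a small Python check (the function name `conversion_changes_bytes` is made up for this example). It reports whether re-encoding a file's windows-1252 bytes as UTF-8 would change them at all, assuming no byte order mark is added.

```python
def conversion_changes_bytes(raw_cp1252):
    """True if re-encoding these Windows-1252 bytes as UTF-8 alters them."""
    return raw_cp1252.decode("windows-1252").encode("utf-8") != raw_cp1252
```

A pure-ASCII file comes out byte-for-byte identical, whereas a file containing é changes, because é is the single byte 0xE9 in windows-1252 and the two bytes 0xC3 0xA9 in UTF-8.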

kgiannakakis
A: 

This is not necessarily true. I don't know about ASP.NET, but we do all our PHP coding here in ANSI and serve the pages as UTF-8. All our database information is stored as UTF-8 as well.

smack0007