utf-8

Classic ASP and UTF-8

We have a website that uses classic ASP. Part of our release process substitutes values in a file, and we found a bug in it where it writes the file out as UTF-8. This then causes our application to start spitting out garbage: apostrophes get returned as some encoded characters. If we then go and remove the BOM that says this file is...
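For readers unfamiliar with the BOM mentioned above, here is a minimal Python sketch (not the poster's release tooling, which is not shown) of detecting and removing a leading UTF-8 byte order mark. The helper name `strip_utf8_bom` is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data


# A file written "as UTF-8 with BOM" starts with the three marker bytes.
sample = codecs.BOM_UTF8 + "It's here".encode("utf-8")
print(sample[:3].hex())
print(strip_utf8_bom(sample))
```

Removing the BOM only changes how the file announces itself; the bytes after it are untouched.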

How do I do a strtr on UTF-8 in PHP?

I'm looking for a UTF-8 compatible strtr for PHP. ...

Dummy's guide to Unicode

Could anyone give me concise definitions of Unicode, UTF-7, UTF-8, UTF-16, UTF-32, and codepages, and how they differ from ASCII/ANSI/Windows-1252? I'm not after Wikipedia links or incredible detail, just some brief information on how and why the huge variations in Unicode have come about and why you should care as a programmer. ...
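One way to see the difference between these encodings is to encode the same character with each and compare byte counts. The sketch below uses Python's built-in codecs; the helper name `encoded_sizes` is hypothetical, and the little-endian variants are chosen so that no BOM is added to the count.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le", "cp1252")):
    """Map each encoding name to the number of bytes it needs for text."""
    return {name: len(text.encode(name)) for name in encodings}


# "A" is plain ASCII; "é" (U+00E9) is outside ASCII but inside Windows-1252.
for sample in ("A", "\u00e9"):
    print(repr(sample), encoded_sizes(sample))
```

UTF-8 is variable width (one byte for ASCII, more for everything else), UTF-32 is always four bytes per code point, and a codepage such as Windows-1252 uses one byte but covers only 256 characters.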

Visual Studio C# disable unicode or utf-8 as file encoding and use ASCII instead.

Hello *, I am currently working on some LaTeX document which embeds C# files generated by Visual Studio 2008. My problem is that these files are encoded in UTF-8 with BOM. This causes LaTeX to produce output similar to the output described in this post: Invalid characters in generated latex sources in Doxygen? I know that I can use a t...

Microsoft SQL Server 2008 and UTF-8

A colleague tells me that there is no way to bulk insert UTF-8 encoded data into Microsoft SQL Server 2008. Can this be true? Is there anything you would recommend he read or look at? ...

HTML encoding issues - "Â" character showing up instead of " "

Hey everyone, I've got a legacy app that has just started to misbehave, and I'm not sure why. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF. The process works like this: Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.). Replace the tokens...
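A stray "Â" of the kind named in the title is what typically appears when a non-breaking space stored as UTF-8 is decoded as a single-byte encoding. The Python sketch below reproduces that effect in isolation. It assumes Windows-1252 as the wrong decoder and says nothing about where in the poster's pipeline the mismatch occurs.

```python
nbsp = "\u00a0"                # NO-BREAK SPACE, what &nbsp; stands for
raw = nbsp.encode("utf-8")     # two bytes: C2 A0
wrong = raw.decode("cp1252")   # each byte read as its own character
right = raw.decode("utf-8")    # the bytes read as one character

print(raw.hex())
print([hex(ord(ch)) for ch in wrong])
print([hex(ord(ch)) for ch in right])
```

The first byte, C2, is "Â" in Windows-1252, and the second is still a non-breaking space, so the visible symptom is an extra "Â" in front of each space.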

character encoding problem - cross-domain scripting

Hello, I have an ASP.NET web app for which users include a script tag in their web page and get data from my server (using JSONP and a generic handler (ashx)). The data is in Hebrew, and I set the charset to utf-8 in the response. When the client web site (which displays the data) uses "windows-1255" I don't see the text properly. The sc...

Zend_Layout doesn't appear to be encoding content as UTF-8

I am using the MVC functionality in the Zend Framework 1.9, and it appears that Zend_Layout is not encoding the view content using UTF-8, despite this being set in the heading. The layout script is shown below. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <...

PHP DOMDocument with åäö (UTF-8)

Hi! I have an HTML/PHP5 page with a form; when it gets posted, it creates an XML file with the form input as data. But all åäö look as if I had used utf8_encode() on them. I can't utf8_decode() them, because then the "service" I send the XML files to complains that it is not UTF-8 (like it should be). Parser failed. Reason :2: parser ...

Convert ANSI characters to UTF-8 in Java

Is there a way to convert an ANSI string to UTF using Java? I have a custom serializer that uses readUTF & writeUTF methods of the DataInputStream class to deserialize and serialize strings. If I receive a string encoded in ANSI that is too long (~100000 chars), I get the error: Caused by: java.io.UTFDataFormatException: encode...

Broken Accented Characters in a MailTo Link

I'm trying to create a mailto link that contains French accented characters in the subject and email body. Neither HTML-encoding nor URI-encoding the characters works. Here is my code: <a href="mailto:%20?subject=ce%20titre%20est%20cass%C3%A9.&body=travaux%20deja!%20cesser%20d'%C3%AAtre%20t%C3%AAtu">SEND EMAIL</a> Same result occurs without U...

PHP: Checks to see if a string is utf8 encoded? How?

function seems_utf8($str) { $length = strlen($str); for ($i=0; $i < $length; $i++) { $c = ord($str[$i]); if ($c < 0x80) $n = 0; # 0bbbbbbb elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb ...
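For comparison, the same kind of check can be sketched in Python by attempting a strict decode instead of walking the lead bytes by hand. This is not a port of the PHP function above, and the name `is_valid_utf8` is hypothetical. Unlike the bit patterns in the excerpt, Python's decoder rejects five-byte sequences (lead bytes starting 111110), since they are not valid UTF-8 under RFC 3629.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("\u00e9".encode("utf-8")))  # bytes C3 A9
print(is_valid_utf8(b"\xe9"))                   # a lone Latin-1 byte
```

A successful decode only shows that the bytes are well-formed UTF-8. Pure ASCII passes too, and so can text in other encodings that happens to form valid sequences.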

Detecting utf8 broken characters in MySQL

I've got a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ). Fixing a given table is very straightforward: update orderItem set itemName=replace(itemName,'Ã¡','á'); But I can't find a way of detecting the broken characters. If I do something...
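Outside MySQL, one heuristic for spotting this kind of damage is to check whether a string survives a round trip that undoes the suspected mis-decoding. The Python sketch below assumes the text was UTF-8 that got read as Latin-1; the function names are hypothetical, and legitimate text that happens to look like mis-decoded UTF-8 would be flagged as well.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; raises UnicodeError if it does not apply."""
    return text.encode("latin-1").decode("utf-8")


def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if the round trip succeeds and changes the text."""
    try:
        return repair_mojibake(text) != text
    except UnicodeError:  # covers both encode and decode failures
        return False


broken = "\u00c3\u00a1"  # how "á" (U+00E1) appears after the mix-up
print(looks_like_mojibake(broken), repair_mojibake(broken) == "\u00e1")
print(looks_like_mojibake("\u00e1"))
```

Rows could be exported and scanned this way to build the list of values that need the `replace` treatment.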

Generate random UTF-8 string in Python

I'd like to test the Unicode handling of my code. Is there anything I can put in random.choice() to select from the entire Unicode range, preferably not an external module? Neither Google nor StackOverflow seems to have an answer. Edit: It looks like this is more complex than expected, so I'll rephrase the question - Is the following co...
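A minimal sketch of one standard-library approach follows; it is not the code from the truncated question. The function name and the optional `rng` parameter are assumptions for illustration. It draws code points uniformly from the whole Unicode range and skips the surrogate block, which cannot be encoded as UTF-8.

```python
import random


def random_unicode_string(length, rng=None):
    """Build a string of random code points, excluding surrogates."""
    rng = rng or random.Random()
    chars = []
    while len(chars) < length:
        code_point = rng.randint(0, 0x10FFFF)
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        chars.append(chr(code_point))
    return "".join(chars)


sample = random_unicode_string(10, random.Random(0))
print(len(sample), len(sample.encode("utf-8")))
```

Most code points in the range are unassigned, so the result exercises encoding and decoding paths more than rendering.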

UTF-8 encoding issue

Hi, I am trying to fetch data from an RSS feed (feed location is http://www.bgsvetionik.com/rss/ ) in a C# Windows Forms application. Take a look at the following code: public static XmlDocument FromUri(string uri) { XmlDocument xmlDoc; WebClient webClient = new WebClient(); using (Stream rssStream = webClient.OpenRead(uri)...

UTF-8 coding question (what is the last unicode character)

Hello, we are opening up our application to allow support for multiple languages. One of the problems we have encountered along the way is a feature we provide our customers. Imagine for a moment the user is presented with 3 fields: "All customers" is a toggle; "From Customer Name" is a field they can type in; "To Customer Name" is a...

Delphi, charset detection ([Uni]SynEdit) - Utf8Decode problem

I'm using Unicode SynEdit, which (in theory) has basic file/stream encoding detection. It worked fine until I tried opening the file which was generated by my PHP script. The file I'm talking about is detected by UniSynEdit as utf8 with no BOM. Unfortunately, it doesn't open - the loaded string is empty. I debugged it, and it seems that ...

For RegEx, what are the advantages/disadvantages of using a UTF-8 string instead of unicode?

Usually, the best practice in Python, when using international languages, is to use unicode and to convert any input to unicode early and convert to a string encoding late (UTF-8 most of the time). But when I need to do RegEx on unicode I don't find the process really friendly. For example, if I need to find the 'é' character follow...

C# UTF-8 Encoding Problem

I've searched posts here on Stack Overflow, and read JoelOnSoftware's post on encoding, and now have a basic grasp of encoding issues. But I'm running into a problem with some character encoding coming from the Windows clipboard. The reproducible test is to use IE and select and copy the "Advertising Programs" text from the Google home...

Ñ not displayed in Google App Engine website

I'm using Google App Engine to build a website and I'm having problems with special characters. I think I've reduced the problem to these two code samples: request = urlfetch.fetch( url=self.WWW_INFO, payload=urllib.urlencode(inputs), method=urlfetch.POST, headers={'Content-Type': 'application/x-www-form-urlencode...