utf-8

Output DataTable XML in UTF8 rather than UTF16

Hi, I have a DataTable that I'm writing to an XML file using .WriteXML(..), but it exports in UTF-16 encoding and there seems to be no way of changing this. I understand that .NET uses UTF-16 internally for strings; is this correct? I'm then running the XML that DataTable.WriteXML() produces...

Problem with encode decode. Python. Django. BeautifulSoup

In this code: soup=BeautifulSoup(program.Description.encode('utf-8')) name=soup.find('div',{'class':'head'}) print name.string.decode('utf-8') an error occurs when I try to print or save to the database. It doesn't matter what I do: print name.string.encode('utf-8') or just print name.string Traceback (most recent ca...

Do I really need to encode '&' as '&amp;'?

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles. http://validator.w3.org is giving me this: & did not start a character reference. (& probably should have been escaped as &amp;.) Do I really need to do &amp;? I'm not fussed abo...

Funny characters in my db

My web app is breaking when I try to edit a certain content type, and I'm pretty sure it is because of some weird characters in my database. So when I do: SELECT body FROM message WHERE id = 666 it returns: <p>⢠<span></span></p><p><br /></p><p><em><strong>NOTE:</strong> Please remember to use your to participate in the discussion.</em>...

Javascript object serialisation with toSource() converts special chars to hex code - how to reverse?

If I'm converting a simple JavaScript object to a string, all special chars will be converted to hex code. function O() { this.name = "<üäö!"; } var myObject = new O(); console.log(myObject.toSource()); Result: {name:"<\xFC\xE4\xF6!"} How would I avoid this or convert all hex chars back to utf8 chars? ...
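The \xNN escapes in the toSource() output are code points in the Latin-1 range, so reversing them amounts to replacing each escape with the character at that code point. A minimal sketch in Python (not the JavaScript of the question); the helper name unescape_hex is hypothetical, and only two-digit \xNN escapes are handled:

```python
import re

# Matches a literal backslash, "x", and exactly two hex digits, e.g. \xFC.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")

def unescape_hex(text: str) -> str:
    """Replace each \\xNN escape with the character whose code point is NN."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# The serialized form quoted in the question, as a raw string.
escaped = r'{name:"<\xFC\xE4\xF6!"}'
restored = unescape_hex(escaped)
print(restored)
```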

Problems with utf8, php, and mysql

I'm fetching the JSON timeline from twitter and parsing it through PHP. I then want to store the text in my database. The PHP script is in UTF8, I set the header to utf8 using this code, just in case: header('Content-type: text/html; charset=UTF-8'); The table in the database uses utf8_general_ci, ... Not even encoding the text usin...

DOMDocument encoding problems / characters transformed

I am using DOMDocument to manipulate / modify HTML before it gets output to the page. This is only an HTML fragment, not a complete page. My initial problem was that all French characters got messed up, which I was able to correct after some trial and error. Now it seems only one problem remains: the ' character gets transformed into ? . Th...

Unicode issue with an HTML Title, question mark? &#65533;

Hi all, I'm trying to parse the title from the following webpage: http://kid37.blogger.de/stories/1670573/ When I use the apache.commons.lang StringEscapeUtils.escapeHTML method on the title element I get the following: Das hermetische Caf&#65533;: Rock &amp; Wrestling 2010 However, when I display that in my webpage with utf-8 encodin...
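The value 65533 is U+FFFD, the Unicode replacement character, which decoders emit when bytes are not valid in the assumed encoding. A minimal sketch in Python, assuming (the excerpt does not confirm this) that the page is served in ISO-8859-1 but decoded as UTF-8:

```python
# "Café" encoded as ISO-8859-1: the é is the single byte 0xE9.
raw = "Caf\u00e9".encode("iso-8859-1")

# Decoding as UTF-8 fails on 0xE9, so errors="replace" substitutes U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in keeps the é.
right = raw.decode("iso-8859-1")

print(ord(wrong[-1]))  # code point of the substituted character
print(right)
```

If that assumption holds, the fix belongs at the point where the page bytes are decoded, before any HTML escaping takes place.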

Does Zend Framework application need mbstring for UTF8 support?

I'm building a web app in Zend Framework that needs UTF8 support for all languages. This seems to work fine except for functions like stripslashes and such. On this URL, they talk about using mbstring: http://developer.loftdigital.com/blog/php-utf-8-cheatsheet Is it necessary to use mbstring on my server and replace ALL occurrences of U...

Problem writing UTF-8 encoded file in PHP

Hi all, I have a large file that contains world countries/regions that I'm separating into smaller files based on individual countries/regions. The original file contains entries like: EE.04 Järvamaa EE.05 Jõgevamaa EE.07 Läänemaa However, when I extract that and write it to a new file, the text becomes: EE.04 Järvamaa EE...
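Garbling of this kind typically appears when UTF-8 bytes are decoded with a single-byte encoding. A minimal sketch in Python (the question itself is about PHP), assuming the wrong decoder is Latin-1; the helper names garble and repair are hypothetical:

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as Latin-1 (the mistake)."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Undo the mistake: recover the bytes via Latin-1, decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

original = "J\u00e4rvamaa"  # a-umlaut written as an escape for clarity
broken = garble(original)

print(broken)                      # the two-character garbled form of the umlaut
print(repair(broken) == original)  # the round trip is lossless with Latin-1
```

In Python, passing an explicit encoding when writing, as in open(path, "w", encoding="utf-8"), avoids relying on the platform default.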

php & mysql converting non- to unicode

I have characters like these on our web site: Fémnyomó That is a street address, entered in another language (though I do not know which). Here's the db setup: mysql 4.1.2log charset cp1252 West European (latin1) I'm using PHP but without mbstrings() (though I do no string conversions on this address, just echo). If I changed...

Convert UTF8 string into numeric values in Perl

For example, my $str = '中國c'; # Chinese language of china I want to print out the numeric values 20013,22283,99 ...
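The numbers asked for are the Unicode code points of the characters. A minimal sketch in Python rather than Perl, assuming the string is already decoded text and not raw bytes:

```python
text = "中國c"

# ord() returns the Unicode code point of a single character.
code_points = [ord(ch) for ch in text]
print(",".join(str(cp) for cp in code_points))

# Starting from UTF-8 bytes instead, decode first, then take code points.
raw = text.encode("utf-8")
assert [ord(ch) for ch in raw.decode("utf-8")] == code_points
```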

What kind of utf8 encoding is being used in members of String class in Java?

String class has a constructor: new String(byte[] bytes, Charset charset) and a method: byte[] getBytes(Charset charset) Given that I define my charset as follows: Charset charset = Charset.forName("UTF-8"); What kind of encoding will I in fact use? More specifically, is it standard UTF-8 (as described in RFC 3629), or CESU-...

Insert a ♥ into MySQL (heart character) via PHP

I'm having a heck of a time getting ♥ type characters into my database using php. I've got UTF-8 setting on the page <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> and <?php $line = $_REQUEST['line']; $line = stripslashes($line); $line = htmlspecialchars($line); $line = trim($line); $line = mysql_real_escape...

strpos searching for unicode in PHP (and handling inline UTF-8)

I am having a problem with a simple search for a two-character Unicode string (the needle) inside another string (the haystack) that may or may not be UTF-8. Part of the problem is I don't know how to specify the code for use in strpos, and I don't know if PHP has to be compiled with any special support for the code, or if I have...

How do I use Unicode Character Combining with Kanji/Hanzi ?

I'm trying to find a workaround to display old and rare characters in unicode using character combining. Currently I'm converting some dictionaries from EPWING into text and there are 36 different characters which cannot be reproduced using normal UTF-8. Below is the problem section of the epwing gaiji to unicode mappings for one of the ...

PHP DOM UTF-8 problem

First of all, my database uses Windows-1250 as its native charset. I am outputting the data as UTF-8. I'm using the iconv() function all over my website to convert Windows-1250 strings to UTF-8 strings and it works perfectly. The problem is when I'm using PHP DOM to parse some HTML stored in the database (the HTML is an output from a WYSIWYG edi...

NSString and UTF8 Hex Conversion

In this function, I get the selected emoticon from NSTableView from the NSArrayController connected to an IBOutlet called emotes. The string in the selected NSArray contains UTF8 characters that are sent to the clipboard. // Get Selected Emoticon NSArray * selectedemote = [emotes selectedObjects]; NSLog(@"%@",[selectedemote valueForKey:@...

Conflict between UTF-8 normalized forms of encoding for accents

Hello, I've got a bug with UTF-8 normalizations: as far as I understand, there are (at least) two ways to write an 'é' in UTF-8: CC 81 and C3 A9. [After a migration from Mac/OSX to a PC/Linux] I now have a conflict between the paths I store in my database and the actual file system structure, which prevents me from accessing correctly...
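The two byte sequences correspond to Unicode normalization forms: C3 A9 is the precomposed é (NFC), while CC 81 is the combining acute accent that follows a plain e in the decomposed form (NFD). A minimal sketch in Python of comparing names regardless of form; the helper same_name is hypothetical, and normalizing to NFC is one possible choice, not a requirement:

```python
import unicodedata

composed = "\u00e9"                                  # é as one code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# Hex dump of the UTF-8 bytes of each form.
print(composed.encode("utf-8").hex())
print(decomposed.encode("utf-8").hex())

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)           # plain comparison sees different strings
print(same_name(composed, decomposed))  # normalized comparison treats them as equal
```

Normalizing both the stored paths and the names read from the file system to the same form before comparing is one way to make such lookups consistent.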

Internet Explorer 6 : Euro sign not displayed on ajax request

Hello, I'm using jQuery 1.4.2 to send an ajax request to a PHP page and then display the result. This works fine with FF3 and IE8, but in IE6 the character € is replaced by a square. I tried to force the character encoding of the PHP page using header() but it didn't work... I'm working on Windows with Zend Studio for Eclipse (project enco...