character-encoding

Validator: How to settle this error?

I get this error in the validator: Line 47, Column 187: character "&" is the first character of a delimiter but occurred as data …num, silver diamonds. cartier tiffany & Co. $18 WALKING LIBERTY DOLLARS $15… This message may appear in several cases: You tried to include the "<" character in your page: you should escape it as ...
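The validator message above is about a bare ampersand appearing in page text. A minimal sketch of escaping it, assuming the page text is produced by a Python script (the `snippet` variable is hypothetical and only reuses the text quoted in the error):

```python
import html

# Hypothetical page text taken from the error message above.
snippet = "cartier tiffany & Co. $18 WALKING LIBERTY DOLLARS"

# html.escape replaces &, < and > (and quotes by default) with character references.
escaped = html.escape(snippet)
print(escaped)  # cartier tiffany &amp; Co. $18 WALKING LIBERTY DOLLARS
```

Escaping should happen once, at output time; running already-escaped text through it again would produce `&amp;amp;`.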

One control returns contents as single-byte, another as double-byte?!

I have two CRichEditCtrls. One is part of a dialog template and is created automatically. When I call GetSelText on it, the bytes returned are one byte per char, i.e. I'll get back char *str={'a','n','d'}. The second control is created dynamically using the Create method, and the data returned by calling GetSelText is in 2-byte characters: cha...

Python: Convert Unicode to ASCII without errors

html = urllib.urlopen(link).read()
html.encode("utf8","ignore")
self.response.out.write(html)
Traceback (most recent call last): File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__ ha...

Unicode recognition: is it UTF-8, UTF-16, or anything else?

I'm using a PostgreSQL database with UTF-8 encoding. In it, the Unicode for the Marathi word pimpri looks like this: \u092A\u093F\u0902\u092A\u0930\u0940 \u0935\u093E\u0918\u0947\u0930\u0947. On the client side I wrote the code String tempString=Strings.toEscapedUnicode(strQueryString[1]); and it generates Unicode like this: u00E0\u00A4\u00AA\u00E0\u0...
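The two escape sequences above look related: E0 A4 AA are the UTF-8 bytes of U+092A. A small sketch, assuming (this is a guess, not confirmed by the question) that the client read the UTF-8 bytes as Latin-1 characters before escaping them:

```python
# The escaped text from the database, as quoted in the question.
word = "\u092A\u093F\u0902\u092A\u0930\u0940 \u0935\u093E\u0918\u0947\u0930\u0947"

# Assumed failure mode: UTF-8 bytes interpreted one byte per character (Latin-1).
utf8_bytes = word.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

# Escape every character the way a "\uXXXX" escaper would.
escaped = "".join(f"\\u{ord(ch):04X}" for ch in misread)
print(escaped[:24])  # \u00E0\u00A4\u00AA\u00E0

# Reversing the wrong step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
```

If that assumption holds, the fix belongs wherever the bytes are first turned into a string, not in the escaping call itself.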

Decoding html entities issue

I have the following line of text that I need to display <ul><li>Complementary to cleansing with HY-ÖL®<li>Especially for irritated and sensitive skin<li>Noticeably calms and relaxes the skin</ul> if I do the following html_entity_decode('<ul><li>Complementary to cleansing with HY-ÖL®<li>Especially for irritated and sensitive skin<l...

Which code page was used to encode this DOC document?

Hello, I got a bunch of .DOC documents. I'm not even positive they are Word documents, but even if they are, I need to open and parse them with e.g. Python to extract information from them. Problem is, I couldn't figure out how they were encoded: UltraEdit's Conversion function wouldn't correct the text no matter which encoding I tried. ...

FileStream and Encoding

I have a program that saves a text file using the stdio interface. It swaps the 4 MSBs with the 4 LSBs, except for the characters CR and/or LF. I'm trying to "decode" this stream using a C# program, but I'm unable to get the original bytes. StringBuilder sb = new StringBuilder(); StreamReader sr = new StreamReader("XXX.dat", Enco...
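The nibble swap described above can be sketched on raw bytes, with no text decoding involved. This is only an illustration of the scheme as stated in the question; the function name `swap_nibbles` is hypothetical:

```python
def swap_nibbles(data: bytes) -> bytes:
    """Swap the high and low 4 bits of every byte, leaving CR and LF untouched."""
    out = bytearray()
    for b in data:
        if b in (0x0D, 0x0A):
            # CR and LF pass through unchanged, as the question describes.
            out.append(b)
        else:
            out.append(((b & 0x0F) << 4) | (b >> 4))
    return bytes(out)

encoded = swap_nibbles(b"Hello\r\n")
decoded = swap_nibbles(encoded)  # the swap is its own inverse for this input
```

One caveat of the scheme itself: bytes 0xD0 and 0xA0 swap to 0x0D and 0x0A, so they cannot be told apart from real CR/LF when decoding. Reading the file through a text decoder before swapping is another likely way to lose the original bytes.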

[PHP] Correct character encoding

I'm currently scraping a website for various pieces of textual data (with permission, of course). The issue I'm seeing is that certain characters aren't correctly encoded in the process. This is particularly prominent with apostrophes ('): leading to characters such as: . Currently, I use the following code to convert various HTML entit...

Getting Marathi data from request.getParameter

In my request Queryparameter="आकुर्डी". When I'm trying following String strstring = request.getParameter("Queryparameter"); it gives "à¤à¤à¥à¤°à¥à¤¡à¥" while I want the string "आकुर्डी". How to get it? What is the problem here? ...
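The garbled result above is what percent-encoded UTF-8 looks like when it is decoded one byte per character. A sketch of the effect in Python, assuming (not confirmed by the question) that the parameter is being decoded as ISO-8859-1:

```python
from urllib.parse import quote, unquote

original = "आकुर्डी"

# Browsers typically send the parameter as percent-encoded UTF-8 bytes.
encoded = quote(original)

# Decoding those bytes as Latin-1 yields one junk character per byte.
wrong = unquote(encoded, encoding="latin-1")

# Decoding them as UTF-8 yields the intended text.
right = unquote(encoded, encoding="utf-8")

# A string that was already mis-decoded can be repaired by reversing the step.
repaired = wrong.encode("latin-1").decode("utf-8")
```

The cleaner fix is to make the server decode with UTF-8 in the first place; the repair step is only a workaround.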

Ruby String accent error: More than meets the eye

I'm having real trouble getting accents right, and I believe this may happen with most Latin languages; in my case, Portuguese. I have a string that comes in as a parameter, and I must get the first letter and upcase it! That should be trivial in Ruby, but here is the catch: s1 = 'alow'; s1.size #=> 4 s2 = 'álow'; s2.size #=> 5 s1[0,1] #=> "a"...
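The size of 5 above matches the UTF-8 byte count of 'álow' rather than its character count. The same distinction, sketched in Python rather than Ruby (the helper `upcase_first` is hypothetical):

```python
import unicodedata

def upcase_first(text: str) -> str:
    """Uppercase the first character, composing accents first."""
    # NFC merges a base letter plus combining accent into one code point,
    # so slicing off the first character keeps the accent attached.
    composed = unicodedata.normalize("NFC", text)
    return composed[:1].upper() + composed[1:]

s2 = "\u00E1low"  # 'álow' with a precomposed á

print(len(s2))                  # 4  (characters)
print(len(s2.encode("utf-8")))  # 5  (bytes: á takes two)
print(upcase_first(s2))         # Álow
```

Working on characters instead of bytes is the key point; the normalization step only matters when the input uses a combining accent.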

How can I convert an integer into a Unicode string in C?

I am working on the Firmware for an embedded USB project. The production programmer I would like to use automatically writes the Serial Number into the device flash memory at a specified memory address. The programmer stores the serial number as Hex digits in a specified number of bytes. For example, if I tell it to store the serial numb...

weird character in DB resultset

So I'm running a query against my database and looping through the results and getting something like- F.�B.�Webster�Day� in most of the results. Those should be spaces but it must've been something weird during the import/conversion (damn you M$). Is there a quick query I can run against the DB to remove all of those and replace the...

Unicode character sets & encoding in browsers

I'm trying to find out how character sets/encoding are implemented in browsers, specifically Unicode. Are sets/encodings implemented separately in each browser or is it OS specific? Is it possible to find out what version of the Unicode Character Db (UCD) is being used? How are UCD updates pushed to each browser/OS? (Is it ever pushed ...

Curved Quotes and other Characters becoming ? in IIS7

Recently moved some web sites to new hardware running MS Server 2008. After the move, all of my "curved quotation marks" and other "different" characters are now displaying as question marks. I can do a simple find and replace, changing curved marks with normal ones, but as new content is added, I'm finding this stuff everywhere. I mu...

Python string decoding issue

I am trying to parse a CSV file containing some data, mostly numeric but with some strings whose encoding I do not know, though I do know they are in Hebrew. Eventually I need to know the encoding so I can convert the strings to Unicode, print them, and perhaps throw them into a database later on. I tried using Chardet, which claims the stri...

Character encoding problem

Hi, I was recently editing a Unicode-encoded text file that also includes Thai characters (alongside "normal" characters). For some reason, after each sequence of Thai characters, a new line appeared. After some mucking around with C, trying to remove all newline characters, I fired up vim to inspect the file. Apparently, after each Th...

Programmatically determine the number of strokes in a Chinese character?

Does Unicode store this information about characters? ...

VB.Net MailMessage text encoding issue

I have an ASP.Net app that allows a user to write text into a Telerik RadEditor control and then send an email. For some reason I'm sometimes getting strange characters showing up in the email that is generated. For example if I put the word Test’s into the RadEditor box and send it... the email shows up with the text changed to: Testâ...

Sending multilingual email. Which charset should I use?

I want to send emails in a number of languages (en/es/fr/de/ru/pl). I notice that Gmail uses the KOI8-R charset when sending emails containing Cyrillic characters. Can I just use KOI8-R for ALL my emails, or is there any reason to select a particular charset for each language? ...
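One way to test the idea above is to check which charsets can actually represent the text. A rough sketch using Python's standard library; the addresses, sample body, and the `fits_charset` helper are all made up for illustration:

```python
from email.message import EmailMessage

def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical body mixing the languages listed in the question.
body = "Hello / Hola / Bonjour / Grüße / Привет / Cześć"

koi8_ok = fits_charset(body, "koi8-r")  # False: no ü, ß, ś or ć in KOI8-R
utf8_ok = fits_charset(body, "utf-8")   # True: UTF-8 covers all of Unicode

msg = EmailMessage()
msg["Subject"] = "Multilingual test"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content(body, charset="utf-8")
```

KOI8-R handles the Russian text on its own but not the accented Latin letters, so a single UTF-8 charset is the simpler choice when the languages are mixed.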

Java: How to detect (and change?) encoding of System.console ?

I have a program which runs on a console and its Umlauts and other special characters are being output as ?'s on Macs. Here's a simple test program: public static void main( String[] args ) { System.out.println("höhößüä"); System.console().printf( "höhößüä" ); } On a default Mac console (with default UTF-8 encoding), this prin...