character-encoding

CakePHP is truncating a text field, probably encoding related

Here's what I'm trying to do: I'm parsing incoming email and using it to create posts in the system. This works almost completely, but there are a few bugs to work out. The one that's currently giving me fits: when an email contains certain characters (for example, ® – “ ”), the email body is being truncated at the special cha...

Using Unicode with PHP

How do I use Unicode with PHP? I want to store a Unicode value in a PHP variable, but it outputs question marks. What is the solution? ...

Python - letter frequency count and translation.

Hi, I am using Python 3.1, but I can downgrade if needed. I have an ASCII file containing a short story written in one of the languages the alphabet of which can be represented with upper and or lower ASCII. I wish to: 1) Detect an encoding to the best of my abilities, get some sort of confidence metric (would vary depending on the len...

Applying utf8_encode to ob_end_flush()

I have a script which produces text output. That script grabs content from a MySQL database encoded as latin1_general_ci. Including that script in a HTML page marked as iso-8859-1 works fine. How do I capture the output of this script and include it in a HTML page encoded in utf-8? I have attempted to capture the output of the script u...

Special Characters in JavaScript not displaying properly on website

Hi, My IE and Chrome browsers are not displaying the French phrases correctly when I go from a French phrase (onload function) to a English phrase (onmousedown function) and back to a French phrase (onmouseup function). When I let up on the mouse of a particular phrase it goes back to French but the special characters for ô and é (which...

Lexers/tokenizers and character sets

When constructing a lexer/tokenizer, is it a mistake to rely on C functions such as isdigit/isalpha/...? They are locale-dependent as far as I know. Should I pick a character set and concentrate on it and make a character mapping myself from which I look up classifications? Then the problem becomes being able to lex multiple chara...

Python returning the wrong length of string when using special characters

I have a string ë́aúlt whose length I want to get and which I want to manipulate based on character positions and so on. The problem is that the first ë́ is being counted twice, or I guess ë is in position 0 and ´ is in position 1. Is there any possible way in Python to have a character like ë́ be represented as 1? I'm using UTF-8 encoding for the...
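One way to think about the counting problem above is to skip combining marks when measuring the string. The following is a minimal sketch, assuming Python 3 `str` values and assuming the first character is stored as ë (U+00EB) followed by a combining acute accent (U+0301); the helper name `visible_length` is hypothetical and not part of the question.

```python
import unicodedata

def visible_length(text: str) -> int:
    """Count code points that are not combining marks."""
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for marks such as U+0301.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# Assumed composition: "ë" + combining acute accent, then "aúlt".
word = "\u00eb\u0301a\u00falt"

print(len(word))             # number of code points
print(visible_length(word))  # number of base characters
```

This only handles combining marks. It is not full grapheme-cluster segmentation, so sequences such as emoji joined with zero-width joiners would still be counted as several characters.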

How to detect which character set encoding in Java?

Does anybody know if there is a simple way to detect character set encoding in Java? It seems to me that some programs have the ability to detect which character set a given piece of data uses, or at least make an approximation. I suppose the underlying mechanism would have to decode the data in each character set and pick whichever one...

Strange occurrence with string and special character

#include <iostream> #include <string> using namespace std; string mystring1, mystring2, mystring3 = "grové"; int main(){ mystring1 = "grové"; getline( cin, mystring2 ); //Here I type "grové" (without "") cout << "mystring1= " << mystring1 << endl; cout << "mystring2= " << mystring2 << endl; cout << "mystring3= " << mystring3...

How to write Cyrillic text in C++ console ?

For example, if I write: cout << "Привет!" << endl; // it's "hello" in Russian, the console shows something like "╧ЁштхЄ!". OK, I know that we can use: setlocale(LC_ALL, "Russian"); but after that, command-line arguments in Russian stop working (if I start my program through a BAT file): StartProgram.bat chcp 1251 MyProgram.exe -use...

Struts character encoding problem in response HTML

Hi Please consider the following scenario. I have a form with a property: class MyForm extends ActionForm{ String myProperty; ... // getter & setters here } I set this property in action class: class MyAction extends Action{ ... // execute method begins here myForm.setMyProperty("<b>Hello World</b>"); ... // execute...

How do I verify that a string is in English?

I read a string from the console. How do I make sure it only contains English characters and digits? ...
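The question does not name a language, so here is a minimal Python sketch under one stated assumption: "English characters and digits" means the ASCII letters A-Z and a-z plus 0-9, with no spaces or punctuation. The names `ALLOWED` and `is_english_alnum` are hypothetical.

```python
import string

# Assumption: only unaccented ASCII letters and digits count as "English".
ALLOWED = set(string.ascii_letters + string.digits)

def is_english_alnum(text: str) -> bool:
    """Return True if text is non-empty and uses only ASCII letters/digits."""
    return bool(text) and all(ch in ALLOWED for ch in text)

print(is_english_alnum("abc123"))
print(is_english_alnum("grové"))
```

An explicit allow-list is used instead of `str.isalnum()` because `isalnum()` also accepts accented and non-Latin letters.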

Haskell: Parsing escape characters in single quotes

I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function which works okay for m...

Encoding Problems With ID3 Tags

I have an ID3v1 tag that shows up in iTunes like: "It's Been A While". But when I read the tags with the libtag library "It¹s Been A While" comes out. Now when I open the file with a hex editor, I can see that it actually is 0xB9, which is ¹ in Latin-1 and UTF-8/16. So how does iTunes get a ’ from 0xB9? Any ideas? Is there any character en...

Are character encoding issues causing my Perl output to look like gibberish?

I'm running a Perl script (both with 5.8.4) on two different machines (one Solaris 5.10, the other OpenSolaris 5.11). The output of the two scripts differs in the following way: Solaris 5.10 $ perl myscript.pl is&#39; £ ä º &lt;ä ¼ sa ... ³ ä º žÃ ... ¬ å ¸ ç ¬ ¬ ä º ¤ § œâ is œâ ¡ä ¸ ‡ å ... æœ ¬ æœ ¬ å ¸ È, ¡ä »½ çš&quot; å ... ¬ ...

How to read the parameters and values from the query string using Java

Hi, I am using Ciui from Google Code and all the requests are GET requests, not POST. The calls are made by Ajax (I am not sure). I need to know how to read the "searchstring" parameter from this URL. When I read this in my servlet using the getQueryString() method, I am not able to properly form the actual text. This unicod...

Java UTF-8 encoding problem

I am using an HTML parser called HTMLCLEANER to parse HTML pages. The problem is that each page has a different encoding from the others. My question: can I convert from any character encoding to UTF-8? ...

What's the difference between encoding and charset?

I am confused about text encoding and charset. For many reasons, I have to learn non-Unicode, non-UTF8 stuff in my upcoming work. I find the word "charset" in email headers, as in "ISO-2022-JP", but there's no such encoding in text editors. (I looked around the different text editors.) What's the difference between text encoding a...

ASP.NET requestEncoding and responseEncoding UTF-8 or ISO-8859-1

In a Microsoft Security Document, in the Code Review section ( http://msdn.microsoft.com/en-us/library/aa302437.aspx ), it suggests setting the globalization.requestEncoding and globalization.responseEncoding to "ISO-8859-1" as opposed to "UTF-8" or another Unicode format. What are the downsides to using "ISO-8859-1"? In the past I've set ...

Problem encoding string to ISO8859-1

Hi, I'm using this code to convert a string to ISO8859-1: baseurl = "http://myurl.com/mypage.php" client = New WebClient client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)") client.QueryString.Add("usuario", user) client.Qu...