Here's what I'm trying to do:
I'm parsing incoming email and using it to create posts in the system. This almost works, but there are a few bugs to work out. The one currently giving me fits comes up when an email contains certain characters (for example, ® – “ ”): the email body is truncated at the special character...
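One common cause of this kind of truncation is treating the raw body as UTF-8 or ASCII when the message actually declares another charset (often Windows-1252) and a transfer encoding such as quoted-printable. The Python sketch below uses the standard `email` package to undo the transfer encoding and then decode with the charset the message declares. The `body_text` helper and the "first `text/plain` part" rule are assumptions for illustration, not the poster's code.

```python
from email import message_from_bytes


def body_text(raw: bytes) -> str:
    """Hypothetical helper: return the decoded body, preferring the
    first text/plain part of a multipart message."""
    msg = message_from_bytes(raw)
    part = msg
    if msg.is_multipart():
        part = next(
            (p for p in msg.walk() if p.get_content_type() == "text/plain"),
            None,
        )
        if part is None:
            return ""
    # decode=True undoes quoted-printable or base64 and returns bytes
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    # Fall back to ASCII only when the part declares no charset at all
    charset = part.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


raw = (
    b"Content-Type: text/plain; charset=windows-1252\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Registered =AE =96 =93quoted=94\r\n"
)
print(body_text(raw))
```

The key design choice is decoding with the declared charset rather than a hard-coded one; a charset name Python does not recognize would raise `LookupError`, which a production version should catch.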
How do I use Unicode with PHP?
I want to store a Unicode value in a PHP variable, but it outputs question marks.
What is the solution?
...
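Question marks usually mean that, somewhere between the variable and the screen, the text was converted to a character set that cannot represent those characters, and the converter substituted `?` for each one. The question is about PHP; the Python sketch below only reproduces that mechanism, assuming the target charset is ASCII or Latin-1.

```python
def to_charset(text: str, charset: str) -> bytes:
    # errors="replace" substitutes b"?" for every character the charset lacks
    return text.encode(charset, errors="replace")


print(to_charset("naïve ☃", "ascii"))    # both non-ASCII characters become ?
print(to_charset("naïve ☃", "latin-1"))  # ï exists in Latin-1, the snowman does not
print(to_charset("naïve ☃", "utf-8"))    # UTF-8 can represent every code point
```

The usual remedy is to use one encoding, typically UTF-8, end to end: in the source file, in every conversion step, and in the charset declared to the browser.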
Hi, I am using Python 3.1, but I can downgrade if needed.
I have an ASCII file containing a short story written in a language whose alphabet can be represented with upper and/or lower ASCII. I wish to:
1) Detect the encoding as well as I can and get some sort of confidence metric (would vary depending on the len...
I have a script that produces text output. It grabs content from a MySQL database whose character set is latin1 (collation latin1_general_ci). Including that script in an HTML page marked as ISO-8859-1 works fine.
How do I capture the output of this script and include it in an HTML page encoded in UTF-8?
I have attempted to capture the output of the script u...
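Since the stored text is Latin-1 and the target page is UTF-8, the captured output has to be transcoded: decode the bytes as ISO-8859-1, then encode the resulting text as UTF-8. The question does not say which language the script uses; the Python sketch below shows the two steps, with `captured` standing in (hypothetically) for the script's output.

```python
def latin1_to_utf8(latin1_output: bytes) -> bytes:
    # Every byte 0x00-0xFF is a valid Latin-1 character, so this decode never fails
    text = latin1_output.decode("iso-8859-1")
    return text.encode("utf-8")


captured = "Café crème, 20 °C".encode("iso-8859-1")  # stand-in for the script's output
print(captured)                  # é is the single byte 0xE9 here
print(latin1_to_utf8(captured))  # é becomes the two bytes 0xC3 0xA9
```

If the stored text contains characters such as curly quotes or €, decoding as `cp1252` may be more accurate, since MySQL's `latin1` is in practice Windows-1252.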
Hi,
My IE and Chrome browsers don't display the French phrases correctly when I go from a French phrase (onload function) to an English phrase (onmousedown function) and back to a French phrase (onmouseup function). When I release the mouse on a particular phrase, it goes back to French, but the special characters for ô and é (which...
When constructing a lexer/tokenizer, is it a mistake to rely on C functions such as isdigit/isalpha/...? As far as I know, they are locale-dependent. Should I pick a character set, concentrate on it, and build my own character mapping from which I look up classifications? Then the problem becomes being able to lex multiple chara...
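One way to make a lexer's classification independent of the locale is to define the character classes explicitly for the characters the language actually allows. The question is about C's `isdigit`/`isalpha`; the Python sketch below makes the same point with Python's Unicode-aware `str.isdigit`, which also accepts characters a typical lexer would reject. The ASCII-only grammar and the `is_ascii_digit`/`is_ident_start` names are hypothetical.

```python
import string

# Explicit, locale-independent classes for an assumed ASCII-only token grammar
ASCII_DIGITS = frozenset(string.digits)
IDENT_START = frozenset(string.ascii_letters + "_")


def is_ascii_digit(ch: str) -> bool:
    return ch in ASCII_DIGITS


def is_ident_start(ch: str) -> bool:
    return ch in IDENT_START


for ch in ["7", "\u0663", "\u00b2", "é"]:
    # str.isdigit/isalpha follow Unicode properties, not the language's grammar
    print(repr(ch), ch.isdigit(), is_ascii_digit(ch), ch.isalpha(), is_ident_start(ch))
```

For multiple input encodings, a common approach is to decode everything into one internal representation (for example Unicode code points) before lexing, so the classification tables only have to be written once.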
I have a string ë́aúlt whose length I want to get and which I want to manipulate based on character positions and so on. The problem is that the first ë́ is counted twice: I guess ë is at position 0 and ´ is at position 1.
Is there any way in Python to have a character like ë́ count as 1?
I'm using UTF-8 encoding for the...
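ë́ is most likely a base letter followed by combining marks (for example U+0308 and U+0301), so `len()` counts code points rather than user-perceived characters. NFC normalization helps only when a precomposed code point exists, and there is none for ë with an acute. A standard-library approximation is to attach every combining mark (Unicode category M*) to the preceding character. This is a simplification of the Unicode grapheme-cluster rules (UAX #29) and does not handle cases such as emoji ZWJ sequences or Hangul jamo; the third-party `regex` module's `\X` pattern implements the full rules.

```python
import unicodedata


def grapheme_clusters(text):
    """Approximate user-perceived characters: attach combining marks
    (category M*) to the preceding base character. Not full UAX #29."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "e\u0308\u0301au\u0301lt"  # ë́aúlt written with combining marks
print(len(word))                                # counts code points
print(len(unicodedata.normalize("NFC", word)))  # ë and ú compose; the acute on ë does not
print(grapheme_clusters(word))
print(len(grapheme_clusters(word)))
```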
Does anybody know if there is a simple way to detect the character set encoding of data in Java? It seems to me that some programs can detect which character set a given piece of data uses, or at least make an approximation.
I suppose the underlying mechanism would have to decode the data in each character set and pick whichever one...
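The mechanism described above, decoding with each candidate and keeping the ones that succeed, can be sketched with the standard library alone. The question asks about Java (where a `CharsetDecoder` configured with `CodingErrorAction.REPORT` gives the same strict decoding); this sketch uses Python, and the candidate list is an assumption. A successful decode only proves the bytes are *valid* in that encoding, not that it is the right one; detectors such as ICU's `CharsetDetector` add statistical scoring on top.

```python
# Assumed candidates, ordered from strictest to most permissive, so the first
# match is the most specific guess. Latin-1 accepts every byte sequence.
CANDIDATES = ["ascii", "utf-8", "cp1252", "iso-8859-1"]


def plausible_encodings(data: bytes, candidates=CANDIDATES) -> list:
    """Return the candidates that decode `data` without error."""
    matches = []
    for enc in candidates:
        try:
            data.decode(enc)  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        matches.append(enc)
    return matches


print(plausible_encodings("grové".encode("utf-8")))
print(plausible_encodings("grové".encode("cp1252")))
```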
#include <iostream>
#include <string>
using namespace std;
string mystring1, mystring2, mystring3 = "grové";
int main(){
mystring1 = "grové";
getline( cin, mystring2 ); //Here I type "grové" (without "")
cout << "mystring1= " << mystring1 << endl;
cout << "mystring2= " << mystring2 << endl;
cout << "mystring3= " << mystring3...
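A likely explanation is that the literal `"grové"` is stored in the source file's encoding while the typed input arrives in the console's code page, so the same visible word ends up as different bytes. The Python sketch below only illustrates this, assuming a UTF-8 source file and a Windows console using code page 850; the real encodings depend on the compiler and console settings.

```python
literal_bytes = "grové".encode("utf-8")  # what a UTF-8 source file would contain
console_bytes = "grové".encode("cp850")  # what a CP850 console would deliver

print(literal_bytes)  # é takes two bytes in UTF-8
print(console_bytes)  # é is a single byte in CP850
print(literal_bytes == console_bytes)  # a byte-wise std::string comparison differs the same way
```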
For example, if I write:
cout << "Привет!" << endl; //it's hello in Russian
in the console it shows up as something like "╧ЁштхЄ!"
OK, I know that I can use:
setlocale(LC_ALL, "Russian");
but after that, Russian command-line arguments stop working (if I start my program through a BAT file):
StartProgram.bat
chcp 1251
MyProgram.exe -use...
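The garbled text is exactly what CP1251 bytes look like when a console interprets them as CP866, the default OEM code page of a Russian Windows console. Assuming the compiler stores the literal as CP1251, Python reproduces the output from the question:

```python
text = "Привет!"

# Assumed: the literal is stored as CP1251 bytes,
# and the console decodes whatever it receives as CP866.
cp1251_bytes = text.encode("cp1251")
shown_in_console = cp1251_bytes.decode("cp866")

print(shown_in_console)
```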
Hi
Please consider the following scenario. I have a form with a property:
class MyForm extends ActionForm{
String myProperty;
... // getter & setters here
}
I set this property in the action class:
class MyAction extends Action{
... // execute method begins here
myForm.setMyProperty("<b>Hello World</b>");
... // execute...
I read a string from the console. How do I make sure it contains only English letters and digits?
...
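A whitelist check is the simplest approach. Unicode-aware predicates such as Python's `str.isalnum` also accept letters like é and digits like ٣, so the sketch below spells out the allowed ASCII ranges explicitly. The question does not name a language; the Python version and the `is_english_alnum` name are assumptions.

```python
import re

# Only A-Z, a-z and 0-9; fullmatch requires the whole string to match
_ENGLISH_ALNUM = re.compile(r"[A-Za-z0-9]+")


def is_english_alnum(s: str) -> bool:
    return _ENGLISH_ALNUM.fullmatch(s) is not None


print(is_english_alnum("abc123"), is_english_alnum("grové"), "grové".isalnum())
```

With `input()` the trailing newline is already stripped; with `sys.stdin.readline()` it is not, so strip it first. The `+` rejects the empty string; use `*` if empty input should pass.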
I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function, which works okay for m...
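The escape handling boils down to a lookup table from the character after the backslash to the character it denotes (in Haskell, an association list used with `lookup` would play that role). The sketch below is in Python rather than Haskell; the escape set (only `\n`, `\t`, `\\` and `\'`) is an assumption, and `scan_char_literal` is a hypothetical helper that returns the character and the index just past the closing quote.

```python
# Assumed escape set; extend it to match the language being compiled
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'"}


def scan_char_literal(src: str, i: int) -> tuple:
    """Scan a literal starting at src[i] == "'"; return (char, next_index)."""
    if src[i] != "'":
        raise ValueError("expected opening quote")
    j = i + 1
    if j < len(src) and src[j] == "\\":
        # Escape sequence: backslash plus one character from the table
        if j + 1 >= len(src) or src[j + 1] not in ESCAPES:
            raise ValueError(f"bad escape at {j}")
        ch, j = ESCAPES[src[j + 1]], j + 2
    elif j < len(src) and src[j] not in ("'", "\n"):
        ch, j = src[j], j + 1
    else:
        raise ValueError(f"empty or unterminated literal at {i}")
    if j >= len(src) or src[j] != "'":
        raise ValueError(f"missing closing quote at {j}")
    return ch, j + 1
```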
I have an ID3v1 tag that shows up in iTunes as "It's Been A While", but when I read the tags with the libtag library, "It¹s Been A While" comes out. When I open the file in a hex editor, I can see that the byte actually is 0xB9, which is ¹ in Latin-1 (code point U+00B9 in Unicode). So how does iTunes get a ’ from 0xB9? Any ideas? Is there any character en...
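One way to investigate is to decode the byte under several candidate encodings and see which ones yield ’ (U+2019). The helper below does that; the candidate list is an assumption, and no particular result for 0xB9 is claimed here. As a sanity check, 0x92 is ’ in Windows-1252, while in Latin-1 it is an invisible C1 control character.

```python
TAG_CANDIDATES = ["latin-1", "cp1252", "mac_roman", "cp437", "utf-8"]


def codecs_mapping(raw: bytes, wanted: str, candidates=TAG_CANDIDATES) -> list:
    """Return the candidate encodings in which `raw` decodes to `wanted`."""
    hits = []
    for enc in candidates:
        try:
            if raw.decode(enc) == wanted:
                hits.append(enc)
        except UnicodeDecodeError:
            pass  # e.g. a lone 0xB9 is not valid UTF-8
    return hits


print(bytes([0xB9]).decode("latin-1"))   # superscript one, as libtag reports
print(codecs_mapping(b"\x92", "\u2019"))  # sanity check with a known mapping
print(codecs_mapping(b"\xb9", "\u2019"))  # the byte from the tag
```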
I'm running a Perl script (with Perl 5.8.4 on both) on two different machines (one Solaris 5.10, the other OpenSolaris 5.11). The output of the two runs differs as follows:
Solaris 5.10
$ perl myscript.pl
is' £ ä º <ä ¼ sa ... ³ ä º žÃ ... ¬ å ¸ ç ¬ ¬ ä º ¤ § œâ is œâ ¡ä ¸ ‡ å ... æœ ¬ æœ ¬ å ¸ È, ¡ä »½ çš" å ... ¬ ...
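Output like this is the typical signature of UTF-8 bytes displayed as if they were Latin-1 or Windows-1252: each multi-byte character turns into two or three Latin characters such as `ä`, `å` or `æ`. That the two machines' locales or terminals interpret the same bytes differently is a plausible explanation, but an assumption. The Python sketch reproduces the effect with CJK text and shows it is reversible as long as no bytes were lost.

```python
original = "中文"  # CJK text; its UTF-8 encoding uses three bytes per character

# Misinterpretation: UTF-8 bytes decoded as Latin-1
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Repair: re-encode with the wrong codec to recover the bytes, then decode correctly
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

On the Perl side, the usual controls are `use utf8` for literals in the source and an output layer such as `binmode(STDOUT, ':encoding(UTF-8)')`.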
Hi,
I am using Ciui from Google Code, and all the requests are GET requests, not POST. The calls are made via Ajax (I am not sure). I need to know how to read the "searchstring" parameter from this URL. When I read it in my servlet using the getQueryString() method, I am not able to reconstruct the actual text properly. This unicod...
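`getQueryString()` returns the query string as sent, still percent-encoded, so non-ASCII text has to be URL-decoded with the charset the client used. JavaScript's `encodeURIComponent` always percent-encodes UTF-8, so UTF-8 is the likely choice for Ajax requests; in a servlet, `java.net.URLDecoder.decode(value, "UTF-8")` performs this step. The Python sketch shows the same decoding on a made-up query string.

```python
from urllib.parse import parse_qs, quote

# Hypothetical request: what encodeURIComponent("café") would put in the URL
query = "searchstring=" + quote("café", encoding="utf-8")
print(query)  # still percent-encoded, which is what getQueryString() hands back

params = parse_qs(query, encoding="utf-8")
print(params["searchstring"][0])
```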
I am using an HTML parser called HtmlCleaner to parse HTML pages.
The problem is that each page has a different encoding.
My question:
Can I convert from any character encoding to UTF-8?
...
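Yes, provided you know (or can reliably guess) the source encoding: decode the bytes with that encoding, then re-encode as UTF-8. For HTML, the encoding is usually declared in the HTTP `Content-Type` header or in a `<meta>` tag. The Python sketch reads only the `<meta>` declaration with a simplified regular expression and falls back to an assumed default; HtmlCleaner itself is a Java library and is not used here, and `page_to_utf8` is a hypothetical name.

```python
import re

# Simplified: matches <meta charset="..."> and content="text/html; charset=..."
_META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def page_to_utf8(raw: bytes, default: str = "windows-1252") -> bytes:
    """Re-encode an HTML page as UTF-8 using its <meta> charset."""
    m = _META_CHARSET.search(raw[:4096])  # declarations normally appear near the top
    charset = m.group(1).decode("ascii") if m else default
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset label in the page
        text = raw.decode(default, errors="replace")
    return text.encode("utf-8")
```

After converting, any `<meta>` charset declaration in the page still names the old encoding, so it should be updated before the page is served as UTF-8.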
I am confused about text encodings and charsets. For many reasons, I have to learn non-Unicode, non-UTF-8 stuff for my upcoming work.
I see the word "charset" in email headers, as in "ISO-2022-JP", but there's no such encoding in text editors. (I looked through several text editors.)
What's the difference between text encoding a...
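A concrete way to see the distinction: the same Japanese text, drawn from the JIS X 0208 character set, produces different bytes under ISO-2022-JP, Shift_JIS and EUC-JP, which are different encodings of (largely) that character set. The MIME `charset` parameter actually names such an encoding, which is part of why the terms get mixed up. ISO-2022-JP is a 7-bit, stateful encoding that switches between ASCII and JIS X 0208 with escape sequences, which made it suitable for email. Python ships codecs for all three:

```python
text = "日本"  # both characters are in the JIS X 0208 character set

for enc in ["iso2022_jp", "shift_jis", "euc_jp", "utf-8"]:
    data = text.encode(enc)
    print(f"{enc:10} {data!r}")

# ISO-2022-JP is stateful: ESC $ B switches to JIS X 0208, ESC ( B back to ASCII
jis = text.encode("iso2022_jp")
print(jis)
```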
In a Microsoft security document, in the Code Review section ( http://msdn.microsoft.com/en-us/library/aa302437.aspx ), it suggests setting globalization.requestEncoding and globalization.responseEncoding to "ISO-8859-1" as opposed to "UTF-8" or another Unicode format.
What are the downsides of using "ISO-8859-1"? In the past I've set ...
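The main downside is coverage: ISO-8859-1 has only 256 code points, so anything outside Western European Latin script cannot be represented, including the euro sign and the typographic quotes many users type. How ASP.NET then handles such characters is not shown here; the Python sketch just checks which sample strings survive each encoding without loss.

```python
def representable(text: str, encoding: str) -> bool:
    """True if `text` can be encoded without loss (strict mode raises otherwise)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = ["café", "price: 5 €", "“quoted”", "Привет", "日本"]
for s in samples:
    print(f"{s!r:16} latin-1={representable(s, 'iso-8859-1')}  utf-8={representable(s, 'utf-8')}")
```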
Hi
I'm using this code to convert a string to ISO-8859-1:
baseurl = "http://myurl.com/mypage.php"
client = New WebClient
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)")
client.QueryString.Add("usuario", user)
client.Qu...
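Percent-encoding a query-string value for a server that expects ISO-8859-1 means encoding each non-ASCII character as its single Latin-1 byte instead of its UTF-8 bytes. The original code is VB.NET, where `HttpUtility.UrlEncode(value, Encoding.GetEncoding("ISO-8859-1"))` is one option; the Python sketch shows the difference in the resulting query string. The `usuario` key comes from the question, and the value is made up.

```python
from urllib.parse import urlencode

params = {"usuario": "josé"}  # hypothetical value

print(urlencode(params))                         # default: UTF-8 percent-encoding
print(urlencode(params, encoding="iso-8859-1"))  # é as the single Latin-1 byte
```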