character-encoding

How do I convert between ISO-8859-1 and UTF-8 in Java?

Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java? I'm getting a string from the web and saving it in the RMS (J2ME), but I want to preserve the special chars and get the string from the RMS but with the ISO-8859-1 encoding. How do I do this? ...

MySQL error: "Column 'columnname' cannot be part of FULLTEXT index"

Hello. Recently I changed a bunch of columns to utf8_general_ci (the default UTF-8 collation), but when attempting to change a particular column, I received the MySQL error: Column 'node_content' cannot be part of FULLTEXT index. In looking through the docs, it appears that MySQL has a problem with FULLTEXT indexes on some multi-byte chars...

Detecting Characters in an XSLT

I have encountered some odd characters that do not display properly in Internet Explorer, such as these: “, –, and ’. I think they're carried over from copy-and-paste Word content. I am using XSLT to build the page content and it would be great to detect these characters in the XSLT and replace them with valid HTML codes. I alread...

C#: Cycle through encodings

I am reading files in various formats and languages and I am currently using a small encoding library to attempt to detect the proper encoding (http://www.codeproject.com/KB/recipes/DetectEncoding.aspx). It's pretty good, but it still misses occasionally. (Multilingual files) Most of my potential users have very little understandi...

How to display Japanese characters on a php page?

I'm trying to display Japanese characters on a PHP page. No loading from the database, just stored in a language file and echo'ed out. I'm running into a weird scenario. I have the page properly set up with UTF-8 and I test a sample page on my local WAMP server and it works. The moment I tested it out on our development and production ...

External JavaScript character encoding on WebSphere

Hello. How can one set character encoding on external JavaScript files using only Websphere (5.1)? I don't have Apache in front of it so I can't set it using "AddCharset UTF-8 .js". Or maybe there is some other way to force it on a web container via web.xml or similar magic? ...

French Text stored in SQL Server Appears Wrong, how to make it appear correctly

I've got a table that is comprised of the following structure. CREATE TABLE [dbo].[tblData]( [ID] [numeric](18, 0) NOT NULL, [QID] [varchar](25) NOT NULL, [Data] [nvarchar](255) NULL, CONSTRAINT [PK_tblData] PRIMARY KEY CLUSTERED ( [ID] ASC, [QID] ASC ) ) ON [PRIMARY] In the above example, the column...

php mysql character set: storing html of international content

I'm completely confused by what I've read about character sets. I'm developing an interface to store French text formatted in HTML inside a MySQL database. What I understood was that the safe way to have all French special characters displayed properly would be to store them as utf8. So I've created a MySQL database with utf8 specified ...

gedit cannot open shoes2.run

Hello, I've just downloaded Shoes but can't get it out of the box. I double-clicked on shoes2.run in Ubuntu Intrepid and gedit opened with the following message: " Could not open the file /home/mark/Marks files/2…ng/Programming/shoes2.run using the Unicode (UTF-8) character coding. Please check that you are not trying to open a binary fi...

How to guess the encoding of a file with no BOM in .NET?

I'm using the StreamReader class in .NET like this: using( StreamReader reader = new StreamReader( @"c:\somefile.html", true ) ) { string filetext = reader.ReadToEnd(); } This works fine when the file has a BOM. I ran into trouble with a file with no BOM .. basically I got gibberish. When I specified Encoding.Unicode it worked fine...

Get a calculated length of a Core Foundation string given an encoding

Is there a way to get the length in bytes of a CFString given an arbitrary character encoding? It seems possible because the function CFStringGetSmallestEncoding must do some calculations already, but I don't want to use the smallest encoding, I want to find out how big a buffer I might need to allocate if I want the bytes in UTF-8 encod...

How to set standard encoding in Visual Studio

I am searching for a way to set up Visual Studio so it always saves my files in UTF-8. I have only found options to set this project-wide. Is there a way to set it Visual Studio-wide? ...

String Comparison, .NET and non breaking space

I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing space '32' to non-breaking space '160'. To the user they look the same and so they expect a match. But when the app does the ...
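
The mismatch described above can be reproduced and worked around by normalizing both strings before comparing them. What follows is a minimal Python sketch rather than the asker's C# code; the helper names `normalize_spaces` and `spaces_equal` are hypothetical, and it assumes that Unicode compatibility normalization (NFKC), which maps NO-BREAK SPACE (U+00A0) to SPACE (U+0020), is acceptable for the data being compared.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    # NFKC replaces compatibility characters with their plain equivalents;
    # NO-BREAK SPACE (code 160) becomes an ordinary SPACE (code 32).
    return unicodedata.normalize("NFKC", text)


def spaces_equal(a: str, b: str) -> bool:
    # Hypothetical helper: compare two strings after normalization.
    return normalize_spaces(a) == normalize_spaces(b)


plain = "hello world"        # ordinary space, code 32
nbsp = "hello\u00a0world"    # non-breaking space, code 160

raw_match = plain == nbsp                     # direct comparison
normalized_match = spaces_equal(plain, nbsp)  # comparison after NFKC
```

NFKC also folds other compatibility characters (ligatures and full-width forms, for example), so a narrower replacement of U+00A0 alone may be the better choice when only the space needs special handling.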

What is ANSI format?

What is ANSI encoded format? Is it a system default format? In what way does it differ from ASCII? ...

File upload mojibake

How do you do a file upload in an HTML form without running into mojibake? I have a form that has three fields: a file field, a required text field, and a text field which accepts Japanese characters. I've set up my HTML form with the attribute enctype='multipart/form-data'. But when the form submission fails due to the missing required f...

How to do Latin1-UTF8 encoding change in C++ (maybe with Boost)?

My source base is mostly using UTF8, but some older library has Windows Latin1 encoded strings hardcoded within it. I was hoping Boost would have a clear conversion feature, but I did not find such. Do I really need to hand-code such a commonplace solution? Looking for a portable solution, running on Linux. (This Q is similar, but no...

Ajax messing up Norwegian characters

I'm trying to take the values from a <textarea> and pass it via XMLHttpRequest to a PHP page that adds the content to a database. However, when it reaches the database, the "å æ ø" characters are converted to "Ã¥ æ Ã". I've searched high and low and tried to change to UTF-8, tried to use JavaScript versions of htmlentities()/htmlsp...
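
A result of this shape is consistent with UTF-8 bytes being decoded as a single-byte encoding such as ISO-8859-1 somewhere between the browser and the database. The Python sketch below only illustrates that mechanism; it is an assumption about the cause, not a diagnosis of the asker's PHP setup, and the variable names are illustrative.

```python
original = "å æ ø"

# Each of these letters takes two bytes in UTF-8.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 turns every byte into its own character,
# which produces the familiar two-character garbage per letter.
misread = utf8_bytes.decode("latin-1")

# While no bytes have been lost, the mistake can be reversed by undoing
# the wrong decode and applying the right one.
repaired = misread.encode("latin-1").decode("utf-8")
```

The round trip only works while every byte survives; once a layer drops or replaces unmappable characters, the original text can no longer be recovered this way.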

How can a text file be converted from ANSI to UTF-8 with Delphi 7?

I have written a program with Delphi 7 which searches for *.srt files on a hard drive. This program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded. Please help me... ...

How do you handle different character encodings?

I'm trying to understand the basics of practical programming around character encodings. A few things to consider: I know how to read a file whose encoding is different, and convert it to the console's encoding. But when I try to convert literal strings that appear in source code, for some reason, it doesn't always work: In IntelliJ'...

Newline control characters in multi-byte character sets

I have some Perl code that translates new-lines and line-feeds to a normalized form. The input text is Japanese, so that there will be multi-byte characters. Is it still possible to do this transformation on a byte-by-byte basis (which I think it currently does), or do I have to detect the character set and enable Unicode support? In ot...
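
Whether a byte-by-byte transformation is safe depends on whether the bytes 0x0D and 0x0A can occur inside a multi-byte character. The sketch below is in Python rather than Perl and uses a hypothetical helper, `normalize_newlines_bytes`; it checks a small sample in UTF-8, Shift_JIS, and EUC-JP, where those bytes are never part of a multi-byte sequence, and in UTF-16, where every character (including the line breaks) is at least two bytes wide.

```python
def normalize_newlines_bytes(data: bytes) -> bytes:
    # Byte-level normalization: CRLF and lone CR both become LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


text = "日本語\r\nテキスト\r"
expected = "日本語\nテキスト\n"

# For each encoding, record whether byte-level normalization gives the
# same result as normalizing the decoded text would.
results = {}
for encoding in ("utf-8", "shift_jis", "euc_jp", "utf-16-le"):
    encoded = text.encode(encoding)
    normalized = normalize_newlines_bytes(encoded)
    results[encoding] = normalized.decode(encoding) == expected
```

In this sample the three ASCII-compatible encodings come out correct and UTF-16 does not, because its CR and LF are each stored as two bytes and the byte pattern for CRLF never matches.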