character-encoding

Convert PHP entities like &ndash; or &scaron; to their applicable characters

Hello, is there a way to convert HTML entities to their applicable characters, something similar to html_entity_decode()? I'm trying to produce ordinary text without HTML entities from TinyMCE output. ...

Hyphen encoding (minus) in Google Base RSS feed

I am trying to automate feed generation for data to be sent to Google Base using UTF-8 encoding. However, whenever hyphens are found I get errors telling me that there is an encoding error in the relevant attribute (title, description, product_type). I am currently using: &minus; but I have also tried: &#8722...

Replace diacritic characters with "equivalent" ASCII in PHP?

Related questions: http://stackoverflow.com/questions/2653739/how-to-replace-characters-in-a-java-string http://stackoverflow.com/questions/2393887/how-to-replace-special-characters-with-their-equivalent-such-as-a-for-a As in the questions above, I'm looking for a reliable, robust way to reduce any unicode character to near-equivalen...
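
One common approach to the reduction described in this excerpt is to decompose each character and drop the combining marks. The sketch below is in Python rather than PHP, purely to illustrate the idea; the helper name `strip_diacritics` is hypothetical and not taken from the question. It only handles characters that have a Unicode decomposition, so letters such as "ø" pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after compatibility decomposition (NFKD)."""
    # NFKD splits "é" into "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("Ñandú café"))
```

Characters without a decomposition would need an explicit mapping table or a transliteration library on top of this.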

Mysql storing quotes as '

I have some PHP code which stores whatever is typed in a textbox in the database. If I type in bob's apples, it gets stored in the database as bob's apples. What can be the problem? The table storing this has the collation of latin1_swedish_ci. ...

How to convert non-Latin-based encoded text into UTF-8, or make them coexist on same page?

Good day, I have a script that scrapes the title/description of remote pages and prints those values into a corresponding charset=UTF-8 encoded page. Here is the problem: whenever a remote page uses a non-Latin character encoding (Arabic, Russian, Chinese, Japanese, etc.), the imported values print as garbled text. I've tr...

Convert ISO/Windows charsets to UTF-8 in Javascript

I'm developing a Firefox plugin and I fetch web pages to do some analysis for the user. The problem is that when I try to get (XMLHttpRequest) pages that are not UTF-8 encoded, the string I see is messed up. For example, Hebrew pages with windows-1125 or Chinese pages with gb2312. I already tried the following: var uDecoder=Components.classe...

Ajax / Internet Explorer Encoding problem

Hi, I'm trying to use jQuery's autocomplete plug-in, but for some reason Internet Explorer behaves differently from the other browsers: when there is an accent in the "autocompleted" string, it sends it in a different encoding. IP - - [20/Apr/2010:15:53:17 +0200] "GET /page.php?var=M\xe9tropole HTTP/1.1" 200 13024 "http://site.com/page.php"...

Is the GET query string affected by the content='text/html; charset=gb2312' HTML meta tag attribute?

The question is: in a regular (non-Ajax) HTTP request to a server, is the query string passed by the GET method affected by the encoding specified by this: <meta http-equiv='Content-Type' content='text/html; charset=gb2312'> If the answer is no, how do I define the encoding scheme for the parameters of the GET method? e...

Determining default character set of platform in Java

I am programming in Java and I have this code: byte[] b = test.getBytes(); The API specifies that if we do not specify a character encoding, it takes the default platform character encoding. What is meant by "default platform character encoding"? Does it mean the Java encoding or the OS encoding? If it means OS encoding the ...

how to know which special character is there in a file?

My app needs to process text files during a batch process. Occasionally I receive a file with some special character at the end of the file. I am not sure what that special character is. Is there any way I can find out what that character is, so that I can tell the other team which is producing that file? I have used Mozilla's library to gue...

MySQL charset conversion

Hello, I have a database in which all text fields and tables have 'utf8' specified explicitly as default character set, however data in this database is stored as 'cp1257'. I can display symbols only when I use SET NAMES 'cp1257' or \C cp1257. Trying to display them without those instructions fails, because it tries to fetch data as 'utf...

special characters in "file_exists" problem (php)

I use special characters (Swedish letters åäö). Now, I have some folders which contain images for classifieds. The folders are named by category. for ($i=1; $i<=5; $i++){ if (file_exists($big_images.$i.'.jpg')){ echo "Inne"; unlink($big_images.$i.'.jpg'); } if (file_exists($thumb_images.$i.'.jpg')){ unlink...

Strange characters and encoding while using twitter API

I began developing my own SIMPLE Twitter client on my server (to bypass the twitter.com blocking rule established by some dumbass at a govt. office). Please check this image so you can see the accented characters converted into weird symbols: It is being developed with the Twitter PHP class by Tijs Verkoyen. This is my heading code, ...

Submitting Chinese characters results in XML entities

I am submitting Chinese characters to my form, but once submitted they arrive as XML entities. For example, I enter 星洲 and the value my form receives is &#26143;&#27954; Any input on how to convert these XML entities to the equivalent Chinese characters? ...
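
The mapping between the characters and the numeric references in this excerpt can be checked with a short sketch. It is written in Python because the question does not name a server-side language; the helper name `decode_character_references` is hypothetical. Each `&#NNNNN;` is simply the decimal Unicode code point of one character, so decoding is a direct lookup.

```python
import html


def decode_character_references(text: str) -> str:
    """Turn numeric (and named) character references back into characters."""
    # html.unescape handles decimal (&#26143;) and hexadecimal (&#x661F;) forms.
    return html.unescape(text)


if __name__ == "__main__":
    encoded = "&#26143;&#27954;"
    decoded = decode_character_references(encoded)
    print(decoded)

    # Going the other way: characters that do not fit in the target
    # encoding are replaced by decimal character references.
    print(decoded.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```

The second half only illustrates where such references can come from; whether that is what happens in the asker's form is an assumption.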

Encoding SMS messages in Android

Hi All! My problem is that I want to send an SMS message of a certain Class and with a certain encoding. (Class 0 and 7-bit encoding). When checking the Android.Telephony.SmsManager and SmsMessage there is not so much you can do. The SmsManager offers the two functions SendTextMessage and SendDataMessage. The first one works fine if yo...

Remove special chars from a File

I'm trying to open a text file and remove all the special chars (ñ Ñ ' á í etc.). The file is a layout that the clients send to me, and I parse it to send the file to an AS400 server, but I have to remove all special chars. THE PROBLEM IS: for some files with some special chars, when I open them in C# it reads the special chars and Two different...

What Character Encoding Is This?

I need to clean up some file containing French text. Problem is that the files erroneously contain multiple encodings within the same file. I think some sections are ISO8859-1 (Latin 1) but other parts have text encoded in single byte characters that look like 'extended' ASCII. In other words, it is UTF-7 encoding plus the following: ...

FTP server output and accents

I've written this little test class to connect up to an FTP server. import java.io.BufferedInputStream; import java.io.IOException; import java.io.InputStream; import java.net.MalformedURLException; import java.net.URL; import java.net.URLConnection; public class FTPTest { public static void main(String[] args) { URL url =...

Four byte encoding of U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)?

Which character encoding (or combinations of encodings) represents the character ö (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS or simply put chr(246) in ISO-8859-1) as the four octets combination chr(195) . chr(63) . chr(194) . chr(164)? ...

How to ensure that no non-ASCII Unicode characters are entered?

Given a java.lang.String instance, I want to verify that it doesn't contain any unicode characters that are not ASCII alphanumerics. e.g. The string should be limited to [A-Za-z0-9.]. What I'm doing now is something very inefficient: import org.apache.commons.lang.CharUtils; String s = ...; char[] ch = s.toCharArray(); for( int i=0; i<...
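
The restriction to [A-Za-z0-9.] stated in this excerpt can be expressed as one anchored character-class match instead of a per-character loop. The sketch below uses Python for brevity, and the names `ALLOWED` and `is_allowed` are hypothetical; the same character class should work with Java's `String.matches`, which also requires the whole string to match.

```python
import re

# Explicit ASCII ranges: unlike \w or \d, these never match non-ASCII
# letters or digits.
ALLOWED = re.compile(r"[A-Za-z0-9.]*")


def is_allowed(text: str) -> bool:
    """Return True if every character is an ASCII letter, digit, or '.'."""
    return ALLOWED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["abc.123", "café", "a b"]:
        print(repr(sample), is_allowed(sample))
```

Whether the empty string should pass is a design choice; replacing `*` with `+` in the pattern would reject it.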