character-encoding

What strategies are there for escaping character entities?

We are doing Natural Language Processing on a range of English language documents (mainly scientific) and run into problems in carrying non-ANSI characters through the various components. The documents may be "ASCII", UNICODE, PDF, or HTML. We cannot predict at this stage what tools will be in our chain or whether they will allow charact...

French characters are not displaying correctly inside JavaScript grid

We have translated one of our pages to French and all the HTML within the page displays flawlessly. That said, there is a JavaScript table (Ext JS) and the accented characters are not displaying correctly. The page is encoded UTF-8 in the HTML meta tags, but when I look inside Firebug, I see the following: Accept-Charset ISO-8859-1,...

Character Encoding Problem

I know this sounds really silly, but what character encoding should I use for something that looks like this in UTF-8: â��â�¥ �¼���½�±�¼� The website is in English. This is user-generated content which is stored in a database that is utf_general_ci and displayed on the screen. I just want to display it ...

How do you remove illegal characters from an xml file?

I am using the PHP SimpleXML way of working with XML files on my server. I only need to read the contents of the XML (I have no need to modify it) so I stuck to the simple and easy to use SimpleXML. But SimpleXML is having problems reading a certain XML file because it has some very strange characters. I get the following errors: Warnin...

Character Encoding Problem

Hi, I need to save this to a database (MySQL) and show it back. (My database is utf_general_ci.) I αм iиvisibłє łiкє αiя--- I αм αs iмρøяŧαиŧ αs øxygєи--- I αм łiviиg iи ŧЋє wøяłd øƒ мy dяєαмz I αм αłwαys ŧЋєяє ŧø Ћєłρ øŧЋєяz--- I αм busy buŧ иєvєя igиøяє αиy øиє I αм ŧЋє øиє wЋø cαяєz--- I łøvє ŧø sєє øŧЋєя łαugЋiиg I αм ŧЋє øиє wЋø bøя...

How does "cut and paste" affect character encoding and what can go wrong?

I have a document A in encoding A displayed in tool A and a document B in encoding B displayed in tool B. If I cut and paste (part of) B into A what might be the resultant character encoding? I realise this depends on tool A and tool B and the information held in the paste buffer (which presumably can contain an encoding?) and the operat...

Is it OK to fix a character encoding error using SQL REPLACE?

I have a (WordPress) blog and some of my older posts have a character encoding problem where £ displays as Â£ (i.e. a pound sign prepended with a capital 'A' with a hat on). The problem is at the DB level, so I was going to run the following SQL statement: update wp_posts set post_content = replace(post_content, 'Â£', '£'); Would thi...
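
A pound sign preceded by a capital A with a circumflex is what one would expect if UTF-8 bytes were decoded as Latin-1 at some point. The Python sketch below only illustrates that mechanism; the excerpt does not say how the data was actually corrupted, and the function names `garble` and `repair` are made up for this example.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo garble(): recover the original bytes, then decode them as UTF-8.

    Raises UnicodeError if the text was not produced by this exact mix-up.
    """
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    broken = garble("£")      # U+00A3 is the bytes C2 A3 in UTF-8
    print(broken)             # C2 read as Latin-1 is "Â", A3 is "£"
    print(repair(broken))
```

A one-off string replacement fixes only the single sequence it names, whereas a round trip like `repair` would cover every character affected by the same mix-up, provided the whole column really went through it.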

Convert XML document from Latin1 to UTF8 using Java

I am trying to create an XML document (rss feed) and have worked out all the kinks in it except for one character encoding issue. The problem is that I am using a UTF-8 encoding like so <?xml version="1.0" encoding="UTF-8"?> except the document itself is not encoded to UTF-8. I am using the org.apache.ecs.xml package to create all the ...

Ruby 1.9: Regular Expressions with unknown input encoding

Is there an accepted way to deal with regular expressions in Ruby 1.9 for which the encoding of the input is unknown? Let's say my input happens to be UTF-16 encoded: x = "foo<p>bar</p>baz" y = x.encode('UTF-16LE') re = /<p>(.*)<\/p>/ x.match(re) => #<MatchData "<p>bar</p>" 1:"bar"> y.match(re) Encoding::CompatibilityError: incompa...

Storing UTF8 data in MySQL

When storing data in MySQL using the UTF8 charset, does it make sense to escape entity characters when the data is being input or is it better to store it in raw form and transform it when pulling out? For instance, let's say someone enters a bullet (•) character into a text box. When saving that data, should it be converted to &#8226; b...
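
To make the two representations concrete, here is a minimal Python sketch that converts between the raw bullet character and its numeric entity. It does not settle which form should be stored; the helper names `to_numeric_entities` and `from_entities` are hypothetical.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Turn named and numeric character references back into raw characters."""
    return html.unescape(text)


if __name__ == "__main__":
    raw = "\u2022 first item"            # U+2022 BULLET, decimal 8226
    escaped = to_numeric_entities(raw)
    print(escaped)
    print(from_entities(escaped) == raw)
```

Note that `to_numeric_entities` leaves `&`, `<` and `>` untouched; escaping those for HTML output is a separate step (`html.escape` in Python).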

rss feed service request parameter charset

First of all, I'd consider myself very much a beginner in services development, so pardon my ignorance here... I've created the RSS syndication feed service (REST) in WCF and have problems with the characters in the request parameter values. I need to pass the name as the parameter which contains the characters from the ISO 8859-2..... the request loo...

unable to print euro symbol in a "C" program

I am unable to print the euro symbol. The program I am using is below. I have set the character set to codepage 1250 which has 0x80 standing for the euro symbol. Program ======= #include <stdio.h> #include <locale.h> int main() { printf("Current locale is: %s\n", setlocale (LC_ALL, ".1250")); printf("Euro character: %c\n", 0x...

Firefox setting to allow finding accented or other Unicode characters using a non-accented search term?

Howdy, I'm generating UTF-8 encoded web content that includes characters using diacritical marks, typically "accented" characters, e.g. "é". Firefox's Find (find in page) function requires that such characters be typed in order to find them, which makes sense, but makes for a usability problem. This is tricky for users who don't know ...

QTextCodec subclass - how to register my codec

I need to create my own codec, i.e. subclass of QTextCodec. And I'd like to use it via QTextCodec::codecForName("myname"); However, just subclass is not enough. QTextCodec::availableCodecs() does not contain my codec name. QTextCodec documentation does not cover the area of proper registration of a custom codec: Creating Your Own Co...

Custom Base32 encoding code C#

I have written the code below to encode a bitarray into a custom base32 encoding string. My idea is that the user can mix the order of the base32 array as required and can add similar-looking characters like I and 1 etc. My intention in asking the question is: Is the code written in an appropriate manner, or does it lack some basics? A...

How to get the numbers of characters in a string in PHP?

It is UTF-8. For example, 情報 is 2 characters while ラリー ペイジ is 6 characters. ...
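
The counts in the question are counts of characters, not of UTF-8 bytes. The question is about PHP (where `mb_strlen` with an explicit `'UTF-8'` encoding is the usual multibyte-aware length function); the Python sketch below only illustrates the difference between the two counts. The function names are made up for this example, and normalizing to NFC first is an assumption about what should count as one character.

```python
import unicodedata


def count_code_points(text: str) -> int:
    """Number of Unicode code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


def count_utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def count_non_space(text: str) -> int:
    """Code points after NFC normalization, ignoring whitespace."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(1 for ch in normalized if not ch.isspace())


if __name__ == "__main__":
    for sample in ("情報", "ラリー ペイジ"):
        print(sample, count_code_points(sample),
              count_utf8_bytes(sample), count_non_space(sample))
```

The second sample reaches the question's figure of 6 only when the space between the two words is left out of the count.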

JSF Form and German Umlauts

Hi there, I'm facing a strange problem in one of my JSF pages (which is a facelet). I'm using RichFaces and on one page I've got a normal form <h:form></h:form> My problem is when I submit the form all UTF-8 chars - like German umlauts (äöü) - are received encrypted. If I change the page to ISO-8859-1 on my browser it works. If I expand the ...

Charset encoding when using Ajax ? JQuery

Hi, I have a web application (UTF-8) in which the following can be sent to the server side: áéíóú àèìòù ÀÈÌÒÙ ÁÉÍÓÚ Ok. I use something like the following to send data // Notice $("#myForm").serialize() $.get("/path?", $("#myForm").serialize(), function(response) { }); When I see my recordSet, I get (database charSet enco...

Post to Twitter help - using PHP

I am trying to post a URL to Twitter, but the URL is user-generated and dynamic... <a href="http://twitter.com/?status=[urlencode('I'd like to borrow this item @neighborrow')].">TWEET THIS REQUEST</a> I started with that, but it's not catching the actual URL. Then I tried a few others, but they seem to be for static URLs. Do I have to us...

A problem with passing Japanese characters(UTF-8) via json_encode

Hi, I'm having trouble transferring Japanese characters from PHP to JavaScript via json_encode. Here is the raw data read from a CSV file. PRODUCT1,QA,テスト PRODUCT2,QA,aテスト PRODUCT3,QA,1テスト The problem is that when passing this data by echo json_encode($return_value), where $return_value is a 2-dimensional array containing above dat...