questions about utf-8 | ansaurus

utf-8

Can MySQL automatically specify `_utf8` for inserts to UTF-8 columns?

I have a table like this, where one column is latin1, the other is UTF-8: Create Table: CREATE TABLE `names` ( `name_english` varchar(255) character NOT NULL, `name_chinese` varchar(255) character set utf8 default NULL, ) ENGINE=MyISAM DEFAULT CHARSET=latin1 When I do an insert, I have to type _utf8 before values being inserted in...

win32 read java preference from c++ code

One of our programs writes program information (window title, memory, etc.) to Java Preferences. On Windows this is stored in the registry. How can I read the values written by the Java program using C (or C++)? It looks like the API I should use is RegGetValue. Is this guaranteed to work on Windows XP 32-bit? The String written by Java is UTF-8 ...

How can I convert a bunch of files from ISO-8859-1 to UTF-8 using Perl?

I have several documents I need to convert from ISO-8859-1 to UTF-8 (without the BOM of course). This is the issue though. I have so many of these documents (it is actually a mix of documents, some UTF-8 and some ISO-8859-1) that I need an automated way of converting them. Unfortunately I only have ActivePerl installed and don't know muc...
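
A minimal sketch of one common heuristic for a mixed batch like this, shown in Python rather than the Perl the question asks for: try a strict UTF-8 decode first and fall back to ISO-8859-1 only when that fails. The function name `to_utf8` is hypothetical, and the heuristic assumes every file is in one of the two encodings; ISO-8859-1 text that happens to form valid UTF-8 byte sequences would be left unconverted.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return the content of `raw` re-encoded as UTF-8 without a BOM."""
    try:
        # "utf-8-sig" decodes strictly and drops a leading BOM if one is present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume ISO-8859-1, which maps every byte to a character.
        text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


print(to_utf8(b"caf\xe9"))      # ISO-8859-1 input
print(to_utf8(b"caf\xc3\xa9"))  # input that is already UTF-8
```

Applying it to files would mean reading each one in binary mode, passing the bytes through the function, and writing the result back in binary mode.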

Is PHP's json_encode guaranteed to produce ASCII string?

Well, the subject says everything. I'm using json_encode to convert some UTF-8 data to JSON and I need to transfer it to some layer that is currently ASCII-only. So I wonder whether I need to make it UTF-8 aware, or whether I can leave it as it is. Looking at the JSON RFC, UTF-8 is also a valid charset in JSON output, although not recommended, i.e. som...

gcc, UTF-8 and limits.h

My OS is Debian, my default locale is UTF-8 and my compiler is gcc. By default CHAR_BIT in limits.h is 8 which is ok for ASCII because in ASCII 1 char = 8 bits. But since I am using UTF-8, chars can be up to 32 bits which contradicts the CHAR_BIT default value of 8. If I modify CHAR_BIT to 32 in limits.h to better suit UTF-8, what do I ...
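
A small sketch, in Python for brevity, of the point this question turns on: UTF-8 represents each code point as a sequence of one to four 8-bit bytes, so no single code unit is wider than 8 bits. The helper name `utf8_bytes` is hypothetical.

```python
def utf8_bytes(ch: str) -> list:
    """Return the UTF-8 encoding of one character as a list of byte values."""
    return list(ch.encode("utf-8"))


# U+0041, U+00E9, U+6211 and U+1F600 need 1, 2, 3 and 4 bytes respectively.
for ch in ["A", "\u00e9", "\u6211", "\U0001F600"]:
    units = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(units)} byte(s), largest byte value {max(units)}")
```

Every byte value stays below 256, which is consistent with a `CHAR_BIT` of 8.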

Is there a way to enable Unicode characters in all browsers on Windows XP?

I'd like to use Unicode symbols on my website (especially Dingbats). Is there any way to enable this in all (or at least some) browsers on Windows XP, without requiring the user to adjust any settings? I use the HTML5 doctype with the charset configured to UTF-8: <!DOCTYPE html> <html> <head> <meta charset="utf-8" /> ...

utf8 and encoding

I have a Unicode string, "hao123--我的上网主页". As UTF-8 in a C++ string it appears as "hao123锛嶏紞鎴戠殑涓婄綉涓婚〉", but I need to write it to a file in this format: "hao123\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". How can I do this? I know little about this encoding. Can anyone help? Thanks! ...
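
A minimal sketch of the requested output format, written in Python rather than the C++ of the question: ASCII characters pass through unchanged and every other character is written as a `\uXXXX` escape of its UTF-16 code unit(s). The function name `to_u_escapes` is hypothetical. The sample input assumes the two dashes are the full-width U+FF0D, as the expected output in the question implies.

```python
def to_u_escapes(text: str) -> str:
    """Replace each non-ASCII character with \\uXXXX escapes of its UTF-16 code units."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")  # one UTF-16 code unit
        else:
            # Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


print(to_u_escapes("hao123\uff0d\uff0d\u6211\u7684\u4e0a\u7f51\u4e3b\u9875"))
# hao123\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875
```

The result is pure ASCII, so it can be written to a plain text file in any ASCII-compatible encoding.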

pound sign in javascript

I want to restrict input of special signs like £ ¬ ¦ in JavaScript, but they are always displayed as �� in the page source. How can I make them display correctly so that the page validates? My page uses UTF-8. Thanks ...

php mail + utf-8 = problem in Internet explorer

I have a form on a page that sends data to a PHP file via an Ajax request. The data is then collected into a single variable and sent to the email address specified in the PHP file. The data is in Slovenian and uses a lot of letters with diacritics (š,ć,ž). Everything works fine when the form is submitted from any browser that isn't Internet Explorer,...

internet-explorer

How to convert non-Latin-based encoded text into UTF-8, or make them coexist on same page?

Good day, I have a script that scrapes the title/description of remote pages and prints those values into a corresponding charset=UTF-8 encoded page. Here is the problem, whenever a remote page is encoded with non-Latin characters encoding like (Arabic, Russian, Chinese, Japanese etc.) the imported values print as garbled text. I've tr...

character-encoding

How to remove invalid UTF-8 characters from a JavaScript string?

I'd like to remove all invalid UTF-8 characters from a string in JavaScript. I've tried using the approach described here (link removed) and came up with the JavaScript: strTest = strTest.replace(/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|./g, "$1"); It seems that the UTF-8 validation...

UTF-8 formatting in SPARQL

How can I tell SPARQL that ?churchname is UTF-8 encoded? The response looks like: PraÅ¾skÃ½ hrad PREFIX lgv: <http://linkedgeodata.org/vocabulary#> PREFIX abc: <http://dbpedia.org/class/yago/> SELECT ?churchname WHERE { <http://dbpedia.org/resource/Prague> geo:geometry ?gm . ?church a lgv:castle . ?church geo:g...

utf8 and utf16 conversion

Hi all, I have a wchar_t string, for example L"hao123--我的上网主页". I can convert it to UTF-8; the output string is "hao123锛嶏紞鎴戠殑涓婄綉涓婚〉". In the end, though, I must write this string to a plain text file whose format is UTF-16 (so I have been told): "hao123\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". Because I must save it in...

Convert ISO/Windows charsets to UTF-8 in Javascript

I'm developing a Firefox plugin and I fetch web pages to do some analysis for the user. The problem is that when I try to get (XMLHttpRequest) pages that are not UTF-8 encoded, the string I see is messed up. For example, Hebrew pages with windows-1255 or Chinese pages with gb2312. I already tried the following: var uDecoder=Components.classe...

character-encoding

Jena result in UTF-8 format

How can I get results in UTF-8 format in Jena (Java)? My code: Query query= QueryFactory.create(queryString); QueryExecution qexec= QueryExecutionFactory.sparqlService("http://lod.openlinksw.com/sparql", queryString); ResultSet results = qexec.execSelect(); List<QuerySolution> list = ResultSetFormatter.toList(results); System....

display font with special characters - UTF-8

Hi, I am trying to display characters like £ on a device that runs Linux. It uses the UTF-8 charset. When I display a string that contains special characters, it displays other characters too. If I print the string on the console it appears OK, but when I parse the string to load each letter font on the screen i...

Encoding MySQL text fields into UTF-8 text files - problems with special characters

I'm writing a php script to export MySQL database rows into a .txt file formatted for Adobe InDesign's internal markup. Exports work, but when I encounter special characters like é or umlauts, I get weird symbols (eg ChloÃ« Hanslip instead of Chloë Hanslip). Rather than run a search and replace for every possible weird character, I need...
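
The garbling in this excerpt matches what happens when UTF-8 bytes are read back as Latin-1. The sketch below reproduces that in Python rather than PHP. It only illustrates the mechanism and assumes that is the cause here; the function names `garble` and `repair` are hypothetical.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo garble(): recover the original bytes and decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


name = "Chlo\u00eb Hanslip"
broken = garble(name)
print(broken)          # ChloÃ« Hanslip
print(repair(broken))  # Chloë Hanslip
```

If this is the cause, the more durable fix is usually to make the database connection and the output file agree on UTF-8 instead of repairing strings afterwards.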

Is there a way to store Unicode Text in an Oracle Database configured as 'US7ASCII'

We've recently hit a snag where a trademark symbol is being copied from one Oracle database to another, but have had it come across as a '?'. We've tracked the issue to the destination database being configured with a character set of 'US7ASCII'. Unfortunately, rebuilding the database to address this is not something we can do at the p...

Ruby 1.8 regexp: index of match in utf string

I'm trying to search a text for a match and return it with snippet around it. For this, I want to find match with regex, then cut the string using match index +- snippet radius (text.mb_chars[start..finish]). However, I cannot get ruby's (1.8) regex to return match index which would be multi-byte aware. I understand that regex is one ...

How to sort UTF-8 lines in Vim?

I have these lines in Vim: a c b e é f g and when I do :%sort, I get this: a b c e f g é Obviously, the "é" line should not be at the end; it should come after the "e" line. Is it possible to get Vim to sort these lines correctly? Not using the ASCII code of the characters but the actual character. I also tried with :!sort (to use G...
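
A rough sketch of the ordering the question expects, done outside Vim in Python: each line is sorted by its text with combining accents stripped, and the original line is used as a tie-breaker. This only approximates language-aware collation and is not the full Unicode Collation Algorithm; the key function `base_key` is hypothetical.

```python
import unicodedata


def base_key(line: str):
    """Sort key: the line without combining marks, then the line itself as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", line)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (stripped, line)


lines = ["a", "c", "b", "e", "\u00e9", "f", "g"]
print(sorted(lines, key=base_key))
# ['a', 'b', 'c', 'e', 'é', 'f', 'g']
```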
