questions about utf-8 | ansaurus

utf-8

Why does the JDBC driver pad blank characters after a queried field from an Oracle database?

Here is the code that creates the table in an Oracle 10g / UTF-8 database: CREATE TABLE TEST_SEMANTIC ( SEMANTIC_COLBYTE char(2 byte), SEMANTIC_COLCHAR char(2 char) ); that is, I use two different length semantics for the two columns, byte and char. I then insert the following data into the database: insert in...
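The byte/char distinction can be modeled in a few lines. The sketch below is a simplified Python model, not Oracle's or the JDBC driver's actual implementation: the helper `pad_fixed_char` is hypothetical, and it assumes a fixed-width CHAR column is blank-padded with ASCII spaces up to its declared length, counted either in characters or in UTF-8 bytes.

```python
def pad_fixed_char(value: str, length: int, semantics: str) -> str:
    """Hypothetical model of blank-padding for a fixed-width CHAR column.

    semantics="char": the declared length counts characters.
    semantics="byte": the declared length counts UTF-8 bytes.
    """
    if semantics == "char":
        used = len(value)
    elif semantics == "byte":
        used = len(value.encode("utf-8"))
    else:
        raise ValueError(f"unknown semantics: {semantics!r}")
    if used > length:
        raise ValueError(f"value needs {used} {semantics}s, column allows {length}")
    # An ASCII space is one character and one UTF-8 byte, so the unused
    # capacity is filled with exactly (length - used) spaces either way.
    return value + " " * (length - used)


print(repr(pad_fixed_char("a", 2, "byte")))  # 'a '
print(repr(pad_fixed_char("é", 2, "byte")))  # 'é'
print(repr(pad_fixed_char("é", 2, "char")))  # 'é '
```

Under this model a two-byte character already fills a CHAR(2 BYTE) column, while the same character in CHAR(2 CHAR) leaves room for one more character, so the two columns can come back with different amounts of trailing padding.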

Character Encoding Problem

I know this sounds really silly, but what character encoding should I use for something that looks like this in UTF-8: Ã¢ï¿½ï¿½Ã¢ï¿½Â¥ Ã�Â¼Ã�ï¿½Ã�Â½Ã�Â±Ã�Â¼Ã�Â The website is in English. This is user-generated content that is stored in a database with utf_general_ci collation and displayed on the screen. I just want to display it ...
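One recognizable piece of that string is "ï¿½". The Python snippet below shows that this is exactly what the UTF-8 encoding of U+FFFD (the replacement character a decoder substitutes for bytes it cannot decode) looks like when those bytes are read as Latin-1. Its presence suggests, though does not prove, that some characters were already replaced before the text was mis-decoded, in which case no choice of display encoding can bring them back.

```python
# U+FFFD REPLACEMENT CHARACTER: what decoders emit for undecodable bytes.
replacement = "\ufffd"

utf8_bytes = replacement.encode("utf-8")
print(utf8_bytes)                    # b'\xef\xbf\xbd'

# Reading those three bytes as Latin-1 gives one character per byte.
print(utf8_bytes.decode("latin-1"))  # ï¿½
```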


C# Convert string from UTF-8 to ISO-8859-1 (Latin1)

Hello, I know this has been asked before! I have googled this topic and looked at every answer, but I still don't get it. Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code: Encoding iso = Encoding.GetEncoding("ISO-8859-1"); Encoding utf8 = Encoding.UTF8; string msg = iso.GetString(utf8....
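In most languages a string is already a sequence of Unicode characters, so "converting" it to ISO-8859-1 means encoding it to Latin-1 bytes, not re-decoding its UTF-8 bytes. The Python sketch below (Python rather than C#, purely for illustration) shows the difference and what happens to a character Latin-1 cannot represent.

```python
text = "café €5"

# Encode the characters directly into Latin-1; "€" has no Latin-1 byte,
# so errors="replace" substitutes "?".
latin1 = text.encode("latin-1", errors="replace")
print(latin1)   # b'caf\xe9 ?5'

# Decoding UTF-8 bytes with the Latin-1 codec converts nothing; it garbles.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©
```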

Can BSTRs hold characters that take more than 16 bits to represent?

I am confused about Windows BSTRs and WCHARs, etc. WCHAR is a 16-bit character type intended to allow for Unicode characters. What about characters that take more than 16 bits to represent? Some UTF-8 characters require more than that. Is this a limitation of Windows? Edit: Thanks for all the answers. I think I understand the Unicode aspec...
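Assuming a BSTR holds UTF-16 text (as WCHAR strings on current Windows versions do), a character above U+FFFF is stored as a surrogate pair: two 16-bit code units. The Python sketch below computes the pair by hand with a hypothetical helper `to_surrogates` and checks it against Python's own UTF-16 encoder.

```python
import struct


def to_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range
print([hex(u) for u in to_surrogates(ord(clef))])  # ['0xd834', '0xdd1e']

# Python's encoder produces the same two 16-bit code units (little-endian).
units = struct.unpack("<2H", clef.encode("utf-16-le"))
print([hex(u) for u in units])                     # ['0xd834', '0xdd1e']
print(len(clef.encode("utf-8")))                   # 4
```

So such characters fit; code that counts WCHARs simply sees two units for one character.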

How to avoid inadvertent encoding of UTF-8 files as ASCII/ANSI?

While editing a file encoded as UTF-8 without a [spurious] BOM, the content might end up containing no characters outside the ASCII or ANSI range. The next time the file is opened, some text editors (e.g. Notepad++) will interpret it as ASCII/ANSI-encoded and open it as such. Unaware of the change, the user will continue editing...
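The underlying reason is that a file containing only ASCII characters is byte-for-byte identical whether it is saved as ASCII or as UTF-8 without a BOM, so an editor that guesses from content has nothing to go on. A BOM, the three bytes EF BB BF, is one explicit marker. The Python snippet below checks both facts.

```python
text = "Only plain ASCII here."

# Without a BOM, ASCII and UTF-8 encodings of ASCII-only text are identical.
print(text.encode("ascii") == text.encode("utf-8"))  # True

# The "utf-8-sig" codec prepends the UTF-8 byte order mark.
with_bom = text.encode("utf-8-sig")
print(with_bom[:3])                                   # b'\xef\xbb\xbf'
```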

Re-encode URL from UTF-8 encoding to ISO-8859-1 encoding

I have file:// links with non-English characters which are URL-encoded in UTF-8. For these links to work in a browser, I have to re-encode them: file://development/H%C3%A5ndplukket.doc becomes file://development/H%e5ndplukket.doc. I have the following code, which works: public string ReEncodeUrl(string url) { Encoding enc = Encodi...
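The transformation described is: percent-decode the UTF-8 escapes, then percent-encode the resulting text using ISO-8859-1. A Python sketch with the standard `urllib.parse` functions follows; the helper name `reencode_url` and the choice to leave `/` and `:` unescaped are assumptions. Hex digits in percent-escapes are case-insensitive, so `%E5` is equivalent to the `%e5` in the question.

```python
from urllib.parse import quote, unquote


def reencode_url(url: str) -> str:
    """Turn UTF-8 percent-escapes into ISO-8859-1 percent-escapes."""
    text = unquote(url, encoding="utf-8")              # '%C3%A5' -> 'å'
    return quote(text, safe="/:", encoding="latin-1")  # 'å' -> '%E5'


print(reencode_url("file://development/H%C3%A5ndplukket.doc"))
# file://development/H%E5ndplukket.doc
```

Characters outside Latin-1 raise `UnicodeEncodeError` here, since ISO-8859-1 has no byte for them.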

MySQL UTF8 Database migration

Hi everyone, I'm having problems migrating a utf8 database to another server. Each source and destination table has "DEFAULT CHARSET=utf8". I use mysqldump to dump the data and mysql < file.sql to import it, but where the source table has "España", the destination has "EspaÃ±a". I read some guides; I used --default-character-...
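"EspaÃ±a" is the pattern produced when the UTF-8 bytes of "ñ" (C3 B1) are read as Latin-1 somewhere along the way. The Python sketch below reproduces that and, assuming exactly one such layer occurred, reverses it; which step of the mysqldump/import pipeline misreads the bytes is not established by the question.

```python
original = "España"

# UTF-8 bytes read as Latin-1: each byte becomes its own character.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)   # EspaÃ±a

# Undo exactly one layer of that mistake.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # España
```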

NetBeans UTF-8 encoding mess: tool to search for source files by encoding and fix them

I have edited several ISO-8859-15-encoded PHP source files with NetBeans 6.7.1, but it converted them to UTF-8 (without asking me!), and I lost several German characters in the process. I'm looking for a tool to find all the UTF-8-encoded files inside a directory (it's hard for me to tell which files have been broken). I'd also...
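A common heuristic (a sketch, not an existing tool) is to test whether a file's bytes decode as strict UTF-8: ISO-8859-15 text containing accented letters almost never happens to form valid UTF-8 sequences. The Python helpers below, `classify` and `find_utf8_files`, are hypothetical names, and the `*.php` pattern is an assumption about the files involved.

```python
from pathlib import Path
from typing import Iterator


def classify(data: bytes) -> str:
    """Guess whether bytes are pure ASCII, valid UTF-8, or something else."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"  # e.g. ISO-8859-15 with accented letters


def find_utf8_files(root: str, pattern: str = "*.php") -> Iterator[Path]:
    """Yield files under root whose contents are valid, non-ASCII UTF-8."""
    for path in Path(root).rglob(pattern):
        if path.is_file() and classify(path.read_bytes()) == "utf-8":
            yield path


print(classify("Grüße".encode("utf-8")))        # utf-8
print(classify("Grüße".encode("iso-8859-15")))  # other
```

Usage: `for path in find_utf8_files("."): print(path)`. ASCII-only files are reported separately because they are valid in both encodings.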

How to strip Unicode chars (LEFT_TO_RIGHT_MARK) from a string in PHP

I'm trying to remove LEFT-TO-RIGHT-MARK (\u200e) and RIGHT-TO-LEFT-MARK (\u200f) from a string before encoding it as JSON. None of the following seems to work: $s = mb_ereg_replace("\u200e", '', $s); $s = preg_replace("#\u200e#u", '', $s); $s = preg_replace("#\u200e#", '', $s); Any help is appreciated! ...
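For comparison, here is the same operation in Python, using `str.translate` to delete both marks; the helper name `strip_bidi_marks` is hypothetical. In PHP itself, note that string literals do not interpret `\u200e` (PHP 7 added only the braced form `\u{200E}`), and PCRE usually spells the code point as `\x{200E}` with the `u` modifier, e.g. `preg_replace('/[\x{200E}\x{200F}]/u', '', $s)`.

```python
import json

# Map each code point to None so str.translate deletes it.
BIDI_MARKS = {
    0x200E: None,  # LEFT-TO-RIGHT MARK
    0x200F: None,  # RIGHT-TO-LEFT MARK
}


def strip_bidi_marks(s: str) -> str:
    """Remove LRM and RLM characters from a string."""
    return s.translate(BIDI_MARKS)


print(json.dumps(strip_bidi_marks("abc\u200e\u200fdef")))  # "abcdef"
```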

MySQL and UTF-8

In MySQL, what is the difference between SET NAMES 'utf8' and SET CHARACTER SET 'utf8'? I've looked at the Connection Character Sets and Collations page of the MySQL documentation, but I'm still a bit confused. Do both commands need to be issued to make MySQL UTF-8 aware, or is SET NAMES enough? ...

Strange problem: Tomcat webapp can't display UTF-8 characters correctly after each restart or redeployment

Hi all, we have a strange problem with our web app displaying UTF-8 characters correctly. Here are the facts: Tomcat 6.0.20, running on Ubuntu 9.04. We have followed the advice in "Get UTF-8 Working", and our webapp is able to display UTF-8 characters correctly. However, whenever our developers redeploy our webapp module, or when we restart the t...

UTF-8 £ sign not rendering correctly

The HTML is stored in the DB. The £ sign is stored as £ and renders correctly in code page 1252; however, when I change the page encoding to UTF-8, it renders incorrectly. I know it's a simple issue... ...
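The byte values explain the symptom: in code page 1252 "£" is the single byte A3, while in UTF-8 it is the two bytes C2 A3. If the stored HTML holds the 1252 byte and the page is then declared UTF-8, that lone A3 is not valid UTF-8, and browsers typically show a replacement character. A quick Python check follows; the assumption that the database holds the 1252 byte is not confirmed by the question.

```python
pound = "£"

print(pound.encode("cp1252"))  # b'\xa3'
print(pound.encode("utf-8"))   # b'\xc2\xa3'

# A lone 0xA3 byte is not valid UTF-8; lenient decoding yields U+FFFD.
print(b"\xa3".decode("utf-8", errors="replace") == "\ufffd")  # True
```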

String class internals - caching character offset to byte relationship if using UTF-8

When writing a custom string class from scratch that stores UTF-8 internally (to save memory) rather than UTF-16, is it feasible to cache, to some extent, the relationship between byte offset and character offset to improve performance when applications use the class with random access? Does Perl do this kind of caching of character offset t...
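One way to do this is a sparse index: record the byte offset of every k-th character, then answer a random-access query by jumping to the nearest checkpoint and scanning at most k-1 characters forward. The Python class below is a hypothetical sketch of that idea (`Utf8String`, `stride`, and the method names are illustrative, not from any library); it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```python
class Utf8String:
    """Hypothetical UTF-8-backed string with a sparse char -> byte offset index."""

    def __init__(self, text: str, stride: int = 64):
        self.data = text.encode("utf-8")
        self.stride = stride
        self.checkpoints = []  # byte offset of character 0, stride, 2*stride, ...
        char_index = 0
        for byte_index, b in enumerate(self.data):
            if (b & 0xC0) != 0x80:  # not a continuation byte: a character starts here
                if char_index % stride == 0:
                    self.checkpoints.append(byte_index)
                char_index += 1
        self.length = char_index

    def byte_offset(self, char_index: int) -> int:
        """Byte offset of a character: jump to a checkpoint, then scan forward."""
        if not 0 <= char_index < self.length:
            raise IndexError(char_index)
        pos = self.checkpoints[char_index // self.stride]
        for _ in range(char_index % self.stride):
            pos += 1
            while (self.data[pos] & 0xC0) == 0x80:  # skip continuation bytes
                pos += 1
        return pos

    def char_at(self, char_index: int) -> str:
        start = self.byte_offset(char_index)
        end = start + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")


s = Utf8String("aé€\U0001D11Eb" * 50, stride=8)
print(s.length, s.byte_offset(4), s.char_at(2))  # 250 10 €
```

The trade-off is one stored integer per `stride` characters against a scan of at most `stride - 1` characters per lookup; a mutable string class would also have to rebuild or patch the index after edits.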

Storing UTF8 data in MySQL

When storing data in MySQL using the UTF8 charset, does it make sense to escape entity characters when the data is being input, or is it better to store it in raw form and transform it when pulling it out? For instance, let's say someone enters a bullet (•) character into a text box. When saving that data, should it be converted to • b...
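One widely used approach, offered here as one option rather than as the answer to the question, is to store the raw UTF-8 text and escape it only when writing it into HTML, so the database holds exactly what the user typed. In Python that output step can be done with the standard `html.escape`:

```python
import html

user_input = "• first point <b>bold?</b>"

# Stored as-is; escaped only at output time. html.escape leaves "•" alone
# (it is not markup-significant) and escapes &, <, > and quotes.
print(html.escape(user_input))  # • first point &lt;b&gt;bold?&lt;/b&gt;
```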


Can I avoid using CP1252 on Windows?

I would like my whole toolkit to use UTF-8, but I find that some tools on Windows seem to use CP1252 (which appears to be Windows-specific). Does this create incompatible output, and if so, at which code points? If so, can I do anything about it? (I don't completely understand the issues, so I'd be grateful for basic education on these ...
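The two encodings agree exactly on code points U+0000 to U+007F (ASCII). Beyond that they differ: CP1252 uses a single byte (0x80 to 0xFF) where UTF-8 uses two to four, and CP1252 can represent only a small subset of Unicode at all. The Python check below illustrates this.

```python
# ASCII range: identical bytes in both encodings.
print(all(chr(i).encode("cp1252") == chr(i).encode("utf-8") for i in range(0x80)))  # True

# Beyond ASCII the byte sequences differ.
for ch in "é€":
    print(ch, ch.encode("cp1252"), ch.encode("utf-8"))
# é b'\xe9' b'\xc3\xa9'
# € b'\x80' b'\xe2\x82\xac'

# Most of Unicode has no CP1252 form at all.
try:
    "中".encode("cp1252")
except UnicodeEncodeError:
    print("not representable in CP1252")
```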

Printing Unicode from Scala interpreter

When using the Scala interpreter (i.e. running the command 'scala' on the command line), I am not able to print Unicode characters correctly. Of course a-z, A-Z, etc. are printed correctly, but, for example, € or ƒ is printed as ?. print(8364.toChar) results in ? instead of €. Probably I'm doing something wrong. My terminal supports ut...
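A "?" in place of a character is the usual result of encoding text into a character set that cannot represent it, with unmappable characters replaced by "?". A plausible cause here, an assumption not confirmed by the question, is that the output stream encodes with a non-UTF-8 charset even though the terminal expects UTF-8. The Python snippet below reproduces the symptom with ISO-8859-1, which contains neither € nor ƒ.

```python
# 8364 is the decimal code point of "€" (U+20AC), as in print(8364.toChar).
euro = chr(8364)
print(euro == "\u20ac")  # True

# Encoding into a charset without € or ƒ, replacing what cannot be mapped:
print("€ƒ".encode("iso-8859-1", errors="replace"))  # b'??'
print("€ƒ".encode("utf-8"))                         # b'\xe2\x82\xac\xc6\x92'
```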

Convert Spanish characters in HTML doc

I have an HTML file that contains some information in Spanish. I am using a third-party control, Subsystems HTML Addon, to convert this HTML file into an RTF document. The HTML file has <META http-equiv="Content-Type" content="text/html; charset=utf-8">. I think the Subsystems software is not able to re...

FtpWebRequest and foreign/UTF-8 characters

When using FtpWebRequest to list files and folders, can I list names with foreign characters? A file name with three Chinese characters comes across as "???" when enumerating files with FtpWebRequest: -rwxr-xr-x 1 user group 1800 Dec 22 16:13:10 ??? Am I doing something wrong, or does FtpWebRequest not support th...
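Three "?" for three characters is consistent with the file names passing through a charset that has no Chinese characters. The Python sketch below assumes the server sends the listing in UTF-8 (not established by the question; some servers use a legacy code page instead) and shows both the correct decode and how a lossy ASCII step yields "???".

```python
# A Unix-style LIST line whose file name is three Chinese characters,
# as raw bytes from a server assumed to send UTF-8.
raw = "-rwxr-xr-x 1 user group 1800 Dec 22 16:13:10 中文名".encode("utf-8")

line = raw.decode("utf-8")
# The first eight fields are metadata; maxsplit keeps any spaces in the name.
name = line.split(maxsplit=8)[-1]
print(name)  # 中文名

# Forcing the name through ASCII with replacement reproduces the symptom.
print(name.encode("ascii", errors="replace").decode("ascii"))  # ???
```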

Unicode aware CSV parser in Java

I'm looking for a Java implementation of a CSV (comma-separated values) parser with proper handling of Unicode data, e.g. UTF-8 CSV files with Chinese text. I suppose such a parser should internally use code-point-related methods while iterating, comparing, etc. An Apache 2 license or similar would work best. ...
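For reference, Python's standard `csv` module handles this case once the input is decoded with the right encoding, because the parser works on decoded text rather than bytes. The sketch below parses UTF-8 CSV data containing Chinese text from an in-memory buffer, standing in for a file opened with `encoding="utf-8", newline=""`; it is an illustration of the approach, not the Java library the question asks for.

```python
import csv
import io

# UTF-8 bytes as they would appear in a file on disk.
raw = "id,城市,备注\n1,北京,\"含逗号, 的字段\"\n".encode("utf-8")

# Decode first; csv then splits on characters, not bytes.
reader = csv.reader(io.StringIO(raw.decode("utf-8"), newline=""))
rows = list(reader)
print(rows)  # [['id', '城市', '备注'], ['1', '北京', '含逗号, 的字段']]
```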


Rails: How to send emails to a recipient containing umlauts?

I'd like to send an email with the following setup: def registration_confirmation(user) recipients user.username + "<" + user.email + ">" from "Netzwerk Muensterland<[email protected]>" subject "Vielen Dank für Ihre Registrierung" body :user => user content_type "text/html" end The su...
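Non-ASCII text in mail headers, such as a recipient's display name or the subject from the question ("Vielen Dank für Ihre Registrierung", i.e. "Thank you for your registration"), has to be carried as RFC 2047 encoded words. Python's standard `email` package is used below purely to illustrate what such headers look like; it is not the ActionMailer mechanism, and the name "Jürgen Müller" and the address "user@example.com" are made-up placeholders.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr

# Placeholder recipient: a display name with umlauts and an ASCII address.
to_header = formataddr(("Jürgen Müller", "user@example.com"))
print(to_header)  # an RFC 2047 encoded word followed by " <user@example.com>"

# The subject from the question, encoded the same way.
subject = Header("Vielen Dank für Ihre Registrierung", "utf-8").encode()
print(subject)

# Decoding the encoded word(s) recovers the original text.
print(str(make_header(decode_header(subject))))  # Vielen Dank für Ihre Registrierung
```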
