utf-8

Short Unicode \N{} names for Latin-1 characters in Python?

Are there short Unicode u"\N{...}" names for Latin-1 characters in Python? Something like \N{A umlaut} would be nice; \N{LATIN SMALL LETTER A WITH DIAERESIS} is just too long to type every time. (Added:) I use an English keyboard but occasionally need German letters, as in "Löwenbräu Weißbier". Yes, one can cut-paste them singly, L cutpaste ö...
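A minimal sketch of one workaround, assuming Python 3: keep a small table of home-made aliases and resolve them with the standard `unicodedata.lookup`. The alias spellings (`a:`, `o:`, `u:`, `ss`) and the `expand` helper are hypothetical names made up for this example, not anything Python defines.

```python
import unicodedata

# Hypothetical short aliases (invented for this sketch) mapped to official Unicode names.
SHORT_NAMES = {
    "a:": "LATIN SMALL LETTER A WITH DIAERESIS",
    "o:": "LATIN SMALL LETTER O WITH DIAERESIS",
    "u:": "LATIN SMALL LETTER U WITH DIAERESIS",
    "ss": "LATIN SMALL LETTER SHARP S",
}


def expand(text):
    """Replace each {alias} placeholder with the character found by unicodedata.lookup."""
    for alias, name in SHORT_NAMES.items():
        text = text.replace("{" + alias + "}", unicodedata.lookup(name))
    return text


print(expand("L{o:}wenbr{a:}u Wei{ss}bier"))
```

The table only needs the handful of letters actually used, so it stays short even though the official names are long.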

Can I tell how many bytes were written via DataOutput.writeUTF?

If I call writeUTF(String) on a DataOutput object, is there a way to tell how many bytes were actually written? E.g.: public int write(DataOutput output) throws IOException { output.writeUTF(this.messageString); int numberOfBytesWritten = ???; return numberOfBytesWritten; } The only method that comes to mind is to create a ...
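One option that avoids a second stream is to compute the count from the string itself, since the `writeUTF` format is documented as a two-byte length prefix followed by modified UTF-8. The arithmetic is sketched here in Python rather than Java; `write_utf_length` is a hypothetical helper name, and the sketch assumes the string contains no unpaired surrogates.

```python
def write_utf_length(text):
    """Estimate the bytes DataOutput.writeUTF would emit for text.

    Modified UTF-8 encodes each UTF-16 code unit separately:
    U+0001..U+007F -> 1 byte, U+0000 and U+0080..U+07FF -> 2 bytes,
    everything else (including each surrogate half) -> 3 bytes.
    """
    units = text.encode("utf-16-be")  # two bytes per UTF-16 code unit
    total = 2  # the unsigned 16-bit length prefix
    for i in range(0, len(units), 2):
        code_unit = int.from_bytes(units[i:i + 2], "big")
        if 0x0001 <= code_unit <= 0x007F:
            total += 1
        elif code_unit <= 0x07FF:  # also covers U+0000
            total += 2
        else:
            total += 3
    return total


print(write_utf_length("abc"))
```

The same loop translates directly to Java by iterating over `String.charAt`, because Java strings are already sequences of UTF-16 code units.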

Truncate a UTF-8 string to fit a given byte count in PHP

Say we have a UTF-8 string $s and we need to shorten it so it can be stored in N bytes. Blindly truncating it to N bytes could mess it up. But decoding it to find the character boundaries is a drag. Is there a tidy way? [Edit 20100414] In addition to S.Mark’s answer: mb_strcut(), I recently found another function to do the job: grapheme...
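The idea behind a byte-limited cut can be sketched in a few lines. The sketch is in Python rather than PHP, and `truncate_utf8` is a hypothetical helper name: slice the encoded bytes, then let the decoder discard whatever incomplete sequence the slice left at the end.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input was valid UTF-8, so only the tail of the slice can be broken;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("Löwenbräu", 3))
```

This cuts on code point boundaries only. A base letter followed by a combining mark can still be split, which is the case a grapheme-aware function is meant to handle.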

How to display non-English characters in PHP?

Hi guys, I have a basic question about PHP. I have 2 files: an HTML form with a textarea and a PHP file. All I want is to print the text the user types after submit is pressed. It all goes well when only English characters are typed, but I get gibberish when I type Arabic or Chinese, for instance. Is there a way to display all the characters? ...

Get first UTF-8 char from string and save in DB

I have a problem inserting a letter that is not an A-Z character. For example: $fullTag = 'świat'; The 'letter' field should contain ś: $data = array( 'full_tag' => $fullTag, 'count' => 1, 'letter' => $fullTag[0], ); But when I execute $table->insert($data);, it inserts an empty string as the letter. If I set instead of ...
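A likely explanation is that indexing a PHP string with [0] yields the first byte, not the first character, and ś takes two bytes in UTF-8. The difference is sketched below in Python (an assumption of this example, since the question is about PHP), comparing a byte slice with a character index.

```python
full_tag = "świat"
raw = full_tag.encode("utf-8")  # the bytes as they would be stored in a UTF-8 string

# Taking one byte gives only the first half of the two-byte sequence for ś.
first_byte = raw[:1]

# Decoding first and then indexing gives the whole character.
first_char = raw.decode("utf-8")[0]

print(first_byte, first_char)
```

A lone lead byte is not valid UTF-8 on its own, which would be consistent with the database rejecting or emptying the value.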

Is there something magical about this? The file always turns to ANSI

I have created a file and saved it as UTF-8. I placed this code in it: <div class="top_pic"> <img src="<?php echo $this->images_dir ?>image.jpg" alt="doc ao fim do dia" width="632" height="320"/> </div> <div id="conteudo_menu"> <?php echo $this->conteudo_menu ?> </div> <div id="item_list"> <?php echo $this->vinhos_lista ?> </div> ...

iPhone SQLite database with German umlauts results in NULL value

Hi guys, I'm quite new to iPhone development, and after searching for an answer for 3 hours now, I hope that you can give me a hand. My problem is that I have a SQLite database with German umlauts. Looking at it with a SQLite browser tool shows me that the data is stored correctly, with German umlauts. But selecting fields with g...

MySQL console (Windows->Linux), wrong character set?

When I run a query from the MySQL console and it contains accents or any character that needs to be UTF-8 encoded, it gets mangled: INSERT INTO users (userName) VALUES ("José Alarcón"); SELECT userName FROM users; The result is José Alarcón. SET NAMES utF8 changes nothing. Passing --default-character-set=utf8 as a parameter changes nothing. Keep in mind that this i...

jQuery.param and UTF-8

I have the following code: var words = new Object(); $("li.words").each(function(){ var thisId = $(this).attr("id"); words[thisId] = $(this).children('input#word').val(); }); The input with id #word contains words in Hebrew (i.e. UTF-8 chars). When I use: alert($.param(words)); the words look like this: %D7%9E%D7%AA%D7%A7%...
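The %D7%9E... output is the percent-encoded form of the UTF-8 bytes, so it can be decoded back without loss. A small round-trip sketch in Python (an assumption of this example, since the question is about jQuery) uses only the prefix visible in the excerpt above, which happens to cover three complete characters.

```python
from urllib.parse import quote, unquote

# Only the part of the output visible in the excerpt; the rest is elided there.
encoded = "%D7%9E%D7%AA%D7%A7"

# unquote() interprets the percent-escapes as UTF-8 bytes by default.
decoded = unquote(encoded)

# Encoding again reproduces the same escapes, so nothing was lost.
round_trip = quote(decoded)

print(decoded, round_trip == encoded)
```

Whatever receives the query string needs to apply the same decoding step with UTF-8 as the charset.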

Python App Engine form-posted UTF-8 file issue

Hi, I am trying to form-post a SQL file that consists of many INSERTs, e.g. INSERT INTO `TABLE` VALUES ('abcdé', 2759); Then I use re.search to parse it and extract the fields to put into my own datastore. The problem is that, although the file contains accented characters (note that the e is an é), once uploaded it loses them and either error...

How to send characters in UTF-8 encoding through ServletOutputStream

My servlet code looks like this: response.setContentType("text/html; charset=UTF-8"); response.setCharacterEncoding("UTF-8"); ServletOutputStream out = response.getOutputStream(); out.println(...MY-UTF-8 CODE...); ... then I get the error: java.io.CharConversionException: Not an ISO 8859-1 character: ש javax.servlet.ServletOutputSt...

How do I ensure that the text entered in a form is encoded as UTF-8?

I have an HTML text box in which users may enter text. I would like to ensure all text entered in the box is either encoded in UTF-8 or converted to UTF-8 when a user finishes typing. Furthermore, I don't quite understand how the various UTF encodings are chosen when text is entered into a text box. Generally, I'm curious about the following: H...

MySQL VARCHAR Lengths and UTF-8

In MySQL, if I create a new VARCHAR(32) field in a UTF-8 table, does it mean I can store 32 bytes of data in that field or 32 (multi-byte) characters? ...

Wrong encoding of text, in Django?

"query" = джазовыми For some reason...when I display it via: {{ query|safe }} I get this: %u0434%u0436%u0430%u0437%u043E%u0432%u044B%u043C%u0438 ...
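The %uXXXX form is the non-standard escaping produced by JavaScript's escape() function, so the value was probably escaped before it reached the template. A minimal sketch of undoing it in Python follows; `decode_percent_u` is a hypothetical helper name, and the sketch handles only the four-hex-digit %u escapes shown above.

```python
import re


def decode_percent_u(value):
    """Turn each %uXXXX escape into the character with that code point."""
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        value,
    )


escaped = "%u0434%u0436%u0430%u0437%u043E%u0432%u044B%u043C%u0438"
print(decode_percent_u(escaped))
```

Sending the value with encodeURIComponent() instead of escape() on the client side would avoid the need for this step, since that produces standard percent-encoded UTF-8.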

How to get the number of characters in a string in PHP?

It is UTF-8. For example, 情報 is 2 characters while ラリー ペイジ is 6 characters. ...
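The gap between byte length and character count can be sketched as follows. The sketch is in Python rather than PHP, and `count_chars` is a hypothetical helper name: the count is taken after decoding the UTF-8 bytes, not on the raw bytes.

```python
def count_chars(raw, skip_spaces=False):
    """Count the code points in UTF-8 bytes, optionally ignoring whitespace."""
    text = raw.decode("utf-8")
    if skip_spaces:
        return sum(1 for ch in text if not ch.isspace())
    return len(text)


joho = "情報".encode("utf-8")
name = "ラリー ペイジ".encode("utf-8")

print(len(joho), count_chars(joho))
print(count_chars(name), count_chars(name, skip_spaces=True))
```

Note that the second example has 7 code points when the space is counted; the figure of 6 in the question matches the count without the space.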

In oracle, how do I change my session to display UTF8?

I can't figure out Oracle's cryptic syntax for the life of me. This is Oracle 10g. My session's NLS_LANGUAGE is currently defaulting to AMERICAN. I need to be able to display UTF8 characters. Below are some of my attempts, all incorrect: ALTER SESSION SET NLS_LANGUAGE='UTF8' ALTER SESSION SET NLS_LANGUAGE='AMERICAN_AMERICA.UTF8' W...

How do I use Django and a UTF-8 content-type for a template?

When I do return render_to_response() in Django, how do I set the content-type to UTF-8 so that everything displayed is UTF-8? ...

How to get DOMDocument to be nice to ASCII control characters?

The HTML document which I am parsing contains some ASCII control codes. I noticed that PHP's DOMDocument parser truncates text nodes when it finds ASCII control characters within the node, such as Device Control (0x13), End of Medium (0x19), File Separator (0x1C), and Group Separator (0x1D). Is this a bug or a feature? Is there any...

What is the Explanation for DOMDocument's Inconsistent Behavior when Dumping a non-ASCII Character?

I've noticed different "dumping" behaviors when using PHP's DOMDocument's saveXML() and saveHTML() methods. Here is a simple example of dumping the copyright symbol (©). <?$domDoc = new DOMDocument(); $domDoc->loadHTML("&copy;"); echo $domDoc->saveHTML(); echo $domDoc->saveXML(); echo $domDoc->saveXML($domDoc); ?> The three du...

Why does Perl's Text::Capitalize turn "Juvénal" into "JuvéNal"?

I'm using Text::Capitalize to try to title-case some UTF-8 encoded names from a web page (downloaded using WWW::Mechanize), but I'm not getting the results I'm expecting. For example, the name on the web page is "KAJELIJELI, Juvénal" but capitalize_title returns "Kajelijeli, JuvéNal" (notice the uppercase N). I've tried use utf8; and chang...