utf-8

Is there any benefit to adding accept-charset="UTF-8" to HTML forms, if the page is already in UTF-8?

For pages already declared (by HTTP header or by meta tag) to have a Content-Type with a UTF-8 charset, is there a benefit to adding accept-charset="UTF-8" to HTML forms? (I understand the accept-charset attribute is broken in IE for ISO-8859-1, but I haven't heard of a problem with IE and UTF-8. I'm just asking if there's a...

UTF-8 -> ASCII in C language

Hi guys. I have a simple question that I can't find an answer to anywhere on the internet: how can I convert UTF-8 to ASCII (mostly accented characters to the same character without the accent) in C using only the standard lib? I found solutions for most of the languages out there, but not for C in particular. Thanks! EDIT: Some of the kind guys that c...

How do I encode a Binary blob as a Unicode blob?

I'm trying to store a Gzip serialized object into Active Directory's "Extension Attribute", more info here. This field is a Unicode string according to its oM syntax of 64. What is the most efficient way to store a binary blob as Unicode? Once I get this down, the rest is a piece of cake. ...

Rails fixtures encoding error "incompatible character encodings: ASCII-8BIT and UTF-8"

Using ruby 1.9.2 and Rails 3 I get an encoding error when I try to run this in seeds.rb: Fixtures.create_fixtures("#{Rails.root}/db/seed", "countries") I am sure the .csv file is encoded in UTF-8 and it can be read and parsed using ruby's CSV class. Is this a Rails 3 encoding issue with fixtures? ...

How to Determine "Lowest" Encoding Possible?

Scenario: You have lots of XML files stored as UTF-16 in a database or on a server where space is not an issue. You need to send a large majority of these files to other systems as XML files, and it is critical that you use as little space as you can. Issue: In reality only about 10% of the files stored as UTF-16 ne...

Checklist for going the Unicode way with Perl

I am helping a client convert their Perl flat-file bulletin board site from ISO-8859-1 to Unicode. Since this is my first time, I would like to know if the following "checklist" is complete. Everything works well in testing, but I may be missing something which would only occur on rare occasions. This is what I have done so far (forgiv...

How can I convert CGI input to UTF-8 without Perl's Encode module?

Through this forum, I have learned that it is not a good idea to use the following for converting CGI input (from either an escape()d Ajax call or a normal HTML form post) to UTF-8: read (STDIN, $_, $ENV{CONTENT_LENGTH}); s{%([a-fA-F0-9]{2})}{ pack ('C', hex ($1)) }eg; utf8::decode $_; A safer way (which for example does not allow bog...

Python string formatting + UTF-8 strange behaviour

When printing a formatted string with a fixed length (e.g, %20s), the width differs from UTF-8 string to a normal string: >>> str1="Adam Matan" >>> str2="אדם מתן" >>> print "X %20s X" % str1 X Adam Matan X >>> print "X %20s X" % str2 X אדם מתן X Note the difference: X Adam Matan X X אדם מתן X Any i...
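
The difference is consistent with the padding being counted in bytes rather than characters: in Python 2 a plain string literal holds UTF-8 bytes, and each Hebrew letter takes two of them. The excerpt is cut off, so this is only a sketch of that explanation, written for Python 3 with illustrative variable names and the name spelled with escapes so the code points are unambiguous.

```python
# Assumption: the string from the question, written with escapes:
# alef, dalet, final mem, space, mem, tav, final nun.
text = "\u05d0\u05d3\u05dd \u05de\u05ea\u05df"

# What a Python 2 byte string would have contained for the same text.
raw = text.encode("utf-8")

char_count = len(text)   # counts code points
byte_count = len(raw)    # counts UTF-8 bytes

# Padding that a width of 20 leaves, per character and per byte.
pad_by_chars = 20 - char_count
pad_by_bytes = 20 - byte_count

# Formatting the decoded text pads to 20 characters.
padded = "%20s" % text

print(char_count, byte_count, pad_by_chars, pad_by_bytes, len(padded))
```

Formatting a decoded (unicode) string instead of the raw bytes is one way to get character-based padding. Display width can still differ for combining or wide characters.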

How to serialize object into UTF-8

Hi, I'm trying to insert into an XML column (SQL SERVER 2008 R2), but the server's complaining: System.Data.SqlClient.SqlException (0x80131904): XML parsing: line 1, character 39, unable to switch the encoding I found out that the XML column has to be UTF-16 in order for the insert to succeed. The code I'm using is: XmlSerializer se...

How do I use UTF-8 entities in a Rails view?

Rails appears to be converting the ampersand at the beginning of the utf-8 entity to an HTML entity: &amp; So &#x25B2; becomes &amp;#x25B2; but I would like to display a downward arrow instead, which is what the utf-8 entity would normally be. I'm using Rails 2.3.8 and Ruby 1.8.7. Here is what the view looks like: <%= get_arrow_fro...

Does UTF-8 encoding mess up file globbing and grep'ing?

I'm playing with bash, experimenting with UTF-8 encoding. I'm new to Unicode. The following commands (well, their output) surprise me: $ locale LANG="fr_FR.UTF-8" LC_COLLATE="fr_FR.UTF-8" LC_CTYPE="fr_FR.UTF-8" LC_MESSAGES="fr_FR.UTF-8" LC_MONETARY="fr_FR.UTF-8" LC_NUMERIC="fr_FR.UTF-8" LC_TIME="fr_FR.UTF-8" LC_ALL= ...

How to remove funny characters in javascript?

On the following line: alert ( "Apenas os números 0, 1, 3, 5, 7 e 9 são permitidos." ); it prints like this: Apenas os n?meros 0, 1, 3, 5, 7 e 9 s?o permitidos. The problem is that the characters ú and ã are not showing correctly. In HTML I did something like: Apenas os n&uacute;meros 0, 1, 3, 5, 7 e 9 s&atilde;o permitidos. ...

Problem converting ANSI to UTF-8 in C#

I have a problem with converting a text file from ANSI to UTF-8 in C#. I try to display the results in a browser. I have a text file with many accented characters in it. It's encoded in ANSI, so I have to convert it to UTF-8 because the browser shows "?" instead of the accented characters. No matter how I tried to convert to UTF8 it wa...

Sorting UTF-8 strings in Win32 program

My Win32/MFC program builds up a list of names, sorting them alphabetically as it puts them into the list. When it supported only ASCII strings, this worked by a simple char-by-char string comparison. But now that I want to accept UTF-8 strings, I need a more complex scheme since --for example -- all forms of the letter "a" should be equ...

Regex wordwrap with UTF8 characters in JS

Hi everybody, I've already read all the articles here which touch on a similar problem, but I still can't get any solution working. In my case I want to wrap each word of a string with a span. The words contain special characters like 'äüö...' What I am doing at the moment is: var textWrap = text.replace(/\b([a-zA-Z0-9ßÄÖÜäöüÑñÉéÈèÁáÀàÂâŶĈĉĜ...

Why are Scandinavian characters converted to UTF-8?

I am trying to create an array with Danish characters - why are the characters converted to UTF-8 when output by PHP? Apache's httpd.conf? PHP.ini? // Fails $chars = array_merge(range("A","Z"),str_split("ÆØÅ")); // Observed result: (array) ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ // Expected result: (array) ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ // Wor...

Chrome not rendering properly [character set utf-8 problem]

ysdsdsdasasdasdadadadasdasdasdad ...

Working with files and utf8 in PHP

This is driving me crazy. Let's say I have a file called foo.txt encoded in utf8: aoeu qjkx ñpyf And I want to get an array that contains all the lines in that file (one line per index) that have the letters aoeuñpyf, and only the lines with these letters. I wrote the following code (also encoded as utf8): $allowed_letters=array("...

How to detect illegal UTF-8 byte sequences to replace them in java inputstream?

Hi! The file in question is not under my control. Most byte sequences are valid UTF-8; it is not ISO-8859-1 (or another encoding). I want to do my best to extract as much information as possible. The file contains a few illegal byte sequences, and those should be replaced with the replacement character. It's not an easy task, I think it...
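
The question is about Java, but the general idea (decode leniently and substitute U+FFFD for anything malformed) can be sketched in a few lines of Python. The sample bytes below are made up for illustration and are not from the file described above.

```python
# Hypothetical input: mostly valid UTF-8 with two stray bytes (0xFF, 0xFE)
# that can never appear in well-formed UTF-8.
data = b"valid \xc3\xa9 then bad \xff\xfe end"

# errors="replace" substitutes U+FFFD for each undecodable byte
# instead of raising UnicodeDecodeError.
text = data.decode("utf-8", errors="replace")

replaced = text.count("\ufffd")
print(repr(text), replaced)
```

Valid sequences such as the two-byte "é" survive untouched; only the malformed bytes are swapped out.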

Encoding problem downloading HTML using mechanize and Python 2.6

browser = mechanize.Browser() page = browser.open(url) html = page.get_data() print html It shows some strange characters. I suppose that it is UTF-8 string but Python doesn't know that and cannot show it properly. How can I convert this string to unicode string like u = u'test' ...
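
If the downloaded bytes really are UTF-8, decoding them is what turns them into a unicode string. A minimal sketch under that assumption follows; `html_bytes` is a stand-in for the result of `page.get_data()`, and the sample text is invented. It is written for Python 3, though calling `.decode('utf-8')` on a byte string works the same way in Python 2.6.

```python
# Stand-in for page.get_data(): raw bytes as they arrive from the server.
# Assumption: the page is UTF-8; the Content-Type header or a <meta> tag
# is the place to confirm that before decoding.
html_bytes = "Ol\u00e1, mundo".encode("utf-8")

# Decoding converts the byte string into a text (unicode) string.
html_text = html_bytes.decode("utf-8")

# The accented letter is two bytes but one character.
print(len(html_bytes), len(html_text))
```

If the page turns out to use a different charset, decoding with "utf-8" will either raise an error or produce the same kind of strange characters, so the declared encoding is worth checking first.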