questions about utf-8 | ansaurus

utf-8

Why am I running out of bytes for the stream while performing an HTTP POST?

This is driving me nuts: WebRequest request = WebRequest.Create(url); request.Method = "POST"; request.ContentType = "application/x-www-form-urlencoded"; request.ContentLength = Encoding.UTF8.GetByteCount(data); Stream reqs = request.GetRequestStream(); StreamWriter stOut = new StreamWriter(reqs, Encoding.UTF8); ...
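One possible cause, offered as an assumption because the excerpt is cut off: the writer may prepend a UTF-8 byte order mark that the declared Content-Length does not count. The Python sketch below only illustrates the byte arithmetic; the form body is a made-up sample and `utf-8-sig` stands in for a writer that emits a BOM.

```python
# Hypothetical form body, not taken from the question.
data = "name=Otiv\u00e4gen&city=G\u00f6teborg"

# What a Content-Length computed from the plain UTF-8 byte count would announce.
declared_length = len(data.encode("utf-8"))

# The same text as written by an encoder that first emits a UTF-8 BOM.
written_bytes = data.encode("utf-8-sig")

# The BOM accounts for the whole difference.
extra = len(written_bytes) - declared_length
print(declared_length, len(written_bytes), extra)
```

If this is what is happening, the stream receives three bytes more than were declared, which would fit the symptom in the title.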

System.Text.Encoding isn't

I've tracked a problem I'm having down to the following inexplicable behaviour within the .NET System.Text.Encoding class: byte[] original = new byte[] { 128 }; string encoded = System.Text.Encoding.UTF8.GetString(original); byte[] decoded = System.Text.Encoding.UTF8.GetBytes(encoded); Console.WriteLine(original[0] == decoded[0]); Am ...
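A lone byte 128 (0x80) is not valid UTF-8, so a decoder has to reject or replace it. The Python sketch below assumes a decoder that substitutes U+FFFD for invalid input, which may or may not match what the .NET code in the question does, and shows why the round trip cannot return the original byte.

```python
original = bytes([128])

# errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
text = original.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")

print(text == "\ufffd")           # True
print(round_tripped)              # b'\xef\xbf\xbd'
print(original == round_tripped)  # False

# A single-byte encoding such as Latin-1 maps every byte to a character,
# so it round-trips arbitrary bytes without loss.
latin1_round_trip = original.decode("latin-1").encode("latin-1")
print(original == latin1_round_trip)  # True
```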

How do you think Google is handling this encoding issue?

I recently came across an encoding issue specific to how Firefox encodes URLs directly entered into the address bar. It basically looks like the default Firefox character encoding for URLs is NOT UTF-8, which is the case with most browsers. Additionally, it looks like they are trying to make some intelligent decisions as to what characte...

django.utils.encoding.DjangoUnicodeDecodeError

I got the following error when I tried to add an entry to a Django model via generic relations. django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 24: unexpected code byte. You passed in 'ASL/60Styles_Timeless-3_\xb8 CaLe.asl' (<type 'str'>) The model is like this: class MD5(models.Model): ...
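The error message can be reproduced outside Django with the byte string it quotes. This is a minimal sketch in plain Python 3; the Latin-1 decode at the end is only a guess at the file name's real encoding, not something the question confirms.

```python
# The byte string quoted in the error message.
raw = b"ASL/60Styles_Timeless-3_\xb8 CaLe.asl"

try:
    raw.decode("utf-8")
    position = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the codec could not decode.
    position = exc.start

print(position)  # 24

# Assumption: the name was produced in a single-byte encoding such as Latin-1.
guessed = raw.decode("latin-1")
print(guessed)
```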

Why does stdout decoding fail when adding carriage return?

The following Java code does exactly what is expected: 1 String s = "♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪"; 2 for(int i=0; i < s.length(); i++) 3 { 4 System.out.print(s.substring(i,i+1)); 5 //System.out.print("\r"); 6 Thread.currentThread().sleep(500); 7 } But when I try to add carriage return by commenting i...

Funny characters in Visual Studio output window

I have written an External Tool that uses plink.exe to execute gcc on a Linux system and then capture the output back on VS's output window (there is a checkmark in Tools/External Tools/Use Output Window). But Linux outputs with utf-8 and so I get some garbage. Is there any way to get VS to translate that utf-8 output to readable output?...

UTF8 Beginning of File characters are breaking serializer & readers

Okay, I'm trying to work with UTF8 text files. I'm constantly fighting the BOF chars that the writer drops in for UTF8, which blows up pretty much anything I need to use to read the file including serializers and other text readers. I'm getting a leading six bytes of data: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF (now that I'm looking at it...
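The six bytes listed are the UTF-8 byte order mark (0xEF 0xBB 0xBF) twice in a row. As a rough sketch, assuming the file content is already in memory as bytes, repeated leading BOMs can be stripped like this in Python; `strip_utf8_boms` and the sample payload are hypothetical names for illustration.

```python
import codecs


def strip_utf8_boms(data: bytes) -> bytes:
    """Remove every UTF-8 BOM found at the very start of data."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical payload with a doubled BOM, as described in the question.
raw = codecs.BOM_UTF8 * 2 + "<root/>".encode("utf-8")
clean = strip_utf8_boms(raw)
print(clean)  # b'<root/>'
```

Python's `utf-8-sig` codec removes only a single leading BOM, which is why the sketch loops instead.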

detect UTF-16 file content

Is it possible to know whether a file has Unicode (16 bits per char) or 8-bit ASCII content? ...
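One common heuristic is to look for a byte order mark at the start of the file. The Python sketch below covers only that case; `sniff_bom` is a hypothetical helper, and a file without a BOM would need other heuristics that are not shown here.

```python
import codecs
from typing import Optional

# UTF-32 is checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


utf16_sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(sniff_bom(utf16_sample))    # utf-16-le
print(sniff_bom(b"plain ascii"))  # None
```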

Percent Encoded UTF-8 to Ascii(8-bit) conversion

I'm reading in URLs and they often have percent-encoded characters. Example: %C3%A9 is actually é. According to http://www.microsystools.com/products/sitemap-generator/faq/character-percentage-url-encoding/ , characters in the upper half of 8-Bit ASCII (128-255) are encoded as UTF-8, then their bytes are saved as hex. Now, when I get my ...
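To make the %C3%A9 example concrete, here is a small Python sketch using the standard library's `urllib.parse`. The sample string is made up, and the final step assumes the target "8-bit ASCII" is Latin-1, which the question does not state.

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "caf%C3%A9"  # hypothetical sample containing the sequence from the question

# Step 1: turn each %XX escape back into a raw byte.
raw = unquote_to_bytes(encoded)

# Step 2: interpret those bytes as UTF-8 (unquote does both steps by default).
text = unquote(encoded)

# Step 3 (assumption): re-encode as Latin-1 to get one byte per character.
single_byte = text.encode("latin-1")

print(raw)          # b'caf\xc3\xa9'
print(text)         # café
print(single_byte)  # b'caf\xe9'
```

Characters outside Latin-1 would raise `UnicodeEncodeError` in step 3, so a real conversion needs a policy for those.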

Switch website encoding from ISO-8859-1 to UTF-8

I am trying to convert my existing PHP webpage to use UTF-8 encoding. To do so, I have done the following things: specified UTF-8 as the charset in the meta content tag at the start of my webpage; changed the default_charset to UTF-8 in the php.ini; specified UTF-8 as the iconv encoding in the php.ini file; specified UTF-8 in my .htacc...

Multilanguage UTF-8 website with Arabic

I will be coding a website that will have Arabic as a supported language. With UTF-8 Unicode I believe I can cover the Arabic alphabet. I've also read that it reads right to left, so I guess I should align right when displaying in Arabic. I'm asking the community for experience and possible pitfalls. utf-8 unicode css selector to switch text...

Should I happily stay with UTF-8 or should I be ready to change the encoding?

I've built (or I'm building) an application that supports a wide variety of languages. I'm using UTF-8 right now because as I see it, it supports all languages in this world. (?) However, after reading the article on Wikipedia, it states that while UTF-8 currently uses only 10% of its potential space, there's a possibility that in the f...

export and import users and database collation issue

Hi, I have Mambo 4.6.5 on my source site and Joomla 1.5 on the destination site. I'm going to move users from the first to the second, so I installed the userport component on Joomla 1.5, then went to the Mambo database and selected my users with this query: SELECT name, username, email, password FROM mos_users and exported them to a CSV file which is...

Change Emacs Default Coding System

My problem stems from Emacs inserting the coding system headers into source files containing non-ascii characters: # -*- coding: utf-8 -*- My coworkers do not like these headers being checked into our repositories. I don't want them inserted into my files because Emacs automatically detects that the file should be UTF-8 regardless so ...

Invalid PHP JSON encoding

I'm working on a project in PHP (5.3.1) where I need to send a JSON string to a web service (in Python), but the result I get from json_encode does not pass as valid JSON (I'm using JSLint to check validity). I should add that the structure I'm trying to encode is fairly big (13K encoded), and consists partially of UTF-8 data, and while...

UTF-8, PHP and XML Mysql

I am having great problems solving this one: I have a MySQL database with collation latin1_swedish_ci and a table that stores names and addresses. I am trying to output a UTF-8 XML file, but I am having problems with the following string: Otivägen. It is being output as OtivÃ¤gen when I view the file in vim. Also, when I open it in IE I get "An inv...
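The garbled form shown above is what you get when the UTF-8 bytes of "Otivägen" are read as Latin-1. The Python sketch below reproduces that and reverses it; where in the PHP/MySQL pipeline the misreading happens is not something the excerpt tells us.

```python
correct = "Otiv\u00e4gen"

# UTF-8 bytes misread as Latin-1: the two bytes of "ä" become two characters.
garbled = correct.encode("utf-8").decode("latin-1")
print(garbled)  # OtivÃ¤gen

# Reversing the misreading recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == correct)  # True
```

Seeing this pattern usually means the data was encoded to UTF-8 once more than it was decoded, or that a UTF-8 file is being viewed as Latin-1.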

What is the fool proof way to convert some string (utf-8 or else) to a simple ASCII string in python

Inside my Python script, I get some string back from a function which I didn't write. The encoding of it varies. I need to convert it to ASCII format. Is there some fool-proof way of doing this? I don't mind replacing the non-ASCII chars with blanks or something else... ...
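A lossy but predictable approach, sketched here in Python 3, is to decode with replacement, strip accents through Unicode normalization, and replace whatever is still not ASCII. `to_ascii` is a hypothetical helper, and decoding unknown bytes as UTF-8 is an assumption; it is not truly fool-proof, because the real encoding cannot be known for certain.

```python
import unicodedata
from typing import Union


def to_ascii(value: Union[str, bytes], assumed_encoding: str = "utf-8") -> str:
    """Return an ASCII-only approximation of value."""
    if isinstance(value, bytes):
        # Undecodable bytes become U+FFFD instead of raising.
        value = value.decode(assumed_encoding, errors="replace")
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is replaced with "?".
    return without_marks.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("Otiv\u00e4gen"))   # Otivagen
print(to_ascii(b"caf\xc3\xa9"))    # cafe
print(to_ascii(b"\xb8 CaLe.asl"))  # ? CaLe.asl
```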

Case fold UTF-8 without knowing the language

I'm trying to evaluate different strategies for case insensitive UTF-8 string comparison. I've read some material from the Unicode consortium, experimented with ICU and tried to come up with various quality-of-implementation alternatives. On multiple occasions I've seen texts differ between Simple Case Mapping and Full Case Mapping, an...

re: UTF-8, PHP and XML Mysql

This relates to http://stackoverflow.com/questions/1791082/utf-8-php-and-xml-mysql, which I am still trying to get my head around. I have a couple of separate questions that will hopefully help me understand how to resolve the issues I am having. I am trying to read values from a database and output into a file in UTF-8 format. But...

What problems should I expect when moving legacy Perl code to UTF-8?

Until now, the project I work on used only ASCII in the source code. Due to several upcoming changes in the I18N area, and also because we need some Unicode strings in our tests, we are thinking about biting the bullet and moving the source code to UTF-8, while using the utf8 pragma (use utf8;). Since the code is in ASCII now, I don't expect to...
