encoding

How to display a non-ascii filename in the file download box in browsers?

There doesn't seem to be an accepted way of sending down a header parameter in non-ASCII format. The header for a file download usually looks like Content-Disposition: attachment; filename="theasciifilename.doc". But if you smash a UTF-8 encoded string into the filename parameter, Firefox will handle it fine, whereas IE will throw up. T...
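One approach that may help here is the extended `filename*` parameter defined in RFC 6266 and RFC 5987, sent alongside a plain ASCII `filename` fallback. The sketch below only builds the header value. The function name `content_disposition` and the `fallback` argument are hypothetical, and older browsers (such as the IE versions this question is about) may ignore `filename*` entirely.

```python
from urllib.parse import quote


def content_disposition(filename, fallback="download"):
    """Build an attachment header value with an ASCII fallback and a UTF-8 filename*.

    Assumption: `fallback` is plain ASCII and contains no double quotes.
    """
    # quote() percent-encodes the UTF-8 bytes of the name; safe="" also escapes "/".
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

A server would send the returned string as the value of the Content-Disposition header, for example `content_disposition("naïve.doc", fallback="naive.doc")`.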

Send file using POST from a Python script

This is an almost-duplicate of http://stackoverflow.com/questions/68477/send-file-using-post-from-a-python-script, but I'd like to add a caveat: I need something that properly handles the encoding of fields and attached files. The solutions I've been able to find blow up when you throw unicode strings containing non-ascii characters into...

Looking for a regular expression including alphanumeric + "&" and ";"

Here's the problem: split=re.compile('\W*') works fine when dealing with regular words, but there are occasions where I need the expression to include words like k&auml;ytt&auml;j&auml;. What should I add to the regex to include the & and ; characters? ...
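A minimal sketch of one way to do this: split on a negated character class that treats `&` and `;` as part of a word. The helper name `split_words` is hypothetical. The sketch also uses `+` instead of `*`, on the assumption that the pattern should not match empty strings.

```python
import re

# Anything that is NOT a word character, '&' or ';' counts as a separator,
# so HTML entities such as &auml; stay attached to the surrounding word.
SPLITTER = re.compile(r"[^\w&;]+")


def split_words(text):
    """Split text into words, keeping '&' and ';' inside them."""
    # Drop the empty strings produced by leading or trailing separators.
    return [token for token in SPLITTER.split(text) if token]
```

Note that this keeps any stray `&` or `;` too, not only well-formed entities.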

Loading UTF-8 encoded dump into MySQL

Hi, I was pulling my hair out over this problem for a few hours yesterday: I have a database on a MySQL 4.1.22 server with encoding set to "UTF-8 Unicode (utf8)" (as reported by phpMyAdmin). Tables in this database have their default charset set to latin2. But the web application using it (CMS Made Simple, written in PHP) displays pages in u...

Setting ISO-8859-1 encoding for a single Tapestry 4 page in application that is otherwise totally UTF-8

I have a Tapestry application that is serving its page as UTF-8. That is, server responses have header: Content-type: text/html;charset=UTF-8 Now within this application there is a single page that should be served with ISO-8859-1 encoding. That is, server response should have this header: Content-type: text/html;charset=ISO-8859-1 ...

Setting the character encoding in form submit for Internet Explorer

I have a page that contains a form. This page is served with content type text/html;charset=utf-8. I need to submit this form to server using ISO-8859-1 character encoding. Is this possible with Internet Explorer? Setting accept-charset attribute to form element, like this, works for Firefox, Opera etc. but not for IE. <form accept-cha...

Why do my files need dos2unix? only in eclipse though

When I open a file in eclipse it shows with the improper line spacing showing an extra line break between every line. When I open the file with notepad or wordpad it doesn't show these extra line breaks that only eclipse shows. How do I get eclipse to read these files like notepad and wordpad without those line breaks? -edit: I don't ha...

Microsoft Excel mangles Diacritics in .csv files?

I am programmatically exporting data (using PHP 5.2) into a .csv test file. Example data: Numéro 1 (note the accented e). The data is UTF-8 (no prepended BOM). When I open this file in MS Excel, it displays as NumÃ©ro 1. I am able to open it in a text editor (UltraEdit), which displays it correctly. UE reports the character is decim...

Is there a standard encoding for NEEDED entries in ELF?

I'm trying to make some of my code a bit more friendly to non-pure-ascii systems and was wondering if there was a particular character encoding used for NEEDED entries in ELF binaries, or is it rather unstandard and based on the creating system's filesystem encoding (or even just directly the bytes that were passed to whatever created th...

How to encode characters from Oracle to XML?

In my environment here I use Java to serialize the result set to XML. It happens basically like this: //foreach column of each row xmlHandler.startElement(uri, lname, "column", attributes); String chars = rs.getString(i); xmlHandler.characters(chars.toCharArray(), 0, chars.length()); xmlHandler.endElement(uri, lname, "column"); The XM...

Best way to encode text data for XML

I was looking for a generic method in .Net to encode a string for use in an Xml element or attribute, and was surprised when I didn't immediately find one. So, before I go too much further, could I just be missing the built-in function? Assuming for a moment that it really doesn't exist, I'm putting together my own generic EncodeForX...

Why is it that UTF-8 encoding is used when interacting with a UNIX/Linux environment?

I know it is customary, but why? Are there real technical reasons why any other way would be a really bad idea or is it just based on the history of encoding and backwards compatibility? In addition, what are the dangers of not using UTF-8, but some other encoding (most notably, UTF-16)? Edit : By interacting, I mostly mean the shell a...

Help localizing application in Mac

Hi, I have an application that is supposed to work on both Windows and Mac and is localized in Portuguese, Spanish and German. The localized strings are read from an ini file. But the ini file doesn't work with the same encoding on both platforms. For Windows I have to have the file in ANSI format or else t...

Theory: "Lexical Encoding"

I am using the term "Lexical Encoding" for lack of a better one. A Word is arguably the fundamental unit of communication, as opposed to a Letter. Unicode tries to assign a numeric value to each Letter of all known Alphabets. What is a Letter to one language is a Glyph to another. Unicode 5.1 assigns more than 100,000 values to th...

How do I convert a file's format from Unicode to ASCII using Python?

I use a 3rd party tool that outputs a file in Unicode format. However, I prefer it to be in ASCII. The tool does not have settings to change the file format. What is the best way to convert the entire file format using Python? ...

Configuring Tomcat 6 to support Russian cp1251 encoding

I am migrating a Struts application from WebSphere to Tomcat 6, and my application supports the Russian language. In WebSphere we used to pass the JVM param -Dclient.encoding.override=cp1251, but when I tried this with Tomcat by passing the JVM argument -DFile.encoding=cp1251, the system doesn't accept input (in any text box like in sea...

How to tell if a URL parameter needs to be encoded in Java

I'm writing a Java app that is accepting URL parameter values that may or may not be encoded. I need an easy way to tell whether or not I need to encode the parameter string. In other words, I want a function boolean needsEncoding(String param), which will return true if I pass in the String "[email protected]", and false if I pass in "foo%...

iPhone "Web Site Error"

I'm writing server-side programs in PHP for an iPhone app. And I have no iPhone. :P The iPhone app requests XML files from the site whenever a user runs the iPhone app. You may visit http://www.appvee.com/iphone/ads or http://www.appvee.com/iphone/latest for the XML files. And a message box will show up with the following error message...

How do I set the character set using XMLHttp Object for a POST in classic ASP?

I have to use the XMLHttp object in classic ASP in order to send some data to another server via HTTP from server to server: sURL = SOME_URL Set oXHttp = Server.CreateObject("Msxml2.XMLHTTP") oXHttp.open "POST", sURL, false oXHttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded;charset:ISO-8859-1;" sPost = SOME_F...

How to convert a file to utf-8 in Python?

I need to convert a bunch of files to utf-8 in Python, and I have trouble with the "converting the file" part. I'd like to do the equivalent of: iconv -t utf-8 $file > converted/$file # this is shell code Thanks! ...
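A minimal sketch of the "converting the file" part, assuming the source encoding is known in advance (Python does not detect it for you). The function name `convert_to_utf8` and its parameters are hypothetical.

```python
def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file from source_encoding to UTF-8.

    Assumption: the caller knows source_encoding (for example "latin-1").
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copy in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: src.read(64 * 1024), ""):
            dst.write(chunk)
```

To handle a bunch of files, this could be called in a loop over the file names, writing each result into a separate output directory such as `converted/`.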