encoding

Changing the default encoding for String(byte[])

Is there a way to change the encoding used by the String(byte[]) constructor? In my own code I use String(byte[],String) to specify the encoding, but I am using an external library that I cannot change. String src = "with accents: é à"; byte[] bytes = src.getBytes("UTF-8"); System.out.println("UTF-8 decoded: "+new String(bytes,"UTF-8")...
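
Java's String(byte[]) constructor decodes with the platform default charset, so the result depends on how the JVM was started. A minimal Python sketch of the underlying failure: Python has no implicit default here, so the "wrong default" is simulated by explicitly decoding UTF-8 bytes as ISO-8859-1. That choice of ISO-8859-1 is an assumption standing in for whatever default the library ends up with.

```python
src = "with accents: é à"
data = src.encode("utf-8")  # é -> C3 A9, à -> C3 A0

correct = data.decode("utf-8")        # same charset on both sides
mojibake = data.decode("iso-8859-1")  # mismatched charset: each byte becomes one character

print("UTF-8 decoded:", correct)
print("ISO-8859-1 decoded:", repr(mojibake))
```

Each two-byte UTF-8 sequence turns into two Latin-1 characters (for example "Ã©"), which is the symptom to look for when a library decodes with the wrong default.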

Best HE-AAC encoder on Linux?

I need an encoder that can convert MP3 files to HE-AAC (aka AAC+). So far the best one I have found is the Nero AAC encoder. I have two problems with it: - Only one input format: WAV. It is a little bit slow to transform MP3 files to WAV and then to HE-AAC. - The free license is only for non-commercial use. Too bad ffmpeg does not support h...

Does C# have an equivalent to JavaScript's encodeURIComponent()?

In JavaScript: encodeURIComponent("©√") == "%C2%A9%E2%88%9A" Is there an equivalent for C# applications? For escaping HTML characters I used: txtOut.Text = Regex.Replace(txtIn.Text, @"[\u0080-\uFFFF]", m => @"&#" + ((int)m.Value[0]).ToString() + ";"); But I'm not sure how to convert the match to the correct hexadecimal format t...
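
What encodeURIComponent does can be stated precisely: encode the string as UTF-8, then write every byte outside a small unreserved set as %XX with uppercase hex. A minimal Python sketch of that behaviour (not a C# answer; the function name is hypothetical), using urllib.parse.quote with the extra characters encodeURIComponent leaves alone passed as `safe`:

```python
from urllib.parse import quote

def encode_uri_component(text: str) -> str:
    # encodeURIComponent leaves A-Z a-z 0-9 and - _ . ! ~ * ' ( ) unescaped.
    # quote() always keeps letters, digits and "_.-~", so the remaining
    # punctuation goes in `safe` (which also stops "/" from being kept).
    # Everything else is UTF-8 encoded and each byte written as %XX.
    return quote(text, safe="!*'()", encoding="utf-8")

print(encode_uri_component("©√"))  # %C2%A9%E2%88%9A
```

The key point for any port is that the hex digits come from the UTF-8 bytes, not from the UTF-16 code unit: "©" is U+00A9 but encodes as the two bytes C2 A9.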

Can you Distribute a Ruby on Rails Application without Source?

I'm wondering if it's possible to distribute a RoR app for production use without source code? I've seen this post on SO, but my situation is a little different. This would be an app administered by people with some clue, so I'm cool with still requiring an Apache/Mongrel/MySQL setup on the customer end. All I really want is for the s...

C++ strings: UTF-8 or 16-bit encoding?

I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both). There are a few wish...
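
One concrete input to this decision is how many bytes each encoding needs per character. A small Python sketch (the helper name is hypothetical) comparing UTF-8 with UTF-16 for a few sample strings. It also shows that UTF-16 is variable-width too: characters outside the Basic Multilingual Plane need a surrogate pair. Note that std::wstring is not necessarily 16-bit; wchar_t is 16 bits on Windows but typically 32 bits on Linux.

```python
def byte_sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for s; UTF-16 without a BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

# The last sample, U+1F600, lies outside the BMP.
samples = ["abc", "é", "ЦжФ", "€", "\U0001F600"]
for s in samples:
    utf8, utf16 = byte_sizes(s)
    print(f"{s!r}: {len(s)} code point(s), UTF-8 {utf8} B, UTF-16 {utf16} B")
```

ASCII-heavy text is half the size in UTF-8, Cyrillic is a tie, and some characters (like "€") are smaller in UTF-16; neither encoding gives fixed-width indexing for all of Unicode.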

What causes java.io.CharConversionException with EOF or isHexDigit messages in Tomcat?

This exception peppers our production catalina logs on a simple 'getParameter()' call. WARNING: Parameters: Character decoding failed. Parameter skipped. java.io.CharConversionException: EOF at org.apache.tomcat.util.buf.UDecoder.convert(UDecoder.java:82) at org.apache.tomcat.util.buf.UDecoder.convert(UDecoder.java:48) at ...
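
Judging by the two messages, the parameter decoder is rejecting malformed percent-escapes in the raw request: "EOF" when a `%` has fewer than two characters after it (for example an unescaped trailing `100%`), and "isHexDigit" when a `%` is followed by non-hex characters. A hypothetical strict decoder in Python that mirrors those two checks is sketched below. It is an illustration of the rule, not Tomcat's code.

```python
import string

HEX = set(string.hexdigits.encode("ascii"))  # byte values of 0-9, a-f, A-F

def strict_url_decode(value: str) -> bytes:
    """Decode %XX escapes and '+' strictly; raise ValueError on malformed escapes."""
    raw = value.encode("ascii")  # assume the raw, still-encoded parameter is ASCII
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b == ord("%"):
            if i + 2 >= len(raw):
                raise ValueError("EOF")  # '%' with fewer than two characters after it
            pair = raw[i + 1:i + 3]
            if not all(c in HEX for c in pair):
                raise ValueError("isHexDigit")  # '%' followed by non-hex characters
            out.append(int(pair.decode("ascii"), 16))
            i += 3
        elif b == ord("+"):
            out.append(0x20)  # form encoding: '+' stands for a space
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

If this reading is right, the fix is on the client side: make sure a literal `%` in a parameter value is sent as `%25`.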

Add non-ASCII file names to zip in Java

What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux? Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ...
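
The ZIP format has a flag for exactly this case: general-purpose bit 11 (0x800) marks an entry name as UTF-8, and without it readers typically fall back to a legacy code page such as CP437. A minimal Python sketch (not Java) showing that the standard zipfile module writes a non-ASCII name as UTF-8 and sets that flag; whether a given Linux or Windows unzip tool honours the flag is not covered here.

```python
import io
import zipfile

name = "abc-ЖДФ.txt"
buf = io.BytesIO()

# zipfile encodes non-ASCII names as UTF-8 and sets flag bit 0x800.
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, "hello")

# Read the archive back and inspect the stored entry.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    info = zf.infolist()[0]

print(info.filename)
print("UTF-8 flag set:", bool(info.flag_bits & 0x800))
```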

Why does unicode() use str() on my object only when no encoding is given?

I start by creating a string variable with some non-ascii utf-8 encoded data on it: >>> text = 'á' >>> text '\xc3\xa1' >>> text.decode('utf-8') u'\xe1' Using unicode() on it raises errors... >>> unicode(text) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0...
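
In Python 2, unicode(text) without an encoding argument decodes with the default encoding, which is normally ASCII; hence the 'ascii' codec error. A minimal Python 3 sketch of the same bytes-to-text step, where the codec is always named explicitly:

```python
data = b"\xc3\xa1"  # the UTF-8 bytes of "á", as in the session above

text = data.decode("utf-8")  # explicit codec: works
print(repr(text))

try:
    data.decode("ascii")  # roughly what Python 2's unicode(text) did by default
except UnicodeDecodeError as exc:
    print("ascii codec failed:", exc)
```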

.NET 8-bit Encoding

I'm working on a serial port, transmitting and receiving 8-bit data to and from some hardware. I would like to store it as a string to facilitate comparison, and the preset data are stored as strings or in hex format in an XML file. I found out that only when using Encoding.Default, which is the ANSI encoding, is the 8-bit data converted properly and easil...
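
Whether an 8-bit encoding can carry arbitrary bytes through a string depends on whether it maps all 256 byte values. A minimal Python sketch (the helper name is hypothetical, and Windows-1252 is assumed as the ANSI code page) showing that ISO-8859-1 round-trips every byte, while Windows-1252 leaves some bytes undefined:

```python
def first_undecodable(data: bytes, encoding: str):
    """Return the index of the first byte `encoding` cannot decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None

raw = bytes(range(256))  # every possible 8-bit value

# ISO-8859-1 maps byte N to code point U+00NN, so the round trip is lossless.
as_text = raw.decode("latin-1")
print(as_text.encode("latin-1") == raw)  # True

print(first_undecodable(raw, "latin-1"))  # None
print(first_undecodable(raw, "cp1252"))   # 129 (byte 0x81 is undefined)
```

For binary serial data, a byte-to-character mapping with no gaps (like ISO-8859-1) is the safer choice than a code page that depends on the machine's regional settings.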

URL encode in Erlang

I'm using erlang http:request to post some data to a remote service. I have the post working but the data in the body() of the post comes through as is, without any url encoding which causes the post to fail when parsed by the remote service. Is there a function in Erlang that is similar to CGI.escape in Ruby for this purpose? ...
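
What Ruby's CGI.escape produces is application/x-www-form-urlencoded text: spaces become "+", and reserved or non-ASCII characters become %XX of their UTF-8 bytes. A minimal Python sketch of that encoding (Python rather than Erlang, to show the expected output) using the standard urllib.parse helpers:

```python
from urllib.parse import quote_plus, urlencode

# Single value: space -> '+', '&' -> %26, '=' -> %3D
print(quote_plus("a b&c=d"))  # a+b%26c%3Dd

# Whole POST body from key/value pairs (urlencode uses quote_plus by default).
body = urlencode({"name": "José", "q": "x y"})
print(body)  # name=Jos%C3%A9&q=x+y
```

Whatever Erlang function is used, its output for these inputs should match, and the request should be sent with a Content-Type of application/x-www-form-urlencoded.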

Conversion from 32 bit integer to 4 chars

What is the best way to divide a 32-bit integer into four (unsigned) chars in C#? ...
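
The usual technique is shifting and masking: each byte is `(value >> k) & 0xFF` for k = 24, 16, 8, 0. A minimal Python sketch (the function name is hypothetical); the same expressions carry over almost verbatim to C#, and the struct module gives an independent check of the byte order.

```python
import struct

def split_u32(value: int) -> tuple[int, int, int, int]:
    """Split a 32-bit integer into four unsigned bytes, most significant first."""
    value &= 0xFFFFFFFF  # treat negatives as their two's-complement bit pattern
    return (
        (value >> 24) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 8) & 0xFF,
        value & 0xFF,
    )

# Cross-check against struct: ">I" packs a big-endian unsigned 32-bit value.
print(split_u32(0x12345678) == tuple(struct.pack(">I", 0x12345678)))  # True
```

Shifts make the byte order explicit, whereas library conversions may follow the machine's native endianness.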

GetPrivateProfileString Oddity

I was just tinkering around with calling GetPrivateProfileString and GetPrivateProfileSection in kernel32 from .NET and came across something odd I don't understand. Let's start with this incantation: Private Declare Unicode Function GetPrivateProfileString Lib "kernel32" Alias "GetPrivateProfileStringW" ( _ ByVal lpApplication...

Transferring extended ascii characters with unknown encoding to a Twisted XMLRPC from C#

Basically I want to pass a string which contains Spanish text that could be in one of several encodings (Latin-1, CP-1252, or UTF-8, to name a few). Once it gets to the XMLRPC I can detect the encoding, but I won't know it before then. C#, by default, seems to be killing any characters outside of ASCII. I've gotten around the problem by...
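
A common heuristic for "detect the encoding on arrival" is to try strict UTF-8 first, because text in a single-byte encoding with accented letters rarely happens to form valid UTF-8 sequences, and then fall back to a single-byte code page. A minimal Python sketch (hypothetical function name; it is a guess, not a guarantee). This only works if the raw bytes survive the trip unchanged, for example sent as base64 or as a binary payload rather than forced through an ASCII string.

```python
def decode_guess(data: bytes) -> str:
    """Heuristic: strict UTF-8 first, then Windows-1252, then Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # cp1252 leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined.
        return data.decode("cp1252")
    except UnicodeDecodeError:
        # Latin-1 maps every byte, so this last step cannot fail.
        return data.decode("latin-1")

for enc in ("utf-8", "cp1252", "latin-1"):
    print(enc, decode_guess("¿Qué año?".encode(enc)))
```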

How do I correct the character encoding of a file?

I have an ANSI encoded text file that should not have been encoded as ANSI as there were accented characters that ANSI does not support. I would rather work with UTF-8. Can the data be decoded correctly or is it lost in transcoding? What tools could I use? Here is a sample of what I have: ç é I can tell from context (café should...
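
Whether the data is recoverable depends on what actually happened. If characters the code page lacked were written as "?", they are gone. If instead the file holds UTF-8 bytes that are being displayed as Windows-1252 (so "é", bytes C3 A9, shows up as "Ã©"), the damage can be reversed by re-encoding. A minimal Python sketch of that second case (hypothetical function name; it assumes the mis-decoding happened exactly once, as Windows-1252):

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Windows-1252.

    If the text is not in fact mojibake, the round trip raises
    UnicodeError and the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text

print(fix_mojibake("cafÃ©"))  # café
print(fix_mojibake("café"))   # already correct, returned as-is
```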

Type double byte character into vbscript file

I need to convert (&rarr) to a symbol I can type into an ANSI VBScript file. I am writing a script that translates a select set of HTML codes to their actual double-byte symbols using a regex. Many languages accomplish this using "\0x8594;"... what is the equivalent in VBScript? ...
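
&rarr; is U+2192, which is 8594 in decimal, and it has no slot in the Windows-1252 "ANSI" code page, so the script has to build the character from its code point rather than contain it literally (in VBScript, the ChrW function plays that role). A minimal Python sketch of the regex translation (hypothetical function name) for decimal numeric references, plus the standard library's table for named ones:

```python
import html
import re

def replace_numeric_refs(text: str) -> str:
    """Replace decimal references such as &#8594; with the character they denote."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

print(replace_numeric_refs("next &#8594; page"))
# Named references need a lookup table; html.unescape has one.
print(html.unescape("&rarr;") == "\u2192")  # True
```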

How can I set what encoding must be used by a site?

On some systems it is UTF-8, on others latin-1. How do you set this? Is it something in php.ini? (I know you can set the encoding/charset for a given page by setting HTTP headers, but this is not what I am looking for.) Alex ...

newline character(s)

Does your software handle newline characters from other systems? Linux/BSD: linefeed (^J, 10, x0A); Windows/IBM: return + linefeed (^M^J, 13 10, x0D x0A); old Macs: return (^M, 13, x0D); others? For reasons of insanity, I am going with the Linux version of the newline character in my text files. But, when...
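
Normalizing input to the Linux convention is a two-step replace, and the order matters: CRLF must be collapsed before lone CRs are converted, or each Windows line ending would become two linefeeds. A minimal Python sketch (hypothetical function name); Python's own open() in text mode does the same translation by default (universal newlines), and str.splitlines() accepts all three conventions.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix LF."""
    # Replace CRLF first; otherwise each CRLF would turn into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")

mixed = "unix\nwindows\r\nold mac\rend"
print(normalize_newlines(mixed).split("\n"))  # ['unix', 'windows', 'old mac', 'end']
```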

How to get UTF-8 working in java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases. My setup is the following: Development environment: Windows XP; Production environment: Debian; Database used: MySQL 5.x; Users mainly use Firefox 2 but also O...

Writing XML files using XmlTextWriter with ISO-8859-1 encoding (C#)

I'm having a problem writing Norwegian characters into an XML file using C#. I have a string variable containing some Norwegian text (with letters like æøå). I'm writing the XML using an XmlTextWriter, writing the contents to a MemoryStream like this: MemoryStream stream = new MemoryStream(); XmlTextWriter xmlTextWriter = new XmlTextW...
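
The essential requirement is that the encoding named in the XML declaration and the encoding of the bytes actually written agree. A minimal Python sketch using xml.etree.ElementTree (not XmlTextWriter) that writes Norwegian text as ISO-8859-1 into an in-memory buffer, the analogue of the MemoryStream above; the element name and sample text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("note")
root.text = "blåbærsyltetøy"  # contains å, æ and ø

buf = io.BytesIO()
# Writes a declaration naming ISO-8859-1 and encodes the text in that charset,
# so å, æ and ø each become a single byte (E5, E6, F8).
ET.ElementTree(root).write(buf, encoding="ISO-8859-1", xml_declaration=True)
data = buf.getvalue()
print(data)
```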

Character reading from file in Python

In a text file, there is a string "I don't like this". However, when I read it into a string, it becomes "I don\xe2\x80\x98t like this". I understand that \u2018 is the unicode representation of "'". I use f1 = open (file1, "r") text = f1.read() command to do the reading. Now, is it possible to read the string in such a way that wh...
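
The bytes E2 80 98 are the UTF-8 encoding of U+2018 (LEFT SINGLE QUOTATION MARK), so the file is UTF-8 and just needs to be decoded as such. A minimal Python 3 sketch (hypothetical helper and sample file; in Python 2, io.open accepts the same encoding argument) that reads the file as UTF-8 and then optionally folds typographic quotes to a plain ASCII apostrophe:

```python
import os
import tempfile

def read_text(path: str) -> str:
    # Decode explicitly as UTF-8 instead of relying on the platform default.
    with open(path, encoding="utf-8") as f:
        return f.read()

# Sample file containing U+2018, whose UTF-8 bytes are E2 80 98.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"I don\xe2\x80\x98t like this")
    sample_path = tmp.name

text = read_text(sample_path)
os.remove(sample_path)

# Optional: map left/right single quotation marks to an ASCII apostrophe.
plain = text.translate({0x2018: "'", 0x2019: "'"})
print(plain)  # I don't like this
```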