encoding

Encoding arbitrary data into numbers?

Is there a common method to encode and decode arbitrary data so the encoded end result consists of numbers only - like base64_encode but without the letters? Fictitious example: $encoded = numbers_encode("Mary had a little lamb"); echo $encoded; // outputs e.g. 12238433742239423742322 (fictitious result) $decoded = numbers_decode("12...
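
One way to get a digits-only encoding is to treat the bytes as one large integer and print it in decimal. The sketch below is in Python rather than the PHP of the question. It reuses the fictitious names `numbers_encode` and `numbers_decode` from the excerpt. The leading `0x01` guard byte is an assumption added here so that leading zero bytes survive the round trip.

```python
def numbers_encode(data: bytes) -> str:
    """Encode bytes as a string of decimal digits."""
    # Prepend a guard byte so leading zero bytes are not lost
    # when the data is interpreted as an integer.
    return str(int.from_bytes(b"\x01" + data, "big"))


def numbers_decode(digits: str) -> bytes:
    """Reverse numbers_encode."""
    n = int(digits)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Drop the guard byte added during encoding.
    return raw[1:]


encoded = numbers_encode("Mary had a little lamb".encode("utf-8"))
decoded = numbers_decode(encoded).decode("utf-8")
```

This is only suitable for short inputs: the output grows to roughly 2.4 digits per byte, and recent Python versions cap the size of int/str conversions by default.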

ruby 1.9: invalid byte sequence in UTF-8

I'm writing a crawler in ruby (1.9) that consumes lots of HTML from a lot of random sites. When trying to extract links, I decided to just use .scan(/href="(.*?)"/i) instead of nokogiri/hpricot (major speedup). The problem is that I now receive a lot of "invalid byte sequence in UTF-8" errors. From what I understood, the net/http library...

Ruby character encoding problems in NetBeans and command window

I use NetBeans as my development IDE and run the application from cmd, but ISO 8859-1 characters like åäö do not display correctly either in the cmd window or when I run the application from NetBeans. Question: what is the best practice for setting this up? Right now I do @output.puts indent + "V" + 132.chr + "lkommen till Ruby Camping!" to get...

Autodetect console output encoding in perl

I have a Perl script that prints some information to the console in Russian. The script will be executed on several OSes, so the console encoding can be cp866, koi8-r, utf-8, or some other. Is there a portable way to detect the console encoding so I can set up STDOUT accordingly and the text is printed correctly? ...

Dealing with wacky encodings in Python

I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are unicode, but what I end up getting is any variation on the following theme (as returned by repr()): u'D\\xc3\\xa9cor' u'D\xc3\xa9cor' 'D\\xc3\\xa9cor' 'D\xc3\xa9cor' Is there a reliable way to take any four of the abov...
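
The second variant in the excerpt, `u'D\xc3\xa9cor'`, looks like UTF-8 bytes that were decoded as Latin-1. A minimal Python 3 sketch of undoing that one case follows. The helper name `fix_mojibake` is hypothetical, and the sketch deliberately does not cover the variants with literal backslashes.

```python
def fix_mojibake(text: str) -> str:
    """Repair text whose UTF-8 bytes were wrongly decoded as Latin-1."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1-encodable or not valid UTF-8: assume it was fine.
        return text


repaired = fix_mojibake("D\xc3\xa9cor")
```

Text that is already correct is returned unchanged in the common case, because its Latin-1 bytes usually do not form valid UTF-8.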

Changing character encoding in MySQL, PHP scripts, HTML

So, I have built on this system for quite some time, and it is currently outputting Latin1 (ISO-8859-1) to the web browser. These are the components: MySQL - all data is stored with the Latin1 character set PHP - All PHP text files are stored on disk with Latin1 encoding HTML - The output has the http-equiv="content-type" content="...

Python + PostgreSQL + strange ascii = UTF8 encoding error

I have ascii strings which contain the character "\x80" to represent the euro symbol: >>> print "\x80" € When inserting string data containing this character into my database, I get: psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x80 HINT: This error can also happen if the byte sequence does not match the encoding ...
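
Byte `0x80` is not ASCII. It is the euro sign in Windows-1252, which would explain why the console shows € while PostgreSQL rejects the byte as invalid UTF-8. A minimal Python 3 sketch, assuming the source data really is Windows-1252 (the excerpt itself uses Python 2, and the sample string is made up):

```python
# Hypothetical input: bytes in Windows-1252, where 0x80 is the euro sign.
raw = b"Price: 10 \x80"

# Decode with the encoding the data is actually in...
text = raw.decode("cp1252")

# ...then encode as UTF-8 before sending it to a UTF8 database.
utf8_bytes = text.encode("utf-8")
```

With a Python 3 database driver, passing `text` as a query parameter is normally enough, since the driver handles the encoding to the connection's character set.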

HTML Encoding with ASP.NET

I am currently HTML-encoding all user-entered text before inserting/updating a db table record. The problem is that on any subsequent updates, the previously encoded string is re-encoded. This endless loop is starting to eat up a lot of column space in my tables. I am using parameterized queries for all sql statements but am wondering wou...

Ruby custom class to and from YAML;

Hi. I'm having trouble deserializing a Ruby class that I wrote to YAML. Where I want to be: I want to be able to pass one object around as a full 'question', which includes the question text, some possible answers (for multiple choice) and the correct answer. One module (the encoder) takes input, builds a 'question' class out of it and app...

How to encode JavaScript text inside an XML attribute?

I have a JavaScript string, coming from an untrusted source, embedded inside an onclick attribute, and I'm not sure what the correct way of encoding this string is. Here is a simplification of the HTML: <input type="button" onclick="alert([ENCODED STRING HERE]);" value="Click me" /> I use the Microsoft AntiXss library which c...

youtube - video upload failure - unable to convert file - encoding the video wrong?

I am using .NET to create a video uploading application. Although it's communicating with YouTube and uploading the file, the processing of that file fails. YouTube gives me the error message, "Upload failed (unable to convert video file)." This supposedly means that "your video is in a format that our converters don't recognize..." I h...

How to copy a node set into a resultant attribute using XSLT without white space and tags encoded?

Given this XML... <?xml version="1.0" encoding="UTF-8"?> <root> <item> <this> <that>one</that> </this> </item> <item> <this> <that>two</that> </this> </item> <item> <this> <that>three</that> </this> </item> </root> I want to make copies of the items into a new for...

encoding changes when retrieving data with php from mysql table

Hi all! I have a very strange problem when retrieving data with PHP from a MySQL table. Basically, two PHP files with the EXACT same content are given data with different encodings and I don't know why. Here's the code: $dbhost = 'localhost'; $dbuser = 'myuser'; $dbpass = 'mypass'; $conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Er...

UTF8 issues on Linux

Hi, I have some code that fetches data from a database whose codepage is UTF8. When I run the code on a Linux box, some characters come out as question marks (?), but when I run the same code on a Windows server, all characters appear correctly. When I do: $> $LANG Following is returned en_SG.UTF-8 en_SG is something that d...

How to configure encoding in maven

When I run maven install on my multi module maven project I always get the following output: [WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent! So, I googled around a bit, but all I can find is that I have to add <properties> <project.build.sourceEncoding>UTF-8</project.buil...

VB.NET - Convert Unicode in one TB to Shift-JIS in another TB

Trying to develop a text editor, I've got two textboxes, and a button below each one. When the button below textbox1 is pressed, it is supposed to convert the Unicode text (intended to be Japanese) to Shift-JIS. The reason why I am doing this is because the software VOCALOID2 only allows ANSI and Shift-JIS encoding text to be pasted in...

Getting latest version from Visual SourceSafe changes character encoding

When I use the Visual SourceSafe (2005) Explorer to get the latest version of a file to my client (Win 7) machine, and then diff my newly gotten local copy with the one in the repository, VSS tells me that the files have different character encodings. What gives? ...

How to convert UTF-8 to text in HTML entity?

I have a downloader program that downloads pages from the internet. The encoding of each page is different: some are in UTF-8 and some are Unicode. For example, &#97; displays the character 'a', and some pages are full of such character references. We need to convert these encodings to normal text. I used the UnicodeEncoding class in C#, but it does not help me ...
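
Sequences like `&#97;` are HTML numeric character references rather than a character encoding, so they are resolved by an HTML-unescaping step and not by an encoding class. A minimal sketch in Python (the question itself is about C#), using the standard library's `html.unescape`; the sample string is made up for illustration:

```python
import html

# Hypothetical page fragment with decimal, named, and hex references.
encoded = "&#97;&#98;&#99; &amp; &#x4E2D;"

# Replace every character reference with the character it stands for.
decoded = html.unescape(encoded)
```

The page's byte encoding (UTF-8 or otherwise) still has to be decoded first; unescaping operates on the resulting text.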

Encoding a string as an integer .NET

I have a string that I would like represented uniquely as an integer. For example: A3FJEI = 34950140 How would I go about writing an EncodeAsInteger(string) method? I understand that the number of characters in the string will make the integer increase greatly, forcing the value to become a long, not an int. Since I need the value to ...

unknown data encoding

Hi. While working with an old application whose existing database is in MS Access, I found some strangely encoded data, such as 48001700030E0F465075465A56525E1100121D04121B565A58 stored as an email address. What kind of data encoding is this? I tried base64, but it doesn't seem to be that. Can anybody with previous experience with MS Access co...