encoding

Where can I get started with Unicode-friendly programming in C?

So, I’m working on a plain-C (ANSI 9899:1999) project, and am trying to figure out where to get started re: Unicode, UTF-8, and all that jazz. Specifically, it’s a language interpreter project, and I have two primary places where I’ll need to handle Unicode: reading in source files (the language ostensibly supports Unicode identifiers a...
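
Before reaching for a library, it can help to see how little code the core of UTF-8 decoding needs. The sketch below is in Python, but it uses only integer and bit operations, so it ports almost line for line to C with an `unsigned char *` buffer. The name `decode_utf8` is illustrative. It rejects overlong forms, surrogates and values above U+10FFFF. Deciding which decoded code points are legal in identifiers is a separate step; Unicode's UAX #31 describes one common scheme.

```python
def decode_utf8(data):
    """Decode UTF-8 bytes into a list of code points, rejecting malformed input."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the sequence length and carries the first payload bits.
        if lead < 0x80:
            cp, length, minimum = lead, 1, 0x0
        elif 0xC0 <= lead < 0xE0:
            cp, length, minimum = lead & 0x1F, 2, 0x80
        elif 0xE0 <= lead < 0xF0:
            cp, length, minimum = lead & 0x0F, 3, 0x800
        elif 0xF0 <= lead < 0xF8:
            cp, length, minimum = lead & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte must look like 10xxxxxx and contributes 6 more bits.
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, UTF-16 surrogates and values beyond U+10FFFF.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

For an interpreter, the lexer would run on the returned code points rather than on raw bytes. That way a multi-byte identifier character is never split.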

Nihongo (Japanese) values from Struts form are garbage

Hi. I have a J2EE web application running on Tomcat. I'm using Struts, and I don't know why the values passed from my Struts forms are garbage. I tried printing the values in the business logic and they are already garbage, so the problem can't be in the database. I already tried setting the charset for the page to Shift_JIS and UTF-8 but it...

How do I properly work with Unicode characters in Python to keep from getting errors?

I'm working on a Python plugin for Google Quick Search Box, and it's doing some odd things with non-ASCII characters. It seems like the code works fine up until I try constructing a string containing the non-ASCII characters (ü has been my test character). I am using the following code snippet for the construction, with new_task as the v...
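
The snippet is cut off, so the exact failure isn't visible. In Python 2, however, concatenating a byte string that holds non-ASCII data with a `unicode` object triggers an implicit ASCII decode, which is a classic source of `UnicodeDecodeError`. A common rule of thumb is "decode at the input boundary, work with text, encode at the output boundary". The Python 3 sketch below assumes `new_task` arrives as UTF-8 bytes. That is an assumption, and the helper names and the "Added task: " prefix are hypothetical.

```python
def build_message(new_task_bytes, encoding="utf-8"):
    """Decode once at the input boundary, then work only with text (str)."""
    new_task = new_task_bytes.decode(encoding)  # bytes -> str, fails loudly on bad data
    return "Added task: " + new_task            # str + str: no implicit conversion


def to_output(message, encoding="utf-8"):
    """Encode once at the output boundary (file, socket, subprocess)."""
    return message.encode(encoding)


print(build_message("ü".encode("utf-8")))
```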

Django Haystack: search for a term with and without accents

I'm implementing a search system in my Django project, using Django Haystack. The problem is that some fields in my models have French accents, and I would like to find the entries which contain the query with and without accents. I think the best idea is to create a SearchIndex with both the fields with the accents, and the sam...
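
The "same field without accents" idea needs a function that maps "é" to "e". The same function would also be applied to the query. The sketch below uses Unicode normalization from the standard library. It is not a Haystack API, and wiring it into a SearchIndex is left out. NFKD splits each precomposed letter into a base letter plus combining marks, and the marks are then dropped. Letters with no decomposition, such as "œ", pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed, e.g. 'é' -> 'e'."""
    # NFKD turns 'è' into 'e' + U+0300 COMBINING GRAVE ACCENT, and so on.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Crème brûlée à Noël"))
```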

What is Unicode, UTF-8, UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well but it's not clear to me. In VSS when doing a file comparison, sometimes there is a message saying the two files have differing UTFs. Why would this be the case? Please explain in simple terms. ...
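
In short, Unicode assigns every character a number (a code point). UTF-8 and UTF-16 are two different ways of turning those numbers into bytes. The same text therefore produces different bytes in each form, which is why a byte-level comparison tool can report that two files with identical-looking text "differ". The small Python sketch below prints both forms for a few characters. The helper name `encodings_of` is illustrative.

```python
def encodings_of(text):
    """Return the code points of text and its bytes in UTF-8 and UTF-16 (little-endian)."""
    return (
        [f"U+{ord(ch):04X}" for ch in text],
        text.encode("utf-8"),      # 1 to 4 bytes per code point
        text.encode("utf-16-le"),  # 2 bytes per code point, or 4 (a surrogate pair)
    )


for sample in ["A", "é", "€", "😀"]:
    points, utf8, utf16 = encodings_of(sample)
    print(sample, points, "UTF-8:", utf8.hex(" "), "UTF-16LE:", utf16.hex(" "))
```

ASCII letters stay one byte in UTF-8 but take two in UTF-16. Characters outside the Basic Multilingual Plane, such as the emoji, need four bytes in both encodings.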

Encoding Audio (to AAC) in Silverlight 4 (on the client)?

OK, so Silverlight 4 is adding support for capturing from microphones (and webcams); however, for this facility to be useful (in my case at least) I'd need to upload this captured data to a server to save it. The AudioCaptureDevice will record PCM audio on the client, and as we all know PCM is not the most efficient encoding... the data woul...

Django not translating Bittorrent query string properly

Hello, I'm writing a small Bittorrent tracker on top of the Django framework, as part of a larger project. However, I'm having problems with decoding the "info_hash" parameter of the announce request. Basically, uTorrent takes the SHA1 hash of the torrent in question and URL encodes the hex representation of it, which is then sent to t...
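
The BitTorrent specification (BEP 3) describes `info_hash` as the URL-encoded 20-byte SHA-1 of the info dictionary, meaning the raw digest bytes. Those bytes are generally not valid UTF-8. A framework that decodes query parameters into text will therefore mangle them, so one approach is to read the raw query string (in Django, `request.META["QUERY_STRING"]`) and percent-decode the value straight to bytes. In the sketch below, the helper names are hypothetical and the digest is a made-up stand-in for a real torrent's.

```python
import hashlib
from urllib.parse import quote_from_bytes, unquote_to_bytes


def decode_info_hash(encoded):
    """Percent-decode an info_hash value back to the raw 20-byte SHA-1 digest."""
    raw = unquote_to_bytes(encoded)  # keep bytes; never decode as UTF-8 text
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    return raw


def info_hash_from_query(query_string):
    """Find info_hash in a raw (still percent-encoded) query string."""
    for pair in query_string.split("&"):
        key, _, value = pair.partition("=")
        if key == "info_hash":
            return decode_info_hash(value)
    raise KeyError("info_hash")


# Round trip with a made-up digest standing in for a real torrent's info hash.
digest = hashlib.sha1(b"example info dictionary").digest()
query = "peer_id=abc&info_hash=" + quote_from_bytes(digest) + "&port=6881"
print(info_hash_from_query(query).hex())
```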

Why does Integer.parseInt throw NumberFormatException on input that seems valid?

I'm doing a simple exercise from a book and I'm a little bit confused about how the Java method parseInt works. I have read a line from an input file, used StringTokenizer to split it, and now I want to parse each part as an integer. I have checked in the watch window that the input of the parseInt function is indeed a string which ...
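
The excerpt stops before the details, so the cause can't be confirmed. One frequent encoding-related cause fits the symptom, though. If the input file was saved as UTF-8 with a byte order mark, the first token begins with an invisible U+FEFF that a watch window typically doesn't show, and `Integer.parseInt` rejects it. The Python sketch below illustrates the check and the clean-up. The helper name `parse_int_field` is hypothetical. In Java, the analogous fix is stripping `'\uFEFF'` from the token before calling `parseInt`.

```python
BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a byte order mark


def parse_int_field(token):
    """Parse an integer token, tolerating a leading BOM and surrounding whitespace."""
    cleaned = token.lstrip(BOM).strip()
    return int(cleaned)


token = BOM + "42"        # what the first token of a BOM-prefixed file looks like
print(ascii(token))       # ascii() makes the hidden character visible
print(parse_int_field(token))
```

Opening the file with `encoding="utf-8-sig"` in Python removes the BOM at read time instead.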

Encoding problem with file_get_contents

I'm using a script that gets a URL's content and then calculates keyword density, etc. My problem is with Turkish characters like "ı" and "ş". I tried iconv to convert UTF-8 to ISO-8859-9, but it didn't work. You can see the code at http://www.gazihanisildak.com/keyword/code.txt. Thanks in advance. ...
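
A charset conversion like iconv's only works if the "from" charset matches the bytes that were actually fetched. That charset is whatever the page declares in its HTTP `Content-Type` header or meta tag, and it may not be UTF-8. The page that displays the result must also declare the target charset. The Python sketch below shows the same decode-then-encode step. It assumes the fetched page really is UTF-8, and the helper name `convert` is illustrative. In ISO-8859-9 (Latin-5), "ı" and "ş" are single bytes.

```python
def convert(data, source="utf-8", target="iso-8859-9"):
    """Re-encode bytes from one charset to another, as iconv does."""
    text = data.decode(source)  # raises if `source` is the wrong guess for these bytes
    return text.encode(target)  # raises if a character has no mapping in `target`


fetched = "ış".encode("utf-8")  # stand-in for bytes returned by the HTTP fetch
print(convert(fetched))
```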

How does PDF417 barcode decoding recover from damaged labels?

I recently learned about PDF417 barcodes and I was astonished that I can still read the barcode after I ripped it in half and scanned only a fragment of the original label. How can the barcode decoding be that robust? Which (types of) algorithms are used during encoding and decoding? EDIT: I understand the general philosophy of introdu...

PHP 5 encoding problem with Turkish characters

I have a PHP script which detects keyword density on a given URL. My problem is that it doesn't detect Turkish characters, or it deletes them. I'm getting the contents of the URL with the file_get_contents function. This function works perfectly and gets all the content with Turkish characters. You can see my code at http://www.gazihanisildak.com/keyword/code.txt ...
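
When a script drops or ignores non-ASCII letters, the usual cause is byte-oriented string functions or ASCII-only patterns such as `[a-z]`. This is common, not certain, since the linked code isn't reproduced here. In PHP, the Unicode-aware counterparts are the `u` modifier on `preg_*` patterns and the `mb_*` string functions. The Python sketch below is a hypothetical `keyword_density` helper. It works on decoded text with a Unicode-aware `\w`. Note that Python's `lower()` is not Turkish-locale-aware ("I" becomes "i", not "ı"), so the dotted and dotless I would need special handling.

```python
import re
from collections import Counter


def keyword_density(text):
    """Return {word: share of all words}, counting Turkish letters as word characters."""
    # On str patterns, \w is Unicode-aware in Python 3, so ş, ı, ğ, ç, ö, ü match.
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    return {word: n / total for word, n in Counter(words).items()} if total else {}


print(keyword_density("şeker ışık şeker"))
```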

Why is Java BufferedReader() not reading Arabic and Chinese characters correctly?

I'm trying to read a file which contains English & Arabic characters on each line and another file which contains English & Chinese characters on each line. However, the Arabic and Chinese characters fail to show correctly; they just appear as question marks. Any idea how I can solve this problem? Here is the code I use for readin...
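
The Java code is truncated, but a frequent cause is decoding with the platform default charset. The usual Java fix is wrapping the `FileInputStream` in `new InputStreamReader(stream, "UTF-8")` before passing it to `BufferedReader`. Question marks can also come from the output side, for example a console that can't display those characters. The Python sketch below assumes the files are UTF-8, which the question doesn't state, and its helper names are hypothetical. It shows the same effect: an explicit, correct decoder returns the text, while a wrong one replaces every non-ASCII byte.

```python
import tempfile
from pathlib import Path


def read_lines(path, encoding="utf-8"):
    """Read lines with an explicit charset instead of the platform default."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def demo(text):
    """Write text as UTF-8, then read it back correctly and incorrectly."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mixed.txt"
        path.write_bytes(text.encode("utf-8"))
        good = read_lines(path)
        # Wrong decoder: every non-ASCII byte becomes U+FFFD REPLACEMENT CHARACTER.
        with open(path, encoding="ascii", errors="replace") as f:
            bad = [line.rstrip("\n") for line in f]
        return good, bad


print(demo("hello مرحبا\nhi 你好\n"))
```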

Extract correct text from a wifstream regardless of encoding.

Here is the program: http://codepad.org/eyxunHot The encoding of the file is UTF-8. I have a text file named "config.ini" with the following word in it: ➑ball If I use Notepad to save the file with "UTF-8" encoding, then run the program, according to the debugger the value of eight_ball is: âball If I use Notepad to save the file wi...
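
The "âball" value appears consistent with the three UTF-8 bytes of ➑ (E2 9E 91) being read as three separate single-byte characters rather than one wide character. The first of them is "â" in Windows-1252, and a debugger may not render the other two clearly. Notepad's "UTF-8" option also writes a byte order mark (EF BB BF) at the start of the file, which a reader must skip. The Python sketch below reproduces both effects. Windows-1252 as the misreading code page is an assumption, and the helper names are hypothetical. For the `wifstream` itself, the stream's locale would need to perform UTF-8 conversion.

```python
def as_misread(text, wrong_codec="cp1252"):
    """Show what UTF-8 bytes look like when decoded one byte per character."""
    return text.encode("utf-8").decode(wrong_codec)


def read_config_text(data):
    """Decode file bytes as UTF-8, dropping a leading BOM if the editor wrote one."""
    return data.decode("utf-8-sig")


print(as_misread("➑ball"))
print(read_config_text(b"\xef\xbb\xbf" + "➑ball".encode("utf-8")))
```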

How to make a text file have more than one encoding?

I have a file which is ANSI encoded. However, it shows Arabic letters inside it. This text file was generated by some program (I have no info on it), but it seems like there is some kind of internal encoding (if I might say so, and if it's possible) that makes the Arabic letters appear. Is there such a thing? If not, how can the ANSI file show ...
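
"ANSI" in Notepad means the system's Windows code page, not one fixed encoding. On Arabic Windows that is Windows-1256, whose upper half (bytes 0x80 to 0xFF) holds Arabic letters. A file can therefore use a single encoding and still contain Arabic, one byte per letter. The same bytes opened under a Western code page show as unrelated Latin characters. The Python sketch below assumes the file is Windows-1256, and the helper name is illustrative.

```python
def ansi_views(text, codepage="cp1256"):
    """Encode text in a Windows 'ANSI' code page and view the bytes two ways."""
    data = text.encode(codepage)                          # one byte per Arabic letter
    correct = data.decode(codepage)                       # same code page: Arabic again
    western = data.decode("cp1252", errors="replace")     # wrong code page: Latin junk
    return data, correct, western


data, correct, western = ansi_views("مرحبا")
print(len(data), correct, western)
```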

How to write Cyrillic text to the C++ console?

For example, if I write cout << "Привет!" << endl; (it's "hello" in Russian), in the console it shows something like "╧ЁштхЄ!". OK, I know that we can use setlocale(LC_ALL, "Russian"); but after that, Russian command-line arguments don't work (if I start my program through a BAT file): StartProgram.bat chcp 1251 MyProgram.exe -use...
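
The exact garbage in the question can be reproduced by encoding "Привет!" as Windows-1251 (the usual ANSI code page on Russian Windows, and so the likely encoding of the string literal) and decoding those bytes as code page 866 (the OEM code page a Russian console uses by default). That mismatch is why `chcp 1251`, or calling `SetConsoleOutputCP` from the program, changes the output. The Python sketch below only simulates the mismatch. The helper name is hypothetical.

```python
def console_view(text, source_codepage="cp1251", console_codepage="cp866"):
    """Simulate bytes written in one code page being displayed in another."""
    return text.encode(source_codepage).decode(console_codepage)


print(console_view("Привет!"))
```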

ruby: unknown encoding name: undecided

I've actually figured out what causes this error, but Googling for it was unsuccessful, so I thought I'd write it down here to help out other people. This error pops up when you've got a # -*- coding: undecided -*- comment at the top of one of your files. Emacs added this automatically for me, but re-saving the file caused it to be chang...

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with placeholders in a set of RTF-based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instan...
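
The excerpt stops before the symptom, but one common cause of apostrophe trouble in RTF merges is the typographic apostrophe ’ (U+2019) that word processors substitute for '. Written raw into an RTF file, it is interpreted according to the document's code page instead of as Unicode. RTF's `\uN` control word sidesteps this: N is a signed 16-bit decimal value, followed by a fallback character. The sketch below is in Python rather than VB.Net, and `rtf_escape` is a hypothetical helper. It escapes field values before they are merged and also escapes RTF's special characters `\`, `{` and `}`.

```python
def rtf_escape(text):
    """Escape text for insertion into an RTF document body."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)                 # RTF metacharacters
        elif cp < 0x80:
            out.append(ch)                        # plain ASCII passes through
        elif cp <= 0xFFFF:
            n = cp - 0x10000 if cp > 0x7FFF else cp
            out.append(f"\\u{n}?")                # signed 16-bit value, '?' as fallback
        else:
            # Outside the BMP: emit the UTF-16 surrogate pair as two \uN words.
            v = cp - 0x10000
            for unit in (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)):
                out.append(f"\\u{unit - 0x10000}?")
    return "".join(out)


print(rtf_escape("O’Brien"))
```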

PHP E-Mail Encoding?

Hi, I am having some trouble with foreign characters when sending an e-mail. Could someone advise me on what to do? I think the problem could be one of three things. 1) The HTML page encoding is incorrect. (Would this affect the POST data from the form?) 2) The mail function doesn't have any encoding. Thus the program doesn't know how ...
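
Both suspects matter. Browsers normally submit form data in the encoding of the page that contains the form, and a mail message must declare its charset and encode non-ASCII header text such as the subject (RFC 2047). In PHP this typically means passing a `Content-Type: text/plain; charset=UTF-8` header through `mail()`'s headers argument. The Python sketch below uses the standard `email` package to show what a correctly labelled message contains. The addresses and text are made-up examples, and `build_message` is a hypothetical helper.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a UTF-8 text e-mail with a properly encoded subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject                 # non-ASCII is RFC 2047-encoded on output
    msg.set_content(body, charset="utf-8")   # Content-Type: text/plain; charset="utf-8"
    return msg


msg = build_message("a@example.com", "b@example.com", "Grüße", "Grüße aus München")
print(msg.as_string())
```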

Do I need to encode encrypted bytes before sending?

Do I need to encode my encrypted bytes when sending them through Java data streams for instant messaging? Sending encrypted bytes doesn't look very safe to me. If so, should I use Hex or Base64 encoding? Thanks ...
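
A binary-safe stream, such as a Java `DataOutputStream` where you `writeInt` the length and then `write` the byte array, carries ciphertext unchanged, so no encoding is needed. Encoding matters only when the bytes travel through a text-oriented channel (a string field, JSON, XML, a line-based protocol). Encoding also adds no security, because it is trivially reversible. Base64 grows the data by about 4/3, and hex doubles it. The Python sketch below shows both. The helper names are hypothetical, and random bytes stand in for real ciphertext.

```python
import base64
import os


def to_text(ciphertext, scheme="base64"):
    """Turn arbitrary bytes into ASCII text that survives text-only channels."""
    if scheme == "base64":
        return base64.b64encode(ciphertext).decode("ascii")
    if scheme == "hex":
        return ciphertext.hex()
    raise ValueError(f"unknown scheme: {scheme}")


def from_text(text, scheme="base64"):
    """Inverse of to_text."""
    if scheme == "base64":
        return base64.b64decode(text)
    if scheme == "hex":
        return bytes.fromhex(text)
    raise ValueError(f"unknown scheme: {scheme}")


sample = os.urandom(32)  # random bytes standing in for real ciphertext
print(len(to_text(sample)), len(to_text(sample, "hex")))
```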

XML Carriage return encoding

Hi there, I was looking to represent a carriage return within an XML node. I have tried whitespace preserve and a hex entity, with no luck, and also a \n, viewing via a browser. Example: <Quote> Alas, poor Yorick! I knew him </Quote> Thanks, Joe ...
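
Two separate questions are mixed here: whether the XML data carries the line break, and whether a browser's view displays it. For the data, a character reference such as `&#13;` (carriage return) or `&#10;` (line feed) survives parsing as that exact character. A raw CR LF typed into the document is normalized to a single LF by any conforming parser. Whether a browser then shows the break is a rendering question. The Python sketch below checks the data side with the standard-library ElementTree parser. The element content comes from the question.

```python
import xml.etree.ElementTree as ET

# A character reference survives parsing as a literal carriage return / line feed.
quote = ET.fromstring("<Quote>Alas, poor Yorick!&#13;&#10;I knew him</Quote>")
print(repr(quote.text))

# A raw CR LF in the document is normalized to a single LF by the parser.
raw = ET.fromstring("<Quote>Alas, poor Yorick!\r\nI knew him</Quote>")
print(repr(raw.text))
```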