character-encoding

Dumping text files

I'm writing a shell script that will create a textual (i.e. diffable) dump of an archive. I'd like to detect whether or not each file is printable in some given character set, and if it is printable, I'd like to convert to that character set from whatever one it's in, if this is possible, and make its contents part of the dump. I've co...

Where can I find a UTF-8 bits-to-char table to convert, for instance, "Ã±" into "ñ"?

Hello. I have searched the Web thoroughly and cannot seem to find a table with these kinds of conversions. The ones I find contain some mistakes and are not very reliable, so I have looked for an official table or the like, but unfortunately I haven't found one, so here I am. As mentioned in the title, what I want to do is for instance...
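
One plausible reading of this question is that the garbled sequences are UTF-8 bytes that were decoded as Latin-1 (ISO-8859-1). The truncated excerpt does not confirm this, so treat it as an assumption. Under that assumption a table does not have to be found online; it can be generated. The sketch below builds one for U+00A0 through U+00FF. The names `build_mojibake_table` and `repair` are hypothetical helpers, not part of any library.

```python
def build_mojibake_table(first=0xA0, last=0xFF):
    """Map each garbled two-character sequence to the character it stands for.

    Assumes the garbling came from decoding UTF-8 bytes as Latin-1.
    """
    table = {}
    for code_point in range(first, last + 1):
        char = chr(code_point)
        # Encode correctly, then decode with the wrong charset on purpose.
        garbled = char.encode("utf-8").decode("latin-1")
        table[garbled] = char
    return table


def repair(text):
    """Undo the same garbling for a whole string, without a lookup table."""
    return text.encode("latin-1").decode("utf-8")


table = build_mojibake_table()
print(table["Ã±"])
print(repair("EspaÃ±a"))
```

If the text was actually misread as windows-1252 rather than Latin-1, the same approach applies with that codec name substituted, although a few byte values have no mapping there.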

Flex: Write data to file to open in Excel

Hello, I have a Flex application with a couple of DataGrids with data. I'd like to save the data to a file so that the user can keep working with them in Excel, OpenOffice or Numbers. I'm currently writing a csv file straight off, which opens well in OpenOffice or Numbers, but not in Excel. The problem is with the Swedish characters Å...

Are binary characters legal in MIME headers?

I work on a server that processes email, and as part of that, we do some MIME parsing/encoding. I've recently had an issue arise for a message that is valid otherwise, but contains a Latin-1 character in a MIME header. Someone entered an e-mail address to multiple recipients containing a Latin-1 character, so the SMTP envelope only con...

xcode std::wcout with wchar_t or std::wstring!

Hi, I am trying to print a wstring/wchar_t to the console in Xcode, but unfortunately it only works with basic (I think ASCII) characters; everything else gets displayed as numbers. For instance, with the following: std::cout << "äöüu"<< std::endl; std::wcout << L"äöüu" << std::endl; while the cout version prints "äöüu" as expected, I get t...

Encoding for Return value

I've got this JavaScript: <a href="javascript:addtext('q');">q</a> When it is clicked, it writes text to a textarea. I looked into the encoding and found I can do things like this: This will add a " " (Space) <a href="javascript:addtext('%20');">Space</a> And this will add an "á" <a href="javascript:addtext('&aacute;');">á</a> N...

Strings and character encoding in C++

I read a few posts about best practices for strings and character encoding in C++, but I am struggling a bit with finding a general purpose approach that seems to me reasonably simple and correct. Could I ask for comments on the following? I'm inclined to use UTF-8 and UTF-32, and to define something like: typedef std::string string8;...

JSF UTF-8 encoding?

Hello all, I am working with a friend who is Vietnamese and wants to create a website in Vietnamese, but we have a problem with UTF-8 encoding. I wrote the following Filter class: import java.io.IOException; import javax.servlet.Filter; import javax.servlet.FilterChain; import javax.servlet.FilterConfig; import javax.servlet.ServletExc...

How to create a Postgres conversion from Big5 to UTF8

I use PostgreSQL. My server encoding is UTF8 and my client_encoding is BIG5. When I insert Chinese characters, it always fails. Any ideas? Thanks, guys ...

Does anyone understand this PHP function? Why is it guaranteed to output Chinese characters only?

function getChnRandChar($length) { mt_srand((double)microtime() * 1000000); $hanzi = ''; for ($i = 0; $i < $length; $i++) { $number = mt_rand(16, 56) * 100 + mt_rand(1, 19); $tmpHanzi = chr(mb_substr($number, 0, 2) + 160); $tmpHanzi .= chr(mb_substr($number, 2, 2) + 160); $hanzi .= mb_convert_e...

Converting utf8_general_ci tables and fields to utf8_unicode_ci

Hi everyone, I made a mistake when designing my application's database several years ago, and the collation settings of my tables and table fields are mixed. Some of them are utf8_general_ci and some of them are utf8_unicode_ci. This causes problems when joining tables with different collations. Now, I am planning to change collation set...

HTML decoding in C/C++

I'm using libcurl to fetch HTML pages. I have some problems with Hebrew characters. For example, this: &#1505;&#1500;&#1511;&#1493;&#1501; comes out as gibberish. How do I get Hebrew characters and not gibberish? Do I need some HTML decoder? Does libcurl support such an operation? Does libiconv support such an operation? I appreciate any help...
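
The sequences quoted in this question are HTML numeric character references. Expanding them is normally the job of an HTML parser or entity decoder, not of a transfer library or a charset converter. As a language-neutral illustration (in Python rather than the question's C/C++), the standard library function `html.unescape` performs that expansion. This is a sketch of the concept only, not a statement about how the asker solved it.

```python
import html

raw = "&#1505;&#1500;&#1511;&#1493;&#1501;"

# Each &#NNNN; reference names one Unicode code point in decimal.
decoded = html.unescape(raw)

# Once decoded, the text can be serialized in any encoding that covers Hebrew.
utf8_bytes = decoded.encode("utf-8")

print([f"U+{ord(ch):04X}" for ch in decoded])
print(len(decoded), len(utf8_bytes))
```

Each of the five letters falls in the Hebrew block (U+0590 to U+05FF), so each takes two bytes in UTF-8.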

IE6 : Download html without executing the contained javascript, in ISO-8859-1 charset

Hi! Here is some code (using jQuery) that I use to download an HTML file from which I extract a table and its content. var url = $('#url').val(); // url to access if ($.browser.msie && $.browser.version.substr(0,1)<7) { var frame = $('<iframe/>').css('display', 'none').attr('src', url ); frame.appendTo('body') .load(function() { ...

String class based on graphemes?

I'm wondering why we don't have some string classes that represent a string of Unicode grapheme clusters instead of code points or characters. It seems to me that in most applications it would be easier for programmers to access components of a grapheme when necessary than to have to organize them from code points, which appears necessa...

How to convert ISO-8859-1 to UTF-8 using libiconv in C++

I'm using libcurl to fetch some HTML pages. The HTML pages contain some character references like: &#1505;&#1500;&#1511;&#1493;&#1501; When I read this using libxml2 I'm getting: ׳₪׳¨׳˜׳ ׳¨ Is it the ISO-8859-1 encoding? If so, how do I convert it to UTF-8 to get the correct word? Thanks. EDIT: I got the solution, MSalters was right...

File name encoding on Knoppix

I am getting a file list in my Java program using the list() method of the File class. When I run my program on Knoppix I get ???? instead of Cyrillic file names. It seems that the problem is in Knoppix, not Java. I tried to use options for mounting the file system, such as nls and iocharset, but they have no effect (or maybe I am using them the wrong way). Somebod...

JSON character encoding

Hi, my Java web application submits an AJAX request that returns JSON such as: {'value': 'aériennes'} When 'aériennes' is displayed in the webpage, it appears as 'a�riennes', so I guess there's some kind of character encoding problem. The AJAX response headers include Content-Type application/json which doesn't appear to include a...

Recover from using bad code page in C#

I have read the string "ńîôč˙" from a file using code page windows-1251 instead of iso-8859-2. It should be some Cyrillic string. How do I implement a function that will do the following in C#: string res = Recover("ńîôč˙"); string Recover(string input) { ??? } Where res is the Cyrillic string that I would have got if I had used the good page wh...
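
The usual recovery technique is to re-encode the text with the code page that was applied by mistake, which restores the original bytes, and then decode those bytes with the intended code page. The sketch below shows this in Python; `recover` is a hypothetical name mirroring the question's `Recover`. Note one assumption: for the sample string, the direction that produces Cyrillic is iso-8859-2 as the wrong page and windows-1251 as the right one, which is the reverse of how the excerpt words it, so the defaults follow the data rather than the wording.

```python
def recover(text, wrong="iso-8859-2", right="windows-1251"):
    """Undo a decode that used the wrong single-byte code page.

    Only works if the wrong decoding was lossless, i.e. every byte of the
    input mapped to a distinct character in the wrong code page.
    """
    original_bytes = text.encode(wrong)   # back to the bytes in the file
    return original_bytes.decode(right)   # decode them as intended


print(recover("ńîôč˙"))
```

In C# the same two steps correspond to calling `GetBytes` on the mistaken `Encoding` and `GetString` on the intended one.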

FCKEditor cannot send Turkish characters in UTF-8.

FCKEditor cannot send Turkish characters in UTF-8. It converts Turkish characters to HTML entities. E.g.: Original FCKEditor Source "öçşiğüı" -> &ouml;&ccedil;şiğ&uuml;ı How can I prevent this conversion? Thx... ...

In base64, what happens if the character you want to encode isn't A-Z, a-z, + or /?

In base64, what happens if the character you want to encode isn't A-Z, a-z, + or /? If I wanted to encode a URL in base64 which has a colon (:) in it, what would happen, since it's not in the base64 index? ...
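
Base64 takes arbitrary bytes as input; its alphabet (A-Z, a-z, 0-9, + and /, with = for padding) restricts only the output. A colon is therefore encoded like any other byte. A minimal sketch, using a made-up URL as the input:

```python
import base64

url = "http://example.com:8080/path"   # hypothetical input containing colons

# The input is converted to bytes first; base64 never sees "characters".
encoded = base64.b64encode(url.encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == url)

# A lone colon is the single byte 0x3A, which encodes as "Og==".
print(base64.b64encode(b":"))
```

If the result has to be placed inside a URL, `base64.urlsafe_b64encode` substitutes - and _ for + and /.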