encoding

Round-tripping bytes through System.Text.UnicodeEncoding.Unicode.GetString(byte[]) back to a byte array fails intermittently

Can someone tell me why the following code intermittently throws an exception? I am running Vista Ultimate 32-bit with VS2010 and .NET 4. byte[] saltBytes = new byte[32]; RNGCryptoServiceProvider.Create().GetBytes(saltBytes); string salt = System.Text.UnicodeEncoding.Unicode.GetString(saltBytes); byte[] saltB...
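The excerpt is cut off, so the exact failure is not visible. One plausible cause is that random bytes are not guaranteed to be valid UTF-16, so decoding them to a string and encoding back may not return the original bytes. A minimal Python sketch of the same idea follows; it is an analogue, not the asker's .NET code, and Base64 is a suggested alternative rather than something the question mentions.

```python
import base64
import os

# 0xD800 (bytes 00 D8 in little-endian order) is a lone high surrogate:
# on its own it is not valid UTF-16, so a strict decoder rejects it.
bad = b"\x00\xd8\x41\x00"
try:
    bad.decode("utf-16-le")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Base64 maps arbitrary bytes to ASCII text and back without loss.
salt_bytes = os.urandom(32)
salt_text = base64.b64encode(salt_bytes).decode("ascii")
restored = base64.b64decode(salt_text)
```

Storing the salt as Base64 (or hex) text avoids treating random bytes as characters at all.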

What kind of utf8 encoding is being used in members of String class in Java?

The String class has a constructor: new String(byte[] bytes, Charset charset) and a method: byte[] getBytes(Charset charset). Given that I define my charset as follows: Charset charset = Charset.forName("UTF-8"); what kind of encoding will I in fact use? More specifically, is it a standard UTF-8 (as described in RFC 3629), or CESU-...
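To make the distinction concrete: standard UTF-8 encodes a character outside the Basic Multilingual Plane as one 4-byte sequence, while CESU-8 encodes each UTF-16 surrogate separately, giving 6 bytes. The sketch below shows the difference in Python rather than Java; the sample character U+1F600 and the use of `surrogatepass` to imitate CESU-8 are illustrative assumptions.

```python
text = "\U0001F600"  # a code point above U+FFFF, chosen for illustration

# Standard UTF-8 (RFC 3629): one 4-byte sequence.
standard = text.encode("utf-8")

# CESU-8 style: split into UTF-16 surrogates, then encode each one
# as if it were a separate 3-byte character.
utf16 = text.encode("utf-16-be")
surrogates = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
cesu_like = b"".join(chr(s).encode("utf-8", "surrogatepass") for s in surrogates)
```

Comparing the byte length of a supplementary character is a quick way to tell which variant an encoder produces.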

Insert a ♥ into MySQL (heart character) via PHP

I'm having a heck of a time getting ♥ type characters into my database using php. I've got UTF-8 setting on the page <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> and <?php $line = $_REQUEST['line']; $line = stripslashes($line); $line = htmlspecialchars($line); $line = trim($line); $line = mysql_real_escape...

UPS API - What encoding should I use?

Hi! I'm implementing shipping in my application and I have problems with Polish characters. On the generated label they appear as '?'. The client is written in C#, so all my strings are Unicode. Maybe you know which encoding I should use to send the data so that Polish characters come through? I send: gęśla jaźń And on the .gif label there is: g??la ja?? It seems that U...
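The '?' pattern is what appears when text is forced into a character set that lacks those letters. The Python sketch below reproduces the symptom with ASCII and shows two encodings that can represent Polish text; which encoding the UPS API actually expects is not stated in the excerpt, so both candidates are assumptions.

```python
# "gęśla jaźń" written with escapes so the code points are unambiguous.
text = "g\u0119\u015bla ja\u017a\u0144"

# Encoding into a charset without these letters replaces each with '?'.
lossy = text.encode("ascii", errors="replace").decode("ascii")

# ISO-8859-2 (Latin-2) and UTF-8 can both represent Polish letters.
latin2 = text.encode("iso-8859-2")
utf8 = text.encode("utf-8")
```

If the receiving side decodes with a different charset than the sender used, the same kind of damage occurs even when both charsets support the letters.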

Rails - email subject is gibberish in hotmail

Hi, Sending emails works perfectly for all major email clients, except for hotmail (and some other), it shows as: =?windows-1255?Q?Z33=30_=F9=22=E7=20=F2=E1=E5=F8_=F9=E5=E1=F8=20=E1=F9=E5=E5=E9=20=36=30_=F9=22=E7=20=EC=22=EE=F8=E2=E5=E6=E4=22=2C_=E1=E9=FA_=F7=F4=E4=20=E5=EE=E0=F4=E9=E9=E4_=EE=F9=F4=E7=FA=E9=FA=2C=20=E1=EE=FA=E7=ED=20=F...
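The `=?charset?Q?...?=` form shown above is an RFC 2047 encoded-word; a client that displays it raw has failed to decode it. As a rough Python sketch (not Rails), this is how a subject can be encoded as UTF-8 and decoded again; the sample Hebrew text is a made-up placeholder, not the original subject.

```python
from email.header import Header, decode_header, make_header

subject = "\u05e9\u05dc\u05d5\u05dd 60"  # placeholder Hebrew text plus digits

# Produce an RFC 2047 encoded-word using UTF-8 instead of windows-1255.
encoded = Header(subject, "utf-8").encode()

# Decode it back the way a conforming mail client would.
decoded = str(make_header(decode_header(encoded)))
```

Switching the declared charset to UTF-8 is a common workaround when a client mishandles a legacy code page, though whether it fixes this particular case is untested here.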

Html entity encoding in webapplication

Hi folks, I am looking for suggestions on avoiding a vulnerability through form data in a web application. Which characters need to be encoded as HTML entities to avoid such injection attacks? Which characters, when injected into our form data, make it prone to HTML injection? As of now we are validating \",/,\,:,*,?...
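For HTML body text and quoted attribute values, the characters that normally need escaping are `&`, `<`, `>`, `"` and `'`. A small Python sketch using the standard library's `html.escape` follows; the payload string is an invented example, and other contexts (URLs, JavaScript, CSS) need their own escaping rules.

```python
import html

# Invented form input containing markup and both kinds of quotes.
payload = '<script>alert("x")</script> & \'quoted\''

# quote=True also escapes double and single quotes for attribute values.
safe = html.escape(payload, quote=True)
```

Escaping at output time, in the context where the value is rendered, is generally more reliable than rejecting characters at input time.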

Turkish Character Encoding Java Spring MVC

Hi all, I am working on a Spring MVC project and, as always, Turkish characters cause a problem (I hate encoding =)). My DBMS is Oracle Express 10g and I use Spring Source Tool as my IDE; I am not sure this information is useful. For now, the problem comes from a single 'ı' character. Pages show ı's as ?s. I checked out the database it is ...

How to remove � character while parsing xml file in android

Hello all, I have to read XML files from a server and display the data from all of them. Some data contains the character '�', which gives me a SAXException while parsing. I have tried converting to UTF-8, but the application exits as soon as that character is found in the file. I have used SAXParser to parse the XML file. If you have any solutio...
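The '�' (U+FFFD) usually marks bytes that were not valid in the encoding used to decode them, so the real fix is often to decode with the correct charset. A hedged Python sketch (not Android code) of both options follows; the sample bytes and the guess that the source is Latin-1 are assumptions.

```python
import xml.etree.ElementTree as ET

# A Latin-1 byte (0xE9) inside a document that is being read as UTF-8.
raw = b"<item>caf\xe9</item>"

# Option 1: decode leniently, then drop the replacement characters.
lenient = raw.decode("utf-8", errors="replace")
cleaned = lenient.replace("\ufffd", "")
stripped_value = ET.fromstring(cleaned).text

# Option 2: decode with the charset the bytes were really written in.
recovered_value = ET.fromstring(raw.decode("latin-1")).text
```

Option 1 loses the damaged character; option 2 preserves it but only works if the real encoding is known.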

Treat unicode character plus diacritic as a single character?

In my VB.NET application I compare words that are recorded using IPA, many of which have many diacritic marks. In one of the comparisons, I compare the words character by character. But when I iterate over the characters, the diacritic marks come out as separate characters (as I would expect since this is unicode): o`ku`ku` However,...
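One way to compare "character plus diacritics" as a unit is to attach each combining mark to the base character before it. The Python sketch below illustrates the idea (it is not VB.NET); `group_with_marks` is a hypothetical helper, it assumes the diacritics are Unicode combining marks, and it is simpler than full grapheme cluster segmentation.

```python
import unicodedata


def group_with_marks(word):
    """Split a string into base characters with their combining marks attached."""
    clusters = []
    for ch in word:
        # combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'o' and 'u' each carry a combining grave accent (U+0300).
clusters = group_with_marks("o\u0300ku\u0300")
```

Each element of the result can then be compared as a single unit.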

How to deliver different character sets over SMS?

We have an aggregator application written in Java which delivers XML-based content from third parties to different SMSCs, which in turn deliver it to handsets. The XML content is mostly in plain English, apart from the payload, which could be GSM 03.38, IA5, ISO-8859-1, Unicode, binary, etc. We have only one handset for testing. We find other tha...

Conflict between UTF-8 normalized forms of encoding for accents

Hello, I've got a bug with UTF-8 normalizations: as far as I understood, there are (at least) two ways to write an 'é' in UTF-8: CC 81 (the combining acute accent, which follows a plain 'e') and C3 A9. [After a migration from Mac/OSX to a PC/Linux] I now have a conflict between the paths I store in my database and the actual file system structure, which prevents me from accessing correctly...
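The two byte sequences correspond to the decomposed (NFD) and composed (NFC) normalization forms. A short Python sketch using the standard `unicodedata` module shows the bytes and how normalizing both sides to one form makes them compare equal; choosing NFC as the common form is an assumption for illustration.

```python
import unicodedata

composed = "\u00e9"     # single code point: é
decomposed = "e\u0301"  # 'e' followed by a combining acute accent

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Normalize both to the same form before comparing or looking up paths.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
```

Normalizing stored paths and file system names to the same form before comparison avoids the mismatch.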

Defining data encoding of SMS messages in Android

I'm working on an application using the SMS APIs for Android. The receiving end is an embedded unit that only supports 7-bit encoded SMS, and the string I'm sending consists only of symbols from this particular alphabet, which would suggest that Android is going to send it encoded as 7-bit. But that is not the case. Therefore I'm search...

Command-line arguments as bytes instead of strings in python3

Hello, I'm writing a Python 3 program that gets the names of files to process from command-line arguments. I'm confused about the proper way to handle different encodings. I think I'd rather treat filenames as bytes and not strings, since that avoids the danger of using an incorrect encoding. Indeed, some of my file name...
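On POSIX systems, Python 3 decodes command-line arguments with the `surrogateescape` error handler, so undecodable bytes survive as lone surrogates and can be turned back into the original bytes (`os.fsencode` does this with the file system encoding). The sketch below demonstrates the round trip with an explicit UTF-8 codec so the result does not depend on the platform; the sample file name is invented.

```python
# A file name containing a Latin-1 byte (0xE9) that is not valid UTF-8.
original = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: the bad byte becomes U+DCE9.
as_text = original.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the exact original bytes.
restored = as_text.encode("utf-8", errors="surrogateescape")
```

In a real program, `os.fsencode(arg)` applied to each `sys.argv` entry performs the second step using the platform's file system encoding.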

Iconv.conv in Rails application to convert from unicode to ASCII//translit

We wanted to convert a Unicode string in Slovak into plain ASCII (without accents/carons). That is, to do: č->c š->s á->a é->e etc. We tried: cstr = Iconv.conv('us-ascii//translit', 'utf-8', a_unicode_string) It was working on one system (Mac) and was not working on the other (Ubuntu), where it was giving '?' for accented chara...
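A platform-independent alternative to iconv's `//translit` is to decompose the string and drop the combining marks. The Python sketch below shows the technique (it is not Ruby and does not use Iconv); `strip_accents` is a hypothetical helper, and it only handles letters that decompose into a base letter plus marks.

```python
import unicodedata


def strip_accents(text):
    """Remove accents by decomposing and discarding combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# č š á é, written with escapes so the code points are unambiguous.
plain = strip_accents("\u010d\u0161\u00e1\u00e9")
```

Letters without a decomposition (for example 'ł' or 'ø') pass through unchanged and would need an explicit mapping.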

Django utf-8 and django-mailer strangeness

Latest django mailer from trunk http://github.com/jtauber/django-mailer/tree/master/docs/ Tested with Postgresql 8.4, sqlite3 template {{ title }} forms.py #-*- coding: utf-8 -*- if "mailer" in settings.INSTALLED_APPS: from mailer import send_mail else: from django.core.mail import send_mail ... body_txt = render...

Non-English characters appear as question marks on my PHP page - appear fine in database - please help

Hi guys, I have a MySQL database table populated with non-English data. When I view the data in the Navicat MySQL browser, the data appears fine. However, when I run a PHP script to select and display the data on a web page, it displays question marks instead. The page encoding is set to utf8 and even the MySQL collation is set to utf8 - someth...

Encoding of folders name on windows server and html

Hi, I opened a new account on an American web host and I didn't take into account that its server would not support Hebrew characters (on its file system). Anyway, after copying the gibberish path: <pic src="..\_images\gallery\smallPictures\2010-08-02 âìøéä àåâåñè 2010\" width="150" height="120"></pic> to my XML file and saving it with...

Python regex against Latin-1 character encoding?

I have a file which contains (I believe) latin-1 encoding. However, I cannot match regexes against this file. If I cat the file, it looks fine: However, I cannot find the string: In [12]: txt = open("b").read() In [13]: print txt <Vw_IncidentPipeline_Report> In [14]: txt Out[14]: '\x00 \x00 \x00<\x00V\x00w\x00_\x00I\x00n\x0...
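The `\x00` byte before every character in the output suggests the file is UTF-16 rather than Latin-1, which would explain why a byte-level regex never matches. A Python 3 sketch of that diagnosis follows; the sample bytes are rebuilt from the visible part of the output, and big-endian byte order is an assumption based on the zero byte coming first.

```python
import re

# Rebuild bytes shaped like the excerpt: a zero byte before each character.
raw = "  <Vw_IncidentPipeline_Report>".encode("utf-16-be")

# A plain byte pattern fails because of the interleaved zero bytes.
byte_match = re.search(rb"Vw_Incident", raw)

# Decode first, then match against text.
text = raw.decode("utf-16-be")
match = re.search(r"<(\w+)>", text)
tag_name = match.group(1) if match else None
```

If the file starts with a byte order mark, decoding with plain "utf-16" picks the byte order automatically.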

Converting Chinese character to Unicode

Let's say I have a random Chinese character, 玩. I want to convert it to Unicode, which would be U+73A9. How could I do this in C#? ...
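The conversion amounts to reading the character's code point and formatting it as hexadecimal. A minimal Python sketch of the idea (not C#) follows; `to_u_notation` is a hypothetical helper name.

```python
def to_u_notation(ch):
    """Format a single character as U+XXXX (at least four hex digits)."""
    return "U+{:04X}".format(ord(ch))


code_point = to_u_notation("\u73a9")  # the character from the question
```

Characters above U+FFFF produce five or six hex digits with the same function.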

Server.UrlEncode vs Uri.EscapeDataString

What exactly is the difference between the two? The output seems similar, except that Uri.EscapeUriString encodes spaces as %20 and Server.UrlEncode encodes them as a + sign. And finally, which one should be preferred? ...
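The %20 versus + difference reflects two conventions: generic URI percent-encoding and the form encoding used for query strings. Python's standard library has an analogous pair, sketched below for illustration; this is not the .NET API, and the sample value is invented.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b&c"

# Generic percent-encoding: space becomes %20.
percent_encoded = quote(value, safe="")

# Form-style encoding: space becomes '+'.
form_encoded = quote_plus(value)
```

A '+' is only read as a space by decoders that apply the form convention, so %20 is the safer choice for path segments.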