utf-8

How to convert a PHP value from windows-1257 to UTF-8

How do I convert a PHP value from windows-1257 to UTF-8? I tried many ways, but none of them worked. I have lttu�s and I want to convert it to littūs. I tried utf8_encode(), iconv_set_encoding("windows-1257", "UTF-8") and mb_convert_encoding(); none of them work. :( Can anybody help me? ...
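The conversion asked about above comes down to two steps: decode the bytes using the legacy code page, then encode the resulting text as UTF-8. The question is about PHP; the sketch below only illustrates the idea in Python. The function name `cp1257_to_utf8` and the sample bytes are assumptions made for illustration, not taken from the question.

```python
def cp1257_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1257 (cp1257) bytes as UTF-8."""
    # Step 1: interpret the bytes using the legacy code page.
    text = raw.decode("cp1257")
    # Step 2: serialize the same text as UTF-8.
    return text.encode("utf-8")


# Hypothetical input: the target word from the question, stored as cp1257.
legacy_bytes = "littūs".encode("cp1257")
utf8_bytes = cp1257_to_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))
```

The important point is that the source encoding has to be named explicitly; a converter that assumes ISO-8859-1 input would map the Baltic letters to the wrong characters.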

Python unicode issues (2.6)

I'm currently working on an IRC bot for a multilingual channel, and I'm running into Unicode issues that are proving nearly impossible to solve. No matter which encoding configuration I try, the list function that contains the code below simply does nothing (c.notice is a class function which sen...

Strange xml/html accent issue

I have an XML file that contains a message with HTML tags in it. The XML file is read by a Java class that mails it to people. When the mail is received, the accents do not show. For example, é doesn't show. I have tried &eacute; in the XML, but it gives an error in Eclipse saying that the entity has not been declared. I also tried sim...

\w in PHP preg_replace covers only second byte of UTF-8 chars

We have this code: $value = preg_replace("/[^\w]/", '', $value); where $value is in UTF-8. After this transformation, the first byte of multibyte characters is stripped. How can we make \w cover UTF-8 characters completely? Sorry, I am not very good at PHP ...

Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in th...
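One common heuristic for this kind of mixed input is to try strict UTF-8 decoding on each line and fall back to ISO-8859-1 only when that fails. The question is about Perl; the sketch below shows the heuristic in Python. The names `normalize_line` and `normalize`, and the sample data, are assumptions for illustration only.

```python
def normalize_line(raw: bytes) -> str:
    """Decode one line, preferring UTF-8 and falling back to ISO-8859-1."""
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1")


def normalize(junk: bytes) -> bytes:
    """Return the whole multi-line input re-encoded as pure UTF-8."""
    lines = junk.split(b"\n")
    return "\n".join(normalize_line(line) for line in lines).encode("utf-8")


# Hypothetical input: first line is UTF-8, second line is ISO-8859-1.
junk = "caf\u00e9".encode("utf-8") + b"\n" + "na\u00efve".encode("iso-8859-1")
fixed = normalize(junk)
print(fixed.decode("utf-8"))
```

The heuristic is not perfect: an ISO-8859-1 line whose bytes happen to form valid UTF-8 would be misread, although that is rare in ordinary text.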

.NET Weird character encoding issue

Our globalization mechanism stores error messages in a SQL 2005 DB. Some of the error messages are used as subjects on email messages sent to the development team. Recently, with no clear reason, we started receiving emails with strangely encoded subjects, such as: =?utf-8?B?Qm1mQm92ZXNwYS5Qb3NUcmFkaW5nRXNwZWNpZmljYWNhbyAtIFN1Y2Vzc...
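A subject of the form `=?utf-8?B?...?=` is an RFC 2047 encoded-word: the charset, then `B` for Base64, then the encoded bytes. The sketch below shows how such a header can be built and decoded with Python's standard `email.header` module. The subject text is a made-up example, since the one in the question is truncated.

```python
import base64
from email.header import decode_header, make_header

# Hypothetical subject containing non-ASCII characters.
subject = "Relat\u00f3rio conclu\u00eddo"

# Build an RFC 2047 encoded-word by hand: =?charset?B?base64-data?=
payload = base64.b64encode(subject.encode("utf-8")).decode("ascii")
encoded = "=?utf-8?B?" + payload + "?="

# Decode it back into readable text.
decoded = str(make_header(decode_header(encoded)))
print(encoded)
print(decoded)
```

A mail client that understands RFC 2047 performs the decoding step itself; seeing the raw form usually means the header was not recognized as an encoded-word by the receiving side.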

Get correct output from UTF-8 stored in VarChar using Entity Framework or Linq2SQL?

Borland StarTeam seems to store its data as UTF-8 encoded data in VarChar fields. I have an ASP.NET MVC site that returns some custom HTML reports using the StarTeam database, and I would like to find a better solution for getting the correct data, for a rewrite with MVC2. I tried a few things with Encoding GetBytes and GetString, but I...

UTF-8 BOM signature in PHP files

I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends up with a ș (which is a UTF-8 character, ...and a strange name, I know). Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM sign...
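The garbled form È™ is what appears when the two UTF-8 bytes of ș are read as windows-1252. The Python sketch below reproduces the effect and reverses it. It assumes windows-1252 is the encoding the viewers' editors fell back to, which the question does not state.

```python
original = "\u0219"  # ș, LATIN SMALL LETTER S WITH COMMA BELOW

# UTF-8 encodes this character as two bytes: 0xC8 0x99.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as windows-1252 yields two unrelated characters.
misread = utf8_bytes.decode("cp1252")

# The damage is reversible as long as nothing was lost along the way.
recovered = misread.encode("cp1252").decode("utf-8")

print(misread, recovered)
```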

Python UTF-8 can't decode byte on 32-bit machine

It works fine on 64-bit machines, but for some reason it will not work on Python 2.4.3 on a 32-bit instance. I get the error 'utf8' codec can't decode bytes in position 76-79: invalid data for the code: try: str(sourceresult.sourcename).encode('utf8','replace') except: raise Exception( repr(sourceresult.sourcename ) ) ...

UTF-8 BOM Problem

Hi, I am using Komodo Edit. I have to encode some files as UTF-8 without BOM in Komodo. On my localhost and my own site there is no problem, but on some sites I am seeing the BOM, and this is a terrible problem for AJAX/JSON responses. Any advice? Thanks. ...
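To illustrate why a stray BOM breaks JSON responses, the Python sketch below parses the same payload with and without the three BOM bytes. The payload is a made-up example. On the consuming side, decoding with the `utf-8-sig` codec drops a leading BOM if one is present.

```python
import codecs
import json

# Hypothetical JSON response that was saved with a UTF-8 BOM in front.
body = codecs.BOM_UTF8 + b'{"ok": true}'

# Plain UTF-8 decoding keeps the BOM as U+FEFF, which the JSON parser rejects.
try:
    json.loads(body.decode("utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

# The utf-8-sig codec strips a leading BOM when there is one.
clean_text = body.decode("utf-8-sig")
data = json.loads(clean_text)
print(bom_rejected, data)
```

Stripping on the client is only a workaround; saving the source files without a BOM removes the problem at its origin.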

Consuming RSS Feed In PHP

I'm trying to use an RSS feed from my blog on the news section on another site. Everything seems to be working fine until I use something like an ellipsis on my blog. The expected output is: One more time…less fail Although this is no joking matter… The actual output is: One more time?less fail Although this is no joking matter… ...

validating utf-8 in htaccess rewrite rule

I validate URLs containing UTF-8 characters with a rewrite rule: RewriteRule ^([a-z]{2})/([a-z0-9-]{1,256})/([[:print:]]{1,256})$ index.php?language=$1&categories=$2&get_query=$3 [L] $get_query is the point: it accepts test!?!'"<>*+ but fails for accented characters such as àèéìòù and for other UTF-8 characters. For example, on Wikipedia this works great: http://en...

Efficient way to ASCII encode UTF-8

I'm looking for a simple and efficient way to store UTF-8 strings in 7-bit ASCII. By efficient I mean the following: all ASCII alphanumeric chars in the input should stay the same ASCII alphanumeric chars in the output; the resulting string should be as short as possible; the operation needs to be reversible without any data loss; the resul...

UTF-8 emails to Mac Mail and Gmail

I'm using Pear mail_mime to send HTML emails out, and first the UTF-8 characters were messed up in Gmail, but not in Mac Mail. I discovered that I needed to add parameters to the get() function to correct the character set used in the HTML portion of the MIME message. It was defaulting to ISO. So, I've corrected this problem, the email ...

Site doesn't show up. Instead a bunch of weird characters?

‹�����혱jÃ0†w=Å=AÜ ÂЃ)ÅKGÅ:¢En%¹©ß¾²Ý 7xèpußøãŸ~ÝöÇ®Ömót¨•îŸû®©îao‚½‘Í:ºR†æk@´huõÃ(]­;z:¼•Íö¾þ{¥•‚¾ímwi£_±Ä1)–ÄÇ�‡‘,‰%Ž#YKF²Ä²Ä8ŒèKF²$–88ŒdI,qpÉ’Xâà0’%±Ä1Àaþe–TïÆOŒ@ 2^ßÇh"ù¦`Î!뜄yœ"Dü˜0e°Ó:ËË>e„ñʈfp.à(U®<œv¿ì;xñhRY3˜‹¡�ÞdŒ;Uºõ×R°WkÑ^Z÷¥¯Wß.Ò¤·�� That's exactly what shows up instead of my website in the web browser. Though on local...

PHP Streaming CSV always adds UTF-8 BOM

The following code gets a 'report line' as an array and uses fputcsv to transform it into CSV. Everything is working great except for the fact that, regardless of the charset I use, it puts a UTF-8 BOM at the beginning of the file. This is exceptionally annoying because A) I am specifying ISO and B) we have lots of users using tools ...

converting html entity to utf-8 character

Hello, I am having this problem in Grails, where I am writing a string from the database into an XML file using StreamingMarkupBuilder. The XML file displays the string as HTML entities &#x30b3 &#x30d4 &#x30fc. How can I convert them so they are printed as コピー? Thanks! ...
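Each `&#x...` sequence is a numeric character reference naming one Unicode code point in hexadecimal. The question is about Grails; the sketch below shows the conversion in Python with the standard `html.unescape` function. The input string is written with terminating semicolons and no spaces, which is an assumption about the intended markup.

```python
import html

# Numeric character references for the three code points in the question.
escaped = "&#x30b3;&#x30d4;&#x30fc;"

# html.unescape replaces each reference with the character it names.
text = html.unescape(escaped)

# The result can then be written out as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")
print(text)
```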

get content from website with utf8 format

I want to get the content of websites in UTF-8 format. I have written the following code: try { String webnames = "http://pathivu.com"; URL url = new URL(webnames); URLConnection urlc = url.openConnection(); //BufferedInputStream buffer = new BufferedInputStream(urlc.getInputStream()); ...

create an UTF-8 string with BOM

Hi guys, I'm using an MD5 function and Base64 encoding to generate a User Secret (used to log in to the data layer of the API I'm using). I wrote the code in JavaScript and it works fine, but in Objective-C I'm struggling with the BOM. My code is: NSString *str = [[NSString alloc] initWithFormat:@"%@%@%@%d", [auth up...

Japanese text garbled while passing to a http restlet service

I have a Perl client which calls an HTTP Restlet service (PUT method). Some of the parameters in this call contain Japanese text. When I printed the contents of these request parameters in the Restlet service, I found these characters garbled! This is my Perl client code: my %request_headers = ( 'DocumentName' => $document_name...