utf-8

PHP filter for non-standard characters

I get text as user input and somewhere in the text there are non-standard characters, like this. The text is stored in a database. Everything is in UTF-8 and it works well, except that it displays strange signs for the non-standard characters. How can I filter these characters in PHP? Edit: I discovered that the text with the wrong charact...

Do I really need to switch from VARCHAR to VARBINARY for UTF-8 in MySQL & PHP?

Do I really need to switch from VARCHAR to VARBINARY and TEXT to BLOB for UTF-8 in MySQL & PHP? Or can I stick with CHAR/TEXT fields in MySQL? ...

How to detect UTF-8 in plain C?

I am looking for a code snippet in plain old C that detects that the given string is in UTF-8 encoding. I know the solution with regex, but for various reasons it would be better to avoid using anything but plain C in this particular case. Solution with regex looks like this (warning: various checks omitted): #define UTF8_DETECT_REGEXP...

Unicode characters not showing in Zend_Pdf?

require_once 'Zend/Pdf.php'; $pdf = new Zend_Pdf(); $page = $pdf->newPage(Zend_Pdf_Page::SIZE_A4); $pdf->pages[] = $page; $page->setFont(Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA), 10); $page->drawText("Bogus Russian: это фигня", 100, 400, "UTF-8"); $pdfData = $pdf->render(); header("Content-Disposition: inline; filename=...

Converting a UCS2 string into UTF8 in Ruby

How do I convert a string that is in UCS2 (2 bytes per character) into a UTF8 string in Ruby? ...

What causes my XML to break?

I have the following XML code. <firstname> <default length="6">Örwin</default> <short>Örwin</short> <shorter>Örwin</shorter> <shortest>�.</shortest> </firstname> Why does the content of the "shortest" node break? It should be a simple "Ö" instead of the tedious �. XML is UTF-8 encoded and the function which processes the output of...

How do I ensure the British Pound Sterling Sign (£) appears correctly?

I have a PHP script that reads in data from an XML file and returns it via AJAX to a page, which then places the data into the relevant text area. The Content-Type of the page is as follows: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" /> The XML heading looks like this: <?xml version="1.0" encoding="UTF-8"?> <!D...

Why would I use a Unicode Signature Byte-Order-Mark (BOM)?

Are these obsolete? They seem like the worst idea ever -- embed something in the contents of your file that no one can see, but impacts the file's functionality. I don't understand why I would want one. ...

How prevalent is UTF-8 really?

How widespread is the use of UTF-8 for non-English text, on the WWW or otherwise? I'm interested both in statistical data and the situation in specific countries. I know that ISO-8859-1 (or 15) is firmly entrenched in Germany - but what about languages where you have to use multibyte encodings anyway, like Japan or China? I know that a...

How can I specify a special character in PHP?

I am trying to output a sigma character () in a label in a FusionChart graph. How can I specify that character in a PHP string? I have tried the htmlentity &sigma;, but it is not interpreted correctly by the graph. Is there any way to specify the character in PHP using some sort of character code? ...

German Umlaute in MySQL/phpMyAdmin

I have a Flex application with UTF-8 encoding. It sends data back to the server (PHP), and the data gets written into MySQL (UTF-8 charset, utf8_general_ci). I have no problems at all writing/reading Umlaute from/to the database. I only realized, by looking at the data with phpMyAdmin, that the Umlaute get somehow converted to: ö => ö ü => ...

How to control charset of directory listing via .htaccess?

Hi all, I've enabled directory listing of a folder under public_html, by adding: Options +Indexes in the .htaccess file. However, some files are not listed correctly by default, as some filenames are in Chinese (UTF-8 encoded). I can see the filenames if I change the browser's charset encoding to UTF-8. How can I let the browser ...

HTML - force web browser to enter form text as UTF-8?

I want to standardise on UTF-8 in our web browser. All our databases and internet stuff are in UTF-8. All our web servers are sending the charset=utf-8 HTTP header. However, I've discovered that by changing the encoding in Firefox (View -> Character Encoding) to something else, I can enter Latin-9 characters into a form and PHP just treat...

Passing a UTF-8 string via Java to a .NET web service

Hi %, in order to 'feed' a .NET web service from Java I pass XML strings via a direct socket connection over to the server. Everything works wunderbar as long as I don't include any 'weird' characters in my XML strings, Ä or ß for example's sake. I scripted around and figured that in PHP 5 the problem is solved by utf8_encode(myXmlS...

SetThreadLocale and UTF8

So I want to use SetThreadLocale to set a thread's code page to UTF-8. Up to now, I've been using the second parameter of ATL string conversion macros like "CT2A(szBUF, CP_UTF8)" to do this. But I want to be able to set the thread code page once in the beginning with SetThreadLocale() and never have to use the second parameter of the conver...

MBCS to UTF-8 C++

I'm working on a project in VS2008 that I'm compiling in MBCS, but I need to work with some UTF-8 strings to interact with some web services. I wrote a function that works perfectly with Unicode but not MBCS. Is there any way I can convert an MBCS string to UTF-8 or to Unicode? Thanks! ...

UTF encoding in Java

I need to encode a message from request and write it into a file. Currently I am using the URLEncoder.encode() method for encoding. But it is not giving the expected result for special characters in French and Dutch. I have tried using URLEncoder.encode("msg", "UTF-8") also. Example: Original message: Pour gérer votre GSM After encod...

How to set input charset to Unicode in VB.NET or VC++.NET

Hi there, fellow programmers. I am using the WebBrowser control in VB.NET 2005. The application I wrote shows a web page on my computer which has two text areas, one for input and the other for output. My problem is that I need the charset of the whole program to be Unicode, because the charset of the web page is UTF-8. And right now, when I process ...

Are special characters in e-mail address possible?

I'm using the Apache Commons e-mail validator and it refuses to accept e-mail addresses like ąźóęł@email.com, so I would like to ask whether it's right not to allow them or whether I should change the validator. ...

Looking for great character set/encoding resources or tools for PHP webapp development

Hi guys, I've been having a lot of trouble with character sets/encoding while writing a multilingual web app in PHP, in different places such as the shell, inside PHP itself, and in the database. I want the whole application to be UTF-8 throughout, so that I won't have to worry about converting anything back and forth anymore. Does any...