utf-8

UTF-8 and X11 iso10646-1 fonts with ImageMagick

Trying to write UTF-8 text to an image without any luck. I've spent several hours reading docs and googling. /usr/bin/xlsfonts shows: ... -adobe-helvetica-medium-r-normal--10-100-75-75-p-56-iso10646-1 -adobe-helvetica-medium-r-normal--10-100-75-75-p-56-iso8859-1 ... (BTW, this seems like an excellent page for these fonts: http://www.cl...

Handling non-UTF-8 content in my Rails application appropriately

I have a Rails application that allows users to import information from various sources using RSS feeds and such. My database's default encoding is UTF-8, and I've been getting a lot of exceptions caused by non-UTF-8 data that comes through the system and crashes once it hits the database. I'm trying to appropriately detect the n...
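
A minimal sketch of one way to handle this, written in Python rather than Ruby: validate incoming bytes as UTF-8 and, only when that fails, reinterpret them in a fallback encoding before they reach the database. Using Windows-1252 as the fallback is an assumption. Feeds can arrive in other encodings, and a feed's declared charset should take precedence when it has one.

```python
def to_utf8_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode feed bytes as UTF-8, falling back to a legacy encoding.

    ASSUMPTION: cp1252 (Windows-1252) is only a common guess for non-UTF-8
    feed content; pass the feed's declared charset instead when it is known.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret in the fallback encoding. "replace"
        # turns bytes undefined in that encoding into U+FFFD instead of failing.
        return raw.decode(fallback, errors="replace")


print(to_utf8_text("café".encode("utf-8")))  # valid UTF-8 passes through
print(to_utf8_text(b"caf\xe9"))              # cp1252 bytes are reinterpreted
```

The result is a proper text string, so whatever the database layer writes out is valid UTF-8.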

Simplest XML editor for non-programmers

Hi, I need the simplest editor with UTF-8 support for editing XML files on Windows; something like WordPad would be perfect. It's for a non-programmer to edit existing files (until now he used WordPad, but now that I've converted the files to UTF-8, a lot of Italian accented characters are obviously unreadable). Any suggestions? Thanks, this would ...

How do I read UTF-8 with diamond operator (<>)?

I want to read UTF-8 input in Perl using the diamond operator, while(<>){...}, regardless of whether it comes from standard input or from a file. So my script should be callable in these two ways, as usual, giving the same output: ./script.pl utf8.txt and cat utf8.txt | ./script.pl. But the outputs differ! Only the second call (using cat) see...
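
For comparison, here is a Python sketch (not Perl) of the same "files if given, otherwise stdin" behaviour. Both sources are read as raw bytes and decoded as UTF-8 explicitly, so the result does not depend on how each stream's default decoding is configured. The function name utf8_lines is hypothetical.

```python
import sys
from typing import Iterator, Sequence


def utf8_lines(paths: Sequence[str]) -> Iterator[str]:
    """Yield decoded lines from the given files, or from standard input when
    no paths are given, mimicking how while(<>) picks its source.

    Decoding line by line is safe because the newline byte 0x0A never occurs
    inside a multi-byte UTF-8 sequence.
    """
    if paths:
        for path in paths:
            with open(path, "rb") as f:  # raw bytes, no implicit decoding
                for raw in f:
                    yield raw.decode("utf-8")
    else:
        # sys.stdin's text layer uses the locale's encoding, so bypass it and
        # decode the underlying bytes ourselves.
        for raw in sys.stdin.buffer:
            yield raw.decode("utf-8")


# Hypothetical script body, behaving the same for both invocations:
#   for line in utf8_lines(sys.argv[1:]):
#       sys.stdout.buffer.write(line.encode("utf-8"))
```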

Should I still use HTML entities? Why?

Are HTML entities still useful, or should I simply create UTF-8-encoded HTML documents? Please explain why. ...
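
A short Python illustration of the usual distinction. Markup-significant characters (&, <, > and quotes) need escaping whatever the encoding. In a UTF-8 document every other character can be written as-is. Numeric character references only become necessary when the output channel is restricted, for example to ASCII.

```python
import html

text = 'Café → "5 < 6"'

# Always required: escape characters that have meaning in HTML markup.
escaped = html.escape(text)

# In a UTF-8 document, everything else is written directly.
utf8_bytes = escaped.encode("utf-8")

# Only for an ASCII-only channel: turn remaining non-ASCII characters into
# numeric character references.
ascii_bytes = escaped.encode("ascii", errors="xmlcharrefreplace")

print(escaped)      # Café → &quot;5 &lt; 6&quot;
print(ascii_bytes)  # b'Caf&#233; &#8594; &quot;5 &lt; 6&quot;'
```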

Tool to convert source code from a codepage to UTF-8?

I'm working on an open-source project. The original project contains comments in Russian and uses codepage 1251. I'm using codepage 1252, and the Russian comments aren't displayed correctly in Visual Studio Express 2008; that's not nice, but I can't read Russian anyway. Someone using codepage 950 (Traditional Chinese) tried to compile the pro...
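
If no ready-made tool fits, a conversion script is short. The following is a minimal Python sketch, assuming every file really is in codepage 1251. It writes UTF-8 with a BOM, because older Visual C++ compilers read BOM-less sources in the system codepage. Treat that as an assumption to verify against Visual Studio 2008. The function name convert_to_utf8 is hypothetical.

```python
import codecs
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "cp1251") -> bool:
    """Re-encode a source file in place from a legacy codepage to UTF-8.

    Returns False (leaving the file untouched) if it already starts with a
    UTF-8 BOM, so running the conversion twice does not double-encode.
    ASSUMPTION: the file really is in source_encoding.
    """
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return False
    # Strict decoding (the default) raises on bytes the codepage leaves
    # undefined. It cannot prove the guess right, because almost every byte
    # sequence decodes as cp1251.
    text = raw.decode(source_encoding)
    # "utf-8-sig" prepends a BOM; line endings are preserved byte-for-byte.
    path.write_bytes(text.encode("utf-8-sig"))
    return True
```

A whole tree could then be converted with something like `for p in Path("src").rglob("*.cpp"): convert_to_utf8(p)`, where "src" and the glob pattern are placeholders.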

Encoding discrepancy between the "main" page and Dojo dialogs

I have a strange encoding situation: the HTML page itself displays as it should (with all the accénted chäracters properly displayed), but all the pop-up Dojo dialogs fail to use the correct encoding. Here is the setup: a Java web project with Hibernate/Spring/Struts2 running on Tomcat 6.0.18; the pages are generated as JSP tiles,...

C programming: How to program for Unicode?

What prerequisites are needed to do strict Unicode programming? Does this imply that my code should not use char types anywhere and that I need functions that can deal with wint_t and wchar_t? And what role do multibyte character sequences play in this scenario? ...
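
One reason plain char can still work is that UTF-8 is a multibyte encoding. A char buffer can hold it, but strlen then counts bytes rather than characters. Code points can be counted by skipping continuation bytes, which have the form 10xxxxxx. In C this is the test `(c & 0xC0) != 0x80` on an unsigned char. The sketch below is in Python and assumes the input is already valid UTF-8.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in UTF-8 by counting non-continuation bytes.

    Each code point starts with exactly one byte not of the form 0b10xxxxxx.
    ASSUMPTION: data is valid UTF-8; this does not validate it.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


s = "naïve ∑ 😀"
data = s.encode("utf-8")
print(len(data), count_code_points(data))  # 15 9
```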

Easiest way to format RTF/Unicode/UTF-8 in a RichTextBox?

I'm currently beating my head against a wall trying to figure this out. Long story short, I'd like to convert the text between two '\u0002' characters to bold formatting. This is for an IRC client I'm working on, so I've been running into these quite a bit. I've tried regex and found that matching on the RTF as ((\'02) works to catch it,...
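
Instead of matching on the generated RTF, the raw message can be split into formatting runs first. Each run can then be appended to the control with its own font. This Python sketch assumes the common IRC convention that each U+0002 toggles bold, so the control characters act as toggles rather than as paired delimiters. The function name bold_runs is hypothetical.

```python
from typing import List, Tuple

BOLD = "\x02"  # IRC bold toggle (U+0002), a control character


def bold_runs(message: str) -> List[Tuple[str, bool]]:
    """Split an IRC message into (text, is_bold) runs.

    ASSUMPTION: every U+0002 flips bold on/off; an unmatched toggle leaves
    the rest of the message bold.
    """
    runs: List[Tuple[str, bool]] = []
    bold = False
    for i, part in enumerate(message.split(BOLD)):
        if i > 0:
            bold = not bold  # each separator flips the state
        if part:
            runs.append((part, bold))
    return runs


print(bold_runs("say \x02hello\x02 world"))
# [('say ', False), ('hello', True), (' world', False)]
```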

Started a process from .NET, but RedirectStandardOutput doesn't support UTF-8

I am trying to call PHP's HTML Purifier from .NET using this code: Process myProcess = new Process(); myProcess.StartInfo.FileName = @"C:\Path\to\php.exe"; myProcess.StartInfo.Arguments = @"C:\Path\to\purify.php"; myProcess.StartInfo.UseShellExecute = false; myProcess.StartInfo.RedirectStandardOutput = true; myPro...
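
The general fix is for the parent process to decode the child's output with the encoding the child actually writes, instead of a platform default. In .NET the relevant setting is likely ProcessStartInfo.StandardOutputEncoding. This Python sketch shows the same idea, with a Python one-liner standing in for php.exe. It assumes the child writes UTF-8.

```python
import subprocess
import sys

# Hypothetical child standing in for php.exe: writes UTF-8 bytes to stdout.
# The code is kept ASCII-only; the \u escapes are interpreted by the child.
child = [sys.executable, "-c",
         r"import sys; sys.stdout.buffer.write('Gr\u00fc\u00dfe \u2192'.encode('utf-8'))"]

# Decode the captured bytes as UTF-8 explicitly rather than relying on the
# platform default (on Windows typically a legacy code page).
result = subprocess.run(child, capture_output=True, encoding="utf-8", check=True)
output = result.stdout  # "Grüße →"
```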

How do I determine a file's encoding on OS X?

I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them. Running cat my_file.tex shows the characters properly in Terminal. Running ls -al shows something I've never seen before: an "@" by the file listing: -rw-r--r--@ 1 me user...
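
Bytes alone cannot prove an encoding, but a common heuristic is to check for a byte-order mark and then test strict UTF-8 validity. (The "@" in the ls -l output marks extended attributes on the file, not its encoding.) This Python sketch implements that heuristic. The function name and its labels are hypothetical.

```python
import codecs

# Byte-order marks; the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the UTF-32 marks must be checked first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a Python codec name that plausibly decodes data, or "unknown".

    Heuristic only: a BOM is trusted if present; otherwise the bytes are
    tested for ASCII, then for strict UTF-8 validity. Non-UTF-8 data is
    reported as "unknown" because its legacy encoding cannot be determined.
    """
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

For the file in question, `guess_encoding(Path("my_file.tex").read_bytes())` (with `from pathlib import Path`) would report whether TextMate really saved UTF-8.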

How to write UTF8 text to MySQL from ASP.NET via ODBC?

I'm using MySQL 5 on shared hosting, connecting from ASP.NET 3.5 using the MySQL 5.1 ODBC driver. I'd like to store UTF8 strings. My tables used to be all in "latin1_swedish_ci", but I converted the database, table, and column to UTF8 using: ALTER DATABASE `my_db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ; ALTER TABLE `my_...

Get non-UTF-8-form fields as UTF-8 in PHP?

I have a form served in a non-UTF-8 encoding (it’s actually Windows-1251). People, of course, post whatever characters they like. The browser helpfully converts the characters that can't be represented in Windows-1251 to HTML entities, so I can still recognize them. For example, if a user types →, I receive &#8594;. That’s partially great, like, if I...
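
A Python sketch of the two-step conversion: decode the Windows-1251 bytes, then turn the numeric character references back into real characters. In PHP the corresponding functions would probably be mb_convert_encoding and html_entity_decode. Serving the form itself as UTF-8 avoids the problem entirely. The function name decode_form_value is hypothetical.

```python
import html


def decode_form_value(raw: bytes) -> str:
    """Decode a value posted from a Windows-1251 form into a str.

    Characters the browser could not represent in Windows-1251 arrive as
    numeric references such as &#8594;; html.unescape restores them.
    CAVEAT: entity text the user typed literally is unescaped too, because
    the two cases are indistinguishable.
    """
    return html.unescape(raw.decode("cp1251"))


value = decode_form_value(b"\xcf\xf0\xe8 &#8594;")  # "При →"
utf8_bytes = value.encode("utf-8")  # ready to store or re-emit as UTF-8
```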

HttpUtility.HtmlEncode doesn't encode everything

I am interacting with a web server using a desktop client program written in C# and .NET 3.5. I am using Fiddler to see what traffic the web browser sends, and I emulate that. Sadly, this server is old and a bit confused about charsets and UTF-8; mostly it uses Latin-1. When I enter data into the web browser containing "special"...
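
When a form page is Latin-1, browsers commonly submit characters that Latin-1 cannot represent as decimal numeric character references (&#NNNN;). Everything else goes as Latin-1 bytes, and the result is then percent-encoded. This Python sketch reproduces that encoding. Whether this particular server expects exactly this form is an assumption; comparing the output with the request captured in Fiddler would confirm it.

```python
from urllib.parse import quote_plus


def encode_for_latin1_form(text: str) -> bytes:
    """Latin-1 bytes where possible; &#NNNN; for everything else."""
    return text.encode("latin-1", errors="xmlcharrefreplace")


# Percent-encode the bytes for an application/x-www-form-urlencoded body.
payload = quote_plus(encode_for_latin1_form("ü→"))
print(payload)  # %FC%26%238594%3B
```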

Compiling UTF-8 encoded source with Unicode line separators

Using the latest version of the Microsoft compiler (included with the Windows 7 SDK), I'm attempting to compile a source file that's encoded as UTF-8 with Unicode line separators. Unfortunately, the code will not compile -- even if I include the UTF-8 signature (BOM) at the start of the file. For example, if I try to compile this: #include <s...
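
One workaround is to rewrite U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR as ordinary newlines before compiling. This assumes the compiler only recognizes CR/LF line endings; the excerpt is truncated before the actual error. A Python sketch with hypothetical function names:

```python
from pathlib import Path

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def normalize_line_separators(source: str, newline: str = "\r\n") -> str:
    """Replace U+2028/U+2029 with a conventional newline (CRLF by default)."""
    return source.replace(LINE_SEPARATOR, newline).replace(PARAGRAPH_SEPARATOR, newline)


def fix_source_file(path: Path) -> None:
    """Rewrite a UTF-8 source file in place, keeping (or adding) a UTF-8 BOM."""
    text = path.read_bytes().decode("utf-8-sig")  # drops a leading BOM if present
    path.write_bytes(normalize_line_separators(text).encode("utf-8-sig"))
```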

Escaping characters in AS3 the way JavaScript does

I am having trouble escaping special characters in AS3. trace( escape("who are ü?") ); returns who%20are%20%uFFFD%3F or trace( encodeURIComponent("who are ü?") ); returns who%20are%20%EF%BF%BD%3F while in JavaScript this alert( encodeURIComponent("who are ü?") ); returns who%20are%20%C3%BC%3F and alert( escape("who are ü?")...
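
A Python sketch of what the correct result encodes. It shows the percent-encoded UTF-8 bytes that encodeURIComponent produces, and what %EF%BF%BD stands for. Both AS3 results contain U+FFFD, the replacement character. That suggests the "ü" was already lost before encoding, most likely because the source text was read with the wrong encoding, though the excerpt doesn't confirm this. Note that quote() also escapes !'()*, which encodeURIComponent leaves alone.

```python
from urllib.parse import quote

# Percent-encode the UTF-8 bytes, similar to JavaScript's encodeURIComponent.
print(quote("who are ü?", safe=""))  # who%20are%20%C3%BC%3F

# %EF%BF%BD is simply U+FFFD (the replacement character) encoded as UTF-8.
print(quote("\ufffd", safe=""))  # %EF%BF%BD
```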

How to check real names and surnames - PHP

Hi everybody, here's my problem: in PHP, I want to check whether a user has entered a real name and surname by checking that they contain only letters (of any alphabet), ' or -. I found a solution here (but I don't remember the link) for checking that a string contains only letters: preg_match('/^[\p{L} ]+$/u',$name), but I'd like to allow ' and - t...
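
In the PHP pattern itself, adding the two characters to the class, as in /^[\p{L} '-]+$/u, should have the intended effect; a trailing "-" in a class is literal. Python's re module lacks \p{L}, so this Python sketch expresses the same rule with str.isalpha(). It also requires at least one letter. The function name is hypothetical.

```python
import unicodedata


def is_plausible_name(name: str) -> bool:
    r"""Accept letters from any script plus spaces, apostrophes and hyphens.

    str.isalpha() is true for the Unicode "Letter" categories (Lu, Ll, Lt,
    Lm, Lo), playing the role of \p{L}. NFC normalization first composes
    sequences such as "e" + U+0301, since a bare combining mark is not a letter.
    """
    name = unicodedata.normalize("NFC", name)
    allowed_punctuation = " '-"
    return (any(ch.isalpha() for ch in name)
            and all(ch.isalpha() or ch in allowed_punctuation for ch in name))
```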

How to write files with (readable) UTF8 characters in C?

I read a file that has utf8 characters like this: FILE *FileIN,*FileOUT; FileIN=fopen("filename","r"); char string[600]; WideChar C[600],S[100]; fgets(string,600,FileIN); wcscpy(C,UTF8Decode(string).c_bstr()); // widechar copy And it reads it perfectly (this is shown in the Editbox when running the program): Edit1->Text=C; Result ==...
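
Writing readable UTF-8 from wide characters means turning each code point into one to four bytes and writing those bytes as-is, for example to a file opened with "wb". On Windows, WideChar/wchar_t holds UTF-16 code units, so a surrogate pair must be combined into one code point first. The Python sketch below spells out the bit operations a C routine would perform. The function name utf8_encode is hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 with explicit bit operations."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


text = "Ürün №5"
data = b"".join(utf8_encode(ord(ch)) for ch in text)  # bytes to write to the file
```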

PHP UTF-8 questions - If I create a string in PHP... is it in UTF-8?

In PHP, if I create a string like this: $str = "bla bla here is my string"; Will I then be able to use the mbstring functions to operate on that string as UTF8? // Will this work? $str = mb_strlen($str); Further, if I then have another string that I know is UTF-8 (say it was a POSTed form value, or a UTF-8 string from a databa...
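
PHP strings are byte strings, so the question comes down to which bytes the literal contains. A literal made only of ASCII characters has identical bytes in UTF-8. It is therefore valid UTF-8, strlen and mb_strlen agree on it, and concatenating it with another valid UTF-8 string yields valid UTF-8. Non-ASCII literals depend on the encoding the PHP source file was saved in. A Python sketch of these byte-level facts:

```python
ascii_literal = "bla bla here is my string"
posted = "Grüße"  # stands in for a UTF-8 value from a form or database

# ASCII text has the same bytes in UTF-8 as in ASCII/Latin-1 ...
assert ascii_literal.encode("utf-8") == ascii_literal.encode("latin-1")

# ... so byte length equals character length (strlen vs mb_strlen in PHP).
print(len(ascii_literal.encode("utf-8")), len(ascii_literal))  # 25 25

# Non-ASCII text differs: "ü" and "ß" take two bytes each in UTF-8.
print(len(posted.encode("utf-8")), len(posted))  # 7 5

# Concatenating valid UTF-8 byte strings always gives valid UTF-8.
combined = ascii_literal.encode("utf-8") + posted.encode("utf-8")
assert combined.decode("utf-8") == ascii_literal + posted
```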

SMTP and Unicode/UTF-8 characters...? How do I send them? base64 everything?

Using SMTP, how do you send Unicode/UTF-8 e-mails? Am I expected to base64-encode the UTF-8 body and specify that in the MIME header, or...? What about the headers? I'm sure there's a standard somewhere that describes this... but apparently I'm too tired/still too sick to find it... Thanks! ...
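
The relevant standards are the MIME RFCs (2045-2047). The body's charset is declared in Content-Type, and its transfer encoding (quoted-printable or base64) goes in Content-Transfer-Encoding. Non-ASCII header text uses RFC 2047 "encoded words". The following Python sketch lets the standard library do both. The addresses are placeholders. Choosing quoted-printable over base64 is a judgment call, and both keep the message 7-bit clean. The resulting message could then be handed to smtplib's send_message.

```python
import email
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"    # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Grüße"              # serialized as an RFC 2047 encoded word
body = "Schöne Grüße, 😀"
# Declare charset=utf-8 and encode the body as quoted-printable (7-bit safe);
# cte="base64" would work as well.
msg.set_content(body, charset="utf-8", cte="quoted-printable")

wire = msg.as_string()  # pure ASCII text, ready for SMTP
print(wire)
```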