utf-8

Unexpected output on file include

I've been working on a custom CMS in PHP and haven't run into any problems until recently. When testing, I've noticed that the string  has started appearing at the top of only the index page. Testing with some die statements throughout the code, it seems the output is coming between a file include. File A <?php if (!defined('IN_CM...

Which dbExpress ServerCharSet do I need for utf8 data in MySQL 5?

We tried ServerCharSet=utf8 and ServerCharSet=UTF8, based on information found in newsgroups - but still, special characters / Umlauts do not appear correctly in the client data. We use Delphi 2009 and the built-in dbExpress driver. Field data is retrieved using AsWideString. ...

Character/URI encoding in JavaScript getting out of sync?

I have a question about encoding special/extended UTF-8 characters in URLs in JavaScript. The same question applies to many characters like the Registered R-circle, but my example uses an umlaut: ü = %C3%BC in UTF-8 (four rows from bottom of http://www.utf8-chartable.de/) If the url contains an umlaut represented as UTF-8 (ü = %C3%BC),...
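
As a quick check of the byte values quoted in the excerpt above, here is a minimal sketch that percent-encodes ü as UTF-8 and, for contrast, as ISO-8859-1. Python is used only for illustration (the question itself is about JavaScript), and the variable names are hypothetical.

```python
from urllib.parse import quote, unquote

# Percent-encode the same character under two different byte encodings.
utf8_form = quote("\u00fc", encoding="utf-8")      # ü as two UTF-8 bytes
latin1_form = quote("\u00fc", encoding="latin-1")  # ü as one ISO-8859-1 byte

print(utf8_form)    # %C3%BC
print(latin1_form)  # %FC

# Decoding has to use the same encoding that was used when encoding.
round_trip = unquote(utf8_form, encoding="utf-8")
print(round_trip == "\u00fc")  # True
```

If one side of an exchange encodes with UTF-8 and the other decodes with a single-byte encoding, the two bytes C3 BC come back as two separate characters, which is one common way for encodings to get "out of sync".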

What encoding scheme should be used in a web project?

We are building a (Java) web project with Eclipse. By default Eclipse uses Cp1252 encoding on Windows machines (which we use). As we also have developers in China (in addition to Europe), I started to wonder if that is really the encoding to use. My initial thought was to convert to UTF-8, because "it supports all the character sets". ...

Convert utf-8 std::string to std::wstring on iPhone

Hi, I have a UTF-8 string (an std::string created from a byte array). I understand that the encoding means that size()/length() won't give me the actual number of glyphs if the text is Chinese, for instance. I understand that in order to get the Unicode character code of each glyph I need to convert it to wstring (or any UTF>8 repres...
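
The gap between byte length and character count described in the excerpt above can be seen in a small sketch. Python is used here purely for illustration (the question concerns C++ on iPhone), and the sample text is an assumption chosen to mix 1-, 2- and 3-byte sequences.

```python
# Sample text: two CJK characters, a space, and a precomposed u-umlaut.
text = "\u6c49\u5b57 \u00fc"

encoded = text.encode("utf-8")
byte_count = len(encoded)               # number of UTF-8 bytes
code_point_count = len(text)            # number of Unicode code points
code_points = [ord(ch) for ch in text]  # numeric code of each code point

print(byte_count)        # 9
print(code_point_count)  # 4
print([hex(cp) for cp in code_points])
```

Note that a code point is still not always a glyph: combining marks can make one visible character span several code points.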

Tomcat server file download problem with encoding

I am sending a response using the following code: response.setHeader("Content-Encoding","UTF-8"); response.setContentType("text/plain charset=UTF-8"); PrintWriter outWriter = response.getWriter(); String returnString = new String(dataController.translateFile(documentBuffer).getBytes(), "UTF-8"); outWriter.print(returnString); When I r...

Utf-8 with sqlalchemy on a database with init connect

I am trying to use SQLAlchemy to connect to a MySQL database. I have set up charset=utf-8$use_unicode=0. This worked with almost all databases, but not with one in particular. I believe this is because it has the 'init-connect' variable set to 'SET NAMES latin2;' and I have no privileges to change that. It works for me if I send an explicit query SET...

Flex and full UTF-8 Support?

Hi, Doing some software review for a RIA project - I was hoping to use Flex but need to make sure it has full UTF-8 support - I'm talking all fonts for all languages - everything from English, to Finnish, to Russian, to Japanese to Thai to Sanskrit... I haven't worked with Flash/Flex/ActionScript in years - but I seem to remember it's ...

Java: Detect non-displayable chars for a given Character Encoding

Hello! I'm currently working on an application to validate and parse CSV files. The CSV files have to be encoded in UTF-8, although sometimes we get files in the wrong encoding. The CSV files most likely contain special characters of the German alphabet (Ä, Ö, Ü, ß), as most of the text within the CSV files is in German. For the ...

Why haven't ASCII and ISO-8859-1 encoding been relegated to history?

It seems to me if UTF-8 was the only encoding used everywhere ever, there would be a lot less issues with code: Don't even need to think about encoding issues. No issues with mixed 1-2-byte character streaming, because everything uses 2 bytes. Browsers don't need to wait for the <meta> tag specifying encoding before they can do anythin...

UTF-8 not working in HTML forms

I have this form: <form method="post" enctype="multipart/form-data" accept-charset="UTF-8"> But when I submit an é character, it turns it into Ã©. Why doesn't this work? Yes, the MySQL database has all the character-sets set up correctly. (Database, tables.) If I manually put it in the database with Navicat it shows up fine on the we...

PHP function iconv character encoding from iso-8859-1 to utf-8

I'm trying to convert a string from iso-8859-1 to utf-8. But when it encounters the two characters € and •, the function returns a character that is a square with two numbers inside. How can I solve this issue? ...
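
One possible explanation, assuming the input is actually Windows-1252 rather than ISO-8859-1 (the excerpt does not say): € and • are not part of ISO-8859-1, but in Windows-1252 they occupy bytes 0x80 and 0x95, which ISO-8859-1 maps to invisible control characters. The sketch below uses Python only for illustration (the question is about PHP's iconv), and the byte values are the assumption just stated.

```python
# Bytes 0x80 and 0x95: the euro sign and bullet in Windows-1252 (assumed input).
raw = bytes([0x80, 0x95])

as_latin1 = raw.decode("latin-1")  # yields C1 control characters U+0080, U+0095
as_cp1252 = raw.decode("cp1252")   # yields the euro sign and the bullet

print([hex(ord(ch)) for ch in as_latin1])  # ['0x80', '0x95']
print([hex(ord(ch)) for ch in as_cp1252])  # ['0x20ac', '0x2022']

# Re-encoding the correctly decoded text gives valid UTF-8 for both characters.
utf8_bytes = as_cp1252.encode("utf-8")
print(utf8_bytes)
```

Under that assumption, naming Windows-1252 as the source encoding in the conversion would be the thing to try first.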

regular expression to detect numbers written as words - UTF-8 input

Hi all, thanks for the answers to : "regular expression to detect numbers written as words" : http://stackoverflow.com/questions/3608159/regular-expression-to-detect-numbers-written-as-words I now have this working, however I have the same requirement but the numbers as words are in Arabic (or any other UTF-8) and not English, so : i...

How to compare UTF-8 strings in Javascript?

When I write "Ł" > "Z" in JavaScript, it returns true. In Unicode order it should of course be false. How can I fix this? My site is using UTF-8. ...

Can I set CharSet for every page load? (Classic ASP)

I have made some changes to a Classic ASP application which break foreign letters unless "Response.Charset = "utf-8"" is set in every page... And it's a lot of pages... Could I force the Charset to utf-8 for every page without having to set it in each page? ...

Strange UTF8 string comparison

Hello guys, I'm having a problem with UTF8 string comparison that I really can't figure out, and it's starting to give me a headache. Please help me out. Basically I have this string from an XML document encoded in UTF8: 'Mina Tidigare anställningar' And when I compare that string with exactly the same string which I typed myself: 'Mi...

is form charset required?

Hi, My website is set to UTF-8. Do I also have to set my forms to UTF-8 using the accept-charset attribute? My guestbook, for example, allows multiple languages, so my guestbook database table is utf8_unicode_ci, and all my webpages use the same template, so the encoding for all pages is UTF-8, because I set the charset for my webpages as utf-8...

mb_strlen() is it enough ?

Hi, When counting the length of a UTF-8 string in PHP I use mb_strlen(), for example: if (mb_strlen($name, 'UTF-8') < 3) { $error .= 'Name is required. Minimum of 3 characters required'; } As the text fields can accept any language (multilanguage), I want to make sure that PHP will count multilanguage UTF-8 characters correctly. ...

MySQL utf8_unicode_ci

Hi, I want to make sure something is right. My database tables are utf8_unicode_ci, and my site encoding and header are utf-8, and so on. I did a test and entered this in my guestbook: á ʵßăāÇϢϞﻨ☺ ▓ ▓ẻ ▓ẻṎ ۞ ݤ Great, it displays on the webpage like it should, and I tested other languages too, but on checking this in p...

Varchar for UTF-8 ?

Hi, I found a similar post about this but I'm still not sure. As I am making my guestbook and so forth multilanguage, I changed the collation to utf8_unicode_ci in MySQL, and everything works as it should. Something I did not think of was the type I use: my guestbook is multilanguage, and for the name field a user cannot enter more than 50 ...