utf-8

Why are my foreign-language (Malayalam) characters stored as HTML entities in the database?

Hello, on my website, using the Google Language API, I type Malayalam into a text box and a text area, like this: ഇതു ഒരു നല്ല സിനിമ ആണ് ("this is a good film"). But when I look at the table in the MySQL database, it is stored as ഇതു ഒരു നല്ല സിനിമ ആണ&#...
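
One common cause, offered as an assumption rather than a diagnosis: if the form page is not served as UTF-8, browsers submit characters that the page's charset cannot represent as HTML numeric character references such as &#3335;, and those references are what ends up in the table. The Python sketch below only shows that such references round-trip to the original Malayalam text. It does not model what the Google API or MySQL do.

```python
import html

# "ഇതു" written with explicit code points: U+0D07, U+0D24, U+0D41.
original = "\u0d07\u0d24\u0d41"

# Replace every non-ASCII character with a decimal numeric character
# reference, similar to a browser's fallback for unencodable characters.
as_entities = original.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)  # &#3335;&#3364;&#3393;

# html.unescape turns the references back into the real characters.
restored = html.unescape(as_entities)
print(restored == original)  # True
```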

Best Type for UTF-8 data?

What is the best type in C++ for storing a UTF-8 string? I'd like to avoid rolling my own class if possible. My original thought was std::string -- however, this uses char as the underlying type. char may be signed or unsigned; it varies. On my system, it's signed. UTF-8 code units, however, are unsigned octets. This seems to indicate ...
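
The signedness concern can be illustrated outside C++: the stored bit patterns are identical either way, and only the integer you get when reading a single byte differs. This Python sketch (an illustration only, not a C++ answer) reads the two UTF-8 bytes of "é" as unsigned and as signed 8-bit integers with the standard struct module.

```python
import struct

data = "é".encode("utf-8")  # b'\xc3\xa9': two UTF-8 code units

unsigned = struct.unpack("2B", data)  # like reading unsigned char
signed = struct.unpack("2b", data)    # like reading a signed char

print(unsigned)  # (195, 169)
print(signed)    # (-61, -87)

# Masking with 0xFF recovers the unsigned value from the signed one,
# the same idea as (unsigned char)c or c & 0xFF in C++.
print(tuple(b & 0xFF for b in signed) == unsigned)  # True
```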

Input string compressed as string

Hi, I want to compress/transform a string into a new string, e.g.: input string: USERNAME/REGISTERID; output string after compression: <some-string-in-UTF8-format>; output string after decompression: USERNAME/REGISTERID. Is there a compression or hash method for this transformation? I would prefer a solution using Java or an algorithm with ...
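
A hash is one-way, so it cannot give back USERNAME/REGISTERID; the transformation has to be reversible. Below is a minimal sketch in Python (not Java) that compresses with zlib and then Base64-encodes the result, so the output is plain ASCII and therefore also valid UTF-8. The function names are hypothetical. For a string this short the output will typically be longer than the input, because zlib adds a header and a checksum and Base64 expands the data.

```python
import base64
import zlib

def compress_to_text(s: str) -> str:
    # UTF-8 bytes -> deflate -> URL-safe Base64 (ASCII only, so valid UTF-8).
    packed = zlib.compress(s.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).decode("ascii")

def decompress_from_text(t: str) -> str:
    # Exact inverse of compress_to_text.
    packed = base64.urlsafe_b64decode(t.encode("ascii"))
    return zlib.decompress(packed).decode("utf-8")

token = compress_to_text("USERNAME/REGISTERID")
print(token)
print(decompress_from_text(token))  # USERNAME/REGISTERID
```

In Java, java.util.zip.Deflater/Inflater and java.util.Base64 provide the same building blocks.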

Right-to-Left Email

Hi, I'm trying to generate email from my code that will read correctly for people using right-to-left languages such as Arabic. My question is: what are my options for achieving this? I am aware that I can create a multipart email and encode the message body as "text/html", then specify a text direction in the <html> tag (e.g. <...
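
The multipart approach described in the question can be sketched with Python's standard email package (Python rather than the asker's language; the addresses and the Arabic sample text are placeholders). The HTML alternative carries dir="rtl" on the <html> element, while the plain-text part has no direction markup, so its rendering is left to the mail client.

```python
from email.message import EmailMessage

text = "مرحبا بالعالم"  # placeholder Arabic text ("hello, world")

msg = EmailMessage()
msg["Subject"] = "RTL test"
msg["From"] = "sender@example.com"     # hypothetical address
msg["To"] = "recipient@example.com"    # hypothetical address

# Plain-text fallback part; non-ASCII text is encoded as UTF-8.
msg.set_content(text)

# HTML alternative that declares right-to-left direction and the language.
html_body = f'<html dir="rtl" lang="ar"><body><p>{text}</p></body></html>'
msg.add_alternative(html_body, subtype="html")

print(msg.get_content_type())  # multipart/alternative
```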

Encoding UTF8 string to ISO-8859-1 String (VB.NET)

Hi, I need to convert a UTF-8 string to an ISO-8859-1 string using VB.NET. Is there an example? Thanks in advance. ...
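
As a language-neutral illustration (Python, not VB.NET): a string in memory is Unicode text, and "UTF-8" or "ISO-8859-1" only describes how it is turned into bytes. Converting therefore means decoding the UTF-8 bytes and encoding the text as ISO-8859-1, which cannot represent characters such as "€". In .NET the corresponding encodings come from System.Text.Encoding (for example Encoding.GetEncoding("ISO-8859-1")); the Python below only shows the idea.

```python
# Python str (like a .NET String) holds Unicode text; an encoding only
# comes into play when the text is turned into bytes.
text = "café €"

utf8_bytes = text.encode("utf-8")
# Decode the UTF-8 bytes to text, then encode that text as ISO-8859-1.
# "€" has no ISO-8859-1 code, so errors="replace" substitutes "?".
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1", errors="replace")

print(utf8_bytes)    # b'caf\xc3\xa9 \xe2\x82\xac'
print(latin1_bytes)  # b'caf\xe9 ?'
```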

UTF-8 in C++: quick & dirty tricks

Hi, I am aware that there have been various questions about UTF-8, mainly about libraries to manipulate UTF-8 'string'-like objects. However, I am working on an 'internationalized' project (a website, for which I code a C++ backend... don't ask) where, even though we deal with UTF-8, we don't actually need such libraries. Most of the time the...

How to test webservice for unicode handling

Are there tools available to test whether a web service can handle Unicode (UTF-8 encoded) POSTs? How do I generate UTF-8 encoded data? ...
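
One way to generate UTF-8 encoded test data, sketched in Python (the endpoint URL and field names are hypothetical, and nothing is sent): urllib.parse.urlencode percent-encodes str values as UTF-8 by default, and the Content-Type header declares the charset. The sample values mix a Latin-1 letter with Malayalam so that both 2-byte and 3-byte UTF-8 sequences are exercised.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Sample fields: "ä" is 2 bytes in UTF-8, the Malayalam letters 3 bytes each.
fields = {"name": "Bäule", "greeting": "\u0d07\u0d24\u0d41"}

# urlencode percent-encodes str values as UTF-8 by default.
body = urlencode(fields).encode("ascii")

req = Request(
    "http://localhost:8080/service",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    method="POST",
)
print(req.data)
# To actually send it: urllib.request.urlopen(req)
```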

Coding in UTF-8 problem

I am using Notepad++ for PHP coding. I don't have any problems when the format is set to "Encode in ANSI". However, when I use "Encode in UTF-8", either a strange character appears at the top or nothing is displayed at all. Q1. Am I supposed to use ANSI? Q2. Why am I not able to display anything when I use UTF-8? My source code for the header is ...
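
A likely explanation, stated as an assumption: depending on the version and setting, Notepad++'s UTF-8 mode saves a byte order mark, the three bytes EF BB BF, at the start of the file. PHP sends those bytes as output before anything else, which can appear as a stray character at the top and can also make later header() calls fail because output has already started. Saving as UTF-8 without BOM avoids this. The Python sketch below only shows how to detect and strip the BOM from a file's bytes; the helper name is hypothetical.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf'
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

source = b"\xef\xbb\xbf<?php header('Content-Type: text/html; charset=utf-8'); ?>"
cleaned = strip_utf8_bom(source)
print(cleaned[:5])  # b'<?php'

# Alternatively, the "utf-8-sig" codec drops a leading BOM while decoding.
print(source.decode("utf-8-sig")[:5])  # <?php
```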

Can seek and tell work with UTF-8 encoded documents in Python?

I have an application that generates some large log files (> 500 MB). I have written some utilities in Python that allow me to quickly browse a log file and find data of interest. But I now get some datasets where the file is too big to load into memory. I thus want to scan the document once, build an index and then only load th...
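
One approach, sketched in Python under the assumption that the log is line-oriented: open the file in binary mode, where tell() returns a plain byte offset, record the offset at the start of each line, and decode only the lines you read back. In text mode, tell() returns an opaque value that is only meaningful when passed back to seek(), so byte offsets are easier to store in an index. Decoding line by line is safe because the newline byte 0x0A never occurs inside a multi-byte UTF-8 sequence. The function names are hypothetical.

```python
import os
import tempfile

def build_line_index(path):
    """Return the byte offset at which each line starts."""
    offsets = []
    with open(path, "rb") as f:  # binary mode: tell() is a plain byte offset
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets

def read_line(path, offsets, n):
    """Seek straight to line n and decode only that line."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode("utf-8").rstrip("\n")

# Demo file with multi-byte characters, so byte offsets differ from character counts.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write("start\ncafé ☕ ok\nend\n".encode("utf-8"))
    log_path = tmp.name

index = build_line_index(log_path)
second = read_line(log_path, index, 1)
os.remove(log_path)

print(index)   # [0, 6, 19]
print(second)  # café ☕ ok
```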

Croatian diacritic signs in MySQL db (utf-8)

So, the symbols below the "display" title show how they should be displayed. The UTF-8 entities are listed below the "HTML (utf-8)" title (here is the list: LINK), and the last line shows what is stored in my database. The collation of the db table is utf8_unicode_ci. I suppose the symbols in the db shouldn't look the way they do in my case? They display correctly on the page when ...

Ensuring valid utf-8 in PHP

Hello, I'm using PHP to handle text from a variety of sources. I don't anticipate it will be anything other than UTF-8, ISO-8859-1, or perhaps WINDOWS-1252. If it's anything other than one of those, I just need to make sure the text gets turned into a valid UTF-8 string, even if characters are lost. Does the //TRANSLIT option of icon...
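
As a language-neutral sketch of the same goal (Python, not PHP, and not a description of iconv's //TRANSLIT), one heuristic is to try strict UTF-8 first, then Windows-1252, and finally ISO-8859-1, which maps every byte to some character and so never fails. UTF-8 goes first because Latin-1 or Windows-1252 text with accented letters rarely happens to form valid UTF-8. Windows-1252 goes before ISO-8859-1 because it assigns printable characters such as curly quotes to 0x80 to 0x9F, which ISO-8859-1 treats as control codes. The helper name is hypothetical.

```python
def to_utf8_text(raw: bytes) -> str:
    # Heuristic order: strict UTF-8, then Windows-1252, then ISO-8859-1.
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Every byte maps to a code point in ISO-8859-1, so this cannot fail.
    return raw.decode("iso-8859-1")

samples = [b"caf\xc3\xa9", b"caf\xe9", b"\x93quoted\x94", b"bad\x81byte"]
for raw in samples:
    text = to_utf8_text(raw)
    # text.encode("utf-8") now always yields valid UTF-8 bytes.
    print(repr(text), text.encode("utf-8"))
```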

How do I use filesystem functions in PHP, using UTF-8 strings?

I can't use mkdir to create folders with UTF-8 characters: <?php $dir_name = "Depósito"; mkdir($dir_name); ?> But when I browse this folder in Windows Explorer, the folder name looks like this: DepÃ³sito. What should I do? ...
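
A name like DepÃ³sito is exactly what the UTF-8 bytes of "Depósito" look like when read as Windows-1252. That the name reached Windows in this way (PHP passing UTF-8 bytes to a file API that interprets them in the system code page) is an assumption here. This Python sketch only reproduces and reverses the mix-up; it is not a PHP fix.

```python
name = "Depósito"

# UTF-8 encodes "ó" (U+00F3) as the two bytes C3 B3.
utf8_bytes = name.encode("utf-8")

# Reading those bytes as Windows-1252 turns them into "Ã" and "³".
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # DepÃ³sito

# Reversing the mistake: re-encode as Windows-1252 and decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # Depósito
```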

How to deal with query parameter's encoding?

I assumed that any data being sent to my parameter strings would be UTF-8, since that is what my whole site uses throughout. Lo and behold, I was wrong. In this example, the document contains the character ä in UTF-8 (from the query string), but the browser sends B\xe4ule (which is either ISO-8859-1 or Windows-1252) when you click submit. ...
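
One defensive option, sketched in Python with a hypothetical helper name: turn the percent-escapes back into raw bytes with urllib.parse.unquote_to_bytes, try UTF-8, and fall back to Windows-1252 for clients that submitted the legacy encoding. Serving the form page as UTF-8 and declaring accept-charset="UTF-8" on the form are the commonly recommended ways to make browsers send UTF-8 in the first place.

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(raw: str) -> str:
    # In form-encoded query strings "+" stands for a space; a literal "+"
    # arrives as %2B, which this replace does not touch.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy pages/browsers may send ISO-8859-1 or Windows-1252 bytes.
        return data.decode("cp1252", errors="replace")

print(decode_query_value("B%C3%A4ule"))  # Bäule (sent as UTF-8)
print(decode_query_value("B%E4ule"))     # Bäule (sent as Windows-1252)
```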

How to convert XML file in UTF-8 using Groovy builder StreamingMarkupBuilder

Hi, even if the question subject seems complicated, the issue is quite simple. I create an XML file with the following script: def xmlFile = new File("file-${System.currentTimeMillis()}.xml") mb = new groovy.xml.StreamingMarkupBuilder() mb.encoding = "UTF-8" new FileWriter(exportXmlFile) << mb.bind { mkp.xmlDeclaration() out <...
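
A frequent cause of this kind of problem, stated as an assumption: mb.encoding controls the encoding named in the XML declaration, while java.io.FileWriter writes characters using the platform default charset, so the declaration and the actual bytes can disagree. In Groovy/Java the usual remedy is a writer created with an explicit UTF-8 charset. The same principle is shown below in Python with xml.etree.ElementTree (a different library, used only for illustration), where a single call produces both the declaration and the bytes, so they cannot disagree.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("export")
ET.SubElement(root, "name").text = "Grüße"

buf = io.BytesIO()
# encoding= sets both the declared encoding and the bytes actually written.
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
data = buf.getvalue()

print(data.decode("utf-8"))
```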

problem saving to mysql database php mysqli

Hi all, I'm trying to save data to the database and I get an error I have never seen before. I have a hunch it has something to do with the db collation, but I'm not sure what's wrong. Here is the query: $query1 = "INSERT INTO scape.url (url,normalizedurl,service,idinservice) VALUES (url, normalizedurl, 4, 45454)"; $query = "INSERT INTO ...

Fixing Unicode Oops

It seems that we have managed to insert into our database two Unicode characters for each Unicode character we want. For example, for the Unicode char 0x3CBC, we've inserted the Unicode equivalents of each of its components (0xC383 and 0xC2BC). Can anyone think of a simple solution for fixing this? I've come up with something li...
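
The byte values in the question fit a familiar pattern: C3 83 and C2 BC are the UTF-8 encodings of "Ã" (U+00C3) and "¼" (U+00BC), and undoing one layer gives C3 BC, the UTF-8 encoding of "ü" (U+00FC). Assuming each affected value was UTF-8 encoded twice, with the intermediate bytes read as Latin-1, the repair is to encode the text as Latin-1 and decode the result as UTF-8. A Python sketch with a hypothetical helper name; values that do not fit the pattern are left unchanged.

```python
def undo_double_utf8(s: str) -> str:
    # Reverse one extra round of "UTF-8 bytes read as Latin-1".
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not in this way): leave it unchanged.
        return s

broken = "\u00c3\u00bc"              # "Ã¼"
print(broken.encode("utf-8").hex())  # c383c2bc, the bytes from the question
print(undo_double_utf8(broken))      # ü
print(undo_double_utf8("ü"))         # ü (already correct, left alone)
```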

Getting UTF-8 data from MySQL to the Linux C++ application

I'm having big trouble displaying UTF-8 data retrieved from MySQL in a Linux-based C++ application. UTF-8 text is shown as question marks. The application uses the MySQL C API, so I passed the UTF-8 option after mysql_init() and before mysql_real_connect(): mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, 'utf8'); and mysql_op...

Please help me trace how charsets are handled every step of the way

We all know how easy character sets are on the web, yet every time you think you've got it right, a foreign charset bites you in the butt. So I'd like to trace the steps of what happens in a fictional scenario I will describe below. I'm going to try to write down my understanding as well as possible, but my question is for you folks to correc...

What's the appropriate Unicode character to flag users on the website?

I run a quiz-like website at slagalica.tv (content is not in English). We often have users who try to cheat the system, so we flag those accounts and they get special treatment. Now I'd like to add a character beside their names that is visible everywhere across the website, so that everyone knows those accounts are flagged. I'm curren...

How does UTF-8 "variable-width encoding" work?

The Unicode standard has enough code points that you need 4 bytes to store them all. That's what the UTF-32 encoding does. Yet the UTF-8 encoding somehow squeezes these into much smaller spaces by using something called "variable-width encoding". In fact, it manages to represent the 128 characters of US-ASCII in just one...
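
The variable-width scheme can be shown as a small encoder. This Python sketch follows the UTF-8 bit layout (1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 up to U+10FFFF) and checks itself against Python's built-in encoder; the function name is hypothetical. The leading bits of the first byte (0, 110, 1110, 11110) tell a decoder how long the sequence is, and every continuation byte starts with 10, which is why ASCII text is unchanged and a decoder can resynchronize mid-stream.

```python
def utf8_encode(cp: int) -> bytes:
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x41, 0xE9, 0x0D07, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```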