utf-8

Removing diacritic symbols from UTF8 string in C

Hi all, I am writing a C program to search a large number of UTF-8 strings in a database. Some of these strings contain English characters with diacritics, such as accents. The search string is entered by the user, so it will most likely not contain such characters. Is there a way (function, library, etc.) which can remove these ...
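One common approach to the diacritics question above is to decompose each character (Unicode normalization form NFD) and then drop the combining marks. The sketch below shows the idea in Python rather than C; the function name strip_diacritics is hypothetical, and a C program would need a Unicode library such as ICU for the normalization step.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed letter such as "ö" into "o" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Björn Åsa"))
```

Letters that have no canonical decomposition, such as "ø", pass through unchanged, so a real search feature may need an extra mapping table for them.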

STL and UTF-8 file input/output. How to do it?

I use wchar_t for internal strings and UTF-8 for storage in files. I need to use STL to input/output text to screen and also do it by using the full Lithuanian charset. It's all fine because I'm not forced to do the same for files, so the following example does the job just fine: #include <io.h> #include <fcntl.h> #include <iostream> _set...

I get dual results from mysql query when using international characters, i.e. Å/Ä=A & Ö=O

For example, if I search for the name Åsa, I only want to get the name Åsa and not Asa; the same goes for Björn instead of Bjorn. $query="select * from users where username like 'Björn'"; $result=mysql_query($query); $num=mysql_num_rows($result); echo"$num"; $i=0; while($i<$num){ $id=mysql_result($result,$i,"id"); $name=mysql_result($result,$...

filename encoding issue

I am getting a file with a Faroese name and trying to save it in a PHP script: 2010_08_Útflutningur.xls. Ubuntu 10.04 LTS saves it as: 2010_08_�tflutningur.xls (invalid encoding). I've installed and run utf8-migration-tool, but with no effect. Is this an Ubuntu error that I can fix, or do I just have to give up and modify the nam...

How can I store UTF8 in MySQL with PHP, sanitize it, echo it with XML and transform it with XSLT?

I am developing an MVC application with PHP that uses XML and XSLT to print the views. It needs to fully support UTF-8. I also use MySQL, correctly configured for UTF-8. My problem is the following. I have an <input type="text"/> with a value like àáèéìíòóùú"><'@#~!¡¿?. This is processed to add it to the database. I use mysql_real_escape_string...

How to save text file in UTF-8 format using pdftotext

I am using the open-source pdftotext tool to convert PDFs to text files. How can I save the text files in UTF-8 format so that I retain all the accented characters? I am using the command below, which extracts the content to a text file, but I am not able to see any accented characters. pdftotext -enc UTF-8 book1.pdf boo...

Django FileField encoding

I have a Django model as follows: class ExportFile(BaseExportFile): created_timestamp = models.DateTimeField(auto_now=True, editable=False) data = models.FileField(upload_to='exports') and a view function that renders a template to create a CSV file: def create_csv(request): context = Context({'data': MyModel.object...

How bad is mb_internal_encoding("UTF-8"); ??

Hi, after answering this question http://stackoverflow.com/questions/4041968/zend-cache-after-loading-cached-data-character-encoding-seems-messed-up/4043064#4043064 I use it to change PHP's internal encoding, which is originally ISO-8859-1, so I need to change the encoding of every non-English input value, by using it I force ...

Android: UTF-8 encoded HTTP response to String

Hi, I am trying to convert UTF-8 data to a String. The UTF-8 data is obtained over an HTTP connection. My problem is that the converted String does not display UTF-8 characters properly. Here is my code {extra bits removed}: URLConnection urlconn = url.openConnection(); httpConn = (HttpURLConnection) urlconn; httpConn.connect(); InputStream in= ...

UTF-8 in java applet in browser

Hi! I have a problem with encoding in a Java applet. When I run it in NetBeans, Russian characters in the applet are OK, with no encoding problems. But when I run the same applet through a browser, my Russian characters are shown as squares (encoding problem). Where is the problem? I have Russian translations in .properties fil...

problem passing Japanese characters (UTF-8) via json_encode

Having problems returning a list of Japanese terms from an MSSQL database as JSON. If I return them as a bunch of list items all is OK, but I cannot seem to get json_encode to work for me. Any pointers much appreciated. $prefs = array(); while($row = mssql_fetch_array($result)) { $prefs[] = mb_convert_encoding($row["Pref"] , "UTF-8", ...

Convert an escaped unicode String to its chars in ruby 1.8

I have to read some text files with the following content: \u201CThe Pedlar Lady of Gushing Cross\u201D In a Ruby 1.9 terminal, when I create a string with this content: ruby-1.9.1-p378 > "\u2714 \u2714 my great string \u2714 \u2714" => "✔ ✔ my great string ✔ ✔" In Ruby 1.8, I don't get the Unicode codes converted to their character...

Checking Unicode string for whitespace - byte for byte!

Quick & dirty Q: Can I safely assume that a byte of a UTF-8, UTF-16 or UTF-32 codepoint (character) will not be an ASCII whitespace character (unless the codepoint is representing one)? I'll explain: Say that I have a UTF-8 encoded string. This string contains some characters that take more than one byte to store. I need to find out ...
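For UTF-8 the byte-for-byte assumption holds, because every byte of a multi-byte sequence has its high bit set and so cannot equal an ASCII byte. It does not hold for UTF-16 or UTF-32. A minimal Python sketch, using U+2020 (DAGGER) as the counterexample; the helper name has_ascii_whitespace_byte is hypothetical:

```python
# Byte values of the ASCII whitespace characters (tab, LF, VT, FF, CR, space).
ASCII_WHITESPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}


def has_ascii_whitespace_byte(data: bytes) -> bool:
    """True if any single byte equals an ASCII whitespace byte."""
    return any(byte in ASCII_WHITESPACE for byte in data)


dagger = "\u2020"  # not a whitespace character

# UTF-8 encodes U+2020 as E2 80 A0: all bytes are >= 0x80.
print(has_ascii_whitespace_byte(dagger.encode("utf-8")))      # False
# UTF-16BE encodes U+2020 as 20 20: both bytes look like a space.
print(has_ascii_whitespace_byte(dagger.encode("utf-16-be")))  # True
# UTF-32BE encodes U+2020 as 00 00 20 20.
print(has_ascii_whitespace_byte(dagger.encode("utf-32-be")))  # True
```

So a byte-wise scan is safe on UTF-8 input, while UTF-16 and UTF-32 have to be scanned one code unit at a time.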

PHP: How to create a file encoded as "UTF-8 without BOM"

Dear programmers: As I guess most of you know, we have the following encodings for files: ANSI and UTF-8. UTF-8 is recognized by three bytes added at the beginning of the file, but those bytes cause some trouble in PHP, as you know, so we use UTF-8 without BOM (instead of UTF-8). Here is my question: How can we write a n...
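The three leading bytes mentioned above are the UTF-8 byte order mark, EF BB BF. Whether a file gets one depends only on whether the writer emits those bytes before the text. The sketch below illustrates this in Python rather than PHP; write_utf8 and starts_with_bom are hypothetical helper names, and the file names are placeholders.

```python
import codecs
import os
import tempfile


def write_utf8(path, text, with_bom=False):
    """Write text as UTF-8, optionally prefixed with the BOM."""
    # Python's "utf-8" codec writes no BOM; "utf-8-sig" prepends EF BB BF.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(text)


def starts_with_bom(path):
    """Check whether the first three bytes of the file are the UTF-8 BOM."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt")
    marked = os.path.join(tmp, "marked.txt")
    write_utf8(plain, "Útflutningur\n")
    write_utf8(marked, "Útflutningur\n", with_bom=True)
    print(starts_with_bom(plain), starts_with_bom(marked))  # False True
```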

Saxon and character encoding: experiences and errors

Recently I ran some tests on XSLT transformations with Saxon. My main focus was file encoding and character sets, but I was also interested in the impact of different Saxon versions and of Java VM x86 vs. x64. The insights are not spectacular, but I'd still like to share them and ask for comments. On XML file encoding: In general, you have to disti...

Converting UTF-8 with C++ standard libraries (no /clr)

I have a string like this: "These are Pi (\u03a0) and Sigma (\u03a3).". How can I convert this so that it contains and prints the actual characters, using C++ standard libraries? This solution, http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8(VS.80).aspx, uses the .NET Framework (/clr compiling), which I want to avoid, preferring C++ stan...
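The conversion asked about above comes down to two steps: find each literal \uXXXX sequence and replace it with the code point it names, then encode the result as UTF-8. A minimal sketch of that logic in Python rather than C++; decode_u_escapes is a hypothetical name, and the sketch assumes four-digit escapes only, so it does not combine surrogate pairs.

```python
import re

# A literal backslash, "u", then exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names."""
    return _U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


raw = r"These are Pi (\u03a0) and Sigma (\u03a3)."
decoded = decode_u_escapes(raw)
# Show the UTF-8 bytes: U+03A0 becomes CE A0 and U+03A3 becomes CE A3.
print(decoded.encode("utf-8"))
```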