utf-8

How can I compile LaTeX in UTF8?

I wrote my document in an ISO encoding. It does not support umlauts such as ä and ö, which I need. The document compiles without UTF-8, but not with it. More precisely, adding the line \usepackage[utf8]{inputenc} at the beginning of my main.tex makes compilation fail. How can I compile my LaTeX document in UTF-8?...

How can I determine the storage encoding (such as UTF-8) of files?

This question arises from the reply. How can I convert files stored in an ISO encoding to UTF-8? Some details: I used a Mac with some ISO encoding. I have since formatted it, so I cannot know the exact ISO standard. Now I use Ubuntu, and I am trying to convert my LaTeX files from the Mac from that ISO encoding to UTF-8. ...
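Once the source encoding is known, a tool such as iconv (iconv -f ISO-8859-1 -t UTF-8) does the conversion on Ubuntu. When it is not known, a heuristic guess can help. Below is a minimal Python sketch that assumes the files are UTF-8, Mac OS Roman, or ISO-8859-1. The helper names and that candidate list are assumptions, not details from the question. It relies on one fact: bytes 0x80 to 0x9F are control characters in ISO-8859-1 but letters such as ä (0x8A) and ö (0x9A) in Mac OS Roman.

```python
from pathlib import Path


def guess_legacy_encoding(data: bytes) -> str:
    """Heuristic guess (assumed candidates: UTF-8, Mac OS Roman, ISO-8859-1).

    Returns 'utf-8' if the bytes decode cleanly as UTF-8, 'mac_roman' if any
    byte in 0x80-0x9F occurs (control characters in ISO-8859-1, but letters
    such as a-umlaut in Mac OS Roman), and 'iso-8859-1' otherwise."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if any(0x80 <= b <= 0x9F for b in data):
        return "mac_roman"
    return "iso-8859-1"


def convert_to_utf8(path: Path) -> str:
    """Rewrites the file at `path` as UTF-8 and returns the encoding assumed."""
    data = path.read_bytes()
    encoding = guess_legacy_encoding(data)
    if encoding != "utf-8":
        path.write_text(data.decode(encoding), encoding="utf-8")
    return encoding
```

For example, looping over Path(".").glob("*.tex") and calling convert_to_utf8 on each file converts a directory. The guess is only a heuristic. A Windows-1252 file also uses 0x80 to 0x9F and would be misread as Mac OS Roman. A Mac OS Roman file whose only special character is ß (0xA7) would be misread as ISO-8859-1. Check a few umlauts after converting, and keep backups.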

3 byte UTF-8 String replacement in .NET (Convert 3-byte UTF-8 to String or Char)

I have a UTF-8-encoded string that I get from reading a PDF, and I am trying to strip out some characters that represent spaces but are not encoded as the standard 0x20 space. My problem is that these characters are represented by 3 bytes of UTF-8, and I can't figure out how to get them into a string or character so I can do a replace. T...
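The usual approach is to decode the bytes first and then replace characters, not byte triples. In .NET, for example, the decoding can be done with Encoding.UTF8.GetString. After decoding, the three bytes E2 80 89 become the single character U+2009 THIN SPACE. Below is a minimal Python sketch of the idea. It assumes that "space-like" means the Unicode general category Zs (space separator). The helper name normalize_spaces is hypothetical.

```python
import unicodedata


def normalize_spaces(raw: bytes) -> str:
    """Decodes UTF-8 bytes and replaces every Unicode space separator
    (category 'Zs', e.g. U+2009 THIN SPACE = E2 80 89, U+202F NARROW
    NO-BREAK SPACE = E2 80 AF) with an ordinary 0x20 space."""
    text = raw.decode("utf-8")
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


sample = b"10\xe2\x80\x89000\xe2\x80\xafkm"
print(normalize_spaces(sample))  # 10 000 km
```

Category Zs does not include zero-width characters such as U+200B ZERO WIDTH SPACE (category Cf). If the PDF text contains those, they need a separate rule.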

UTF-8 or ISO-8859-1 in XML

We have an application that takes a text string entered by a user into a web form and packages it in XML. Just to confuse matters a little, the XML is sent as the body of an Outlook email message. Because users can paste almost anything into the web form (typically from Word), the text string can contain characters outside 7-bit ASCII...
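ISO-8859-1 has no code points for Word's curly quotes or the euro sign. An ISO-8859-1 XML document can therefore carry them only as numeric character references, while UTF-8 stores them directly. Below is a minimal Python sketch. It assumes the XML is built as text and encoded in a final step. The helper name encode_for_xml is hypothetical. Use it on already-escaped character data, and make sure the XML declaration names the same encoding as the bytes.

```python
def encode_for_xml(text: str, encoding: str) -> bytes:
    """Encodes XML character data; characters the target encoding cannot
    represent are written as numeric character references (&#NNNN;)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


# Typical text pasted from Word: curly quotes, a euro sign, an accented letter.
word_text = "\u201cQuoted\u201d \u20ac5 caf\u00e9"

print(encode_for_xml(word_text, "iso-8859-1"))  # b'&#8220;Quoted&#8221; &#8364;5 caf\xe9'
print(encode_for_xml(word_text, "utf-8"))
```

Both outputs are valid XML for their declared encodings. The UTF-8 version stays readable as raw text and needs no escaping step.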

PHP: UTF-8 character encoding

Hey, I am scraping a list of RSS feeds using cURL, and then reading and parsing the RSS data with SimpleXML. The sorted data is then inserted into a MySQL database. However, as you can see at http://dansays.co.uk/research/MNA/rss.php, I am having several issues with characters not displaying correctly. Examples: ‘Guitar Hero: Van ...

Is UTF-8 acceptable for reading/writing Asian languages?

I am accepting user input via a web form (as UTF-8), saving it to a MySQL DB (using the UTF-8 character set), and later generating a text file (encoded as UTF-8). Is there any chance of text corruption from using UTF-8 instead of something like UCS-2? Is UTF-8 good enough in this situation? ...
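UTF-8 can represent every Unicode code point, so switching to UCS-2 would not make Asian text any safer. UCS-2 covers only the Basic Multilingual Plane (BMP). Below is a short Python check of the round trip and the byte costs. The sample strings are arbitrary. There is one caveat on the MySQL side. The character set named utf8 (an alias for utf8mb3) stores at most 3 bytes per character. Characters outside the BMP, such as the Extension B ideograph below, therefore need utf8mb4.

```python
samples = {
    "Japanese": "日本語のテキスト",
    "Chinese": "中文文本",
    "Korean": "한국어",
    "CJK Extension B": "\U00020BB7",  # outside the BMP
}

for label, text in samples.items():
    encoded = text.encode("utf-8")
    # UTF-8 can encode every Unicode code point, so the round trip is lossless.
    assert encoded.decode("utf-8") == text
    widest = max(len(ch.encode("utf-8")) for ch in text)
    print(f"{label}: {len(text)} characters, {len(encoded)} bytes, "
          f"up to {widest} bytes per character")

# UCS-2 has exactly one 16-bit unit per character, so it cannot store
# U+20BB7 at all; UTF-16 needs a surrogate pair (4 bytes) for it.
assert len("\U00020BB7".encode("utf-16-le")) == 4
```

Common CJK characters take 3 bytes in UTF-8 versus 2 in UCS-2. The trade-off is size, not correctness.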

Allowed characters in submit forms (including UTF-8)

Hi, suppose I allow my users to submit a form containing some text fields (I'm not talking about passwords). My users occasionally use non-ASCII characters (Russian, Chinese, etc.), so I use a UTF-8 charset in my database. The question is: should I really allow all possible UTF-8 characters? I had a look at the ASCII table...
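Rather than whitelisting characters from the ASCII table, one possible policy is to accept any script and filter by Unicode general category. Below is a minimal Python sketch. The choice of categories and the name clean_form_text are assumptions, not a standard. Format characters (Cf) are deliberately kept, because dropping them would break scripts such as Persian, which uses U+200C ZERO WIDTH NON-JOINER.

```python
import unicodedata

# Assumed policy: accept every script, keep tab/CR/LF, and drop control
# characters (Cc), surrogates (Cs), private-use characters (Co) and
# unassigned code points (Cn). Format characters (Cf) are kept on purpose.
ALLOWED_CONTROLS = {"\t", "\r", "\n"}
REJECTED_CATEGORIES = {"Cc", "Cs", "Co", "Cn"}


def clean_form_text(text: str) -> str:
    """Returns text with code points from the rejected categories removed."""
    return "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROLS
        or unicodedata.category(ch) not in REJECTED_CATEGORIES
    )


print(repr(clean_form_text("Привет\x00 世界\x07\n")))  # 'Привет 世界\n'
```

Which code points count as unassigned (Cn) depends on the Unicode version of the runtime. A newer client may therefore submit characters that an older server rejects.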

€ char is shown as ? in UTF8 Output

I have reworked a website; it is now valid XHTML and uses UTF-8. Everything is fine, except that any euro sign (€) stored in the database is displayed as a question mark. What would be the right way to fix this? Since the output is generated by Typo3, I can't change much about that. ...
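A euro sign often turns into ? when it passes through an ISO-8859-1 conversion somewhere between the database and the page. ISO-8859-1 has no euro sign, whereas Windows-1252 and ISO-8859-15 have one at different byte values. Whether that is what happens in this Typo3 setup is an assumption. The Python sketch below only shows how each encoding treats the character.

```python
text = "Price: 5 \u20ac"  # "Price: 5 €"

# ISO-8859-1 has no euro sign, so a lossy conversion substitutes "?".
print(text.encode("iso-8859-1", errors="replace"))  # b'Price: 5 ?'

# Windows-1252 has it at byte 0x80; ISO-8859-15 has it at 0xA4.
print(text.encode("cp1252"))       # b'Price: 5 \x80'
print(text.encode("iso-8859-15"))  # b'Price: 5 \xa4'

# In UTF-8 the euro sign is the three bytes E2 82 AC.
print(text.encode("utf-8"))        # b'Price: 5 \xe2\x82\xac'
```

Check which encoding the database column, the database connection, and the page each use. A question mark usually marks the step where one of them lacks the character.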

Java UTF-8 strange behaviour

Hello, I am trying to decode some UTF-8 strings in Java. These strings contain some combining Unicode characters, such as CC 88 (combining diaeresis). The character sequence seems OK according to http://www.fileformat.info/info/unicode/char/0308/index.htm, but the output after conversion to String is invalid. Any idea? byte[] utf8 = {...
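The bytes CC 88 are a valid UTF-8 encoding of U+0308, so decoding normally succeeds. The result is two code points (base letter plus combining mark), which a console or font may render as a separate mark or as ?. Whether this explains the asker's output is not clear from the excerpt. Below is a Python sketch of the decoding followed by NFC normalization. In Java, the analogous call would be java.text.Normalizer.normalize(s, Normalizer.Form.NFC).

```python
import unicodedata

# "a" followed by the UTF-8 bytes CC 88 (U+0308 COMBINING DIAERESIS).
raw = bytes([0x61, 0xCC, 0x88])

decoded = raw.decode("utf-8")
print([f"U+{ord(ch):04X}" for ch in decoded])  # ['U+0061', 'U+0308']

# NFC composes the base letter and the combining mark into U+00E4.
composed = unicodedata.normalize("NFC", decoded)
print(len(decoded), len(composed))  # 2 1
assert composed == "\u00e4"
```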

Problem encoding UTF-8 data from Rails app to MySQL

I'm having trouble getting UTF-8 data from a form saved correctly in MySQL. In particular, from my Ruby application I'm posting a form that includes the following text: Gerhard Tröster. In my terminal, I can see it being written to the database as: UPDATE `xxxx` SET `updated_at` = '2009-08-13 14:22:33', `description` = '<p><s...
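The excerpt is cut off before the stored value appears, so the exact symptom is unknown. A frequent cause in setups like this is that UTF-8 bytes are interpreted as Latin-1 somewhere along the way, for example through a latin1 connection or column charset. Below is a Python sketch that reproduces that failure mode. It illustrates the pattern and is not a diagnosis of this application.

```python
name = "Gerhard Tröster"

# Correct path: UTF-8 bytes stored and read back as UTF-8.
utf8_bytes = name.encode("utf-8")  # "ö" becomes the two bytes C3 B6
assert utf8_bytes.decode("utf-8") == name

# Mismatch: the same bytes interpreted as Latin-1 turn each byte of "ö"
# into a character of its own.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # Gerhard TrÃ¶ster

# Encoding that string to UTF-8 again produces double-encoded data.
print(mojibake.encode("utf-8"))  # b'Gerhard Tr\xc3\x83\xc2\xb6ster'
```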

Problem with libxml character encoding on win32

While parsing some HTML files with libxml, the function xmlParseFile() reports that the input contains non-UTF-8 characters. How can I change the library's default charset to ISO-8859-1? Is there any other way to solve this? PS: The entire development is based on libxml and works in most cases, so I can't switch to another library. ...

Converting UTF-8 to ISO-8859-1 in Java

I am reading an XML document (UTF-8) and ultimately displaying the content on a web page using ISO-8859-1. As expected, a few characters are not displayed correctly, such as “, – and ’ (they display as ?). Is it possible to convert these characters from UTF-8 to ISO-8859-1? Here is a snippet of code I have written to attempt...
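ISO-8859-1 has no code points for “, – or ’. They exist in Windows-1252 (bytes 0x80 to 0x9F) but not in ISO-8859-1, so they cannot be converted, only replaced. Java's String.getBytes("ISO-8859-1") replaces unmappable characters with ?, which matches the symptom. Below is a minimal Python sketch that maps the common ones to ASCII look-alikes before encoding. The mapping table and to_latin1 are hypothetical and would need extending for real data.

```python
# Hypothetical fallbacks for Word punctuation missing from ISO-8859-1.
PUNCTUATION_FALLBACKS = str.maketrans({
    "\u2018": "'",    # U+2018 left single quotation mark
    "\u2019": "'",    # U+2019 right single quotation mark / apostrophe
    "\u201c": '"',    # U+201C left double quotation mark
    "\u201d": '"',    # U+201D right double quotation mark
    "\u2013": "-",    # U+2013 en dash
    "\u2014": "-",    # U+2014 em dash
    "\u2026": "...",  # U+2026 horizontal ellipsis
})


def to_latin1(text: str) -> bytes:
    """Encodes text as ISO-8859-1 after replacing punctuation that
    ISO-8859-1 lacks; anything still unmappable becomes '?'."""
    return text.translate(PUNCTUATION_FALLBACKS).encode("iso-8859-1", errors="replace")


print(to_latin1("\u201cCaf\u00e9\u201d \u2013 it\u2019s open"))
```

If the page can be served as UTF-8 (or Windows-1252) instead, no replacement is needed.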

UTF-8 validation in PHP without using preg_match()

Hi, I need to validate some user input that is encoded in UTF-8. Many have recommended using the following code: preg_match('/\A( [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF...
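PHP's mbstring extension has a built-in check, mb_check_encoding($input, 'UTF-8'). The Python sketch below shows what such a check does without regular expressions. It walks the bytes and applies the same lead-byte and second-byte ranges as the regex in the question, which correspond to the well-formed UTF-8 byte sequences table in chapter 3 of the Unicode Standard. The name is_valid_utf8 is hypothetical. One difference from the regex: this sketch accepts all ASCII bytes 0x00 to 0x7F, while the regex also rejects control characters other than tab, LF and CR.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Checks well-formed UTF-8 byte by byte: no overlong forms, no UTF-16
    surrogates (U+D800..U+DFFF), nothing above U+10FFFF."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII
            i += 1
            continue
        # Allowed range of the second byte and total sequence length,
        # chosen by the lead byte.
        if 0xC2 <= b <= 0xDF:
            lo, hi, length = 0x80, 0xBF, 2
        elif b == 0xE0:
            lo, hi, length = 0xA0, 0xBF, 3  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or b in (0xEE, 0xEF):
            lo, hi, length = 0x80, 0xBF, 3
        elif b == 0xED:
            lo, hi, length = 0x80, 0x9F, 3  # excludes surrogates
        elif b == 0xF0:
            lo, hi, length = 0x90, 0xBF, 4  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            lo, hi, length = 0x80, 0xBF, 4
        elif b == 0xF4:
            lo, hi, length = 0x80, 0x8F, 4  # nothing above U+10FFFF
        else:
            return False  # 0x80-0xC1 or 0xF5-0xFF cannot start a sequence
        if i + length > n:
            return False  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining bytes must be ordinary continuation bytes.
        if any(not 0x80 <= data[j] <= 0xBF for j in range(i + 2, i + length)):
            return False
        i += length
    return True
```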

Unicode characters in MySQL returning different character code values in PHP and ASP

Hi there, I have a MySQL database that needs to be accessed by both PHP and ASP scripts. This works fine in most cases, but some "special" characters, e.g. double quotes and apostrophes, don't display correctly in the ASP scripts. For example, the MySQL database is from a Drupal installation and contains a table with a field containing the text...

UTF-8 problem with Swedish characters from the command line

I have a script that gets a string from the database, splits it into words and writes the words to the database. It works perfectly when I call the script via HTTP (using the Apache web server). It also works when I run it from a Windows command line. However, when I try to run it from the command line (shell) in Ubuntu, all Swedish characters ÅÄÖ are ...

Example invalid utf8 string?

I'm testing how some of my code handles bad data, and I need a few series of bytes that are invalid utf8. Can you post some, and ideally, an explanation of why they are bad/where you got them? Thanks! ...
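Here are some classic invalid byte sequences, each breaking a different rule of UTF-8 (RFC 3629). The Python snippet below lists them with a short reason for each and confirms that a strict decoder rejects every one.

```python
# Byte sequences that are not well-formed UTF-8, with the rule each breaks.
INVALID_UTF8 = {
    b"\x80": "lone continuation byte (10xxxxxx) with no lead byte",
    b"\xc3\x28": "lead byte C3 expects a continuation byte, but 0x28 is '('",
    b"\xe2\x82": "truncated: E2 starts a 3-byte sequence, only 2 bytes present",
    b"\xc0\xaf": "overlong encoding of '/' (U+002F); C0 and C1 never occur",
    b"\xe0\x80\xaf": "overlong 3-byte encoding of '/'",
    b"\xed\xa0\x80": "encodes U+D800, a UTF-16 surrogate, which UTF-8 forbids",
    b"\xf4\x90\x80\x80": "would encode U+110000, above the maximum U+10FFFF",
    b"\xfe": "bytes FE and FF never occur in UTF-8",
}

for raw, reason in INVALID_UTF8.items():
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        print(f"{raw!r:<22} invalid: {reason}")
    else:
        raise AssertionError(f"{raw!r} unexpectedly decoded")
```

Embedding these in otherwise valid text (for example b"abc\xc3\x28def") also tests whether your code recovers correctly after the bad bytes.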

Apache FOP: Displaying UTF-8 Characters in PDF (without embed?)

Hi, I'm trying to use FOP to export a PDF with UTF-8 characters, preferably without needing to embed the font. The following code: <fo:block font="10pt Helvetica" text-align="justify" space-after="10pt" space-before="8pt" keep-with-previous="auto" keep-together.within-page="auto"> <fo:block font-weight="bold" color="gray">Summary</fo...

UTF-8 decoding problem in PHP

I got a .vcf file with parts encoded as UTF-8: CATEGORIES;CHARSET=UTF-8:Straße & –dienste Now "–" should be a "-" and "Straße" should convert to "Straße". I tried utf8_decode() iconv() mb_convert_encoding() And have been playing with several output encoding options like header('content-type: text/html; charset=utf-8'); mb...
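The excerpt does not show the raw bytes. A common source of garbled output such as â€“ (for an en dash) is UTF-8 bytes being displayed as Windows-1252. The real fix is to read the file as UTF-8 and output UTF-8. If only the already garbled string is available, a single round can often be reversed. Below is a Python sketch under that assumption. The name fix_mojibake is hypothetical. Python's cp1252 codec has no mapping for bytes 81, 8D, 8F, 90 and 9D, so characters whose UTF-8 form contains them (for example Á) cannot be recovered this way.

```python
def fix_mojibake(text: str) -> str:
    """Reverses one round of UTF-8 bytes having been decoded as Windows-1252.
    Returns the input unchanged if it does not fit that pattern."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


garbled = "Straße".encode("utf-8").decode("cp1252")
print(garbled)                # StraÃŸe
print(fix_mojibake(garbled))  # Straße
```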

What's a good heuristic to see if a set of bytes are encoded as UTF-8 in Java?

I have a byte stream that may be UTF-8 data or it may be a binary image. I should be able to make an educated guess about which one it is by inspecting the first 100 bytes or so. However, I haven't figured out exactly how to do this in Java. I've tried doing things like the following: new String( bytes, "UTF-8").substring(0,100).matc...
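One workable heuristic has three parts. Reject known image signatures. Treat NUL bytes as binary. Otherwise require the sample to decode as strict UTF-8, allowing for a multi-byte character cut off at the end of the 100-byte sample. In Java, a CharsetDecoder from StandardCharsets.UTF_8.newDecoder() reports malformed input by default. By contrast, new String(bytes, "UTF-8") silently substitutes U+FFFD. Below is a Python sketch of the idea. The signature list covers only a few formats, and looks_like_utf8_text is a hypothetical name.

```python
import codecs

# Magic numbers of a few common image formats (not an exhaustive list).
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",
    b"GIF89a",
)


def looks_like_utf8_text(head: bytes) -> bool:
    """Guesses from the first ~100 bytes whether a stream is UTF-8 text."""
    if head.startswith(IMAGE_SIGNATURES) or b"\x00" in head:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a character cut off at the end of the sample is
        # buffered instead of being reported as an error.
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Pure ASCII binary data would still pass this test, so a check on the proportion of control characters could be added if that case matters.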

English site on Japanese language Operating System

Does a browser-based application that shows and captures data only in English need a UTF-8 database? Is there any problem if the site is accessed from a Japanese-language operating system? If the user types only in English, do we need to take any extra care? If the user types in Japanese, how can the system detect this and throw ...
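Users can type anything, whatever the operating system's language. Detection therefore has to happen on the decoded text, which is easiest when the whole chain is UTF-8. If "English only" really means plain ASCII, Python's str.isascii() (3.7+) is the simplest check, though it also rejects words such as "café". Below is a minimal Python sketch of a narrower check for Japanese scripts. It judges characters by their Unicode names. The helper name contains_japanese is hypothetical. The check also matches Chinese ideographs and misses half-width katakana.

```python
import unicodedata

# Prefixes of Unicode character names for Hiragana, Katakana and CJK
# ideographs (the last two are shared with Chinese and Korean Hanja).
JAPANESE_NAME_PREFIXES = (
    "HIRAGANA",
    "KATAKANA",
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
)


def contains_japanese(text: str) -> bool:
    """Heuristic: True if any character's Unicode name starts with one of
    the prefixes above."""
    return any(unicodedata.name(ch, "").startswith(JAPANESE_NAME_PREFIXES)
               for ch in text)


print(contains_japanese("Hello"), "Hello".isascii())
print(contains_japanese("こんにちは"), "こんにちは".isascii())
```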