questions about utf-8 | ansaurus

utf-8

Working with foreign symbols in python

I'm parsing a JSON feed in Python and it contains this character, causing it not to validate. Is there a way to handle these symbols? Can they be converted, or is there a tidy way to remove them? I don't even know what this symbol is called or what causes it, otherwise I would research it myself. EDIT: Stack Overflow is strippin...

Python Unicode UnicodeEncodeError

Hi, I am having issues trying to convert a UTF-8 string to unicode. I get the error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 73-75: ordinal not in range(128). I tried wrapping this in a try/except block, but then Google was giving me a one-line system administrator error. Can someone sugges...

Do I need to make sure output data is valid UTF-8?

Hi, I have a website that declares its output to be UTF-8, but I never make sure that it is. Should I use a regular expression or the iconv library to convert UTF-8 to UTF-8 (leaving invalid sequences)? Is it a security issue if I do not? ...
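A minimal Python sketch of the check this question asks about. The question itself concerns PHP and iconv, so this only illustrates the idea: strict decoding detects invalid sequences, and decoding with replacement scrubs them. The function names `is_valid_utf8` and `scrub_utf8` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


def scrub_utf8(data: bytes) -> bytes:
    """Replace each invalid sequence with U+FFFD and re-encode as UTF-8."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


good = "Gemüse".encode("utf-8")       # valid UTF-8 bytes
bad = "Gemüse".encode("iso-8859-1")   # the lone 0xFC byte is not valid UTF-8

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
print(scrub_utf8(bad))
```

Scrubbing loses the original character, so it suits output sanitising rather than repair of data whose real encoding is known.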

UTF-8 Problem, no Idea

Hi, I have a strange problem with some documents on my webpage. My data is stored in a MySQL database, UTF-8 encoded. If I read the values, my webpage displays Rezept : Gem�se mal anders (Gem�selaibchen). I need ü / ü! The content in the database is "Gemüse ... ". The raw data in my error_log looks like this: [title] => Rezept...

Problems with XML encoding in perl xml Lib

I am replacing special characters using Perl, and while doing this I got an error. I am just trying to merge two XML files using XML::Lib. parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA3 0x32 0x33 0x6B �23 to c�27 . What is the issue and how do I resolve it? I thought before going to XML Parser , I...

UTF8 Filenames in PHP and Different Unicode Encodings

I have a file containing Unicode characters on a server running linux. If I SSH into the server and use tab-completion to navigate to the file/folder containing unicode characters I have no problem accessing the file/folder. The problem arises when I try accessing the file via PHP (the function I was accessing the file system from was st...

How can I convert a complex binary Perl regular expression to C# or PowerShell?

Hello, This Perl binary regex found at http://www.w3.org/International/questions/qa-forms-utf-8.en.php matches UTF-8 documents without the UTF-8 BOM header: $field =~ m/\A( [\x09\x0A\x0D\x20-\x7E] # ASCII | [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte | \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs ...

Unicode problem Django-Python-URLLIB-MySQL

I am fetching a webpage (http://autoweek.com) and trying to process it, but I am getting an encoding error. Autoweek declares "iso-8859-1" encoding and has the word "Nürburgring" (u with umlaut). I do: # -*- encoding: utf-8 -*- import urllib webpage = urllib.urlopen(feed.crawl_url).read() webpage.decode("utf-8") It gives me the following err...
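A minimal sketch of the mismatch described here, assuming Python 3 and sample bytes rather than the real page: bytes produced as ISO-8859-1 fail a strict UTF-8 decode, but decode cleanly with the encoding the page declares.

```python
# Sample bytes as an ISO-8859-1 server would send them (not the real page).
raw = "Nürburgring".encode("iso-8859-1")

# Decoding with the wrong codec fails: 0xFC is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the declared encoding gives the intended text,
# which can then be re-encoded as UTF-8 for storage.
text = raw.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(utf8_ok)
print(text)
```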

How to safely parse multibyte feeds in Ruby/Rails?

(Sorry if this is a newb question...I've done quite a bit of research, honestly...) I'm writing some Ruby on Rails code to parse RSS/ATOM feeds. My code is throwing up on a pesky '£' symbol. I've been trying the approach of normalizing the description and title fields of the feeds before doing anything else: descr = self.description.mb_ch...

How to exclude U+2028 from line separators in Python when reading file?

I have a file in UTF-8, where some lines contain the U+2028 Line Separator character (http://www.fileformat.info/info/unicode/char/2028/index.htm). I don't want it to be treated as a line break when I read lines from the file. Is there a way to exclude it from separators when I iterate over the file or use readlines()? (Besides reading t...
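A minimal sketch of the behaviour in question, assuming Python 3 and text that is already decoded: `str.splitlines()` treats U+2028 as a line boundary, while splitting on `"\n"` alone does not. The helper name `lines_on_lf_only` is hypothetical.

```python
def lines_on_lf_only(text):
    """Split text on "\\n" only, leaving U+2028 inside the line (hypothetical helper)."""
    pieces = text.split("\n")
    # A trailing newline produces one empty final piece; drop it.
    if pieces and pieces[-1] == "":
        pieces.pop()
    return pieces


sample = "first\u2028still first\nsecond\n"

print(sample.splitlines())       # breaks at U+2028 as well as at "\n"
print(lines_on_lf_only(sample))  # breaks at "\n" only
```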

Converting Unicode code points to UTF-8

Currently I have something like this \u4eac\u90fd and I want to convert it to UTF-8 so I can insert it into a database. ...
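A minimal sketch, assuming Python 3 and assuming the input is literal backslash-u text containing only ASCII characters (the `unicode_escape` codec is not suitable for input that already holds non-ASCII text):

```python
# Literal backslash-u escapes, as they might arrive in a text field.
escaped = r"\u4eac\u90fd"

# Interpret the escapes to get the actual characters.
text = escaped.encode("ascii").decode("unicode_escape")

# Encode those characters as UTF-8 bytes, e.g. for a database insert.
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```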

Listings in Latex with UTF-8 (or at least german umlauts)

Trying to include a source file in my LaTeX document using the listings package, I ran into problems with German umlauts inside the comments in the code. Using \lstset{ extendedchars=\true, inputencoding=utf8x }, umlauts in the source files (encoded in UTF-8 without BOM) are processed, but they are somehow moved to the beginning of the...

What are the limitations of primitive character types in D?

I am currently exploring the specification of the Digital Mars D language, and am having a little trouble understanding the complete nature of the primitive character types. The book Learn to Tango With D is similarly vague on the capabilities and limitations of the language in this area. The types are given on the website as: char; ...

Cannot get Servlet to process request content as UTF-8

I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8. Here's what I've set up for configurati...

PHP UTF-8 encoding problem of U+009A

I have problems displaying the Unicode character of U+009A. It should look like "š", but instead looks like a rectangular block with the numbers 009A inside. Converting it to the entity "" displays the character correctly, but I don't want to store entities in the database. The encoding of the webpage is in UTF-8. The character...

Alternative XML parser for ElementTree to ease UTF-8 woes?

I am parsing some XML with the elementtree.parse() function. It works, except for some utf-8 characters(single byte character above 128). I see that the default parser is XMLTreeBuilder which is based on expat. Is there an alternative parser that I can use that may be less strict and allow utf-8 characters? This is the error I'm gett...

Encode String with UTF-8 in GWT

Is there a way to encode a String with UTF-8 in GWT? In other words, is there a GWT-compatible equivalent to java.net.URLEncoder.encode(toEncode, "UTF-8")? ...

Emacs, xterm, mousepad, C, Unicode and UTF-8: Trying to make sense of it all

Disclaimer: My apologies for all the text below (for a single simple question), but I sincerely think that every bit of information is relevant to the question. I'd be happy to learn otherwise. I can only hope that, if successful, the question(s) and the answers may help others in Unicode madness. Here goes. I have read all the usually ...

Multipart PHP mail - accentuation problems! [SOLVED]

I'm trying to send a multipart/alternative MIME e-mail via a PHP script ... everything works fine, but I have some problems with the encoding! The accented characters in the e-mail body are displayed wrongly in the mail client! How can I encode the body to solve this problem? ... I've tried to use .. utf8_encode($body) without good results! I...

How to force XPath to use UTF8?

I have an XHTML document being passed to a PHP app via Greasemonkey AJAX. The PHP app uses UTF8. If I output the POST content straight back to a textarea in the AJAX receiving div, everything is still properly encoded in UTF8. When I try to parse using XPath $dom = new DOMDocument(); $dom->loadHTML($raw2); $xpath = new DOMXPath($dom); ...
