utf-8

How can I substitute Unicode characters with ASCII in Perl?

I can do it in vim like so: :%s/\%u2013/-/g How do I do the equivalent in Perl? I thought this would do it but it doesn't seem to be working: perl -i -pe 's/\x{2013}/-/g' my.dat ...

Change File Encoding to utf-8 via vim in a script

Hi, I just got knocked down after our server was updated from Debian 4 to 5. We switched to a UTF-8 environment and now we have problems getting text displayed correctly in the browser, because all files are in non-UTF-8 encodings like ISO-8859-1, ASCII, etc. I tried many different scripts. The first one I tried was "iconv". That o...

Is it possible to reliably auto-decode user files to Unicode? [C#]

I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd be surprised if any of my users knew their files even were encoded, I have very little hop...

Encoding Unicode URLs in UTF-8 XHTML documents

I would like to include the URI http://beispiel.de/schnäppchen into a link in a XHTML document, which is encoded in UTF-8. Should I percent-encode the URL and write <a href="http://beispiel.de/schn%C3%A4ppchen">foobar</a> ? "ä" is a legal character in UTF-8 and therefore should be legal in XML/XHTML, no? ...
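
As a small illustration of the escaped form mentioned in the question (not an answer from the thread): the sequence %C3%A4 is simply the two UTF-8 bytes of "ä", percent-escaped. A minimal Python sketch using the standard library's `urllib.parse.quote`; the variable names are made up for this example, and it takes no position on whether the link should be escaped.

```python
from urllib.parse import quote, unquote

path = "/schnäppchen"

# quote() encodes non-ASCII characters as UTF-8 and percent-escapes each byte;
# "/" is left alone because it is in the default set of safe characters.
encoded = quote(path)

print("http://beispiel.de" + encoded)
print(unquote(encoded) == path)
```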

UTF-8, CString and CFile? (C++, MFC)

Hello! I'm currently working on an MFC program that specifically has to work with UTF-8. At some point, I have to write UTF-8 data into a file; to do that, I'm using CFiles and CStrings. When I write UTF-8 data (Russian characters, to be more precise) into a file, the output looks like Ðàñïå÷àòàíî: Ñèñòåìà Ïðîèçâîäñòâî and et...

Are there delimiter bytes for UTF8 characters?

If I have a byte array that contains UTF8 content, how would I go about parsing it? Are there delimiter bytes that I can split off to get each character? ...
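
For context on the question above (a sketch, not an answer from the thread): UTF-8 has no delimiter bytes, but every continuation byte matches the bit pattern 10xxxxxx, so any byte that does not match that pattern begins a new character. The Python example below assumes the input is valid UTF-8; the function name `split_utf8_chars` is hypothetical.

```python
def split_utf8_chars(data: bytes) -> list[bytes]:
    """Split valid UTF-8 bytes into one bytes object per encoded character."""
    chars: list[bytes] = []
    for byte in data:
        # Continuation bytes look like 0b10xxxxxx; anything else starts a character.
        if byte & 0b1100_0000 == 0b1000_0000 and chars:
            chars[-1] += bytes([byte])
        else:
            chars.append(bytes([byte]))
    return chars


sample = "aé–".encode("utf-8")  # 1-byte, 2-byte, and 3-byte characters
pieces = split_utf8_chars(sample)
print([p.decode("utf-8") for p in pieces])
```

In practice, decoding the whole array with a UTF-8 decoder is usually simpler than splitting by hand; the loop only shows where the boundaries fall.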

Is MySQL converting UTF-8 characters in my query (not the results), stripping accents?

I've some records in a DB where one of the VARCHAR fields may contain accented letters. If I do the following query using the CLI MySQL client I get 1 row returned, which is correct: SELECT site_id, site_name FROM tbl_site WHERE site_name LIKE '%ém%' However, using some PHP (PDO) to do the same query returns all the rows that contain ...

PostgreSQL: Log query only on error

I'm getting the error message: "Invalid byte sequence for encoding "UTF8": 0x9f". OK, now I know somewhere my PHP app is trying to query using that 0x9f character. But I have no idea WHERE. I checked postgresql.conf but I didn't find anything like "log_on_error". There's only the log_statement parameter which causes postgres to log all s...

Appears at the beginning of my utf-8 text file when viewed as ANSI

I have a text file and it uses utf-8, but when the users view it in ANSI unknown characters appear at the very beginning. I am using C#. Thanks. ...
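
One common cause of this symptom is a UTF-8 byte order mark (BOM) at the start of the file; whether that applies to the asker's file is an assumption. The question is about C#, so the Python sketch below only illustrates the byte-level effect: the three BOM bytes, read as Windows-1252 (one common meaning of "ANSI"), show up as three stray characters.

```python
import codecs

bom = codecs.BOM_UTF8  # the bytes EF BB BF
text = "hello"
with_bom = bom + text.encode("utf-8")

# Decoding as Windows-1252 exposes the BOM as visible characters.
seen_as_ansi = with_bom.decode("cp1252")

# The "utf-8-sig" codec strips a leading BOM if one is present.
seen_as_utf8 = with_bom.decode("utf-8-sig")

print(repr(seen_as_ansi), repr(seen_as_utf8))
```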

How do I save a file as UTF-8 from Perl?

I'm trying to create/save HTML files in Perl in UTF-8, but nothing I have done so far works. A previous answer here on SO said to use binmode, so I tried that. Here is my code: open (OUT, ">$sectionfilename"); binmode(OUT, ":utf8"); print OUT $section; close OUT; When I open these files in a text editor like Notepad they are still in ...

How can I know in JavaScript if a character is part of an alphabet (not just the English alphabet)?

I need to check whether a pressed key is a letter (in any language) in UTF-8 encoding. Is that possible in any way? ...

Parse XML with special characters (UTF-8)

I'm starting out with some XML that looks like this (simplified): <?xml version="1.0" encoding="UTF-8"?> <alldata> <data name="Forsetì" /> </alldata> </xml> But after I've parsed it with simplexml_load_string the special character (the i) becomes: ì which is obviously pretty mangled. Is there a way to prevent this from happening?...

ruby string encoding

So, I'm trying to do some screen scraping off of a certain site using nokogiri, but the site owners failed to specify the proper encoding of the page in a <meta> tag. The upshot of this is that I'm trying to deal with strings that think they're utf-8, but really aren't. (If you care, here are the files I was using to test this: main ...

Choosing a W3C valid DOCTYPE and charset combination?

I have a homepage with the following: <DOCTYPE html> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> My choice of the DOCTYPE "html" is based on a recommendation for html pages using jQuery. My choice of charset=utf-8 is based on a recommendation to make my pages readable on most browsers. But these choices may ...

Problem with greek url characters in IE

Hi all, I'm using the following script in my website in order to create pagination "next-previous" functionality. It's actually Dreamweaver code. The script uses the URL to get some values and then re-creates it. The resulting URL in IE7 and IE8 contains non-readable characters, and in the end the page does not work properly. ...

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Hi, I am trying to store names in an Oracle database and fetch them back using PHP and oci8. However, if I insert the é directly into the Oracle database and use oci8 to fetch it back, I just receive an e. Do I have to encode all special characters (including é) into html entities (ie: &eacute;) before inserting into database ... or am ...

Dealing with UTF-8 numbers in Python

Hi, I have read many similar questions, apologies if this is considered a duplicate. Suppose I am reading a file containing 3 comma separated numbers. The file was saved with an unknown encoding, so far I am dealing with ANSI and UTF-8. If the file was in UTF-8 and it had 1 row with values 115,113,12 then: with open(file) as f: ...

UTF-8 to ISO-8859-1 mapping / lossless conversion libraries in Java

I need to perform a conversion of characters from UTF-8 to ISO-8859-1 in Java without losing for example all of the UTF-8 specific punctuation. Ideally would like these to be converted to equivalents in ISO (e.g. there are probably 5 different single quotes in UTF-8 and would like them all converted to ISO single quote character). Str...

Python: Convert Unicode to ASCII without errors

html = urllib.urlopen(link).read() html.encode("utf8","ignore") self.response.out.write(html) Traceback (most recent call last): File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__ ha...

Can't find out where my Ruby 1.9 string encoding is getting messed up.

Somewhere along the line from the DB to the application, this: sauté is getting turned into this: sauté I'm using Ramaze + Rack + MySQL. I've got a force_encoding plugin set up, so the encoding on the string is UTF-8. If I view the record in the database shell, it looks fine. The default charset on the table is utf8, and the fie...