questions about utf-8

How can I substitute Unicode characters with ASCII in Perl?

I can do it in vim like so: :%s/\%u2013/-/g How do I do the equivalent in Perl? I thought this would do it but it doesn't seem to be working: perl -i -pe 's/\x{2013}/-/g' my.dat ...

perl

vim

unicode

utf-8

Change File Encoding to utf-8 via vim in a script

Hi, I just got knocked down after our server was updated from Debian 4 to 5. We switched to a UTF-8 environment and now we have problems getting the text displayed correctly in the browser, because all files are in non-UTF-8 encodings like ISO-8859-1, ASCII, etc. I tried many different scripts. The first one I tried was "iconv". That o...

Is it possible to reliably auto-decode user files to Unicode? [C#]

I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd be surprised if any of my users knew their files even were encoded, I have very little hop...

Encoding Unicode URLs in UTF-8 XHTML documents

I would like to include the URI http://beispiel.de/schnäppchen in a link in an XHTML document, which is encoded in UTF-8. Should I percent-encode the URL and write <a href="http://beispiel.de/schn%C3%A4ppchen">foobar</a> ? "ä" is a legal character in UTF-8 and therefore should be legal in XML/XHTML, no? ...

url

xhtml

utf-8
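As a small illustration for the URL question above, here is a Python sketch of how the percent-encoded form relates to the UTF-8 bytes of "ä". It only shows the encoding step and does not settle whether the raw character is acceptable in an XHTML attribute. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Path segment from the question, containing a non-ASCII character.
path = "schnäppchen"

# quote() encodes the string as UTF-8 by default, then percent-encodes
# every byte that is not an unreserved ASCII character.
encoded = quote(path)

url = "http://beispiel.de/" + encoded
print(url)

# The encoding is reversible: decoding gives back the original text.
print(unquote(encoded))
```

The "ä" becomes %C3%A4 because its UTF-8 encoding is the two bytes 0xC3 0xA4.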

UTF-8, CString and CFile? (C++, MFC)

Hello! I'm currently working on an MFC program that specifically has to work with UTF-8. At some point, I have to write UTF-8 data into a file; to do that, I'm using CFiles and CStrings. When I get to write UTF-8 (Russian characters, to be more precise) data into a file, the output looks like Ðàñïå÷àòàíî: Ñèñòåìà Ïðîèçâîäñòâî and et...

Are there delimiter bytes for UTF8 characters?

If I have a byte array that contains UTF8 content, how would I go about parsing it? Are there delimiter bytes that I can split off to get each character? ...

c++

unicode

utf-8
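For the delimiter question above: UTF-8 has no separator bytes, but continuation bytes always have the bit pattern 10xxxxxx, so any byte that does not match that pattern starts a new character. A minimal Python sketch of splitting on that rule, assuming the input is already valid UTF-8 (split_utf8 is a hypothetical helper name, not a library function):

```python
from typing import List


def split_utf8(data: bytes) -> List[bytes]:
    """Split valid UTF-8 bytes into one bytes object per code point."""
    chars: List[bytearray] = []
    for byte in data:
        # Continuation bytes look like 10xxxxxx; anything else is a lead byte.
        is_continuation = (byte & 0xC0) == 0x80
        if not is_continuation:
            chars.append(bytearray())
        elif not chars:
            raise ValueError("data starts with a continuation byte")
        chars[-1].append(byte)
    return [bytes(c) for c in chars]


pieces = split_utf8("aé€".encode("utf-8"))
print(pieces)
print([p.decode("utf-8") for p in pieces])
```

This splits by code point, not by user-perceived character; combining sequences would still come out as several pieces.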

Is MySQL converting UTF-8 characters in my query (not the results), stripping accents?

I've some records in a DB where one of the VARCHAR fields may contain accented letters. If I do the following query using the CLI MySQL client I get 1 row returned, which is correct: SELECT site_id, site_name FROM tbl_site WHERE site_name LIKE '%ém%' However, using some PHP (PDO) to do the same query returns all the rows that contain ...

PostgreSQL: Log query only on error

I'm getting the error message: "Invalid byte sequence for encoding "UTF8": 0x9f". OK, now I know that somewhere my PHP app is trying to query using that 0x9f character. But I have no idea WHERE. I checked postgresql.conf but I didn't find anything like "log_on_error". There's only the log_statement parameter, which causes postgres to log all s...

ï»¿ appears at the beginning of my UTF-8 text file when viewed as ANSI

I have a text file and it uses utf-8, but when the users view it in ANSI unknown characters appear at the very beginning. I am using C#. Thanks. ...

c#

utf-8

ansi
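The three characters in the question above match a UTF-8 byte order mark (bytes EF BB BF) being read as a single-byte code page. A short Python sketch of that effect, assuming "ANSI" here means Windows-1252 (the question is about C#; Python is used only to illustrate the bytes):

```python
import codecs

text = "hello"

# "utf-8-sig" writes the byte order mark before the encoded text.
with_bom = text.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))

# Viewed as Windows-1252, the three BOM bytes show up as three characters.
as_ansi = with_bom.decode("cp1252")
print(as_ansi)

# Plain "utf-8" writes no BOM, so nothing extra appears.
without_bom = text.encode("utf-8")
print(without_bom.decode("cp1252"))
```

Writing the file without a BOM avoids the stray characters, at the cost of editors having to guess the encoding.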

How do I save a file as UTF-8 from Perl?

I'm trying to create/save HTML files in Perl in UTF-8, but nothing I have done so far works. A previous answer here on SO said to use binmode, so I tried that. Here is my code: open (OUT, ">$sectionfilename"); binmode(OUT, ":utf8"); print OUT $section; close OUT; When I open these files in a text editor like Notepad they are still in ...

perl

utf-8

How can I know in JavaScript if a character is part of an alphabet (not just the English alphabet)?

I need to check whether a pressed key is a letter of an alphabet (for all languages) in UTF-8 encoding. Is that possible in any way? ...

javascript

utf-8

alphabet

Parse XML with special characters (UTF-8)

I'm starting out with some XML that looks like this (simplified): <?xml version="1.0" encoding="UTF-8"?> <alldata> <data name="Forsetì" /> </alldata> </xml> But after I've parsed it with simplexml_load_string the special character (the i) becomes: Ã¬ which is obviously pretty mangled. Is there a way to prevent this from happening?...

Ruby string encoding

So, I'm trying to do some screen scraping off of a certain site using nokogiri, but the site owners failed to specify the proper encoding of the page in a <meta> tag. The upshot of this is that I'm trying to deal with strings that think they're utf-8, but really aren't. (If you care, here are the files I was using to test this: main ...

Choosing a W3C valid DOCTYPE and charset combination?

I have a homepage with the following: <DOCTYPE html> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> My choice of the DOCTYPE "html" is based on a recommendation for html pages using jQuery. My choice of charset=utf-8 is based on a recommendation to make my pages readable on most browsers. But these choices may ...

jquery

utf-8

doctype

Problem with greek url characters in IE

Hi all, I'm using the following script in my website in order to create pagination "next-previous" functionality. It's actually Dreamweaver code. The script uses the URL to get some values and then re-creates it. The resulting URL in IE7 and IE8 contains non-readable characters and in the end the page does not work properly. ...

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Hi, I am trying to store names into an Oracle database and fetch them back using PHP and oci8. However, if I insert the é directly into the Oracle database and use oci8 to fetch it back, I just receive an e. Do I have to encode all special characters (including é) into HTML entities (i.e. &eacute;) before inserting into the database ... or am ...

Dealing with UTF-8 numbers in Python

Hi, I have read many similar questions, apologies if this is considered a duplicate. Suppose I am reading a file containing 3 comma separated numbers. The file was saved with an unknown encoding, so far I am dealing with ANSI and UTF-8. If the file was in UTF-8 and it had 1 row with values 115,113,12 then: with open(file) as f: ...

UTF-8 to ISO-8859-1 mapping / lossless conversion libraries in Java

I need to perform a conversion of characters from UTF-8 to ISO-8859-1 in Java without losing, for example, all of the UTF-8 specific punctuation. Ideally, I would like these to be converted to equivalents in ISO (e.g. there are probably 5 different single quotes in UTF-8 and I would like them all converted to the ISO single quote character). Str...
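The idea in the excerpt above (map typographic punctuation to a plain equivalent before encoding to ISO-8859-1) can be sketched as a translation table. This is a Python illustration rather than Java, the table is a small hand-picked assumption and not a complete mapping, and to_latin1 is a hypothetical helper name.

```python
# Hand-picked examples only; a real table would cover many more characters.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1(text: str) -> bytes:
    """Replace mapped punctuation, then encode as ISO-8859-1.

    Characters that are neither mapped nor representable become '?'.
    """
    return text.translate(PUNCTUATION_MAP).encode("iso-8859-1", errors="replace")


sample = "\u201cIt\u2019s\u201d \u2013 caf\u00e9"
print(to_latin1(sample))
```

Characters that ISO-8859-1 already contains, such as é, pass through unchanged; only the mapped punctuation is rewritten.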

Python: Convert Unicode to ASCII without errors

html = urllib.urlopen(link).read() html.encode("utf8","ignore") self.response.out.write(html) Traceback (most recent call last): File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__ ha...

Can't find out where my Ruby 1.9 string encoding is getting messed up.

Somewhere along the line from the DB to the application, this: sauté is getting turned into this: sautÃ© I'm using Ramaze + Rack + MySQL. I've got a force_encoding plugin set up, so the encoding on the string is UTF-8. If I view the record in the database shell, it looks fine. The default charset on the table is utf8, and the fie...
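The garbling shown in the excerpt above is what appears when UTF-8 bytes are decoded as a single-byte Latin encoding. A minimal Python sketch reproducing it, assuming the wrong decoder is ISO-8859-1 (where exactly this happens in the Ramaze, Rack, and MySQL stack is not something the sketch can tell):

```python
original = "sauté"

# Encode correctly as UTF-8, then decode with the wrong charset.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# If no bytes were lost, reversing the two steps repairs the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The same mechanism turns ì into Ã¬, which matches the SimpleXML excerpt earlier in this list.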