Hello everyone,
Suppose I have a byte stream (array), and I want to write code (using .NET C#) to validate whether it is a valid UTF-8 byte sequence. I want to write the code from scratch because I need to report the exact location of any invalid byte sequences, and possibly remove the invalid bytes, not just get a yes or no answer...
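A minimal sketch of the from-scratch approach, written in Python rather than C# to keep it short (the logic ports directly to a loop over a `byte[]`). It walks the buffer once, works out the expected sequence length from each lead byte, and checks the continuation bytes against the byte ranges in RFC 3629, so overlong forms, UTF-16 surrogates and values above U+10FFFF are rejected. The function name `find_invalid_utf8` is hypothetical. On a bad lead byte it records that offset and retries at the next byte, so every byte that is not part of a well-formed sequence is reported, and the same pass builds a copy with those bytes removed.

```python
from typing import List, Tuple

def find_invalid_utf8(data: bytes) -> Tuple[List[int], bytes]:
    """Return (offsets of bytes that are not part of a well-formed UTF-8
    sequence, a copy of `data` with exactly those bytes removed)."""
    bad: List[int] = []
    clean = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        lo, hi = 0x80, 0xBF              # allowed range for the 2nd byte (RFC 3629)
        if b <= 0x7F:
            length = 1                   # ASCII
        elif 0xC2 <= b <= 0xDF:
            length = 2
        elif b == 0xE0:
            length, lo = 3, 0xA0         # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or b in (0xEE, 0xEF):
            length = 3
        elif b == 0xED:
            length, hi = 3, 0x9F         # excludes UTF-16 surrogates D800-DFFF
        elif b == 0xF0:
            length, lo = 4, 0x90         # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length = 4
        elif b == 0xF4:
            length, hi = 4, 0x8F         # excludes code points above U+10FFFF
        else:
            length = 0                   # 80-C1 and F5-FF never start a sequence
        ok = length > 0 and i + length <= n
        if ok:
            for k in range(1, length):
                # Only the 2nd byte has a restricted range; later ones are 80-BF.
                low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
                if not low <= data[i + k] <= high:
                    ok = False
                    break
        if ok:
            clean += data[i:i + length]
            i += length
        else:
            bad.append(i)                # flag this byte, resynchronise at the next
            i += 1
    return bad, bytes(clean)

offsets, cleaned = find_invalid_utf8(b"caf\xc3\xa9 \xff ok")
print(offsets)                   # [6]
print(cleaned.decode("utf-8"))   # café  ok
```

Reporting individual byte offsets is one policy; a variant could report each maximal invalid run instead, or substitute U+FFFD rather than deleting.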
Hi
We have developed a PHP-MySQL application in two languages, English and Gujarati. Gujarati text contains characters that need Unicode (UTF-8) encoding to display properly.
The application runs perfectly on my Windows-based localhost and on my Linux-based testing server on the web.
But when I transfer the application to the clie...
Hello!
I need to convert strings from one encoding (UTF-8) to another. The problem is that the target encoding does not contain all the characters of the source encoding, and the libc iconv(3) function fails in that situation. What I want is to be able to perform the conversion anyway, with these problematic characters replaced in the output string w...
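For comparison, Python exposes this choice as the `errors` argument of `str.encode` and lets you register your own handler. The sketch assumes ISO-8859-1 as the target encoding and uses an arbitrary handler name, `"placeholder"`; the handler receives the failing span and returns the replacement text plus the position to resume from. For plain `"?"` substitution the built-in `errors="replace"` already does the same thing; a custom handler is where other substitutions would go.

```python
import codecs

def _question_mark_handler(err):
    """Encoding error handler: substitute '?' for each unencodable character."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # Return (replacement text, index in the input at which to resume).
    return ("?" * (err.end - err.start), err.end)

# "placeholder" is an arbitrary name; any name not already registered works.
codecs.register_error("placeholder", _question_mark_handler)

def convert(text: str, target: str = "iso-8859-1") -> bytes:
    """Encode text in `target`, replacing characters it cannot represent."""
    return text.encode(target, errors="placeholder")

print(convert("naïve ♥ café"))  # b'na\xefve ? caf\xe9'
```

With glibc's iconv itself, appending `//TRANSLIT` to the target encoding name (for example `"ISO-8859-1//TRANSLIT"`) is the usual way to ask for approximations instead of a failure.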
When using Grails 1.1 together with MySQL, the character sets of the auto-generated database tables seem to default to ISO-8859-1. I'd rather have everything stored as pure UTF-8. Is that possible?
From the auto-generated database definitions:
ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
Note the "latin1" part.
A work-around th...
Hello everyone,
I have a very large input file (about 120 MB), and I do not want to load it into memory all at once. I want to check whether the file is valid UTF-8. Is there a quick way to check this without reading the entire file into memory as a byte[]? Simple sample code would be appreciated.
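One way to do this is to feed the file to an incremental decoder in fixed-size chunks, so memory use is bounded by the chunk size rather than by the 120 MB file. The sketch is in Python rather than C#; `first_utf8_error` and the 1 MiB default chunk size are illustrative choices. The incremental decoder holds back an incomplete sequence at the end of one chunk and completes it with the next, so sequences split across chunk boundaries are handled; the `pending` count converts the decoder's error position back into an absolute file offset.

```python
import codecs
from typing import BinaryIO, Optional

def first_utf8_error(f: BinaryIO, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence in a binary
    stream, or None if the whole stream is valid. Reads one chunk at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()  # errors="strict" by default
    offset = 0  # total bytes read before the current chunk
    while True:
        chunk = f.read(chunk_size)
        final = not chunk                  # an empty read means end of stream
        # Bytes of an incomplete sequence carried over from the previous chunk;
        # the decoder prepends them, so error positions are relative to them.
        pending = len(decoder.getstate()[0])
        try:
            decoder.decode(chunk, final)   # the decoded text is discarded
        except UnicodeDecodeError as e:
            return offset - pending + e.start
        if final:
            return None
        offset += len(chunk)

# Usage with a large file on disk (the file name is a placeholder):
# with open("big_input.txt", "rb") as f:
#     print(first_utf8_error(f))
```

In .NET, a comparable building block should be a `Decoder` obtained from `new UTF8Encoding(false, true)`: the second argument makes it throw on invalid bytes, and a `Decoder` keeps an incomplete sequence between `GetChars` calls, so the same chunked loop applies.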
...
When I start Python from Mac OS's Terminal.app, Python recognises the encoding as UTF-8:
$ python3.0
Python 3.0.1 (r301:69556, May 18 2009, 16:44:01)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
This works the same fo...
I am trying to read a CSV file containing accented characters (French and/or Spanish only) with Python. Based on the Python 2.5 documentation for the csv reader (http://docs.python.org/library/csv.html), I came up with the following code to read the CSV file, since the csv reader supports only ASCII:
def unicode_csv_reader(unicode_csv...
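The recipe above is for Python 2.5. In Python 3 the `csv` module works directly on `str`, so no wrapper is needed: open the file in text mode with an explicit encoding and `newline=""`, as the csv documentation recommends. This sketch assumes the file is UTF-8 (pass `"utf-8-sig"` if it starts with a BOM, or whatever encoding it was actually saved in); `read_csv_rows` is a hypothetical name.

```python
import csv

def read_csv_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of rows (lists of str) in Python 3."""
    # open() does the decoding; newline="" lets the csv module handle line
    # endings itself, so newlines inside quoted fields are preserved.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))

# Usage (hypothetical file name):
# for row in read_csv_rows("accents.csv"):
#     print(row)
```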
I have a MySQL table with 120,000 rows stored in UTF-8. One field, the product name, contains text with many accented characters. I need to fill a second field with the same name converted to a URL-friendly (ASCII) form.
Since PHP doesn't directly handle UTF-8, I'm using:
$value = iconv ('UTF-8', 'ISO-8859-1', $value)...
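A different technique from the iconv call above, sketched in Python rather than PHP: decompose each accented letter with Unicode NFKD normalization (so "é" becomes "e" plus a combining accent), drop everything that is not ASCII, then collapse the remaining punctuation and spaces into hyphens. `url_slug` is a hypothetical name, and the lowercase-and-hyphen rules are assumptions about what "URL-friendly" means here.

```python
import re
import unicodedata

def url_slug(name: str) -> str:
    """Convert an accented product name to a lowercase ASCII slug."""
    # NFKD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT; encoding to ASCII
    # with errors="ignore" then drops the combining marks (and any other non-ASCII).
    ascii_text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace each run of characters other than a-z / 0-9 with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

print(url_slug("Crème Brûlée à l'Érable"))  # creme-brulee-a-l-erable
```

Letters with no decomposition, such as "ß" or "ø", are dropped rather than transliterated ("Straße" becomes "strae"), so a small explicit mapping table may be needed for those.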
I know of the non-standard %uxxxx scheme, but that doesn't seem like a wise choice, since the scheme has been rejected by the W3C.
Some interesting examples:
The heart character.
If I type this into my browser:
http://www.google.com/search?q=♥
and then copy and paste it, I see this URL:
http://www.google.com/search?q=%E2%99%A5
which mak...
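The `%E2%99%A5` form is the heart's UTF-8 bytes (E2 99 A5), each percent-encoded. That is the approach RFC 3986 (section 2.5) recommends for textual data in new URI components, and the one used to map IRIs to URIs in RFC 3987, unlike the rejected `%uxxxx` scheme. A small Python illustration using only the standard library:

```python
from urllib.parse import quote, unquote

heart = "\u2665"                        # ♥ BLACK HEART SUIT
utf8_bytes = heart.encode("utf-8")      # b'\xe2\x99\xa5'
# Percent-encode each UTF-8 byte by hand...
manual = "".join(f"%{b:02X}" for b in utf8_bytes)
# ...which is what quote() does by default (its encoding defaults to UTF-8).
print(manual, quote(heart))             # %E2%99%A5 %E2%99%A5
print(unquote(manual) == heart)         # True
```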
I have a problem where I am storing a UTF-8 string in SQL Server as UCS-2. When I pull it out and display it on a page with the content type set to UTF-8, it works fine. But I have a third-party JavaScript component that, when I pass it the string from the database, renders it as UCS-2, or at least not as UTF-8.
Is there a way in ASP to convert this string to ...
It seems that flex doesn't support UTF-8 input. Whenever the scanner encounters a non-ASCII character, it stops scanning as if it had reached EOF.
Is there a way to force flex to eat my UTF-8 chars? I don't want it to actually match UTF-8 chars, just eat them when using the '.' pattern.
Any suggestions?
EDIT
The most simple solution would be:
...
Hello,
How can I save all files in a directory as UTF-8?
I need to change the default file encoding in IIS so that all foreign characters display correctly. The problem is that all the old files are saved in different (effectively random) encodings.
Is there a way to open each of those files in its current encoding and safely save it as UTF-8?
...
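Detecting an unknown encoding is always a guess, but a common pragmatic approach is to try a short list of likely encodings per file and re-save with the first one that decodes cleanly. The Python sketch below does that; the candidate list, the `*.html` pattern and the name `convert_dir_to_utf8` are assumptions to adapt, and the files should be backed up first because they are overwritten in place.

```python
from pathlib import Path

# Encodings to try, in order. This list is an assumption; put the encodings your
# old files are most likely to use first. "latin-1" accepts any byte sequence, so
# it always succeeds as a last resort and can silently pick the wrong characters.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")

def convert_dir_to_utf8(directory, pattern="*.html"):
    """Re-save each file matching `pattern` as UTF-8 without a BOM.

    Returns {file name: encoding it was decoded with}.
    """
    used = {}
    for path in sorted(Path(directory).glob(pattern)):
        raw = path.read_bytes()
        for enc in CANDIDATE_ENCODINGS:
            try:
                text = raw.decode(enc)
            except UnicodeDecodeError:
                continue                  # not this encoding; try the next one
            # write_bytes avoids newline translation, so line endings are kept.
            path.write_bytes(text.encode("utf-8"))
            used[path.name] = enc
            break
    return used
```

Because "utf-8-sig" is tried first, files that were already UTF-8 come through unchanged apart from losing a leading BOM, if they had one.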
I have an HTML form with several text fields.
When I submit non-English characters (Russian, in my case), the server receives an unreadable string (not question marks like "???", but strange characters).
I simplified my code to show it here:
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<%@ page contentType="text/...
I'm really confused by the codecs.open function. When I do:
file = codecs.open("temp", "w", "utf-8")
file.write(codecs.BOM_UTF8)
file.close()
It gives me the error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
If I do:
file = open("temp", "w")
file.write(codecs.BOM_UTF8)
file.cl...
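In Python 2, `codecs.BOM_UTF8` is a byte string (`'\xef\xbb\xbf'`), while the writer returned by `codecs.open` expects `unicode`; handed a byte string, it first decodes it with the default ASCII codec, which fails on 0xEF and produces exactly that error. Writing the BOM as the character `u'\ufeff'` instead (or using the `utf-8-sig` codec) avoids that round trip. A Python 3 sketch; `write_with_bom` is a hypothetical helper name:

```python
def write_with_bom(path, text):
    """Write `text` as UTF-8 preceded by a byte order mark (Python 3)."""
    with open(path, "w", encoding="utf-8") as f:
        # Text-mode files take str, not bytes, so write the BOM as the character
        # U+FEFF; the utf-8 codec turns it into the bytes EF BB BF.
        f.write("\ufeff")
        f.write(text)

# Shortcut: open(path, "w", encoding="utf-8-sig") emits the BOM automatically.
```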
I'm trying to make a Bison parser that handles UTF-8 characters. I don't want the parser to actually interpret the Unicode character values; I want it to parse the UTF-8 string as a sequence of bytes.
Right now, Bison generates the following code which is problematic:
if (yychar <= YYEOF)
{
yychar = yytoken = YYEOF;
...
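One likely cause, assuming the lexer returns each input byte as a plain `char` (signed with GCC on x86): every byte of a multibyte UTF-8 sequence is 0x80 or above, so it becomes negative, satisfies `yychar <= YYEOF` (YYEOF is 0 in Bison's C skeleton), and is treated as end of input. Returning `(unsigned char) c` from the lexer keeps such bytes positive (128-255). The Python below only simulates that arithmetic; `lexer_token` is a hypothetical helper.

```python
YYEOF = 0  # value of YYEOF in Bison's generated C parsers

def lexer_token(byte: int, char_is_signed: bool) -> int:
    """Simulate a lexer that returns an input byte through C's `char` type."""
    if char_is_signed and byte >= 0x80:
        return byte - 256   # two's-complement wrap: 0xC3 becomes -61
    return byte

for b in "é".encode("utf-8"):           # the two bytes 0xC3 0xA9
    s = lexer_token(b, char_is_signed=True)
    u = lexer_token(b, char_is_signed=False)
    print(f"{b:#04x}: signed {s} (<= YYEOF: {s <= YYEOF}), "
          f"unsigned {u} (<= YYEOF: {u <= YYEOF})")
```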
As soon as I use accented characters in my text, it stops working and reports this error:
! Undefined control sequence.
<argument> R\UTF
{00E9}seau Ethernet
l.88 \section{R\UTF{00E9}seau Ethernet}
?
To explain the output a bit, I am trying to compile \section{Réseau Ethernet} in that line.
I think it has to do with the enc...
I have an old MySQL database with its encoding set to UTF-8. I am using the ADO.NET Entity Framework to connect to it.
The strings that I retrieve from it contain strange characters where characters like ë are expected.
For example: "ë" comes back as "Ã«".
I thought I could fix this by converting from UTF-8 to UTF-16.
return Encoding.Unicode.GetString(...
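This kind of garbling is consistent with UTF-8 bytes being decoded a second time as Windows-1252 (or Latin-1) somewhere between MySQL and the application: "ë" is the two bytes C3 AB in UTF-8, and those bytes read as Windows-1252 are "Ã«". A UTF-8 to UTF-16 conversion of the already-decoded string cannot undo that; reversing the wrong decode can. The Python sketch below assumes cp1252 was the wrong codec and uses a hypothetical helper name; where possible, fixing the character set used by the database connection is preferable to repairing strings afterwards.

```python
def undo_double_decoding(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Repair text whose UTF-8 bytes were mistakenly decoded as `wrong_codec`."""
    raw = garbled.encode(wrong_codec)   # recover the original UTF-8 bytes
    return raw.decode("utf-8")          # decode them correctly this time

# How the garbling arises, and how it is reversed:
broken = "ë".encode("utf-8").decode("cp1252")
print(broken)                         # Ã«
print(undo_double_decoding(broken))   # ë
```

Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so characters whose UTF-8 bytes include one of those (for example "Á", C3 81) cannot be recovered this way.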
I am using Ubuntu 9.04.
I have installed the following package versions:
unixodbc and unixodbc-dev: 2.2.11-16build3
tdsodbc: 0.82-4
libsybdb5: 0.82-4
freetds-common and freetds-dev: 0.82-4
I have configured /etc/unixodbc.ini like this:
[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/odbc/libt...
Suppose you have a large document of around 7,000 words. I need to send all of the data to the server.
I can't use jQuery, Prototype, etc.; it has to be clean object-oriented JavaScript.
An example page would be the Russian JSON page.
I will strip all tags and HTML markup from the words.
My question is:
1. How can I collect/harvest all (UTF-8) words from do...
I have some content from feeds. In these feeds, UTF-8 characters are often encoded as character references, i.e. "å" is "&#xE5;". To avoid double-encoding these in my views (i.e. "&amp;#xE5;"), I want to convert them back to normal UTF-8 characters. How can I do this in Ruby?
I want:
"&#xE5;".convert_to_utf8 => "å"
...
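What is being described is decoding HTML character references. As an illustration in Python rather than Ruby (`convert_to_utf8` is the asker's hypothetical method name), there are two options: the standard library's `html.unescape` decodes every reference, numeric and named, including `&amp;`; the small regex-based `decode_numeric_refs` (a hypothetical helper) touches only numeric references such as `&#xE5;` and leaves named ones alone.

```python
import html
import re

# Matches decimal (&#229;) and hexadecimal (&#xE5;) numeric references only.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name,
    leaving named references such as &amp; untouched."""
    def to_char(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Leave references to values beyond the Unicode range as they are.
        return chr(code) if code <= 0x10FFFF else m.group(0)
    return _NUMERIC_REF.sub(to_char, text)

print(decode_numeric_refs("bl&#xE5; &#229; &amp;"))  # blå å &amp;
print(html.unescape("bl&#xE5; &aring; &amp;"))        # blå å &
```

Because `html.unescape` also turns `&lt;` and `&amp;` into `<` and `&`, its output has to be escaped again when it is written into a view; the numeric-only variant avoids changing those.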