I need to convert a bunch of files to utf-8 in Python, and I have trouble with the "converting the file" part.
I'd like to do the equivalent of:
iconv -t utf-8 "$file" > "converted/$file" # this is shell code
Thanks!
...
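A minimal Python sketch of that iconv one-liner follows. When `-f` is omitted, iconv assumes the current locale's encoding, so this sketch falls back to `locale.getpreferredencoding(False)` unless a source encoding is given. `convert_to_utf8` and its parameters are hypothetical names. It works on raw bytes so line endings pass through unchanged, as they do with iconv.

```python
import locale
from pathlib import Path


def convert_to_utf8(src, dst_dir="converted", src_encoding=None):
    """Re-encode one file as UTF-8, roughly like `iconv -t utf-8 file > converted/file`."""
    if src_encoding is None:
        # Assumption: like iconv without -f, use the locale's encoding.
        src_encoding = locale.getpreferredencoding(False)
    src = Path(src)
    # Output goes to dst_dir/<file name>; subdirectories are not mirrored.
    dst = Path(dst_dir) / src.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Decode and re-encode the raw bytes so line endings are left untouched.
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode("utf-8"))
    return dst
```

Call it once per file, for example `convert_to_utf8(name, src_encoding="cp1252")` inside a loop over your file list. Passing the real source encoding explicitly is safer than relying on the locale.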
I'm implementing a blog with tags, some of which contain French characters. My question is about how to deal with spaces and Unicode (UTF-8) characters in the URL.
Let's say I have a tag called ohlàlà! and the following code in my tag cloud:
<%= link_to h(tag.name.capitalize), { :controller => :blog, :action => :tag, :id => h(tag.name...
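The Rails helpers themselves are cut off above, so here is only a language-neutral illustration in Python of the usual approach: encode the tag as UTF-8, percent-escape every byte outside the unreserved set (spaces included), and decode the segment back on the way in. `tag_to_path_segment` and `path_segment_to_tag` are hypothetical helper names.

```python
from urllib.parse import quote, unquote


def tag_to_path_segment(tag: str) -> str:
    # UTF-8 encode, then percent-escape; safe="" also escapes "/" so a tag
    # cannot split the URL path.
    return quote(tag, safe="", encoding="utf-8")


def path_segment_to_tag(segment: str) -> str:
    # Reverse step: percent-decode and interpret the bytes as UTF-8.
    return unquote(segment, encoding="utf-8")


print(tag_to_path_segment("ohlàlà!"))  # ohl%C3%A0l%C3%A0%21
print(tag_to_path_segment("mon tag"))  # mon%20tag
```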
Recently, someone asked about an algorithm for reversing a string in place in C. Most of the proposed solutions had trouble with strings whose characters are not single bytes. So I was wondering what a good algorithm would be for dealing specifically with UTF-8 strings.
I came up with some code, which I'm posting as an answer, but I'd be glad ...
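One common approach is a two-pass trick. First, reverse the bytes inside each multi-byte sequence. Then reverse the whole buffer, which flips each sequence back into its correct byte order while reversing the order of the characters. Below is a sketch in Python over a `bytearray`, so it maps closely onto a C `char` buffer. It assumes well-formed UTF-8 (continuation bytes are not checked). It also reverses code points, not user-perceived characters, so a combining accent will end up before its base letter.

```python
def _utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def _reverse_range(buf: bytearray, lo: int, hi: int) -> None:
    """Reverse buf[lo:hi] in place by swapping bytes from both ends."""
    hi -= 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def reverse_utf8_in_place(buf: bytearray) -> None:
    # Pass 1: reverse the bytes inside each multi-byte sequence.
    i = 0
    while i < len(buf):
        n = _utf8_seq_len(buf[i])
        if i + n > len(buf):
            raise ValueError("truncated UTF-8 sequence")
        _reverse_range(buf, i, i + n)
        i += n
    # Pass 2: reverse the whole buffer; each sequence flips back to its
    # original byte order while the sequences themselves end up reversed.
    _reverse_range(buf, 0, len(buf))
```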
I know this is not a "real" programming question, but it relates to programming, so I am going to ask it anyway. I have a program to test that reads the byte order mark (BOM) of a file to see whether it is UTF-8 or UTF-16. My problem is that I cannot find a program/text editor that will let me set the byte order mark. Can anyb...
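Instead of hunting for an editor, you can generate the test files yourself. Here is a small Python sketch: it prepends the BOM constants from the standard `codecs` module to the encoded text, and includes a sniffer that mirrors what the program under test presumably does. `write_with_bom` and `sniff_bom` are hypothetical names. Only UTF-8 and the two UTF-16 byte orders are covered.

```python
import codecs
from pathlib import Path

# BOM byte sequences from the standard library's codecs module.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def write_with_bom(path, text, encoding):
    """Write text in the given encoding, preceded by that encoding's BOM."""
    # The explicit -le/-be codecs do not add a BOM themselves, so add it here.
    Path(path).write_bytes(BOMS[encoding] + text.encode(encoding))


def sniff_bom(path):
    """Return the encoding whose BOM starts the file, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(3)
    for encoding, bom in BOMS.items():
        if head.startswith(bom):
            return encoding
    return None
```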
One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. Does anyone know how I can do this with MySQL?
Thanks!
Ben
...
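On the MySQL side, the usual starting points are `SHOW VARIABLES LIKE 'character_set%'` and the character set of each table and column. One pitfall worth testing for: depending on the MySQL version, the charset named `utf8` (an alias for `utf8mb3`) stores at most 3 bytes per character. Characters outside the Basic Multilingual Plane (4 bytes in UTF-8) therefore need `utf8mb4`. Here is a small Python helper, with hypothetical names, for picking test strings that a 3-byte charset could not store.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text has a character needing 4 bytes in UTF-8 (above U+FFFF),
    which MySQL's 3-byte utf8/utf8mb3 charset cannot store."""
    return any(ord(ch) > 0xFFFF for ch in text)


def non_bmp_chars(text: str) -> list:
    """List the offending characters with their code points, for debugging."""
    return [f"U+{ord(ch):04X} {ch}" for ch in text if ord(ch) > 0xFFFF]
```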
I want to use a Perl script to detect malformed UTF-8 characters and replace them with a blank space while loading the data with SQL*Loader. How can I do this?
...
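The same cleanup could run as a pre-processing filter on the data file before SQL*Loader reads it. This sketch uses Python rather than Perl, swapped in so the logic can be shown and tested compactly. It registers a custom codec error handler that substitutes one space for each chunk the UTF-8 decoder rejects (often a single byte). The handler name `space_replace` is chosen for this sketch and is not built in. The whole input is assumed to fit in memory.

```python
import codecs


def _space_for_bad_bytes(err):
    """Codec error handler: replace each rejected chunk of bytes with one space."""
    if isinstance(err, UnicodeDecodeError):
        return (" ", err.end)
    raise err


# Hypothetical handler name, registered for use with decode(errors=...).
codecs.register_error("space_replace", _space_for_bad_bytes)


def clean_utf8(data: bytes) -> bytes:
    """Return data with every malformed UTF-8 sequence replaced by a space."""
    return data.decode("utf-8", errors="space_replace").encode("utf-8")
```

For a whole file: `open(out, "wb").write(clean_utf8(open(src, "rb").read()))`, then point SQL*Loader at the cleaned file.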
I am writing a small app that I need to test with UTF-8 characters of different byte lengths.
I can input Unicode test characters that are encoded in UTF-8 with 1, 2 and 3 bytes just fine by doing, for example:
string input = "pi = \u03A0";
But how do I get a Unicode character that is encoded with 4 bytes? I have tried:
str...
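Any code point from U+10000 upward takes 4 bytes in UTF-8. A four-digit `\u` escape cannot reach those code points, so you need an eight-digit `\U` escape or a conversion from the integer code point. C# also accepts `\U0001D11E`-style escapes, which it stores as a surrogate pair. Here is a quick Python check with one sample per length; `utf8_length` and `SAMPLES` are illustrative names.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per UTF-8 length.
SAMPLES = {
    1: "A",           # U+0041 LATIN CAPITAL LETTER A
    2: "\u03a0",      # U+03A0 GREEK CAPITAL LETTER PI
    3: "\u20ac",      # U+20AC EURO SIGN
    4: "\U0001d11e",  # U+1D11E MUSICAL SYMBOL G CLEF (same as chr(0x1D11E))
}

for ch in SAMPLES.values():
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} bytes")
```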
For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find . -type f |
while IFS= read -r file
do
  if [ "$(head -c 3 -- "$file")" = $'\xef\xbb\xbf' ]
  then
    echo "found BOM in: $file"
  fi
done
Or, if you prefer s...
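For comparison, here is a Python version of the shell loop, as a sketch; `files_with_utf8_bom` is a hypothetical name. It walks the tree with `os.walk`, reads only the first three bytes of each file, and compares them with `codecs.BOM_UTF8`. File names containing spaces or backslashes need no special handling.

```python
import codecs
import os


def files_with_utf8_bom(root="."):
    """Yield paths under root whose first three bytes are the UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if f.read(3) == codecs.BOM_UTF8:
                        yield path
            except OSError:
                # Unreadable entries (permissions, broken links) are skipped.
                continue
```

Usage: `for p in files_with_utf8_bom("."): print("found BOM in:", p)`.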
I am looking for a (simple) text editor that can handle text in different encodings in the same document.
I need to develop some sites with mixed Japanese and English text and the editors I have now (on an English Windows system) are unable to display the Japanese text.
Jedit files don't display the Japanese text I have inputted but whe...
I am attempting to start a new WordPress blog. In some browsers, but not others, I am seeing funny characters instead of single quotes, double quotes and ellipses. Things I have already considered:
The HTML template page for output itself is set to UTF-8
The admin page is UTF-8
The MySQL database tables where the data is stored are UTF-8 en...
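The post does not show the actual characters, so this is an assumption: a frequent cause of this symptom is UTF-8 output being decoded as Windows-1252 somewhere along the way. WordPress typically turns straight quotes and "..." into typographic characters, and each of those is 3 bytes in UTF-8. Misread as Windows-1252, a right single quote shows up as â€™. The Python sketch below, with hypothetical helper names, reproduces that mix-up and its reversal so you can compare it with what the browsers show.

```python
# Typographic characters that replace plain quotes and "...".
SMART = {
    "left double quote": "\u201c",
    "right single quote": "\u2019",
    "ellipsis": "\u2026",
}


def as_misread(text: str) -> str:
    """What UTF-8 text looks like if it is decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo that specific mix-up (only valid if it really was the cause)."""
    return garbled.encode("cp1252").decode("utf-8")


for name, ch in SMART.items():
    print(f"{name}: {ch} -> {as_misread(ch)}")
```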
I have an ActiveRecord model, Foo, which has a name field. I'd like users to be able to search by name, but I'd like the search to ignore case and any accents. Thus, I'm also storing a canonical_name field against which to search:
class Foo < ActiveRecord::Base
  validates_presence_of :name
  before_validation :set_canonical_name
  private
  def set_cano...
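The body of `set_canonical_name` is cut off, so this is not the poster's method. It sketches one common way to build such a key, shown in Python: decompose with Unicode NFKD, drop the combining marks, then case-fold. A Ruby version would need an equivalent normalization step. One caveat: letters with no decomposition, such as ø or ł, are left as they are. `canonical_name` is a hypothetical name.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Accent- and case-insensitive search key."""
    # NFKD splits "é" into "e" + COMBINING ACUTE ACCENT (U+0301).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() (e.g. "ß" -> "ss").
    return stripped.casefold()
```

Apply the same function to the search input, then compare it with the stored `canonical_name`.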
I have a procedure that imports a binary file containing some strings. The strings can contain extended ASCII characters, e.g. CHR(224), 'à'. The procedure takes a RAW and converts the BCD bytes into characters in a string, one by one.
The problem is that the extended ASCII characters are getting lost. I suspect this is due to their values me...
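"Extended ASCII" is not a single standard. This sketch assumes the file uses ISO-8859-1 (Latin-1), where byte 224 is 'à'. If the target string is stored as UTF-8, that character needs two bytes, so copying the byte value across unchanged produces an invalid sequence rather than 'à'. This is one plausible reason the characters get lost, not a diagnosis of the procedure above. The illustration is in Python, not PL/SQL.

```python
RAW = b"voil\xe0"  # "voil" + byte 224, as a single-byte file might hold it

# Decoded with the single-byte Latin-1 table, byte 224 is 'à'.
latin1_text = RAW.decode("latin-1")

# On its own, the same byte is not valid UTF-8 (it becomes U+FFFD here) ...
utf8_attempt = RAW.decode("utf-8", errors="replace")

# ... because UTF-8 spells 'à' (U+00E0) with two bytes.
utf8_a_grave = "à".encode("utf-8")

print(latin1_text, repr(utf8_attempt), utf8_a_grave)
```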
At work, I'm beginning to have some issues with character encoding. I'd like to make our web app use UTF-8 all the way through. After a few hours of googling, I've only found a few sites with information on a UTF-8 LAMP setup. Does anyone know of any good resources online about UTF-8, Linux, Apache, MySQL and PHP? I'll post what I've foun...
All the PHP files in my workspace are encoded in Unicode (UTF-8, no BOM). I often duplicate an existing source file to use as a base for a new script. Invariably (with Path Finder or the original Finder), OS X will convert the encoding of the duplicate file to Western (Mac OS Roman).
Is there any way to make OS X behave and not convert ...
What's the simplest way to convert a Unicode codepoint into a UTF-8 byte sequence in C? The only way that springs to mind is using iconv to map from the UTF-32LE codepage to UTF-8, but that seems like overkill.
...
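The encoding is simple enough to do by hand with shifts and masks, so iconv is not required. The sketch is written in Python so that it can be checked against the built-in encoder. It uses only integer operations, which translate line for line into C writing into an `unsigned char` buffer. It rejects surrogates and values above U+10FFFF. `codepoint_to_utf8` is a hypothetical name.

```python
def codepoint_to_utf8(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 using bit operations."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:  # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```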
My client has an old MS SQL 2000 database that uses varchar(50) fields to store names. He tried to use this database to capture some data (via a web form). Some of the form-fillers are from other countries, and the varchar fields went nutty when some of these folks entered their names. Is it possible to recover the data somehow? Maybe by...
Hi,
Is there any way to make Visual Studio 2008 Express store all the files as UTF-8 by default?
Thanks for your time.
Best regards.
...
I'm sending an email using the .NET Framework. Here is the template that I'm using to create the message:
Date of Hire: %HireDate%
Annual Salary: %AnnualIncome%
Reason for Request: %ReasonForRequest%
Name of Voluntary Employee: %FirstName% %LastName%
Total Coverage Applied For: %EECoverageAmount%
Guaranteed Coverage Portion: %GICove...
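The excerpt stops before the actual problem. Assuming it concerns non-ASCII values in the filled-in template, the two pieces to get right are the placeholder substitution and an explicitly declared body charset. In .NET the latter is the `MailMessage.BodyEncoding` property. The sketch below shows both in Python rather than .NET. `fill_template` and `build_message` are hypothetical names, and unknown placeholders raise `KeyError` instead of being left in the text.

```python
import re
from email.message import EmailMessage

# Matches tokens such as %HireDate% or %FirstName%.
PLACEHOLDER = re.compile(r"%(\w+)%")


def fill_template(template: str, values: dict) -> str:
    """Replace each %Name% token with values["Name"]."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


def build_message(body: str, subject: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = subject
    # Declare the body as UTF-8 so non-ASCII names survive the trip.
    msg.set_content(body, charset="utf-8")
    return msg
```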
I'm using Server.HtmlEncode on a UTF-8 string in ASP classic, which works fine until the string contains accented characters, e.g. Rüstü Recber, which then appears garbled both on the page and in the page source.
I've tried setting the Response.Charset property to utf-8 but this doesn't make any difference.
...
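One common cause of this pattern, offered here as a hypothesis, is double encoding. The UTF-8 bytes of each accented letter are treated as two single-byte (Latin-1) characters, and an HtmlEncode-style step then emits a numeric reference for each of them. The sketch assumes non-ASCII characters are encoded as numeric references and shows the difference in Python. In ASP classic, `Response.CodePage = 65001` (or the `@CodePage` directive) is the setting usually checked alongside `Response.Charset`.

```python
name = "Rüstü Recber"

# Hypothesis: the UTF-8 bytes are read back as single-byte Latin-1 text ...
misread = name.encode("utf-8").decode("latin-1")

# ... and an HtmlEncode-style step turns each non-ASCII character into a
# numeric character reference.
double_encoded = misread.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Encoding the original characters directly gives the intended references.
correct = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(misread)
print(double_encoded)
print(correct)
```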
In RoR, how do I validate Chinese or Japanese words in a posting form that uses UTF-8?
With GBK encoding, [\u4e00-\u9fa5]+ is used to validate Chinese words.
In PHP, /^[\x{4e00}-\x{9fa5}]+$/u is used for UTF-8 pages.
...
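The range `[\u4e00-\u9fa5]` covers most common Han characters (the CJK Unified Ideographs block itself runs to U+9FFF). Japanese text also uses Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF), which that range misses. The Python sketch below combines the three ranges and requires the whole input to match. In Ruby 1.9 or later, a similar check should be expressible as `/\A[\p{Han}\p{Hiragana}\p{Katakana}]+\z/`, using `\A`/`\z` because Ruby's `^`/`$` match at line boundaries. `CJK_WORD` and `is_cjk_word` are hypothetical names.

```python
import re

# Hiragana, Katakana, and the CJK Unified Ideographs block.
CJK_WORD = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]+")


def is_cjk_word(text: str) -> bool:
    """True if text is non-empty and consists only of kana or Han characters."""
    return CJK_WORD.fullmatch(text) is not None
```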