The following Python code ...
html_data = urllib2.urlopen(some_url).read()
f = codecs.open(filename, 'w', encoding='utf-8')
f.write(html_data)
f.close()
... sometimes fails with UnicodeDecodeError ...
File "/.../lib/python2.6/codecs.py", line 686, in write
return self.writer.write(data)
File "/.../lib/python2.6/codecs.py", line 351...
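One way to avoid this class of error is to decode the downloaded bytes explicitly before handing them to a text writer, rather than letting the writer guess. The following is a minimal Python 3 sketch, not the asker's code: the helper name save_html_as_utf8 and its source_encoding parameter are hypothetical, and it assumes the page's real encoding is known (for example from the HTTP headers).

```python
def save_html_as_utf8(raw_bytes, path, source_encoding="utf-8"):
    """Decode downloaded bytes explicitly, then write them out as UTF-8 text.

    source_encoding is an assumption supplied by the caller; bytes that are
    invalid in that encoding are replaced with U+FFFD instead of raising.
    """
    text = raw_bytes.decode(source_encoding, errors="replace")
    # newline="" keeps the line endings exactly as they were downloaded.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text
```

In Python 2 the equivalent step would be calling html_data.decode(...) before f.write(...), so that the codecs writer receives a unicode object rather than a byte string.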
Is there a simple regular expression to match all unicode quotes? Or does one have to hand-code it like this:
quotes = ur"[\"'\u2018\u2019\u201c\u201d]"
Thank you for reading.
Brian
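As an alternative to listing every quote by hand, the character class can be generated from the Unicode general categories Pi (initial quote punctuation) and Pf (final quote punctuation). This is only a Python 3 sketch under that assumption; build_quote_pattern is a hypothetical helper, and those two categories also contain a few bracket-like characters that are not quotes in the everyday sense.

```python
import re
import sys
import unicodedata


def build_quote_pattern():
    """Compile a character class of ASCII quotes plus all Pi/Pf characters."""
    chars = ['"', "'"]  # ASCII quotes are category Po, so add them explicitly
    for code_point in range(sys.maxunicode + 1):
        ch = chr(code_point)
        if unicodedata.category(ch) in ("Pi", "Pf"):
            chars.append(ch)
    return re.compile("[" + "".join(re.escape(c) for c in chars) + "]")


QUOTES = build_quote_pattern()
```

Scanning the whole code space once at import time is slow compared with a literal class, so the compiled pattern is best built once and reused.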
...
My code is basically this:
wstring japan = L"日本";
wstring message = L"Welcome! Japan is ";
message += japan;
wprintf(message.c_str());
I want to use wide strings, but I do not know how they are output, so I used wprintf. When I run something such as:
./widestr | hexdump
The hexadecimal dump looks like this:
65 57 63 6c 6...
Hello,
there are already a few questions relating to this problem. I think my question is a bit different because I don't have an actual problem, I'm only asking out of academic interest. I know that Windows's implementation of UTF-16 is sometimes contradictory to the Unicode standard (e.g. collation) or closer to the old UCS-2 than to ...
Python 2.6.5 is said to support Unicode, so why doesn't listdir() handle it in IDLE, when Python 3.1.2 does show Unicode in IDLE? (This was tested on Windows 7.)
The following code shows the same behavior:
for dirname, dirnames, filenames in os.walk('c:\path\somewhere'):
    for subdirname in dirnames:
        print (os.path.join(dirname, subdirna...
When exporting some data from MS SQL Server using Python, I found that some of my data looked like computer \xa0systems, which is causing encoding errors. In SQL Management Studio the row simply appears to be double spaced: computer systems. It seems that this is the code for a non-breaking space (&nbsp;): how can I query MS SQL Server within managemen...
Is there an easy way to replicate the behavior of MySQL's utf8_general_ci collation in C#?
In particular, given a Unicode string, I want to generate a(n ASCII?) string that can then be trivially sorted or compared, as utf8_general_ci would.
I found this question, which shows how to strip accents from strings, which looks like a similar b...
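The general idea of reducing a string to a simpler comparison key can be sketched as follows. This is written in Python 3 rather than C#, fold_key is a hypothetical helper, and the result only approximates an accent-insensitive, case-insensitive comparison; it is not claimed to reproduce MySQL's collation exactly.

```python
import unicodedata


def fold_key(text):
    """Build a rough accent-insensitive, case-insensitive comparison key."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()
```

A key like this can be passed to sorted(..., key=fold_key) or compared with ==. Characters without a decomposition (for example "ø") pass through unchanged, which is one place where a real collation may behave differently.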
I'm using Delphi 2009. In my program, I have been working very hard to optimize all my Delphi code for speed and memory use, especially my Unicode string handling.
I have the following statement:
Result := Result + GetFirstLastName(IndiID, 1);
When I debug that line, upon return from the GetFirstLastName function, it traces into ...
I am using Delphi 2009 with Unicode strings.
I'm trying to Encode a very large file to convert it to Unicode:
var
Buffer: TBytes;
Value: string;
Value := Encoding.GetString(Buffer);
This works fine for a Buffer of 40 MB that gets doubled in size and returns Value as an 80 MB Unicode string.
When I try this with a 300 MB Buffer...
I am using Lucene Search.
I have uploaded a French file with the following content.
french.txt
multimédia francophone pour l'enseignement du français langue étrangère
If I search for francophone, the file shows up in the search results.
But when I search for the words multimédia, français, or étrangère, no results are shown.
I have tried to ...
I've got the string
$result = "bei einer Temperatur, die etwa 20 bis 60°C unterhalb des Schmelzpunktes der kristallinen Modifikation"
which comes straight from a MySQL table. The table and the PHP headers are both set to UTF-8.
I want to strip the 'degree' symbol: http://en.wikipedia.org/wiki/Degree_symbol and replace it with the wor...
The default Unicode collation element table defines four-level weight elements for Unicode characters, where the first three levels define the essential part of the sort order and the fourth level is essentially the character code, which is used for tie-breaking.
The section on variable weighting defines the "shifted" option (the defaul...
I have a UTF-8 string with combining diacritics. I want to match it with the \w regex sequence. It matches characters that have accents, but not a Latin character followed by combining diacritics.
>>> re.match("a\w\w\wz", u"aoooz", re.UNICODE)
<_sre.SRE_Match object at 0xb7788f38>
>>> print u"ao\u00F3oz"
aoóoz
>>> re.match("a\w\w\wz...
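One commonly suggested workaround is to normalize the text to NFC before matching, so that a base letter plus combining mark becomes a single precomposed character that \w can match. The sketch below is Python 3, match_normalized is a hypothetical helper, and it only helps when a precomposed form actually exists for the sequence in question.

```python
import re
import unicodedata


def match_normalized(pattern, text):
    """Run re.match after composing base letters with their combining marks."""
    composed = unicodedata.normalize("NFC", text)
    return re.match(pattern, composed)


# "o" followed by U+0301 (combining acute accent) composes to U+00F3.
decomposed = "aoo\u0301oz"
raw_result = re.match(r"a\w\w\wz", decomposed)
normalized_result = match_normalized(r"a\w\w\wz", decomposed)
```

Here raw_result is None because the combining mark is not a word character, while normalized_result is a match object.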
I know you can put unicode character codes in a VB.Net string like this:
str = Chr(&H0030) & "More text"
I would like to know how I can put the character code right into the string literal so I can use Unicode symbols from the designer view.
Is this even possible?
...
Coming from the land of Perl, I can do something like the following to test the membership of a string in a particular unicode block:
# test if string has any katakana script characters
my $japanese = "カタカナ";
if ($japanese =~ /\p{InKatakana}/) {
print "string has katakana"
}
I've read that Python does not support unicode blocks (tr...
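Without block properties in the standard re module, one fallback is to spell out the block's code point range by hand. A minimal Python 3 sketch, assuming the Katakana block U+30A0 to U+30FF is what is wanted (has_katakana is a hypothetical helper name):

```python
import re

# The Katakana block spans U+30A0 through U+30FF.
KATAKANA = re.compile("[\u30a0-\u30ff]")


def has_katakana(text):
    """Return True if any character of text falls in the Katakana block."""
    return KATAKANA.search(text) is not None
```

This covers only the main block; related ranges such as Katakana Phonetic Extensions or halfwidth forms would need their own ranges added to the class.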
I have a field in my Rails model that has max length 255.
I'm importing data into it, and sometimes the imported data has a length > 255. I'm willing to simply chop it off so that I end up with the largest possible valid string that fits.
I originally tried to do field[0,255] in order to get this, but this will actually chop trailing ...
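The underlying technique, cutting at a byte limit without leaving half of a multi-byte character behind, can be sketched in a few lines. This is Python 3 rather than Ruby, truncate_utf8 is a hypothetical helper, and it assumes the 255 limit is counted in UTF-8 bytes rather than characters.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing may cut through a multi-byte sequence; errors="ignore" drops
    # the incomplete trailing bytes instead of raising UnicodeDecodeError.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Note that this can still separate a base letter from a following combining mark, since it works on code points and not on grapheme clusters.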
We're having a lot of trouble tracking down the source of \u2028 (Line Separator) in user submitted data which causes the 'unterminated string literal' error in Firefox.
As a result, we're looking at filtering it out before submitting it to the server (and then the database).
After extensive googling and reading of other people's probl...
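If the filtering ends up being done on the server side, removing the offending characters is a one-line translation. A minimal Python 3 sketch, assuming U+2029 (Paragraph Separator) should be stripped along with U+2028 since it can cause similar trouble; strip_line_separators is a hypothetical helper name:

```python
# Map both separators to None so that str.translate deletes them.
SEPARATORS = {0x2028: None, 0x2029: None}


def strip_line_separators(text):
    """Remove U+2028 and U+2029 from user-submitted text."""
    return text.translate(SEPARATORS)
```

Mapping the two code points to "\n" instead of None would preserve the line breaks the user presumably intended.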
I would like to add Unicode support to a C library I am maintaining. Currently it expects all strings to be passed in UTF-8 encoded. Based on feedback, it seems Windows usually provides three versions of each function:
fooA() ANSI encoded strings
fooW() Unicode encoded strings
foo() string encoding depends on the UNICODE define
Is there an easy w...
I am getting the very familiar:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 24: ordinal not in range(128)
I have checked out multiple posts on SO and they recommend - variable.encode('ascii', 'ignore')
however, this is not working. Even after this I am getting the same error ...
The stack trace:
'...
I use this function to read a file into a string:
function LoadFile(const FileName: TFileName): string;
begin
  with TFileStream.Create(FileName,
      fmOpenRead or fmShareDenyWrite) do begin
    try
      SetLength(Result, Size);
      Read(Pointer(Result)^, Size);
    except
      Result := '';
      Free;
      raise;
    end;
    Free;
...