unicode

Working with Japanese filenames in PHP 5.3 and Windows Vista?

I'm currently trying to write a simple script that looks in a folder, and returns a list of all the file names in an RSS feed. However I've hit a major wall... Whenever I try to read filenames with Japanese characters in them, it shows them as ?'s. I've tried the solutions mentioned here: http://stackoverflow.com/questions/482342/php-rea...

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand. For example, the Miscellaneous Mathematical Symbols-A block has an entry in a hash like this: my %symbols = ( ... miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC, (0x27D0..0x27EF)], ... ) The simpler, 'continuous' array miscellaneous_mathematical_sy...
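One way to avoid hand-written ranges is to ask a Unicode database which code points in a block are assigned. The following is a minimal sketch in Python rather than Perl; the helper name `assigned_code_points` is hypothetical, and the result depends on the Unicode version bundled with the interpreter, so it may differ from the hand-written list above.

```python
import unicodedata

# Hypothetical helper (not from the question): return the assigned code
# points in an inclusive range, according to Python's bundled Unicode data.
def assigned_code_points(first, last):
    return [cp for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)) != "Cn"]  # "Cn" means unassigned

# Miscellaneous Mathematical Symbols-A spans U+27C0..U+27EF.
symbols_a = assigned_code_points(0x27C0, 0x27EF)
print([hex(cp) for cp in symbols_a])
```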

python read utf8 text file problem

I have a problem with Python when reading and printing a UTF-8 text file. I have a test.txt in UTF-8 encoding without a BOM. This file has two characters in it: 大声 The first character "大" is Chinese and the second "声" is Japanese. Now, when I use Ulipad (a Python editor) to run the following code to read the txt file, and print these two cha...
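A minimal sketch of reading such a file with an explicit encoding, assuming the problem is one of decoding rather than of the editor's console. The temporary file stands in for the question's test.txt, and the code prints code points instead of the characters themselves so that it does not depend on the console's encoding.

```python
import io
import os
import tempfile

# Write a small UTF-8 file without a BOM, standing in for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"大声")

# Decode explicitly on read instead of relying on the platform default.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Print code points; this works even if the console cannot show CJK text.
for ch in text:
    print("U+%04X" % ord(ch))
```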

Filtering illegal XML characters in Java

XML spec defines a subset of Unicode characters which are allowed in XML documents: http://www.w3.org/TR/REC-xml/#charsets. How do I filter out these characters from a String in Java? simple test case: Assert.equals("", filterIllegalXML(""+Character.valueOf((char) 2))) ...
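The filtering itself can be sketched as a single regular expression built from the XML 1.0 `Char` production. This is an illustration in Python rather than Java, and `filter_illegal_xml` is a hypothetical name mirroring the test case above.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def filter_illegal_xml(text):
    """Return text with every character outside the Char production removed."""
    return _ILLEGAL_XML.sub("", text)

print(repr(filter_illegal_xml(chr(2))))
```

Tabs, line feeds, and carriage returns survive; other C0 control characters, surrogates, U+FFFE, and U+FFFF are dropped.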

wchar to char in c++

I have a Windows CE console application whose entry point looks like this: int _tmain(int argc, _TCHAR* argv[]) I want to check the contents of argv[1] for "-s" and convert argv[2] into an integer. I am having trouble narrowing the arguments or accessing them to test. I initially tried the following with little success: if (argv[1] ==...

Excel 2007 and Unicode

I have an Israeli spreadsheet that reads right to left. When I read the values (using VBA), it places a question mark (?) at the beginning and end of the text; in other words, it wraps the text in question marks (i.e. ?0123456?). If you type Range("A2").value or .value2 or .text, the results are the same. Any idea how to prevent this? ...

The confusion on python encoding

I retrieved Big5-encoded data from a database, and I want to send it as an HTML email. The code looks like this: html += """<tr><td>""" html += unicode(rs[0], 'big5') # rs[0] is data encoded in big5 When I run the script, it raises: UnicodeDecodeError: 'ascii' codec can't decode byte...... However, I tried...
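In Python 2, a common cause of this error is mixing byte strings and unicode strings, which triggers an implicit ASCII decode. A minimal sketch of the usual remedy follows: decode at the boundary, work in text, and encode once at the end. It is written for Python 3, the sample bytes are an assumption standing in for `rs[0]`, and the excerpt is truncated, so this may not match the asker's actual cause.

```python
# Bytes as they might come back from the database (assumed Big5-encoded).
raw = u"中文".encode("big5")

# Keep everything as text while building the HTML.
html = u"<tr><td>"
html += raw.decode("big5")
html += u"</td></tr>"

# Encode once, when the message body is handed to the mail layer.
payload = html.encode("utf-8")
print(len(payload))
```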

i18n / Markdown - Does Markdown support internationalization?

I'm building a CMS which needs to manage content in English, Chinese, and Spanish at a minimum. Do most Markdown implementations handle UTF-8 encoded text? Is the Markdown language designed to be used with non-English languages? I'm currently using Markdown Extra by Michel Fortin. ...

Using SHIFT_JIS text in PHP

I am building a form that needs to accept characters encoded in SHIFT_JIS and then send those results via email to a recipient. I've tried to simply capture the results from the $_POST variable and then to insert them into a block of text like this: $NameJp = $_POST['NameJp']; $contents = <<<TEST Name: $NameJp ... TEST but that does...

Unicode for displaying mathematical operations

How can I display a squared sign in Unicode (I checked the Unicode reference and could not find it)? Also, is it possible to use Unicode to display a fraction? For example, 3/4 would look as it should, with the horizontal vinculum. ...
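For reference, a short sketch that looks up a few likely candidates by code point: U+00B2 for the squared sign, U+00BE for a precomposed three quarters, and U+2044 for building other fractions. Whether a given font renders these as desired is an assumption that this code cannot check.

```python
import unicodedata

# Candidate characters for a squared sign and for fractions.
candidates = ["\u00b2", "\u00be", "\u2044"]
names = [unicodedata.name(ch) for ch in candidates]

# Print code point and official name only, to stay console-safe.
for ch, name in zip(candidates, names):
    print("U+%04X %s" % (ord(ch), name))
```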

IIS 6.0 Server and Unicode Characters

We are performing a pen test on a simple ASP application that uses an MS SQL database. For authentication, it appears to use dynamically constructed queries while escaping single quotes. When we use Unicode quotes such as %uFF07, %u02b9, etc., we are able to inject SQL successfully. Want to understand is it more a kind of configura...

TSQL Prefixing String Literal on Insert - Any Value to This, or Redundant?

I just inherited a project that has code similar to the following (rather simple) example: DECLARE @Demo TABLE ( Quantity INT, Symbol NVARCHAR(10) ) INSERT INTO @Demo (Quantity, Symbol) SELECT 127, N'IBM' My interest is with the N before the string literal. I understand that the prefix N is to specify encoding (in this case,...

Python unicode problem

I'm receiving some data from a ZODB (Zope Object Database). I receive a mybrains object. Then I do: o = mybrains.getObject() and I receive a "Person" object in my project. Then, I can do b = o.name and doing print b on my class I get: José Carlos and print b.name.__class__ <type 'unicode'> I have a lot of "Person" objects. T...

Why did the creators of Windows and Linux choose different ways to support Unicode?

As far as I know, Linux chose the backward compatibility of UTF-8, whereas Windows added completely new API functions for UTF-16 (ending with "W"). Could these decisions have been different? Which one proved better? ...

If I use Unicode on a ISO-8859-1 site, how will that be interpreted by a browser?

So I got a site that uses ISO-8859-1 encoding and I can't change that. I want to be sure that the content I enter into the web app on the site gets parsed correctly. The parser works on a character by character basis. I also cannot change the parser, I am just writing files for it to handle. The content in my file I am telling the ap...

How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?

Is there a function in PHP that can decode Unicode escape sequences like "\u00ed" to "í", along with all other similar occurrences? I found a similar question here, but it doesn't seem to work. ...
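The decoding step can be sketched as a regular expression replacement followed by a UTF-8 encode. This illustration is in Python rather than PHP, `decode_unicode_escapes` is a hypothetical name, and the sketch assumes four-digit escapes only. It does not combine surrogate pairs such as "\ud83d\ude00".

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_unicode_escapes(text):
    """Replace each \\uXXXX escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

decoded = decode_unicode_escapes(r"Mart\u00edn")
utf8_bytes = decoded.encode("utf-8")
print(utf8_bytes)
```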

Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information. Does anyone know why these languages...
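One attribute that is easy to compare is encoded size. The sketch below counts bytes for a few sample strings chosen for illustration, using UTF-16 without a BOM. It shows only storage size and says nothing about the historical or API reasons the question asks about.

```python
# Sample strings chosen for illustration; the labels are ASCII so that
# printing works on any console.
samples = {
    "ascii": "hello",
    "latin": "Jos\u00e9",
    "cjk": "\u5927\u58f0",
    "emoji": "\U0001F600",
}

# (UTF-8 size, UTF-16 size) in bytes; "utf-16-le" omits the BOM.
sizes = {
    label: (len(s.encode("utf-8")), len(s.encode("utf-16-le")))
    for label, s in samples.items()
}

for label, (u8, u16) in sizes.items():
    print(label, u8, u16)
```

ASCII text is half the size in UTF-8, CJK text in the Basic Multilingual Plane is smaller in UTF-16, and characters outside that plane take four bytes in both.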

Convert char array to UNICODE in MFC C++

I'm using the following code to read files from a folder in Windows. However, since this is an MFC application, I have to convert the char array to UNICODE. For example, if I hard-code the path as "C:\images3\test\" as shown below, the code works. WIN32_FIND_DATA FindFileData; HANDLE hFind = INVALID_HANDLE_VALUE; hFind = FindFirstFile(_T(...

trouble with boost::filesystem::wrecursive_directory_iterator

I'm trying to write a program to help me manage my iTunes library, including removing duplicates and cataloging certain things. At this point I'm still just trying to get it to walk through all the folders, and have run into a problem: I have a small amount of Japanese music, where the artist and/or album is written in Japanese characte...

A UnicodeDecodeError that occurs with json in python on Windows, but not Mac.

On windows, I have the following problem: >>> string = "Don´t Forget To Breathe" >>> import json,os,codecs >>> f = codecs.open("C:\\temp.txt","w","UTF-8") >>> json.dump(string,f) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python26\lib\json\__init__.py", line 180, in dump for chunk in iterable...