unicode

What is unicode character 2028 (LS / Line Separator) used for?

I was thinking to myself that the line-breaking problem must have been more or less solved by someone, but maybe the solution is not widely adopted. Being forward-thinking, I searched for a platform-independent Unicode method to separate lines. In my search I found Unicode character 2028. Then, I found Jeff Atwood's post on this topic where he...
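As a small illustration of how U+2028 behaves in practice, the Python sketch below shows that str.splitlines() treats LINE SEPARATOR as a line boundary alongside "\n" and "\r\n". The sample string is made up for illustration and is not taken from the question.

```python
import unicodedata

LS = "\u2028"  # U+2028, the character the question asks about

# Hypothetical sample text mixing three line-break conventions.
text = "first\nsecond\r\nthird" + LS + "fourth"

# str.splitlines() recognises U+2028 as a line boundary.
lines = text.splitlines()

print(unicodedata.name(LS))
print(lines)
```

Whether other tools (editors, terminals, parsers) honour U+2028 varies, so this only demonstrates Python's own behaviour.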

Perl - Unicode::String sub need to add/convert for Latin-9 support

Part 3 (Part 2 is here) (Part 1 is here) Here is the Perl module I'm using: Unicode::String. How I'm calling it: print "Euro: "; print unicode_encode("€")."\n"; print "Pound: "; print unicode_encode("£")."\n"; I would like it to return this format: € # Euro £ # Pound The function is below: sub unicode_encode { shift...

How can I open files containing accents in Java?

(editing for clarification and adding some code) Hello, We have a requirement to parse data sent from users all over the world. Our Linux systems have a default locale of en_US.UTF-8. However, we often receive files with diacritical marks in their names such as "special_á_ã_è_characters.doc". Though the OS can deal with these files f...

Displaying Unicode/ASCII Characters on console or window

I couldn't display a 'bullet' with character code DEC 149, which can be found on an ASCII chart. cout << char(149) << endl; It comes out as ò in the console window. I know a few characters from charmap that I'd like to use, but how would I know their character codes? ...
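One way to see why byte 149 can show up as ò rather than a bullet: the same byte means different characters in different code pages. The Python sketch below decodes it both ways. It assumes the console uses code page 437 and the chart in question was Windows-1252; the excerpt states neither, so treat both as guesses.

```python
raw = bytes([149])  # the single byte 0x95

# Assumption: the chart consulted was really Windows-1252 ("ANSI"), not ASCII.
as_cp1252 = raw.decode("cp1252")

# Assumption: the console is using the OEM code page 437.
as_cp437 = raw.decode("cp437")

print(repr(as_cp1252), hex(ord(as_cp1252)))
print(repr(as_cp437), hex(ord(as_cp437)))
```

ASCII itself only defines codes 0 to 127, so any value above that depends on which code page is in effect.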

printing unicode through a QProcess

I'm having some trouble handling unicode output from a QProcess. When I run the following example I get ?? instead of 中文. Can anyone tell me how to get the unicode output? from PyQt4.QtCore import * def on_ready_stdout(): byte_array = proc.readAllStandardOutput() print 'byte_array: ', byte_array print 'unicode: ', unicode...

Reading Japanese filenames in windows, using Python and glob not working

I just set up PortablePython on my system so I can run Python scripts from PHP, and I have some very basic code (below) to list all the files in a directory. However, it doesn't work with Japanese filenames. It works fine with English filenames, but it spits out errors (below) when I put any file containing Japanese characters in the direct...

Dealing with multi-language directories (Python)

I'm trying to open a file and I just realized that py is having trouble with my username (It's in Russian). Any suggestions on how to properly decode/encode this to make idle happy? I'm using py 2.6.5 xmlfile = open(u"D:\\Users\\Эрик\\Downloads\\temp.xml", "r") Traceback (most recent call last): File "<pyshell#23>", line 1, in <modu...

Read and output possible unicode torrent contents in C++?

I'm trying to write a simple C++ program to open a torrent file (passed through argv[1]), read all of it, and then print the entire file's contents verbatim with no alterations. It has to print a carbon copy of the original torrent. The issue is, some of the torrents may contain Japanese, Russian, etc. (filenames, description, etc.)... A...

Where can I find this unicode character?

I'm looking for a Unicode character that looks like ≪ or ≫ but rotated 90° and 270°, to use in a GUI to signify that something can be dragged vertically. Does anybody know of a character like this? ...

Limits to Swing's Unicode support

Not long ago I asked a question attempting to identify a certain Unicode character for use in a GUI. I got the character I was looking for, but it didn't work in the Swing GUI I was building. So, SO Community, I pose these questions to you: What sort of limitations does Swing/Java have for Unicode support? Are there certain subsets o...

Convert UTF-16 to UTF-8

I am currently using VC++ 2008 MFC. Because PostgreSQL doesn't support UTF-16 (the encoding used by Windows for Unicode), I need to convert strings from UTF-16 to UTF-8 before storing them. Here is my code snippet. // demo.cpp : Defines the entry point for the console application. // #include "stdafx.h" #include "demo.h" #include "Utils.h" #incl...
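The conversion itself comes down to decoding the UTF-16 bytes into code points and re-encoding them as UTF-8. Here is a minimal Python sketch of that idea. It is not the poster's VC++ code, and the helper name utf16_to_utf8 and the sample text are made up for illustration. It assumes little-endian UTF-16 without a byte order mark, which is how Windows wide strings are laid out in memory.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 little-endian bytes (no BOM) to UTF-8."""
    # Decode to code points first, then encode those code points as UTF-8.
    return data.decode("utf-16-le").encode("utf-8")


# Hypothetical sample: four characters stored as UTF-16-LE (2 bytes each here).
sample = "中文 €".encode("utf-16-le")
converted = utf16_to_utf8(sample)

print(len(sample), len(converted))
```

In native Windows code the same two-step idea is usually expressed through the platform's own conversion routines rather than a scripting language.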

swprintf chokes on characters outside 8-bit range

This happens on OS X, though I suspect it applies to any UNIX-y OS. I have two strings that look like this: const wchar_t *test1 = (const wchar_t *)"\x44\x00\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; const wchar_t *test2 = (const wchar_t *)"\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; In the debugger, test1 look...

Python not opening Japanese filenames

I've been working on a Python script to open a file with a Unicode name (mostly Japanese) and save it to a randomly generated (non-Unicode) filename in Windows Vista 64-bit, and I'm having issues... It just doesn't work. It works fine with non-Unicode filenames (even if the file has Unicode content), but the second you try to pass a unicode fi...

setw : Alignment for UTF-8 text file

Hello, until now I have been using setw to align my ANSI text files. Recently, I wanted to support UTF-8 in my text files, and I found that setw no longer works. #include <windows.h> #include <iostream> // For StringCchLengthW. #include <Strsafe.h> #include <fstream> #include <iomanip> #include <string> #include <cassert> std::string wst...

string to unicode in c#

Hi, I want to use Unicode in my code. My Unicode value is 0100, and I am prepending the string \u to my value. When I use string myVal="\u0100" it works, but when I build it as shown below it does not work; the value looks like "\\u1000". How can I resolve this? I want to use the approach below because the Unicode value may vary. string...
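A likely explanation is that a \u escape is only interpreted inside a string literal in the source code, so gluing a backslash-u prefix onto digits at run time just produces those literal characters. The usual alternative is to parse the hex digits as a number and convert that number to a character. Here is that idea sketched in Python rather than C#; the helper name char_from_hex is hypothetical.

```python
def char_from_hex(code: str) -> str:
    """Turn a hex code point string such as '0100' into the character itself."""
    # Parse the digits as base 16, then map the number to its character.
    return chr(int(code, 16))


value = "0100"  # the code point from the question, held in a variable
result = char_from_hex(value)

# Concatenating the escape prefix at run time only yields six literal characters.
glued = "\\u" + value

print(result, len(result), len(glued))
```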

scanning binary files in antlr3

I would like to parse a binary file and specify the characters in hex format instead of unicode, is this possible? For instance: rule: '\x7F' ; Instead of: rule: '\u007F' ; Since I do not understand how unicode maps to one byte. ...
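On the question of how Unicode maps to one byte: code points U+0000 through U+00FF correspond one-to-one to the byte values 0x00 through 0xFF under the Latin-1 (ISO-8859-1) encoding, so '\u007F' and a byte 0x7F name the same value. The Python sketch below demonstrates that mapping. It says nothing about how ANTLR itself reads its input, which depends on how the stream is opened.

```python
# The two escapes denote the same single code point.
same = "\u007F" == "\x7F"

# Under Latin-1, that code point encodes to exactly one byte, 0x7F.
one_byte = "\u007F".encode("latin-1")

# The mapping holds for every byte value: decoding all 256 bytes as Latin-1
# gives the code points 0..255 in order.
all_bytes = bytes(range(256))
round_trip = [ord(ch) for ch in all_bytes.decode("latin-1")]

print(same, one_byte, round_trip[:4], round_trip[-1])
```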

How can I check if a Python unicode string contains non-Western letters?

I have a Python Unicode string. I want to make sure it only contains letters from the Roman alphabet (A through Z), as well as letters commonly found in European alphabets, such as ß, ü, ø, é, à, and î. It should not contain characters from other alphabets (Chinese, Japanese, Korean, Arabic, Cyrillic, Hebrew, etc.). What's the best way t...
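One possible approach, offered as a heuristic sketch rather than a definitive answer, is to look up each letter's official Unicode name and require that it start with "LATIN". The function name only_latin_letters is made up for illustration, and the sketch assumes that digits, spaces and punctuation should be ignored instead of rejected.

```python
import unicodedata


def only_latin_letters(text: str) -> bool:
    """Return True if every alphabetic character in text is a Latin-script letter."""
    for ch in text:
        # Skip digits, whitespace and punctuation (an assumption of this sketch).
        if not ch.isalpha():
            continue
        # Names look like 'LATIN SMALL LETTER U WITH DIAERESIS' or
        # 'CYRILLIC SMALL LETTER A'; unnamed characters fall back to ''.
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


print(only_latin_letters("Straße über Ørsted é à î"))
print(only_latin_letters("Привет"))
print(only_latin_letters("中文"))
```

Relying on name prefixes is a shortcut; a stricter check would compare against an explicit list of allowed code point ranges.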

Does IIS 5.0 Require Unique Configuration Settings To Support UTF-8?

[Note: I can only reproduce this issue with a Win2k web server running IIS 5.0. I can't reproduce this issue with a Windows XP web server (localhost) running IIS 5.1.] I've uncovered a lot of information pertinent to UTF-8 encoding. If I've learned one thing, it's this. EDIT: MSDN offered that for IIS 5.0 and earlier, Response.CodePag...

validate email addresses with IDN in php

How can I validate an email address that contains special characters (i.e. Unicode/IDN)? I tried filter_var("[email protected]", FILTER_VALIDATE_EMAIL) but that's not working properly. ...

Oracle-->SQL - forced conversion from non-unicode to unicode?

I have an ETL that is importing tables from Oracle to SQL 2008 using the OLEDB FastLoad. The data in Oracle is non-unicode. When the table is created in SQL it is created with unicode datatypes. For some reason the datatypes are being forced from non-unicode to unicode. Do any of you know of a way to stop this from happening? Possibly a ...