unicode

Specification of source charset encoding in MSVC++, like gcc "-finput-charset=CharSet"

Hello, I want to create some sample programs that deal with encodings; specifically, I want to use wide strings like: wstring a=L"grüßen"; wstring b=L"שלום עולם!"; wstring c=L"中文"; because these are example programs. This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. But, straightforward compilation do...

Is it possible to use a Unicode "argv"?

Hi, I'm writing a little wrapper for an application that uses files as arguments. The wrapper needs to be in Unicode, so I'm using wchar_t for the characters and strings I have. Now I have a problem: I need to have the arguments of the program in an array of wchar_t's and in a wchar_t string. Is it possible? I'm defining the m...

Generate C++ code for BNF grammar

I have looked at the following software tools: Ragel, ANTLR, BNF Converter, Boost::Spirit, Coco/R, and YACC. ANTLR seems the most straightforward; however, its documentation is lacking. Ragel looks possible, too, but I do not see an easy way to convert BNF into its syntax. What other tools are available that can take BNF input and generate a ...

Are there Latin chars in the Korean Hangul unicode ranges of a fixed size?

Many Japanese fonts have a special fixed-width variant of the standard ASCII Latin characters that are half as wide as the font's standard fixed-width for Kanji/Kana characters. This allows you to vertically line up Latin and Japanese text by simply using 2 Latin chars per Japanese character. This is called something like "half-width ...

Show a character's Unicode codepoint value in Eclipse

I have a UTF-8 text file open in Eclipse, and I'd like to find out what a particular Unicode character is. Is there a function to display the Unicode codepoint of the character under the cursor? ...

C++ Unicode Encryption Library Required (Or is it?)

I need to encrypt several pieces of text in a file alongside unencrypted text in the same file. All the data is Unicode text. Of all the encryption libraries I have looked at (Crypto++, Botan, etc.), none "appear" to provide Unicode-aware methods for encrypting/decrypting data, e.g. data can be passed in/out using char, stri...

How do I detect unicode characters in a Java string?

Suppose I have a string that contains Ü. How would I find all those unicode characters? Should I test for their code? How would I do that? For example, given the string "AÜXÜ", I'd like to transform it to "AYXY". I'd like to do the same for other unicode characters, and I would hate to have to store them in a translation map of some sor...

Emacs 23 uses character set four times larger than Unicode - why?

From Emacs 23.1 NEWS: *** The Emacs character set is now a superset of Unicode. (It has about four times the code space, which should be plenty). And more details later on: *** In multibyte buffers and strings, characters are represented by UTF-8 byte sequences. The character code space is now 0x0..0x3FFFFF with no g...

How to enforce unicode arguments for methods?

I have a model class with getter and setter methods, and the occasional static methods. I would like to enforce the usage of unicode strings as arguments for specific methods and using decorators was the first idea I had. Now I have something like this: import types class require_unicode(object): def __init__(self, function): ...

In Unicode, why are there two representations for the Arabic digits?

I was reading about Unicode on Wikipedia and I see that each of the Arabic digits has 2 Unicode codepoints. For example, 1 is defined as U+0661 and as U+06F1. Which one should I use? ...

What is Perl's "standard string comparison order"?

This is really a double question, my two end goals being answers to: What is the standard string comparison order, in terms of the mechanics? What's a better name for that so I can update the docs? Perl's documentation for sort says that without a block, sort uses "standard string comparison order". But what is that order? There sho...

Is There An Efficient Whole Word Search Function in Delphi?

In Delphi 2009 or later (Unicode), are there any built-in functions or small routines written somewhere that will do a reasonably efficient whole word search where you provide the delimiters that define the word, e.g.: function ContainsWord(Word, Str: string): boolean; const { Delim holds the delimiters that are on either side of the ...

What is a good library for creating PDFs in Delphi 2010?

What is a good library for creating PDFs in Delphi 2010? Pre-Unicode, I used PowerPDF, which, though obsolete, was flexible enough to do what I wanted (very customized non-db/table-based reports). I currently have PowerPDF compiling in Delphi 2010, but not yet working, and I'd rather not port and debug if there are any good Open Sou...

Is the Unicode prefix N still needed in SQL Compact Edition?

At least in previous versions of SQL Server, you had to prefix Unicode string constants with an "N" to make them be treated as Unicode. Thus, select foo from bar where fizz = N'buzz' (See "Server-Side Programming with Unicode" for SQL Server 2005 "from the horse's mouth" documentation.) We have an application that is using SQL Compac...

What do these PHP mbstring settings do?

I'm trying to figure out exactly what these php.ini settings do. What happens when they're set to different values? When are they necessary? When are they harmful? The settings: mbstring.language, mbstring.http_input, mbstring.http_output, mbstring.encoding_translation. As usual, the PHP manual is less than helpful. EDIT: Just to clarify, I understan...

Displaying unicode symbols in HTML

I simply want to display the tick (✔) and cross (✘) symbols in an HTML page, but they show up as either a box or goop ✔ - obviously something to do with the encoding. I have set the meta tag to show utf-8 but obviously I'm missing something. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Edit/Solution: From comm...

How are the new unicode domains going to be handled by email regexes?

In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) in the Internet that use the IDNA standard for native language scripts. I'm pretty sure that the standard regexes most sites currently use won't mark these as valid, or ...

C Strings Library

Is there a C strings library for C (not C++) that implements an abstraction over char * and wchar_t * strings? The requirements are: BSD/MIT/CDDL licensed; implements some kind of reference-counting mechanism; supports regular expressions; has Unicode support. Thanks, ...

Problems with Indic fonts rendering in C# .NET application

I'm trying to display a Telugu string in a C# application. When the text is displayed using a rich textbox or textbox with the font set to "Gautami" (one of the fonts that support the Telugu language), the characters are broken. A single letter that is supposed to be displayed on screen is broken into two characters. What could be the proble...

SSIS package having problem with datename(dw,datum) converting to varchar

I am moving data into a DW using SQL Server SSIS and have the following SQL to populate one dimension: SELECT DISTINCT cast (datename(dw,datum) as varchar(10)) as veckodag FROM XXXXX.dbo.Bought; as I have VARCHAR in the target column, I need to CAST/CONVERT. Question: how to convert Unicode to Varchar? I get: Validation error. Datu...