questions about unicode | ansaurus

unicode

Delphi SMTP component that supports UTF-8 or Unicode

It appears the Indy 10 SMTP component shipped with Delphi 2009 does not properly support Unicode in the subject and body. Does anyone know of a good alternative, or has anyone made the necessary changes to Indy 10 to solve this issue? Update: Thanks for all the answers, I've done a bit of investigation, and thought I might be able to so...

UTF8, UTF16, and UTF32

What are the differences between UTF8, UTF16, and UTF32? I understand that all three can store Unicode and that they differ in how they store the characters, but is there an advantage to choosing one over the other? ...
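As a rough illustration of the storage difference this question asks about, the sketch below compares how many bytes the same text occupies in each encoding. The helper name `encoded_sizes` is hypothetical, and the little-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8, UTF-16 and UTF-32.

    The "-le" codecs are chosen so Python does not prepend a byte order
    mark, which would otherwise inflate the UTF-16 and UTF-32 counts.
    """
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


if __name__ == "__main__":
    # An ASCII letter, a BMP symbol, and a character outside the BMP.
    for sample in ("A", "€", "\U0001F600"):
        print(repr(sample), encoded_sizes(sample))
```

UTF-8 and UTF-16 are variable-width, so their sizes depend on the characters involved, while UTF-32 always uses four bytes per code point.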

Delphi 2009 RawByteString vagaries

Suppose that for some perverse reason you want to display the raw byte contents of a UTF8String. var utf8Str : UTF8String; begin utf8Str := '€ąćęłńóśźż'; end; (1) This doesn't do, it displays the readable form: memo1.Lines.Add( RawByteString( utf8Str )); // output: '€ąćęłńóśźż' (2) This, however, does "work" - note the conc...

python 3.0, how to make print() output unicode?

I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program. print('sys.stdout encoding is "' + sys.std...

Is there an STL and UTF-8 friendly C++ Wrapper for ICU, or other powerful Unicode library

I need a good Unicode library for C++. I need: Transformations in a Unicode sensitive way. For example sort all strings in a case insensitive way and get their first characters for index. Convert various Unicode strings to upper and to lower case. Split text at a reasonable position -- words that would work for Chinese and Japanese as ...

Create SecureString from unmanaged unicode string

I want to tie the CryptUnprotectData Windows API function and the .NET SecureString together in the best way possible. CryptUnprotectData returns a DATA_BLOB structure consisting of an array of bytes and a byte length. In my program this will be a Unicode UTF-16 string. SecureString has a constructor which takes a char* and ...

How do I read Unicode-16 strings from a file using POSIX methods in Linux?

I have a file containing UNICODE-16 strings that I would like to read into a Linux program. The strings were written raw from Windows' internal WCHAR format. (Does Windows always use UTF-16? e.g. in Japanese versions) I believe that I can read them using raw reads and then converting with wcstombs_l. However, I cannot figure out what locale ...

Unix vs. Windows rendering of characters

I have a text file that displays differently when opened in FreeBSD vs. Windows. On FreeBSD: AnÂ·lisis e InvestigaciÃ›n On Windows: Análisis e Investigación The Windows representation is obviously right. Any ideas on how to get that result in BSD? ...

character-encoding

What is the best way to remove accents in a python unicode string?

I have a Unicode string in Python, and I would like to remove all the accents (diacritics). I found on the Web an elegant way to do this in Java: convert the Unicode string to its long normalized form (with a separate character for letters and diacritics), then remove all the characters whose Unicode type is "diacritic". Do I need to inst...
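The two steps described in this question (decompose, then drop the diacritic characters) can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, not necessarily the approach the original thread settled on: the function name `strip_accents` is hypothetical, and "diacritic" is interpreted here as the Unicode general category `Mn` (nonspacing mark).

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritics from `text`.

    Assumption: "diacritic" means general category "Mn" (nonspacing mark).
    """
    # NFD splits each precomposed letter into a base letter followed by
    # its combining marks, e.g. "á" becomes "a" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(strip_accents("Análisis e Investigación"))
```

Letters that have no canonical decomposition, such as "ł", pass through unchanged, so this approach does not reduce every accented-looking letter to ASCII.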

How do I read UTF-8 with diamond operator (<>)?

I want to read UTF-8 input in Perl, no matter if it comes from the standard input or from a file, using the diamond operator: while(<>){...}. So my script should be callable in these two ways, as usual, giving the same output: ./script.pl utf8.txt cat utf8.txt | ./script.pl But the outputs differ! Only the second call (using cat) see...

script to save file as unicode

Do you know any way that I could programmatically or via script transform a set of text files saved in ANSI character encoding to Unicode encoding? I would like to do the same as I do when I open the file with Notepad and choose to save it as a Unicode file. ...

Tool to convert code source from a codepage to UTF-8?

I'm working on an open source project. The original project contains comments in Russian and is using codepage 1251. I'm using codepage 1252 and the Russian comments aren't displayed correctly in Visual Studio Express 2008. Not nice, but I can't read Russian anyway. Someone using codepage 950 (traditional Chinese) tried to compile the pro...

C programming: How to program for Unicode?

What prerequisites are needed to do strict Unicode programming? Does this imply that my code should not use char types anywhere and that functions need to be used that can deal with wint_t and wchar_t? And what is the role played by multibyte character sequences in this scenario? ...

character-encoding

Displaying unicode text in Rave Reports on Delphi 2009

I am in the process of porting a Delphi 2006 app to Delphi 2009. Out-of-the-box support for Unicode has been easy, with almost no work required. Most 3rd party controls already have Delphi 2009 updates available. Rave Reports (latest version 7.6.1, available here) has also been updated, but I cannot seem to get it to correctly display RTF t...

What is Codepage 0?

I'm using the Delphi function StringCodePage. I call it on a string returned by a COM function (Acrobat Annotation getContents - see my other posts) and it returns 0. What is 0? ANSI? Thanks ...

Looking for a PDF file parser.

Does anyone know of a PDF file parser that I could use to pull out sections of text from a plaintext PDF file? Specifically, I want a way to reliably pull out the section of text specific to annotations. Delphi, C#, RegEx, I don't mind. ...

How do you reference unicode characters in Coldfusion regex?

I'm trying to match this character ’ which I can type with alt-0146. Word tells me it's Unicode 0x2019 but I can't seem to match it using regular expressions in ColdFusion. Here's a snippet I'm using to match between 2 and 10 letters and apostrophes and this character [[:alpha:]'\x2019]{2,10} but it's not working. Any ideas? ...

Can CSS choose a different default font and size depending on Language

I have the following CSS fragment: INPUT{ font-family: Raavi; font-size: 14px;} Which works fine when the textbox contains some Punjabi script like this: ਪੰਜਾਬੀ But the user might enter English instead, and I would rather use the Verdana font with a different size, since the English letters in the Raavi font are real funky and the si...

Unicode Basics on Windows

Hello, I have a C++ library which I deliver to other developers. One of them needs i18n, so he asked me if I could add the L prefix to the strings in the API. I don't know much about i18n so I have some basic questions: When I compile my lib with Unicode, can other developers use this build as usual? Or shall developers also change the...

Fixing Unicode Byte Sequences

Sometimes when copying stuff into PostgreSQL I get errors that there are invalid byte sequences. Is there an easy way using either vim or other utilities to detect byte sequences that cause errors such as: invalid byte sequence for encoding "UTF8": 0xde70 and whatnot, and possibly an easy way to do a conversion? Edit: What my w...
