questions about unicode | ansaurus

unicode

PHP detecting filesystem encoding

Hi guys, I need to save files with non-latin filenames on a filesystem, using PHP. I want to make this work cross-platform. How do I know what encoding I can use to write the file? I understand many modern filesystems are UTF-8 based (is this correct?), but I doubt Windows XP is (for instance). So, is there a robust detection mechanism...

DjangoUnicodeDecodeError while storing pickle'd data.

I've got a simple dict object I'm trying to store in the database after it has been run through pickle. It seems that Django doesn't like trying to encode this error. I've checked with MySQL, and the query isn't even getting there before it is throwing the error, so I don't believe that is the problem. The dict I'm storing looks like ...

Does Perl's Net::Cassandra module support UTF-8?

I've run into a really strange UTF-8 problem with Net::Cassandra::Easy (which is built upon Net::Cassandra): UTF-8 strings written to Cassandra are garbled upon retrieval. The following code shows the problem: use strict; use utf8; use warnings; use Net::Cassandra::Easy; binmode(STDOUT, ":utf8"); my $key = "some_key"; my $column = "s...

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string". It was very helpful, but I still don't quite understand how to apply all of that infor...

java: can I convert strings to byte arrays, without a BOM?

Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c....
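
A minimal sketch of the BOM behaviour this question describes, written in Python for illustration only (the question itself is about Java). It relies on Python's generic "utf-16" codec prepending a byte-order mark, while the endian-specific "utf-16-be" codec does not; Java's "UTF-16BE" charset name plays the analogous role. The helper name encode_without_bom is hypothetical.

```python
import codecs


def encode_without_bom(text: str) -> bytes:
    """Encode text as big-endian UTF-16 with no byte-order mark."""
    return text.encode("utf-16-be")


text = "[Hello StackOverflow]"

# The generic codec writes a BOM first (byte order depends on the platform).
with_bom = text.encode("utf-16")

# Naming the byte order explicitly suppresses the BOM.
without_bom = encode_without_bom(text)

print(with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))
print(without_bom[:4].hex())  # 005b0048
```

Choosing an explicit byte order is the design point here: the BOM exists only to tell a reader which order was used, so an encoding name that already fixes the order has no need for one.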

Help me with XOR encryption in C#

I wrote this code in C# to encrypt a text with a key: using System; using System.Linq; using System.Collections.Generic; using System.Text; namespace ENCRYPT { class XORENC { private static int Bin2Dec(string num) { int _num = 0; for (int i = 0; i < num.Length; i++) { ...

two-way-encryption

Java Unicode encoding

A Java char is 2 bytes (max size of 65,536) but there are 95,221 Unicode characters. Does this mean that you can't handle certain Unicode characters in a Java application? Does this boil down to what character encoding you are using? ...

character-encoding
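
A rough sketch related to the question above, in Python for illustration only. A Java char holds one UTF-16 code unit, so a code point above U+FFFF is represented as two code units (a surrogate pair) rather than being unrepresentable. The helper name utf16_code_units and the sample characters are assumptions chosen for the example.

```python
from typing import List


def utf16_code_units(ch: str) -> List[int]:
    """Return the 16-bit UTF-16 code units used to encode ch."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "A"                # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside it

print([hex(u) for u in utf16_code_units(bmp_char)])       # ['0x41']
print([hex(u) for u in utf16_code_units(supplementary)])  # ['0xd834', '0xdd1e']
```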

Understanding character encoding in typical Java web app

Some pseudocode: String a = "A bunch of text"; //UTF-16 saveTextInDb(a); //Write to Oracle VARCHAR(15) column String b = readTextFromDb(); //UTF-16 out.write(b); //Write to http response When you save the Java String (UTF-16) to Oracle VARCHAR(15) does Oracle also store this as UTF-16? Does the length of an Oracle VARCHAR refer to nu...

character-encoding

Regex Not Matching Unicode

How would I go about using Regex to match Unicode strings? I'm loading in a couple keywords from a text file and using them with Regex on another file. The keywords both contain unicode (such as á, etc). I'm not sure where the problem is. Is there some option I have to set? Code: foreach (string currWord in _keywordList) { Match...

Why do code files containing unicode string constants saved as UTF8 + BOM display correctly but when saved as UTF8 they do not in classic ASP?

I have a code file which I will refer to as "myConstants.res.asp" with a bunch of constants in both English and French... <% const myStr1 = "Bienvenue dans ma maison au moment de cette belle journée de repos et de détente" const myStr2 = "Welcome to my house at this beautiful day of rest and relaxation" ... more constants ... %>...

how to write unicode hello world in C on windows

I'm trying to get this to work: #define UNICODE #define _UNICODE #include <wchar.h> int main() { wprintf(L"Hello World!\n"); wprintf(L"£안, 蠀, ☃!\n"); return 0; } using Visual Studio 2008 Express (on Windows XP, if it matters). When I run this from the command prompt (started as cmd /u which is supposed to enable unicode ?...

visual-studio-2008

PHP: Cyrillic characters not displayed correctly

Recently I switched hosting from one provider to the other and I have problems displaying Cyrillic characters. The characters which are read from the database are displayed correctly, but characters which are hardcoded in the php file aren't (they are displayed as question marks). The files which contain the php source code are saved in...

Unicode tooltips not showing up.

Hi, I am trying to display unicode tooltips in my application window, however they do not seem to display. Non-unicode text shows up correctly but as soon as I try doing unicode no tooltip shows up. The following is what I am currently doing, any help is appreciated thank you. HWND parentHwnd = pickInfo->getViewer().getCachedHwnd...

ajax(search suggest) funny character problem

In an AJAX search-suggest feature, if I enter a funny character (like Ô) and submit it, "?" is displayed in *.asp (response.write(request.form("str"))). I am using xmlhttp.open("post", "*****.asp", true); xmlhttp.setRequestHeader('Content-type','application/x-www-form-urlencoded; charset=UTF-8'); xmlhttp.send("str="+escape($("str").value)); and the...

Why is python decode replacing more than the invalid bytes from an encoded string?

Trying to decode an invalid encoded utf-8 html page gives different results in python, firefox and chrome. The invalid encoded fragment from test page looks like 'PREFIX\xe3\xabSUFFIX' >>> fragment = 'PREFIX\xe3\xabSUFFIX' >>> fragment.decode('utf-8', 'strict') ... UnicodeDecodeError: 'utf8' codec can't decode bytes in position 6-8: in...

screen-scraping
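
A small sketch of how the error handlers treat the fragment from the question above. Note the assumption: this uses Python 3 bytes and str, whereas the excerpt is Python 2, so the exact number of bytes replaced may differ from what the asker saw. The helper name try_strict is hypothetical.

```python
fragment = b"PREFIX\xe3\xabSUFFIX"


def try_strict(data: bytes):
    """Return the decoded text, or the exception if the bytes are invalid."""
    try:
        return data.decode("utf-8", "strict")
    except UnicodeDecodeError as exc:
        return exc


strict_result = try_strict(fragment)               # a UnicodeDecodeError instance
replaced = fragment.decode("utf-8", "replace")     # invalid bytes become U+FFFD
ignored = fragment.decode("utf-8", "ignore")       # invalid bytes are dropped

print(type(strict_result).__name__)
print(repr(replaced))
print(repr(ignored))
```

Comparing "replace" against "ignore" on the same input is a quick way to see exactly which bytes a given decoder treats as part of the invalid sequence.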

Python unicode issues (2.6)

I'm currently working on an IRC bot for a multi-lingual channel, and I'm encountering some issues with unicode which are proving nearly impossible to solve. No matter what configuration of unicode encoding I seem to try, the list function which the below code sits within just flat out does nothing (c.notice is a class function which sen...

How to concatenate two unicode characters in DotNet and not have any space?

When I concatenate the following two unicode characters I see both but there is a space between them. Is there any way to get rid of this space? StringBuilder sb = new StringBuilder(); int characterCode; characterCode = Convert.ToInt32("2758", 16); sb.Append((char)characterCode); characterCode = Convert.ToInt32("25c4", 16); sb.App...

string-concatenation

How can I copy files with names containing spaces and UNICODE, when using a shell script?

I have a list of files that I'm trying to copy and move (using cp and mv) in a bash shell script. The problem that I'm running into is that I can't get either command to recognize a huge number of files, seemingly because the filenames contain spaces and/or unicode characters. I couldn't find any switches to decode/re-encode these cha...

\w in PHP preg_replace covers only second byte of UTF-8 chars

We have this code: $value = preg_replace("/[^\w]/", '', $value); where $value is in UTF-8. After this transformation, the first byte of multibyte characters is stripped. How can I make \w cover UTF-8 chars completely? Sorry, I am not very good at PHP ...

Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in th...

character-encoding

text-processing
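
One possible heuristic for the mixed-encoding question above, sketched in Python for illustration only (the question asks about Perl). It assumes each line is entirely in one encoding: a line that decodes cleanly as UTF-8 is taken as UTF-8, and anything else is treated as ISO-8859-1. The function name to_utf8 is hypothetical, and a Latin-1 line whose bytes happen to form valid UTF-8 would be misclassified.

```python
def to_utf8(junk: bytes) -> bytes:
    """Re-encode a byte string of mixed UTF-8 / ISO-8859-1 lines as UTF-8."""
    decoded_lines = []
    for line in junk.split(b"\n"):
        try:
            # Strict decoding fails on byte sequences that are not valid UTF-8.
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # Fallback: ISO-8859-1 maps every byte to a character, so this never fails.
            text = line.decode("iso-8859-1")
        decoded_lines.append(text)
    return "\n".join(decoded_lines).encode("utf-8")


mixed = "caf\u00e9".encode("utf-8") + b"\n" + "caf\u00e9".encode("iso-8859-1")
print(to_utf8(mixed).decode("utf-8"))
```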
