Does a browser-based application that intends to display and capture data only in English need to have a UTF-8 database?

Is there any problem if the site is accessed on a Japanese-language operating system? If the user types only in English, do we need to take any extra care? If the user types in Japanese, how can the system detect that and throw an error?

The website will be developed in .NET 3.5.

EDIT:

I don't want to capture Japanese or any other language. The site will be entirely in English, and users should enter information in English as well. Displaying English characters on a Japanese OS is not a problem either. The problem is this: if a user on a Japanese OS types Japanese characters into a textbox, how can I detect that and show the user an error? Secondly, would that user still be able to type English characters into the textbox?

+1  A: 

Well, you could check for non-English characters easily enough (with a regular expression, I suppose), though I don't see why you would. But you could do that.
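The regex check suggested above could look something like the following. This is a minimal sketch in Python for illustration (the site itself is .NET, where `System.Text.RegularExpressions.Regex` with the same pattern would apply); the function name and the choice of "printable ASCII" as the allowed range are assumptions:

```python
import re

# Assumed definition of "English" input: printable ASCII (U+0020 to U+007E).
# Anything outside this range (Japanese text, full-width Latin, etc.) is rejected.
NON_ASCII = re.compile(r"[^\x20-\x7E]")

def contains_non_english(text):
    """Return True if text contains any character outside printable ASCII."""
    return bool(NON_ASCII.search(text))
```

Note that this would also flag full-width Latin letters typed via a Japanese IME, which may or may not be what you want (see the last answer below).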

I also don't really see a good reason not to use NVARCHAR for user-supplied text fields. Requirements often change.

Noon Silk
+1  A: 

It's always easier to build multibyte character-set support into an application from the beginning than to retrofit it later.

In addition to having to revisit all the code, you'll run into errors converting your existing database to Unicode, and you may find that there's no good way to determine which character set a given piece of data was originally encoded in.

Richard Pistole
+2  A: 

I don't think there are any strong reasons not to use UTF-8. You never know where strange characters may leak in.

Any incoming data should be processed and re-encoded. With HTML forms you can supply the following tag:

<input type="hidden" name="_charset_" value="" />

All browsers should populate this field with the charset used for the form submission; you can then use it to decode and re-encode the input.
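Server-side, the reported charset can then be used to decode the raw bytes before storing everything uniformly as UTF-8. A minimal sketch in Python (the function name and the fallback-to-UTF-8 behavior are assumptions; the framework-specific form handling is omitted):

```python
def normalize_to_utf8(raw_bytes, reported_charset):
    """Decode form bytes using the charset the browser reported in the
    hidden _charset_ field, then re-encode uniformly as UTF-8."""
    text = raw_bytes.decode(reported_charset or "utf-8", errors="replace")
    return text.encode("utf-8")
```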

Also, if you haven't read it, read Joel's post on Unicode: http://www.joelonsoftware.com/articles/Unicode.html

monkut
+3  A: 

Japanese fonts and input methods have two versions of the 'English' characters in Unicode: the normal-width ones and the 'wide'/full-width ones (which are useful when text is printed top-to-bottom rather than left-to-right). Be careful how you attempt to 'filter out' non-English characters: if you raise an error for example 2 below, your users will be very confused!

1) correctly encoded

2) ｃｏｒｒｅｃｔｌｙ　ｅｎｃｏｄｅｄ

The second line is NOT a different font or 'encoding': those are the additional fixed-width copies of our alphabet (the full-width forms) that align nicely within blocks of hiragana/katakana/kanji (Japanese writing).
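One way to avoid rejecting full-width Latin input outright is to fold it back to ordinary ASCII before validating. Unicode NFKC normalization does exactly this mapping; a sketch in Python for illustration (in .NET, `string.Normalize(NormalizationForm.FormKC)` is the equivalent):

```python
import unicodedata

def normalize_width(text):
    """NFKC normalization folds full-width Latin letters and the
    ideographic space back to their ordinary ASCII equivalents."""
    return unicodedata.normalize("NFKC", text)
```

Normalizing first, then validating, means a user who typed English words through a Japanese IME in full-width mode is corrected rather than shown an error.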

I would definitely consider UTF-8 encoding and NCHAR/NVARCHAR in the database.

CraigD