questions about encoding | ansaurus

encoding

Win32 Ruby 1.9 regexp and Cyrillic string

#coding: utf-8
str2 = "asdfМикимаус"
p str2.encoding             #<Encoding:UTF-8>
p str2.scan /\p{Cyrillic}/  # finds all Cyrillic characters
str2.gsub!(/\w/u, '')       # removes only Latin characters
puts str2
The question is: why does \w ignore Cyrillic characters? I have installed the latest Ruby package from http://rubyinstaller.org/. Here is my output of r...

MySQL German accent-insensitive search in full-text searches

Let's take an example hotels table: CREATE TABLE `hotels` ( `HotelNo` varchar(4) character set latin1 NOT NULL default '0000', `Hotel` varchar(80) character set latin1 NOT NULL default '', `City` varchar(100) character set latin1 default NULL, `CityFR` varchar(100) character set latin1 default NULL, `Region` varchar(50) charact...

What is winansi?

I can't find a wiki page or anything :(. It's an encoding like Unicode, right? So it has its own mapping of code points to characters? ...
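The Python sketch below assumes "WinAnsi" means Windows code page 1252. That is the table PDF's WinAnsiEncoding is based on. It is not a form of Unicode. It is a single-byte table: each of the 256 byte values maps to at most one character. Most of the range 0x80 to 0x9F holds printable characters such as € and curly quotes, where ISO-8859-1 has control codes. The function names are chosen here for illustration.

```python
# Assumption: "WinAnsi" is treated as Windows code page 1252 (Python: "cp1252").
def winansi_encode(ch: str) -> bytes:
    """Single cp1252 byte for a character; raises UnicodeEncodeError if the
    character has no slot in the 256-entry table (e.g. Cyrillic letters)."""
    return ch.encode("cp1252")

def winansi_decode(byte_value: int) -> str:
    """Character stored at a given cp1252 byte value."""
    return bytes([byte_value]).decode("cp1252")

print(winansi_encode("\u20ac"))        # b'\x80'  (euro sign)
print(hex(ord(winansi_decode(0x93))))  # 0x201c   (left double quotation mark)
print(winansi_encode("\u00e9"))        # b'\xe9'  (same slot as in ISO-8859-1)
```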

Charset conversion from XXX to utf-8, command line

I have a bunch of text files that are encoded in ISO-8859-2 (they contain some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert them to a saner UTF-8? ...
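On Linux and macOS the usual tool is iconv, for example `iconv -f ISO-8859-2 -t UTF-8 in.txt > out.txt`. If you prefer a script, the same conversion is a decode followed by an encode. The minimal Python sketch below assumes the input really is ISO-8859-2. A wrong guess produces wrong letters rather than an error, because every byte value decodes in that encoding. The script and function names are hypothetical.

```python
import sys
from pathlib import Path

def latin2_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-2 (Latin-2) bytes as UTF-8."""
    return data.decode("iso-8859-2").encode("utf-8")

def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of an ISO-8859-2 file."""
    dst.write_bytes(latin2_to_utf8(src.read_bytes()))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python latin2_to_utf8.py input.txt output.txt
    convert_file(Path(sys.argv[1]), Path(sys.argv[2]))
```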

asp.net mvc encode on form post

Hello, I'm using a rich text editor (NicEdit with a textarea) in my ASP.NET MVC form. When I submit the form, the content is not HTML-encoded, so I get the following message: "A potentially dangerous Request.Form value was detected from the client". How can I HTML-encode the textarea on post? I don't want to cancel the validation...

How to determine if a character is a chinese character

How can I determine whether a character is a Chinese character using Ruby? ...
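One common approach is to test the code point against the CJK Unified Ideographs block (U+4E00 to U+9FFF). Ruby 1.9 regexps also accept script properties such as `\p{Han}`, analogous to the `\p{Cyrillic}` used in the first question on this page. A minimal Python sketch of the range check follows. The function name is chosen here.

These ideographs are shared with Japanese and Korean, so the check detects Han characters, not Chinese text as such.

```python
# Covers only the main CJK Unified Ideographs block. Extension blocks
# (e.g. U+3400-U+4DBF, U+20000 and above) and compatibility ideographs
# are deliberately left out of this sketch.
CJK_UNIFIED_START = 0x4E00
CJK_UNIFIED_END = 0x9FFF

def is_han_char(ch: str) -> bool:
    """True if the single character lies in U+4E00..U+9FFF."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return CJK_UNIFIED_START <= ord(ch) <= CJK_UNIFIED_END

print(is_han_char("\u4e2d"))  # True  (the character for "middle")
print(is_han_char("a"))       # False
```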

Python BOM error in ASCII file

I have a weird, annoying problem with Python 2.6. I'm trying to run this file (and the other one) on my Embedded Linux ARM board: http://svn.tuxisalive.com/software_suite_v3/smart-core/smart-server/trunk/TDSService.py. I get this error: File "tuxhttpserver.py", line 1 SyntaxError: encoding problem: with BOM. I know that error is about ...
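A frequent cause of "encoding problem: with BOM" is a file that starts with a UTF-8 byte-order mark (EF BB BF) while its coding declaration names another encoding. Whether that applies to this particular file is an assumption. The minimal Python sketch below strips the BOM. The helper names are hypothetical.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file without its BOM; return True if one was removed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
        return True
    return False

print(strip_utf8_bom(b"\xef\xbb\xbfimport os"))  # b'import os'
```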

Encoding issue with form and HTML Purifier / MySQL

Driving me nuts... The page with the form is declared as Unicode (UTF-8) via: <meta http-equiv="content-type" content="text/html; charset=utf-8">. The entry column in the database is text with utf8_unicode_ci. Copying text with curly quotes from a Word document, like this: “1922.”, fails instantly, and the text ends up in the database as â��1922.â�� (typing new data into the...
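A classic cause of text like this is UTF-8 bytes being decoded as Windows-1252 somewhere between the form and the database. A connection charset that isn't UTF-8 is one example. This is an assumption, not a diagnosis of this setup. The Python sketch below reproduces that pattern and its reversal. The function names are hypothetical.

```python
def utf8_read_as_cp1252(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo that specific mistake (only valid if it is what happened)."""
    return garbled.encode("cp1252").decode("utf-8")

garbled = utf8_read_as_cp1252("\u201c1922.")  # left curly quote + "1922."
print(ascii(garbled))                          # '\xe2\u20ac\u01531922.'
```

One limitation: Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. The closing quote ” (E2 80 9D in UTF-8) therefore raises UnicodeDecodeError in `utf8_read_as_cp1252` instead of producing mojibake.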

how could I store data within a GUID

I have an application where I want to represent a user's session (just small pieces of data here and there) within a GUID. A GUID is a 16-byte value written as 32 hex characters (so 2^128 possible values), and I want to 'encode' some data within it. How can I achieve this? I am really after any ideas and implementations here; I've not yet decided on the bes...
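A GUID is just 128 bits, so one option is to pack fixed-width fields into 16 bytes and wrap them with Python's `uuid.UUID(bytes=...)`. The field layout below is hypothetical: a user id, a timestamp and spare bits. The result is not a valid random (version 4) UUID. Anyone holding it can also decode the fields, so anything sensitive should be signed or encrypted rather than merely packed. Windows/.NET GUIDs store the first fields in a different byte order, which Python exposes as `bytes_le`.

```python
import struct
import uuid
from typing import Tuple

# Hypothetical layout (not from the question): 4-byte user id,
# 4-byte Unix timestamp, 8 spare bytes; big-endian, no padding.
SESSION_LAYOUT = struct.Struct(">IIQ")  # 4 + 4 + 8 = 16 bytes = 128 bits

def pack_session(user_id: int, issued_at: int, extra: int = 0) -> uuid.UUID:
    """Pack the fields into the 16 bytes of a UUID/GUID."""
    return uuid.UUID(bytes=SESSION_LAYOUT.pack(user_id, issued_at, extra))

def unpack_session(guid: uuid.UUID) -> Tuple[int, int, int]:
    """Recover the fields; anyone holding the GUID can do the same."""
    return SESSION_LAYOUT.unpack(guid.bytes)

session = pack_session(42, 1700000000)
print(unpack_session(session))  # (42, 1700000000, 0)
```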

How to change the stdin encoding in Python

Hi, I'm using Windows and Linux machines for the same project. The default encoding for stdin on Windows is cp1252, and on Linux it is UTF-8. I would like to change everything to UTF-8. Is it possible? How can I do it? Thanks, Eduardo ...
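In Python 3.7 and later, `sys.stdin.reconfigure(encoding="utf-8")` changes the encoding in place. Setting the `PYTHONIOENCODING=utf-8` environment variable does the same for all standard streams. A Python 3 sketch that works without either wraps the underlying byte stream explicitly. For stdin you would pass `sys.stdin.buffer`. The helper name is hypothetical.

```python
import io
from typing import BinaryIO

def utf8_text_stream(binary: BinaryIO) -> io.TextIOWrapper:
    """Decode a byte stream as UTF-8, independent of the platform default
    (often cp1252 on Windows, usually UTF-8 on Linux)."""
    return io.TextIOWrapper(binary, encoding="utf-8")

# Typical use (not executed here):
#   import sys
#   sys.stdin = utf8_text_stream(sys.stdin.buffer)
```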

Dealing with ISO-encoding in AJAX requests (prototype)

I have an HTML page that's encoded in ISO-8859-1 and a Prototype Ajax call that's built like this: new Ajax.Request('api.jsp', { method: 'get', parameters: {...}, onSuccess: function(transport) { var ajaxResponse = transport.responseJSON; alert(ajaxResponse.msg); } }); The api.jsp returns its data in IS...

Watermarking Flash Videos (server-side)

Hi all, I have a bunch of Flash videos that I need to watermark with user-related information, to make illegal redistribution of these files harder. I'm wondering how this can be done server-side. If it is done client-side, it will be quite easy for the user to intercept the videos before they are watermarked. Since the watermark should co...

Should I convert overlong UTF-8 strings to their shortest normal form?

I've just been reworking my Encoding::FixLatin Perl module to handle overlong UTF-8 byte sequences and convert them to the shortest normal form. My question is quite simply "is this a bad idea"? A number of sources (including this RFC) suggest that any over-long UTF-8 should be treated as an error and rejected. They caution against "n...
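Strict decoders follow the RFC's advice. Python's UTF-8 codec, for example, rejects overlong forms outright rather than normalizing them. The commonly cited risk is that a filter looking for "/" (0x2F) can be bypassed by its overlong encoding C0 AF. The minimal validity check below shows that strict behaviour. The function name is chosen here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict check: Python's UTF-8 codec rejects overlong forms,
    encoded surrogates and code points above U+10FFFF."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8(b"/"))             # True
print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong 2-byte "/"
print(is_valid_utf8(b"\xe0\x80\xaf"))  # False: overlong 3-byte "/"
```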

Special character in "entrée" cannot be displayed correctly if defined in a separate javascript file

Example: The following string is defined in a json.js file. var test = "One complimentary entrée with the purchase of an entrée."; It is included in an index.html file via <script type="text/JavaScript" src="./json.js"></script>. When the string is displayed in the UI, it shows up as "One complimentary entr�e with the purchase of an...
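A likely cause is that json.js was saved as ISO-8859-1 or Windows-1252 while the browser decodes it as UTF-8. This is an assumption, since the file's encoding isn't stated. The Python sketch below reproduces that mismatch. Typical fixes are:

- saving the file as UTF-8;
- declaring its real encoding, for example in the server's Content-Type header or the script element's charset attribute;
- writing the character as the ASCII-safe JavaScript escape `entr\u00e9e`.

```python
# Hypothetical reproduction: the string as it would be stored in a file
# saved as ISO-8859-1, where "é" is the single byte 0xE9.
stored = "entr\u00e9e".encode("latin-1")

# 0xE9 opens a 3-byte UTF-8 sequence, but the next byte ("e") is not a
# continuation byte, so a UTF-8 decoder substitutes U+FFFD (the replacement character).
displayed = stored.decode("utf-8", errors="replace")

print(stored)            # b'entr\xe9e'
print(ascii(displayed))  # 'entr\ufffde'
```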

Resources for character and text processing (encoding, regular expressions, NLP)

I'd like to learn the foundations of encodings, characters and text. Understanding these is important for dealing with a large body of text, whether that's log files or source text for building algorithms for collective intelligence. My current knowledge is pretty basic: something like "As long as I use UTF-8, I'm okay." I don't say I need ...

PHP utf encoding problem

How can I encode strings in UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. I also need to encode Arabic characters in this format. ...
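UTF-16BE stores each character in the Basic Multilingual Plane as two bytes, most significant first. Latin and Arabic letters both fall in that range, so each character becomes four hex digits. The minimal Python sketch below does the conversion. In PHP the equivalent is likely `strtoupper(bin2hex(mb_convert_encoding($str, 'UTF-16BE', 'UTF-8')))`, assuming the input string is UTF-8.

```python
def utf16be_hex(text: str) -> str:
    """Encode text as UTF-16BE and return the bytes as uppercase hex."""
    return text.encode("utf-16-be").hex().upper()

print(utf16be_hex("Demo"))    # 00440065006D006F
print(utf16be_hex("\u0645"))  # 0645 (Arabic letter meem)
```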

Can't get data with spaces into the database from Ajax POST request

I have a really simple form with a textbox and a button, and my goal is to have an asynchronous request (jQuery: $.ajax) send the text to the server (PHP/MySQL à la Joomla) so that it can be added to a database table. Here's the JavaScript that sends the data from the client: var value= $('#myvalue').val(); $.ajax( { type: ...

How to convert Unicode to a printable string in a Qt stream

I'm writing a stream to a file and stdout, but I'm getting some kind of encoding like this: \u05ea\u05e7\u05dc\u05d9\u05d8 \u05e9\u05e1\u05d9\u05de\u05dc \u05e9\u05d9\u05e0\u05d5\u05d9 \u05d1\u05e1\u05d2\u05e0\u05d5\u05df \u05dc\u05d3\u05e2\u05ea\u05d9 \u05d0\u05dd \u05d0\u05e0\u05d9 \u05d6\u05d5\u05db\u05e8 \u05e0\u05d...
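The escapes are Hebrew code points (the Hebrew block is U+0590 to U+05FF), so the data itself is intact. Something on the output path is writing escape sequences instead of encoded characters. In Qt the usual remedy is to write through a QTextStream whose codec is set to UTF-8, e.g. `setCodec("UTF-8")` in Qt 4/5. Where the escaping actually happens here is an assumption. The Python sketch below only shows what the escapes decode to. The function name is hypothetical.

```python
def decode_u_escapes(escaped: str) -> str:
    """Turn literal \\uXXXX escapes in ASCII-only input into characters."""
    return escaped.encode("ascii").decode("unicode_escape")

print(ascii(decode_u_escapes("\\u05ea\\u05e7")))  # '\u05ea\u05e7'
```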

Rails 2.3.5, Ruby 1.9, SQLite 3 incompatible character encodings: UTF-8 and ASCII-8BIT

Hello, I know that a question with the same title was asked almost 6 months ago. I have Googled this problem and have not found any working solution. Have there been any fixes for this very critical problem? I need to get my website running ASAP. Just to get the site up and running, I'm even ready to add utf8 conversion methods to ...

Django db encoding

Hey, I have a little problem with encoding. The data in the DB is fine, and when I select it in PHP it's OK. The problem comes when I fetch the data and try to print it in the template: I get "Å port" instead of "Šport", etc. Everything is set to UTF-8: settings.py, the meta tags in the template, the DB table, and I even have a unicode method specified for th...
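"Å port" for "Šport" fits a specific pattern. Š (U+0160) is the two bytes C5 A0 in UTF-8, and decoding those as Latin-1 gives "Å" followed by a no-break space. That suggests, though it does not prove, that UTF-8 bytes are being read as Latin-1 somewhere between the database and the template. A connection charset setting is a common culprit. The Python sketch below reproduces the effect. The function names are hypothetical.

```python
def utf8_read_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")

def undo_latin1_misread(garbled: str) -> str:
    """Reverse that mistake, if it is indeed what happened."""
    return garbled.encode("latin-1").decode("utf-8")

garbled = utf8_read_as_latin1("\u0160port")  # "Šport"
print(ascii(garbled))                        # '\xc5\xa0port'
```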
