questions about character-encoding | ansaurus

character-encoding

MySQL diacritic insensitive search (spanish accents)

Hello, I have a MySQL database with words containing accents in Spanish (áéíóú). I'd like to know if there's any way to do a diacritic-insensitive search. For instance, if I search for "lapiz" (without the accent), I'd like to get results containing the word "lápiz" from my db. The way I'm currently doing the query is as follows: $result =...

character-encoding

Display problem with Japanese characters

I am fetching a Japanese string from an Oracle database and displaying it in the browser, but the characters are shown as ???. I inserted the Japanese string into the DB using the unistr() function: INSERT INTO MESSAGES (MESSAGE_ID,MESSAGE) VALUES (1,unistr('\0041\0063\0063\0065\0073\0073\0020\004d\0061\006e\0061\0067\0065\006d\...

web-development

character-encoding

What encoding is this, and better yet, how do I decode it in Ruby?

@string = "\x16\x03\x01\x00\x91\x01\x00\x00\x8D\x03\x01LI.\e\x8F|\x06\f\xA2Tu\xC8WW\xCF\x87G2O,98\xEC\xADMM H\xB4\x0E-G\x00\x00H\xC0\n\xC0\x14\x00\x88\x00\x87\x009\x008\xC0\x0F\xC0\x05\x00\x84\x005\xC0\a\xC0\t\xC0\x11\xC0\x13\x00E\x00D\x00f\x003\x002\xC0\f\xC0\x0E\xC0\x02\xC0\x04\x00\x96\x00A\x00\x04\x00\x05\x00/\xC0\b\xC0\x12\x00\x16\x0...

character-encoding
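
The leading bytes of the string above look less like a text encoding than like a TLS record header: 0x16 is the handshake content type, 0x03 0x01 is the version field used by TLS 1.0, and the next two bytes are a big-endian length. This is only a reading of the bytes visible in the excerpt, sketched in Python rather than Ruby; the helper names are hypothetical.

```python
import struct

# First nine bytes of the string quoted in the question.
data = b"\x16\x03\x01\x00\x91\x01\x00\x00\x8d"

def parse_record_header(buf):
    """Split a 5-byte record header into type, (major, minor) version, length."""
    content_type, major, minor, length = struct.unpack(">BBBH", buf[:5])
    return content_type, (major, minor), length

def parse_handshake_header(buf):
    """Split a 4-byte handshake header into message type and 24-bit length."""
    msg_type = buf[0]
    length = int.from_bytes(buf[1:4], "big")
    return msg_type, length

print(parse_record_header(data))         # (22, (3, 1), 145)
print(parse_handshake_header(data[5:]))  # (1, 141)
```

The two lengths agree (145 = 4 header bytes + 141), which is consistent with a handshake message of type 1 (ClientHello) rather than encoded text.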

Encoding errors in .jspx

I'm currently trying to deploy some RSS feeds on a WebLogic Application Server. The feeds' views are .jspx files, like the one below: <?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" xmlns:jsp="http://java.sun.com/JSP/Page" xmlns:c="http://java.sun....

character-encoding

Character detection in a text file in Python using the Universal Encoding Detector (chardet)

I am trying to use the Universal Encoding Detector (chardet) in Python to detect the most probable character encoding in a text file ('infile') and use that in further processing. While chardet is designed primarily for detecting the character encoding of webpages, I have found an example of it being used on individual text files. Howe...

character-encoding

Perl Text::CSV_XS Encoding Issues

I'm having issues with Unicode characters in Perl. When I receive data from the web, I often get characters like √¢¬Ä¬ú or √¢¬Ç¬¨. The first one is a quotation mark and the second is the Euro symbol. Now I can easily substitute in the correct values in Perl and print to the screen the corrected words, but when I try to output to ...

character-encoding
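
Character runs like the ones described above typically appear when UTF-8 bytes are decoded with a single-byte code page. A minimal sketch of that mechanism and its reversal follows, in Python rather than Perl, assuming the wrong code page was cp1252 (the excerpt does not say which one was involved); `repair_mojibake` is a hypothetical helper.

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo one round of 'UTF-8 bytes decoded as `wrong`'.

    Returns the input unchanged if it cannot be round-tripped.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# Reproduce the damage: encode the Euro sign as UTF-8, decode as cp1252.
garbled = "\u20ac".encode("utf-8").decode("cp1252")
print(garbled)                   # â‚¬
print(repair_mojibake(garbled))  # €
```

Fixing the decoding step at the point where the data is read is generally preferable to repairing strings afterwards; the round trip above is mainly useful for diagnosing which code page was applied.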

Certain UTF characters do not show up in browsers and make a Python script fail

Hi All, I generated a SQL script from a C# application on Windows 7. The name entries have UTF-8 characters. It works fine on a Windows machine, where I use a Python script to populate the db. Now the same script fails on Linux, complaining about those special characters. Similar things happened when I generated an XML file containing...

character-encoding

PHP website not displaying correctly in UTF-8

I have a website which has some non-standard characters such as ë, Ç etc. The website uses ISO-8859-1 as its character encoding; however, at this point I want to switch it to UTF-8 for some reasons related to RSS feeds. When I change the character encoding to UTF-8, the mentioned characters are displayed incorrectly. I set the charset ...

character-encoding
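
One likely cause of the symptom above is that only the declared charset changed while the stored bytes stayed ISO-8859-1. The sketch below, in Python rather than PHP, shows that the two encodings use different bytes for ë and Ç, and how existing content could be transcoded; `latin1_to_utf8` is a hypothetical helper, and whether the site's files or database rows are the part that needs converting is an assumption.

```python
def latin1_to_utf8(raw):
    """Transcode ISO-8859-1 bytes to UTF-8 bytes."""
    return raw.decode("iso-8859-1").encode("utf-8")

latin1_bytes = "\u00eb\u00c7".encode("iso-8859-1")  # the characters ë and Ç
utf8_bytes = latin1_to_utf8(latin1_bytes)

print(latin1_bytes)  # b'\xeb\xc7'
print(utf8_bytes)    # b'\xc3\xab\xc3\x87'

# The untouched ISO-8859-1 bytes are not valid UTF-8, which is why a browser
# told to expect UTF-8 renders them incorrectly.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```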

What's the difference between utf8_general_ci and utf8_unicode_ci in MySQL?

For a while now, I've used phpMyAdmin to manage my local MySQL databases. One thing I'm starting to pick up is the correct character sets for my database. I've decided UTF-8 is the best for compatibility (as my XHTML templates are served as UTF-8) but one thing that confuses me is the varied options for UTF-8 I'm presented with in the ph...

character-encoding

Encoding problems while displaying formatted text with php

I have formatted English text stored in MySQL. When I echo it out with PHP, I get a whole lot of "� � � � � ��" where the spaces should be. It looks fine in the DB. What's the reason for that? ...

character-encoding

Printing Turkish Characters in GUI

Hello, I have a Java project that connects to a C# program that prints Turkish words. Printing Turkish characters in C# using console is not causing any problems. However, the main issue is that when this C# program is called from Java, the Turkish characters are printed weirdly. What I would like to do is to get the output printed on c...

character-encoding

charset-aware tests like isalpha() etc. and iterators - is there such a thing?

I get a character string and the encoding charset, like iso_8859-1, utf-8 etc. I need to scan the string, tokenizing it into words, as I'd do using isspace() and ispunct(). Are there character test functions that take the charset into account? Also, are there iterators that advance the correct number of bytes? Note: I know I can convert the st...

character-encoding

Recognize the code page of an input string

How can I recognize the code page of an input string? For example, if I enter something in Cyrillic it should return windows-1251, and if I enter a string in Chinese it should return a different code page, etc. ...

character-encoding
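
Code-page detection can only ever be a guess, since the same bytes are often valid in several encodings. A minimal sketch of one common heuristic, trial decoding against an ordered candidate list, is shown below in Python; the candidate list and the `guess_encoding` name are assumptions, and a statistical detector would be needed to tell apart single-byte code pages that accept almost any input.

```python
# Ordered from strictest to most permissive; this order is an assumption.
CANDIDATES = ("ascii", "utf-8", "cp1251")

def guess_encoding(raw, candidates=CANDIDATES):
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

privet = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a Cyrillic word
print(guess_encoding(privet.encode("cp1251")))  # cp1251
print(guess_encoding(privet.encode("utf-8")))   # utf-8
print(guess_encoding(b"hello"))                 # ascii
```

Because cp1251 rejects almost no byte sequence, it works here only as a last-resort fallback; placing it earlier in the list would make it win for nearly every input.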

downgrade non-ASCII symbols to closest 7-bit ASCII equivalent (preferably Java)

Hello there, is there any simple/lightweight solution to change at least some non-ASCII symbols to their respective ASCII analogs? For example, the string abc-åäö.txt should be changed to abc-aao.txt. A bit of background: Zip tools do not reliably support UTF-8, hence the need to downgrade. AFAICR Google "download attachments as sin...

character-encoding
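
One lightweight approach to the downgrade described above is Unicode decomposition followed by dropping everything outside ASCII. The sketch below uses Python's standard `unicodedata` module rather than Java; `to_ascii` is a hypothetical helper, and characters with no decomposition (for example ø or ß) are silently dropped rather than transliterated.

```python
import unicodedata

def to_ascii(text):
    """Strip diacritics by decomposing characters and discarding non-ASCII."""
    # NFKD splits a letter such as "å" into "a" plus a combining ring mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with "ignore" drops the combining marks and anything else
    # that has no 7-bit representation.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("abc-\u00e5\u00e4\u00f6.txt"))  # abc-aao.txt
```

Java offers a comparable decomposition step through `java.text.Normalizer`, so the same two-stage idea carries over.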

Encoding problem in PHP SoapClient and C# SOAP Server responses

Hi, I have problems with special characters in SOAP responses. All connections are made in UTF-8 (in XML headers, SoapClient configuration, PHP source code, database connections, SOAP server responses) and I don't understand what is happening. All special characters are replaced with a sharp "#" character. For example: Instead of "Sól...

character-encoding

How to get byte size of multibyte string

How do I get the byte size of a multibyte-character string in Visual C? Is there a function or do I have to count the characters myself? Or, more generally, how do I get the right byte size of a TCHAR string? Solution: _tcslen(_T("TCHAR string")) * sizeof(TCHAR) EDIT: I was talking about null-terminated strings only. ...

character-encoding
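
The distinction behind the question above, character count versus byte count, can be illustrated in Python (not Visual C) by encoding the same string in different ways. `byte_size` is a hypothetical helper, and none of the counts include a terminating null.

```python
def byte_size(text, encoding):
    """Number of bytes `text` occupies in `encoding`, without a terminator."""
    return len(text.encode(encoding))

word = "l\u00e1piz"  # five characters, one of them accented

print(len(word))                     # 5
print(byte_size(word, "utf-8"))      # 6
print(byte_size(word, "utf-16-le"))  # 10
```

The UTF-16 figure is the character count times two, which mirrors the length-times-sizeof(TCHAR) formula in the question for a build where TCHAR is a 2-byte wide character.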

Cannot save characters of my language

Hi, I am trying to save some data (name, last name) from my forms with PHP and MySQL. It's a simple form like: <input type='text' name='first_name' /> And PHP gets it after submitting with: $first_name = trim(mysql_prep($_POST['first_name'])); The problem is that if I type characters in my language (Lithuanian), it won't save them and w...

character-encoding

Writing binary data to stdout with IronPython

I have two Python scripts which I am running on Windows with IronPython 2.6 on .NET 2.0. One outputs binary data and the other processes the data. I was hoping to be able to stream data from the first to the second using pipes. The problem I encountered here is that, when run from the Windows command-line, sys.stdout uses CP437 character...

character-encoding

Append encoding on creation of XML file

This code creates an XML file if it does not exist: $xmldoc = new DOMDocument(); if(file_exists('test.xml')){ $xmldoc->load('test.xml'); } else { $xmldoc->loadXML('<root/>'); } However, I would also like encoding="UTF-8" to be added automatically on creation of the file. How would one do this in PHP? ...

character-encoding
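
The idea of emitting the encoding in the XML declaration when a document is first created can be sketched with Python's standard `xml.dom.minidom`; this is an analogue of the PHP snippet above rather than a PHP answer, and `new_document_bytes` is a hypothetical helper.

```python
from xml.dom import minidom

def new_document_bytes(root_name="root"):
    """Build an empty document and serialize it with a UTF-8 declaration."""
    doc = minidom.Document()
    doc.appendChild(doc.createElement(root_name))
    # Passing an encoding makes toxml() return bytes and include
    # encoding="UTF-8" in the XML declaration.
    return doc.toxml(encoding="UTF-8")

print(new_document_bytes())
```

In both DOM libraries the encoding is a property of the document that the serializer consults, so it is set when the document is created or saved rather than appended to the file afterwards.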

How can I get a webpage's charset from the HTML as a string and not as a DOM?

Hi, how can I get a webpage's charset from the HTML as a string and not as a DOM? I get the HTML string like this: $html = file_get_contents($url); preg_match_all (string pattern, string subject, array matches, int flags) but I don't know regex, and I need to find out the webpage's charset (UTF-8/windows-255/etc..) Thanks, ...

character-encoding
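
A regex-only lookup of the declared charset, as asked above, might look like the following Python sketch (the same pattern idea would apply with PHP's preg_match). The pattern and the `find_charset` name are assumptions; it reads only `<meta>` tags, so it ignores the HTTP Content-Type header and any byte order mark, both of which can override the markup.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    r"<meta[^>]+charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE
)

def find_charset(html):
    """Return the first charset declared in a <meta> tag, lowercased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None

print(find_charset('<meta charset="UTF-8">'))  # utf-8
print(find_charset(
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">'
))  # windows-1251
print(find_charset("<p>no declaration</p>"))  # None
```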
