questions about utf-8 | ansaurus

UTF-8 (quoted-printable) conversion in C#

Hello, I am pulling French emails from a mailbox and the emails contain accents. I believe it is using UTF8 encoding. I have tried different UTF8 conversion methods I've found around the Internet but have been unsuccessful. How, for example, in C#, do I convert this: Montr=C3=A9al to Montréal? Edit: Also, it is inconsistent. Someti...
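The question asks about C#, but the decoding steps are the same in any language: undo the quoted-printable layer to get raw bytes, then interpret those bytes as UTF-8. A minimal sketch in Python, assuming the body really is quoted-printable with a UTF-8 charset (the helper name `decode_quoted_printable` is hypothetical):

```python
import quopri


def decode_quoted_printable(text: str, charset: str = "utf-8") -> str:
    """Decode a quoted-printable string into text (hypothetical helper)."""
    # Step 1: turn "=C3=A9" style escapes back into the raw bytes 0xC3 0xA9.
    raw = quopri.decodestring(text.encode("ascii"))
    # Step 2: interpret those bytes in the declared charset (assumed UTF-8).
    return raw.decode(charset)


print(decode_quoted_printable("Montr=C3=A9al"))  # Montréal
```

If the charset declared in the mail headers differs from UTF-8, it would need to be passed in as `charset` instead.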

How to avoid putting the magic encoding comment on top of every UTF-8 file in Ruby 1.9?

I have a Rails project with lots and lots of Cyrillic strings in it. It worked fine on Ruby 1.8, but Ruby 1.9 treats all source files as US-ASCII-encoded unless you provide an # encoding: utf-8 comment at the top of each and every source file in the project. Obviously the files don't parse as US-ASCII. Is there a simpler way to say, li...

Python: what does "...".encode("utf8") fix?

I wanted to URL-encode a Python string and got exceptions with Hebrew strings. I couldn't fix it and started doing some guess-oriented programming. Finally, doing mystr = mystr.encode("utf8") before sending it to the URL encoder saved the day. Can somebody explain what happened? What does .encode("utf8") do? My original string was a un...
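What `.encode("utf8")` does can be illustrated with a short sketch. It assumes Python 3, where `str` holds Unicode text and `bytes` holds encoded data (the question appears to concern Python 2, where the types are `unicode` and `str`). The sample word is an arbitrary choice for illustration:

```python
from urllib.parse import quote

# Four Hebrew code points (shin, lamed, vav, final mem), written as escapes.
text = "\u05e9\u05dc\u05d5\u05dd"

# encode() maps each code point to its UTF-8 byte sequence (2 bytes each here).
encoded = text.encode("utf-8")
print(type(encoded).__name__, len(text), len(encoded))  # bytes 4 8

# A URL can only carry bytes, so percent-encoding operates on the encoded form.
print(quote(encoded))  # %D7%A9%D7%9C%D7%95%D7%9D
```

The text itself has no byte representation until an encoding is chosen, which is why the URL encoder needed the explicit conversion.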

Polish characters in UTF-8 don't display correctly

Currently my site supports English, Portuguese, Swedish and Polish. But for some reason some Polish characters don't show right, like Zal�z konto; it should look like this: Zalóz konto. I have this: // Send the Content-type header in case the web server is setup to send something else header('Content-type: text/html; charset=utf-8'); and ...

Python utf-8 handling

Hi, I am using Python 2.6.1 and am having a UTF-8-related problem with my code. The problem is reproducible with this code:

    # -*- coding: utf-8 -*-
    import os, sys
    import string, time
    import codecs, re
    bDATA='"Domenick Lombardozzi","Eddie Marsan","Isaach De Bankolé","John Hawkes"'
    print (bDATA)
    fileObj = codecs.open("btvresp1.txt", "r", ...

Cut a UTF-8 text in PHP

Hi, I get UTF-8 text from a database, and I want to show only the first $len characters (ending on a whole word). I've tried several options, but the function still doesn't work because of special characters (á, é, í, ó, etc.). Thanks for the help!

    function text_limit($text, $len, $end='...') {
        mb_internal_encoding('UTF-8');
        if( (mb_...
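The underlying idea (count characters rather than bytes, then back up to a word boundary) can be sketched in Python, where `str` is already indexed by code point. This is not the asker's PHP function, which is truncated above; it only reuses the name `text_limit` for illustration, and it assumes accented letters are stored as single precomposed code points:

```python
def text_limit(text: str, length: int, end: str = "...") -> str:
    """Return at most `length` characters of text, cut at a word boundary."""
    if len(text) <= length:
        return text
    # Slicing a str counts code points, so "ñ" or "é" is never split in half.
    cut = text[:length]
    # If the cut landed inside a word, back up to the last space before it.
    if not text[length].isspace() and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip() + end


print(text_limit("El niño comió maíz", 10))  # El niño...
```

In PHP the equivalent requires the multibyte (`mb_`) string functions throughout, since the plain string functions count bytes.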

What is the most efficient way to format UTF-8 strings in Java?

I am doing the following:

    String url = String.format(WEBSERVICE_WITH_CITYSTATE, cityName, stateName);
    String urlUtf8 = new String(url.getBytes(), "UTF8");
    Log.d(TAG, "URL: [" + urlUtf8 + "]");
    Reader reader = WebService.queryApi(url);

The output that I am looking for is essentially to get the city name with blanks (e.g., "Overland Par...

PHP mb_convert_case(): keep words that are in uppercase

Hi, assuming I have a string "HET1200 text string" and I need to change it to "HET1200 Text String". The encoding would be UTF-8. How can I do that? Currently, I use mb_convert_case($string, MB_CASE_TITLE, "UTF-8"); but that changes "HET1200" to "Het1200". I could specify an exception, but it won't be exhaustive. So I'd rather all upperca...

Changing the encoding for Eclipse

I want to change the encoding for Eclipse to UTF-8 ...

How do I HTML-/ URL-Encode a std::wstring containing Unicode characters?

Hi, I have yet another question. If I had a std::wstring looking like this: ドイツ語で検索していてこちらのサイトにたどり着きました。 How could I possibly get it URL-encoded (%nn, n = 0-9, a-f) to: %E3%83%89%E3%82%A4%E3%83%84%E8%AA%9E%E3%81%A7%E6%A4%9C%E7%B4%A2%E3%81%97%E3%81%A6%E3%81%84%E3%81%A6%E3%81%93%E3%81%A1%E3%82%89%E3%81%AE%E3%82%B5%E3%82%A4...
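The expected output in the question corresponds to two steps: convert the text to UTF-8 bytes, then write every byte outside the unreserved set as %XX. A sketch of that mechanism in Python rather than C++ (the function name `percent_encode` and the choice of unreserved characters are assumptions; a std::wstring would first need its own conversion to UTF-8):

```python
import string

# Characters left as-is (assumed here: letters, digits and "-._~").
UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")


def percent_encode(text: str) -> str:
    """Percent-encode text via its UTF-8 bytes (hypothetical helper)."""
    out = []
    for byte in text.encode("utf-8"):  # iterating over bytes yields ints
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")  # two uppercase hex digits per byte
    return "".join(out)


# First four characters of the string from the question.
print(percent_encode("ドイツ語"))  # %E3%83%89%E3%82%A4%E3%83%84%E8%AA%9E
```

Each of these characters takes three bytes in UTF-8, which is why every character expands to three %XX groups.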

Enforcing proper UTF-8 encoding from user input in a form

Hi everyone, I have a web form written in ASP.NET that allows users to enter content, which is then saved to a DB and written out as an XML file for a third party to import into their systems. We output the XML file as UTF-8. They currently have a problem where a euro symbol (€) is breaking their XML parser with the following error: parse...

Display problem with Japanese characters

I am fetching a Japanese string from an Oracle database and displaying it in the browser. But the characters are shown in the browser as ???. I inserted the Japanese string into the DB using the unistr() function. INSERT INTO MESSAGES (MESSAGE_ID,MESSAGE) VALUES (1,unistr('\0041\0063\0063\0065\0073\0073\0020\004d\0061\006e\0061\0067\0065\006d\...

UTF-8 characters displayed as ISO-8859-1

Hi there, I've got an issue with inserting/reading utf8 content from a db. All verifications I'm doing seem to point to the fact that the content in my DB should be utf8 encoded, however it seems to be latin encoded. The data are initially imported from a PHP script from the CLI. Configuration: Zend Framework Version: 1.10.5 mysql-ser...

Problem encoding accented characters with Python

I'm having trouble encoding accented characters in a URL using the Python command line. Reducing my problem to the essentials, this code:

    >>> import urllib
    >>> print urllib.urlencode({'foo' : raw_input('> ')})
    > áéíóúñ

prints this on a Mac command line:

    foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1

but the same code prints this in window...

How to enclose every cell with double quotes in a Google Docs spreadsheet

I have UTF-8 data which I would like to save as CSV. My old version of Excel mangles UTF-8, so I have to resort to using Google's spreadsheet, which handles UTF-8 beautifully. Some of my data have commas in them, so I must wrap every field of data in the CSV with double quotes. I have hundreds of lines, so it would take some time to do it...

How do I reliably pull a text file of unknown encoding into an NSString on an iPhone?

Here is some code which I use to pull a text file into a UITextView called textView. Because I can't always know the file's encoding ahead of time, I use the method -stringWithContentsOfFile:usedEncoding:error:, which stores an encoding value by reference. Once I have the encoding, I then open up the file and, if there's an error, prin...

Error reading UTF-8 file in Java

Hi, I am trying to read in some sentences from a file that contains Unicode characters. It does print out a string, but for some reason it messes up the Unicode characters. This is the code I have:

    public static String readSentence(String resourceName) {
        String sentence = null;
        try {
            InputStream refStream = ClassLoader ...

MySQL group_concat order by utf8

Hi everyone! I have the following problem. I have 3 tables (all of them use utf8 / utf8_general_ci encoding): movies, channels, and a third table, movie_channels, which is just a combination of the other two with just 2 fields: movie_id, channel_id. Here is my channels table (code, name): '1', 'ОРТ' '2', 'ТК Спорт' '3', 'ТК ТНВ' '4', ...

How to remove u'' from Python script result?

Hello. I'm trying to write a parsing script using Python/Scrapy. How can I remove [] and u' from strings in the result file? Now I have text like this:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.utils.markup import remove_tags
    from googleparser.items import GoogleparserItem
    import sys
    clas...

utfcpp and Win32 wide API

Is it good/safe/possible to use the tiny utfcpp library for converting everything I get back from the wide Windows API (FindFirstFileW and such) to a valid UTF-8 representation using utf16to8? I would like to use UTF-8 internally, but am having trouble getting the correct output (via wcout after another conversion or plain cout). Normal A...
