character-encoding

In Java can I ask the system to tell me the charset of a file?

There are questions like this one about guessing the charset/encoding of a file. But is there a method in Java to ask the system to tell me before trying to guess? ...
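Common filesystems store no charset alongside a file's bytes, so in general there is nothing for the system to report. Java's `Charset.defaultCharset()` only gives the platform default, not the file's encoding. The one definite in-band signal is a byte-order mark (BOM). Here is a minimal Python sketch of BOM sniffing. It assumes a file without a BOM is simply reported as unknown, and the helper names `sniff_bom` and `sniff_file` are hypothetical:

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
# Each codec name below strips the BOM itself when used to decode.
# (A UTF-16-LE file whose first character is U+0000 is ambiguous.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if `data` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

def sniff_file(path: str) -> Optional[str]:
    # Four bytes are enough to recognise every BOM listed above.
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Without a BOM, any answer is a guess, which is why the related questions end up at detection heuristics.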

Special characters in iText

I need help using these symbols: ⎕, ∨, ๐, Ʌ, and so on. When I create a PDF with iText, these symbols do not appear. What can I do to make them appear? ...

What encoding looks exactly like ASCII but has NULL bytes before each byte?!

I have a string that looks and behaves as follows (Python code provided). WTF?! What encoding is it in? s = u'\x00Q\x00u\x00i\x00c\x00k' >>> print s Quick >>> >>> s == 'Quick' False >>> >>> import re >>> re.search('Quick', s) >>> >>> import chardet >>> chardet.detect(s) /usr/lib/pymodules/python2.6/chardet/universaldetector.py:69: Unico...
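A NUL byte before each ASCII byte is the signature of big-endian UTF-16 (UTF-16-BE). Every character below U+0100 takes two bytes in that encoding, high byte first. In the question those bytes were read one byte per character, so `s` holds alternating NULs and letters. Here is a minimal Python 3 sketch of the repair. It assumes `s` really is UTF-16-BE bytes mis-read as Latin-1. Latin-1 maps each code point 0 to 255 back to the same byte, so the round trip is lossless. The helper name `fix_utf16be` is hypothetical:

```python
def fix_utf16be(s: str) -> str:
    # Latin-1 turns each code point 0..255 back into exactly one byte,
    # recovering the original UTF-16-BE byte sequence.
    raw = s.encode("latin-1")
    return raw.decode("utf-16-be")

s = "\x00Q\x00u\x00i\x00c\x00k"
print(s == "Quick")               # False: s has 10 code points
print(fix_utf16be(s) == "Quick")  # True
```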

What is the range of Unicode Printable Characters?

Can anybody please tell me the range of printable Unicode (UTF-8) characters? [e.g. the ASCII printable character range is \u0020 - \u007e] ...
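Unicode has no single contiguous printable range. Printability follows from each code point's general category. Python's `str.isprintable()` treats the categories Other (C*) and Separator (Z*) as non-printable, except for the ASCII space. Applied to ASCII, it gives U+0020 to U+007E as the printable span, because U+007F is the DEL control character. The sketch below uses that definition, which is Python's and one reasonable choice, not a Unicode-mandated one. The helper name `describe` is illustrative:

```python
import unicodedata

# Printable ASCII according to Python's definition.
ascii_printable = [chr(c) for c in range(0x80) if chr(c).isprintable()]
print(len(ascii_printable), repr(ascii_printable[0]), repr(ascii_printable[-1]))

def describe(ch: str) -> str:
    # The general category (e.g. 'Lu', 'Cc', 'Zs') drives printability.
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} printable={ch.isprintable()}"

for ch in ["A", "\x7f", "\u00a0", "\u4e2d"]:
    print(describe(ch))
```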

How to remove funny characters in javascript?

On the following line: alert ( "Apenas os números 0, 1, 3, 5, 7 e 9 são permitidos." ); it prints like this: Apenas os n?meros 0, 1, 3, 5, 7 e 9 s?o permitidos. The problem is that the characters ú and ã are not showing correctly. In HTML I did something like: Apenas os números 0, 1, 3, 5, 7 e 9 são permitidos. ...
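One common cause: the .js file was saved as ISO-8859-1 while the browser reads it as UTF-8 (an assumption here). Byte 0xFA (ú) is not a valid UTF-8 sequence, so it decodes to the replacement character, which some environments show as `?`. The robust fixes are saving the file as UTF-8 or writing the characters as `\uXXXX` escapes. The Python sketch below reproduces the failure and produces escaped text for the literal. The helper `to_js_escapes` is hypothetical and handles BMP characters only:

```python
def to_js_escapes(text: str) -> str:
    # Leave ASCII alone; write every other BMP character as \uXXXX.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            raise ValueError("characters outside the BMP need a surrogate pair")
    return "".join(out)

# Latin-1 bytes read as UTF-8: 'ú' (0xFA) becomes U+FFFD.
print("números".encode("latin-1").decode("utf-8", errors="replace"))
print(to_js_escapes("Apenas os números 0, 1, 3, 5, 7 e 9 são permitidos."))
```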

Finding Unicode character name with Javascript

I need to find out the names of Unicode characters when the user enters their code point. An example would be to enter 0041 and get "Latin Capital Letter A" as the result. Thanks ...
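Character names come from the Unicode Character Database (UnicodeData.txt). JavaScript has no built-in lookup, so a browser solution needs a copy of that table, for example via a library or a trimmed data file. For comparison, Python bundles the database in the standard `unicodedata` module. This sketch parses the hex the user types. The function name `name_for_hex` is illustrative:

```python
import unicodedata

def name_for_hex(hex_code: str) -> str:
    cp = int(hex_code, 16)  # "0041" -> 65
    # The second argument is returned for code points without a name
    # (controls, unassigned) instead of raising ValueError.
    return unicodedata.name(chr(cp), "<no name>")

print(name_for_hex("0041"))  # LATIN CAPITAL LETTER A
```

The database stores names in upper case; apply title-casing afterwards if the display needs "Latin Capital Letter A".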

Fonts and character encodings

a) Do fonts know anything about coded character sets (Unicode, ASCII, etc.)? In other words, does a font file specify which coded character sets may use the font? b) I assume if a font supports certain coded character sets, then any character encoding (aka code page) for that coded character set can use this font? a) Does a font's file ...

error while inserting symbol in database with JPA

I am using JPA to insert into a MySQL database, and it is not able to persist symbols like double quotes (") or the euro sign; instead it persists a question mark (?) ...
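A `?` in place of a character usually means the character was converted somewhere to a charset that cannot represent it. Likely points are the JDBC connection charset and the table charset. Plain double quotes are ASCII and survive any of these, so the quotes in question may be curly quotes (U+201C/U+201D); that is an assumption. The Python sketch below lists the characters a given charset would lose. The helper name `unencodable` is hypothetical, and Python's `latin-1` codec stands in for whichever charset is actually in play:

```python
def unencodable(text: str, charset: str) -> list:
    """Characters of `text` that `charset` cannot represent (in order, no repeats)."""
    lost = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    return lost

sample = "\u201cquoted\u201d costs 5\u20ac"
print(unencodable(sample, "latin-1"))
# A lossy conversion replaces each of them with '?':
print(sample.encode("latin-1", errors="replace").decode("latin-1"))
```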

i18n : Umlaut not being displayed correctly in JSP

Hi All, I have a JSP that is supposed to display some German text from some .properties files by using fmt:message, e.g. The corresponding entry in the .properties file is: service.test.hware.test = Hardware prüfen (umlaut between r and f in 2nd word). On Internet Explorer this displays as: Hardware prüfen, the umlaut being corr...

Encoding in Ruby 1.8.7 or 1.9.2

I have been trying to use the gem 'character-encodings', which doesn't build on 1.9.2 but does on 1.8.7. But even when I require 'encoding/character/utf-8' I still can't do the simplest encoding operations. require 'encoding/character/utf-8' str = u"hëllö" str.length #=> 5 str.reverse.length #=> 5 str[/ël/] #=> "ël" I get ruby-1....
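The behaviour here comes down to bytes versus characters. In Ruby 1.8 a String is a sequence of bytes, so `length` counts UTF-8 bytes. Ruby 1.9 attaches an encoding to each String and counts characters, though a 1.9 source file with non-ASCII literals also needs a `# encoding: utf-8` magic comment. Python 3 draws the same line between `bytes` and `str`, which makes the difference easy to see:

```python
word = "hëllö"
raw = word.encode("utf-8")

print(len(raw))     # 7: ë and ö take two bytes each
print(len(word))    # 5 characters
print(word[::-1])   # reversed by character
# Reversing the bytes instead splits the two-byte sequences apart:
print(raw[::-1].decode("utf-8", errors="replace"))
```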

Multiple character encodings inside one HTML page possible?

I have a webpage that is set to UTF-8. But parts of its content (built in php) come from iso-8859-1 files and are thus not displayed correctly. Is it possible to set a specific encoding for a particular page element? ...
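An HTML document has a single character encoding, declared once for the whole page, so individual elements cannot switch encodings. The usual approach is to convert the ISO-8859-1 content to UTF-8 before it is written out; in PHP, `iconv` or `mb_convert_encoding` can do this. Here is the same conversion as a Python sketch. It assumes the fragment files are known to be ISO-8859-1, and `fragment_to_utf8` and `load_fragment` are hypothetical helpers:

```python
def fragment_to_utf8(raw: bytes, source_charset: str = "iso-8859-1") -> bytes:
    # Decode with the charset the file was actually saved in,
    # then re-encode for the UTF-8 page.
    return raw.decode(source_charset).encode("utf-8")

def load_fragment(path: str) -> str:
    # Reading with an explicit encoding yields a str, which the page
    # template can then emit as UTF-8 along with everything else.
    with open(path, encoding="iso-8859-1") as f:
        return f.read()
```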

Fixing encodings

I have ended up with messed-up character encodings in one of our MySQL columns. Typically I have √© instead of é, √∂ instead of ö, √≠ instead of í, and so on... I'm fairly certain that someone here would know what happened and how to fix it. UPDATE: Based on bobince's answer and since I had this data in a file I did the following #!/use...
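The pairs quoted (√© for é, √∂ for ö, √≠ for í) match UTF-8 bytes decoded as Mac OS Roman. For example, é is 0xC3 0xA9 in UTF-8, and Mac OS Roman maps 0xC3 to √ and 0xA9 to ©. If every damaged value went through that one mis-decoding (an assumption), reversing it recovers the text. Here is a Python sketch; `repair_mac_roman` is a hypothetical name, and it leaves a value unchanged if the reversal fails:

```python
def repair_mac_roman(value: str) -> str:
    try:
        # Undo the wrong decode: recover the original bytes, then decode them as UTF-8.
        return value.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Mac OS Roman, or not valid UTF-8 afterwards:
        # probably not damaged this way, so keep it as-is.
        return value

for bad in ["√©", "√∂", "√≠"]:
    print(bad, "->", repair_mac_roman(bad))
```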

How can I programmatically find the list of codecs known to Python?

I know that I can do the following: >>> import encodings, pprint >>> pprint.pprint(sorted(encodings.aliases.aliases.values())) ['ascii', 'base64_codec', 'big5', 'big5hkscs', 'bz2_codec', 'cp037', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp424', 'cp43...
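`encodings.aliases.aliases` only covers codecs that have an entry in the alias table. Python has no public "list all codecs" call. One workable sketch walks the modules of the standard `encodings` package and keeps those that `codecs.lookup` accepts. Codecs that third-party packages register through `codecs.register` are not found this way. The function name `standard_codecs` is hypothetical:

```python
import codecs
import encodings
import pkgutil

def standard_codecs() -> list:
    names = set()
    for mod in pkgutil.iter_modules(encodings.__path__):
        try:
            # lookup() returns a CodecInfo; .name is its canonical spelling.
            names.add(codecs.lookup(mod.name).name)
        except LookupError:
            # Helper modules such as 'aliases', and platform-only codecs
            # such as 'mbcs' outside Windows, are not usable codecs here.
            continue
    return sorted(names)

print(len(standard_codecs()), standard_codecs()[:5])
```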

How can I prevent strange characters when pulling the Atom feed from a WordPress 3.0 blog

I have an Atom feed on a WordPress blog here: http://blogs.legalview.info/auto-accidents/feed/atom When I download the text of the file and display it on my site, I get strange characters like the accented 'A' here: Recent studies are showing that car accident -related fatalities have declined almost 10% since 2008. The reason for t...
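A stray accented A (Â) before spaces or punctuation is the typical sign of UTF-8 text decoded as ISO-8859-1 or Windows-1252. A non-breaking space is 0xC2 0xA0 in UTF-8, and 0xC2 is Â in those charsets. Assuming that is what happens here, the fix is to decode the feed with the charset it declares, in the `Content-Type` header or the XML declaration, rather than with a default. The Python sketch below reads the charset from a `Content-Type` value using the standard library's `email.message.Message`. The helper names are hypothetical, the UTF-8 fallback is an assumption, and the sample text is illustrative:

```python
from email.message import Message

def charset_from_content_type(value: str, default: str = "utf-8") -> str:
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset parameter lowercased,
    # or the fallback when the header has none.
    return msg.get_content_charset(default)

def decode_feed(body: bytes, content_type: str) -> str:
    return body.decode(charset_from_content_type(content_type))

body = "since 2008.\u00a0The reason".encode("utf-8")
print(repr(body.decode("latin-1")))  # wrong: contains 'Â' before the space
print(repr(decode_feed(body, "application/atom+xml; charset=UTF-8")))
```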

Java PreparedStatement UTF-8 character problem

Hi all, I have a prepared statement: PreparedStatement st; and in my code I try to use the st.setString method: st.setString(1, userName); The value of userName is şakça, but setString changes 'şakça' to '?akça'. It doesn't recognize UTF-8 characters. How can I solve this problem? Thanks. ...
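The result '?akça' is exactly what a lossy conversion to ISO-8859-1 produces. That charset contains ç but not ş (U+015F), so ş is replaced with '?'. This suggests, as an inference from the symptom, that a Latin-1 conversion happens somewhere on the way to the database, typically in the connection or column charset, rather than in `setString` itself. The effect reproduced in Python:

```python
name = "şakça"
# Lossy conversion: ş has no Latin-1 byte, so it becomes '?'.
stored = name.encode("latin-1", errors="replace")
print(stored)                    # b'?ak\xe7a'
print(stored.decode("latin-1"))  # ?akça
```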

Problem with Cyrillic characters in Ruby on Rails

Hi, in my Rails app I work a lot with Cyrillic characters. That's no problem: I store them in the DB and I can display them in HTML. But I have a problem exporting them to a plain txt file. A string like "элиас" becomes "—ç–ª–∏–∞—Å" if I let Rails put it in a txt file and download it. What's wrong here? What has to be done? Regards, Elias ...
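The garbled string matches the UTF-8 bytes of "элиас" read as Mac OS Roman. For example, э is 0xD1 0x8D in UTF-8, and Mac OS Roman maps 0xD1 to the em dash character and 0x8D to ç. So the file Rails produced is likely correct UTF-8, and the program opening it guessed the wrong encoding. Sending `charset=utf-8` in the download's Content-Type, or starting the file with a UTF-8 byte-order mark, gives the reader a hint; whether a given editor honours either is up to that editor. Here is a Python sketch of the diagnosis and the BOM option; `utf8_with_bom` is a hypothetical helper:

```python
import codecs

original = "элиас"
garbled = original.encode("utf-8").decode("mac_roman")
print(garbled)  # the same string as in the question

def utf8_with_bom(text: str) -> bytes:
    # Prepend EF BB BF so that readers which check for a BOM pick UTF-8.
    return codecs.BOM_UTF8 + text.encode("utf-8")

data = utf8_with_bom(original)
print(data[:3], data.decode("utf-8-sig"))
```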

How do I fix invalid HTML characters in pages served with different encoding?

I have a number of websites that are rendering invalid characters. The pages' meta tags specify UTF-8 encoding. However, a number of pages contain characters that can't be interpreted by UTF-8, probably because the files were saved with another encoding (such as ANSI). The one in particular I'm concerned about right now is a fancy apostr...
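Windows tools often label Windows-1252 as "ANSI". In that charset the right single quotation mark ’ is byte 0x92, which is not valid on its own in UTF-8, so a UTF-8 page shows a replacement character instead. The lasting fix is to re-save the files as UTF-8. The Python sketch below converts a file's bytes, assuming anything that is not valid UTF-8 is Windows-1252. `to_utf8` is a hypothetical helper:

```python
def to_utf8(raw: bytes) -> bytes:
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Assumption: the file was saved as Windows-1252 ("ANSI").
        # Python's cp1252 codec raises for the five bytes it leaves undefined.
        return raw.decode("cp1252").encode("utf-8")

legacy = b"It\x92s"
print(legacy.decode("utf-8", errors="replace"))  # U+FFFD in place of the apostrophe
print(to_utf8(legacy).decode("utf-8"))           # It’s
```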

Python regex to convert non-ASCII characters in a string to closest ASCII equivalents.

I'm seeking a simple Python function that takes a string and returns a similar one but with all non-ASCII characters converted to their closest ASCII equivalent. For example, diacritics and whatnot should be dropped. I'm imagining there must be a pretty canonical way to do this and there are plenty of related stackoverflow questions but I'...
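A regex alone cannot know that é relates to e, so this sketch uses Unicode normalization instead. NFKD splits accented letters into a base letter plus combining marks, which can then be dropped. Characters with no decomposition, such as ß or ø, are simply removed here; mapping those would need a transliteration table. The function name `to_ascii` is illustrative:

```python
import unicodedata

def to_ascii(text: str) -> str:
    # NFKD: 'é' -> 'e' + U+0301 COMBINING ACUTE ACCENT; compatibility
    # characters such as the 'ﬁ' ligature become plain 'fi'.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (general category Mn) ...
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # ... and anything still outside ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Ångström café naïve"))  # Angstrom cafe naive
```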

Unicode value \uXXXX to Character in Javascript

I've never done this before and am not sure why it's outputting the infamous � encoding character. Any ideas on how to output characters as they should (ASCII+Unicode)? I think \u0041-\u005A should print A-Z in UTF-8, which Firefox is reporting is the page encoding. var c = new Array("F","E","D","C","B","A",9,8,7,6,5,4,3,2,1,0); ...
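In JavaScript, `\uXXXX` is interpreted only inside a string literal in the source code. Suppose the code builds the escape at run time by concatenating "\\u" with hex digits (the truncated excerpt hints at this but does not show it). The result is then six ordinary characters: a backslash, u, and four digits. Converting the digits to a number and then to a character works; in JavaScript that is `String.fromCharCode(parseInt(hex, 16))`. The same distinction in Python, where `chr(int(hex, 16))` is the equivalent:

```python
import codecs

hex_digits = "0041"
built = "\\u" + hex_digits
print(len(built), built)          # 6 characters of plain text, not an escape

print(chr(int(hex_digits, 16)))   # A

# If the text really contains escape sequences, decode them explicitly.
print(codecs.decode("\\u0041\\u005a", "unicode_escape"))  # AZ

# A to Z from their code points U+0041..U+005A.
print("".join(chr(cp) for cp in range(0x41, 0x5B)))
```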

How to detect unicode strings with unprintable characters?

I have Unicode strings stored in a database. Some of the character encodings are wrong and instead of displaying actual characters for the language, it's now displaying characters that make no sense. How do I fix this issue? Is there a way to detect if strings have a wrong encoding? ...
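No test reliably detects a "wrong encoding", but two heuristics catch many cases. The first flags control, unassigned, or replacement characters that should not appear in text. The second spots mojibake that turns back into valid UTF-8 when re-encoded with the charset it was probably mis-decoded with. The Python sketch below assumes Windows-1252 as that charset. Both function names are hypothetical, and both checks can give false positives:

```python
import unicodedata

def has_unprintable(s: str) -> bool:
    # U+FFFD means an earlier decode already failed.
    if "\ufffd" in s:
        return True
    # Cc = control, Cn = unassigned, Co = private use; tabs and newlines allowed.
    return any(
        unicodedata.category(ch) in ("Cc", "Cn", "Co") and ch not in "\t\r\n"
        for ch in s
    )

def repair_candidate(s: str, wrong: str = "cp1252"):
    """Return the UTF-8 reading of `s` if it looks like mojibake, else None."""
    try:
        fixed = s.encode(wrong).decode("utf-8")
    except UnicodeError:
        return None
    return fixed if fixed != s else None
```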