unicode

PHP PCRE differences on testing and hosting servers

Hi all, I've got the following regular expression that works fine on my testing server, but just returns an empty string on my hosted server. $text = preg_replace('~[^\\pL\d]+~u', $use, $text); Now I'm pretty sure this comes down to the hosting server version of PCRE not being compiled with Unicode property support enabled. The diffe...

Characters in string changed after downloading HTML from the internet.

Using the following code, I can download the HTML of a file from the internet: WebClient wc = new WebClient(); // .... string downloadedFile = wc.DownloadString("http://www.myurl.com/"); However, sometimes "interesting" characters in the file get changed, like é to Ã©, ← to â† and フシギダネ to ãƒ•ã‚·ã‚®ãƒ€ãƒ. I think it may be something to do...
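The excerpt is cut off before the cause is named, so the following is only a guess at the mechanism: a minimal Python sketch (the question itself uses C#) of what happens when UTF-8 bytes are decoded with a single-byte encoding. The choice of Windows-1252 as the wrong encoding and the helper names `misdecode` and `repair` are assumptions made for illustration.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    # errors="replace" covers bytes that the wrong encoding leaves undefined.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo the damage, provided no byte was lost in the wrong decoding step."""
    return garbled.encode(wrong_encoding).decode("utf-8")


print(misdecode("é"))   # Ã©
print(repair("Ã©"))     # é
```

If this is indeed the cause, the lasting fix is to decode the downloaded bytes with the encoding the server actually used rather than to repair strings afterwards.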

How to replace unicode characters by ascii characters in Python (perl script given)?

I am trying to learn python and couldn't figure out how to translate the following perl script to python: #!/usr/bin/perl -w use open qw(:std :utf8); while(<>) { s/\x{00E4}/ae/; s/\x{00F6}/oe/; s/\x{00FC}/ue/; print; } The script just changes unicode umlauts to alternative ascii output. (So the complete ...
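One possible Python 3 translation of the substitutions in the Perl script is sketched below. It is not a complete port: the function names `replace_umlauts` and `filter_stream` are made up for this example, and the sketch works on any text stream instead of reproducing Perl's `<>` handling of files named on the command line.

```python
import io

# The same three substitutions as in the Perl script.
REPLACEMENTS = (("\u00e4", "ae"), ("\u00f6", "oe"), ("\u00fc", "ue"))


def replace_umlauts(line: str) -> str:
    """Replace the first occurrence of each umlaut in one line."""
    for umlaut, ascii_form in REPLACEMENTS:
        # count=1 mirrors Perl's s/// without the /g flag.
        line = line.replace(umlaut, ascii_form, 1)
    return line


def filter_stream(src, dst) -> None:
    """Copy src to dst line by line, applying the substitutions."""
    for line in src:
        dst.write(replace_umlauts(line))


out = io.StringIO()
filter_stream(io.StringIO("Käse für Bären\n"), out)
print(out.getvalue(), end="")  # Kaese fuer Bären
```

The second ä survives because, like the Perl original, only the first match per line is replaced. Dropping the count argument would replace every occurrence.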

HTML: Is there an ASCII character for an up/down triangle (arrow)?

I'm looking for an html/ascii character which is a triangle up and down so that I can use it as a toggle switch. I found , and - but those have an arrow "stem". I'm looking just for the html arrow "head". ...

Four byte encoding of U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)?

Which character encoding (or combinations of encodings) represents the character ö (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS or simply put chr(246) in ISO-8859-1) as the four octets combination chr(195) . chr(63) . chr(194) . chr(164)? ...

Which of the following Unicode characters should be used in HTML?

I am aware that any Unicode character can be inserted into an HTML document via the following format: &#x0000; ...where 0000 is the character code of the desired character My question is: which of these characters has the most widespread availability when it comes to the client's browser being able to display the character? In other...
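To illustrate the numeric character reference format described in this question, here is a small Python sketch that builds and decodes such references. The helper name `to_hex_reference` is hypothetical; `html.unescape` comes from the standard library. The sketch says nothing about which characters a particular browser can display, since that depends on the fonts available on the client.

```python
import html


def to_hex_reference(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):04X};"


print(to_hex_reference("ö"))      # &#x00F6;
print(html.unescape("&#x00F6;"))  # ö
```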

Unicode filenames on windows in ruby

I have a piece of code that looks like this: Dir.new(path).each do |entry| puts entry end The problem comes when I have a file named こんにちは世界.txt in the directory that I list. On a Windows 7 machine I get the output: ???????.txt From googling around, properly reading this filename on windows seems to be an impossible task. Any s...

Fast, Unicode-capable, cross-platform programmer's text editor that shows invisibles like ZWSP?

Our publishing workflow includes Windows and Linux machines (there are some Macs too, but not in the critical-path workflow). Many texts include both English and Khmer and are marked-up in XML. XML Copy Editor is the best cross-platform open-source XML editor I've discovered. It utilizes the Scintilla editing component, which is general...

How to do proper Unicode and ANSI output redirection on cmd.exe?

If you are doing automation on Windows and you are redirecting the output of different commands (internal cmd.exe or external), you'll discover that your log files contain combined Unicode and ANSI output (meaning that they are invalid and will not load well in viewers/editors). Is it possible to make cmd.exe work with UTF-8? This qu...

Dompdf unicode problem

Is there any solution for Unicode support in dompdf? ...

Unicode Kangxi radicals range 2F00–2FDF not displayed on iphone device, but in simulator

Hi, Kangxi radicals in the range 2F00-2FDF (see link text) are not displayed correctly on the iPhone device. They appear as a crossed-out box. In the simulator they display correctly. I tried the system font and also the [UIFont fontWithName:@"STHeitiTC-Medium" size:24]; ... Is the unicode codepoint coverage limited on the iphone...

How to ensure that no non-ASCII Unicode characters are entered?

Given a java.lang.String instance, I want to verify that it doesn't contain any unicode characters that are not ASCII alphanumerics. e.g. The string should be limited to [A-Za-z0-9.]. What I'm doing now is something very inefficient: import org.apache.commons.lang.CharUtils; String s = ...; char[] ch = s.toCharArray(); for( int i=0; i<...
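The check described here can be expressed as a single anchored pattern. The sketch below is in Python rather than Java and is meant only to show the idea. The character class `[A-Za-z0-9.]` is taken from the question; the function name `is_allowed` and the decision to accept the empty string are assumptions.

```python
import re

# Character class from the question: ASCII letters, ASCII digits and the dot.
_ALLOWED = re.compile(r"[A-Za-z0-9.]*")


def is_allowed(s: str) -> bool:
    """Return True if every character of s is in [A-Za-z0-9.]."""
    return _ALLOWED.fullmatch(s) is not None


print(is_allowed("file.v2"))   # True
print(is_allowed("café"))      # False
```

The digits are spelled out as `0-9` on purpose: in Python 3, `\d` also matches non-ASCII decimal digits unless the `re.ASCII` flag is set.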

How to get decimal value of a unicode character in c++

For one of my open-source projects, I need to compute the decimal equivalent of a given Unicode character. For example, if the Tamil character L'அ' is given, the output should be 2949. I am using C++ in a Qt environment. I googled and could not find a solution for this. Please help if you know a solution. ...
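The "decimal equivalent" asked for is the character's Unicode code point written in base 10. The question targets C++ with Qt; the Python sketch below only illustrates the concept, and the function name `decimal_code_point` is made up for this example.

```python
def decimal_code_point(ch: str) -> int:
    """Return the Unicode code point of a single character as a decimal integer."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ord(ch)


# TAMIL LETTER A is U+0B85, which is 2949 in decimal.
print(decimal_code_point("அ"))  # 2949
```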

UTF GET parameter encoding problem in JSP (JBoss 2.0.1)

I'm trying to take a string from a GET or POST parameter in JSP with some accents in UTF-8: <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %> <% request.setCharacterEncoding("UTF-8"); String value = request.getParameter("q"); out.print(value+" | aáa"); %> The encoding of the hardcoded string is co...

ISO-8859-1 and MacRoman Encoding

I've got a MySQL database table with an ISO-8859-1 encoded text field containing user names. When I export that to a text file using PHP I get a normal text file saved on the client computer. When I open it in Word or Excel on a Windows system, it looks good. When I open it on Mac using Word or Excel, the high-ascii characters are wro...

Why can't I display a unicode character in the Python Interpreter on Mac OS X Terminal.app?

If I try to paste a unicode character such as the middle dot: · in my python interpreter it does nothing. I'm using Terminal.app on Mac OS X and when I'm simply in bash I have no trouble: :~$ · But in the interpreter: :~$ python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type...

In MySQL how can I tell what character set a particular table is using?

I have a large mysql table that I think might be using the wrong character set. If so I'll need to change it using ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 But since this is a very large table, I'd rather not run this command unless I have to. So my question is, how can I ask mysql what the character set is on a particular t...

In Adobe Flex, why does an embedded version of a font behave differently from the same font installed in the system?

Scenario: Flex application utilizing an @font-face declaration for embedding the font. (Embedded fonts are required to be able to rotate text.) The application was originally developed as an English application, but during localization it became necessary to locate a unicode font capable of displaying Asian characters. The original im...

win32 ruby1.9 regexp and cyrillic string

#coding: utf-8 str2 = "asdfМикимаус" p str2.encoding #<Encoding:UTF-8> p str2.scan /\p{Cyrillic}/ #found all cyrillic characters str2.gsub!(/\w/u,'') #removes only latin characters puts str2 The question is: why does \w ignore Cyrillic characters? I have installed the latest Ruby package from http://rubyinstaller.org/. Here is my output of r...

What is winansi?

I can't find a wiki page or anything :(. It's an encoding like Unicode, right? So it has its own mapping of code points to characters? ...