utf-8

Setting Quercus db connection encoding to UTF-8 (urgent problem and need your great help)

We want to use a Java class in my website, which is developed with PHP + MySQL. I came across Quercus and it works well. The only problem is encoding: Quercus uses ISO8859 encoding by default, so UTF-8 data from the database is displayed incorrectly, as ???. If anybody knows how to set the Quercus db connection encoding to UTF-8, please he...

move_uploaded_file does not support utf8 file name

I am using Uploadify, and the file name retrieved from $_FILES["Filedata"]["name"] on the server side is in UTF-8. It may contain Chinese or Japanese characters. After the following code runs, $tempFileWithPath = $_FILES['Filedata']['tmp_name']; $destFile = $_FILES['Filedata']['name']; $destFileWithPath=myUtility::getFileRepositoryPath().'/'...

Python zlib output, how to recover out of mysql utf-8 table?

In Python, I compressed a string using zlib and then inserted it into a MySQL column of type blob, using the utf-8 encoding. The string comes back as utf-8, but it's not clear how to get it back into a format where I can decompress it. Here is some pseudo-output: valueInserted = zlib.compress('a') = 'x\x9cK\x04\x00\x00b\x00b' ...
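A minimal sketch of the round trip, assuming Python 3 and no real database. zlib output is arbitrary binary data rather than text, so it generally cannot be decoded as UTF-8. The helpers `to_text_column` and `from_text_column` are hypothetical names, and Base64 is only one option for the case where the value must travel through a text encoding; a binary-safe BLOB parameter would store the raw bytes directly.

```python
import base64
import zlib

compressed = zlib.compress(b"a")  # bytes, not text

# Compressed data is not valid UTF-8 in general; with the default
# compression level the second header byte is 0x9c, a stray
# continuation byte that a strict UTF-8 decoder rejects.
try:
    compressed.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False


def to_text_column(raw: bytes) -> str:
    """Hypothetical helper: make binary data safe for a text column."""
    return base64.b64encode(raw).decode("ascii")


def from_text_column(stored: str) -> bytes:
    """Hypothetical helper: reverse of to_text_column."""
    return base64.b64decode(stored)


stored = to_text_column(compressed)
recovered = zlib.decompress(from_text_column(stored))
```

Base64 grows the stored value by roughly a third, which is the price of keeping it pure ASCII.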

JavaScript DOM XSS Injection validation

Is this regular expression enough to catch all cross-site scripting attempts when embedding HTML into the DOM, e.g. with document.write()? (javascript:|<\s*script.*?\s*>) It is referenced in this document from modsecurity.org http://www.modsecurity.org/documentation/Ajax%5FFingerprinting%5Fand%5FFiltering%5Fwith%5FModSecurity%5...
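One way to probe the pattern is to run it against a few payloads. The sketch below uses Python's `re` module instead of JavaScript, and the sample strings are illustrative assumptions, not taken from the referenced document. It shows what this pattern does and does not match as written, with no flags.

```python
import re

# Pattern quoted in the question, compiled without any flags.
pattern = re.compile(r"(javascript:|<\s*script.*?\s*>)")

# Illustrative payloads (assumptions, not an exhaustive list).
samples = {
    "script tag": "<script>alert(1)</script>",
    "javascript URL": '<a href="javascript:alert(1)">x</a>',
    "event handler": "<img src=x onerror=alert(1)>",
    "upper-case tag": "<SCRIPT>alert(1)</SCRIPT>",
}

# True where the pattern flags the payload, False where it slips through.
results = {name: bool(pattern.search(text)) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {'caught' if matched else 'missed'}")
```

The event-handler and upper-case samples are not matched, which suggests that a blacklist of this shape misses at least some injection vectors.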

Convert two string to the same byte length

I have two strings in my PHP code: one is a parameter to my method and one is a string from an ini file. The problem is that they are not equal, although they have the same content, probably due to encoding issues. When using var_dump, it is reported that the first string's length is 23 and the second string's length is 47 (see the end of my q...

Encoding strings in XML from Oracle query

I'm producing XML directly from PL/SQL in Oracle. What is the preferred way of ensuring that output strings are XML-conformant with regard to special characters and character encoding? Most of the XML file is static; we only need to output data for a few fields. Example of what I consider bad practice: DECLARE @s AS NVARCHAR(100...

utf-8 decoding in java

I'm trying to pass parameters from a PHP middle tier to a Java backend that understands J2EE. I'm writing the controller code in Groovy. In there, I'm trying to decode some parameters that will likely contain international characters. I am really puzzled by the results of my debugging this problem so far, hence I wanted to share it with ...

PHP mbstring.func_overload vs using mbstring functions

I want my site's string handling to support other languages via UTF-8. It seems that the best way to do this is to forsake all the standard string functions. So I have two options: I can set the mbstring.func_overload option in php.ini, or I can go back over my code and replace all the functions with their mb_* equivalents. I would assume ...

PHP input filtering - checking ascii vs checking utf8

I need to ensure that all my strings are utf8. Would it be better to check that input coming from a user is ascii-like or that it is utf8-like? //KohanaPHP function is_ascii($str) { return ! preg_match('/[^\x00-\x7F]/S', $str); } //Wordpress function seems_utf8($Str) { for ($i=0; $i<strlen($Str); $i++) { if (ord($Str[$i]) ...

How to convert accented characters to HTML special entities with Ruby

How can I do this in Ruby? puts some_method("ò") # => "&ograve;" In other words, convert an accented character like ò to its HTML version: &ograve; I tried like this: # coding: utf-8 require 'rubygems' require 'htmlentities' require 'unicode' coder = HTMLEntities.new string = "Scròfina" puts coder.encode(string, :named) but what I...

Non-Latin characters in XML output

Edit: I hardcoded the character and used the response writer to write it; it still comes out as K�nigsberger. response.setCharacterEncoding("UTF-8"); response.setContentType(contentType); //if(contentType!=null)response.setHeader("Content-Type",contentType); Writer writer = response.getWriter();//new OutputStreamWriter(...

UTF-8 output on Windows XP console

The following code shows unexpected behaviour on my machine (I'm using Visual C++ 2008 SP1 on Windows XP here): int main() { SetConsoleOutputCP( CP_UTF8 ); std::cout << "\xc3\xbc"; int fail = std::cout.fail() ? '1': '0'; fputc( fail, stdout ); fputs( "\xc3\xbc", stdout ); } I simply compiled with cl /EHsc...

How can I convert an XML document from Latin-1 to UTF-8 in Perl?

At our company we want to convert all the sites we are hosting from Latin-1 to UTF-8. After a lot of googling, we have our Perl script almost complete. The only thing missing now is the XML files. What is the best way to convert XML from Latin-1 to UTF-8, and is it useful? I am asking because we are unsure about it since most en...
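The conversion has two parts: transcoding the bytes and updating the encoding named in the XML declaration so the two stay in agreement. Below is a minimal sketch in Python rather than Perl. It assumes the input really is ISO-8859-1 and that any XML declaration sits at the very start of the file. The function name `latin1_xml_to_utf8` is hypothetical.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
DECLARATION = re.compile(r"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def latin1_xml_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: transcode Latin-1 XML bytes to UTF-8."""
    # Every byte value is a valid ISO-8859-1 character, so this decode
    # cannot fail; it also cannot detect input that was not Latin-1.
    text = raw.decode("iso-8859-1")
    # Rewrite the declared encoding, keeping the original quote style.
    text = DECLARATION.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")


source = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>Königsberger</name>"
).encode("iso-8859-1")
converted = latin1_xml_to_utf8(source)
```

A file without a declaration is left untouched apart from the transcoding, and an XML document with no declared encoding is read as UTF-8 (or UTF-16 with a byte order mark) by conforming parsers.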

How to set UTF-8 LANG on Tomcat/Java running on Mac OS 10.5.8?

I'm running Tomcat6 locally on Mac OS 10.5.8. Our staging and production servers have set up an environment variable: LANG=en_US.UTF-8. Staging and production run on CentOS and read this value when Java and Tomcat start up. However, Java does not appear to be reading this value and is defaulting to en_US_ISO_85591. On my local ...

How to Generate all the characters in the UTF-8 charset in .net

I have been given the task of generating all the characters in the UTF-8 character set to test how a system handles each of them. I do not have much experience with character encoding. The approach I was going to try was to increment a counter, and then try to translate that base ten number into its equivalent UTF-8 character, but...
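The counter approach can be sketched as follows, in Python rather than .NET. UTF-8 is an encoding of Unicode code points, so the counter runs over U+0000 to U+10FFFF and skips the surrogate range U+D800 to U+DFFF, which cannot be encoded in well-formed UTF-8. The function names are hypothetical, and unassigned code points are included because they still encode.

```python
from typing import Iterator


def unicode_scalar_values() -> Iterator[int]:
    """Yield every code point that has a well-formed UTF-8 encoding."""
    for code_point in range(0x110000):
        # Surrogates are reserved for UTF-16 and are not encodable.
        if 0xD800 <= code_point <= 0xDFFF:
            continue
        yield code_point


def utf8_of(code_point: int) -> bytes:
    """Return the UTF-8 byte sequence for one code point."""
    return chr(code_point).encode("utf-8")


# Example: collect the encodings of the first 256 code points.
first_256 = [utf8_of(cp) for cp in range(256)]
```

Code points below U+0080 encode to a single byte; the rest take two to four bytes.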

Windows-1251 file inside UTF-8 site?

Hello everyone, Masters Of Web Development :) I have a piece of PHP script that fetches the last 10 played songs from my Winamp. This script is inside a file (let's call it "lastplayed.php") which is included in my site with the PHP include function inside a "div". My site uses UTF-8 encoding. The problem is that some song titles are in Windows-12...

Passing foreign language characters to/from a database

I am trying to allow users to enter Hebrew characters into certain fields in an HTML form (processed using Java). I did some research, and it is apparent that the following tag needs to be part of the HTML document: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> That being done, I am having the following result...

Guessing the encoding of text represented as byte[] in Java

Given an array of bytes representing text in some unknown encoding (usually UTF-8 or ISO-8859-1, but not necessarily so), what is the best way to obtain a guess for the most likely encoding used (in Java)? Worth noting: No additional meta-data is available. The byte array is literally the only available input. The detection algorithm ...
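For the two encodings named above, a simple heuristic is to try a strict UTF-8 decode first and fall back to ISO-8859-1, since non-ASCII Latin-1 text rarely forms valid UTF-8 sequences by accident. The sketch below is in Python rather than Java, the function name `guess_encoding` is hypothetical, and it covers only these candidates, not a general detector.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: return a best-guess encoding name."""
    # A UTF-8 byte order mark is a strong signal when present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")  # strict decode; raises on malformed input
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this fallback
        # always "works" but may still be the wrong guess.
        return "iso-8859-1"
    # Pure 7-bit input is identical in ASCII, UTF-8 and ISO-8859-1.
    return "ascii" if all(byte < 0x80 for byte in data) else "utf-8"


print(guess_encoding("Königsberger".encode("utf-8")))
print(guess_encoding("Königsberger".encode("iso-8859-1")))
```

The result is a guess, not a proof: short inputs or other single-byte encodings such as Windows-1251 will be reported as ISO-8859-1.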

PostgreSQL + PHP + UTF8 = invalid byte sequence for encoding

I'm migrating a db from MySQL to PostgreSQL. The MySQL db's default collation is UTF8, Postgres is also using UTF8, and I'm encoding the data with pg_escape_string(). For whatever reason, however, I'm running into some funky errors about bad encoding: pg_query() [function.pg-query]: Query failed: ERROR: invalid byte sequence for encoding...

Change encoding to UTF-8 recursively on Windows?

Does anybody know a tool, preferably for the Explorer context menu, to recursively change the encoding of files in a project from / to UTF-8 and other encodings? Freeware or not too expensive would be great. Edit: Thanks for the answers, +1 for all of them as they are all fine but I am a lazy bastard sometimes, and would really like to ...