character-encoding

Make encoding uniform before comparing strings in PHP

I'm working on a feature which requires me to get the contents of a webpage, then check to see if certain text is present in that page. It's a backlink checking tool. The problem is this - the function runs perfectly most of the time, but occasionally, it flags a page for not having a link when the link is clearly there. I've tracked ...

How to recover text from a wrong encoding?

I have some files created on Asian operating systems (Chinese and Japanese Windows XP), and the file names are garbled, for example: иè+¾«Ñ¡Õä²ØºÏ¼­ How can I recover the original text? I tried this in C#: Encoding unicode = Encoding.Unicode; Encoding chinese = Encoding.GetEncoding(936); byte[] chineseBytes = chinese.GetBytes(garbledString); b...
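
A minimal Python sketch of the usual recovery idea, assuming the garbling happened because bytes in one encoding (here GBK, code page 936) were decoded as another (here Latin-1): re-encode with the wrong encoding to get the original bytes back, then decode with the right one. The function name `recover_mojibake` and both default encodings are assumptions for illustration. It only works if no bytes were lost when the text was garbled.

```python
def recover_mojibake(garbled: str, wrong: str = "latin-1", right: str = "gbk") -> str:
    """Undo a wrong decode: turn the text back into bytes, then decode correctly.

    `wrong` is the encoding the bytes were mistakenly decoded with,
    `right` is the encoding they were actually written in (both are guesses
    that have to be adjusted per file).
    """
    raw = garbled.encode(wrong)   # recover the original byte sequence
    return raw.decode(right)      # interpret the bytes as intended


if __name__ == "__main__":
    # Simulate the damage: GBK bytes read as Latin-1.
    damaged = "中文".encode("gbk").decode("latin-1")
    print(recover_mojibake(damaged))
```

If the wrong decoder replaced or dropped characters along the way, the round trip cannot restore them, and the call may raise `UnicodeDecodeError`.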

How to deal with HTML-entities for publishing multilingual content

When publishing text online as an HTML page, I face the problem of correctly displaying symbols from several languages that require extended Latin character encoding. In this case I look up the entity (hex) in the list on this site: http://theorem.ca/~mvcorks/code/charsets/auto.html . I wonder if it's possible to save ...

Strange encoding issue

I have a contact form that sends itself to me by email using classic ASP and CDO.message. The thing is, it contains Hebrew characters and I encoded it as UTF-8, but when it arrives in my email I get ??????? instead of Hebrew. I copied the exact files handling this form to another FTP I have, and boom, it works fine. What is the cause? ...

ASP.NET & Ajax: query string parameters using ISO-8859-1 encoding

Hi there, folks. Here's another one for you to help me solve: I have an ASP.NET website that uses AJAX (asynchronous) calls to an .ashx handler, passing a query string parameter to get some information from the database. Here's an example of how it works: Client-side (JavaScript) code snippet that makes the asynchronous call to the han...

Is ASCII "../" the only byte sequence that indicates a directory traversal in PHP?

I have a PHP app that uses a $_GET parameter to select JS/CSS files on the filesystem. If I deny all requests in which the input string contains ./, \ or a byte outside the visible 7-bit ASCII range, is this sufficient to prevent parent directory traversals when the path is passed to PHP's underlying (C-based) file functions? I'm aware...

Convert only certain XML characters to their HTML entities (&#nnn;)

I have a problem where I have some HTML like this: <p>There is the unfinished business of Taiwan, eventual “reunification”...a communiqué committing</p> In that text string I do not want to change the < and > to &lt; and &gt;. However, I do want to convert the quotes around “reunification” and the é in communiqué. ...

Struts2 request encoding

I am sending XML in an HTTP POST body. Question: Does Struts2 support processing requests in UTF-8 encoding? Reference: http://www.experts-exchange.com/Programming/Languages/Java/Q%5F24061148.html (Around bottom of the page) ...

How to read the encoding header without knowing the encoding?

If I am reading an XML or HTML file, don't I have to read the tag that tells me the encoding to be able to read the file? Isn't that tag encoded the same way the file is? I am curious how you read that tag without knowing the encoding. I realize this is a solved problem; I am just curious how it's done. Update 1: I don't get it, in UTF-16 ...
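
A rough Python sketch of how this bootstrapping is commonly done, loosely following the autodetection idea described in the XML 1.0 specification (Appendix F): look at the first few bytes for a byte order mark or for the byte pattern of `<?`, and only then read the declaration, which uses ASCII characters only. The function name `sniff_xml_encoding` is hypothetical, and the UTF-8 fallback is the XML default, not something the question states.

```python
import codecs
import re

# The XML declaration itself contains only ASCII characters, so in any
# ASCII-compatible encoding it can be matched directly on the raw bytes.
_DECL = re.compile(rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._\-]*)["']""")


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its leading bytes."""
    # 1. Byte order marks. UTF-32 is checked first because its little-endian
    #    BOM starts with the same two bytes as the UTF-16 little-endian BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. No BOM: '<?' stored as 16-bit units reveals UTF-16 and its byte order.
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    # 3. ASCII-compatible encoding: read the declared name, else default to UTF-8.
    match = _DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


if __name__ == "__main__":
    print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

HTML parsers use a similar prescan for a `meta` charset, but the details differ and are not covered by this sketch.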

Decoding split 16-bit character in Java

In my application, I receive a URL-UTF8 encoded string of characters, which is split up by the sending client. After splitting, each message part includes some header information which is meant to be used to reconstruct the message. With English characters, it's pretty straightforward String content = new String(request.getParameter("c...

Converting a database from one character encoding to another

I have a MySQL database. Text is currently stored in charset latin1, collation latin1_swedish_ci. These are the defaults, and it wasn't a problem back in the day when the database was originally created. I want to switch over to UTF8 so the text encoding in the database matches our text encoding used elsewhere on the web site that uses t...

Making sure a String does not exceed 2000 bytes in an Oracle database table column

I want to truncate an error string so that it is guaranteed to fit into an Oracle table column of type VARCHAR2(2000 BYTE). Design forces: The main goal is to fit the table column. 90-95% of the string text is exception messages and stack traces, but it could contain a customer name with French or Turkish characters, which I am willing to disregard and see as ? or ...
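
One possible approach, sketched in Python under the assumption that the database character set is UTF-8 (the question does not say): encode the text, cut the byte string at the limit, and decode again while ignoring an incomplete trailing sequence, so a multi-byte character is never split. The helper name `truncate_utf8` is hypothetical.

```python
def truncate_utf8(text: str, max_bytes: int = 2000) -> str:
    """Return the longest prefix of `text` whose UTF-8 form fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting may leave a partial multi-byte character at the end;
    # errors="ignore" drops those dangling bytes instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    message = "é" * 1500                      # 3000 bytes in UTF-8
    short = truncate_utf8(message, 2001)
    print(len(short), len(short.encode("utf-8")))
```

If the column uses a different character set, the same idea applies with that encoding name, provided the cut is made on encoded bytes and not on characters.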

Can't understand these XML encoding woes

The following hunk of code (snipped for brevity) generates an XML doc and spits it out to a file. If I open the file in Visual Studio, it appears to be in Chinese characters. If I open it in Notepad, it looks as expected. If I Console.WriteLine it, it looks correct. I know it's related to encoding, but I thought I had all the encoding ducks ...

Find a Windows-1252 char in a MySQL column

There's a row that I believe contains a Windows-1252 smart-quote char in a particular column that is messing up a user of this table. How can I select any row that contains any Windows-1252 punctuation in this column? AND it would be really cool if I had a way of converting these values if I redefine the column as being utf8 (it's curre...

How to post form data from a UTF-8 page to a Western European (ISO) page

I need to post a form on a new website which is UTF-8 encoded. The problem is that I need to post it to a legacy site encoded with Western European (ISO). Certain characters get messed up in the post (like Danish special characters). It is not possible to change the character encoding on the legacy website as it would definitely break stu...

How can I convert an input file to UTF-8 encoding in Perl?

I already know how to convert the non-UTF-8-encoded content of a file line by line to UTF-8 encoding, using something like the following code: # outfile.txt is in GB-2312 encoding open my $filter,"<",'c:/outfile.txt'; while(<$filter>){ #convert each line of outfile.txt to UTF-8 encoding $_ = Encode::decode("gb2312", $_); ...} ...

Strange gcc error: stray '\NNN' in program

The following issue popped up in my open source library, and I can't figure out what's going on. Two of my users have (gcc) compiler errors that look like: /home/someone/Source/src/._regex.cpp:1:1: warning: null character(s) ignored /home/someone/Source/src/._regex.cpp:1: error: stray ‘\5’ in program /home/someone/Source/src/._regex.cp...

jQuery.post and encoding

I have a form in a webpage, where the user can enter any arbitrary html. Once he clicks submit, I am sending the content to the webserver via AJAX using jQuery.post(). But for certain HTML, I am getting this response from the server HTTP/1.0 400 Bad Request Content-Type: text/plain Date: Mon, 26 Oct 2009 05:28:00 GMT BAD REQUEST: Bad...

Should this char be unsigned?

I found some confusing code during code review and am a bit puzzled. Doing some research, I found this situation. I wrote this code sample to highlight the problem: char d = '©'; // this is -87, the copyright symbol (actually it's 169 unsigned) if(ispunct(d)) // will assert. { } So the programmer who was fixing the bug did the ...

Can MS SQL Server 2000 handle Chinese text encoded with Unicode?

Hi all, I am using MS SQL Server 2000 and I want to store my data as Unicode for Chinese text, but I don't know whether it can store this type. If not, could anybody guide me? Thanks, Sopolin ...