Problem with libxml character encoding on win32

Q:

Problem with libxml character encoding on win32

While parsing some HTML files with libxml, the function xmlParseFile() reports that the input contains non-UTF-8 characters. How can I change the library's default charset to ISO-8859-1? Is there any other way to solve this?

PS: The entire development is based on libxml and works in most cases so I can't switch to another library.

+1 A:

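The encoding used for XML data must be declared in the XML declaration at the start of the document (the prolog). If no encoding is declared and there is no byte-order mark, the W3C XML specification requires the parser to assume UTF-8, which is why ISO-8859-1 bytes are reported as invalid UTF-8.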

Why are you using an XML parser to parse HTML data? libxml has an HTML parser that is separate from its XML parser. Look at htmlParseFile() and related functions. Since HTML is not XML, there is no XML prolog to indicate the data encoding. HTML does have a <meta> tag that can be placed inside the <head> tag for that purpose, though. libxml's HTML parser looks for that tag to determine the encoding if one is not explicitly passed to htmlParseFile().

Remy Lebeau - TeamB 2009-08-14 21:10:36
