I have a MyFaces Facelets application where the page coding is a bit rugged. Anyway, it's developed with Eclipse and built with Ant, and it runs more or less OK in Tomcat 6.0.26. So far so good.

Now, I'd rather build with Maven, so I made a couple of pom files, opened them in NetBeans and built, and now I have a war file that deploys OK. However, every Facelets page fails with:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)

So, I've tried a lot of different things, and the application actually runs simple pages without Facelets content. But everything runs if I just build with Ant instead ... So my question is: what's the most likely difference between an Ant build and a Maven build that may cause this?

It also seems that even though I've configured for UTF-8 in Netbeans and pom-files, Netbeans eventually ends up reporting the facelet files as ISO-8859-1 after some editing.

I've made sure that the most central libs are the same version in both builds (especially xerces 2.3.0). I've also added an encoding servlet filter, which had no effect.

And I'd rather fix the Maven build and keep the buggy pages than the other way around ... my intention is to introduce Maven, not to fix buggy pages.

Here is what the pom.xml says about encoding:

 <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.0.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>

....

    <properties>
        <netbeans.hint.deploy.server>Tomcat60</netbeans.hint.deploy.server>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
    </properties>
A: 

@Pascal Thivent: thanks, I don't think I can post the entire application, it's too messy, but I'll try to set up a minimalistic stack that exhibits the same behaviour and post it.

And yes, regarding the link, I have checked in a hex-editor that the files are utf-8, and I've already made the servlet filter that changes encoding, and it had no effect.

deleted
A: 

I had the same problem...

solved using

String str = new String(oldstring.getBytes("UTF-8"));

Ederson Amorim
This is not a good solution. This encodes a string of dubious validity to UTF-8 bytes, then decodes them using the system encoding. Depending on the machine you run this code on, this operation may corrupt data or do absolutely nothing.
McDowell
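
To illustrate the comment above, here is a minimal sketch of why the charset should be named on the decode step as well as the encode step. It assumes Java 7 or later (for StandardCharsets); the class name, the helper method and the sample string are made up for illustration and are not from the original application.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Encode as UTF-8, then decode with the given charset.
    // new String(bytes) without a charset would use the platform default instead,
    // which is why the one-liner above behaves differently from machine to machine.
    static String roundTrip(String s, Charset decodeWith) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String original = "bl\u00e5b\u00e6r"; // contains two non-ASCII letters

        // Same charset both ways: the text survives unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));      // true

        // Decoding the UTF-8 bytes as ISO-8859-1 turns each non-ASCII letter into two characters.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1).equals(original)); // false
    }
}
```

Either way, re-encoding a String that is already in memory cannot repair bytes that were decoded wrongly earlier on, so this would not be expected to help with a parser that fails while reading the file.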
A: 

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

The cause is that a file that is not UTF-8 is being parsed as UTF-8. It is likely that the parser is encountering a byte that cannot begin a UTF-8 sequence, for example a stray continuation byte (80-BF) or a value in the range F8-FF. The values FE and FF in particular never occur in valid UTF-8.

The problem could probably be solved either by changing the XML declaration of the file to name the correct encoding, or by re-encoding the file to UTF-8.

McDowell
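
One way to act on the re-encoding suggestion is to first check which files really are valid UTF-8. The following is a rough sketch, assuming Java 7 or later; the class and method names are hypothetical, and the conversion helper assumes the offending files are ISO-8859-1, which is only a guess based on what NetBeans reports in the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns true if the bytes decode as strict UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Re-encodes bytes as UTF-8, ASSUMING the input is ISO-8859-1.
    static byte[] latin1ToUtf8(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Usage: java Utf8Check file1.xhtml file2.xhtml ...
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] content = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(content) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Running the check against both the page sources and the copies unpacked from the Maven-built war may help narrow things down: if the sources pass but the copies in the war do not, the bytes were changed somewhere in the build rather than in the editor.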