I have a rather annoying issue that I solved using a simple recursive method in Java. However, I'm looking for a better way to do this.

The initial problem involved whitespace inside a Quoted-Printable/Base64 encoded MIME header, which, as I read RFC 2047, isn't allowed. As a result, decoding fails for a MIME header when whitespace is present, e.g.

=?iso-8859-1?Q?H=E4 ll and nothing?=

or more pertinently:

=?iso-8859-1?Q?H=E4 ll?= preserve this text =?iso-8859-1?Q?mo nk ey?=

The goal is to remove only the whitespace between the =? and ?= boundaries (or re-encode it as =20). Text outside those boundaries should be preserved.

I'm looking for alternative approaches to solving this; my target language is Java. Any ideas on the simplest, cleanest approach?

A: 

Use regular expressions: http://java.sun.com/docs/books/tutorial/essential/regex/

\s = whitespace
\S = non-whitespace
\? = escaped question mark
. = any single character; .* (any run of characters) is the counterpart of * in weaker pattern matching.

Might be easiest to do a multi-part find and replace using something like this: Pull out this part: =\?.*?\?=

Globally replace \s in that part with empty string.

Put the part back.

You might be able to get it down to a single search and replace if you play with the regex long enough...
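As a rough illustration of the single search and replace, here is a minimal sketch. It assumes Java 9 or later (for Matcher.replaceAll with a function argument); the class and method names are made up for the example. The pattern is also slightly stricter than the one above: it assumes the =?charset?encoding?text?= layout and uses [^?]* for the text, so an encoded text that begins with "=" (as in ?Q?=E4...) does not end the match early.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class EncodedWordWhitespace {
    // Assumed encoded-word layout: =?charset?encoding?text?=
    // The text part is matched with [^?]*, so it stops at the '?' of the closing "?=".
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?[^?\\s]+\\?[BbQq]\\?[^?]*\\?=");

    static String stripWhitespace(String header) {
        // Replace each encoded word with a copy of itself minus all whitespace.
        // quoteReplacement stops '$' and '\' from being read as group references.
        return ENCODED_WORD.matcher(header).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s", "")));
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace(
                "=?iso-8859-1?Q?H=E4 ll?= preserve this text =?iso-8859-1?Q?mo nk ey?="));
    }
}
```

Text outside the encoded words, including stray question marks, is never touched because only the matched spans are rewritten.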

steamer25
You could use capture groups and iterate over the results, but at that point you're probably better off with your simple recursive method, which I assume is essentially pgras' state machine.
steamer25
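For what it's worth, the capture-group variant might look something like the sketch below. It is only an illustration: the class and method names are hypothetical, it assumes Java 9+ (appendReplacement with a StringBuilder), and it assumes the =?charset?encoding?text?= layout. It also takes up the question's other option: for Q-encoded words, spaces are re-encoded as =20 (and tabs as =09) instead of being dropped, while for B-encoded words whitespace is simply removed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class EncodedWordRepair {
    // Groups: 1 = charset, 2 = encoding letter (B or Q), 3 = encoded text.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([BbQq])\\?([^?]*)\\?=");

    static String repair(String header) {
        Matcher m = ENCODED_WORD.matcher(header);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            boolean quotedPrintable = m.group(2).equalsIgnoreCase("Q");
            String text = quotedPrintable
                    // Q encoding: keep the whitespace, but in encoded form.
                    ? m.group(3).replace(" ", "=20").replace("\t", "=09")
                    // B encoding: whitespace carries no data, so drop it.
                    : m.group(3).replaceAll("\\s", "");
            String fixed = "=?" + m.group(1) + "?" + m.group(2) + "?" + text + "?=";
            // Quote the replacement so '$' and '\' are taken literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(sb); // copy the text after the last encoded word
        return sb.toString();
    }
}
```

Whether dropping or re-encoding is right depends on whether the stray whitespace was meant to be part of the text, which only the sender can really know.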
+2  A: 

You could build a simple state machine that tracks whether you are between =? and ?=, then read the input char by char and output it char by char, converting whitespace when needed...
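A minimal sketch of such a state machine, under a few assumptions: the names are invented for the example, whitespace is dropped rather than converted, and an encoded word is taken to have the form =?charset?encoding?text?= so that its fourth '?' is the one in the closing ?=. Counting question marks instead of looking for the first "?=" avoids closing too early when the encoded text starts with "=". One known limitation: an unterminated "=?" in plain text would put the machine into the "inside" state for the rest of the input.

```java
// Hypothetical helper class, not part of any library.
class EncodedWordStateMachine {
    static String stripWhitespace(String header) {
        StringBuilder out = new StringBuilder(header.length());
        boolean inside = false; // true while between "=?" and the closing "?="
        int delimiters = 0;     // '?' characters seen in the current encoded word
        for (int i = 0; i < header.length(); i++) {
            char c = header.charAt(i);
            if (!inside) {
                // "=?" opens an encoded word; the '?' is counted on the next pass.
                if (c == '=' && i + 1 < header.length() && header.charAt(i + 1) == '?') {
                    inside = true;
                    delimiters = 0;
                }
                out.append(c);
            } else if (c == '?') {
                delimiters++;
                out.append(c);
                // The fourth '?' belongs to the closing "?=", so leave the word here;
                // the trailing '=' is then copied as ordinary text.
                if (delimiters == 4) {
                    inside = false;
                }
            } else if (!Character.isWhitespace(c)) {
                out.append(c); // inside an encoded word: keep everything but whitespace
            }
        }
        return out.toString();
    }
}
```

This makes a single pass over the input and needs no regex, at the cost of being less forgiving of malformed headers than a pattern that requires the whole encoded word to be present.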

pgras
A: 

You could split the string on ?, then put it back together, alternating between replacing spaces and not.

Edit: Oops. Missed the equal signs. Will correct.

Edit 2: Corrected implementation (derived from Javadoc example for Matcher.appendReplacement() ):

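import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "=?iso-8859-1?Q?H=E4 ll?= what about in this case? :) =?iso-8859-1?Q?mo nk ey?=";

// Reluctantly match everything from "=?" up to the next "?=".
Pattern p = Pattern.compile("=\\?(.*?)\\?=");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // quoteReplacement keeps any '$' or '\' in the match from being read as a group reference.
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replaceAll(" ", "")));
}
m.appendTail(sb); // copy the text that follows the last match
System.out.println(sb.toString());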
Chris Thornhill
=?iso-8859-1?Q?H=E4 ll?= what about in this case? :) =?iso-8859-1?Q?mo nk ey?=
Jon
A: 

Well, I don't know about better, but here's an alternate approach:

    public static void main( String[] args )
    {
        String ex1 = "=?iso-8859-1?Q?H=E4 ll?= " + 
            "preserve this text =?iso-8859-1?Q?mo nk ey?=";
        String res1 = removeSpaces( ex1 );

        System.out.println( ex1 );
        System.out.println();
        System.out.println( res1 );
    }

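    public static String removeSpaces( String str )
    {
        // Requires java.util.regex.Matcher and java.util.regex.Pattern.
        StringBuffer result = new StringBuffer();
        // Reluctant match from "=?" to the next "?=", so each encoded word
        // is handled separately and the text between them is left alone.
        String strPattern = "=\\?.*?\\?=";
        Pattern p = Pattern.compile( strPattern );
        Matcher m = p.matcher( str );

        while ( m.find() )
        {
            // Strip whitespace from this match only; quote the result so
            // '$' and '\' are not treated as group references.
            m.appendReplacement( result, 
                Matcher.quoteReplacement( m.group().replaceAll( "\\s", "" ) ) );
        }

        // Copy whatever follows the last match (the whole input if none matched).
        m.appendTail( result );
        return result.toString();
    }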
Joseph Gordon