views:

187

answers:

1

I wonder how to configure Quartz scheduled job threads to use the proper encoding. Code that otherwise executes fine within Spring-injected webapps (Java) gets encoding issues when run in threads scheduled by Quartz.

Is there anyone who can help me out? All source is compiled using Maven 2 with source and file encodings configured as UTF-8.
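For reference, the Maven 2 encoding setup I mean looks roughly like this (a sketch of the relevant pom.xml fragment, not my full build file):

```xml
<!-- pom.xml fragment: make compilation and resource filtering use UTF-8 -->
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <encoding>UTF-8</encoding>
      </configuration>
    </plugin>
  </plugins>
</build>
```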

In the Quartz threads, any string containing characters outside ISO 8859-1 comes out with encoding errors:

Example config

  <bean name="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
    <property name="jobClass" value="example.ExampleJob" />
  </bean>

  <bean id="jobTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerBean">
    <property name="jobDetail" ref="jobDetail" />
    <property name="startDelay" value="1000" />
    <property name="repeatCount" value="0" />
    <property name="repeatInterval" value="1" />
  </bean>

  <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="triggers">
      <list>
        <ref bean="jobTrigger"/>
      </list>
    </property>
  </bean>

Example implementation

import java.nio.charset.Charset;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.springframework.scheduling.quartz.QuartzJobBean;

public class ExampleJob extends QuartzJobBean {

    private final Log log = LogFactory.getLog(ExampleJob.class);

    @Override
    protected void executeInternal(JobExecutionContext ctx) throws JobExecutionException {
        log.info("ÅÄÖ");
        log.info(Charset.defaultCharset());
    }
}

Example output

2010-05-20 17:04:38,285  1342 INFO  [QuartzScheduler_Worker-9] ExampleJob - √Ö√Ñ√ñ
2010-05-20 17:04:38,286  1343 INFO  [QuartzScheduler_Worker-9] ExampleJob - UTF-8

The same lines of code, executed within Spring-injected beans referenced by servlets in the web container, output the proper encoding.

What is it that makes Quartz threads encoding-dependent?

+2  A: 

I haven't seen the √Ö√Ñ√ñ pattern before. It doesn't fit the usual mojibake patterns produced by any of the ISO-8859 charsets I am aware of. Since you were talking about Mac OS Roman in one of your comments, I investigated its codepage and came to the conclusion that this encoding is being used incorrectly somewhere.

The string ÅÄÖ is composed of the following UTF-8 bytes:

String s = "ÅÄÖ";
for (byte b : s.getBytes("UTF-8")) {
    System.out.printf("0x%X ", b); // 0xC3 0x85 0xC3 0x84 0xC3 0x96 
}

The codepage tells me that 0xC3 indeed stands for √, and that 0x85, 0x84 and 0x96 stand for Ö, Ñ and ñ respectively in the Mac OS Roman encoding.
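To make the misinterpretation concrete, here is a small sketch (my own demo, not your code) that decodes the UTF-8 bytes of "ÅÄÖ" using Mac OS Roman and reproduces the garbled pattern:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // UTF-8 bytes of "ÅÄÖ": 0xC3 0x85 0xC3 0x84 0xC3 0x96
        byte[] utf8 = "ÅÄÖ".getBytes(StandardCharsets.UTF_8);
        // Decoding those bytes as Mac OS Roman yields the mojibake.
        String garbled = new String(utf8, Charset.forName("MacRoman"));
        System.out.println(garbled); // prints √Ö√Ñ√ñ
    }
}
```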

Since you said that it works fine when used in servlets, and that both use the same logging appender, the logging output can be ruled out as a suspect. I can then think of only one cause: the file with those characters has been saved using the Mac OS Roman encoding instead of UTF-8. It's unclear which editor you're using on Mac, but both Notepad and Eclipse would have displayed a warning message about that, and reopening the file in the editor should show the same malformed characters.

Which editor are you using? Is it explicitly configured to save files using UTF-8 encoding?


Update: since that doesn't seem to be the cause of the problem, let's go back to the fact that it works fine when using servlets. How exactly did you test that? Didn't you accidentally enter those characters using the Mac OS Roman encoding, so that they end up correct when the logger is, after all, probably configured to use Mac OS Roman? Where does the logger log to: the command console or a log file? How are they encoded? What do encoding detectors say about the file encoding? (Sorry, I don't do Mac, but EditPlus/Notepad++ on Windows, for example, can detect/autoguess a file's encoding and report it.)
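If you want a quick programmatic check, here's a sketch (class and method names are my own, and it assumes Java 7+ NIO) that reports whether a file's bytes decode cleanly as UTF-8, which is a cheap way to rule out a Mac OS Roman save:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // True if the bytes decode cleanly as UTF-8. This is only a sanity
    // check, not a full encoding detector: many non-UTF-8 files happen
    // to be valid UTF-8 byte sequences too.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValidUtf8(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```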

BalusC
Files are saved in UTF-8, and platform encoding is not the issue: it outputs the same result on any platform/default encoding. The pattern can be corrected by new String("åäö".getBytes("UTF-8")), but I want the saved UTF-8 encoding of the file to be reflected in both Quartz threads and request threads in Tomcat. It is only Quartz that fails to recognize the encoding.
Martin
(The editor is Eclipse, and the files are UTF-8)
Martin
Another idea that suggests the source files are wrongly interpreted by the Quartz threads: if I use a database to load my data, it has the correct encoding. Has anyone got any further on this, or reproduced my problem?
Martin
Eclipse is set up with UTF-8 for the project. Using 'file -I {file}' I get charsets intersecting with UTF-8 for different sets of characters, so I think I've validated that Eclipse outputs UTF-8. The logger is configured the same for both contexts (same property file): log4j.appender.A1.Encoding=UTF-8
Martin
I'll make a better example, so that we don't confuse ourselves with irrelevant configuration. :D By the way, if I run the code inside Eclipse everything is OK as well; it's only in Quartz.
Martin