Hi,

I am looking for a Java 5 library that lets me compare text so that each of the following comparisons returns true:

  • " foo bar " == "foo bar"
  • "foo\nbar" == "foo bar"
  • "foo\tbar" == "foo bar"
  • "féé bar" == "fee bar"
  • and so on...

Any suggestions?

+2  A: 

You can use regular expressions to compare patterns and ...

http://java.sun.com/docs/books/tutorial/essential/regex/
http://www.regular-expressions.info/java.html

SjB
+1  A: 

I don't think you'll find a library with these specific rules. You'll have to code them yourself. For some of the rules, regular expressions or even the String class's own methods can be useful, but for the last rule I think you'll have to keep a Map of equivalences for those special characters. Then you'll have to iterate through each char in the string, comparing them using this Map. And since you're already iterating through the string, maybe you could apply all the rules in one pass and avoid regular expressions altogether.
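
A minimal sketch of that single-pass idea, assuming a small hand-written equivalence table. The class name LooseTextComparer, its method names, and the table contents are made up for illustration, and the table would need to be extended for real data. It collapses every run of whitespace into one space, drops leading and trailing whitespace, and folds mapped characters in the same loop. Accented characters are written as Unicode escapes so the source does not depend on the file encoding.

```java
import java.util.HashMap;
import java.util.Map;

class LooseTextComparer {
    // Hypothetical equivalence table (accented char -> plain char); extend as needed.
    private static final Map<Character, Character> FOLD = new HashMap<Character, Character>();
    static {
        FOLD.put('\u00e9', 'e'); // e with acute accent
        FOLD.put('\u00e8', 'e'); // e with grave accent
        FOLD.put('\u00ea', 'e'); // e with circumflex
        FOLD.put('\u00eb', 'e'); // e with diaeresis
    }

    // Applies all rules in one pass over the string.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                // Remember the gap, but never emit a leading space.
                pendingSpace = out.length() > 0;
                continue;
            }
            if (pendingSpace) {
                out.append(' '); // one space per whitespace run
                pendingSpace = false;
            }
            Character mapped = FOLD.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        // A trailing whitespace run is never flushed, so the result is trimmed.
        return out.toString();
    }

    static boolean looselyEquals(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this sketch, LooseTextComparer.looselyEquals("foo\nbar", "foo bar") would return true, and so would the other comparisons listed in the question.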

bruno conde
There is no perfect answer, but yours is exhaustive. Instead of a map, though, I use a character class in a regex (for instance [êéèë]) with replaceAll.
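
Roughly, the character-class variant could look like the sketch below. The class name AccentFolder and the whitespace handling are assumptions added to cover the other rules from the question. Only the [êéèë] class comes from the comment above, written here as Unicode escapes, and each further base letter would need its own replaceAll call.

```java
class AccentFolder {
    static String normalize(String s) {
        return s.trim()                                          // drop leading/trailing whitespace
                .replaceAll("\\s+", " ")                         // tabs, newlines, runs of spaces -> one space
                .replaceAll("[\u00ea\u00e9\u00e8\u00eb]", "e");  // the class [e-circumflex, e-acute, e-grave, e-diaeresis] -> e
    }
}
```

Two strings are then compared with AccentFolder.normalize(a).equals(AccentFolder.normalize(b)).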
enguerran
+1  A: 

Sounds like you want to write a method to "normalize" your strings according to your rules before comparing them. Use trim for the first rule, and a series of replace calls, or maybe StringUtils.replaceChars() from Apache Commons Lang, for the others.

Mirko Nasato
+1  A: 

It doesn't have your specified functionality directly, but you may also be able to use the CharMatcher functions found in the google-guava library: http://code.google.com/p/guava-libraries/

Chris Winters
+1  A: 

There appear to be functions in the ICU library to remove diacritical marks:

http://site.icu-project.org/

The rest you can probably do with one or more regular expressions.

Sarah K
Shoot, in JDK 1.6, you can use java.text.Normalizer to remove the diacriticals! Previously this was a Sun internal class.
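
A short sketch of the java.text.Normalizer route, which needs Java 6 or later and so would not run on Java 5. The class name DiacriticStripper is made up for illustration. The idea is to decompose each accented letter into its base letter plus combining marks (form NFD), then delete the marks with a regex. Letters that have no such decomposition are left unchanged.

```java
import java.text.Normalizer;

class DiacriticStripper {
    static String stripDiacritics(String s) {
        // NFD splits e.g. e-acute into 'e' followed by a combining acute accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks, keeping only the base letters.
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

The whitespace rules would still need trim and a regex such as \s+ on top of this.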
Sarah K