Hi!

I've got a question about testing methods that work on strings. Time and again, I write a new test for a method that takes a string as a parameter.

Now, some issues come up:

  • How to include a test string with \n, \r, \t, umlauts etc?
  • How to set the encoding?
  • Should I use external files that are opened by a FileInputStream? (too much overhead, imho)

So... what are your approaches to solve this?

+1  A: 
  • If you have a lot of them, keep the test strings in a separate class as string constants
  • Try not to keep the files on disk unless you must. I agree with your claim - this brings too much overhead (not to mention what happens if you start getting I/O errors)
  • Make sure you test strings with different line breaks (\n, \r\n, \r) for different OSs
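
A minimal sketch of the constants-class idea, assuming a hypothetical holder class `TestStrings` and a stand-in `splitLines` method in place of the real method under test. It covers the Unix (\n), Windows (\r\n) and classic Mac OS (\r) line-break conventions:

```java
// Hypothetical holder for shared test strings; every name here is illustrative.
final class TestStrings {
    private TestStrings() {}

    static final String UNIX_BREAKS = "first\nsecond\nthird";
    static final String WINDOWS_BREAKS = "first\r\nsecond\r\nthird";
    static final String OLD_MAC_BREAKS = "first\rsecond\rthird";

    // Stand-in for the method under test: splits on any of the three
    // line-break conventions. CRLF is listed first so it counts as one break.
    static String[] splitLines(String text) {
        return text.split("\r\n|\r|\n");
    }
}
```

A test can then feed each constant to the method under test and expect the same result, here three lines per input.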
Yuval A
So you propose to use string literals in unit tests? Even if they are loooong (e.g. 200 lines?)
furtelwart
It's a matter of convenience. If you believe they are long enough to belong in an external file, and you are willing to deal with I/O in tests, then do that. Otherwise, yes, keep them in the tests, but organize them nicely.
Yuval A
+2  A: 

How to include a test string with \n, \r, \t, umlauts etc?

Um... just type it the way you want? You can use \n, \r and \t, umlauts etc. in Java String literals; if you're worried about the encoding of the source code file, you can use Unicode escape sequences, and you can produce them using the native2ascii tool that comes with the JDK.
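
As an illustration, here is a small sketch of a literal that mixes control-character escapes with Unicode escapes, so the source file itself stays pure ASCII. The class name `EscapeDemo` and the sample text are made up for this example:

```java
// Hypothetical demo class; the sample text is invented for illustration.
final class EscapeDemo {
    private EscapeDemo() {}

    // Reads as "Gruesse<TAB>aus<LF>Koeln" with real umlauts and a sharp s,
    // but the source contains only ASCII characters.
    static final String ESCAPED = "Gr\u00FC\u00DFe\taus\nK\u00F6ln";

    // Counts the characters outside the 7-bit ASCII range.
    static int countNonAscii(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) {
                count++;
            }
        }
        return count;
    }
}
```

Because the compiler translates the escapes before it sees the literal, the resulting String is identical to one typed with the real characters in a correctly encoded source file.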

How to set the encoding?

Once you have a Java String, it's too late to worry about encodings: Strings are always sequences of UTF-16 code units, and any encoding problems occur when translating between Strings and byte arrays (unlike C, Java keeps these concepts clearly separate).

Edit: If your Strings are too big to be comfortably used in source code or you're really worried about the treatment of line breaks and white space, then keeping each String in a separate file is probably best; in that case, the encoding must be specified when reading the file (In the constructor of InputStreamReader)
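
To make the String/byte boundary concrete, here is a rough sketch that decodes bytes through an InputStreamReader with an explicit charset. The class `EncodingDemo` and its `decode` helper are assumptions for illustration; a ByteArrayInputStream stands in for the file stream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper; in a real test the stream would come from a file.
final class EncodingDemo {
    private EncodingDemo() {}

    // Decodes the bytes with the charset given to the InputStreamReader.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Decoding the UTF-8 bytes of an umlaut with StandardCharsets.UTF_8 gives the original character back, while decoding the same bytes as ISO-8859-1 yields two unrelated characters. That mismatch is the kind of bug an explicit charset avoids.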

Michael Borgwardt
A: 

You could use a scripting language to code your tests.

JRuby and Groovy support HERE documents that make it easier to define a big string that spans multiple lines

# In JRuby
mystring = <<EOS
This is a long string that
spans multiple lines.
EOS

# In Groovy
def mystring = """This is a long string that
spans multiple lines."""

This will also make your test code easier to write, as both languages have a lot of shortcuts that help you write simpler code (some might say less robust code, but that matters less when it is only unit-test code).
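
For anyone who prefers to stay in plain Java: Java 15 and later have text blocks, which serve a similar purpose. A minimal sketch, assuming a Java 15+ compiler and a made-up class name `TextBlockDemo`:

```java
// Hypothetical demo class; requires Java 15 or later for text blocks.
final class TextBlockDemo {
    private TextBlockDemo() {}

    // The common leading indentation is stripped, and line breaks inside
    // the block are normalized to \n regardless of the source file's endings.
    static final String LONG_TEXT = """
            This is a long string that
            spans multiple lines.""";
}
```

Because line breaks in a text block are always normalized to \n, strings that must contain \r\n still need explicit escapes.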

Michel
I don't get your idea. Why should I _script_ a test for a java class?
furtelwart
Not to script, but to write the actual test in Groovy/JRuby. Not sure about JRuby, but you have Java interoperability in Groovy, and thus can test your classes using Groovy.
Chii
A: 

If you repeatedly use characters that are difficult to express in literal Strings (such as ", \, characters not in [ -~]), then you might want to consider doing a quick find-and-replace on the string before using it. For instance, if you use \ a lot, then you might write a function to exchange \ and /. You might use a multi-character sequence to represent accented characters.

However, there is obvious danger in ending up with a solution out of proportion to the problem. Sometimes \u#### is just easier.

If you are going for non-Java files, I suggest opening them as resources (Class.getResourceAsStream/getResource) rather than as loose files.
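
A rough sketch of the resource approach, assuming the test data is stored as UTF-8 on the test classpath. The class `ResourceText` and its method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for loading test strings from classpath resources.
final class ResourceText {
    private ResourceText() {}

    // Reads the whole stream as UTF-8 text and closes it afterwards.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Resolves the name like Class.getResourceAsStream does: relative to the
    // owner's package, or from the classpath root if it starts with '/'.
    static String load(Class<?> owner, String name) {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + name);
        }
        return readUtf8(in);
    }
}
```

A test might call `ResourceText.load(MyTest.class, "expected-output.txt")`, where both names are placeholders. Failing fast on a missing resource avoids a confusing NullPointerException later in the test.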

Tom Hawtin - tackline
+1  A: 

For LARGE strings, I would use files. File access is plenty fast for unit tests. For that small trade-off, you:

  1. Don't have to worry about escaping characters
  2. Can diff the content in source control
  3. Can validate the documents independently (e.g., XML/HTML)
Chase Seibert