Is there any way to use raw strings in Java (without escape sequences)?

(I'm writing a fair amount of regex code and raw strings would make my code immensely more readable)

I understand that the language does not provide this directly, but is there any way to "simulate" them in any way whatsoever?

+4  A: 

No (quite sadly).

jsight
+8  A: 

No, there isn't.

Generally, you would put raw strings and regexes in a properties file, but those have some escape sequence requirements too.

stevedbrown
+2  A: 

String#getBytes() returns a new byte array containing the string encoded in the platform's default charset. It is not a copy of the String's internal storage, which holds the text as UTF-16. What I'm saying is that I think this is as close to a "raw" string as you can ever get in Java.

Esko
You should use getBytes() with the charsetName; the encoding you want may not be the same as the platform default.
Rich Seller
Any decent IDE has a property file editor which can handle all the nasty escaping, e.g. Eclipse.
Thorbjørn Ravn Andersen
Rich Seller: According to the Javadocs it should match the platform default charset; however, I wouldn't be surprised if it didn't.
Esko
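To make the getBytes() discussion concrete, here is a minimal sketch that passes the charset explicitly, as the comment above suggests. The class and method names (GetBytesDemo, utf8, utf16) are made up for illustration. Note that the bytes only reflect how the characters are encoded; the doubled backslash in the source literal is already a single character by the time the String exists.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
final class GetBytesDemo {

    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String regex = "\\d+"; // three characters at runtime: \ d +
        System.out.println(Arrays.toString(utf8(regex)));  // [92, 100, 43]
        System.out.println(Arrays.toString(utf16(regex))); // [0, 92, 0, 100, 0, 43]
    }
}
```

So getBytes() gives you an encoded copy of the text, but it does not save you from writing the escapes in the source file.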
A: 

You could write your own, non-escaped property reader and put your strings in a resource file.

ShabbyDoo
+1  A: 

I personally consider regex strings data and not code, so I don't like them in my code--but I realize that's impractical and unpopular (Yes, I realize it, you don't have to yell at me).

Given that there is no native way to do this, I can come up with two possibilities (well, three but the third is, umm, unnatural).

So my personal preference would be to just parse a file into strings. You could name each entry in the file and load them all into a hash table for easy access from your code.
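A rough sketch of that first option, assuming a made-up file format of one name=regex entry per line with '#' comment lines. RawPatterns and parse are hypothetical names, not an existing API. Unlike java.util.Properties, nothing here interprets backslashes, so the regex is stored exactly as typed.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads "name=regex" lines with no escape processing.
final class RawPatterns {

    static Map<String, String> parse(Reader source) {
        Map<String, String> patterns = new LinkedHashMap<>();
        new BufferedReader(source).lines().forEach(line -> {
            // Skip blank lines and '#' comment lines.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                return;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                return; // no name given, ignore the line
            }
            // Split on the first '=' only; the value is kept verbatim.
            patterns.put(line.substring(0, eq).trim(), line.substring(eq + 1));
        });
        return patterns;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents. The backslashes are doubled only
        // because this sample lives in Java source; in the file they are single.
        String fileContents = "# sample entries\ndecimal=\\d+\\.\\d+\nword=\\w+\n";
        Map<String, String> patterns = parse(new StringReader(fileContents));
        System.out.println(patterns.get("decimal")); // prints \d+\.\d+
    }
}
```

In real use you would hand parse a FileReader or a classpath stream instead of a StringReader, and pass the looked-up value to Pattern.compile.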

Second choice: create a file that is pre-processed into a Java interface, with the regexes escaped during generation. Personally I hate code generation, but if the Java file is never edited by a human, it's not too bad (the real evil is generated files that you are expected to edit!)

Third (tricky and probably a bad idea): You might be able to create a custom doclet that will extract strings from your comments into a text file or a header file at compile time, then use one of the other two methods above. This keeps your strings in the same file in which they are being used. This could be really hard to do correctly, and the penalties of failure are extreme, so I wouldn't even consider it unless I had an overwhelming need and some pretty impressive talent.

I only suggest this because comments are free-form and things within a "pre" tag are pretty safe from formatters and other system uglies. The doclet could extract this before printing the javadocs, and could even add some of the generated javadocs indicating your use of regex strings.

Before downvoting and telling me this is a stupid idea--I KNOW, I just thought I'd suggest it because it's interesting, but my preference as I stated above is a simple text file...

Bill K
Most regexes I have seen are definitely an integral part of the program that uses them and should not be seen as data. You do not want to externalise them any more or less than any other piece of logic in there, such as conditions in if statements.
Thilo
Actually, externalizing conditions is often good as well, that's a lot of what is behind closures. Aren't regexes usually tied to external data though? If so, you certainly want to be able to change them. I guess the point is that you SHOULD externalize everything you can, and the big advantage of regex is that you can.
Bill K
I'm with Thilo on this. Regexes usually define the kind of data specific code is looking for or for analyzing that data. If you externalize it, I have found it is easy for someone to change that without realizing the implications.
Kevin Brock
+2  A: 

Have the raw text file on your classpath and read it in with getResourceAsStream(...).
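Something along these lines should work as a sketch. RawResource, readAll and load are names invented for this example, and it assumes the file is stored as UTF-8. The content comes back exactly as it appears in the file, with no escape processing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for loading a raw text resource from the classpath.
final class RawResource {

    // Reads the whole stream as UTF-8 text (assumed encoding) and closes it.
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks the resource up relative to this class; a leading '/' means
    // the root of the classpath. Fails loudly if the resource is missing.
    static String load(String resourceName) {
        InputStream in = RawResource.class.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        return readAll(in);
    }
}
```

Usage would look like Pattern.compile(RawResource.load("/decimal.regex").trim()), where decimal.regex is a made-up file name; the trim() drops the trailing newline most editors add.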

Thorbjørn Ravn Andersen
+1  A: 

( Properties files are common, but messy - I treat most regex as code, and keep it where I can refer to it, and you should too. As for the actual question: )

Yes, there are ways to get around the poor readability. You might try:

String s = "crazy escaped garbage"; //readable version//

though this requires care when updating. Eclipse has an option that lets you paste text in between quotes, and the escape sequences are applied for you. The tactic would be to edit the readable versions first, and then delete the garbage, and paste them in between the empty quotes "".
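For illustration, a small sketch of what that convention might look like with real patterns. The class and constant names are made up for this example; the readable version sits in a comment directly above the escaped literal so the two can be compared at a glance.

```java
import java.util.regex.Pattern;

// Hypothetical example class showing the "readable version in a comment" habit.
final class ReadableRegex {

    // readable version: \d+\.\d+
    static final String DECIMAL = "\\d+\\.\\d+";

    // readable version: ^\s*(\w+)\s*=\s*"([^"]*)"$
    static final String ASSIGNMENT = "^\\s*(\\w+)\\s*=\\s*\"([^\"]*)\"$";

    public static void main(String[] args) {
        System.out.println(DECIMAL);                          // prints \d+\.\d+
        System.out.println(Pattern.matches(DECIMAL, "3.14")); // prints true
    }
}
```

One caveat: the Java compiler processes Unicode escapes everywhere in a source file, comments included, so a readable version that contains a backslash followed by the letter u can break compilation even though it is only a comment.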


Idea time:

Hack your editor to convert them; release as a plugin. I checked around for plugins, but found none (try searching though). There's a one-to-one correspondence between escaped source strings and textbox text (discounting \n, \r\n). Perhaps highlighted text with two quotes on the ends could be used.

String s = "##########
#####";

where # is any character, which is highlighted - the break is treated as a newline. Text typed or pasted within the highlighted area is escaped in the 'real' source, and displayed as if it were not. (In the same way that Eclipse escapes pasted text, this would escape typed text, and also display it without the backslashes.) Delete one of the quotes to cause a syntax error if you want to edit normally. Hmm.

mk
A: 

This is a work-around if you are using Eclipse. You can have long blocks of text automatically split across multiple lines and special characters automatically escaped when you paste text into a string literal

"-paste here-";

if you enable that option in Window→Preferences→Java→Editor→Typing→"Escape text when pasting into a string literal".

Dread