I have a string in Java. What is the best way to put the things between $ signs into a list?

    String temp = "$abc$and$xyz$";

How can I get all the variables within $ signs as a list in Java, i.e. [abc, xyz]?

I can do it using StringTokenizer, but I want to avoid it if possible. Thanks.

+9  A: 

Maybe you could think about calling String.split(String regex) ...
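A minimal sketch of the split approach, assuming the input is a run of $name$ pairs with optional filler between pairs (like "and" in the question's example). The class and method names are hypothetical. Splitting on \\$ leaves the wanted pieces at the odd indices:

```java
import java.util.ArrayList;
import java.util.List;

class SplitPairs {
    // Hypothetical helper: returns the text inside each $...$ pair.
    // Assumes the pairs are well formed; filler between pairs (e.g. "and") is skipped.
    static List<String> between(String text) {
        // "$abc$and$xyz$".split("\\$") yields ["", "abc", "and", "xyz"]:
        // the leading "" comes from the first '$', and trailing empty strings are dropped.
        String[] parts = text.split("\\$");
        List<String> result = new ArrayList<>();
        // Odd indices hold the text between an opening and a closing '$'.
        for (int i = 1; i < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(between("$abc$and$xyz$")); // prints [abc, xyz]
    }
}
```

This relies on String.split keeping a leading empty string (the first '$' is a real separator) while discarding trailing ones.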

Riduidel
It's the recommended way to go! StringTokenizer is there just for backwards compatibility.
Tom
A: 

If you want a simple split function, use Apache Commons Lang, which has StringUtils.split. The Java one (String.split) takes a regex, which can be overkill or confusing.

Mike Q
+1  A: 

Just try this one:

    temp.split("\\$");

khotyn
The problem is that the prefix and suffix are both $, so I want the string between the 1st $ and the 2nd $.
Shah
Sorry for the misunderstanding. I think a regex can handle your problem.
khotyn
+1  A: 

I would go for a regex myself, like Riduidel said.

This special case is, however, simple enough that you can just treat the String as a character sequence, iterate over it char by char, detect the $ signs, and grab the strings yourself.
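The char-by-char idea could look roughly like this. It is a sketch assuming each pair of $ signs encloses one value and that text outside the pairs is ignored; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CharScan {
    // Walks the string once; each '$' toggles between "outside" and "inside" a pair.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a $...$ pair
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '$') {
                if (current == null) {
                    current = new StringBuilder();  // opening '$'
                } else {
                    result.add(current.toString()); // closing '$'
                    current = null;
                }
            } else if (current != null) {
                current.append(c);                  // inside a pair: keep the char
            }
            // characters outside a pair (like "and") are ignored
        }
        // An unmatched trailing '$' leaves current non-null; that text is dropped.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("$abc$and$xyz$")); // prints [abc, xyz]
    }
}
```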

On a side note, I would try to use different demarcation characters to make the strings more readable to humans: use $ as the start-of-sequence marker and something else as the end-of-sequence marker, for instance. Or use something like the syntax I think the Bash shell uses: ${some_value}. As said, the computer doesn't care, but you, debugging your strings, just might :)
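If the strings did use ${some_value} delimiters, extracting the names could be sketched like this, assuming names never contain a closing brace. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceVars {
    // '$' and '{' are regex metacharacters, so both are escaped;
    // [^}]* captures everything up to the closing brace.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]*)\\}");

    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = VAR.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the name between the braces
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("Hello ${first} ${last}!")); // prints [first, last]
    }
}
```

With a distinct closing marker there is no ambiguity about which $ opens and which closes a value.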

As for an appropriate regex, something like \\$([^$]*)\\$ should do; a plain (\\$.*\\$)* is greedy and would match everything from the first $ to the last one. Though I'm no expert on regexes (see http://www.regular-expressions.info for good information on them).
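To see why the choice of quantifier matters for this input, the following sketch compares a greedy .*, a reluctant .*?, and a negated character class. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects group 1 of every match of regex in text.
    static List<String> captures(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "$abc$and$xyz$";
        // Greedy: .* runs to the end, then backtracks only as far as the LAST '$'.
        System.out.println(captures("\\$(.*)\\$", text));    // prints [abc$and$xyz]
        // Reluctant: .*? stops at the FIRST following '$'.
        System.out.println(captures("\\$(.*?)\\$", text));   // prints [abc, xyz]
        // Negated class: [^$]* cannot cross a '$' at all.
        System.out.println(captures("\\$([^$]*)\\$", text)); // prints [abc, xyz]
    }
}
```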

extraneon
Whether or not human-readable delimiters matter depends on whether humans will ever read these strings! If you're asking a user to type these in, then yes, it's a curious delimiter. If this is something used internally or passed between modules, then it doesn't matter if it's human-readable.
Jay
@Jay A developer is also human. If it is a template and it ever needs to change, it had better be readable, just like any other code.
extraneon
@extraneon: "developer is also human". Really? Wow, not around here.
Jay
+4  A: 

The pattern is simple enough that String.split should work here, but in the more general case, one alternative to StringTokenizer is the much more powerful java.util.Scanner.

    import java.util.Scanner;

    String text = "$abc$and$xyz$";
    Scanner sc = new Scanner(text);

    while (sc.findInLine("\\$([^$]*)\\$") != null) {
        System.out.println(sc.match().group(1));
    } // abc, xyz

The pattern to find is:

    \$([^$]*)\$
      \_____/     i.e. literal $, a sequence of anything but $ (captured in group 1)
         1                 and another literal $

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.

(…) is used for grouping. (pattern) is a capturing group and creates a backreference.
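A small sketch of both ideas, using a hypothetical class: a character class and its negation, and the difference between group(0) (the whole match) and group(1) (the first capturing group).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassAndGroupDemo {
    // [aeiou] matches any one lowercase vowel; removing every match leaves the rest.
    static String stripVowels(String s) {
        return s.replaceAll("[aeiou]", "");
    }

    // [^aeiou] matches any one character that is NOT a lowercase vowel.
    static String keepVowels(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    // Returns {whole match, group 1} for the first $...$ in s, or null if none.
    static String[] wholeAndGroup(String s) {
        Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(s);
        return m.find() ? new String[] {m.group(0), m.group(1)} : null;
    }

    public static void main(String[] args) {
        System.out.println(stripVowels("hello"));             // prints hll
        System.out.println(keepVowels("hello"));              // prints eo
        String[] g = wholeAndGroup("$abc$");
        System.out.println(g[0] + " / " + g[1]);              // prints $abc$ / abc
    }
}
```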

The backslash preceding the $ (outside of a character class definition) is used to escape the $, which otherwise has a special meaning as the end-of-line anchor. That backslash is doubled in a String literal ("\\" is a String of length one containing a backslash).
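A quick check of the escaping rules, using a hypothetical helper: an unescaped $ acts as an anchor, so "$abc$" as a regex can never match, while the escaped form or Pattern.quote matches the literal text.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if regex matches somewhere in text.
    static boolean contains(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("\\".length());                                   // prints 1
        // Unescaped: '$' is an end anchor, so "abc" can never follow it.
        System.out.println(contains("$abc$", "$abc$"));                      // prints false
        // Escaped: each \\$ in the literal becomes \$ in the regex, a literal dollar sign.
        System.out.println(contains("\\$abc\\$", "$abc$"));                 // prints true
        // Pattern.quote treats the whole string as literal text.
        System.out.println(contains(Pattern.quote("$abc$"), "$abc$"));       // prints true
    }
}
```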

This is not a typical usage of Scanner (usually the delimiter pattern is set and tokens are extracted using next()), but it does show how you'd use findInLine to find an arbitrary pattern (ignoring delimiters) and then use match() to access the MatchResult, from which you can get individual group captures.

You can also use this Pattern in a Matcher find() loop directly.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
    while (m.find()) {
        System.out.println(m.group(1));
    } // abc, xyz
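Since the question asks for a list rather than printed output, here is the same find() loop wrapped in a small helper. The class and method names are hypothetical; the pattern is the one explained above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTokens {
    // Compiled once and reused: literal $, anything but $ (group 1), literal $.
    private static final Pattern BETWEEN_DOLLARS = Pattern.compile("\\$([^$]*)\\$");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BETWEEN_DOLLARS.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("$abc$and$xyz$")); // prints [abc, xyz]
    }
}
```

Each find() resumes after the previous match, so the closing $ of one pair is never reused as the opening $ of the next; that is why "and" is skipped.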


polygenelubricants
See this example of a typical way to match `"quoted"` contents like `'this'` and `"o'my"` with regex ( http://stackoverflow.com/questions/3561353/matching-quote-contents/3561377#3561377 ); you can do this with `Matcher` or `Scanner` as well.
polygenelubricants
+1 Great answer
Helper Method
+1  A: 

Basically, I'd second Khotyn's answer as the easiest solution. I see from your comment on his answer that you don't want the zero-length tokens at the beginning and end.

That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?

If it's an error, then just start with:

    if (!text.startsWith("$") || !text.endsWith("$"))
      return "Missing $'s"; // or whatever you do on error

If that passes, fall into the split.

If the $'s are optional, I'd just strip them out before splitting. i.e.:

    if (text.startsWith("$"))
      text = text.substring(1);
    if (text.endsWith("$"))
      text = text.substring(0, text.length() - 1);

Then do the split.
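Put together, the "strip optional $'s, then split" approach might look like the sketch below. The class and method names are hypothetical. Note that it splits at every $, so filler such as "and" in "$abc$and$xyz$" comes back as a token too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StripThenSplit {
    // Hypothetical helper: drop one optional leading and trailing '$', then split.
    static List<String> tokens(String text) {
        if (text.startsWith("$"))
            text = text.substring(1);
        if (text.endsWith("$"))
            text = text.substring(0, text.length() - 1);
        // "".split(...) returns [""], so treat an empty remainder as "no tokens".
        if (text.isEmpty())
            return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(text.split("\\$")));
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$")); // prints [abc, and, xyz]
    }
}
```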

Sure, you could write more sophisticated regexes, use StringTokenizer, or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.

PS There's also the question of what result you want when there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"] or ["foo","","bar"]? Khotyn's split will give the second result, with zero-length strings. If you want the first result, use split("\\$+").
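The difference between the two split patterns, sketched with a hypothetical class on "foo$$bar" (that is, "$foo$$bar$" with the outer $'s already stripped):

```java
import java.util.Arrays;

class SplitPlusDemo {
    // A single \$ splits at every dollar sign, so "$$" leaves an empty token between them.
    static String[] splitEach(String s) {
        return s.split("\\$");
    }

    // \$+ treats a run of dollar signs as one separator.
    static String[] splitRuns(String s) {
        return s.split("\\$+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEach("foo$$bar"))); // prints [foo, , bar]
        System.out.println(Arrays.toString(splitRuns("foo$$bar"))); // prints [foo, bar]
    }
}
```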

Jay
A: 

You can do it simply by writing your own code. The following class does the job:

    import java.util.ArrayList;
    import java.util.List;

    public class MyStringTokenizer {

        public static void main(String[] args) {
            List<String> result = getTokenizedStringsList("$abc$efg$hij$");

            for (String token : result) {
                System.out.println(token); // abc, efg, hij
            }
        }

        // Returns the pieces of text between consecutive '$' signs.
        // Text before the first '$' and any text after the last '$' are ignored.
        private static List<String> getTokenizedStringsList(String string) {
            List<String> tokenList = new ArrayList<String>();
            char[] in = string.toCharArray();
            int stringLength = in.length;

            for (int i = 0; i < stringLength; ) {
                // Skip ahead to the next '$' and step past it.
                while (i < stringLength && in[i] != '$')
                    i++;
                i++;

                // Collect characters up to the following '$'.
                StringBuilder myBuilder = new StringBuilder();
                while (i < stringLength && in[i] != '$') {
                    myBuilder.append(in[i]);
                    i++;
                }

                // Keep the token only if a closing '$' was found; otherwise
                // the final '$' would yield an extra empty token.
                if (i < stringLength)
                    tokenList.add(myBuilder.toString());
            }
            return tokenList;
        }
    }

Saurabh