Hi,

I need to be able to parse strings like these:

kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three

-+gdl+-kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three

kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three-+gdl+-

and in all three cases recognise these three groups:

kev-+kvs+-one

greg-+kvs+-two

les-+kvs+-three

In other words, it should use the string -+gdl+- to split the string.

Assume that the sequence -+gdl+- will not occur except as a delimiter.

How would I write regex for that?

A: 
.*?\-\+gdl\+\-.*?
Doug D
Why are you escaping the `-`, and why did you include the `.*?` in your regex?
Bart Kiers
A: 

Most regex libraries have a split function. You call it with the delimiter -+gdl+- as the pattern (escaping the + characters if your regex flavor requires it) and it returns an array. Details vary from language to language.

However, you don't even need regex. Many language libraries have a function that splits on the plain string "-+gdl+-". What language are you using?
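As a rough illustration of both options, here is a minimal Python sketch (Python is an assumption here, since the question does not name a language). It splits the first sample string once with the plain-string `str.split` and once with `re.split`, using `re.escape` so the `+` characters are matched literally. The names `DELIM`, `plain_parts` and `regex_parts` are only illustrative.

```python
import re

DELIM = "-+gdl+-"  # delimiter taken from the question
text = "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three"

# Plain-string split: no regex involved, so nothing needs escaping.
plain_parts = text.split(DELIM)

# Regex split: re.escape() backslash-escapes the '+' characters so the
# pattern matches the delimiter literally instead of treating '+' as a quantifier.
regex_parts = re.split(re.escape(DELIM), text)

print(plain_parts)
print(regex_parts)
```

Both calls should yield the same three groups for this input, so the regex version only pays off when the delimiter really is a pattern rather than a fixed string.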

yu_sha
As I said in my comment above to Andreas, I'm aware that I can split this. I wanted to know if it was possible to do using regex. Are you saying that it's not possible?
Shoko
A: 

Can you use a function like Perl's split? If so, it's quite easy:

@results = split /\-\+gdl\+\-/, $yourWeirdString;
john
The `-` needs no escaping.
Bart Kiers
+1  A: 

You don't have to write a regexp for that. Just split on the string you want as the separator, and you will get the fields you want.

Here is an example, though I don't know what language you use:

 "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three".split("-+gdl+-")
Patrick
In some languages, the only split(...) method there is, takes a regex (not a plain string). Take Java for example.
Bart Kiers
A plain string is a regex which happens to have no metacharacters
kemp
Sigh... (15 char fill-up)
Bart Kiers
String.split("-\\+gdl\\+-") should work just fine in Java, you just have to escape it properly.
Daniel Bruce
+1  A: 

In short, the regular expression you need is this:

-\+gdl\+-

The following Java code can do this, printing out the number of tokens and the tokens themselves:

import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) {
        String text = "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three";
        // Each '+' is escaped so it matches literally rather than acting as a quantifier
        String regex = "-\\+gdl\\+-";
        Pattern p = Pattern.compile(regex);
        String[] tokens = p.split(text);
        System.out.println("Found " + tokens.length + " tokens");
        for (String token : tokens) {
            System.out.println("Found " + token);
        }
    }
}
Sevas
A bit verbose though, the three lines `String regex = "-\\+gdl\\+-"; Pattern p = Pattern.compile(regex); String[] tokens = p.split(text);` could simply be written as: `String[] tokens = text.split("-\\+gdl\\+-");`
Bart Kiers
A: 

You can solve this with a regular expression: just use -+gdl+- as the pattern for the split. What needs to be escaped depends on your regex flavor.

EDIT after your comment: you can do it with a match, but it adds unnecessary complexity. It also depends on the language. Here is an example in PHP:

preg_match_all('/(.*?)(?:-\+gdl\+-|$)/', $string, $match);

You'll get empty matches, though.
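For readers who want to try the match-based approach outside PHP, here is a hedged Python sketch using the same pattern with `re.findall`. The helper name `extract_groups` is hypothetical, and dropping the empty matches with a list comprehension is one possible choice, not the only one.

```python
import re

# Same pattern as the PHP example: lazily capture text up to the next
# delimiter or the end of the string.
PATTERN = re.compile(r"(.*?)(?:-\+gdl\+-|$)")

def extract_groups(text):
    """Return the non-empty captures between delimiters (hypothetical helper)."""
    # With a single capture group, findall returns one string per match,
    # including empty ones; those are filtered out here.
    return [g for g in PATTERN.findall(text) if g]

for sample in (
    "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
    "-+gdl+-kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
    "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three-+gdl+-",
):
    print(extract_groups(sample))
```

The filtering step is what makes the three inputs from the question come out the same; without it the empty matches mentioned above would remain in the list.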

kemp
Thanks, kemp. I didn't know that $ could stand alone as part of a regular expression.
Shoko
A: 

I'm not sure what language you're looking for, but in Ruby you can just use String#split (and you don't need a regexp; a simple string parameter will do):

>> strings = ["kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
              "-+gdl+-kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
              "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three-+gdl+-"]
>> split = strings.map {|s| s.split "-+gdl+-"}
=> [["kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"], 
    ["", "kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"], 
    ["kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"]]

Note that this leaves empty strings in the result when the delimiter appears at the beginning of the string or twice in a row (Ruby's split drops trailing empty fields by default). If you don't want any empty fields, you'll have to filter them out afterwards:

>> split.map {|a| a.reject {|s| s == ""}}
=> [["kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"], 
    ["kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"], 
    ["kev-+kvs+-one", "greg-+kvs+-two", "les-+kvs+-three"]]

If you're not familiar with Ruby, the map part simply applies the same operation to each item in the array, so I can demonstrate how this works on all of your examples at once.
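The same split-then-filter idea can be sketched in Python, assuming that is an acceptable language for you. One difference worth knowing: Python's `str.split` with an explicit separator keeps empty strings at both the start and the end, so the filter matters for the third input too. The variable names `raw` and `cleaned` are illustrative only.

```python
DELIM = "-+gdl+-"  # delimiter from the question

inputs = [
    "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
    "-+gdl+-kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three",
    "kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl+-les-+kvs+-three-+gdl+-",
]

# Split every input on the literal delimiter; empty strings appear
# wherever the delimiter sits at the very start or very end.
raw = [s.split(DELIM) for s in inputs]

# Drop the empty strings so all three inputs give the same groups.
cleaned = [[part for part in parts if part] for parts in raw]

for parts in cleaned:
    print(parts)
```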

Brian Campbell
A: 

I am not sure what programming language you are using. If you are using a high-level language such as Java or Python, it's pretty easy: as most of the other answers say, you will find a split function.

If you are working on the command line, such as a bash prompt, I would use sed:

$ str="kev-+kvs+-one-+gdl+-greg-+kvs+-two-+gdl..."

$ for i in `echo $str | sed 's/-+gdl+-/ /g' `; do echo $i; done

kev-+kvs+-one

greg-+kvs+-two

les-+kvs+-three

kev-+kvs+-one

greg-+kvs+-two

les-+kvs+-threekev-+kvs+-one

greg-+kvs+-two

les-+kvs+-three

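Or in Perl you can do it slightly differently (note that the + characters must be escaped, because in Perl regex an unescaped + is a quantifier):

$ echo $str | perl -pe 's/(.*?)-\+gdl\+-/$1\n/g'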

chinmaya