I'm working in Java and having trouble matching a repeated sequence. I'd like to match something like:

a.b.c.d.e.f.g.

and be able to extract the text between the delimiters (e.g. return abcdefg) where the delimiter can be multiple non-word characters and the text can be multiple word characters. Here is my regex so far:

([\\w]+([\\W]+)(?:[\\w]+\2)*)

(Doesn't work)

I had intended to get the delimiter in group 2 with this regex and then use a replaceAll on group 1 to exchange the delimiter for the empty string giving me the text only. I get the delimiter, but cannot get all the text.

Thanks for any help!
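Here is a minimal sketch of the two-step plan described above. It assumes the goal is to process the first matching run. It uses the question's pattern without the redundant brackets, then removes the captured delimiter with `String.replace`, which treats its argument literally. One possible pitfall is an assumption, not something the question states: passing the delimiter to `replaceAll` treats it as a regex, so `"."` would match every character. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedRun {
    // The question's pattern without the redundant [...] brackets:
    // group 1 = the whole run, group 2 = the first delimiter, \2 = that same delimiter again.
    static final Pattern RUN = Pattern.compile("(\\w+(\\W+)(?:\\w+\\2)*)");

    // Hypothetical helper: returns the first delimited run with its delimiter removed,
    // or null if the input contains no such run.
    static String stripDelimiters(String input) {
        Matcher m = RUN.matcher(input);
        if (!m.find()) {
            return null;
        }
        String run = m.group(1);
        String delimiter = m.group(2);
        // String.replace is literal; run.replaceAll(delimiter, "") would treat "." as "any character".
        return run.replace(delimiter, "");
    }

    public static void main(String[] args) {
        System.out.println(stripDelimiters("a.b.c.d.e.f.g."));               // abcdefg
        System.out.println("a.b.c.d.e.f.g.".replaceAll(".", "").isEmpty()); // true
        System.out.println(stripDelimiters("a.b.c"));                        // ab
    }
}
```

The last line shows a second limitation of this pattern. If the run does not end in the delimiter, its final word is not part of group 1. For `a.b.c`, group 1 is only `a.b.`, so the method returns `ab`.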

A: 

Why not use String.split?

KennyTM
The problem is that the text will occur within a larger body that won't have the regular pattern.
Eric
I guess you need to modify your example to show the irregularity. For now I still don't see why `"yourStr".split("\\W+")` isn't sufficient.
KennyTM
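Here is a minimal sketch of the `String.split` suggestion, applied to the sample input. In Java the pattern is passed as a string, so `\W+` is written `"\\W+"`. `String.join` (Java 8+) glues the pieces back together. The class and method names are hypothetical. As Eric points out, this splits the entire string, so it cannot leave irregular surrounding text untouched.

```java
public class SplitJoin {
    // Split on every run of non-word characters and concatenate the pieces.
    // split() drops trailing empty strings; a leading delimiter yields a leading "",
    // which contributes nothing to the join.
    static String joinWords(String input) {
        return String.join("", input.split("\\W+"));
    }

    public static void main(String[] args) {
        System.out.println(joinWords("a.b.c.d.e.f.g.")); // abcdefg
    }
}
```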
A: 

Replace (\w+)(\W+|$) with $1. Make sure the global flag is turned on (in Java, String.replaceAll already replaces every match).

It replaces each sequence of word characters followed by a sequence of non-word characters (or the end of the line) with just the word characters.

String line = "Am.$#%^ar.$#%^gho.$#%^sh";
line = line.replaceAll("(\\w+)(\\W+|$)", "$1");
System.out.println(line); // prints "Amarghosh" (my name)
Amarghosh
Trying `line = line.replaceAll("([\\w]+)([\\W]+)", "\1");` but it produces only the string "g" (the last letter of the input).
Eric
Use `$1` for the replacement. `\1` is for backreferences within the regex; that was a typo :(
Amarghosh
See the update for a code sample.
Amarghosh
Check @Rubens's answer. `(\\w+)\\W+` is enough; the last part needn't be replaced. <looks for something to hit me on the head with>
Amarghosh
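Here is a small sketch of the `\1` versus `$1` mix-up from the comments above. In Java source, `"\1"` is an octal escape for the invisible control character U+0001, not a group reference. As a result, every match is replaced by that character. Suppose the input had no trailing dot (an assumption, since the comment does not show the input). Then the only visible character left would be the final `g`, which may explain the reported output. `"$1"` is the replacement syntax for group 1. The class and method names are hypothetical.

```java
public class ReplacementDemo {
    static final String REGEX = "([\\w]+)([\\W]+)";

    // "\1" is an octal escape in a Java string literal: the single character U+0001.
    static String withOctalEscape(String input) {
        return input.replaceAll(REGEX, "\1");
    }

    // "$1" tells replaceAll to insert whatever group 1 captured.
    static String withGroupReference(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        String input = "a.b.c.d.e.f.g"; // assumed: no trailing dot
        String octal = withOctalEscape(input);
        System.out.println(octal.length());            // 7: six U+0001 characters, then "g"
        System.out.println(octal.replace("\1", ""));   // g
        System.out.println(withGroupReference(input)); // abcdefg
    }
}
```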
A: 

Why not ...

  • find all occurrences of (\w+) and then concatenate them; or
  • find all non-word characters (\W+) and then use Matcher#replaceAll with an empty string?
The MYYN
There are some non-word characters in the input that I care about, so replacing all of them will not work the way I want. I need to strip them only when they follow this particular pattern, i.e. a sequence of 4 characters or more (e.g. a.b.).
Eric
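Here is a minimal sketch of both suggestions, with hypothetical class and method names. The first is a `Matcher.find()` loop that concatenates every `\w+` run. The second is `Matcher.replaceAll("")` over `\W+`. The last line of `main` illustrates Eric's objection: both approaches are global, so they also remove non-word characters elsewhere in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordRuns {
    private static final Pattern WORD = Pattern.compile("\\w+");
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Suggestion 1: collect every \w+ run and concatenate them.
    static String concatWords(String input) {
        Matcher m = WORD.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    // Suggestion 2: delete every \W+ run with Matcher.replaceAll.
    static String dropNonWords(String input) {
        return NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(concatWords("a.b.c.d.e.f.g."));    // abcdefg
        System.out.println(dropNonWords("a.b.c.d.e.f.g."));   // abcdefg
        // Punctuation outside the a.b.c run is stripped as well.
        System.out.println(dropNonWords("Cost: $5 (a.b.c)")); // Cost5abc
    }
}
```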
A: 

Replace (\w+)\W+ with $1

Rubens Farias
Oops... you're right, this is enough. What was I thinking, matching the last part with a `$` and making it look more complicated than it needs to be? +1 :)
Amarghosh
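Here is a quick check of the point settled above, using hypothetical method names. Rubens's `(\w+)\W+` and Amarghosh's original `(\w+)(\W+|$)` give the same result on both sample inputs. A final word with nothing after it is simply left unmatched, so it survives without special handling.

```java
public class CompareReplacements {
    // Rubens's version: a trailing word with no delimiter after it is never matched, so it stays as-is.
    static String rubens(String s) {
        return s.replaceAll("(\\w+)\\W+", "$1");
    }

    // Amarghosh's original version, which also matches a final word at the end of input ($).
    static String amarghosh(String s) {
        return s.replaceAll("(\\w+)(\\W+|$)", "$1");
    }

    public static void main(String[] args) {
        String[] inputs = { "a.b.c.d.e.f.g.", "Am.$#%^ar.$#%^gho.$#%^sh" };
        for (String in : inputs) {
            System.out.println(rubens(in) + " " + amarghosh(in));
        }
        // prints:
        // abcdefg abcdefg
        // Amarghosh Amarghosh
    }
}
```

Like the other `replaceAll` answers, this rewrites every word/non-word pair in the string. Ordinary text such as "Hello, world" therefore also becomes "Helloworld".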