
Hi, I have a document and I want to capture groups: the start of the document and the end of the document, e.g.:

**Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum vitae dapibus tortor. Duis odio massa, viverra quis vestibulum nec, tincidunt ac tellus.**
Ut id enim sapien, ut varius dolor. Curabitur ipsum dolor, consectetur quis fermentum ut,
**aliquam nec felis. Praesent sed malesuada sem. Integer cursus lectus ac eros aliquet rutrum.**

I only want to match from "Lorem ipsum" up to "tellus", discard the middle line, and match from "aliquam" up to "rutrum". How can I do that?

+1  A: 

Start with an expression that matches the pieces you care about:

/lorem ipsum(.*?)tellus(.*?)aliquam(.*?)retrum/

Now the first and third sub-pattern, concatenated together, are your final content.

In some regular expression flavors you can make the middle sub-pattern non-capturing, so that it does not count as a group; in Perl's flavor (and PHP's preg functions) it's written (?:.*?).
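A minimal Perl sketch of this approach, assuming the sample text from the question. It adjusts the expression above in a few ways: the `i` flag makes `lorem` match `Lorem`, the `s` flag lets `.` cross line breaks, the anchor words are moved inside the two capturing groups so each group holds a whole bold part, and the closing word is spelled `rutrum` as in the text. The middle line is matched by the non-capturing `(?:.*?)` and discarded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text from the question (bold markers removed).
my $text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum vitae dapibus tortor. Duis odio massa, viverra quis vestibulum nec, tincidunt ac tellus.\n"
         . "Ut id enim sapien, ut varius dolor. Curabitur ipsum dolor, consectetur quis fermentum ut,\n"
         . "aliquam nec felis. Praesent sed malesuada sem. Integer cursus lectus ac eros aliquet rutrum.";

# /i : case-insensitive, so "lorem" also matches "Lorem"
# /s : "." also matches newlines, so the pattern can span lines
# (?:.*?) : the middle part, matched but not captured
my ($start, $end);
if ($text =~ /(lorem ipsum.*?tellus\.)(?:.*?)(aliquam.*?rutrum\.)/is) {
    ($start, $end) = ($1, $2);
    print "start = $start\n";
    print "end   = $end\n";
}
```

In Rubular (Ruby), the option that lets `.` match newlines is `m` rather than Perl's `s`.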

VoteyDisciple
I tested that in Rubular, but it doesn't work: http://www.rubular.com/r/U0SBv3zV6W. Can you help me one more time? I tried modifying this regexp to make 2 groups, but that doesn't work either.
Stefhan
Rubular already includes the slashes that delimit the regular expression; you can't paste them into the expression itself. You also need the `i` flag for case insensitivity and the `m` flag for multiline text (in Ruby, `m` lets `.` match newlines). And finally I wrote `retrum` but the text has `rutrum`. http://www.rubular.com/r/2hZ8xeKS9e
VoteyDisciple
A: 

If you're looking for the first and last lines (it's not clear, at least to me, what you mean by first and last part), the following regex captures the first line in $1 and the last line in $2 (provided there are at least two lines):

 \A([^\n]+)[\s\S]*\n([^\n]+)\Z
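A minimal Perl sketch of the first-line/last-line idea, assuming the input is one string with at least two lines. Note that a greedy `[\s\S]+` placed directly before `([^\n]+)` gives back only one character, so group 2 would hold just the final character; ending the middle part with `\n`, as below, makes group 2 start at the beginning of the last line. The subroutine name `first_and_last_lines` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (first line, last line), or an empty list
# when the string has fewer than two lines.
sub first_and_last_lines {
    my ($str) = @_;
    # \A ... \Z  : anchor to the whole string (\Z also allows one trailing newline)
    # ([^\n]+)   : group 1, the first line
    # [\s\S]*\n  : everything up to and including the newline before the last line
    # ([^\n]+)   : group 2, the last line
    return $str =~ /\A([^\n]+)[\s\S]*\n([^\n]+)\Z/ ? ($1, $2) : ();
}

my ($first, $last) = first_and_last_lines("first line\nmiddle line\nlast line\n");
print "first = $first\n";
print "last  = $last\n";
```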
Amarghosh
See the bold text: I want to match the first text part and the end text part, not lines.
Stefhan
+1  A: 

If the groups you want are always separated into blocks, like the paragraphs in your example, you can find all occurrences of such a block, probably using the newline as the terminator, and then display the first and last matches.
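A rough Perl sketch of that approach, assuming each block is one line of text terminated by a newline or by the end of the string: collect every block with a global match, then keep the first and last elements of the list. The sample text is the question's, shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text from the question (shortened), one block per line.
my $text = "Lorem ipsum dolor sit amet ... tincidunt ac tellus.\n"
         . "Ut id enim sapien ... fermentum ut,\n"
         . "aliquam nec felis ... aliquet rutrum.\n";

# Each match of [^\n]+ is one block: a run of characters up to a newline
# or the end of the string. In list context, //g returns every match.
my @blocks = $text =~ /([^\n]+)/g;

# Keep only the first and last blocks (needs at least two).
my ($first, $last) = @blocks >= 2 ? @blocks[0, -1] : ();
print "first = $first\n" if defined $first;
print "last  = $last\n"  if defined $last;
```

If the paragraphs were separated by blank lines instead, the same pattern still works, because empty lines produce no match.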

Or do you need the actual regex to match those blocks? If so, first of all I recommend http://rubular.com/ for testing regular expressions: it updates in real time, which makes it easy to see how each change affects the match.

Knowing which language you are using, or whether this is a command-line search (e.g. egrep), would help with the answer.

LokNessMobster
Hi, I want 2 groups, one for the start and the other for the end. I am using Rubular, but I am a newbie with regexps. I tried several times and need some help =/ ... I am using Java.
Stefhan
+2  A: 

In Perl, you can do:

#!/usr/bin/perl 
use 5.10.1;
use warnings;
use strict;

my $str = q!Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum vitae dapibus tortor. Duis odio massa, viverra quis vestibulum nec, tincidunt ac tellus.
Ut id enim sapien, ut varius dolor. Curabitur ipsum dolor, consectetur quis fermentum ut,
aliquam nec felis. Praesent sed malesuada sem. Integer cursus lectus ac eros aliquet rutrum.!;

# Only print the captures if the match succeeded
if ($str =~ /\A(.+)[\s\S]+?(.+)\Z/) {
    say '$1 = ', $1;
    say '$2 = ', $2;
}

Output:

$1 = Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum vitae dapibus tortor. Duis odio massa, viverra quis vestibulum nec, tincidunt ac tellus.
$2 = aliquam nec felis. Praesent sed malesuada sem. Integer cursus lectus ac eros aliquet rutrum.

Explanation:

/         : start of regex
 \A       : beginning of string
 (        : beginning of group 1
  .+      : any char except newline, one or more times
 )        : end of group 1
 [\s\S]   : any char, including newlines
   +?     :   one or more times, non-greedy
 (        : beginning of group 2
  .+      : any char except newline, one or more times
 )        : end of group 2
 \Z       : end of string (or just before a final newline)
/         : end of regex

This can surely be adapted to other languages.

M42