views:

109

answers:

4

Hey guys, I don't know RegExp yet. I know a little about it, but I'm not an experienced user.

Suppose that I run a RegExp match on a website and the matches are:

Data: Informations
Data: Liberty

I then want to extract only Informations and Liberty; I don't want the Data: part.

+1  A: 

Can't be absolutely sure without knowing more about the potential matches, but this should be at least a good starting point:

Data: (.*)$

That will return everything after "Data: " to the end of the line.
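As a rough illustration in Python (the question doesn't name a language, so Python and the variable names here are assumptions), the pattern can be applied to the two sample lines like this. Note that `$` only matches at the end of each line when multiline mode is switched on:

```python
import re

# Sample text taken from the question; the variable names are illustrative.
text = "Data: Informations\nData: Liberty"

# re.MULTILINE makes $ match at the end of every line,
# not only at the very end of the string.
pattern = re.compile(r"Data: (.*)$", re.MULTILINE)

# With exactly one capturing group, findall returns only the captured text.
values = pattern.findall(text)
print(values)  # ['Informations', 'Liberty']
```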

Chad Birch
A: 

Search for a regular expression like

Data: (.*)

Then use the "first submatch", which is often referred to by "$1" or "\1", depending on the language you are using.
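For instance, in Python (an assumed choice, since the question doesn't say which language is in use) the first submatch is read with `group(1)` on the match object, and written as `\1` inside a replacement string:

```python
import re

line = "Data: Liberty"  # one of the sample lines from the question

# group(1) is the text captured by the first pair of parentheses.
m = re.search(r"Data: (.*)", line)
first = m.group(1) if m else None

# In a replacement string the same group is referred to as \1.
stripped = re.sub(r"Data: (.*)", r"\1", line)

print(first)     # Liberty
print(stripped)  # Liberty
```

Other languages expose the same idea under different names, such as `$1` in Perl.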

antti.huima
+1  A: 
  1. Does Data: always appear at the beginning of a line?
  2. Can there be multiple spaces between the : and the next word?
  3. Do you know about groups?
  4. What do you want: lazy matching vs greedy matching?

If so, you can use (with lazy matching):

^Data:\s+(.*?)$

With character classes:

^Data:\s+(\w+)$

if you know that it'll always be a word. Try this website.
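To see how the two patterns differ, here is a small Python sketch. The sample text is made up for illustration (extra spaces after the colon, a two-word value, and a line that doesn't start with Data:), and Python itself is an assumption:

```python
import re

# Hypothetical input: extra spaces, a two-word value, and an unrelated line.
text = "Data:   Informations\nData: Liberty Bell\nOther: ignored"

# Lazy version: captures everything up to the end of the line.
lazy = re.findall(r"^Data:\s+(.*?)$", text, re.MULTILINE)

# Character-class version: only matches when the value is a single word.
word = re.findall(r"^Data:\s+(\w+)$", text, re.MULTILINE)

print(lazy)  # ['Informations', 'Liberty Bell']
print(word)  # ['Informations']
```

The `\w+` version silently skips the "Liberty Bell" line, which is why it only fits when the value is known to be one word.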

dirkgently
Hi, I'm actually reading the tutorials at http://www.regular-expressions.info/. Thanks for giving me a wonderful tip and a wonderful link. Raymond
Raymond G.
BTW, which is better: lazy or greedy matching? I know that with lazy matching I will have more control over my search. Greedy matching tends to get out of control.
Raymond G.
Depends on the context. Lazy matching in some cases gives better performance. See: http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification (comments don't have a better link representation -- so sorry!)
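A quick way to see the difference in behaviour (not performance) is a line with more than one delimiter on it. This is only a sketch: the semicolon-separated sample string is invented for the example, and Python is an assumed language:

```python
import re

# Hypothetical single-line input with two entries, each ended by a semicolon.
s = "Data: Informations; Data: Liberty;"

# Greedy: .* runs to the end and backtracks to the LAST semicolon.
greedy = re.findall(r"Data: (.*);", s)

# Lazy: .*? stops at the FIRST semicolon it can, so each entry matches separately.
lazy = re.findall(r"Data: (.*?);", s)

print(greedy)  # ['Informations; Data: Liberty']
print(lazy)    # ['Informations', 'Liberty']
```

When the pattern ends with `$` and each value sits on its own line, both forms capture the same text, so the choice matters mainly when a delimiter can repeat within the text being searched.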
dirkgently
@dirk, the `\w` in your second regex matches exactly one word _character_; to match a whole word you need `\w+`.
Alan Moore
There were two typos in and around that part. Thanks a lot, Alan.
dirkgently
A: 

Regular expression engines support what are commonly called "capturing groups". If you surround a pattern or part of a pattern with (), the part of the string matched by that part of the regular expression will be captured.

The command(s) you use to do the matching will determine how to get these captured values. They may be stored in special variables (e.g. $1, $2), or you may be able to specify the names of the variables, either embedded within the regular expression or as arguments to the regular expression command. Exactly how depends on what language you are using.
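As one concrete case, Python (assumed here; the group names `label` and `value` are made up for the example) lets you embed names in the pattern with `(?P<name>...)` and then read each group by number or by name:

```python
import re

# Two named capturing groups; the names are illustrative only.
m = re.match(r"(?P<label>\w+): (?P<value>.*)", "Data: Informations")
assert m is not None

by_number = m.group(2)        # groups are numbered left to right, starting at 1
by_name = m.group("value")    # the same group, looked up by its name
groups = m.groupdict()        # all named groups as a dictionary

print(by_number)  # Informations
print(by_name)    # Informations
print(groups)     # {'label': 'Data', 'value': 'Informations'}
```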

So, read up on the regexp commands for the language of your choice and look for the term "capturing groups" or maybe just "groups".

Bryan Oakley