Is it possible to write a regular expression that does the following?

  1. Find the first occurrence of some word in a string.
  2. Return a substring containing a given number of letters before and after the occurrence.
  3. If it encounters a . (dot) before reaching that number of letters on either side, return only the substring up to the dot on that side.
  4. Return whole words.

Example:

"Anyone who knows how to do this. Create a program that inputs a regular expression and outputs strings that satisfy that regular expression. And bla bla"

If the keyword is 'program' and the number of letters is set to 20, it should return up to 20 letters before and after 'program'. But since it encounters a dot before it gets to 20 letters on the left side, it stops there.

"Create a program that inputs a regular..."

Is this possible with a regexp? Which PHP function do I have to use? Is there a finished script for this? I guess it's quite a basic need when showing search results. Does someone already have a function to share?

A: 
[^.]{0,MAXCHARS}wordtofind[^.]{0,MAXCHARS}

Replace MAXCHARS with the number corresponding to the maximum number of characters you want on each side.

The [^.] pattern matches any single character that's not a period, and the {0,MAXCHARS} quantifier matches anywhere from 0 to MAXCHARS of those characters.

Amber
It says unknown modifier '{'. And shouldn't one escape the . ? Should I use preg_match for this?
weng
Did you actually replace `MAXCHARS` with a number? The `.` does not need to be escaped because it's inside a character class definition (inside `[]`). You should be able to use your regex match function of choice; if it's a perl-compatible one such as `preg_match` you'll need to put delimiters around the regex, such as `/regexhere/`.
Amber
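
To illustrate the delimiter point from the comment above, here is a minimal sketch of what likely produced the "unknown modifier '{'" message. It assumes the pattern was passed to `preg_match` without delimiters; the sample string is shortened from the question, and the variable names `$bad` and `$ok` are made up for illustration.

```php
<?php
$str = "Anyone who knows how to do this. Create a program that inputs a regular expression.";

// Without delimiters, PCRE reads the leading "[" and the first "]" as a
// bracket-style delimiter pair, so the following "{" is parsed as a modifier.
// The call emits a warning (suppressed here with @) and returns false.
$bad = @preg_match('[^.]{0,20}program[^.]{0,20}', $str, $m);
var_dump($bad); // bool(false)

// Wrapped in "/" delimiters, the same pattern compiles and matches.
$ok = preg_match('/[^.]{0,20}program[^.]{0,20}/', $str, $m);
echo $m[0]; // " Create a program that inputs a regul"
```

Note that the match stops at the dot on the left but cuts "regular" in half on the right, which is why requirement #4 needs extra handling.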
+1  A: 

Here's Dav's regular expression in PHP:

<?php
  $str = "Anyone who knows how to do this. Create a program that inputs a regular expression and outputs strings that satisfy that regular expression. And bla bla";
  $key = "program";
  $lim = 20;
  // preg_quote() escapes any regex metacharacters that the keyword may contain.
  $reg = "/([^.]{0,{$lim}})(" . preg_quote($key, "/") . ")([^.]{0,{$lim}})/"; // /([^.]{0,20})(program)([^.]{0,20})/

  // Only read $matches when the pattern actually matched.
  if (preg_match($reg, $str, $matches)) {
    echo $matches[0]; // the whole excerpt

    print_r($matches); // $matches[1] is the pre-text, and $matches[3] is the post-text
  }

The trickiest requirement is #4: "return whole words". One way you could handle this while still making use of the above regular expression is to pull more text before and after than you really want (say 40 chars). Then you could preg_split the before and after text on whitespace, which would give you two arrays of words. Run each array through a function that gives you back a subset of the array where the total length of all the words is no more than your limit of 20...
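
The approach described above could be sketched roughly as follows. This is only one possible reading of it: the function names `fit_words` and `excerpt`, the `$wide` parameter, and the choice to count one space between words are all assumptions made for illustration. It also assumes `$wide` is comfortably larger than the limit, so that a word cut off at the edge of the wide window is never among the words kept.

```php
<?php
// Hypothetical helper: keep whole words from $text while their combined
// length (including single separating spaces) stays within $limit.
// With $fromEnd = true the words nearest the END of $text are kept,
// which is what the text before the keyword needs.
function fit_words(string $text, int $limit, bool $fromEnd = false): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($fromEnd) {
        $words = array_reverse($words);
    }
    $kept = [];
    $length = 0;
    foreach ($words as $word) {
        // Every word after the first also costs one separating space.
        $needed = strlen($word) + ($kept ? 1 : 0);
        if ($length + $needed > $limit) {
            break;
        }
        $kept[] = $word;
        $length += $needed;
    }
    if ($fromEnd) {
        $kept = array_reverse($kept);
    }
    return implode(' ', $kept);
}

// Hypothetical wrapper: grab up to $wide characters on each side (stopping
// at a dot), then trim each side down to whole words within $limit.
function excerpt(string $str, string $key, int $limit, int $wide = 40): ?string
{
    $reg = '/([^.]{0,' . $wide . '})(' . preg_quote($key, '/') . ')([^.]{0,' . $wide . '})/';
    if (!preg_match($reg, $str, $m)) {
        return null; // keyword not found
    }
    $before = fit_words($m[1], $limit, true);
    $after  = fit_words($m[3], $limit);
    return trim($before . ' ' . $m[2] . ' ' . $after);
}

$str = "Anyone who knows how to do this. Create a program that inputs a regular expression and outputs strings that satisfy that regular expression. And bla bla";
echo excerpt($str, "program", 20); // Create a program that inputs a
```

Here "regular" is dropped because adding it would take the text after the keyword to 21 characters, one over the limit of 20.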

Ben Marini