My first attempt at using regular expressions has me stuck. I'm using regex on a WordPress website via the Search-Regex plugin and need to match a specific " buried within a bunch of HTML code. HTML example:

provide brand-strengthening efforts for the 10-school conference.&#0160; </p>
<p>
   <a href="http://www.learfield.com/oldblog/.a/6a00d8345233fa69e201157155a6fc970c-pi">
   <img alt="MOvalleyConf500" 
        border="0" 
        class="at-xid-6a00d8345233fa69e201157155a6fc970c"
        src="http://www.learfield.com/oldblog/.a/6a00d8345233fa69e201157155a6fc970c-800wi" 
        style="border: 1px solid black; margin: 0px; width: 502px; height: 384px;"             
        title="MOvalleyConf500" />
   </a>
</p>
<p>The photo above

In the above example, there are three targets

6a00d8345233fa69e201157155a6fc970c-pi"
6a00d8345233fa69e201157155a6fc970c"
6a00d8345233fa69e201157155a6fc970c-800wi"

The regex I'm using is /6a00d834.*?"/. It locates them, but I only want to match the ending " and not the entire string. These are images that are missing their file extension, so I need to replace the ending " with .jpg". I understand the replacement part of the expression; it's the initial matching I'm having trouble with.

I have a bunch of these (221). All the targets begin with 6a00d834, followed by some random alphanumeric characters, and end with a ".

Appreciate any insight. Thanks.

Edit added from OP's comment: Actually it's on a Wordpress site using a plugin (REGEX) to query and replace data within SQL. I can use any Perl compatible regex. (Note from editor - depending on the plugin, this is most likely not actually using Perl but PHP's implementation of PCRE.)

A: 

Perhaps use a group operator?

/6a00d834.*?(")/

Then, depending on your regex API, you can pull out just what is matched in the parens.
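
As a rough illustration of "pulling out what is matched in the parens", here is a minimal sketch using Python's `re` module as a stand-in for whatever regex API you have (the variable names are made up for this example; the sample string is the first target from the question):

```python
import re

# First target string from the question, including its closing quote.
text = '6a00d8345233fa69e201157155a6fc970c-pi"'

# The parentheses capture only the closing quote.
m = re.search(r'6a00d834.*?(")', text)

whole = m.group(0)       # the entire match, prefix through the quote
quote = m.group(1)       # just the captured quote character
quote_pos = m.start(1)   # index of the captured quote inside `text`

print(repr(whole), repr(quote), quote_pos)
```

The overall match still covers the whole string; the group only tells you where the quote is inside it.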

Edit

Ah, you want to do string replacement. I'll guess you're using Perl. Try this:

s/(6a00d834.*?)(")/\1.jpg\2/
Michael Donohue
This is kind of redundant though, since you'd always get a double quote from that grouping.
You
Yeah, but I thought that was what he wanted: 'however I only want to match on the end " and not the entire target string.'
Michael Donohue
All due respect, but the OP may not be the best judge of what he _really_ wants, if you know what I mean.
Telemachus
+2  A: 

Wouldn't this work?

/(6a00d834.*?)"/

Edit: You said in one of your comments you wanted to replace the " with .jpg"; in that case this regexp would probably work:

/6a00d834.*?(")/

However, the best thing to do is probably to use the first regexp I provided, and use a replacement string that looks like this:

'\\1.jpg"'

Of course, \\1 has to be replaced with whatever your particular regexp engine uses for backreferences.
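
A minimal sketch of that capture-and-reinsert idea, assuming Python's `re` module in place of the plugin's engine (in Python the backreference is written `\1` inside a raw string; the `html` variable is just the `src` attribute from the question):

```python
import re

# The src attribute from the question; the image URL has no file extension.
html = 'src="http://www.learfield.com/oldblog/.a/6a00d8345233fa69e201157155a6fc970c-800wi" '

# Group 1 captures everything from the prefix up to, but not including, the quote.
pattern = re.compile(r'(6a00d834.*?)"')

# \1 puts the captured text back; ".jpg" and the quote are added literally.
fixed = pattern.sub(r'\1.jpg"', html)
print(fixed)
```

The quote is consumed by the match and then written back by the replacement string, so the net effect is inserting `.jpg` in front of it.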

You
@Michael: I probably nested it incorrectly. Is it correct now?
You
Thanks guys. /(6a00d834.*?)"/ is matching the same as /6a00d834.*?"/ for me. In the three example strings I posted, I only want to hit on the ending ". These are image links that are missing their extension, so I want to replace the " with .jpg". Hope that helps clear it up.
Phil Atkinson
@Phil: replacing the `"` feels kind of backwards; try appending `.jpg` to the match of the first group of the first regexp I posted.
You
+1  A: 

Your question is not entirely clear, but perhaps you mean:

/6a00d834[^"]*"/

(That is: match 6a00d834 followed by zero or more characters that are not a " followed by a ")

Alternatively, if it is available in the regex engine you are using, you can use a non-greedy specifier to limit the '*' meta-character. Keep in mind that the answer to any regex question depends on the engine you are using. For example:

$ cat input
6a00384foo" more"
$ perl -ne '/(6a00384[^"]*")/; print "$1\n"' input
6a00384foo"
$ perl -ne '/(6a00384.*?")/; print "$1\n"' input
6a00384foo"
$ sed 's/\(6a00384[^"]*"\).*/\1/' input
6a00384foo"
$ sed 's/\(6a00384.*?"\).*/\1/' input
6a00384foo" more"

Notice that the '?' does not serve as a non-greedy specifier in sed.
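
For comparison, a small sketch of the same input in Python's `re` module (an assumption for illustration; it is not one of the engines tested above). Python does support `*?`, so both forms stop at the first quote, and a plain greedy `.*` shows what they are guarding against:

```python
import re

# Same input line as in the shell session above.
line = '6a00384foo" more"'

# Negated character class: cannot run past the first quote in any engine.
negated = re.search(r'6a00384[^"]*"', line).group(0)

# Non-greedy quantifier: stops at the first quote where *? is supported.
lazy = re.search(r'6a00384.*?"', line).group(0)

# Greedy quantifier: runs on to the last quote on the line.
greedy = re.search(r'6a00384.*"', line).group(0)

print(negated, lazy, greedy, sep='\n')
```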

William Pursell
the non-greedy regex Phil is using already accomplishes this. That's the '*?' part of the regex.
Michael Donohue
Using ? to indicate non-greedy is perl specific. Using [^"] is more general.
William Pursell
[^"] is harder to read and maintain though
Michael Donohue
Sometimes portability is more important. Response edited to clarify this point.
William Pursell
A: 

I assume that you want to extract everything after 6a00d834 up to the first following ". So try this:

/6a00d834([^"]*)"/

The match of the first group will then be the string you are looking for.

Gumbo
I only want to extract the " at the end of a string that begins with 6a00d834<some unknown number of alphanumerics>"
Phil Atkinson
Why would you want to do that?
Gumbo
To replace the " with .jpg". They are image links that are missing the file extension.
Phil Atkinson
Why didn’t you say that in the first place? And what language do you use?
Gumbo
Phil is already using a non-greedy match, so to get this closer to his regex it would be: /6a00d834(.*?)"/
Michael Donohue
+4  A: 

String replacement can be done along with the matching. Since you're using PHP, use preg_replace

$newstring = preg_replace("/(6a00d834.*?)(\")/", "\\1.jpg\\2", $oldstring);

This breaks the match into two groups, and then inserts '.jpg' between them.

For the WordPress regex plugin, use /(6a00d834.*?)(")/ for the match string, and then use \1.jpg\2 for the replacement string.
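
To sanity-check the pattern before running it against the database, here is a sketch that applies the same two-group match and `\1.jpg\2` replacement to the three targets from the question. It uses Python's `re` module as a stand-in for the plugin's PCRE engine, and the `targets` list is just the relevant attributes copied out of the sample HTML:

```python
import re

# The three attributes from the question that contain a target string.
targets = [
    'href="http://www.learfield.com/oldblog/.a/6a00d8345233fa69e201157155a6fc970c-pi"',
    'class="at-xid-6a00d8345233fa69e201157155a6fc970c"',
    'src="http://www.learfield.com/oldblog/.a/6a00d8345233fa69e201157155a6fc970c-800wi"',
]

# Group 1: prefix plus the rest of the ID; group 2: the closing quote.
pattern = re.compile(r'(6a00d834.*?)(")')

# Insert ".jpg" between the two captured groups.
fixed = [pattern.sub(r'\1.jpg\2', t) for t in targets]

for line in fixed:
    print(line)
```

Note that the `class` attribute is rewritten as well, since it is one of the three targets listed in the question. The replacement is also not idempotent: running it a second time over already-fixed text would append `.jpg` again.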

Michael Donohue
That did it. Thanks Michael. Sorry the initial question was so unclear. I'll try to be more specific next time. Community: thanks so much for your assistance. You guys are great!
Phil Atkinson
Michael was able to make sense of this and provide the solution. Thanks to all, but especially Michael for sticking it out!
Phil Atkinson