I have the following string:

<img alt="over 40 world famous brandedWATCHES BRANDs to choose from
" src="http://www.fastblings.com/images/logo.jpg"></strong></a><br>

I want to define a regex pattern like <img alt="(.+?)" src="http://(.+?).(jpg|gif)">, but as you can see, the target string has a line break in the alt attribute. How can I account for this? The rule should be "anything in the alt attribute, including line breaks".

+4  A: 

By default, the . wildcard does not match newline characters (\n, \r). Many other languages offer a DOTALL mode (sometimes called single-line mode) that makes . match anything. JavaScript traditionally had no such mode (ES2018 later added the s flag for it). For an equivalent that works everywhere, use [\s\S], which matches any character that is either whitespace or not whitespace, i.e. any character at all:

/<img alt="([\s\S]+?)" src="http:\/\/(.+?)\.(jpg|gif)">/

See Javascript regex multiline flag doesn’t work.
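
To see the difference, here is a small sketch that runs both patterns against the string from the question. The variable names are only illustrative, and the sample string is rebuilt by hand with an explicit \n where the line break sits:

```javascript
// The sample string from the question, with the line break inside alt.
var html = '<img alt="over 40 world famous brandedWATCHES BRANDs to choose from\n" ' +
           'src="http://www.fastblings.com/images/logo.jpg"></strong></a><br>';

// "." stops at the newline, so this pattern cannot get past the alt attribute.
var dotPattern = /<img alt="(.+?)" src="http:\/\/(.+?)\.(jpg|gif)">/;

// [\s\S] matches any character, newlines included.
var anyCharPattern = /<img alt="([\s\S]+?)" src="http:\/\/(.+?)\.(jpg|gif)">/;

var dotMatch = dotPattern.exec(html);      // null
var anyMatch = anyCharPattern.exec(html);  // match array with three capture groups

var alt = anyMatch[1];   // alt text, still ending in the line break
var path = anyMatch[2];  // "www.fastblings.com/images/logo"
var ext = anyMatch[3];   // "jpg"
```

Note that the captured alt text keeps its trailing newline, so you may want to trim it before using it.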

I also escaped the . before jpg|gif; otherwise it would match any character rather than the literal . you intend.
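
A quick way to convince yourself that the escape matters is the sketch below. The URL is made up for illustration (it is not from the question) and deliberately ends in "xjpg" rather than ".jpg":

```javascript
// Hypothetical tag whose src ends in "xjpg" instead of ".jpg".
var tag = '<img alt="logo" src="http://example.com/logoxjpg">';

// Unescaped: "." happily matches the "x" in front of "jpg".
var unescaped = /src="http:\/\/(.+?).(jpg|gif)">/;

// Escaped: a literal "." is required in front of the extension.
var escaped = /src="http:\/\/(.+?)\.(jpg|gif)">/;

var looseMatch = unescaped.test(tag);   // true, a false positive
var strictMatch = escaped.test(tag);    // false
```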

That being said, parsing HTML with regexes is a really bad idea. What's more, unless there is relevant detail missing from your question, you can do this easily with jQuery attribute selectors:

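// Attribute selectors compare literal strings, not regexes: ^= means "starts with", $= means "ends with".
$("img[src^='http://'][src$='.gif'], img[src^='http://'][src$='.jpg']").each(function() {
  var alt = $(this).attr("alt");
  var src = $(this).attr("src");
  ...
});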

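Or, if you want to require an alt attribute:

// [alt] keeps only images that have an alt attribute; ^= and $= match the start and end of src literally.
$("img[alt][src^='http://'][src$='.gif'], img[alt][src^='http://'][src$='.jpg']").each(function() {
  var alt = $(this).attr("alt");
  var src = $(this).attr("src");
  ...
});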
cletus
@Felix: you're quite right. Got my wires crossed. Fixed now.
cletus
@cletus: Deleted my comment as I thought it works nevertheless as the accepted answer uses the same approach. Anyway good to hear that I didn't miss some obvious point ;) Btw +1 from me.
Felix Kling