ansaurus

Question

Regular expression to add base domain to directory

Answer 1

A:

Matching regular expression:

(?:src|href)="(http://www\.example\.com/)?[^"]+"
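A quick way to see what the optional group gives you is shown in the Python sketch below. For each src or href value, group 1 is set only when the domain is already present. The sketch uses [^"]+ for the value, so each match stops at the closing quote instead of running on to the end of the line. The sample HTML and the function name find_links are made up for illustration.

```python
import re

# Group 1 captures the domain when the value already starts with it;
# group 2 is the remaining path. [^"]+ stops each match at the closing quote.
LINK_RE = re.compile(r'(?:src|href)="(http://www\.example\.com/)?([^"]+)"')

def find_links(html):
    """Return (path, already_has_domain) for each src/href attribute."""
    return [(m.group(2), m.group(1) is not None) for m in LINK_RE.finditer(html)]

sample = '<img src="images/logo.png"><a href="http://www.example.com/about.html">About</a>'
print(find_links(sample))
# [('images/logo.png', False), ('about.html', True)]
```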

Delan Azabani 2010-07-25 04:44:14

Can't get this to work. I tried strDomain = http://www.example.com and RegEx.Pattern = "(?:src|href)=chr(34)(strDomain)?.+", and when I tried strHTMLCode = RegEx.Replace(strHTMLCode), I got an error.

2010-07-25 06:35:17

This one doesn't replace; it only matches. If there's a RegEx.Match() method, this should return true for every src or href attribute in any XHTML document.

Tim 2010-07-25 06:47:13
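In the first comment's attempt, chr(34) and strDomain are written inside the string literal, so VBScript treats them as literal text rather than as a quote character and the variable's value. They have to be joined to the rest of the pattern with &. VBScript's RegExp.Replace also expects the replacement string as a second argument. The Python sketch below shows the same idea. The domain variable is concatenated into the pattern and passed through re.escape, so its dots match only literal dots. The domain value is the example from the thread, and build_pattern is a hypothetical name.

```python
import re

def build_pattern(domain):
    # Concatenate the domain into the pattern (in VBScript this would use &),
    # escaping it first so each "." matches only a literal dot.
    return re.compile(r'(?:src|href)="(' + re.escape(domain) + r'/)?([^"]+)"')

pattern = build_pattern("http://www.example.com")

m = pattern.search('href="http://www.example.com/a.html"')
print(m.group(1))  # http://www.example.com/

m = pattern.search('href="http://wwwXexample.com/a.html"')
print(m.group(1))  # None: the escaped dots do not match "X"
```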

OK, I solved the problem by using the <base> tag (w3schools.com: http://www.w3schools.com/tags/tag_base.asp). Thanks, everyone, for your help.

2010-07-25 07:33:15

Well, that's an approach that just went right past me, lol. Sure, take the easy way out ;o)

Tim 2010-07-25 14:03:41
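The <base> tag approach works because the browser resolves every relative src and href against the base URL instead of the page's own URL. Python's urllib.parse.urljoin follows essentially the same resolution rules, so it can illustrate what the browser does. The base URL below is the thread's example domain, and the resolve helper is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URL, as it would appear in <base href="...">.
BASE = "http://www.example.com/"

def resolve(refs, base=BASE):
    """Resolve each src/href value against the base URL."""
    return [urljoin(base, ref) for ref in refs]

print(resolve(["images/logo.png", "/css/site.css", "http://other.example.org/x.js"]))
# ['http://www.example.com/images/logo.png', 'http://www.example.com/css/site.css',
#  'http://other.example.org/x.js']
```

Relative paths pick up the domain, and absolute URLs are left alone. If the base names a directory, it should end in /. urljoin("http://www.example.com/dir", "a.png") gives http://www.example.com/a.png because the last path segment is replaced.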

Answer 2

+2 A:

Without knowing the language, you can use the substitution operator (probably the most portable form):

s/^(src=")([^"]+")$/$1www\.example\.com\/$2/

This should do the following:
1. Match the string 'src="' and capture it in variable $1.
2. Match one or more characters that are not a double quote ("), followed by a closing ", and capture them in variable $2.
3. Substitute 'www.example.com/' between the two captured groups.
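For comparison, here is the same substitution as a Python sketch. Python's re.sub writes backreferences as \1 and \2 instead of $1 and $2. Because of ^ and $, the pattern only fires when the whole string is the src attribute, so it would have to be applied to each attribute separately rather than to a full HTML page. The function name prefix_src is hypothetical.

```python
import re

# ^ and $ anchor the pattern to the whole string, as in the s/// version.
SRC_RE = re.compile(r'^(src=")([^"]+")$')

def prefix_src(attr):
    # \1 and \2 are Python's spelling of $1 and $2.
    return SRC_RE.sub(r"\1www.example.com/\2", attr)

print(prefix_src('src="images/logo.png"'))        # src="www.example.com/images/logo.png"
print(prefix_src('<img src="images/logo.png">'))  # unchanged: the anchors prevent a match
```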

Depending on the language, you can wrap this in a conditional that checks for the existence of the domain and substitutes if it isn't found.

To check for the domain, /www\.example\.com/i should do.
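The conditional described above might look like the Python sketch below. It does a case-insensitive check for the domain and substitutes only when the domain is missing, using the same capture groups as the s/// example. One assumption goes beyond the original: the sketch prefixes http://www.example.com/ rather than www.example.com/, because a browser treats a value without a scheme as a relative path. The name ensure_domain is hypothetical.

```python
import re

# Case-insensitive domain check, equivalent to /www\.example\.com/i.
HAS_DOMAIN = re.compile(r"www\.example\.com", re.IGNORECASE)
# Same groups as the s/// example: $1 = 'src="', $2 = path plus closing quote.
SRC_ATTR = re.compile(r'^(src=")([^"]+")$')

def ensure_domain(attr):
    """Prefix the domain to a src="..." string unless it is already there."""
    if HAS_DOMAIN.search(attr):
        return attr  # domain already present: leave it alone
    # Assumption: add the http:// scheme so the result is an absolute URL.
    return SRC_ATTR.sub(r"\1http://www.example.com/\2", attr)
```

For example, ensure_domain('src="img/a.png"') returns 'src="http://www.example.com/img/a.png"', while a value that already contains the domain is returned as is.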

EDIT: See comments:

For PHP, I would do this a bit differently: I would probably use SimpleXML. I don't think that will translate well, though, so here's a regex version...

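$html = file_get_contents('/path/to/file.html');
// Capture src=" or href=" unless the value already starts with the domain (negative lookahead).
$regex_match = '/(src="|href=")(?!http:\/\/www\.example\.com\/)([^"]+")/i';
$regex_substitute = '$1http://www.example.com/$2';
// preg_replace() returns the modified string and replaces every match by default.
$html = preg_replace($regex_match, $regex_substitute, $html);

Note: I haven't actually run this to debug it; it's just off the cuff. There are three things to watch. First, / is the pattern delimiter, so every / inside the pattern has to be escaped as \/ (or you can pick a different delimiter, such as ~). I don't think you're concerned with this, though, unless VB has a similar problem. Second, if line breaks could appear inside the attribute (for example, between src= and the quote), I might change the regex. Third, the (?!http:\/\/www\.example\.com\/) part is a negative lookahead. It makes the pattern match only src or href values that don't already start with the domain. A character class such as [^(?:www\.example\.com)] would not do that, because it matches (and consumes) a single character that isn't in the listed set. Lookaheads are supported by PCRE, which preg_replace uses, but not by POSIX regex. Absolute links to other sites would still get the prefix.

The rest of the changes should be fine. I added href=" and made the match case-insensitive (the i modifier). I also prefixed http://, since without a scheme a browser would read www.example.com/... as a relative path. PHP has no g modifier (it triggers an "Unknown modifier" warning). preg_replace() already replaces every match unless you pass a $limit. In VBScript, RegEx.Global = True does the same job.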
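If you want to test the idea outside PHP, the same approach looks like this in Python. A negative lookahead skips values that already have the domain, and no "g" flag is needed because re.sub replaces every match unless count is given. This is a sketch: the function name prefix_links is hypothetical, and like the PHP version it would still prefix absolute links to other sites.

```python
import re

# (?!...) is a negative lookahead: it checks what follows without consuming it,
# so values already starting with the domain are skipped.
LINK_RE = re.compile(
    r'(src="|href=")(?!http://www\.example\.com/)([^"]+")', re.IGNORECASE
)

def prefix_links(html):
    # re.sub replaces all matches by default (count=0).
    return LINK_RE.sub(r"\1http://www.example.com/\2", html)

print(prefix_links('<img src="a.png"><a HREF="http://www.example.com/b.html">'))
# <img src="http://www.example.com/a.png"><a HREF="http://www.example.com/b.html">
```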

I hope that helps.

Tim 2010-07-25 04:47:34

How would I set this up so it alters all the HTML at once? I'm using VBScript for this (don't ask):

strHTML = all the cached HTML code
strDomain = domain name
Set RegEx = New RegExp
RegEx.Pattern = "s/^(src=")([^"]+")$/$1strDomain\/$2/"
RegEx.Multiline = True
RegEx.Global = True
newstrHTML = RegEx.Replace(strHTML)

How do I set up the regex in VBScript so it substitutes the domain only if it isn't already present in the path? I'm not very good at regex at all. TIA

2010-07-25 05:12:25

I'll be honest: I have never used VB. Also, I'm having trouble "seeing" the code. Can you edit your question with a code block so it's easier to read? One more thing: I would add the trailing / to the strDomain variable (if I'm reading that correctly). Then you won't have any weird escaping needs.

Tim 2010-07-25 05:16:59

I guess we can't use line breaks in the comment section. I'll put a plain text file up on my website so you can see what I'm talking about: http://www.genxts.com/regex.txt

2010-07-25 05:42:47

I'm afraid my help stops at the basic regex. However, if VB really is that easy to understand, then this should work. The question in my mind is what RegEx.Replace() actually does. If it simply overwrites the supplied parameter, then I see it working. If it does something else, then I'm not sure... I can give you a PHP or Perl version...

Tim 2010-07-25 05:48:45

A PHP version would be great; I can usually convert between the two. TIA

2010-07-25 06:01:27
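The VBScript in the comments wraps the pattern in Perl-style s/…/…/ syntax, but VBScript's RegExp.Pattern takes only the pattern itself. The replacement string goes in as the second argument of RegEx.Replace, which returns the modified string rather than changing strHTML in place. strDomain also has to be concatenated into the strings with & rather than typed inside them. Below is a sketch of the whole job in Python, with the domain passed in the way strDomain would be. The function name add_domain and the sample HTML are made up for illustration.

```python
import re

def add_domain(html, domain):
    """Prefix domain + "/" to every src/href value that does not already have it.

    Assumes domain has no trailing slash, e.g. "http://www.example.com".
    """
    pattern = re.compile(
        # Group 1: attribute name. The optional non-capturing group absorbs an
        # existing domain. Group 2: the rest of the value up to the closing quote.
        r'(src|href)="(?:' + re.escape(domain) + r'/)?([^"]+)"',
        re.IGNORECASE,
    )
    # A function replacement is used so a "\" in domain is never read as an escape.
    return pattern.sub(lambda m: f'{m.group(1)}="{domain}/{m.group(2)}"', html)

html = '<img src="images/logo.png"><a href="http://www.example.com/about.html">About</a>'
print(add_domain(html, "http://www.example.com"))
# <img src="http://www.example.com/images/logo.png"><a href="http://www.example.com/about.html">About</a>
```

Values that already carry the domain are rebuilt unchanged, so running the function twice gives the same result as running it once.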
