Hello, I have been working on a tool lately. It grabs all the link addresses from a website.

My problem is that links in the HTML code come in different forms: root-relative, relative, and absolute.

I need to convert all of them to the same absolute form:

/index.php                       -> http://www.website.com/index.php
index.php                        -> http://www.website.com/index.php
http://www.website.com/index.php -> http://www.website.com/index.php

Thanks for the help.

+1  A: 

Welcome to GoogleOverflow.com.

Here is the complete tutorial for parsing links in HTML using PHP and regex: http://www.the-art-of-web.com/php/parse-links/

Jay
Combine this with Max S's function, and you're set.
Jay
GoogleOverflow.com ?
Robert Hurst
Type the 3 tags into Google and see if the question isn't answered in 0.26 seconds. The frequency with which this is the case is alarming. http://meta.stackoverflow.com/questions/8724/how-to-deal-with-google-questions
Jay
+1  A: 

Here's a function which will return the absolute URL given the base (current) URL and a relative one.
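
One way such a function could look, as a minimal sketch rather than the code originally posted here. The name resolve_url is hypothetical, the base is assumed to be an absolute http(s) URL without credentials, and only the common cases (absolute, protocol-relative, root-relative, and path-relative links, plus "." and ".." segments) are handled:

```php
<?php
// Hypothetical helper: resolve $rel against the absolute URL $base.
function resolve_url(string $base, string $rel): string
{
    if ($rel === '') {
        return $base;
    }
    // Already absolute (has a scheme such as http: or mailto:): return unchanged.
    if (preg_match('~^[a-z][a-z0-9+.\-]*:~i', $rel)) {
        return $rel;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $root   = $scheme . '://' . ($parts['host'] ?? '') . $port;

    // Protocol-relative link (//host/path): borrow the scheme of the base.
    if (strpos($rel, '//') === 0) {
        return $scheme . ':' . $rel;
    }

    $basePath = $parts['path'] ?? '/';

    // Split the link into its path and its "?query#fragment" tail,
    // so that only the path part is normalized below.
    $cut     = strcspn($rel, '?#');
    $relPath = substr($rel, 0, $cut);
    $tail    = (string) substr($rel, $cut);

    if ($relPath === '') {
        $path = $basePath;                 // query- or fragment-only link
    } elseif ($relPath[0] === '/') {
        $path = $relPath;                  // root-relative link
    } else {
        // Path-relative link: append to the directory of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }

    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }

    return $root . '/' . ltrim(implode('/', $out), '/') . $tail;
}
```

With the base http://www.website.com/, all three forms from the question (/index.php, index.php, and the full URL) come out as http://www.website.com/index.php.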

Max Shawabkeh
Thanks, that really helped.
Semas
+1  A: 

You need to check for the existence of a base tag. If you find one, its href attribute specifies the base URL (otherwise, the base URL is the URL the browser points to, up to and including the last /).
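
A rough sketch of that check, assuming PHP's DOM extension is available. The function name find_base_url is hypothetical, $html is assumed to be non-empty, and $pageUrl is assumed to be the absolute URL the page was fetched from:

```php
<?php
// Hypothetical helper: return the URL that relative links in $html resolve against.
function find_base_url(string $html, string $pageUrl): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <base> element with a non-empty href wins.
    foreach ($doc->getElementsByTagName('base') as $base) {
        $href = trim($base->getAttribute('href'));
        if ($href !== '') {
            return $href;
        }
    }

    // No <base> tag: use the page URL up to and including the last "/"
    // of its path, dropping any query string or fragment.
    $parts = parse_url($pageUrl);
    $path  = $parts['path'] ?? '/';
    $dir   = substr($path, 0, strrpos($path, '/') + 1);
    $port  = isset($parts['port']) ? ':' . $parts['port'] : '';

    return $parts['scheme'] . '://' . $parts['host'] . $port . $dir;
}
```

Note that a base href may itself be relative, in which case it would still need to be resolved against the page URL; this sketch returns it as written.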

Ofir
A: 

Using preg_replace to fix relative URLs


Requires:
$domain = the domain of the site being processed
$path = the document or string in which you are looking for relative links

Returns:
$url = the document or string with the links inside it converted to absolute URLs on the given domain

Code:

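// Prefix each relative href with the domain. Absolute URLs contain ":",
// which the character class does not allow, so they are left untouched.
// $1 is appended directly, so hrefs are expected to start with "/".
$url = preg_replace('/<a\shref="([\/\?\w\.=\&]+)"([\s]rel="(\w+)")*>/', '<a href="http://' . $domain . '$1" rel="$3">', $path);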
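
A possible variant of the same idea, sketched with preg_replace_callback so that links without a leading slash also get a separator and an empty rel attribute is not emitted. The function name absolutize_links is hypothetical; the pattern is the one from this answer, so hrefs containing characters outside its character class (a hyphen, for example) are still skipped, and protocol-relative links (//host/path) are not handled correctly:

```php
<?php
// Hypothetical wrapper: rewrite relative <a href> values in $html to
// absolute URLs on $domain (host name only, e.g. "www.website.com").
function absolutize_links(string $html, string $domain): string
{
    $pattern = '/<a\shref="([\/\?\w\.=\&]+)"([\s]rel="(\w+)")*>/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($domain): string {
            // Ensure exactly one slash between the domain and the path.
            $url = 'http://' . $domain . '/' . ltrim($m[1], '/');
            // Keep the rel attribute only when the link actually had one.
            $rel = (isset($m[3]) && $m[3] !== '') ? ' rel="' . $m[3] . '"' : '';
            return '<a href="' . $url . '"' . $rel . '>';
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = '<a href="/index.php">a</a> <a href="index.php" rel="nofollow">b</a>';
echo absolutize_links($html, 'www.website.com'), "\n";
```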

Good luck, let me know how it goes.

Robert Hurst