I've got a bunch of web page content in my database with links like this:

<a href="/11ecfdc5-d28d-4121-b1c9-1f898ac0b72e">Link</a>

That Guid unique identifier is the ID of another page in the same database.

I'd like to crawl those pages and check for broken links.

To do that I need a function that can return a list of all the Guids on a page:

Function FindGuids(ByVal Text As String) As Collections.Generic.List(Of Guid)
    ...
End Function

I figure this is a job for a regular expression, but I don't know the syntax.

+2  A: 

There are easier ways to check for broken links. For example, I think http://www.totalvalidator.com/ will do it :D

This could also help:

static Regex isGuid = 
    new Regex(@"^(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}$", RegexOptions.Compiled);

and then

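// Returns true and sets output when candidate looks like a GUID string.
static bool IsGuid(string candidate, out Guid output)
{
    output = Guid.Empty;

    // Check the shape first: new Guid(string) throws FormatException on malformed input.
    if (candidate != null && isGuid.IsMatch(candidate))
    {
        output = new Guid(candidate);
        return true;
    }

    return false;
}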
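Note that the pattern above is anchored with ^ and $, so it only validates a string that is a GUID from start to end. To find GUIDs embedded in page content, the anchors have to go. Here is a minimal VB.NET sketch of the difference. The module and function names (AnchorDemo, IsWholeStringGuid, ContainsGuid) are made up for illustration and the sample HTML is the link from the question:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    ' Same character classes as the pattern above, without the anchors or braces.
    Private Const Core As String = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

    ' Anchored: true only when the entire string is one GUID (optionally in braces).
    Function IsWholeStringGuid(candidate As String) As Boolean
        Return Regex.IsMatch(candidate, "^\{?" & Core & "\}?$")
    End Function

    ' Unanchored: true when a GUID appears anywhere in the text.
    Function ContainsGuid(text As String) As Boolean
        Return Regex.IsMatch(text, Core)
    End Function

    Sub Main()
        Dim html As String = "<a href=""/11ecfdc5-d28d-4121-b1c9-1f898ac0b72e"">Link</a>"
        Console.WriteLine(IsWholeStringGuid(html))  ' False: the string is more than a GUID
        Console.WriteLine(ContainsGuid(html))       ' True: a GUID is embedded in it
        Console.WriteLine(IsWholeStringGuid("11ecfdc5-d28d-4121-b1c9-1f898ac0b72e"))  ' True
    End Sub
End Module
```

So the anchored form suits validating a single value, while the unanchored form is the one to use when scanning page text.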

DrG
That looks handy. But many pages of this web site require login and have other business rules I must handle.
Zack Peterson
Total validator (advanced) will do authentication too!
DrG
I think it is actually the Pro (not advanced) version
DrG
+4  A: 

[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
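
As written, this pattern matches lowercase hex digits only. If the stored links might contain uppercase GUIDs, one option is to pass RegexOptions.IgnoreCase. A small VB.NET sketch, where CaseDemo and CountGuids are hypothetical names used only for this example:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaseDemo
    Private Const LowerOnly As String = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

    ' Counts GUID-shaped matches, optionally ignoring letter case.
    Function CountGuids(text As String, ignoreCase As Boolean) As Integer
        Dim options As RegexOptions = If(ignoreCase, RegexOptions.IgnoreCase, RegexOptions.None)
        Return Regex.Matches(text, LowerOnly, options).Count
    End Function

    Sub Main()
        Dim text As String = "/11ECFDC5-D28D-4121-B1C9-1F898AC0B72E and /11ecfdc5-d28d-4121-b1c9-1f898ac0b72e"
        Console.WriteLine(CountGuids(text, False))  ' 1: only the lowercase GUID matches
        Console.WriteLine(CountGuids(text, True))   ' 2: both match
    End Sub
End Module
```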

Joel Coehoorn
+2  A: 

I suggest you grab a free copy of Expresso and learn to build them yourself!

Here's a 10-second attempt with no optimization. It matches upper and lower case and creates a numbered capture group:

([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})

Then you just have to iterate through the matched groups...

Si
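+2  A: 

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Returns every GUID-shaped substring of Text as a Guid, in order of appearance.
Function FindGuids(ByVal Text As String) As List(Of Guid)
    Dim Guids As New List(Of Guid)
    Dim Pattern As String = "[a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12}"
    For Each m As Match In Regex.Matches(Text, Pattern)
        Guids.Add(New Guid(m.Value))
    Next
    Return Guids
End Function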
dotjoe
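
To get from a list of GUIDs to the broken-link check the question asks for, the extracted IDs can be compared against the set of page IDs that actually exist. The following is only a sketch under assumptions: FindBrokenLinks and knownPageIds are hypothetical names, and how the known IDs are loaded from the database is left out because the question does not describe the schema.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BrokenLinkCheck
    ' Returns each distinct GUID found in text that is not in knownPageIds.
    ' knownPageIds is assumed to hold the ID of every page in the database.
    Function FindBrokenLinks(text As String, knownPageIds As HashSet(Of Guid)) As List(Of Guid)
        Dim broken As New List(Of Guid)
        Dim pattern As String = "[a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12}"
        For Each m As Match In Regex.Matches(text, pattern)
            Dim id As New Guid(m.Value)
            If Not knownPageIds.Contains(id) AndAlso Not broken.Contains(id) Then
                broken.Add(id)
            End If
        Next
        Return broken
    End Function

    Sub Main()
        Dim known As New HashSet(Of Guid)
        known.Add(New Guid("11ecfdc5-d28d-4121-b1c9-1f898ac0b72e"))

        Dim html As String =
            "<a href=""/11ecfdc5-d28d-4121-b1c9-1f898ac0b72e"">Ok</a>" &
            "<a href=""/00000000-0000-0000-0000-000000000001"">Missing</a>"

        ' Prints 00000000-0000-0000-0000-000000000001, the only ID not in the known set.
        For Each id As Guid In FindBrokenLinks(html, known)
            Console.WriteLine(id)
        Next
    End Sub
End Module
```

Using a HashSet for the known IDs keeps each lookup cheap, which matters if the crawl covers many pages.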