ansaurus

Question

Build a regular expression to find id in an href.

Answer 1

+1 A:

Something like: href=".*/([^"?/]*)\?[^"]*RELATION_ID[^"]*". Note that the question mark is escaped so it matches a literal "?". This assumes you consistently use double quotes for your attributes. It should work in both Perl and Java.

The ([^"?/]*) group captures the part between the last slash and the question mark. In Java, use Matcher.group(int) to get the value. If you're trying to get multiple values from the same document, call Matcher.find() repeatedly and read the group after each successful match.
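For comparison, here is a minimal Python sketch of the same idea, using the pattern with the literal question mark escaped as \?. The sample markup is hypothetical: its URL shape is only a guess pieced together from details mentioned in the answers (the "dctm:" prefix, "ISDOFSDdev", and the "DMS_OBJECT_SPEC=RELATION_ID" query string), and the names SAMPLE, HREF_ID and extract_ids are made up for illustration. finditer plays the role of calling Matcher.find() in a loop.

```python
import re

# Hypothetical sample; the URL shape is an assumption, not the asker's real data.
SAMPLE = (
    '<a href="dctm://ISDOFSDdev/37004e1f800021f3?DMS_OBJECT_SPEC=RELATION_ID">first</a>\n'
    '<a href="dctm://ISDOFSDdev/37004e1f800021f4?DMS_OBJECT_SPEC=RELATION_ID">second</a>\n'
    '<a href="dctm://ISDOFSDdev/0000000000000000?DMS_OBJECT_SPEC=OTHER">ignored</a>\n'
)

# Group 1 is the text between the last slash and the literal question mark.
HREF_ID = re.compile(r'href=".*/([^"?/]*)\?[^"]*RELATION_ID[^"]*"')


def extract_ids(text):
    """Return group 1 of every match, in document order."""
    return [match.group(1) for match in HREF_ID.finditer(text)]


ids = extract_ids(SAMPLE)
print(ids)
```

Because ".*" is greedy, this sketch keeps each link on its own line; several hrefs on one line could make the pattern skip all but the last one.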

sblundy 2008-11-17 21:26:00

Thanks!! Is there an easy way to get just the id, which is between the slash and the question mark?

joe 2008-11-17 21:34:31

Answer 2

+1 A:

It might not be prudent to attack this with a plain-old regex. XPath with a built-in url-parsing function might be a better solution.

As stated before, the best solution depends on the language you're using.
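As a rough illustration of that approach in Python: the standard library's ElementTree supports only a small XPath subset and has no URL functions inside XPath, so this sketch selects the elements with an XPath-style expression and then parses each href with urllib.parse. The sample document, its URL shape, and the names SAMPLE_XML and relation_ids are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical sample; element names and URL shape are assumptions.
SAMPLE_XML = """<links>
  <a href="dctm://ISDOFSDdev/37004e1f800021f3?DMS_OBJECT_SPEC=RELATION_ID">first</a>
  <a href="dctm://ISDOFSDdev/37004e1f800021f4?DMS_OBJECT_SPEC=RELATION_ID">second</a>
  <a href="dctm://ISDOFSDdev/0000000000000000?DMS_OBJECT_SPEC=OTHER">ignored</a>
</links>"""


def relation_ids(xml_text):
    """Collect the last path segment of every href whose query mentions RELATION_ID."""
    root = ET.fromstring(xml_text)
    ids = []
    for element in root.iterfind(".//*[@href]"):  # any descendant with an href attribute
        parts = urlsplit(element.get("href"))
        if "RELATION_ID" in parts.query:
            ids.append(parts.path.rsplit("/", 1)[-1])
    return ids


print(relation_ids(SAMPLE_XML))
```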

Cybis 2008-11-17 21:32:49

Answer 3

+1 A:

Maybe something like this: href="(.+?)/(.+?)\?(.+?)RELATION_ID", and use the second group if you're only looking for the id part (37004e1f800021f3 in your example).

Tjofras 2008-11-17 21:36:02

Answer 4

+1 A:

Here is a Python solution:

import re

# test_string is assumed to hold the markup posted in the question
expr = re.compile(r'href=.*?/(.*?)\?.*?=RELATION_ID', re.MULTILINE)

for x in expr.finditer(test_string):  # iterate through all matches
    s = x.group(1)        # get the one and only group of the match
    ss = s.split("/")     # split off the ISDOFSDdev
    s = ss[-1]            # grab the last element
    print(s)              # print it

Output where test_string is the string you posted:

37004e1f800021f3
37004e1f800021f4

Again, this is in Python, but with any modern regex library you should be able to replicate it.

It is extremely difficult to write a regular expression that pulls out just the ID. I am not saying it is impossible, but it is often easier to get close with the regex and then split out what you need from the substring the regex gives you.

Documentation on the Python regex module.

grieve 2008-11-17 21:53:00

Well, Will showed an extremely easy regular expression that does pull out the ID. :) :(

grieve 2008-11-17 22:18:41

Answer 5

+4 A:

You can use this regular expression:

[a-fA-F0-9]+(?=\?DMS_OBJECT_SPEC=RELATION_ID)

which matches the hex number immediately before the query string. The (?=...) part is a lookahead, so the query string must be present but is not included in the match.
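A small Python sketch of how the lookahead behaves. The sample string is hypothetical (its URL shape is assumed from details mentioned in the answers), and the names HEX_BEFORE_QUERY, sample and found are made up for illustration.

```python
import re

# The lookahead (?=...) requires the query string to follow, without consuming it.
HEX_BEFORE_QUERY = re.compile(r'[a-fA-F0-9]+(?=\?DMS_OBJECT_SPEC=RELATION_ID)')

# Hypothetical sample; the URL shape is an assumption.
sample = (
    '<a href="dctm://ISDOFSDdev/37004e1f800021f3?DMS_OBJECT_SPEC=RELATION_ID">first</a> '
    '<a href="dctm://ISDOFSDdev/37004e1f800021f4?DMS_OBJECT_SPEC=RELATION_ID">second</a>'
)

found = HEX_BEFORE_QUERY.findall(sample)
print(found)
```

Since the match is just the run of hex digits, no capture group or post-processing is needed. Note that if the segment before the "?" contained non-hex characters, only its trailing hex run would be returned.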

I'd also suggest using XPath to do this over regex.

Will 2008-11-17 21:57:15

I misread the question, sorry.

Will 2008-11-17 22:02:48

Thanks, that works well now.

joe 2008-11-17 22:06:27

Answer 6

+2 A:

As you have XML data, why not use an XSLT stylesheet? This example uses only XPath 1.0 functions, which are somewhat limited. It outputs the values of the href attributes that contain RELATION_ID.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="no"/>
  <xsl:template match="*[@href]">
    <xsl:if test="contains(@href, 'RELATION_ID')">
      <xsl:value-of select="@href"/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:if>
    <xsl:apply-templates select="*"/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>

Assuming the given file is named "example.xml" and the XSLT stylesheet above is saved as "example-xslt.xsl", you can use the following line to save the result to a file "out.txt" using MSXSL.exe:

C:\Documents and Settings\fer\Escritorio>msxsl.exe -xw example.xml example-xslt.xsl > out.txt

Edit: Next is the XSLT using XPath v2.0, which lets you use the power of regular expressions inside string handling functions. The result is the ID inside the URL you were looking for (instead of the whole value of the href attributes).

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions" >
     <xsl:output method="text" indent="no"/>
     <xsl:template match="*[@href]">
      <xsl:if test="fn:contains(@href, 'RELATION_ID')">
       <xsl:value-of select="fn:replace(@href,'.*/([^/]*)\?.*', '$1')"/>
       <xsl:text>&#xa;</xsl:text>
      </xsl:if>
      <xsl:apply-templates select="*"/>
     </xsl:template>
     <xsl:template match="*">
      <xsl:apply-templates select="*"/>
     </xsl:template>
</xsl:stylesheet>

There are not many free XSLT v2.0 processors out there, but AltovaXML-2008 is one of them. The following command line gives you the expected result.

C:\Documents and Settings\fer\Escritorio>AltovaXML -xslt2 example-xslt.xsl -in example.xml

Fernando Miguélez 2008-11-17 22:17:55

Answer 7

A:

First find the href attribute using this regex: href="[^=]*=RELATION_ID"

Once you have a collection of those attributes, use the following regex to find the ID: dctm:[^?]*

Explanation of first regex

href=" : Match the characters "href="" literally
[^=]* : Match any character that is NOT a "=" between zero and unlimited times
=RELATION_ID" : Match the characters "=RELATION_ID"" literally.

Explanation of second regex

dctm: : Match the characters "dctm:" literally
[^?]* : Match any character that is NOT a "?" between zero and unlimited times.
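The two steps could be sketched in Python roughly as follows. The sample markup is hypothetical: its URL shape is assumed from details mentioned in the other answers, and the names SAMPLE, HREF_ATTR, DCTM_PART and two_step_ids are made up for illustration. Under that assumed shape, the second regex returns everything from "dctm:" up to the question mark, so the sketch adds one extra step (not part of the answer above) that keeps only the last path segment.

```python
import re

# Hypothetical sample; the URL shape is an assumption, not the asker's real data.
SAMPLE = (
    '<a href="dctm://ISDOFSDdev/37004e1f800021f3?DMS_OBJECT_SPEC=RELATION_ID">first</a>\n'
    '<a href="dctm://ISDOFSDdev/0000000000000000?DMS_OBJECT_SPEC=OTHER">ignored</a>\n'
    '<a href="dctm://ISDOFSDdev/37004e1f800021f4?DMS_OBJECT_SPEC=RELATION_ID">second</a>\n'
)

HREF_ATTR = re.compile(r'href="[^=]*=RELATION_ID"')  # step 1: whole attribute
DCTM_PART = re.compile(r'dctm:[^?]*')                # step 2: up to the "?"


def two_step_ids(text):
    """Apply both regexes, then keep the last path segment (extra step, assumed)."""
    ids = []
    for attribute in HREF_ATTR.findall(text):
        match = DCTM_PART.search(attribute)
        if match:
            ids.append(match.group(0).rsplit("/", 1)[-1])
    return ids


print(two_step_ids(SAMPLE))
```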

If you are going to use regular expressions often, you should strongly consider buying RegexBuddy at http://www.regexbuddy.com/

Jason 2008-11-17 23:44:31
