ansaurus

Question

How to wrap text in a hyperlink ONLY if it isn't already wrapped in a hyperlink

Answer 1

A:

It seems as if you are parsing rendered HTML. If that is the case, why not parse the raw HTML? Then the problem becomes trivial.

ennuikiller 2009-07-28 02:32:19

I don't see how it becomes trivial. I don't understand the difference between raw and rendered HTML. HTML is a format. The browser renders the format into an interface. The documents I'm using the regex against are HTML documents, so there's no way to remove the HTML.

Laran Evans 2009-07-29 15:05:13

Answer 2

+1 A:

Lookarounds could get you somewhere. Though far from perfect, here is a quick regex check to see whether your text has already been wrapped in anchor tags.

(?<=>)quick brown(?=</a>)

Note: lookbehind assertions need to be fixed length (at least in PCRE).
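
As a rough illustration in Java, the check above could be run as follows. The class and method names are made up for this sketch. It only detects the case where the phrase sits directly between a > and </a>, so it shares the limitations already mentioned.

```java
import java.util.regex.Pattern;

class WrappedCheck {
    // Hypothetical helper: true if phrase appears immediately after a '>'
    // and immediately before "</a>".
    static boolean isWrapped(String html, String phrase) {
        // (?<=>) is a one-character (fixed-length) lookbehind for '>'.
        // (?=</a>) is a lookahead for the closing anchor tag.
        Pattern p = Pattern.compile(
            "(?<=>)" + Pattern.quote(phrase) + "(?=</a>)");
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(isWrapped("<a href=\"x\">quick brown</a> fox", "quick brown"));
        System.out.println(isWrapped("the quick brown fox", "quick brown"));
    }
}
```

Pattern.quote keeps any regex metacharacters in the phrase from being interpreted, which matters once the phrase is not a fixed literal.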

Geert 2009-07-28 14:39:30

Answer 3

+1 A:

If the string you want to wrap a link around is YOUR_STRING, first identify all places where YOUR_STRING is surrounded by a link tag.

regex = <a[^>]*>[^<]*(YOUR_STRING)[^<]*</a>

Starts with <a

Followed by a sequence of length zero or more that doesn't contain >.

Followed by >

Followed by a sequence of length zero or more that doesn't contain <.

Followed by YOUR_STRING. This is a capturing group.

Followed by a sequence of length zero or more that doesn't contain <.

Followed by </a>

Now you can identify the character offsets of the places where captured group YOUR_STRING is surrounded by a link tag.
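
A minimal Java sketch of this step, assuming java.util.regex and a hypothetical helper named LinkedOffsets.find, might collect those offsets like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkedOffsets {
    // Returns the start offset of each occurrence of target that the
    // regex above finds inside an <a ...>...</a> pair.
    static List<Integer> find(String html, String target) {
        Pattern linked = Pattern.compile(
            "<a[^>]*>[^<]*(" + Pattern.quote(target) + ")[^<]*</a>");
        List<Integer> starts = new ArrayList<>();
        Matcher m = linked.matcher(html);
        while (m.find()) {
            starts.add(m.start(1)); // offset of capturing group 1
        }
        return starts;
    }
}
```

One thing to be aware of: because [^<]* is greedy, a link that contains the phrase twice reports only the last occurrence, so it can be safer to treat the whole matched link (m.start() to m.end()) as the protected range.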

Everywhere else that YOUR_STRING occurs literally, wrap the link tag around it.

Bonus point: Note that when you insert text into a string, you may change the character offsets, or your regex library may throw a ConcurrentModificationException or otherwise not allow you to insert text while matching (depending on what library you are using). The best way to handle this is to create a separate StringBuffer and append text to it as you analyze your original string.
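
The whole procedure, including the separate output buffer, could be sketched in Java roughly as follows. LinkWrapper, wrap, wrapBare and the href parameter are names invented for this example, and StringBuilder is used as the unsynchronized equivalent of StringBuffer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkWrapper {
    // Wraps each bare occurrence of target in a link to href and copies
    // links that already contain target through unchanged.
    static String wrap(String html, String target, String href) {
        Pattern linked = Pattern.compile(
            "<a[^>]*>[^<]*(" + Pattern.quote(target) + ")[^<]*</a>");
        StringBuilder out = new StringBuilder(); // separate output buffer
        Matcher m = linked.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Text between the previous link and this one: wrap bare hits.
            out.append(wrapBare(html.substring(pos, m.start()), target, href));
            // The existing link itself is appended as-is.
            out.append(m.group());
            pos = m.end();
        }
        // Remaining text after the last existing link.
        out.append(wrapBare(html.substring(pos), target, href));
        return out.toString();
    }

    // Literal (non-regex) replacement of target with a linked version.
    static String wrapBare(String text, String target, String href) {
        String link = "<a href=\"" + href + "\">" + target + "</a>";
        return text.replace(target, link);
    }
}
```

Because the original string is only read and all writing goes to out, the match offsets never shift. The sketch inherits the limits of the regex: a link whose body contains nested tags is not recognized as a link, and an occurrence of the phrase inside an attribute value would still be wrapped.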

Also note: The regex to identify the hyperlink tag could be written more strictly (for correct HTML), but this version should work for bad HTML too, e.g. a link with a missing href attribute such as <a>quick brown fox</a>. If the HTML you are expecting can be imperfect in other ways and you would like to handle those cases, then you should modify the regex accordingly.

Hope it works.

hashable 2009-08-03 18:02:33
