If the string you want to wrap a link around is YOUR_STRING, first identify all places where YOUR_STRING is surrounded by a link tag.
regex = <a[^>]*>[^<]*(YOUR_STRING)[^<]*</a>
Starts with <a
Followed by a sequence of length zero or more that doesn't contain >.
Followed by >
Followed by a sequence of length zero or more that doesn't contain <.
Followed by YOUR_STRING (this is a capturing group).
Followed by a sequence of length zero or more that doesn't contain <.
Followed by </a>
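A minimal Java sketch of this pattern, assuming java.util.regex is the library in use. The helper name linkPattern and the sample strings are hypothetical. Pattern.quote is used so that YOUR_STRING is matched literally even if it contains regex metacharacters such as . or (.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRegexDemo {
    // Builds <a[^>]*>[^<]*(YOUR_STRING)[^<]*</a> for a given target string.
    // Pattern.quote() escapes the target so it is matched literally.
    static Pattern linkPattern(String target) {
        return Pattern.compile("<a[^>]*>[^<]*(" + Pattern.quote(target) + ")[^<]*</a>");
    }

    public static void main(String[] args) {
        Pattern p = linkPattern("brown fox");
        String[] samples = {
            "<a href=\"/fox\">the quick brown fox</a>", // ordinary link: matched
            "<a>quick brown fox</a>",                   // missing href: still matched
            "a quick brown fox, no link here"           // no link tag: not matched
        };
        for (String s : samples) {
            Matcher m = p.matcher(s);
            System.out.println(m.find() + "  " + s);
        }
    }
}
```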
Now you can identify the character offsets (the start and end of capturing group 1 in each match) of the places where YOUR_STRING is surrounded by a link tag.
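A sketch of collecting those offsets in Java, again assuming java.util.regex; the method name linkedOffsets and the sample input are hypothetical. Matcher.start(1) and Matcher.end(1) give the half-open range [start, end) of capturing group 1 in the current match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkedOffsets {
    // Returns the [start, end) offsets of YOUR_STRING (capture group 1)
    // for every match of the link regex in html.
    static List<int[]> linkedOffsets(String html, String target) {
        Pattern p = Pattern.compile("<a[^>]*>[^<]*(" + Pattern.quote(target) + ")[^<]*</a>");
        Matcher m = p.matcher(html);
        List<int[]> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(new int[] {m.start(1), m.end(1)});
        }
        return offsets;
    }

    public static void main(String[] args) {
        String html = "fox <a href=\"#\">a fox</a> and fox";
        for (int[] r : linkedOffsets(html, "fox")) {
            System.out.println(r[0] + ".." + r[1]); // prints 18..21
        }
    }
}
```

Only the "fox" inside the link is reported; the two bare occurrences at offsets 0 and 30 are the ones still to be wrapped.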
Everywhere else that YOUR_STRING occurs literally, wrap the link tag around it.
Bonus point: when you insert text into a string, you shift the character offsets of everything after the insertion point. Also, depending on the library you are using, modifying the text while the regex is still scanning it may not be allowed or may throw an exception (such as a ConcurrentModificationException). The best way to handle this is to leave the original string untouched and append text to a separate StringBuffer as you analyze the original.
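Putting the steps together, here is a minimal Java sketch that never edits the original string and builds the result in a separate StringBuilder (the unsynchronized counterpart of StringBuffer; either works here). The method wrapUnlinked and its href parameter are hypothetical, and href is inserted as-is, without HTML escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapUnlinked {
    // Wraps every occurrence of target that is not already inside a matched
    // <a ...>...</a> span. href is a hypothetical link target, not escaped.
    static String wrapUnlinked(String html, String target, String href) {
        if (target.isEmpty()) return html; // avoid an endless indexOf("") loop

        // 1. Record the [start, end) span of every link that contains target.
        Pattern p = Pattern.compile("<a[^>]*>[^<]*(" + Pattern.quote(target) + ")[^<]*</a>");
        Matcher m = p.matcher(html);
        List<int[]> linked = new ArrayList<>();
        while (m.find()) {
            linked.add(new int[] {m.start(), m.end()});
        }

        // 2. Copy html into a separate builder; html itself is never modified,
        //    so the offsets recorded above stay valid.
        StringBuilder out = new StringBuilder(html.length());
        int copied = 0; // everything before this index has been appended
        int idx = html.indexOf(target);
        while (idx >= 0) {
            out.append(html, copied, idx);
            if (insideAny(linked, idx)) {
                out.append(target);                 // already linked: keep as-is
            } else {
                out.append("<a href=\"").append(href).append("\">")
                   .append(target).append("</a>");  // not linked: wrap it
            }
            copied = idx + target.length();
            idx = html.indexOf(target, copied);
        }
        out.append(html, copied, html.length());
        return out.toString();
    }

    // True if pos lies inside any of the recorded [start, end) spans.
    static boolean insideAny(List<int[]> spans, int pos) {
        for (int[] s : spans) {
            if (pos >= s[0] && pos < s[1]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "fox <a href=\"#\">a fox</a> and fox";
        // prints <a href="/fox">fox</a> <a href="#">a fox</a> and <a href="/fox">fox</a>
        System.out.println(wrapUnlinked(html, "fox", "/fox"));
    }
}
```

One deliberate difference from the steps described: this sketch skips any occurrence inside a whole matched <a>...</a> span (m.start() to m.end()) rather than only at the captured group's offsets, so a string that appears twice inside the same link is not wrapped a second time. Like the regex itself, it knows nothing about other tags, so an occurrence inside an attribute such as alt="..." would still be wrapped.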
Also note: the regex that identifies the hyperlink tag could be written more strictly for well-formed HTML, but this version also works for some malformed HTML, e.g. a link with a missing href attribute such as <a>quick brown fox</a>. If the HTML you expect can be imperfect in other ways and you want to handle those cases, modify the regex accordingly.
Hope it works.