Yes, I know, I know, parsing HTML with regular expressions is very bad. But I am working with legacy code that is supposed to extract all link and style elements from an HTML page. I would change it and use the DOM extension instead, but after the regex there is a huge code block which relies on the way preg_match_all returns the matched results.
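To illustrate the result layout that the later code depends on, here is a minimal sketch. The pattern and the sample markup are simplified and hypothetical, not the legacy ones; the only point is how preg_match_all groups its output under the default PREG_PATTERN_ORDER flag.

```php
<?php
// Hypothetical sample input, not taken from the real page.
$html = '<link rel="stylesheet" href="a.css" /><link rel="stylesheet" href="b.css" />';

// Simplified pattern with two capture groups: the tag name and the href value.
preg_match_all('/<(link)[^>]*?href="(.*?)"[^>]*>/is', $html, $m);

// With the default PREG_PATTERN_ORDER flag:
//   $m[0] holds every full match,
//   $m[1] holds capture group 1 for every match,
//   $m[2] holds capture group 2 for every match.
var_dump($m[1] === ['link', 'link']);
var_dump($m[2] === ['a.css', 'b.css']);
```

So the downstream code indexes the result by capture group first and by match second, which is why I can't simply swap in a different parser.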
The script is using this regex:
$pattern = '/<(link|style)(?=.+?(?:type="(text\/css)"|>))(?=.+?(?:media="(.*?)"|>))(?=.+?(?:href="(.*?)"|>))(?=.+?(?:rel="(.*?)"|>))[^>]+?\2[^>]+?(?:\/>|<\/style>)\s*/is';
preg_match_all($pattern, $htmlContent, $cssTags);
But it doesn't work. No elements are matched. Unfortunately I really suck at regex, so if someone could help me out, it would be great.