ansaurus

Question

Answer 1

A:

(?i)<meta\\s+name=\"keywords\"\\s+content=\"(.*?)\">

In PHP, that would look something like:

preg_match('~<meta\s+name="keywords"\s+content="(.*?)">~i', $html, $matches);

JoostK 2009-11-15 15:49:54

Answer 2

A:

This is a simple regex that matches the first meta keywords tag. It only allows letters, digits, legal URL characters, HTML entities and spaces to appear inside the content attribute.

$matches = array();
preg_match("/<meta name=\"Keywords\" content=\"([\w\d;,\.: %&#\/\\\\]*)\"/", $html, $matches);
echo $matches[1];

gnud 2009-11-15 15:53:46

Answer 3

+1 A:

(.*) matches everything up to the LAST " (quote) it can reach in the document, which is obviously not what you want. Regex quantifiers are greedy by default. You need to use

content=\"(.*?)\"

or

content=\"([^\"]*)\"
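
To make the difference concrete, here is a small sketch. The sample $html string and the variable names are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical sample input: the content attribute is followed by another quoted attribute.
$html = '<meta name="keywords" content="php, regex" lang="en">';

// Greedy: (.*) runs on to the LAST quote it can reach.
preg_match('/content="(.*)"/', $html, $greedy);

// Lazy: (.*?) stops at the first closing quote.
preg_match('/content="(.*?)"/', $html, $lazy);

// Negated class: [^"]* can never cross a quote at all.
preg_match('/content="([^"]*)"/', $html, $negated);

echo $greedy[1], "\n";   // php, regex" lang="en
echo $lazy[1], "\n";     // php, regex
echo $negated[1], "\n";  // php, regex
```

The negated class is usually the safer of the two fixes, since it cannot run past the closing quote even when the rest of the pattern fails to match.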

yu_sha 2009-11-15 15:57:07

That won't work completely, since he uses the `^`, so the meta element would need to be at the very beginning of the HTML, which should never be the case.

JoostK 2009-11-15 16:01:08

Answer 4

+1 A:

Use the function get_meta_tags();

Tutorial
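
A minimal sketch of how that could look, assuming the HTML is already in a string. Since get_meta_tags() reads from a filename or URL, the sample markup (made up for illustration) is first written to a temporary file.

```php
<?php
// Hypothetical sample page; in practice this would be the fetched HTML.
$html = '<html><head><meta name="Keywords" content="php, regex, html"></head><body></body></html>';

// get_meta_tags() takes a filename or URL, so store the source in a temp file first.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

// The returned array is keyed by the lower-cased name attribute.
$tags = get_meta_tags($file);
unlink($file);

$keywords = isset($tags['keywords']) ? $tags['keywords'] : '';
echo $keywords, "\n";
```

Note that get_meta_tags() stops parsing at the closing head tag, so meta elements placed in the body are not picked up.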

Cups 2009-11-15 16:14:38

When fetching stuff to work on, I am guessing that getting the keywords is only one operation. I always do it in 2 bites: 1) get the file and store it locally, 2) do my post-fetch ripping. I just find that more reliable, as so much can go wrong when fetching from the web. But if you're only after the keywords, why bother getting the file? Just use get_meta_tags().

Cups 2009-11-15 18:26:27

Was not aware of the get_meta_tags function. Awesome - thanks!

SerpicoLugNut 2009-11-16 14:38:27

Answer 5

+3 A:

I would use an HTML/XML parser like DOMDocument together with XPath to retrieve the nodes from the DOM:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$keywords = $xpath->query('//meta[translate(normalize-space(@name), "KEYWORDS", "keywords")="keywords"]/@content');
foreach ($keywords as $keyword) {
    echo $keyword->value;
}

The translate function seems to be necessary because PHP’s XPath implementation only supports XPath 1.0, which has no lower-case function.

Or you do the filtering with PHP:

$metas = $xpath->query('//meta');
foreach ($metas as $meta) {
    if ($meta->hasAttribute("name") && trim(strtolower($meta->getAttribute("name")))=='keywords' && $meta->hasAttribute("content")) {
        echo $meta->getAttribute("content");
    }
}
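
For reuse, the PHP-side filtering could be wrapped in a function. The following is only a sketch: the function name extractMetaKeywords and the sample input are hypothetical, and the libxml_use_internal_errors() calls are an addition to keep warnings about invalid markup from being printed.

```php
<?php
// Hypothetical helper (not from the answer): returns the content of every
// <meta name="keywords"> element found in an HTML string.
function extractMetaKeywords($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = array();
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare the name case-insensitively, ignoring surrounding whitespace.
        if (strtolower(trim($meta->getAttribute('name'))) === 'keywords'
            && $meta->hasAttribute('content')) {
            $result[] = $meta->getAttribute('content');
        }
    }
    return $result;
}

// Made-up input with upper-case markup, padded name and an unclosed <p>.
$html = '<html><head><META NAME=" Keywords " CONTENT="php, dom, xpath"></head><body><p>unclosed</body></html>';
print_r(extractMetaKeywords($html));
```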

Gumbo 2009-11-15 16:16:00

I would +1 if I had any daily votes left :(

meder 2009-11-15 16:16:49

+1, except, there is get_meta_tags() built in.

Svante 2009-11-15 16:36:28

@Svante: But `get_meta_tags` expects a filename and not the HTML source.

Gumbo 2009-11-15 16:48:10

Answer 6

+1 A:

Stop trying to parse HTML with regular expressions.

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Ether 2009-11-15 18:31:29

RegEx to get the keywords from HTML
