ansaurus

Question

Answer 1

+4 A:

Why can't you use a DOM parser, extract all the meta elements, and iterate through them to do whatever you want?

meder 2009-09-13 22:22:40

I have no experience using DOM objects. How would I extract the meta keywords from one?

Gerald Ferreira 2009-09-13 22:25:32

+1. Writing your own regex is prone to errors and can be very difficult to debug. No reason to do this when there are free parsers all over the place.

Fragsworth 2009-09-13 22:25:40

First find a DOM parser, then load the document. The getElementsByTagName method is what you're looking for; after that, iterate through the NodeList of meta elements and call getAttribute on each one.
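
A minimal sketch of those steps, assuming the document is already loaded and exposed through a DOM-style object. The function name `collectMeta` and the shape of the returned records are illustrative only; the two calls it relies on, `getElementsByTagName` and `getAttribute`, are the standard DOM methods mentioned above.

```javascript
// Collect the name/content pairs of every <meta> element in a DOM document.
// `doc` is assumed to be any object that implements getElementsByTagName,
// such as `document` in a browser. `collectMeta` is a hypothetical helper.
function collectMeta(doc) {
  const metas = doc.getElementsByTagName("meta");
  const records = [];
  // The result is an array-like collection, so index into it with a plain loop.
  for (let i = 0; i < metas.length; i++) {
    const el = metas[i];
    records.push({
      // Some meta elements use http-equiv instead of name.
      name: el.getAttribute("name") || el.getAttribute("http-equiv"),
      content: el.getAttribute("content"),
    });
  }
  return records;
}
```

In a browser, `collectMeta(document)` would return one record per meta element, and the keywords entry would be the record whose `name` is `keywords`.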

meder 2009-09-13 22:28:12

Which DOM parser would you recommend?

Gerald Ferreira 2009-09-13 22:32:36

That syntax is JavaScript. Are you scraping an external URI? What languages are you familiar with?

meder 2009-09-13 22:39:30

Mainly JavaScript, visual script and ASP. I am trying to write an extension to extract all meta data from my sites, count the number of words in the tags, and test whether meta data exists for all my pages.
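
The check described here can be sketched as a small function over meta data that has already been extracted. This is only an illustration: `countWords`, `auditMeta`, the `required` list, and the record shape `{ name, content }` are all assumptions, and words are counted simply by splitting on whitespace and commas.

```javascript
// Count words in each meta tag's content and report which required tags
// are missing or empty. `countWords` and `auditMeta` are hypothetical helpers.
function countWords(text) {
  // Split on runs of whitespace and commas, then drop empty pieces.
  return text.split(/[\s,]+/).filter(Boolean).length;
}

function auditMeta(records, required = ["keywords", "description"]) {
  const counts = {};
  for (const { name, content } of records) {
    if (!name) continue; // skip meta elements without a name
    counts[name.toLowerCase()] = countWords(content || "");
  }
  // A required tag counts as missing if it is absent or has no words.
  const missing = required.filter((n) => !counts[n]);
  return { counts, missing };
}
```

Running `auditMeta` once per page and collecting the `missing` lists would show which pages lack keywords or a description.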

Gerald Ferreira 2009-09-13 22:42:03

@Gerald: if you are doing this within a browser window, then you can leverage the DOM/DHTML features of your browser through JavaScript. You can use document.getElementsByTagName("META") to return an array-like collection (an HTMLCollection, not a true JavaScript array) of all META elements defined within your page.

David Andres 2009-09-13 22:42:47

Answer 2

A:

I don't have a specific answer, but is this helpful? It is what I use in TextPad's find and replace.

^<meta[^"]+"\([^"]*\)"[^"]*"\([^"]*\)"*.*

FIND:
^[^"]+"\([^"]*\)"[^"]*"\([^"]*\)"*.*
REPLACE:
<\1>\2</\1>

CHANGES:
<TITLE>Q10022</TITLE>
<META HTTP-EQUIV="CONTENT-Type" CONTENT="text/html; charset=iso-8859-1" />

TO:
<TITLE>Q10022</TITLE>
<CONTENT-Type>text/html; charset=iso-8859-1</CONTENT-Type>
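
For comparison, the same find-and-replace can be sketched in JavaScript. TextPad's `\(...\)` groups become plain `(...)`, and `\1` in the replacement becomes `$1`. The helper name `metaToElement` is made up for this example. The pattern is applied line by line because `[^"]` in JavaScript would otherwise match across newlines.

```javascript
// JavaScript rendering of the TextPad FIND/REPLACE pair above.
// `metaToElement` is a hypothetical name used only for this sketch.
const FIND = /^[^"]+"([^"]*)"[^"]*"([^"]*)"*.*/;
const REPLACE = "<$1>$2</$1>";

function metaToElement(text) {
  // Process each line separately, as a line-based editor would.
  return text
    .split("\n")
    .map((line) => line.replace(FIND, REPLACE))
    .join("\n");
}
```

Lines that contain no double-quoted attributes, such as the TITLE line, do not match the pattern and pass through unchanged.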

AMissico 2009-09-13 23:36:50

Answer 3

A:

<meta name="keywords" content="(.+)" />

I may be wrong, but as long as the match is non-greedy (write the group as (.+?) if your engine is greedy by default), that should be it. Just escape the special characters.
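
To illustrate the greedy versus non-greedy point, here is a small JavaScript sketch. It assumes the attributes appear in exactly this order with double quotes, which real pages do not guarantee, and `extractKeywords` is a hypothetical helper.

```javascript
// Non-greedy capture of the keywords content. `extractKeywords` is a
// hypothetical helper; it assumes name comes before content and that
// both attributes use double quotes.
const KEYWORDS_RE = /<meta\s+name="keywords"\s+content="(.+?)"\s*\/?>/i;

function extractKeywords(html) {
  const match = KEYWORDS_RE.exec(html);
  if (!match) return null;
  // Split the comma-separated list and trim each keyword.
  return match[1].split(",").map((k) => k.trim()).filter(Boolean);
}
```

With a greedy `(.+)`, a line holding several meta tags would be captured up to the last `" />` on that line, which is why the lazy `(.+?)` is used here.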

Jeff 2009-09-13 23:49:35

Answer 4

A:

This is untested, but with jQuery couldn't you just do:

$('meta').each(function() {
  // insert code here to put the attributes in an array or whatever
});

Then, inside that callback, either store the data, make some Ajax calls, or take some action based on the data in each of the meta tags.

aphelionz 2009-09-14 00:02:12

regex to get meta keywords
