Hi, I was hoping that someone can help me with this regex.

I want to match the pattern below once to extract the meta keywords from a page:

.match(/(<meta name=[\"|\']keywords([^\/>]*))/ig);

Any ideas would be welcome.

+4  A: 

Why not use a DOM parser, extract all the meta elements, and then iterate through them and do whatever you want?

meder
I have no experience using DOM objects... how would I extract the meta keywords with it?
Gerald Ferreira
+1. Writing your own regex is prone to errors and can be very difficult to debug. No reason to do this when there are free parsers all over the place.
Fragsworth
First find a DOM parser and load the document. The getElementsByTagName method is what you're looking for: iterate through the resulting list of meta elements and call getAttribute on each one.
meder
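A minimal sketch of the approach meder describes, assuming `doc` is an already-loaded DOM Document (for example `document` in a browser, or the result of `new DOMParser().parseFromString(html, "text/html")`). The function name `getMetaKeywords` is hypothetical; `getElementsByTagName` and `getAttribute` are the standard DOM methods mentioned above.

```javascript
// Hypothetical helper: collect the content of every <meta name="keywords">
// element in a DOM Document (assumed to be loaded already).
function getMetaKeywords(doc) {
  // In browsers this returns a live HTMLCollection; it is indexable like an array.
  const metas = doc.getElementsByTagName("meta");
  const keywords = [];
  for (let i = 0; i < metas.length; i++) {
    const name = metas[i].getAttribute("name"); // null if the tag has no name
    // String comparison is case-sensitive, so normalise before comparing.
    if (name && name.toLowerCase() === "keywords") {
      keywords.push(metas[i].getAttribute("content") || "");
    }
  }
  return keywords;
}
```

In a browser you would call `getMetaKeywords(document)`; pages normally have at most one keywords tag, so the result is usually a one-element array.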
which DOM parser would you recommend?
Gerald Ferreira
That syntax is JavaScript. Are you scraping an external URI? What languages are you familiar with?
meder
Mainly JavaScript, visual script and ASP. I am trying to write an extension to extract all the meta data from my sites, count the number of words in the tags, and check whether meta data exists for all my pages.
Gerald Ferreira
@Gerald: if you are doing this within a browser window, you can leverage the DOM/DHTML features of your browser through JavaScript. document.getElementsByTagName("META") returns an array-like collection (an HTMLCollection, not a true JavaScript array) of all the META tags defined within your page.
David Andres
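For the audit Gerald describes, a rough sketch might look like this. It assumes it runs against a DOM Document (such as `document` in the page), treats "meta data" as just the `keywords` and `description` tags, and counts words by splitting on whitespace; `auditMetaTags` and the shape of its result are hypothetical.

```javascript
// Hypothetical audit helper: report whether the keywords/description meta tags
// exist and how many whitespace-separated words each one contains.
function auditMetaTags(doc) {
  // In HTML documents the tag name is matched case-insensitively, so "META" works too.
  const metas = doc.getElementsByTagName("meta");
  const found = {};
  for (let i = 0; i < metas.length; i++) {
    const name = (metas[i].getAttribute("name") || "").toLowerCase();
    if (name === "keywords" || name === "description") {
      found[name] = metas[i].getAttribute("content") || "";
    }
  }
  // An empty or missing tag counts as 0 words.
  const countWords = (text) => (text.trim() === "" ? 0 : text.trim().split(/\s+/).length);
  return {
    hasKeywords: "keywords" in found,
    hasDescription: "description" in found,
    keywordWords: countWords(found.keywords || ""),
    descriptionWords: countWords(found.description || ""),
  };
}
```

Calling `auditMetaTags(document)` on each page and flagging any result where `hasKeywords` or `hasDescription` is false would cover the "does meta data exist for all my pages" check.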
A: 

I don't have a specific answer, but is this helpful? It is what I use in TextPad's find and replace.

^<meta[^"]+"\([^"]*\)"[^"]*"\([^"]*\)"*.*

FIND:
^[^"]+"\([^"]*\)"[^"]*"\([^"]*\)"*.*
REPLACE:
<\1>\2</\1>

CHANGES:
<TITLE>Q10022</TITLE>
<META HTTP-EQUIV="CONTENT-Type" CONTENT="text/html; charset=iso-8859-1" />

TO:
<TITLE>Q10022</TITLE>
<CONTENT-Type>text/html; charset=iso-8859-1</CONTENT-Type>
AMissico
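If you'd rather run the same transformation in JavaScript, the FIND pattern translates almost directly. This is a sketch assuming the page source is available as a string: the TextPad pattern above writes groups as `\( \)` and backreferences as `\1`, whereas a JavaScript regex uses plain `( )` and the replacement string uses `$1`. The names `metaLineToElement` and `convert` are hypothetical.

```javascript
// JavaScript version of the TextPad FIND/REPLACE above.
// Group 1 = first quoted value, group 2 = second quoted value.
const FIND = /^[^"]+"([^"]*)"[^"]*"([^"]*)"*.*/;

function metaLineToElement(line) {
  // Lines with no quoted attribute values (like the <TITLE> line) don't match
  // and are returned unchanged.
  return line.replace(FIND, "<$1>$2</$1>");
}

function convert(text) {
  // TextPad anchors ^ at each line, so apply the pattern line by line.
  return text.split("\n").map(metaLineToElement).join("\n");
}
```

Applied to a keywords tag such as `<meta name="keywords" content="a, b" />`, this produces `<keywords>a, b</keywords>`.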
A: 
<meta name="keywords" content="(.+?)" />

I may be wrong, but if the match is non-greedy that should be it. Just escape the special characters.

Jeff
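A sketch of this pattern in JavaScript, which also covers the "match once" part of the question: without the `g` flag, `String.prototype.match` returns only the first match, with the captured group at index 1. It assumes the tag is written exactly as in the pattern (double quotes, `name` before `content`, ending in ` />`); any variation defeats it, which is the fragility the DOM-parser answer warns about. As a side note, the character class `[\"|\']` in the question also matches a literal `|`; `["']` is enough. `extractKeywords` is a hypothetical name.

```javascript
// Jeff's pattern as a regex literal: the "/" in "/>" is escaped, (.+?) is the
// non-greedy capture, and the i flag tolerates <META NAME="KEYWORDS" ...>.
// No g flag, so match() stops at the first occurrence and keeps the groups.
const KEYWORDS_RE = /<meta name="keywords" content="(.+?)" \/>/i;

function extractKeywords(html) {
  const m = html.match(KEYWORDS_RE);
  return m ? m[1] : null; // null when the page has no tag in exactly this form
}
```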
A: 

This is untested, but with jQuery couldn't you just do:

$('meta').each(function() {
    // insert code here to put the attributes in an array or whatever
});

and then, inside the callback, store the data, make some Ajax calls, or take other actions based on the data in each meta tag.

aphelionz
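A filled-in version of that sketch, assuming jQuery is loaded as the global `$` and that collecting each tag's `name`, `http-equiv` and `content` attributes is what you want; `collectMetaData` and the object shape are hypothetical. Inside `.each()`, `this` is the current DOM element, and `.attr()` returns `undefined` for attributes the tag doesn't have.

```javascript
// Hypothetical helper built on jQuery (assumed loaded as the global $).
function collectMetaData() {
  const metaData = [];
  $("meta").each(function () {
    // `this` is the current <meta> element; wrap it to use .attr().
    metaData.push({
      name: $(this).attr("name"),
      httpEquiv: $(this).attr("http-equiv"),
      content: $(this).attr("content"),
    });
  });
  return metaData;
}
```

Once the page has loaded (for example inside `$(function () { ... })`), `collectMetaData()` returns one object per meta tag, which you can then store or send off with an Ajax call.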