ansaurus

Question

Parsing vCards on web pages into a MySQL DB

Answer 1

A:

I believe you are looking for an HTML parser. Python, for example, ships with an HTML parsing module.

You need to parse the relevant data out of each HTML file and then do whatever you need with it, such as inserting it into your MySQL database.
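As a rough sketch of that parsing step, here is how it might look with Python's standard html.parser module. It assumes the pages mark the organization name with <span class="fn org">, as in the Perl snippet further down. The names OrgNameParser and extract_org_names are made up for illustration and are not part of any library.

```python
from html.parser import HTMLParser


class OrgNameParser(HTMLParser):
    """Hypothetical helper: collects the text inside every <span class="fn org">."""

    def __init__(self):
        super().__init__()
        self.org_names = []   # finished organization names
        self._depth = 0       # > 0 while we are inside a matching span
        self._chunks = []     # text pieces of the span being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # A valueless class attribute arrives as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1              # span nested inside a match
        elif "fn" in classes and "org" in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:          # the matching span just closed
                self.org_names.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_org_names(html):
    """Hypothetical wrapper: return all organization names found in an HTML string."""
    parser = OrgNameParser()
    parser.feed(html)
    parser.close()
    return parser.org_names
```

Calling extract_org_names on the contents of each page gives you a plain list of names, which you can then write to the database with whatever client your server provides.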

I have not tried any PHP HTML parsers, so I can't recommend one, but since you are working on a web server, I'm hoping it has Perl. If so, take a look at the Perl HTML parsers.

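# This snippet collects the text inside each <span class="fn org">
# (HTML::Parser subclass methods; nested spans are not handled).

 my (@org_names, $in_org);

 sub start {
      my ($self, $tag, $attr, $attrseq, $origtext) = @_;

      if ($tag =~ /^span$/i && ($attr->{'class'} || '') =~ /^fn org$/i) {
          # we found <span class="fn org">, so start capturing its text
          $in_org = 1;
          push (@org_names, '');
      }
  }

 # text() appends the text between the tags; end() stops at the closing </span>
 sub text { my ($self, $text) = @_; $org_names[-1] .= $text if $in_org; }
 sub end  { my ($self, $tag)  = @_; $in_org = 0 if $tag =~ /^span$/i; }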

Once parsing finishes, the @org_names array contains all the organization names.

Omnipresent 2009-10-31 02:23:41

I can't run Python on my server.

WillKop 2009-10-31 02:36:12

Answer 2

A:

Try the loadHTML method of PHP's DOMDocument class. You can then use DOMDocument methods to select the nodes, attributes and values you want. Or, if you're familiar with XPath, you can instantiate a DOMXPath object and query the loaded DOMDocument for the data you need.

grantwparks 2009-10-31 04:34:07
