Hello,

I asked a question earlier today regarding using Perl to search in a CSS document. I have since refined my requirements a little bit, and have a better idea of what I am trying to do.

The document I am searching through is actually an .html file with the CSS in a <style> block in the <head>, if that makes sense.

Basically, what I need to do is find all the CSS rules that have a color or background-color property, and record them. Here's my thought process:

  1. open the file and set it as an array
  2. read the array line-by-line until it comes to a "{"
  3. make everything into a scalar variable or array until I get to the "}"
  4. search that secondary variable or string for instances of "color", and so on.

The issue I am having is finding a way to scour the document and turn everything between { and } into a variable of some sort. Does anyone have any ideas?

Cheers!

+3  A: 

No matter what, I wouldn't recommend writing your own code from the ground up for this. You should use a parser. A quick search on CPAN suggests this family of modules. On the other hand, if your CSS is in an HTML file rather than a separate CSS file (shame on you), then you might end up needing a different type of parser.

Either way, it's generally not a good idea to try to hand-roll your own quasi-parser out of regular expressions. Use a proper parser, and leverage someone else's work.

On a slightly different tack, if you only want to extract some of the information from a file of any kind, then in many cases you don't want to put the whole file into an array first. (It can be memory intensive if the file is very large, and it's unnecessary.) It's easy to open the file and then process the items as you work through it line by line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

open my $fh, '<', 'file-of-interest'
  or die "Couldn't open 'file-of-interest': $!";

my @saved_items;

while (my $line = <$fh>) {
  # process $line
  # push @saved_items, $something
}

# Do more fun stuff with @saved_items
```
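To make the loop above a bit more concrete, here is a minimal sketch that keeps only the lines inside the <style> block while reading line by line. It assumes the opening and closing style tags each sit on their own line, which real HTML does not guarantee, so treat it as an illustration of the iterate-and-save pattern rather than a replacement for a parser. The helper name `style_lines` and the sample HTML are made up for the example; the saved lines could then be handed to a proper CSS parser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the lines found between <style ...> and
# </style>, reading the handle one line at a time.
# Assumes each tag is on its own line (an assumption, not a given).
sub style_lines {
    my ($fh) = @_;
    my @saved_items;
    my $in_style = 0;

    while (my $line = <$fh>) {
        if ($line =~ /<style\b/i)  { $in_style = 1; next; }
        if ($line =~ m{</style>}i) { $in_style = 0; next; }
        push @saved_items, $line if $in_style;
    }
    return @saved_items;
}

# Sample input held in a string so the sketch runs without a file on disk.
my $html = <<'HTML';
<html>
<head>
<style type="text/css">
body { color: #333; }
h1 {
  background-color: #fff;
}
</style>
</head>
<body><h1>Hi</h1></body>
</html>
HTML

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$html
  or die "Couldn't open in-memory file: $!";
my @css = style_lines($fh);
close $fh;

print @css;
```

Only one line is held at a time while scanning, and @css ends up with just the stylesheet text.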
Telemachus
*most* cases is a little strong; only occasionally is a file big enough to even worry about, and when doing multiline matching (as here), having it all in memory is significantly simpler
ysth
Fair enough and changed. (I was assuming that a situation like the OP's parsing across multiple lines was not part of the equation since I'd already said use a parser for that. I just meant more generally that people always jump to "put it in an array" and in my experience iterating is often better.)
Telemachus
+1  A: 

You could use the CSS module, which is available on CPAN.

Peter Stuifzand
+1  A: 

I think this is really just the same question that you asked previously, although this time you didn't mention, as you did in an earlier comment, that you don't think you are allowed to use modules.

The CSS module already does this. You can look at the source to see how they do it. That's the same answer I gave you last time too.

There isn't really any magic or secret way that everyone is hiding from you. Most times, if the module you find on CPAN could be simpler, it would be. However, without any more information that constrains your problem, a general solution like [CSS](http://search.cpan.org/dist/CSS) is the way to go. Study that source or just lift it wholly into your script, although you might try some arguments to get some modules installed. If you could use the module, you might already be done and onto the next project. That's often a convincing argument. :)

brian d foy