ansaurus

Question

Answer 1

A:

Although I have not tested the code below, something like this should work:

if ($line =~ m/\.(.*?) \{(.*?)color:(.*?);(.*)/) {
 print "$1,$3\n";
}

You should invest some time learning regular expressions for Perl.

Alec Smart 2009-06-04 18:33:57

This is a really bad regex. For one, use \s instead of spaces. You're not using any regex modifiers, like /i and /m which you will most likely need here. Finally, what happens if there's no color property?

Artem Russakovskii 2009-06-06 11:03:36

Answer 2

+5 A:

Since you asked for advice (and this isn't a coding service) I'll offer just that.

Always use strictures and warnings:

use strict;
use warnings;

Always check the return value of open calls:

open(FILE, 'filename') or die "Can't read file 'filename' [$!]\n";

Use the three-arg form of open and lexical filehandles instead of globs:

open(my $fh, '<', 'filename') or die "Can't read file 'filename' [$!]\n";

Don't slurp when line-by-line processing will do:

while (my $line = <$fh>) {
    # do something with $line
}

Use backreferences to retrieve data from regex matches:

if ($line =~ /color *: *(#[0-9a-fA-F]{6})/) {
    # color value is in $1
}

Save the class name in a temporary variable so that you have it when you match a color:

if ($line =~ /^\.(\w+) *\{/) {
    $class = $1;
}
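
Put together, the pieces above might look like the following minimal sketch. It is untested against real stylesheets and assumes simple selectors of the form .name at the start of a line and six-digit hex colors; the helper name class_colors and the sample CSS are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reads CSS from an open filehandle and returns
# "class,color" strings, one per color declaration found.
sub class_colors {
    my ($fh) = @_;
    my ($class, @found);

    while (my $line = <$fh>) {
        # Remember the most recent class name.
        if ($line =~ /^\s*\.([\w-]+)\s*\{/) {
            $class = $1;
        }
        # Report a color declaration under the remembered class.
        # The lookbehind skips properties such as background-color.
        if (defined $class
            && $line =~ /(?<![\w-])color\s*:\s*(#[0-9a-fA-F]{6})/) {
            push @found, "$class,$1";
        }
        # A closing brace ends the current rule.
        undef $class if $line =~ /\}/;
    }
    return @found;
}

# Usage: an in-memory filehandle keeps the example self-contained.
my $css = <<'CSS';
.header {
    margin: 0;
    color: #336699;
}
.note { background-color: #ffffff; color: #FF0000; }
p { color: #000000; }
CSS

open(my $fh, '<', \$css) or die "Can't open in-memory file [$!]\n";
print "$_\n" for class_colors($fh);
close($fh);
```

Rules that do not start with a class name (the p rule here) are skipped, and only the first color on a line is reported.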

Michael Carman 2009-06-04 18:37:12

I still think this is not the answer needed. Excellent general advice, though.

Leonardo Herrera 2009-06-04 19:41:52

Yes, the advice was very helpful. Thanks. I have come a bit further in the past few hours with how I am approaching the solution. I'm not having problems with the regex's but rather with the capture of data. Since CSS elements are typically multi-line, I need to figure out how to create an array between the { and } with each linebreak as a delimiter for list items. The final (revised) format I need this data in is as follows (example): body:color:#000000

Ryan Max 2009-06-04 22:39:59

Just remember that not all CSS elements are multi-line. Many simple cases declare multiple properties on one line. For example: * { margin: 0; padding: 0; }

Telemachus 2009-06-06 10:57:49

Answer 3

+2 A:

Well, this is not as simple as it seems.

CSS classes can be defined in many ways. For example,

    .classy {
         color: black;
    }

Good luck using a line-by-line approach for parsing that.

Actually, my first approach would be searching CPAN. This looks promising:

CSS - Object oriented access to Cascading Style Sheets (CSS)

Edit:

I installed HTML::TreeBuilder and CSS modules from CPAN and concocted the following aberration:

use strict;
use HTML::TreeBuilder;
use CSS;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new; # empty tree
    $tree->parse_file($file_name);

    # In list context, find() returns every matching element.
    my @styles = $tree->find('style');

    foreach my $style (@styles) {
        # This is an insane hack, not guaranteed
        # to work in the future.
        my $css = CSS->new;
        $css->read_string(join "\n", @{$style->{_content}});

        print $css->output;
    }
    $tree = $tree->delete;
}

This thing only prints all the CSS rules from a list of HTML files, but nicely formatted, so you should be able to continue from here.

Leonardo Herrera 2009-06-04 18:46:49

There's nothing difficult about parsing that line-by-line. You just need to save a copy of the class name when you find one. And while using CPAN is a good thing this is a good (simple) exercise for a Perl novice to cut his teeth on.

Michael Carman 2009-06-04 18:53:12

Michael, if you can't see the difficulties of parsing CSS then I suspect you haven't thought it through. To parse a CSS you have to implement a recursive descent parser.

Leonardo Herrera 2009-06-04 19:30:21

I was addressing your example, not the general case.

Michael Carman 2009-06-04 20:37:42

AFAICS, there is no need for a full recursive parser. The one thing you have to watch out for is CSS comments, but braces don't nest. Text outside of braces is either 1. a selector expression or 2. a comment. Text inside is either 1. a property expression or 2. a comment. You just have to scan for a couple of indicative expressions and transition states on those.

Axeman 2009-06-04 21:08:48

unfortunately I am doing this for work and they are not willing to give me permissions to install any modules.

Ryan Max 2009-06-04 21:10:42

Even if you think you can't install modules, you can always look inside them to see what they do. And, since they are just text, with a little work you can copy the modules right into your source. There are all sorts of ways around that.

brian d foy 2009-06-04 22:43:12

You can easily install modules alongside your own code, and thus don't need permissions to install them in the usual perl dirs. Look up local::lib on CPAN.

castaway 2009-06-06 10:45:27

Answer 4

+2 A:

For yet another way to do it, you can ask perl to read from the file in sections other than lines, for example by using the "}" as a record separator.

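my $color = "color:";

open(my $fh, '<', "index.html") or die "Can't open file: $!";

{
    # Read records that end at "}" instead of at a newline.
    local $/ = "}";
    while (my $section = <$fh>) {
        if ($section =~ /$color(.*)/) {
            # The selector is the text before the opening brace.
            my ($selector) = $section =~ /(.*)\{/;
            print "$selector, $section\n";
        }
    }
}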

Untested! Also, this of course assumes that your CSS neatly ends its sections with a } on a line of its own.
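
As a rough extension of the same idea, each "}"-terminated record can be split into its individual declarations, which also copes with rules written on a single line. This is only a sketch: the helper name declarations and the selector:property:value output format (modelled on the body:color:#000000 example mentioned in the comments) are assumptions, and it does not handle comments or values that themselves contain ; or }.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "}"-terminated records from a filehandle and
# returns "selector:property:value" strings for every declaration.
sub declarations {
    my ($fh) = @_;
    my @out;

    local $/ = "}";    # one record per rule block
    while (my $section = <$fh>) {
        # Split the record into selector and body; /s lets . match newlines.
        my ($selector, $body) = $section =~ /^\s*(.*?)\s*\{(.*)\}\s*$/s
            or next;
        for my $decl (split /;/, $body) {
            my ($prop, $value) = $decl =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/s
                or next;
            push @out, "$selector:$prop:$value";
        }
    }
    return @out;
}

# Usage: an in-memory filehandle keeps the example self-contained.
my $sample = <<'CSS';
body {
  color: #000000;
  margin: 0;
}
* { margin: 0; padding: 0; }
CSS

open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
print "$_\n" for declarations($in);
close($in);
```

Filtering the returned list for entries whose property is color then gives the selector and color pairs.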

castaway 2009-06-06 10:43:46

slick! Way to think outside the box.

Artem Russakovskii 2009-06-06 11:00:48

Answer 5

+1 A:

I'm not having problems with the regex's but rather with the capture of data. Since CSS elements are typically multi-line, I need to figure out how to create an array between the { and } with each linebreak as a delimiter for list items.

No, you don't.

For the problem as stated, the only lines of interest will be those containing either a class name or a color definition, and possibly also lines containing } to mark the end of a class. All other lines can be ignored, so there's no need to put them into an array.

Since class specifications cannot be nested[1], the last seen set of class names will always be the active set of classes. Therefore, you need only record the last seen set of class names and, when a color specification is encountered, print those class names.

There are still some potential difficulties handling cases in which a specification block is shared by multiple classes (.foo, .bar, .baz { ... }), which may or may not be spread across multiple lines, or if multiple attributes are defined on the same line, but dealing with those should follow fairly easily from what I've already laid out. Depending on your input data, you may also need to include a basic state engine to keep track of whether you're in comments or not.

[1] That is, although you can have semantically nested classes, such as .foo and .foo .bar, they have to be specified in the CSS file as

.foo {
  ...
}
.foo .bar {
  ...
}

and cannot be

.foo {
  ...
  .bar {
    ...
  }
}
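
The approach described above could be sketched roughly as follows: keep the last seen selector list, track whether we are inside a comment or a block, and report every color declaration against the active selectors. The helper name colors_by_selector and the sample input are invented for illustration, and the sketch assumes there are no nested at-rule blocks such as @media and no braces or comment markers inside string values.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "selector,color" strings for each color
# declaration, tracking comment and block state across lines.
sub colors_by_selector {
    my ($fh) = @_;
    my $in_comment = 0;     # inside /* ... */ ?
    my $in_block   = 0;     # inside { ... } ?
    my $pending    = '';    # selector text seen since the last }
    my (@selectors, @found);

    while (my $line = <$fh>) {
        chomp $line;

        # Drop comment text, remembering whether a comment is still open.
        if ($in_comment) {
            next unless $line =~ s{^.*?\*/}{};
            $in_comment = 0;
        }
        $line =~ s{/\*.*?\*/}{}g;
        $in_comment = 1 if $line =~ s{/\*.*$}{};

        # Split on braces but keep the braces as tokens of their own.
        for my $token (split /([{}])/, $line) {
            if ($token eq '{') {
                # The pending text is a comma-separated selector list.
                @selectors = ();
                for my $name (split /,/, $pending) {
                    $name =~ s/^\s+|\s+$//g;
                    push @selectors, $name if length $name;
                }
                $in_block = 1;
                $pending  = '';
            }
            elsif ($token eq '}') {
                $in_block  = 0;
                @selectors = ();
            }
            elsif ($in_block) {
                # Several declarations may share a line; skip *-color.
                while ($token =~ /(?<![\w-])color\s*:\s*([^;]+?)\s*(?:;|$)/gi) {
                    my $value = $1;
                    push @found, map { "$_,$value" } @selectors;
                }
            }
            else {
                $pending .= " $token";
            }
        }
    }
    return @found;
}

# Usage: an in-memory filehandle keeps the example self-contained.
my $input = <<'CSS';
/* palette
   color: #111111; */
.foo, .bar,
.baz { margin: 0; color: #336699; }
.foo .bar {
  background-color: #ffffff; /* color: #222222; */
  color: black
}
CSS

open(my $css_fh, '<', \$input) or die "Can't open in-memory file: $!";
print "$_\n" for colors_by_selector($css_fh);
close($css_fh);
```

Because the state lives in a few scalars, no array of the block's lines is ever built, which is the point made above.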

Dave Sherohman 2009-06-06 11:05:10


How can I search CSS with Perl?
