I have to parse this HTML:

<p>
<strong>abc:</strong>
asfkjsdg
</p>

$para holds the <p> element. I am using HTML::TreeBuilder.

$para->as_text

gives me

 abc:asfkjsdg

How can I get only:

 asfkjsdg
+4  A: 
join('', grep { !ref } $para->content_list)
reinierpost
Could you please add an explanation of what this syntax means? I am a beginner at Perl.
iamrohitbanga
It works, but what does `grep {!ref}` do?
iamrohitbanga
@iamrohitbanga - it filters out references (the child elements), leaving only the child nodes that are plain text
K Prime
@iamrohitbanga: So you took a Perl job but you do not know any Perl. Interesting. `ref` returns a true value if its argument is a reference. The `grep` thus selects all elements of the content list that are not references.
Sinan Ünür
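A minimal sketch of the filter described above, runnable without HTML::TreeBuilder. It assumes that the content list mixes element objects (references) with plain text strings; the `@content` array and its hash reference are stand-ins invented for illustration, not real HTML::Element objects.

```perl
use strict;
use warnings;

# Stand-in for $para->content_list: child elements arrive as references,
# text arrives as plain strings. The hash ref is only a placeholder for
# the <strong> element.
my @content = ( { _tag => 'strong' }, 'asfkjsdg' );

# Short form from the answer: ref with no argument tests $_.
my $text = join '', grep { !ref } @content;

# The same filter with $_ written out.
my $explicit = join '', grep { !ref($_) } @content;

print "$text\n";    # prints "asfkjsdg"
```

With a real `$para`, replacing `@content` by `$para->content_list` gives the one-liner from the answer.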
done! understood.
iamrohitbanga
To explain the syntax further: any Perl functions substitute $_ or @_ when no argument is given, so !ref really stands for !ref($_); and iterators like grep and map use $_ as the variable that takes each value in turn. You can find all this in the Perl documentation, e.g. on the command line type perldoc -f ref or perldoc -f grep for details.
reinierpost
@reinerpost: grep uses $_, but it's not true that any Perl function does. Also, only a couple use @_ as a default. You have to check each function to see what its default is. There is no general rule.
brian d foy
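A small sketch of how the defaults differ from one built-in to the next. The three functions were picked only as examples, and the sub name `first_arg` is hypothetical; `perldoc -f` remains the place to confirm the default for any particular function.

```perl
use strict;
use warnings;

# length defaults to $_, which map sets for each element in turn.
my @lengths = map { length } qw(a bb ccc);

# shift defaults to @_ when called inside a subroutine.
sub first_arg { return shift }
my $first = first_arg('x', 'y');

# split with no arguments splits $_ on whitespace.
my @parts;
for ('one two three') { @parts = split }

print "@lengths\n";    # prints "1 2 3"
print "$first\n";      # prints "x"
print "@parts\n";      # prints "one two three"
```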
If your real input looks more like <p><strong>a:</strong>b<strong>c:</strong>d</p>, you'll need a more intelligent algorithm, because with this code you'll find you get the single value `bd`
Evan Carroll
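One possible way to handle that multi-label case, sketched under assumptions: keep the text nodes separate, or pair each <strong> label with the text that follows it. `Fake::Element` is a hypothetical stand-in that mimics only the `tag` and `as_text` methods of HTML::Element so the example runs on its own; with a real `$para` the loop would iterate over `$para->content_list` instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in offering the two HTML::Element methods used below.
package Fake::Element {
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub tag     { return $_[0]{tag} }
    sub as_text { return $_[0]{text} }
}

# Simulated content list for <p><strong>a:</strong>b<strong>c:</strong>d</p>
my @content = (
    Fake::Element->new(tag => 'strong', text => 'a:'), 'b',
    Fake::Element->new(tag => 'strong', text => 'c:'), 'd',
);

# Joining loses the boundaries between values.
my $joined = join '', grep { !ref } @content;

# Keeping a list preserves them.
my @values = grep { !ref } @content;

# Pair each <strong> label with the text node that follows it.
my %field;
my $label;
for my $node (@content) {
    if (ref $node) {
        $label = $node->as_text if $node->tag eq 'strong';
    }
    elsif (defined $label) {
        $field{$label} = $node;
        undef $label;
    }
}

print "$_ $field{$_}\n" for sort keys %field;    # prints "a: b" then "c: d"
```

The pairing loop assumes every value is a single text node directly after its label; nested markup inside a value would need extra handling.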
Use the documentation. Type "perldoc -f ref" at the command line, or see http://perldoc.perl.org/functions/ref.html (and then do the same for the 'grep' function).
Ether
@brian: augh, i wrote 'any' where I meant 'many'. sorry.
reinierpost