ansaurus

Question

Answer 1

+3 A:

As numerous people will point out (or already have), you're much better off using an HTML/XML parser for the above. HTML isn't a regular language, and there are numerous edge cases to code around if you use a regular expression.

Given that you just want to extract the text, perhaps XPath will help. An expression such as:

/tr/td/text()

may do the trick.
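As a rough sketch of how that expression could be run in PHP, here is one way to do it with the built-in DOM extension. The sample row is a simplified stand-in for the markup in the question, and the leading `//` is my assumption: `loadHTML()` wraps a fragment in `<html><body>`, so the row is not at the document root.

```php
<?php
// Simplified stand-in for the row from the question.
$html = '<table><tr class="rowodd">
  <td> 01.10.2009 </td>
  <td> AN09551 </td>
  <td class="number" title="7.500,00"> 7.500,00 </td>
  <td> Entwurf </td>
</tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$cells = [];
// Select every text node that is a direct child of a <td> inside a <tr>.
foreach ($xpath->query('//tr/td/text()') as $textNode) {
    $text = trim($textNode->nodeValue);
    if ($text !== '') {
        $cells[] = $text;
    }
}
print_r($cells);
```

Note that `text()` only returns text sitting directly inside the `<td>`, so text nested in a child element such as a link would need a different expression.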

Brian Agnew 2009-10-02 09:37:31

Answer 2

A:

Isn't strip_tags an option?

It strips all tags and leaves only the text between them. Note that the attributes are removed along with the tags.

In your case this would result in:

  01.10.2009
   AN09551
     [2009132] Ich bin Un. 
   7.500,00 € 
    Entwurf
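If you want the values as an array rather than one whitespace-padded string, something like the following might work. It is only a sketch: the sample row is simplified, and it assumes each cell's text sits on its own line, which holds for the markup shown in this thread but not for HTML in general.

```php
<?php
// Simplified stand-in for the row from the question.
$row = '<tr class="rowodd">
  <td> 01.10.2009 </td>
  <td> AN09551 </td>
  <td class="number" title="7.500,00"> 7.500,00 </td>
  <td> Entwurf </td>
</tr>';

// strip_tags() removes the tags and their attributes but keeps the
// whitespace and line breaks that sat between them.
$text = strip_tags($row);

// Split on line breaks, trim each piece, and drop the empty ones.
$lines = array_map('trim', preg_split('/\R/', $text));
$cells = array_values(array_filter($lines, function ($line) {
    return $line !== '';
}));
print_r($cells);
```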

knittl 2009-10-02 09:40:48

Could be, I have to test it.

streetparade 2009-10-02 09:44:30

Answer 3

A:

Otherwise with a regexp you could use this (with multi-line option):

(?:\<td[^\>]*?\>([^\<]*?)\</td\>)+

But as pointed out by @Brian Agnew, this is nowhere near as good as an XML/HTML parser...
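For illustration, this is roughly how the pattern above could be used with preg_match_all. The `#` delimiters, the `s` flag and the sample row are my choices, not part of the answer. The third cell contains a nested tag, and since `[^<]` cannot cross it, that cell is silently skipped, which is one of the edge cases a parser would handle.

```php
<?php
// Simplified stand-in for the row from the question.
$row = '<tr class="rowodd">
  <td> 01.10.2009 </td>
  <td> AN09551 </td>
  <td> [2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350"></a> </td>
  <td class="number" title="7.500,00"> 7.500,00 </td>
  <td> Entwurf </td>
</tr>';

// The pattern from the answer, wrapped in # delimiters.
$pattern = '#(?:\<td[^\>]*?\>([^\<]*?)\</td\>)+#s';

// $matches[1] holds the captured cell contents, one entry per match.
preg_match_all($pattern, $row, $matches);
$cells = array_map('trim', $matches[1]);
print_r($cells);
```

Here `$cells` ends up with four entries instead of five, because the cell with the link does not match.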

Locksfree 2009-10-02 09:44:12

Worked like nothing else: #(?:\<td[^\>]*?\>([^\<]*?)\</td\>)+#siU Thanks

streetparade 2009-10-02 10:06:59

Answer 4

+1 A:

Try:

// http://simplehtmldom.sourceforge.net/
include('simple_html_dom.php');
$str = '<tr class="rowodd" onclick="window.location.href=\'/portal/offers/show/entityId/32114\';">
  <td>
    01.10.2009
  </td>
  <td>
    AN09551
  </td>
  <td>
    [2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">
    <img src="/img/bullet_go.png" alt="" title="Kundenakte aufrufen"></a>
  </td>
  <td class="number" title="7.500,00">
    7.500,00
  </td>
  <td>
    Entwurf
  </td>
</tr>';
$html = str_get_html($str);
foreach($html->find('td') as $element) {
  echo trim($element->innertext) . "\n";
}

Output:

01.10.2009
AN09551
[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">
    <img src="/img/bullet_go.png" alt="" title="Kundenakte aufrufen"></a>
7.500,00
Entwurf

Bart Kiers 2009-10-02 09:50:08

Call to undefined function str_get_html(). Is it simple_html_parser?

streetparade 2009-10-02 10:00:15

But it's an HTML page, so there may be a lot more td's. So finding all td's isn't a good idea.

streetparade 2009-10-02 10:04:20

Yes, str_get_html() is defined in simple_html_parser

Bart Kiers 2009-10-02 10:10:29

You can select certain tr's (or just one) based on a given attribute and then get the td's from it. Read the documentation, it's pretty straightforward.

Bart Kiers 2009-10-02 10:11:45

Answer 5

+1 A:

Don't use so many unspecific non-greedy expressions like .*?. Although they do what you want, they cause a lot of backtracking and make the whole expression inefficient, especially when you use that many of them.

Try to be as explicit as possible:

#<tr\b(?:[^"'>]*|"[^"]*"|'[^']*')*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
</tr\s*>#sx

But as you see, this is a mess.

You'd be better off using an HTML parser such as DOMDocument. Then you can query the elements with XPath, as Brian Agnew suggested. That's far more reliable and convenient than regular expressions.
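A minimal sketch of that approach, assuming the rows you care about can be identified by their class attribute. The `roweven` row and the trimmed-down markup are made up for the example. Using `textContent` per cell also picks up text nested inside child elements.

```php
<?php
// Hypothetical page fragment: one row to skip, one row to extract.
$html = '<table>
<tr class="roweven"><td> ignored </td></tr>
<tr class="rowodd">
  <td> 01.10.2009 </td>
  <td> [2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350"></a> </td>
  <td class="number" title="7.500,00"> 7.500,00 </td>
</tr>
</table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
// Only rows with class="rowodd" are considered.
foreach ($xpath->query('//tr[@class="rowodd"]') as $tr) {
    $cells = [];
    // Relative query: the <td> children of the current row.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

This way the other td's on the page are never touched, and a cell containing a link no longer needs special handling.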

Gumbo 2009-10-02 11:50:14

Thanks, it worked: $pattern = '#<tr\b(?:[^\"\'>]*|\"[^\"]*\"|\'[^\']*\')*>\s* <td\b(?:[^\"\'>]*|\"[^\"]*\"|\'[^\']*\')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s* <td\b(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s* <td\b(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s* <td\b(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s* <td\b(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s* </tr\s*>#sx';

streetparade 2009-10-05 11:38:58

Answer 6

A:

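In PHP there's preg_match_all, which makes this much easier than doing it in JS.

// Capture the text that directly follows each opening <td ...> tag.
$ptn = '/<\s*td[^>]*>([^<>]*)</';
preg_match_all($ptn, $str, $matches);
// $matches[1] holds the captured cell contents.
print_r($matches);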

Test the result in Preg Tester

unigg 2009-10-02 13:01:40


Regex Tables how to match?
