ansaurus

Question

Regular expressions help

Answer 1

A:

Your pattern currently uses greedy quantifiers. Use non-greedy (lazy) quantifiers instead.

Lyubomyr Shaydariv 2009-11-07 23:39:53

Answer 2

A:

You may need to use the "u" pattern modifier (written after the closing delimiter, as in /.../u) to process UTF-8 text correctly.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php
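As a rough illustration of what the modifier changes (the sample string is made up, since the original data is not shown in the thread): without "u" the dot matches a single byte, while with "u" it matches a whole UTF-8 character.

```php
<?php
// Hypothetical sample: the element starts with "Ü" (U+00DC),
// which is two bytes in UTF-8 (0xC3 0x9C).
$data = "<character>\xC3\x9Cber</character>";

// Without /u the dot consumes one byte of the two-byte character.
preg_match('/<character>(.)/', $data, $bytes);

// With /u the subject is treated as UTF-8 and the dot consumes the whole character.
preg_match('/<character>(.)/u', $data, $chars);

var_dump(strlen($bytes[1])); // int(1)
var_dump(strlen($chars[1])); // int(2)
```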

Tim Sylvester 2009-11-07 23:40:12

Answer 3

+2 A:

Try

<character>(.*?)<\/character>

The question mark makes the preceding quantifier ungreedy (lazy), meaning it'll match a string as short as possible. Also, < and > don't need escaping.
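A small sketch of the difference, assuming input with two elements on one line (the sample data is invented for illustration):

```php
<?php
// Hypothetical input: two <character> elements on a single line.
$data = '<character>Alice</character><character>Bob</character>';

// Greedy: .* runs to the LAST </character>, so both elements collapse into one match.
preg_match_all('/<character>(.*)<\/character>/', $data, $greedy);

// Lazy: .*? stops at the FIRST </character>, giving one match per element.
preg_match_all('/<character>(.*?)<\/character>/', $data, $lazy);

print_r($greedy[1]); // one capture: "Alice</character><character>Bob"
print_r($lazy[1]);   // two captures: "Alice", "Bob"
```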

Jonas 2009-11-07 23:44:49

I just wanted to say the same, but I have lost my sample source code that was pretty easy to find. )))

Lyubomyr Shaydariv 2009-11-07 23:46:18

Answer 4

+3 A:

If using the preg family of functions, your regular expression should be:

/\<character>(.*?)\<\/character>/s

The ? makes the quantifier non-greedy; without it you would get only one match, starting at the first <character> and ending at the last </character>. The /s flag will allow your dot to match line breaks.
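To see what the /s flag adds, here is a minimal sketch with preg_match_all, assuming one of the elements spans several lines (again, made-up sample data):

```php
<?php
// Hypothetical input: the first element contains line breaks.
$data = "<character>\n  Alice\n</character>\n<character>Bob</character>";

// Without /s the dot cannot cross a newline, so the multi-line element is skipped.
$countWithout = preg_match_all('/<character>(.*?)<\/character>/', $data, $without);

// With /s the dot also matches newlines, so both elements are found.
$countWith = preg_match_all('/<character>(.*?)<\/character>/s', $data, $with);

// Trim the surrounding whitespace picked up from the multi-line element.
$names = array_map('trim', $with[1]);

print_r($without[1]); // only "Bob"
print_r($names);      // "Alice", "Bob"
```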

BipedalShark 2009-11-07 23:53:10

`<` needs no escaping.

Bart Kiers 2009-11-08 11:11:22

Answer 5

+5 A:

Unless you're required at gunpoint to use regular expressions to do this, DOMDocument will be far more accurate.

<?php

$dom = new DOMDocument;
$dom->loadXML($data);

$character_nodes = $dom->getElementsByTagName('character');

// use $character_nodes...
?>
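For the "use $character_nodes" step, one possible sketch is below. It assumes well-formed XML and reads each node's text; the <cast> wrapper and the names are invented for the example.

```php
<?php
// Hypothetical well-formed input; <cast> is an assumed root element.
$data = '<cast><character>Alice</character><character>Bob</character></cast>';

$dom = new DOMDocument;
$dom->loadXML($data);

// The returned DOMNodeList can be iterated directly with foreach.
$names = [];
foreach ($dom->getElementsByTagName('character') as $node) {
    $names[] = $node->textContent; // text inside the element
}

print_r($names); // "Alice", "Bob"
```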

seanmonstar 2009-11-08 00:34:34

even at gunpoint there's no good reason to use regexes for parsing xml, but it remains possible that the data just looks like xml, but isn't quite valid xml...

Kris 2009-11-08 00:45:24

@Kris, I think "not getting shot" remains a good reason to do something when at gunpoint. ;)

BipedalShark 2009-11-08 01:04:33

+1 for giving a proper answer. There are DOM parsers for HTML, too. RegEx is a great tool... for other tasks.

TrueWill 2009-11-08 02:51:16

My document isn't valid HTML, so I got a lot of errors...

Anthony 2009-11-08 12:07:30

Anthony, I asked a similar question before. The aim of that question was loading a non-strict XML/HTML document into a DOM. Check this: http://stackoverflow.com/questions/1473214/how-to-parse-not-strict-html-documents-indulgently
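One option in PHP itself, which is not necessarily what the linked question recommends, is to use loadHTML instead of loadXML and collect the parser's complaints rather than printing them. This is only a sketch with an invented sample fragment; how well it copes depends on how broken the real markup is.

```php
<?php
// Hypothetical sloppy input: unclosed <p>, bare <br>, non-HTML <character> tags.
$html = '<p>Intro<character>Alice</character><br><character>Bob</character>';

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);          // the HTML parser is far more forgiving than loadXML

$errors = libxml_get_errors();  // inspect these if you want to know what was wrong
libxml_clear_errors();

$names = [];
foreach ($dom->getElementsByTagName('character') as $node) {
    $names[] = $node->textContent;
}

print_r($names);
```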

Lyubomyr Shaydariv 2009-11-08 12:59:34

This is a knee-jerk, pat answer as far as I'm concerned. Regexing through HTML can be perilous, sure, but for limited cases like Anthony's, regular expressions are perfectly fine.

BipedalShark 2009-11-11 01:12:41
