ansaurus

Question

How to write this Regex

Answer 1

+2 A:

Do not parse HTML with a regex as if it were just a big pile of text. Using a DOM parser is the proper way.

teukkam 2010-08-27 05:12:49

Answer 2

+2 A:

Don't use regular expressions to parse HTML...

Alex Martelli 2010-08-27 05:14:12

Answer 3

A:

Please learn to use jQuery for this sort of thing.

Scott Evernden 2010-08-27 05:16:23

I don't see any suggestion in that question that JavaScript is being used, and even if there were, "use jQuery" is a rubbish answer; it would need to be more specific.

David Dorward 2010-08-27 05:19:06

Hmm, rubbish, eh? Fascinating.

Scott Evernden 2010-08-27 05:20:26

"My engine is giving off steam!" "Use a spanner".

David Dorward 2010-08-27 05:33:47

Please, you are kidding me. He asked exactly 'I want match the <li> node.' That's precisely what jQuery is designed to do: match nodes. Look at all the other answers indicating he should process the DOM rather than use a regex. What is jQuery designed for, eh?

Scott Evernden 2010-08-27 06:01:09

Answer 4

+1 A:

Don't use a regular expression to match an HTML document. It is better to parse it as a DOM tree, or to walk it with a simple state machine instead.

I'm assuming you're trying to get HTML list items. Since you haven't specified what language you use, here's a little pseudocode to get you going:

Pseudo code:

while (iterating through the text)
    if (<li> matched)
        find the position of the next </li>
        store the substring between <li> and </li> in a variable

There are of course numerous third-party libraries that do this sort of thing. Depending on your development environment, you might already have a function that does this (e.g. in JavaScript).
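As a rough illustration, the pseudocode above could be sketched in Python as follows. The function name `extract_list_items` and the sample string are made up for this example, and the sketch assumes plain `<li>` tags with no attributes and no nested lists.

```python
def extract_list_items(text):
    """Return the raw text between each <li> and the next </li>."""
    open_tag, close_tag = "<li>", "</li>"
    items = []
    pos = 0
    while True:
        # Look for the next opening tag from the current position.
        start = text.find(open_tag, pos)
        if start == -1:
            break
        start += len(open_tag)
        # Find the closing tag that follows it.
        end = text.find(close_tag, start)
        if end == -1:
            break  # unclosed item: stop rather than guess
        items.append(text[start:end])
        pos = end + len(close_tag)
    return items


# Hypothetical sample input, not taken from the question.
sample = '<ul class="past"><li>first</li><li>second</li></ul>'
print(extract_list_items(sample))  # ['first', 'second']
```

This only scans for literal tags, so anything like `<li class="x">` is skipped; a real parser handles those cases.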

Spoike 2010-08-27 05:20:24

It's just a string, in .NET/C#.

Dreampuf 2010-08-27 06:27:44

Thanks. I would like to do it like this.

Dreampuf 2010-08-27 16:20:36

Answer 5

A:

Which language do you use?

If you use Python, you should try lxml: http://codespeak.net/lxml/. With lxml, you can search for the ul node with class "past". You then retrieve its children, which are the li nodes, and get the text of those nodes.
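To show the same idea without installing anything, here is a minimal sketch that swaps lxml for the standard library's `html.parser`. It is not lxml's API; the class `PastListParser`, the helper `past_items`, and the sample markup are all hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class PastListParser(HTMLParser):
    """Collect the text of <li> elements inside <ul class="past">."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0   # > 0 while inside the target <ul>
        self.in_li = False  # True while inside one of its <li> items
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            # Enter the target list, or track a list nested inside it.
            if self.ul_depth or "past" in classes:
                self.ul_depth += 1
        elif tag == "li" and self.ul_depth == 1:
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
            if self.ul_depth == 0:
                self.in_li = False
        elif tag == "li" and self.ul_depth == 1:
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data


def past_items(html):
    """Return the stripped text of each item in the "past" list."""
    parser = PastListParser()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


# Hypothetical sample input, not taken from the question.
sample = ('<ul class="other"><li>skip</li></ul>'
          '<ul class="past"><li>first</li><li> second <b>item</b></li></ul>')
print(past_items(sample))  # ['first', 'second item']
```

Text from inline children such as `<b>` is folded into the item, which is roughly what taking the text of the node gives you.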

2010-08-27 05:22:10

Thanks, but I want to use a regex.

Dreampuf 2010-08-27 06:29:46

OK. Do it in two steps. First, extract the text inside the **ul** tags. Then extract each **li** from that text. If you use Python, the code is here: http://pastebin.com/HesVF7zJ
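For readers who cannot open the link, a minimal sketch of the two steps might look like the following. This is not the linked code; the pattern names, the function `list_items_by_regex`, and the sample markup are assumptions made for illustration, and it assumes the list has class "past" and is not nested.

```python
import re

# Step 1: capture the body of each <ul ... class="past" ...> element.
UL_PATTERN = re.compile(
    r'<ul\b[^>]*\bclass="past"[^>]*>(.*?)</ul>',
    re.DOTALL | re.IGNORECASE,
)
# Step 2: capture the body of each <li> inside that text.
LI_PATTERN = re.compile(r'<li\b[^>]*>(.*?)</li>', re.DOTALL | re.IGNORECASE)


def list_items_by_regex(html):
    """Return the stripped contents of every <li> in a "past" list."""
    items = []
    for ul_body in UL_PATTERN.findall(html):
        items.extend(text.strip() for text in LI_PATTERN.findall(ul_body))
    return items


# Hypothetical sample input, not taken from the question.
sample = '<ul class="past">\n<li>first</li>\n<li>second</li>\n</ul>'
print(list_items_by_regex(sample))  # ['first', 'second']
```

The non-greedy `(.*?)` stops at the first closing tag, so nested lists or unclosed items will give wrong results, which is the usual reason to prefer a parser.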

2010-08-27 17:51:11

Answer 6

A:

If you are trying to extract from or manipulate this HTML, XPath, XSL, or CSS selectors in jQuery might be easier and more maintainable than a regex. What exactly is your goal and in what framework are you operating?
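As one possible illustration of the XPath route, here is a short Python sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. It assumes the markup is well-formed XML and that the list of interest has class "past"; the function name `items_by_xpath` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def items_by_xpath(markup):
    """Return the text of each <li> directly under <ul class="past">."""
    root = ET.fromstring(markup)
    # ".//ul" searches descendants of the root, so the list must not be
    # the root element itself.
    nodes = root.findall(".//ul[@class='past']/li")
    return ["".join(node.itertext()).strip() for node in nodes]


# Hypothetical sample input, not taken from the question.
sample = ('<div><ul class="past"><li>first</li>'
          '<li>second <b>item</b></li></ul></div>')
print(items_by_xpath(sample))  # ['first', 'second item']
```

Real-world HTML is often not well-formed XML, in which case `ET.fromstring` raises a parse error and an HTML-aware parser is the better choice.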

Peter DeWeese 2010-08-27 05:24:24
