ansaurus

Question

How to filter using regex and JavaScript?

Answer 1

+3 A:

document.evaluate("//span[@class='discount']", 
  document, 
  null, 
  XPathResult.ANY_UNORDERED_NODE_TYPE, 
  null).singleNodeValue.textContent.replace("now $", "");

EDIT: This is standard XPath. I'm not sure what kind of explanation you're seeking. For older browsers that lack document.evaluate, you will need a third-party library like Sarissa and/or Java-line.

Matthew Flaschen 2009-05-25 14:36:25
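
The expression in this answer returns a single node. If a page contains several matching elements, a snapshot result can be walked instead. The following is a minimal sketch rather than part of the original answer: the helper name pricesFromXPath is hypothetical, the text is assumed to look like "now $39.99", and the constant 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written out so the function can also be exercised with a stand-in object outside a browser.

```javascript
// Hypothetical helper: collect the text of every <span class="discount">
// using a snapshot result instead of singleNodeValue.
var ORDERED_NODE_SNAPSHOT_TYPE = 7; // same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE

function pricesFromXPath(doc) {
    var result = doc.evaluate("//span[@class='discount']",
        doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
    var prices = [];
    for (var i = 0; i < result.snapshotLength; i++) {
        // Strip the "now $" prefix, as in the single-node version.
        prices.push(result.snapshotItem(i).textContent.replace("now $", ""));
    }
    return prices;
}
```

In a browser, pricesFromXPath(document) would return an array of strings, one per matching span.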

Don't you think some explanation would help? What is this? What are the requirements to use this?

rudolfson 2009-05-25 14:40:57

I am only vaguely familiar with XPath, but here is a rough explanation. // matches the following tag anywhere in the tree (as opposed to one with a specific parent). span is the tag to match. The part between [] is an additional constraint (a predicate); in this case the class attribute must be exactly discount.

Chas. Owens 2009-05-25 14:50:49
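
One detail worth adding to the explanation above: [@class='discount'] compares the whole attribute value, so it would not match class="discount big". A commonly used XPath 1.0 idiom tests for the class as a space-separated token instead. This is a sketch only; the helper name classTokenXPath is hypothetical, and it assumes the tag and class names contain no quote characters.

```javascript
// Hypothetical helper: build an XPath 1.0 expression that matches `tag`
// elements whose class attribute contains `className` as a whole token.
// Assumes neither argument contains a quote character.
function classTokenXPath(tag, className) {
    return "//" + tag +
        "[contains(concat(' ', normalize-space(@class), ' '), ' " +
        className + " ')]";
}

// Only runs where a DOM is available (e.g. in a browser).
if (typeof document !== "undefined") {
    var node = document.evaluate(classTokenXPath("span", "discount"),
        document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
    if (node) {
        console.log(node.textContent);
    }
}
```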

Answer 2

+4 A:

<script type="text/javascript">
window.onload = function () {

    // Get all of the elements with class name "discount"
    var elements = document.getElementsByClassName('discount');

    // Loop over each element with class "discount" (not only spans)
    for (var i = 0; i < elements.length; i++) {

         // Get the text, e.g. "now $39.99"
         var rawText = elements[i].textContent;

         // Here's a regular expression to match one or more digits (\d+)
         // followed by a period (\.) and one or more digits again (\d+).
         // match() returns an array, or null if no price is found.
         var priceMatch = rawText.match(/\d+\.\d+/);
         if (priceMatch === null) {
             continue;
         }

         // You'll want to make the price a floating point number if you
         // intend to do any calculations with it.
         var price = parseFloat(priceMatch[0]);

         // Now what do you want to do with the price? I'll just write it out
         // to the console (using FireBug or something similar)
         console.log(price);

    }
};
</script>

Patrick McElhaney 2009-05-25 14:41:42

Just because you name the variable spans doesn't mean you're only matching spans...

Matthew Flaschen 2009-05-25 14:46:49

Good point, Matthew. Corrected.

Patrick McElhaney 2009-05-25 14:50:14

Yes..., but you only corrected the variable name. It still matches non-spans!

Matthew Flaschen 2009-05-25 14:52:10

The OP says "I have some text in an element in my page" - it doesn't say that it's just in spans.

nickf 2009-05-25 14:55:33

He said, it contains a "price like that", where that clearly is a "span".

Matthew Flaschen 2009-05-25 14:56:37

And it is also clearly an "example". You don't know that all the occurrences appear in spans.

nickf 2009-05-25 15:06:46
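
The disagreement in these comments is about scope: getElementsByClassName('discount') matches any element with that class, while the question's example was a span. If only spans are wanted, a CSS selector can narrow the match. The following is a sketch under that assumption; parsePrice and collectPrices are hypothetical names, and root stands for anything that offers querySelectorAll, such as document.

```javascript
// Hypothetical helper: return the first decimal number in `text`,
// or null when the text contains none.
function parsePrice(text) {
    var match = /\d+\.\d+/.exec(text);
    return match ? parseFloat(match[0]) : null;
}

// Hypothetical helper: collect prices from <span class="discount"> only.
// `root` is any object with querySelectorAll, e.g. `document`.
function collectPrices(root) {
    var spans = root.querySelectorAll("span.discount");
    var prices = [];
    for (var i = 0; i < spans.length; i++) {
        var price = parsePrice(spans[i].textContent);
        if (price !== null) {
            prices.push(price); // skip spans that hold no price
        }
    }
    return prices;
}
```

Keeping the parsing in its own function means the same regex can be reused whichever way the elements are selected.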

Answer 3

+1 A:

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Patrick McElhaney's and Matthew Flaschen's answers are both good ways to solve the problem.

Chas. Owens 2009-05-25 14:52:50

The question asks how to filter using regexes AND JavaScript. You would leverage the DOM functions of JavaScript and then use a regex. A browser is as good an HTML parser as you'll need. There's *definitely* no need to be using a new parsing library just for this.

nickf 2009-05-25 15:00:16

@nickf As you say, the browser is a parser. Look at the examples on the page; at least two of them use the browser as the parser.

Chas. Owens 2009-05-25 16:13:29
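
To illustrate the distinction drawn in this thread, here is a small sketch using sample markup invented for the purpose. A pattern applied to the raw HTML breaks when only the attribute quoting changes, while the numeric pattern applied to text the DOM has already extracted is unaffected.

```javascript
// Assumed sample markup, invented for illustration: the same element
// written with double quotes and with single quotes.
var htmlA = '<span class="discount">now $39.99</span>';
var htmlB = "<span class='discount'>now $39.99</span>";

// A regex that tries to parse the markup itself.
var markupPattern = /<span class="discount">([^<]*)<\/span>/;

console.log(markupPattern.test(htmlA)); // true
console.log(markupPattern.test(htmlB)); // false: only the quoting changed

// Once the DOM has produced the element's text, a regex over that
// text does not depend on how the markup was written.
var pricePattern = /\d+\.\d+/;
var text = "now $39.99"; // what textContent would yield for either span
console.log(pricePattern.exec(text)[0]); // the string "39.99"
```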

Answer 4

A:

As Matthew Flaschen suggested, XPath is a better way to go if you know something about the node structure of the target document (and since you provided an example, you seem to). If you don't know the node structure, regexes are still lousy for parsing XML.

Some more resources to kick-start you:

XPath in Javascript: Introduction
DOM Parsing With XPath and JavaScript
Mozilla dev-center: Introduction to using XPath in JavaScript

I've also found the Firefox extension combo of DOM Inspector and XPather to be an invaluable tool for deriving and testing XPath expressions on a given page. (If you're using another browser -- well, I don't know).

Michael Paulukonis 2009-05-26 14:00:11
