I am well aware that this is an unusual thing to want to do, but trust me, I do need to do it. It's all about pattern matching within the raw HTML (the same text you get from View Source) of a web page.

If a user selects some text in a web browser, I want to be able to say that the selection is at position (say) 1234, or thereabouts, within the string that makes up the View Source of the document.

The main problem is that all the information you can get about the user's selection relates to the DOM representation of the document, which is not the same as the raw HTML shown by View Source.

I can get a fair bit of information client side. I think I'm going to have to pass this back to the server and do some fuzzy-logic-type processing to work out roughly where in the raw HTML the selection sits.

I don't have much experience of 'inferred' decision making within a program.

Can anyone make a helpful suggestion about how this might be approached? (My brain's smoking a bit!)

+1  A: 

Crazy thing you want to do... You could create a webservice which takes two input parameters (and returns the position of the first match):

  • selected text
  • referral page (exact url)

In JavaScript, you then detect when the user selects some text (maybe via a "check position" button?) and send that text, along with the current URL, to the webservice.
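As a rough sketch of the client-side step just described, the snippet below gathers the selected text, the page URL, and an approximate position of the selection within the page's text. The names describeSelection, relativePosition, and approxPosition are made up for illustration; only window.getSelection and the Range methods are standard browser APIs. The position is measured in DOM text, not raw HTML, so treat it as a hint only.

```javascript
// Pure helper: turn a character offset into a fraction (0..1) of a total length.
function relativePosition(offset, totalLength) {
  if (totalLength <= 0) return 0;
  return Math.min(1, Math.max(0, offset / totalLength));
}

// Browser-only: collect what the (hypothetical) webservice would need.
// `win` is normally the global `window` object.
function describeSelection(win) {
  const selection = win.getSelection();
  if (!selection || selection.rangeCount === 0) return null;

  const range = selection.getRangeAt(0);
  const doc = win.document;

  // Build a range from the start of <body> to the start of the selection,
  // so its text length tells us how much text precedes the selection.
  const before = doc.createRange();
  before.selectNodeContents(doc.body);
  before.setEnd(range.startContainer, range.startOffset);

  return {
    selectedText: selection.toString(),
    url: win.location.href,
    // Fraction of the DOM text that precedes the selection (a rough hint).
    approxPosition: relativePosition(
      before.toString().length,
      doc.body.textContent.length
    ),
  };
}
```

In a page, describeSelection(window) could be called from the button's click handler and the resulting object posted to the webservice as JSON.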

The webservice now downloads the HTML as a string, e.g. like so:

// Requires: using System.Net;
using (WebClient client = new WebClient())
{
    // Download the page's raw HTML as a single string.
    string htmlCode = client.DownloadString("http://mypage.com/page.html");
}

And then all you have to do is search the HTML string for the selected text and return the first occurrence (maybe using htmlCode.IndexOf(myPassedSelectedText)) via the webservice.
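If the selected text can appear more than once, the first occurrence alone may not be enough. A minimal sketch of collecting every match position is shown here in JavaScript rather than C#; findAllOccurrences is a hypothetical helper name. It assumes the selected text appears verbatim in the raw HTML, which will not hold when the selection spans tags, entities, or collapsed whitespace.

```javascript
// Return the start index of every occurrence of `needle` in `html`.
// Overlapping matches are included because the search resumes at index + 1.
function findAllOccurrences(html, needle) {
  const positions = [];
  if (needle.length === 0) return positions; // nothing sensible to search for

  let index = html.indexOf(needle);
  while (index !== -1) {
    positions.push(index);
    index = html.indexOf(needle, index + 1);
  }
  return positions;
}
```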

moontear
That's sort of where I'm getting to. The information I want back out is generally numbers, so there's a good chance there will be several matches, and I need to decide which is the right one. I'm thinking a rough position within the document is going to be needed to throw out the matches that are definitely off. For example: there are 5 matches, but only one is about 70% of the way through the document, which is where the selection was, so it's the good match. +1
BombDefused
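The heuristic in the comment above (prefer the match whose relative position is closest to where the selection was) could be sketched as follows. pickBestMatch and approxPosition are hypothetical names, and approxPosition is assumed to be a fraction between 0 and 1 supplied by the client. Markup takes up space in the raw HTML that the rendered text does not, so the two fractions will not line up exactly; this only ranks candidates, it does not prove which one is right.

```javascript
// Among all occurrences of `needle` in `html`, pick the one whose relative
// position (index / html.length) is closest to `approxPosition` (0..1).
// Returns { index, distance } or null when there is no match.
function pickBestMatch(html, needle, approxPosition) {
  let best = null;
  let index = html.indexOf(needle);

  while (needle.length > 0 && index !== -1) {
    const distance = Math.abs(index / html.length - approxPosition);
    if (best === null || distance < best.distance) {
      best = { index, distance };
    }
    index = html.indexOf(needle, index + 1);
  }
  return best;
}
```

A caller might also reject the result when distance is above some threshold, since a large gap suggests that none of the matches correspond to the selection.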