Hi,

I am looking for a jQuery-based parser for an HTML string (to parse <a> links and <img> images), or for code that will extract all links and images from an HTML string. The HTML string can be very big.

Example:

input:

sdsds<div>sdd<a href='http://google.com/image1.gif'>image1</a>  sd</div>
sdsdsdssssssssssssssssssssssssssssssssssss <p> sdsdsdsds  </p>
sdsds<div>sdd<img src='http://google.com/image1.gif' alt="car for family">  sd</div>

output links: value + href (if the value is empty, do not return that link)

output images: src + alt

It's very important for me to find the most efficient way.



Edit:

The function should look like this and return a multidimensional array,

like: arr[links][href][value] arr[images][src][alt]

function parseLinksAndImages(htmlString){
    // ... parsing code goes here ...
    return linksAndImagesArrMultiDimensial;
}

(or in some other, better way if you have one)

Thanks

+2  A: 

You can simply go through them like this:

$('a').each(function(){
   if ($(this).attr('href') && $(this).text()) { // get only links with both href and text set
      // your code...
   }
});

$('img').each(function(){
   if ($(this).attr('src') && $(this).attr('alt')) { //get images with src and alt set
     // your code...
   }
});
Sarfraz
+1! To add a bit of 'backstory', you could take the string in the original question, say it's in a variable like `htmlStuff`, turn it into navigable elements and operate on the stuff you're looking for like so: `jQuery( htmlStuff ).find('a').each(linkFunc).end().find('img').each(imgFunc);`
thenduks
Thanks for the answer. In your solution the string is iterated twice. Can it be done in only one iteration?
Yosef
@Yosef: Same iteration for what/which tag?
Sarfraz
<a> value + href and <img> src + alt, if it's possible with jQuery.
Yosef
@Yosef: Please see my updated answer.
Sarfraz
Also, I need to parse an external HTML string, not the web page: `function parseLinksAndImages($htmlString){..code....return linksAndImages;}`
Yosef
Does your way work faster than patrick's way for a very big HTML string?
Yosef
@Yosef: Both ways are practically the same. The key is to run the html string through jQuery to create a DOM structure that is traversable. I'd take a nice long casual walk through the jQuery docs :)
thenduks
+2  A: 

Assuming your string is valid HTML, you could do something like this:

Try it out: http://jsfiddle.net/WsDTL/2/ (updated from original to use .each() instead of .map() in order to avoid using .split() )

var string = "sdsds<div>sdd<a href='http://google.com/image1.gif'>image1</a>  sd</div>sdsdsdssssssssssssssssssssssssssssssssssss <p> <a href='some/href'></a> sdsdsdsds  </p>sdsds<div>sdd<img src='http://google.com/image1.gif' alt='car for family' />  sd</div>";

var $container = $('<div/>').html(string);

var result = [];

$container.find('a,img').each(function() {
    if(this.tagName.toUpperCase() == 'A') {
        if($.trim( this.innerHTML ) != '') {
            result.push([this.tagName,this.innerHTML,this.href]);
        }
    } else {
        result.push([this.tagName,this.src,this.alt]);
    }
});

alert(result);

EDIT:

If you meant that you don't want to process the <a> if it doesn't have an href attribute, then change the code to this:

$container.find('a[href],img').each(function() {
    if(this.tagName.toUpperCase() == 'A') {
        result.push([this.tagName,this.innerHTML,this.href]);
    } else {
        result.push([this.tagName,this.src,this.alt]);
    }
});

EDIT:

For storage as in your comment, you would make result a JavaScript object, and store the arrays under the links and images keys.

var string = "sdsds<div>sdd<a href='http://google.com/image1.gif'>image1</a>  sd</div>sdsdsdssssssssssssssssssssssssssssssssssss <p> <a href='some/href'></a> sdsdsdsds  </p>sdsds<div>sdd<img src='http://google.com/image1.gif' alt='car for family' />  sd</div>";

var $container = $('<div/>').html(string);

   // store results in a javascript object
var result = {
     links:[],
     images:[]
}; 

$container.find('a[href],img').each(function() {
    if(this.tagName.toUpperCase() == 'A') {
        result.links.push([this.tagName,this.innerHTML,this.href]);
    } else {
        result.images.push([this.tagName,this.src,this.alt]);
    }
});

alert(result.links);
alert(result.images);

You can do 2 separate loops if you prefer. Not sure which will perform better.

http://jsfiddle.net/WsDTL/3/
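
As a rough sketch of the two-loop variant, the grouping step can be pulled out into a plain function that takes the matched elements and returns the links/images object. The name collectLinksAndImages and the sample array are made up for illustration, and the commented-out parseLinksAndImages wrapper assumes jQuery is loaded and reuses the $container approach from above. Unlike the arrays used earlier, this version stores small objects, which is only one possible shape.

```javascript
// Hypothetical helper (not part of the original answer): groups element-like
// objects into { links, images } using two separate passes.
function collectLinksAndImages(elements) {
    var result = { links: [], images: [] };
    var list = Array.prototype.slice.call(elements); // accept array-likes too

    // Pass 1: anchors that have both an href and non-empty content.
    list.forEach(function (el) {
        if (String(el.tagName).toUpperCase() !== 'A') { return; }
        var value = String(el.innerHTML || '').trim();
        if (el.href && value !== '') {
            result.links.push({ href: el.href, value: value });
        }
    });

    // Pass 2: images; a missing alt is stored as an empty string.
    list.forEach(function (el) {
        if (String(el.tagName).toUpperCase() !== 'IMG') { return; }
        result.images.push({ src: el.src, alt: el.alt || '' });
    });

    return result;
}

// Assumed wiring with jQuery in a browser (requires jQuery to be loaded):
// function parseLinksAndImages(htmlString) {
//     var $container = $('<div/>').html(htmlString);
//     return collectLinksAndImages($container.find('a[href],img').get());
// }

// Stand-in data so the sketch runs without a DOM.
var sample = [
    { tagName: 'A', innerHTML: 'image1', href: 'http://google.com/image1.gif' },
    { tagName: 'A', innerHTML: '  ', href: 'some/href' },
    { tagName: 'IMG', src: 'http://google.com/image1.gif', alt: 'car for family' }
];
var parsed = collectLinksAndImages(sample);
console.log(JSON.stringify(parsed));
```

Whether two passes beat the single loop with the tagName test probably depends on the browser and the size of the string, so it is worth measuring on your own data.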

patrick dw
Thanks, it looks like a very efficient way. Does your function make 1 iteration over the string, as it appears to, or does the find function make one iteration for the a tag and then another for the img tag?
Yosef
@Yosef - What's happening is that they are being turned into DOM elements in memory (stored in $container). The `.find()` is just a typical jQuery `.find()`. Its performance will depend on the length of the string, and the browser, but overall should be very quick.
patrick dw
@Yosef ...also, when you said *"(if value empty not return such link)"* I assumed you meant the "content" of the `<a>` element. Or did you mean the value of the `<a>` element's `href` attribute? If so, it should change a little. See update.
patrick dw
<a href="link">value</a>. About find: you pass 2 tags to the find function. My question is how jQuery's find works. Option 1: it iterates over the string once, looking for both tags. Option 2: it first looks for <a>, and after it finishes searching the string, it does the same again for <img>. (The HTML string can also be very big.)
Yosef
Can your code be improved with a multidimensional array, to fetch links and images separately? My proposed edit: `result=[][]; ... if ... result['links'].push([this.tagName,this.innerHTML,this.href]); else result['images'].push([this.tagName,this.src,this.alt]);`
Yosef
@Yosef - Again, the string is not being parsed by jQuery. It is being turned into actual DOM elements. The native `getElementsByTagName()` method is being called twice, but it is very fast. It is returning only the elements you want, and processing those. It may be a little quicker to do two separate loops, only because you wouldn't need to test to see which type of element you have. But I doubt it would make much difference. I'll update my answer to place the results into separate arrays, and will let you decide if you want 2 separate loops.
patrick dw
Thank you very much for the great answer and explanation. I don't need 2 loops; your solution is very efficient.
Yosef
@Yosef - You're welcome. :o)
patrick dw