+3  Q: 

HTML Search Engine

On a search engine, such as Google, if you want to find the pages on a site where a certain word is used, you would search for something like "thickbox site:jquery.com".

However, if you wanted to search for the presence of the jQuery ThickBox library on a website, it would be nice to be able to search for something like this:

As far as I know, you can't search for HTML-level elements on any popular search engine. Short of crawling a full site, I don't know of any other way to check whether a particular website is using a certain CSS file, JavaScript file, or meta tag.

Can this task be done without crawling the full website? Are there HTML/metadata search engines out there? Are there any other ways to program such a tool?

+1  A: 

Well, it all depends on the internals of the search engine. I've built a small and simple one, and I only stored the words that (hopefully) were actually visible on the page (leaving out, for example, text whose color matched the background). Markup such as script tags never made it into the index.

IMHO, at least for now, search engines are not too interested in offering features like that, since they wouldn't add much value and would increase complexity at every step of building one.

There are some HTML parsers out there; you could try checking only the script tags, link tags, and the like. Usually these will all be in the head portion of the page, so you don't need to fetch or parse the whole page.
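
As a rough illustration of that idea, here is a minimal sketch using Python's standard `html.parser` module. The names `HeadAssetParser`, `uses_asset`, and `SAMPLE` are made up for this example, and the sample markup is invented; it only assumes you already have the page's HTML as a string.

```python
from html.parser import HTMLParser


class HeadAssetParser(HTMLParser):
    """Collect script, stylesheet, and meta references found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.scripts = []      # src values of <script> tags
        self.stylesheets = []  # href values of <link rel="stylesheet"> tags
        self.meta = {}         # lowercased name -> content of <meta> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return  # ignore everything outside <head>
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attrs["href"])
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def uses_asset(html, needle):
    """Return True if any script or stylesheet URL in <head> contains needle."""
    parser = HeadAssetParser()
    parser.feed(html)
    needle = needle.lower()
    return any(needle in url.lower()
               for url in parser.scripts + parser.stylesheets)


# Invented sample page, used only to demonstrate the parser.
SAMPLE = """<html><head>
<meta name="generator" content="ExampleCMS">
<link rel="stylesheet" href="/css/thickbox.css">
<script src="/js/jquery.js"></script>
<script src="/js/thickbox.js"></script>
</head><body><script src="/js/late.js"></script></body></html>"""

if __name__ == "__main__":
    print(uses_asset(SAMPLE, "thickbox"))
```

Note that this only looks inside the head, as suggested above, so a script included at the bottom of the body would be missed. It also checks a single page; you would still have to fetch at least one page per site you want to test.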

Samuel Carrijo