This is for research purposes on http://cssfingerprint.com

Consider the following code:

<style>
  div.csshistory a { display: none; color: #00ff00;}
  div.csshistory a:visited { display: inline; color: #ff0000;}
</style>

<div id="batch" class="csshistory">
  <a id="1" href="http://foo.com">anything you want here</a>
  <a id="2" href="http://bar.com">anything you want here</a>
  [etc * ~2000]
</div>

My goal is to detect whether foo has been rendered using the :visited styling.

  1. I want to detect whether foo.com is visited without directly looking at $('1').getComputedStyle (or in Internet Explorer, currentStyle), or any other direct method on that element.

    The purpose of this is to get around a potential browser restriction that would prevent direct inspection of the style of visited links.

    For instance, maybe you can put a sub-element in the <a> tag, or check the styling of the text directly; etc. Any method that does not directly or indirectly rely on $('1').anything is acceptable. Doing something clever with the child or parent is probably necessary.

    Note that for the purposes of this point only, the scenario is that the browser will lie to JavaScript about all properties of the <a> element (but not others), and that it will only render color: in :visited. Therefore, methods that rely on e.g. text size or background-image will not meet this requirement.

  2. I want to improve the speed of my current scraping methods.

    The majority of time (at least with the jQuery method in Firefox) is spent on document.body.appendChild(batch), so finding a way to improve that call would probably be most effective.

    See http://cssfingerprint.com/about and http://cssfingerprint.com/results for current speed test results.

The methods I am currently using can be seen at http://github.com/saizai/cssfingerprint/blob/master/public/javascripts/history_scrape.js

To summarize for tl;dr, they are:

  1. set color or display on :visited per above, and check each one directly w/ getComputedStyle
  2. put the ID of the link (plus a space) inside the <a> tag, and using jQuery's :visible selector, extract only the visible text (= the visited link IDs)
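
For reference, method 1 boils down to something like the following. This is a minimal sketch, not the code in history_scrape.js: the function name scrapeVisitedByColor and its parameters are made up for illustration, win is passed in explicitly so the logic can be exercised outside a browser, and the exact color string ('rgb(255, 0, 0)' vs. '#ff0000') depends on the browser and is left to the caller.

```javascript
// Hypothetical helper: collect the hrefs of links whose reported color
// equals the :visited color set in the stylesheet.
function scrapeVisitedByColor(links, win, visitedColor) {
  var visited = [];
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    // Standards browsers expose getComputedStyle; old IE uses currentStyle.
    var color = win.getComputedStyle
      ? win.getComputedStyle(link, null).color
      : link.currentStyle.color;
    if (color === visitedColor) {
      visited.push(link.href);
    }
  }
  return visited;
}
```

In a browser this might be called as scrapeVisitedByColor(document.getElementById('batch').getElementsByTagName('a'), window, 'rgb(255, 0, 0)').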

FWIW, I'm a white hat, and I'm doing this in consultation with the EFF and some other fairly well known security researchers.

If you contribute a new method or speedup, you'll get thanked at http://cssfingerprint.com/about (if you want to be :-P), and potentially in a future published paper.

ETA: The bounty will be awarded only for suggestions that

  • can, on Firefox, avoid the hypothetical restriction described in point 1 above, or
  • perform at least 10% faster, on any browser for which I have sufficient current data, than my best performing methods listed in the graph at http://cssfingerprint.com/about

In case more than one suggestion fits either criterion, the one that does best wins.

ETA 2: I've added width-based variants of two previous-best test methods (reuse_noinsert, best on Firefox/Mozilla, and mass_insert, its very close competitor). Please visit http://cssfingerprint.com several times from different browsers; I'll automatically get the speed test results, so we'll find out if it's better than the previous methods, and if so by how much. Thanks!

ETA 3: Current tests indicate a speed savings using offsetWidth (rather than getComputedStyle/currentStyle) of ~2ms (1.8%) in Chrome and ~24ms (4.3%) in Firefox, which isn't the 10% I wanted for a solid bounty win. Got an idea how to eke out the rest of that 10%?
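
The width-based variants amount to roughly this. A sketch only, assuming the stylesheet from the top of the question (unvisited links are display: none, so they have no layout box and report an offsetWidth of 0); scrapeVisitedByWidth is a hypothetical name, not one of the actual test methods.

```javascript
// Hypothetical helper: a link that is rendered at all (offsetWidth > 0)
// must have matched the a:visited rule, since unvisited ones are display: none.
function scrapeVisitedByWidth(links) {
  var visited = [];
  for (var i = 0; i < links.length; i++) {
    if (links[i].offsetWidth > 0) {
      visited.push(links[i].href);
    }
  }
  return visited;
}
```

This reads a layout property instead of asking for a computed style, which is where the small savings above would come from.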

+3  A: 
Gaby
-1 because I'm pretty sure it doesn't. http://api.jquery.com/category/selectors/
Andy E
@Andy, try http://jsbin.com/okina and you will be surprised ;)
Gaby
This would not meet the "don't inspect the direct element" point. Do you think it would be faster? jQuery's :visible already is based on offsetHeight.
Sai Emrys
Andy E
Gaby
@Gaby: Silly me, JS was disabled in Fx (I was testing something recently and left it off :-)). Tell ya what, I'll remove the -1 but I don't think I could stretch to a +1 because IE does still have the majority market share ;-)
Andy E
@Andy, fair enough :)
Gaby
FWIW, I am totally fine with browser-specific hacks if they fulfill either of my two goals (restriction subversion and speed). Is there really *no* way at all to access the content:? If there were, then I could have the counter be a bit-vector and extract it by dumping the resulting int back to binary. OTOH, I suspect it's not capable of handling counters of size 2^2000... :-P It's an interesting idea, but I don't think it actually helps. :/
Sai Emrys
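
For what it's worth, the decoding half of the bit-vector idea is straightforward if the counter value could ever be read back, which nobody in this thread has managed. A sketch assuming a modern engine with BigInt, and assuming (hypothetically) that link number i adds 2^i to the counter when it is visited; decodeVisitedBits is an illustrative name.

```javascript
// Hypothetical decoder: turn an integer bit-vector into the list of set
// bit positions, i.e. the indices of the visited links.
function decodeVisitedBits(counterValue) {
  var n = BigInt(counterValue); // accepts a BigInt, an integer Number, or a decimal string
  var indices = [];
  for (var i = 0; n > 0n; i++) {
    if ((n & 1n) === 1n) {
      indices.push(i);
    }
    n >>= 1n; // move on to the next link's bit
  }
  return indices;
}
```

BigInt has no trouble with values around 2^2000; whether a CSS counter can hold a value that large is a separate question.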
@Sai, i have found no way to access it .. i tried using it as a value to other css properties but it will not work.. and it is not inserted in the dom .. Maybe grab the screen and OCR :p but that would really fail the speed test.. lol about the 2^2000, would be nice to see JS handle that ;)
Gaby
@Gaby: I find the handling of CSS `content:` a bit odd. It's in the DOM, yet not. I have a niggling feeling that there might be something open to manipulation there, but I don't know how.
Sai Emrys
@Sai, yes.. it is very weird indeed. I haven't managed to get access to it with any way ... not js, not any kind of debuggers... the w3 specification mentions `Generated content does not alter the document tree. In particular, it is not fed back to the document language processor (e.g., for reparsing).`
Gaby
@Gaby: I've added two width-based variants to the main page speed tests. Please hit it a few times with various browsers so I can get better data on its performance. Thanks.
Sai Emrys
@gaby: There is a jQuery *plugin* that supports :visited. From looking at it, I think it's probably less efficient than my code. Re. width, current tests indicate a speed savings of ~2ms (1.8%) in Chrome and ~24ms (4.3%) in Firefox, which isn't the 10% I wanted for a solid bounty win. Got an idea how to eke out the rest of that 10%?
Sai Emrys
@Sai, every attempt (from either chrome/firefox 3.5.8) stays at the *Calculating results... 1/5* forever... the jobs end, but the result calculations do not..
Gaby
@Gaby: Yeah, the scraper AI has an issue I'm working on. It doesn't affect the browser tests though; those are automatic when you hit the page (results visible at the bottom) and don't require you to do the scraping.
Sai Emrys
+1  A: 
  1. add a child inside the anchor (for example a span)
  2. use color : inherit
  3. detect the color of the child (JS)
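
A rough sketch of those three steps, assuming each anchor wraps a span with its own id (e.g. <a href="http://foo.com"><span id="s1">1</span></a>) and that a rule like div.csshistory a span { color: inherit; } is added to the stylesheet. The function name, the span ids, and the win parameter are illustrative only; whether a patched browser would still report the inherited color truthfully is not something this sketch can tell you.

```javascript
// Hypothetical helper: inspect only the child spans, never the <a> elements.
// `spans` could come from batch.getElementsByTagName('span') in a browser.
function scrapeVisitedViaChildColor(spans, win, visitedColor) {
  var visitedIds = [];
  for (var i = 0; i < spans.length; i++) {
    var span = spans[i];
    // With color: inherit, the span reports whatever color its parent link got.
    if (win.getComputedStyle(span, null).color === visitedColor) {
      visitedIds.push(span.id);
    }
  }
  return visitedIds;
}
```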

caveat: afaik it won't work on lte ie7

for lte ie7 you'll have to

  • add visibility : hidden on a:visited and visibility : inherit on the child
  • check the visibility of the child using javascript (hidden = visited)
Knu
Something that works based on visibility isn't okay for my point #1; that has to rely purely on color. I'll test the color:inherit.
Sai Emrys
well then you could add ie7 js http://ie7-js.googlecode.com/svn/test/index.html - this way the inherit property will be supported
Knu
or replaceLocalStyles.js
Knu
OK, here's something you could try for lte IE7: use direction (CSS), set it to rtl (and inherit for the child), then put the dir attribute (ltr) on the tag to revert it back, and check the direction of the child (in JS)
Knu
+1  A: 

A similar idea, but sidestepping .getComputedStyle():

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
    <head>
        <title></title>
        <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

        <style type="text/css">
            a:visited { display: inline-block; font-family: monospace; }
            body { font-family: sans-serif; }
        </style>

        <script type="text/javascript">
            function test() {
                var visited = document.getElementById("v").childNodes[1].firstChild.clientWidth;
                var unvisited = document.getElementById("u").childNodes[1].firstChild.clientWidth;
                var rows = document.getElementsByTagName("tr");

                for (var i = 1, length = rows.length; i < length; i++) {
                    var row = rows[i];
                    var link = row.childNodes[1].firstChild;
                    var width = link.clientWidth;

                    row.firstChild.appendChild(document.createTextNode(link.href));
                    row.childNodes[2].appendChild(document.createTextNode(width === visited ? "yes" : (width === unvisited ? "no" : "unknown")));
                }
            }
        </script>
    </head>

    <body onload="test()">
        <table>
            <tr><th>url</th><th>link</th><th>visited?</th></tr>
            <tr id="u"><td></td><td><a href="http://invalid_host..mplx/">l</a></td><td></td>
            <tr id="v"><td></td><td><a href="css-snoop.html">l</a></td><td></td>
            <tr><td></td><td><a href="http://stackoverflow.com/">l</a></td><td></td>
            <tr><td></td><td><a href="http://www.dell.com/">l</a></td><td></td>
        </table>
    </body>
</html>

The trick, of course, is ensuring that visited and unvisited links have different widths (here, by using sans-serif vs. monospace fonts) and setting them to inline-block so that their widths can be accessed via clientWidth. Tested to work on FF3.6, IE7, Chrome 4, and Opera 10.

In my tests, accessing clientWidth was consistently faster than anything which relied on computed styles (sometimes by as much as ~40%, but widely varying).

(Oh, and apologies for the <body onload="..."> nonsense; it's been too long since I tried to do events in IE without a framework and I got tired of fighting it.)

Ben Blank
Anything that depends on different fonts (or anything other than color) doesn't work for my point #1. For #2, this is essentially what I'm already doing with my method named 'jquery', which uses jQuery's :visible selector, which is based on widths (and unvisited links are display:none, so zero width). Why would your method be faster?
Sai Emrys
As I said, it's very similar; I just wanted to see if avoiding computed styles improved speed. It seems to, but I don't trust the numbers I'm getting (the margin of error is too large). If I can get some tighter sample grouping, I'll post the results.
Ben Blank
Why are you using clientWidth rather than offsetWidth like Gaby suggested above?
Sai Emrys
Habit. IE before 8 calculates a handful of offsetFoo properties incorrectly when dealing with relative elements, so when I was primarily using DOM methods I tended to use clientFoo instead. Not that those problems would have affected this test, as there *are* no relative elements. I don't see Gaby's comment, though?
Ben Blank
I'm going to give the toss to Gaby then on the 'use width' idea, since zie said it first. See the "[update]" section of zir reply.
Sai Emrys
A: 

Since all versions of IE (yes, even version 8 if you enable quirks mode) support CSS expressions, the color property is still unsafe. You could probably speed up IE testing with this (untested):

a:visited { color: expression( arrVisited.push(this.href) ); }

Also, this isn't really covered by your question, but you can of course set properties on child nodes very easily to initiate detection, and any solution would have to prevent that too:

a.google:visited span { background-image: url(http://example.com/visited/google); }

You need to protect adjacent siblings too, not just descendants:

a.google:visited + span { }

Also untested but you could probably do a heavy speedup using the content property to modify the DOM and then some XPath to find the new nodes.

a.google:visited:before {content: "visited"; visibility: hidden;}

XPath:

var visitedLinks = document.evaluate('//a[text()="visited"]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
SpliFF
#1 is talking about a potential patch by Mozilla. I think IE will always be vulnerable. ISTR that CSS expressions are very frequently reevaluated, which makes me concerned that using them would be fairly CPU-inefficient. The background-image trick is well known, but would likely be a bit slow (one connection per hit, limit 2-8 in parallel). I haven't been able to find a way to access the :before content at all; please try to test that.
Sai Emrys
:before isn't supported on lte ie7
Knu
The issue isn't `:before`/`:after` being supported, it's that their `content:` is not inserted into the DOM. See comment thread on Gaby's response.
Sai Emrys