I've put together a fairly simple crawling engine that works quite well and, for the most part, avoids getting stuck in circular loop traps (i.e., Page A links to Page B and Page B links back to Page A).
The only time it gets stuck in this loop is when both pages link to each other with a cachebuster query string: a unique query string is appended to every link on each refresh.
This makes the pages always look like new pages to the crawler, so the crawler bounces between the two pages indefinitely.
Aside from simply breaking out after N bounces between two pages whose URLs differ only in the query string (which I don't think is a very good approach), is there any other way to detect and break out of these traps?
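For context, here is a minimal sketch of the two detection ideas I've been considering, assuming Python and a hypothetical list of known cachebuster parameter names (`VOLATILE_PARAMS` below is made up for illustration): canonicalizing URLs by dropping volatile query parameters, and fingerprinting page content so two URLs serving identical bytes collapse into one visit.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode
import hashlib

# Hypothetical set of query parameter names treated as cachebusters.
VOLATILE_PARAMS = {"cb", "cachebuster", "t", "ts", "rand", "_"}

def canonicalize(url: str) -> str:
    """Return a canonical form of the URL with volatile query params dropped."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in VOLATILE_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def content_fingerprint(html: bytes) -> str:
    """Hash the page body so identical content is recognized across URLs."""
    return hashlib.sha256(html).hexdigest()

seen_urls: set[str] = set()
seen_hashes: set[str] = set()

def should_crawl(url: str, html: bytes) -> bool:
    """Skip a page if its canonical URL or its content has been seen before."""
    key = canonicalize(url)
    digest = content_fingerprint(html)
    if key in seen_urls or digest in seen_hashes:
        return False
    seen_urls.add(key)
    seen_hashes.add(digest)
    return True
```

One caveat with the content-hash half: if the page body itself embeds the per-refresh cachebuster links, the raw bytes differ on every fetch and the hash never matches, so the fingerprint would need to be taken over the HTML after normalizing embedded links (or over the visible text only) rather than the raw response.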