Hi!
I want to develop a simple web crawler that grabs pages from several websites and keeps them up to date. Some of these sites put a session ID in every link; they don't store session IDs in cookies at all. So if I crawl a site several times, my table of parsed pages will contain duplicate pages that differ only in the session ID.
So my question is: how can I remove the session ID from all links? Is there a smart way to do it? I'm developing in PHP, but solutions for any other platform would be useful too, even just an algorithm described in words.
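
To make the goal concrete, here is a rough sketch of the kind of URL normalization being asked about, written in Python since any platform is acceptable. It assumes the session ID appears either as a query parameter with a well-known name or as a `;jsessionid=...` path suffix. The function name `strip_session_id` and the names listed in `SESSION_PARAMS` are assumptions for illustration, not taken from any particular site, and real sites may need their own entries.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed set of common session parameter names (lowercase); extend per site.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "session_id", "jsessionid"}

# Matches path-style session ids such as ";jsessionid=ABC123".
PATH_SESSION_RE = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def strip_session_id(url):
    """Return the URL with known session-id parameters removed."""
    parts = urlsplit(url)

    # Drop a ";jsessionid=..." suffix embedded in the path, if present.
    path = PATH_SESSION_RE.sub("", parts.path)

    # Keep every query parameter whose name is not a known session name.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]

    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_session_id("http://example.com/page.php?id=5&PHPSESSID=abc123"))
    print(strip_session_id("http://example.com/a;jsessionid=XYZ?x=1"))
```

The crawler would then use the normalized URL as the key in its page table, so two fetches of the same page with different session IDs map to one entry. The same idea should carry over to PHP with `parse_url`, `parse_str`, and `http_build_query`. Note that the query string is re-encoded here, so unusual escaping in the original URL may not be preserved byte for byte.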