I'm writing a custom-built crawler, and I need to know whether a specific URL has already been crawled, so I won't add the same URL twice. Right now I'm using MySQL to store a hash value of each URL. But I'm wondering if this may become very slow once I have a large set of URLs, say, hundreds of millions.
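For context, my current check looks roughly like this (sketched with sqlite3 standing in for MySQL so it runs self-contained; the table and function names are just illustrative):

```python
import hashlib
import sqlite3

# sqlite3 stands in for MySQL here; in my crawler the table lives in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen_urls (url_hash TEXT PRIMARY KEY)")

def url_hash(url: str) -> str:
    # Store a fixed-size digest instead of the raw URL.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

def already_crawled(url: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM seen_urls WHERE url_hash = ?", (url_hash(url),)
    ).fetchone()
    return row is not None

def mark_crawled(url: str) -> None:
    # INSERT OR IGNORE (SQLite syntax; MySQL uses INSERT IGNORE)
    # so re-adding the same URL is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO seen_urls (url_hash) VALUES (?)",
        (url_hash(url),),
    )
    conn.commit()
```

So every candidate URL costs one indexed lookup, and I'm worried about how that scales.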
Are there other ways to store URLs? Do people use Lucene for this? Or is there a specific data structure designed for this?