Assuming IPv4 addresses, there is a search space of 2^32 addresses. You need no more than 1 bit per IP address (0 == no visit, 1 == visit). Without considering storage overhead, this would take 512 MB (2^29 bytes) to store. So a simplistic implementation would allocate a 512 MB array (or a table with 2^29 rows, each storing a byte, or 2^27 rows, each storing a 32-bit integer, or 2^26 rows, each storing a 64-bit integer, or...)
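The sizes quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are invented for the example.

```python
# Storage needed for a flat bitmap with one bit per IPv4 address.
ADDRESS_BITS = 32
total_bits = 2 ** ADDRESS_BITS            # one bit per possible address

total_bytes = total_bits // 8             # 2^29 bytes
total_mib = total_bytes // (1024 ** 2)    # 512 (binary megabytes)

rows_of_bytes = total_bytes               # 2^29 rows of one byte each
rows_of_int32 = total_bits // 32          # 2^27 rows of 32-bit integers
rows_of_int64 = total_bits // 64          # 2^26 rows of 64-bit integers

print(total_mib, rows_of_bytes, rows_of_int32, rows_of_int64)
```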
You could optimize this for sparse population by turning it into a tree.
Define a "page" size of 2^x bits. You will allocate storage for one page at a time.
Divide your search space (2^32) by your page size. The result, 2^(32-x), is the total number of pages required to represent every possible address in your search space.
Then, to determine whether an address is already in the cache, you first check whether its page is present and, if so, whether the appropriate bit in the page is set. To cache an address, you first check whether the page is present and create it if it is not; then you set the appropriate bit.
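The lookup and insert steps just described can be sketched in memory before involving a database. The following is a minimal illustration, not a definitive implementation: the class name `PagedBitmap`, its methods, and the `page_bits_log2` parameter (the x in 2^x, assumed to be at least 3 so that a page is a whole number of bytes) are all hypothetical.

```python
class PagedBitmap:
    """Sparse bitmap over a 32-bit key space; pages are allocated on first write."""

    def __init__(self, page_bits_log2=10):
        self.shift = page_bits_log2                   # x in "2^x bits per page"
        self.mask = (1 << page_bits_log2) - 1         # selects the bit offset inside a page
        self.page_bytes = (1 << page_bits_log2) // 8  # page size in bytes
        self.pages = {}                               # page index -> bytearray

    def contains(self, key):
        page = self.pages.get(key >> self.shift)
        if page is None:                              # page absent: no bit in this range was set
            return False
        offset = key & self.mask
        return bool(page[offset >> 3] & (1 << (offset & 7)))

    def add(self, key):
        index = key >> self.shift
        page = self.pages.get(index)
        if page is None:                              # create the page on demand
            page = self.pages[index] = bytearray(self.page_bytes)
        offset = key & self.mask
        page[offset >> 3] |= 1 << (offset & 7)


seen = PagedBitmap()
seen.add(0xC0A80001)                                  # 192.168.0.1 as a 32-bit integer
print(seen.contains(0xC0A80001), seen.contains(0xC0A80002), len(seen.pages))
```

Only one 128-byte page is allocated here, which is the saving over the flat 512 MB array when few addresses are set.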
This maps fairly easily onto a database table. You would need just two columns: a page index and a binary array. When you allocate a page, you simply store a row in the table with the correct page index and a zero-filled binary array.
For instance, for a 1024-bit page size (yielding 2^22 maximum pages), you might structure your table like this:
CREATE TABLE VisitedIPs (
    PageIndex int NOT NULL PRIMARY KEY,
    PageData binary(128) NOT NULL
);
To check whether an IP has visited, you would use code similar to (pseudocode):
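uint ip = address.To32Bit();
// The upper 22 bits of the address select the page.
string sql =
    "SELECT PageData " +
    "FROM VisitedIPs " +
    "WHERE PageIndex = " + (ip >> 10);
// Assumption: GetFromDB returns null when no row matches.
byte[] page = (byte[])GetFromDB(sql);
bool hasVisited = false;
if (page != null) {
    // Bits 3-9 select the byte within the page; bits 0-2 select the bit within that byte.
    byte b = page[(ip & 0x3FF) >> 3];
    hasVisited = (b & (1 << (int)(ip & 7))) != 0;
}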
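To make the shifts and masks in the lookup above concrete, here is a small Python sketch that splits an address into its page index, byte offset, and bit position. The helper name `locate` is hypothetical; `ipaddress` is the Python standard library module.

```python
import ipaddress


def locate(address, page_bits_log2=10):
    """Return (page index, byte offset within the page, bit within the byte)."""
    ip = int(ipaddress.IPv4Address(address))
    page_index = ip >> page_bits_log2            # same as ip >> 10 for 1024-bit pages
    offset = ip & ((1 << page_bits_log2) - 1)    # same as ip & 0x3FF
    return page_index, offset >> 3, offset & 7


print(locate("10.1.2.3"))  # (163904, 64, 3)
```

Here 10.1.2.3 is the integer 167838211, which is 163904 * 1024 + 515, and bit offset 515 is bit 3 of byte 64.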
Recording that an IP has visited is similar:
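uint ip = address.To32Bit();
uint pageIndex = ip >> 10;
string sql =
    "SELECT PageData " +
    "FROM VisitedIPs " +
    "WHERE PageIndex = " + pageIndex;
byte[] page = (byte[])GetFromDB(sql);
// If the page does not exist yet, allocate a zero-filled one and insert it instead of updating.
bool isNewPage = (page == null);
if (isNewPage) page = new byte[128];
page[(ip & 0x3FF) >> 3] |= (byte)(1 << (int)(ip & 7));
// Run the read and the write in one transaction so concurrent updates to the same page are not lost.
sql = isNewPage
    ? "INSERT INTO VisitedIPs (PageIndex, PageData) " +
      "VALUES (" + pageIndex + ", @pageData)"
    : "UPDATE VisitedIPs " +
      "SET PageData = @pageData " +
      "WHERE PageIndex = " + pageIndex;
ExecSQL(sql, new SqlParam("@pageData", page));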
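For a version that runs end to end, here is a sketch of the same table and both operations using Python's built-in `sqlite3` module. SQLite is swapped in purely for illustration, so the column type is `BLOB` rather than `binary(128)`, and the helper names (`open_store`, `has_visited`, `mark_visited`) are hypothetical. Queries use bound parameters rather than string concatenation.

```python
import ipaddress
import sqlite3

PAGE_SHIFT = 10    # 1024-bit pages
PAGE_BYTES = 128   # 1024 bits / 8


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS VisitedIPs ("
        "PageIndex INTEGER NOT NULL PRIMARY KEY, "
        "PageData BLOB NOT NULL)"
    )
    return conn


def _split(address):
    """Return (page index, byte index within the page, bit mask within the byte)."""
    ip = int(ipaddress.IPv4Address(address))
    offset = ip & ((1 << PAGE_SHIFT) - 1)
    return ip >> PAGE_SHIFT, offset >> 3, 1 << (offset & 7)


def _load_page(conn, page_index):
    row = conn.execute(
        "SELECT PageData FROM VisitedIPs WHERE PageIndex = ?", (page_index,)
    ).fetchone()
    return None if row is None else row[0]


def has_visited(conn, address):
    page_index, byte_index, bit_mask = _split(address)
    page = _load_page(conn, page_index)
    # A missing page means no address in that range has visited.
    return page is not None and bool(page[byte_index] & bit_mask)


def mark_visited(conn, address):
    page_index, byte_index, bit_mask = _split(address)
    with conn:  # commits on success, rolls back if an exception is raised
        stored = _load_page(conn, page_index)
        page = bytearray(stored) if stored is not None else bytearray(PAGE_BYTES)
        page[byte_index] |= bit_mask
        conn.execute(
            "INSERT OR REPLACE INTO VisitedIPs (PageIndex, PageData) VALUES (?, ?)",
            (page_index, bytes(page)),
        )


conn = open_store()
mark_visited(conn, "10.1.2.3")
mark_visited(conn, "10.1.2.9")
print(has_visited(conn, "10.1.2.3"), has_visited(conn, "10.1.2.4"))
```

Both addresses fall in the same page, so the table holds a single row after the two calls.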