Hello everybody,
I have 2 databases and I need to link information between two big tables (more than 3M entries each, continuously growing). The 1st database has a table 'pages' that stores various information about web pages, and includes the URL of each one. The column 'URL' is a varchar(512) and has no index.
The 2nd database has a table 'urlHops' defined as:
CREATE TABLE urlHops
(
dest
varchar(512) NOT NULL,
src
varchar(512) DEFAULT NULL,
timestamp
timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY dest_key
(dest
),
KEY src_key
(src
)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Now, I need basically to issue (efficiently) queries like this: select p.id,p.URL from db1.pages p, db2.urlHops u where u.src=p.URL and u.dest=?
At first, I thought to add an index on pages(URL). But it's a very long column, and I already issue a lot of INSERTs and UPDATEs on the same table (way more than the number of SELECTs I would do using this index).
Other possible solutions I thought are: -adding a column to pages, storing the md5 hash of the URL and indexing it; this way I could do queries using the md5 of the URL, with the advantage of an index on a smaller column. -adding another table that contains only page id and page URL, indexing both columns. But this is maybe a waste of space, having only the advantage of not slowing down the inserts and updates I execute on 'pages'.
I don't want to slow down the inserts and updates, but at the same time I would be able to do the queries on the URL efficiently. Any advice? My primary concern is performance; if needed, wasting some disk space is not a problem.
Thank you, regards
Davide