I am planning a database to store lots of text (blog posts, news articles, etc.). The database needs to have title, content (50k characters max), date, link and language fields. The same content can't occur more than once for the same link. Old content (older than 30 days, for example) will be deleted.
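Roughly, something like this is what I have in mind with a plain serial key (the table name, column names and types below are just my guesses, not a finished design):

```sql
-- First sketch with a SERIAL surrogate key (names/types are placeholders).
CREATE TABLE articles (
    id       serial PRIMARY KEY,        -- exists only to be the primary key
    title    text NOT NULL,
    content  varchar(50000) NOT NULL,   -- 50k characters max
    created  date NOT NULL DEFAULT current_date,
    link     text NOT NULL,
    language varchar(10) NOT NULL
);

-- Cleanup of old rows would be something like:
DELETE FROM articles WHERE created < current_date - 30;
```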
Now, the problem is the primary key. I could just set an automatically incrementing field (SERIAL type) and use it as the primary key. But that seems stupid and a waste of disk space, because the field wouldn't serve any purpose other than being the primary key (and couldn't it eventually run out?). And there's the other performance issue: the content of each newly inserted row needs to be checked for duplicates. So the other solution I've come up with is to calculate a SHA-256 hash of the content+link value, put it in a new 'hash' column, and use that as the primary key. Two birds with one stone. Of course, the problem with that is hash collisions. Are they a big threat?
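The hash variant would look roughly like this (I'm assuming PostgreSQL 11+ for the built-in sha256() function; the names are again just placeholders):

```sql
-- Variant with SHA-256(content || link) as the primary key (a sketch, not a final design).
CREATE TABLE articles_hashed (
    hash     bytea PRIMARY KEY,          -- 32 raw bytes (64 characters if stored as hex text)
    title    text NOT NULL,
    content  varchar(50000) NOT NULL,
    created  date NOT NULL DEFAULT current_date,
    link     text NOT NULL,
    language varchar(10) NOT NULL
);

-- The hash would be computed on insert; a duplicate content+link pair
-- would then simply violate the primary key constraint:
INSERT INTO articles_hashed (hash, title, content, link, language)
VALUES (sha256(convert_to('full article text' || 'https://example.com/post', 'UTF8')),
        'Some title', 'full article text', 'https://example.com/post', 'en');
```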
I don't have any experience with PostgreSQL, and very little experience with DBMSs generally, so I would appreciate a second opinion before I create a database with the performance characteristics of a snail on a highway (horrible comparison).
Please help me out here if you have any experience with large databases. Is setting a 64-character string as a primary key field a good idea in my situation? (I'm under the impression that this is generally avoided.)