I am trying to store in a postgresql database some unique identifiers along with the site they have been seen on. I can't really decide which of the following 3 option to choose in order to be faster and easy maintainable. The table would have to provide the following information:
- the unique identifier which unfortunately it's text
- the sites on which that unique identifier has been seen
The amount of data that would have to hold is rather large: there are around 22 millions unique identifiers that I know of.
So I thought about the following designs of the table:
id - integer
identifier - text
seen_on_site - an integer, foreign key to a sites table
This approach would require around 22 mil multiplied by the number of sites.
id - integer
identifier - text
seen_on_site_1 - boolean
seen_on_site_2 - boolean
............
seen_on_site_n - boolean
Hopefully the number of sites won't go past 10. This would require only the number of unique identifiers that I know of, that is around 20 millions, but it would make it hard to work with it from an ORM perspective.
- one table that would store only unique identifiers, like in:
id - integer
unique_identifier - text,
one table that would store only sites, like in:
id - integer
site - text
and one many to many relation, like:
id - integer,
unique_id - integer (fk to the table storing identifiers)
site_id - integer (fk to sites table)
- another approach would be to have a table that stores unique identifiers for each site
So, which one seems like a better approach to take on the long run?