I created a table to store all the documents of my application. It is a simple table (let's call it DOC_DATA) with 3 fields: DOC_ID, FileSize, Data. Data is varbinary(max).
I then have many tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...) that contain the other data (like "document description", "Created by", "Customer ID", ...). My real case is not exactly like this, but this example lets me explain the situation better. All these tables have a FK to DOC_DATA.DOC_ID.
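For reference, a minimal sketch of what the schema looks like. Only DOC_ID, FileSize, Data and the FK are as described above; the other column names and the types are just placeholders:

CREATE TABLE DOC_DATA (
    DOC_ID   int NOT NULL PRIMARY KEY,
    FileSize bigint NOT NULL,
    Data     varbinary(max) NOT NULL
);

CREATE TABLE CUSTOMERS_DOCUMENTS (
    CUSTOMER_DOC_ID int NOT NULL PRIMARY KEY,      -- placeholder key name
    DOC_ID          int NOT NULL
        REFERENCES DOC_DATA (DOC_ID),              -- FK to the blob table
    CUSTOMER_ID     int NOT NULL,                  -- "Customer ID"
    Description     nvarchar(200) NULL,            -- "document description"
    CreatedBy       nvarchar(100) NULL             -- "Created by"
);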
When the user searches for a customer document, the application runs a query similar to this:
SELECT CD.*, DD.FileSize
FROM DOC_DATA DD
JOIN CUSTOMERS_DOCUMENTS CD ON CD.DOC_ID = DD.DOC_ID
My question is: will this query perform badly because it also reads a field from a table that is potentially huge (DOC_DATA can contain many GB of data), or is this not a problem?
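In case it is relevant, this is how I would measure it (just a sketch; SET STATISTICS IO only reports the page reads per table, it does not change the query):

SET STATISTICS IO ON;

-- run the query above and look at the "logical reads" reported for DOC_DATA
SELECT CD.*, DD.FileSize
FROM DOC_DATA DD
JOIN CUSTOMERS_DOCUMENTS CD ON CD.DOC_ID = DD.DOC_ID;

SET STATISTICS IO OFF;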
The alternative solution is to put the FileSize field in all the main tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...). Of course a join has some impact on performance, but I am not asking whether to join in general; I am asking whether to join a HUGE table when I am not interested in the HUGE fields.
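To make the alternative concrete, it would be roughly this (a sketch; the bigint type is an assumption, and the one-off backfill below assumes the application keeps the duplicated column in sync afterwards):

ALTER TABLE CUSTOMERS_DOCUMENTS ADD FileSize bigint NULL;

-- one-off backfill of the duplicated column
UPDATE CD
SET CD.FileSize = DD.FileSize
FROM CUSTOMERS_DOCUMENTS CD
JOIN DOC_DATA DD ON DD.DOC_ID = CD.DOC_ID;

-- after that the search no longer touches DOC_DATA at all:
SELECT CD.*
FROM CUSTOMERS_DOCUMENTS CD;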
Please note: I am not designing a new system, I am maintaining a legacy one, so I am not asking which design is best in general, just which of these two options is better in this case.