tags:

views:

41

answers:

2

Our project has ~40 tables with complex relations.A colleague believes in using long join queries which enforces me to learn about tables outside of my module but I think I should not concern about tables not directly related to my module and use data access functions (written by those responsible for other modules) when I need data from them. Let me clarify:

I am responsible for the ContactVendor module which enables the customers to contact the vendor and start a conversation about some specific product. Products module has it's own complex tables and relations with functions that encapsulate details (for example i18n, activation, product availability etc ...). Now I need to show the product title of some product related to some conversation between the vendor and customers. I may either write a long query that retrieves the product info along with conversation stuff in one shot (which enforces me to learn about Product tables) OR I may pass the relevant product_id to the get_product_info(int) function.

First approach is obviously demanding and introduces many bad practices and things I normally consider fault in programming. The problem with the second approach seems to be the countless mini queries these access functions cause and performance loss is a concern when a loop tries to fetch product titles for 100 products using functions that each perform a separate query. So I'm stuck between "don't code to the implementation, code to interface" and performance. What is the right way of doing things ?

UPDATE: I'm specially concerned about possible future modifications to those tables outside of my module. What if the Products module decided to change the way they are doing things? or for some reason modify the schema? It means some other modules would break or malfunction until the change is integrated to them. The usual ripple effect problem.

A: 

I can see both sides of the question. However, given that you say there are only about 40 tables, it seems to me that the complexity of interacting with, say, 30 other tables outside your area of expertise is minimal. I'd vote for writing the queries that do the complex joins. After all, other than knowing the exact columns in the table, which should be easy, all you really need to know is any special relationships or special meanings that those tables use.

But the other fact that might make the decision for me is what are the performance requirements? If the system is currently fast enough, and the hardware and table sizes are such that it will still be fast enough using separate queries for each table, go ahead and do it using the 2nd approach if it makes everyone's life easier. But on the other hand, if anyone has ever complained about speed, or if the tables are likely to grow without bound in the future, or your number of users is likely to increase, do it using the method which has the best hope of being faster.

RESPONSE TO COMMENT: Not necessarily. Say the product module added 5 new columns. Either you were not using those already, and therefore not joining to them or retrieving them, and would not be affected regardless of which method you chose, or you need them and have to write new code for how to deal with them. At any rate, the changes to your side would be the same whichever method you chose. Say the Product module renamed or deleted 3 columns. Then your side would have to change how it handled the returned values, independent of how the interface was written. I see what you are getting at, but the bottom line, IMHO, is that if you use fields involved in the change, you have to make code changes. If the fields are not used by you, you don't. The "ripple affect" you mention should not exist if you only fetch columns you want, as opposed to using select * from tbl queries.

MJB
I see, but what if the Product module decided to modify its schema ? wouldn't the resulting ripple effect devastate many modules ? If, on the other hand, all other modules access products using the interface provided by the Product module, then nothing would go wrong (I hope! ;-) )
ashy_32bit
@ashkan.kh: My response was too long, so I added it above in the answer.
MJB
A: 

What about working with views here?

Instead of calling the get_product_info function, make every module maintainer provide views to that module, such as a product_info_view, and then use this view with your query. Like this you do not have to be conserned about the internals (tables) of such view but will still get the performance advantage, as the datbase engine will simplify the final query containing your code and the view.

inflagranti
Views appear to be the correct way of doing this. But I'm rather interested to know how these things are done in professional products. I suppose such basic facet of development should have been already well investigated by experts on the field.
ashy_32bit