views:

499

answers:

8

I am building a system that allows front-end users to define their own business objects. Defining a business object involves creating data fields for that business object and then relating it to other business objects in the system - fairly straight forward stuff. My question is, what is the most efficient storage strategy?

The requirements are:

  • Must support business objects with potentially 100+ fields (of all common data types)
  • The system will eventually support hundreds of thousands of business object instances
  • Business objects sometimes display data and aggregates from their relationships with other business objects
  • Users must be able to search for business objects by their data fields (and fields from related business objects)

The two possible solutions I can envisage are:

  • Have a dynamic schema such that when a new business object type is created a new table is created for storing instances of that object. The object's fields become columns in the storage table.
  • Have a fixed schema where instance data fields are stored as rows in basically a big long table.

I can see pros and cons to both approaches:

  • the dynamic schema allows me to index search columns
  • the dynamic tables are potentially limited in width by the max column size
  • dynamic schemas rule out / cause issues with replication
  • the static schema means less or even no dynamic sql generation
  • my guess is the static schema may perform like a dog when it comes to searching across 100,000+ objects

So what is the best soution? Is there another approach I haven't thought of?

Edit: The requirement I have been given is to build a generic system capable of supporting front-end user defined business objects. There will of course be restrictions on how these objects can be constructed and related, but the requirement itself is not up for negotiation.

My client is a service provider and requires a degree of flexibility in servicing their own clients, hence the need to create business objects.

+1  A: 

With out much understanding of your situation...

Instead of writing a general purpose one size fits all business objects system (which is the holy grail for Oracle/Microsoft/SAS/etc), why not do it the typical way, where the requirements are gathered, and a developer designs and implements the users' business objects in an effective manner.

If your users are typical, they will create a monster, which will end up running slow, and they will hate it. Most users will view the data as an excel sheet, and not understand relationships like: parent/child. As a result there will be some crazy objects built, and impossible to solve reports. You'll be forced to create scripts to manually convert many old objects to better/properly defined ones, etc...

KM
+1  A: 

Your requirements sound a little bit like an associative database with a front end to compose and edit entities.

I agree with KM above, unless you have a very compelling reason not to, you would be better off using a traditional approach. There are a lot of development tools and practices that allow you to build a robust and scalable system. Otherwise you will have to implement much of this yourself.

Alex Peck
+1  A: 

I don't know the best way to do this, because it sounds like something that has already been implemented by others. If I were asked to implement this feature, I would recommend buying a wheel instead of reinventing it.

Perhaps there are reasons you have to invent your own? If so, then you should add those reasons to the requirements you listed.


If you absolutely must be this generic, I still recommend buying a system that has been architected for this requirement. Not just the storage requirements, which are the least of the problems your customer will have; but also: how do you keep the customer from screwing up totally when given this much freedom. Some of the commercial systems already meet this challenge without going out of business because of customers messing up.

If you still need to do this on your own, then I suggest that your requirements (or perhaps those of another vendor?) must include: allow the customer to get it right, and help keep the customer from getting it wrong. You'll need some sort of UI to allow the customer to define these business objects, and the UI should validate the model that the customer builds.

I recommend a UI that works at a conceptual level. As an example, see NORMA, a Visual Studio add-in for Object-Role Modeling (the "other" ORM). Consider it as a example only, if your end users cannot afford a Visual Studio Standard license. Otherwise, you'll find that it is extensible, already produces many types of artifact (from SQL in various dialects to code), and will validate the model to see that it makes sense. End users would also be able to enter sample data that they believe should be valid, and the system will validate the data against the model.

If your customers are producing sensible (if dynamic) business objects, then the question of storage will be much simpler.

John Saunders
john - thanks for providing at least a considered and non sarcastic answer, something you wouldnt think too difficult. something i didnt make clear, users are NOT free to do anything they like - object creation will of course be through a UI and several layers of validation and abstraction from the database. my customers ARE 'producing sensible (if dynamic) business objects' - and how best to store these objects is exactly the question i was asking.
flesh
Thank you. I suggest you edit your question to say this. Otherwise, for the next several years, you'll be seeing new answers that say, "you're crazy". Also, may I presume there's a good reason why you have to do this yourself, and not base it on something commercial, like Siebel or salesforce.com?
John Saunders
A: 

"that answer is even worse. read my original question - what is the better storage approach - dynamic schema or long narrow table?"

That remark says it all. Formally defining/modeling the problem comes before defining the physical storage solution.

If you think that you can solve the problem of "physically storing just anything", or if you think that anyone alive today can assist you in finding the ultimate solution for that problem, then I just wish you the best of success.

You HAVE formally stated that your customer is not prepared to provide you with any details of his problem (AND I QUOTE) :

I am building a system that allows front-end users to define their own business objects. Defining a business object involves creating data fields for that business object and then relating it to other business objects in the system - fairly straight forward stuff. My question is, what is the most efficient storage strategy?

The requirements are: Any business object can have any number fields (of any common data types)

(that is indeed what you said because you did not provide an upper limit on the number of fields)

There can be any number of business object instances

(that is indeed what you said because you did not provide an upper limit on the number of business object instances)

Users must be able to search for business objects by their data fields (and fields from related business objects)

(I'll try and stay as polite as I still can and simply not comment on this one - but you might try and imagine by what else than the data fields users would want to search for.)

A: 

"you're being facile."

I don't think so. Taking the bother to explain to you why your "generic problem" is simply unsovable (at least as far as I can tell) with technology as it stands today, and taking the bother to reply FOUR times, is, believe me, nowhere near "facile".

Throwing a question on a forum like, "I've got this generic problem that no one else has ever solved in almost 50 years of IT science, but now I want one of you guys to solve it for me.", on the other hand, very much qualifies as "facile" to me.

" i know the problem, it has been modelled and it is generic. that's why i asked the question."

And that is precisely why I responded that a JDK is your solution. There may still be other tools around that allow some problem owner to create previously unspecified data types/object types, but a JDK is one of them.

"... users need to define and create data objects - objects that aren't known at design time. it's not a difficult concept (though for some it appears so)"

I agree that the concept isn't difficult. The concept of "nothing at all being known at design time" isn't diffcicult. But solving a problem like "nothing at allbeing known at design time" is. Has already been so for quite a number of years. Especially since "nothing at all", in this case, really seems to mean "anything at all that I, the user, might be thinking of at some time after design time, but not now".

Oh, and by the way, if you don't know your objects at design time, then how do you know that they are objects ?

@Erwin: as I stated in my edited answer, there are systems that solve this problem. Siebel is one that I've used, salesforce.com is another. They both address the question of having a pre-packaged system with built-in functionality that customers can heavily customize. They both permit customization of business objects, UI, and rules, and both even provide a web services UI. The Siebel system I used (4 years ago) required a DBA to make logical db changes, but then end user could map those to the business object level, with rules, etc.
John Saunders
@Erwin: also, this sort of back and forth discussion isn't the way SO works. I note that you're new here, so I'm taking this opportunity to point out the FAQ: http://stackoverflow.com/faq.
John Saunders
@John.By accident, I now notice that I can indeed comment. I thought I couldn't (which is the reason for the back-and-forth you complain about), because normally I don't have javascript enabled, and that seems to be required for commenting, so adding additional answers was the only available for me to respond.You might want to consider removing that javascript requirement.
@Erwin: I don't work here. :-) You should make that request at http://meta.stackoverflow.com. It will be interesting to see the response, as the site depends heavily on JavaScript.
John Saunders
John - your comment about Siebel suggests they have used a dynamic schema solution. I have not used it, but it sounds like a large scale version of what we are doing.
flesh
+3  A: 

I think your problem matches very well to a graph database like Neo4j, as it's built for the requested kind of flexibility from the beginning. It stores data as nodes and relationships/edges, and both nodes and relationships can hold arbitrary properties (in a key/value fashion). One important difference to a RDBMS is that a graph database won't need to lookup the relationships in a big long table (like in your fixed schema solution), so there should be a significant performance gain there. You can find out about language bindings for Neo4j in the wiki and read what others say about it in this stackoverflow thread. Disclaimer: I'm part of the Neo4j team.

nawroth
cheers for the suggestion ..
flesh
+2  A: 

Have you thought about an XML based solution? The requirements suggested to me "Build a system that allows users to dynamically generate an XML Schema and work with XML documents based on that schema." I don't know enough about storing and querying XML documents to comment on your original question.

Another possibility might be to leverage NHibernate's ability to generate database schemas. If you can dynamically generate business objects, then you can generate XML mappings or Fluent mappings and use that to generate a normalized database schema.

Jamie Ide
A: 

Every user that I have ever talked to has always wanted "everything" in their project. Part of the job of gathering requirements is to guide the user, not just write down everything they say.

Your only hope is to build several template objects, that they can add properties to, you could code your app to handle each type of these objects, but allow the user to still slightly modify each as necessary.

you need to inform the user upfront of the major flaws this type of design has. this will help you in the end, when it runs slow, or if they screw up and need help fixing something. I'd put this in writing

how many possible objects would they really need? perhaps you could set these up using your system first. I have developed several very customizable systems over the years and when the user is sitting at an empty screen, it is like a deer in the headlights.

in any event, good luck

racer x