views:

122

answers:

5

(Note that this situation isn't exactly how it is, but I made this as an example)

I have an entity in a table with data that is updated every 5 seconds (Kinematic Data: Speed, Heading, Lat, Long, and PositionTime), and other data that is updated hardly at all, if ever (Color, Make, OriginTime).

alt text

Now my boss wants me to partition this data into separate tables in our database (With a One to One Relationship), as so:

alt text

He makes it sound "obvious" that it should be this way, but is there really any advantages to having this data separated as so for inserting and updating (For instance if I put an index on Color or Make)?

A: 

Your boss is right. And this has nothing to do with "partitioning", its called normalization.

Read this article.

EDIT: Ok, "vertical partitioning" is a well-known term, and normalization is one method of vertical partitioning. But in this case, normalization seems to be the right answer, which explains the question (Quote: "... is there really any advantages to having this data separated as so for inserting and updating"). Advantages and disadvantages of normalization are very well-known. The wikipedia article is a good starting point.

And BTW, to keep the flames of "Erwin Smout" burning: "vertical decomposition" does not seem to be a commonly used term here. Right?

frunsi
There's nothing in the question to indicate there being more than one row per car in the original design. On the contrary, it speaks of "updating" every few seconds.
calmh
@calmh: I don't understand? Its "obvious", as the question indicates. There is room for interpretation, but to me it looks a lot like the questionier lacks knowledge of the normalization concept (no offense intended).
frunsi
I see there is now a reference to "inserting" at the end, the either wasn't there before or I missed. Removing downvote.
calmh
Now you can remove the downvote ;)
frunsi
+4  A: 

It might make sense to do vertical partitioning like this. Or it might not.

When you use a MVCC based engine, each time you update a row, it generally* copies the entire row and creates a new one with the modifications. This is so that other transactions which do not yet see the update can continue to read the original row if they need to.

This means that updating a few small columns frequently in a very wide row causes the database to do a lot more writes than it needs to.

But not that many, because generally the engine will only sync its transaction log, which will be the same size regardless of the size of non-updated columns, and also because the data rows are usually stored in blocks where a whole block needs to be written anyway, regardless of how much of it changed.

So it sounds like a potentially pointless optimisation, which like any other, should be considered on the grounds of a) IS there really a performance problem (i.e. is ANY optimisation needed) and b) Is this particular optimisation the best way of fixing it?

I think the chance of a) is unlikely, and b) is also unlikely, so the chances of this being required is approximately unlikely-squared.

* Some engines make an exception for very large columns such as big BLOBs or text columns, which are held elsewhere and not copied if other columns in the row are updated.

MarkR
I agree with MarkR. Ask your boss why it needs to be changed. If he tells you because of performance issues, you need to check if there are any performance problems, never ever optimize performance if there's no problem. If you have performance issues you need to partition not normalize. If he tells you it's because of "it just has to be like that", you can ether argue with him about sense and nonsense of db normalization or just change it to his (old school) design. (I would do the last.)
Beffa
A: 

"And this has nothing to do with "partitioning", its called normalization."

ROTFL.

As if "normalization" is about anything else but "partitioning" (or, more accurately, "vertical decomposition").

And yes, of course his boss is indeed "right".

EDIT (in response to the following - and because "add comment" does not work) :

"The term partitioning "usually" has a different in meaning when talking about databases. And you already softened your statement by trying it to give another term.. so what do you want to say?"

The term "partitioning" usually refers to a technique that is applied in the area of physical database design. I expect that you know that this technique actually boils down to a kind of horizontal partitioning ("row-wise partitioning").

The title very clearly mentioned "vertical partitioning". Therefore, the question left very little doubt that the subject was in the area of logical database design. Of course I cannot know whether you can tell the difference with physical database design.

So you are criticizing me for having understood the OP's question correctly, despite of his use of inaccurate terminology.

Oh, yes, that's another thing : I did not "try to give it another term". I gave the correct , and already existing, term for the technique.

Oh, yes, and it's not "has a different IN meaning", it's "has a different meaning".

Erwin Smout
The term partitioning "__usually__" has a different in meaning when talking about databases. And you already softened your statement by trying it to give another term.. so what do you want to say?
frunsi
And please, next time just add a comment, and avoid once-shot accounts.
frunsi
You offend my answer with "ROTFL", and now you complain that I have critized you? What is going on here, what is your point exactly? You left the arguing path before even entering it (in fact your first non-quoted word was ROTFL..). I will not further comment this.
frunsi
+1  A: 

If the point of this design is to maintain a history of the kinematic data then the design makes sense. Although there doesn't seem to be a key in the CAR_KINEMATIC table which fits that usage. If on the other there is a one-to-one relationship between these two tables the division is meritless.

APC
A: 

I'm not sure that the question is fully clear. If you want to main a history of the kinematics then the appropriate structure would be to normalise the data into the car data and the heading data. The car data can be updated independently and would probably be much smaller overall than the kinematics data.

If you want to maintain a flat record with the current state of the car, rather than maintaining a history, then leaving the data as it is is likely to be faster. The reason for this is that writing the whole record will probably entail just a single write operation in the majority of cases. Splitting it out into two tables guarantees that there will be at least two write operations.

In the first case you are just normalising the data; in the second case the current data structure is probably the most efficient.

Vertical partitioning is actually not that commonly used (except when it is, see below). Some scenarios where you might want to use vertical partitioning are:

  • The table is very wide and only some of it is used frequently. For example, if you have a table with 250 columns with 5 getting frequent state changes updated and a small subset of columns being used frequently by the application.

  • For security reasons, you may have a mix of confidential and not-so-sensitive data that lives in a 1:1 relationship. You can move the confidential data into another table with a different set of permissions. Historically, not all DBMS platforms allowed you to set permissions at column level.

  • A combination of the previous two, where changes to certain fields must be logged to an audit table, but other fields are updated very frequently without the requirement for logging. In order to avoid generating lots of spurious audit logging data the auditable fields can live in their own table with audit logging triggers.

Finally, you do get vertical partitioning behind the scenes in certain circumstances (i.e. it is not explicit in the schema but the physical storage works in this way). For example, many DBMS platforms store LOBs separately from normal table data, which results in a sort of implicit vertical partitioning of the table.

In fact, this particular situation makes tables with LOB columns quite expensive to do operations on, so moving the LOB column off into a separate table might well be a good application for vertical partitioning.

There are not many uses for vertical partitioning, and it always adds the overhead of additional I/O. You need to be avoiding a large overhead or have specific reasons such as security concerns for there to be much point in using it.

ConcernedOfTunbridgeWells