views: 130
answers: 3

I have a table containing user input which needs to be optimized. I have some ideas about how to solve this, but I would really appreciate your input. The table that needs optimization is called Value in the structure below. All tables mentioned below have integer primary keys called Id.
Specs: MS SQL Server 2008, Linq2Sql, ASP.NET website, C#.

The current structure looks as follows:
Page -> Field -> FieldControl -> ValueGroup -> Value

Page

A Page is a container for one or more Fields.

Field

A Field is a container for one or more FieldControls, such as a textbox or the options of a dropdown.
Relationships: PageId

FieldControl

If a Field is of the type 'TextBox', a single FieldControl is created for the Field. If a Field is of the type 'DropDown', one FieldControl is created for each dropdown option, containing the option text.
Relationships: FieldId

ValueGroup

Each time a user fills in the Fields of a Page and saves it, a new ValueGroup (Id) is created to keep track of the user input belonging to that save. When a user wants to look at a previously filled-in form, the ValueGroup is used to load the Values back into the FieldControls of that earlier instance.
Relationships: None

Value

The actual input of a FieldControl. If the user typed 'Hello' in a TextBox, then 'Hello' would be stored in a row in this table, together with a reference back to the FieldControl that 'Hello' was entered into. Each Value is linked to a ValueGroup in order to keep track of which save/instance it belongs to, as described under ValueGroup.
Relationships: ValueGroupId, FieldControlId

The problem

If 100,000 Pages are fully filled in, each containing 10 TextBoxes, we get 100,000 * 10 records in the Value table, meaning we quickly reach one million records, and it is getting really slow as it is now. The user can create as many different Pages with as many different Fields as he/she likes, and all these values are stored in the Value table. The way I use this data is by either displaying a gridview with pagination that shows all records for a single Page type, or by looking at a specific Page instance (Values grouped by ValueGroupId).
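
To illustrate, the two read patterns boil down to something like the following T-SQL. The ValueText column name is just a placeholder for whatever the actual value column is called, and paging is done with ROW_NUMBER since SQL Server 2008 has no OFFSET/FETCH:

    DECLARE @PageId INT = 1;                   -- example parameter values
    DECLARE @ValueGroupId INT = 1;
    DECLARE @StartRow INT = 1, @EndRow INT = 50;

    -- Pattern 1: gridview listing all values for one Page type, paged.
    WITH PagedValues AS
    (
        SELECT v.Id, v.ValueGroupId, v.FieldControlId, v.ValueText,
               ROW_NUMBER() OVER (ORDER BY v.ValueGroupId, v.Id) AS RowNum
        FROM   [Value] v
        JOIN   FieldControl fc ON fc.Id = v.FieldControlId
        JOIN   Field f         ON f.Id  = fc.FieldId
        WHERE  f.PageId = @PageId
    )
    SELECT Id, ValueGroupId, FieldControlId, ValueText
    FROM   PagedValues
    WHERE  RowNum BETWEEN @StartRow AND @EndRow;

    -- Pattern 2: loading one previously saved instance.
    SELECT Id, FieldControlId, ValueText
    FROM   [Value]
    WHERE  ValueGroupId = @ValueGroupId;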

Some ideas that I have:
Good indexing is probably very important when optimizing the Value table.
Should I perhaps add a foreign key directly back to Page from Value, ending up with an index on (Id, PageId, ValueGroupId), allowing the gridview to retrieve only the values that are relevant for one Page?
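
A rough sketch of what I mean, with the PageId backfilled through the existing Value -> FieldControl -> Field chain and an index covering both read patterns (index and constraint names are just examples):

    -- Denormalize PageId onto Value so the gridview can filter by Page
    -- without joining through FieldControl and Field.
    ALTER TABLE [Value] ADD PageId INT NULL;
    GO

    -- Backfill the new column from the existing relationships.
    UPDATE v
    SET    v.PageId = f.PageId
    FROM   [Value] v
    JOIN   FieldControl fc ON fc.Id = v.FieldControlId
    JOIN   Field f         ON f.Id  = fc.FieldId;

    ALTER TABLE [Value] ALTER COLUMN PageId INT NOT NULL;
    ALTER TABLE [Value] ADD CONSTRAINT FK_Value_Page
        FOREIGN KEY (PageId) REFERENCES Page (Id);

    -- One index serving both "list by page" and "load by value group".
    CREATE NONCLUSTERED INDEX IX_Value_PageId_ValueGroupId
        ON [Value] (PageId, ValueGroupId)
        INCLUDE (FieldControlId);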

Should I look into partitioning the table, and if so, how would you recommend that I do it? I was thinking that partitioning by Page, hence getting chunks of values that are only relevant to a certain Page, would be wise in this case, right? How would the script/schema look for something like that, given that Pages can be created/removed at any time by the users?
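
For the partitioning idea, the script below is roughly what I imagine: a range partition function on PageId, with the partitioning column included in the clustered key. The boundary values are arbitrary, ValueText is again a placeholder column name, and as far as I know table partitioning requires the Enterprise edition of SQL Server 2008. The awkward part is that ranges would have to be split/merged as Pages come and go:

    -- Range-partition the values by PageId; each range holds a block of pages.
    CREATE PARTITION FUNCTION pfValueByPage (INT)
        AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);

    CREATE PARTITION SCHEME psValueByPage
        AS PARTITION pfValueByPage ALL TO ([PRIMARY]);

    -- The partitioning column must be part of the clustered key.
    CREATE TABLE ValuePartitioned
    (
        Id             INT IDENTITY(1,1) NOT NULL,
        PageId         INT NOT NULL,
        ValueGroupId   INT NOT NULL,
        FieldControlId INT NOT NULL,
        ValueText      NVARCHAR(4000) NULL,   -- placeholder column name/type
        CONSTRAINT PK_ValuePartitioned PRIMARY KEY CLUSTERED (PageId, Id)
    ) ON psValueByPage (PageId);

    -- New page ranges would then be added over time with something like:
    -- ALTER PARTITION FUNCTION pfValueByPage() SPLIT RANGE (4000);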

PS. There should be a badge on this forum for all people who finish reading this long post, and I hope I've made myself clear :)

A: 

This may be slightly off-topic, but why? Is this data that you need to access in real-time, or is it for some later processing? Could you perhaps pack the data into a single row and then unpack it later?

Tim Almond
The data is needed most intensively when the gridview displays all current page entries. Let's say that the admin creates a Page for 'Companies' with Fields such as Name, Address, Phone etc., and gives salespeople access to add/edit/delete instances of it. Salespeople can then log in, first list all the companies they have created so far, and also create new ones, which are added to the list.
CodeSpeaker
A: 

Generic

You say it is slow now, but there can be many reasons for that other than the database, such as low memory, high CPU, disk fragmentation, network load, socket problems etc. These should show up in a system monitor.

Try, for instance, the Sysinternals (now Microsoft) tool Process Explorer: http://live.sysinternals.com/procexp.exe

But if that is all under control then back to the database.

Database index

One million records is not "that much" and should not be a problem. An index should do the trick if you don't have any indexes right now. You should probably set indexes on all tables if you haven't done so already.
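
If the foreign key columns are not indexed yet, that would be the first thing I would add, since SQL Server does not index foreign key columns automatically. Something along these lines (index names are just examples):

    CREATE NONCLUSTERED INDEX IX_Field_PageId         ON Field (PageId);
    CREATE NONCLUSTERED INDEX IX_FieldControl_FieldId ON FieldControl (FieldId);
    CREATE NONCLUSTERED INDEX IX_Value_ValueGroupId   ON [Value] (ValueGroupId);
    CREATE NONCLUSTERED INDEX IX_Value_FieldControlId ON [Value] (FieldControlId);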

I tried to draw a database model; is this right: http://www.freeimagehosting.net/image.php?a39cf99ae5.png

Table structure (?)

Page -> Field -> FieldControl -> ValueGroup -> Value

The table structure looks like it may not be the optimal one, but it is hard to say exactly without knowing how the application works.

Do all tables have foreign keys to the table above?

Is this somewhat similar to your code?

Pseudo code:

1. Get page info. Gives key "page-id".
2. Get all Fields marked with that "page-id".
   Gives keys "field-id" & "fieldcontrol-id".
3. Loop through all "field-id":s and get the FieldControl for each one.
4. Loop through all "field-id":s and get all ValueGroups.
   Gives a list of "valuegroup-id" keys.
5. Loop through all ValueGroups and get all Values.
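
If the application really loops like that, every step is a separate round trip to the database. A single joined query per page load should return everything at once. A rough sketch, with the value column name assumed since I don't know your exact schema:

    DECLARE @PageId INT = 1, @ValueGroupId INT = 1;   -- example parameters

    -- All field controls of the page plus the stored value (if any)
    -- for the chosen save/instance, in one round trip.
    SELECT f.Id   AS FieldId,
           fc.Id  AS FieldControlId,
           v.ValueText                      -- column name assumed
    FROM   Field f
    JOIN   FieldControl fc ON fc.FieldId = f.Id
    LEFT JOIN [Value] v    ON v.FieldControlId = fc.Id
                          AND v.ValueGroupId   = @ValueGroupId
    WHERE  f.PageId = @PageId;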
Here is what my structure actually looks like: http://freeimagehosting.net/uploads/95d4b3767e.gif
Yes, all tables have FKs to the table above, and they are indexed on their PKs (Id). By adding a non-clustered index on FieldValueGroupId in the Values table, things went a LOT faster. I also noticed a bug in the paging of the gridviews which caused the query to return more rows than were really needed; once this has been optimized, I think the performance should be much better.
CodeSpeaker
+1  A: 

Just to close this post: correct indexing solved all the performance problems.

CodeSpeaker