Hi

I have a huge database, around 1 TB in size. Most of the space is consumed by a table that stores images; the table currently has almost 800k rows.

Server response time has increased. I would like to know which techniques you recommend: partitioning? Or how should I reorganize the table?

Every row is accessed by the image id column, which has the clustered index. Every two days I reorganize the index and every 7 days I rebuild it, but it doesn't seem to be helping.

Any suggestions?

+1  A: 

If the table is clustered by image_id and you always access it by image_id, then the size of the table is irrelevant, and so is the fragmentation (no need to rebuild).

If you see performance decrease, then there must be something else at play. Are you doing range scans? Look in sys.dm_db_index_usage_stats: does the user_scans column differ from 0? If it does, you have queries that do scans.
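As a rough sketch of that check: the T-SQL below reads the seek and scan counters for one table from sys.dm_db_index_usage_stats, and a small Python helper flags the indexes whose user_scans is non-zero. The table name dbo.images, the index names, and the sample counter values are assumptions made up for illustration. Actually running the query needs a SQL Server connection (for example through a DB-API driver), which is not shown here.

```python
from typing import Iterable, List, NamedTuple

# T-SQL to run against the database that holds the table.
# dbo.images is an assumed table name; adjust it to the real one.
USAGE_QUERY = """
SELECT i.name AS index_name, s.user_seeks, s.user_scans, s.user_lookups
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND s.object_id = OBJECT_ID('dbo.images');
"""


class IndexUsage(NamedTuple):
    index_name: str
    user_seeks: int
    user_scans: int
    user_lookups: int


def scanned_indexes(rows: Iterable[tuple]) -> List[str]:
    """Return the names of indexes that have served at least one scan."""
    return [u.index_name for u in (IndexUsage(*r) for r in rows) if u.user_scans > 0]


# Made-up rows in the shape USAGE_QUERY returns, only to exercise the helper.
sample_rows = [
    ("PK_images", 120000, 0, 0),
    ("IX_images_hash", 10, 4200, 0),
]
print(scanned_indexes(sample_rows))
```

Keep in mind that these counters are reset when the SQL Server instance restarts, so they only describe activity since the last startup.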

Unless you measure where the time increase occurs, you'll be shooting in the dark and never solve the problem correctly. Apply a methodical approach, like Waits and Queues, to identify the problem.

One thing I can tell you right now: partitioning is never a performance improvement. It is intended for data maintenance (switch in/switch out) and for spreading the load in a controlled fashion across filegroups. But you can never expect partitioning to improve performance; at best you can hope for performance equal to that of a non-partitioned table.

Remus Rusanu
Thanks, I'll look that up... I thought partitioning would do it.
jgemedina
Any performance improvement partitioning can deliver (via partition elimination), an index can deliver better.
Remus Rusanu
A: 

If the response time is increasing, you must be doing more with this table than just pulling images by id?

What other data columns are stored in your images table?

If you have a clustered index on an id (probably identity), that's fine, but adding a nonclustered index that covers your search criteria will probably help.

Say you also have columns for name or tag or region or whatever in this images table (and assuming you aren't going to vertically partition this table into separate tables). Then a nonclustered index on (tag, id) INCLUDE (name), say, or something else that matches your usage patterns, will help a lot.
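To make the idea of a covering index concrete, here is a minimal sketch that uses SQLite through Python's standard library as a stand-in for SQL Server. The tag and name columns and the index name are hypothetical, and SQLite has no INCLUDE clause, so name is appended as a trailing key column instead. The query plan shows whether a search can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (
        id INTEGER PRIMARY KEY,   -- plays the role of the clustered key
        tag TEXT,                 -- hypothetical search column
        name TEXT,                -- hypothetical column returned by the search
        image_data BLOB
    );
    -- SQLite has no INCLUDE, so name is a trailing key column here.
    CREATE INDEX ix_images_tag_name ON images (tag, name);
""")


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every requested column lives in the index: no visit to the table rows.
covered = plan("SELECT id, name FROM images WHERE tag = 'logo'")
# image_data is not in the index: each match needs a lookup into the table.
not_covered = plan("SELECT id, name, image_data FROM images WHERE tag = 'logo'")

print(covered)
print(not_covered)
```

The first plan reports a covering index while the second has to go back to the table for image_data. On a table full of large image blobs, that difference is what keeps a search from dragging the image bytes through memory.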

Remember: a clustered index is not a separate index; it's just the way the table data is organized. It usually won't help much with searches on other columns. It primarily works well for identity lookups, for reads that touch almost every column, and for streaming data in the order of the clustered index.

Cade Roux
Fetching rows is mainly done by the id column. There's another field called hash, which contains an MD5 file checksum for every image, like an identifier; it is also indexed, but with no included columns. Maybe I should look at what Remus says and find out if I have scans instead of index seeks and, as you are telling me, also check the queries so I can see if my current indexes are set up correctly. Thanks.
jgemedina
@jgemedina Is the hash used for anything? Is that the entire table (id, hash, image)? I really don't see how your performance shouldn't scale - not considering hardware and configuration or file groups etc (which is better left to serverfault.com)
Cade Roux
It contains id, hash, image_data, status, and loaded_date. There's another table called image_attributes which contains name and other properties like dimension_x, dpi, etc. Those tables are sometimes inner-joined, using the id column from the images table and a foreign key in image_attributes called image_id.
jgemedina
The hash is used to check whether any newly loaded image already exists in the table, just that.
jgemedina
If the hash is used to check whether new images already exist, it means it is searched on before every insert, right? If the search is not choosing the index on hash, for whatever reason, you'll end up with end-to-end scans that will trash your memory cache.
Remus Rusanu
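One way to see the difference between a seek and an end-to-end scan on the hash check is the sketch below, again using SQLite from Python's standard library in place of SQL Server. The index name is hypothetical, and wrapping the column in a function is only one possible reason for the index to be skipped, picked here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (id INTEGER PRIMARY KEY, hash TEXT, image_data BLOB);
    CREATE INDEX ix_images_hash ON images (hash);
""")


def access_method(sql, params=()):
    """Return the first word of the query plan detail: SEARCH or SCAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return rows[0][-1].split()[0]


# Plain equality on the indexed column: the planner can seek into the index.
seek = access_method("SELECT id FROM images WHERE hash = ?", ("abc",))
# The column is wrapped in a function, so the index on hash cannot be used
# for a seek and every row has to be examined.
scan = access_method("SELECT id FROM images WHERE lower(hash) = ?", ("abc",))

print(seek, scan)
```

In SQL Server the equivalent check is to look at the actual execution plan of the duplicate check query and confirm it shows an index seek on the hash index rather than a scan.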
thanks i'll check that!
jgemedina