views: 272
answers: 6

Which is faster for millions of records: Permanent Table or Temp Tables?

I have to use it for only 15 million records. After processing is complete, we delete these records.

+1  A: 

A permanent table is faster if the table structure is to be 100% the same, since there is no overhead for allocating space and building the table.

A temp table is faster in certain cases (e.g. when you don't need the indexes that are present on the permanent table, which would slow down inserts/updates).
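For illustration only (table, column and index names here are made up), the two options look like this: the permanent table already carries its indexes, while the temp table can be created bare for a fast load.

-- Hypothetical permanent staging table with an index that every insert must maintain
CREATE TABLE dbo.RecordStaging (
    Id      INT          NOT NULL,
    Payload VARCHAR(200) NULL
);
CREATE INDEX IX_RecordStaging_Id ON dbo.RecordStaging (Id);

-- Hypothetical temp table with no indexes, so the bulk load has less to do
CREATE TABLE #RecordStaging (
    Id      INT          NOT NULL,
    Payload VARCHAR(200) NULL
);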

DVK
+1  A: 

Temp tables are in memory (unless they're too big), so in theory they should be REALLY fast, but in practice they usually aren't. As a rule of thumb, stay away from temp tables unless they are the only solution. Can you give us some more information about what you're trying to do? It could probably be done with a derived query.
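Something along these lines, for example (table and column names are made up), aggregating in place instead of materializing a temp table:

-- Hypothetical derived table: the inner query replaces the temp table
SELECT d.CustomerId, d.Total
FROM (
    SELECT CustomerId, SUM(Amount) AS Total
    FROM dbo.Orders
    GROUP BY CustomerId
) AS d
WHERE d.Total > 1000;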

Infinity
Table variables are stored in memory, not temp tables.
ManishKumar1980
I didn't see that the question is about MSSQL. In MySQL you can declare a temporary in-memory table: `CREATE TEMPORARY TABLE test ENGINE=MEMORY`
Infinity
A: 

A permanent table is faster than a temp table in most cases.

Have a look at: http://www.sql-server-performance.com/articles/per/derived_temp_tables_p1.aspx

anishmarokey
I can't use derived tables...
ManishKumar1980
+5  A: 

In your situation we use a permanent table called a staging table. This is a common method with large imports. In fact, we generally use two staging tables, one with the raw data and one with the cleaned-up data, which makes researching issues with the feed easier (they are almost always the result of the new and varied ways our clients find to send us junk data, but we have to be able to prove that). Plus you avoid issues like having to grow tempdb, or causing problems for other users who want to use tempdb but have to wait while it grows for you, etc.
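As a rough sketch (all names here are made up), the raw table takes the feed exactly as it arrives and the clean table holds the typed, validated rows:

-- Raw staging: everything lands as text so bad feed data can't fail the load
CREATE TABLE dbo.Staging_Raw (
    RowId    INT IDENTITY(1,1) NOT NULL,
    Col1     VARCHAR(100) NULL,
    Col2     VARCHAR(100) NULL,
    LoadedAt DATETIME     NOT NULL DEFAULT (GETDATE())
);

-- Clean staging: typed columns, populated from the raw table after validation
CREATE TABLE dbo.Staging_Clean (
    RowId INT      NOT NULL,
    Col1  INT      NOT NULL,
    Col2  DATETIME NULL
);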

You can also use SSIS and skip the staging table(s), but I find the ability to go back and research without having to reload a 50,000,000-row table is very helpful.

HLGEM
SSIS is probably the best solution
Remus Rusanu
+1 for pointing out the added benefit of seeing the staged data in the event of an error -- "You can also use SSIS and skip the staging table(s), but I find the ability to go back and research without having to reload a 50,000,000-row table is very helpful."
Mayo
A: 

I personally would use a permanent table and truncate it before each use. In my experience it is easier to understand/maintain. However, my best advice to you is to try both and see which one performs better.
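For example (table and column names are made up):

-- Hypothetical staging table: clear it, then reload it for this run
TRUNCATE TABLE dbo.ImportStaging;

INSERT INTO dbo.ImportStaging (Id, Payload)
SELECT Id, Payload
FROM dbo.SourceFeed;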

Mayo
This will work only if the process is a singleton and there is no chance of any other process starting up in the meantime and also requiring use of that table. We have processes that import lots of data, and we wouldn't be able to truncate a single table because multiple processes could be running at the same time.
Aaron Bertrand
You could address that by using a permanent table with a unique column to identify the import process working with a particular set of data. We have these for user-driven, file-based imports (as opposed to a nightly batch, where truncate works fine). You might also want a cleanup process to keep the table's size in check.
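Roughly like this (all names are invented):

-- Hypothetical: tag every row with the ImportId for this run
DECLARE @ImportId UNIQUEIDENTIFIER;
SET @ImportId = NEWID();

INSERT INTO dbo.ImportStaging (ImportId, Id, Payload)
SELECT @ImportId, Id, Payload
FROM dbo.SourceFeed;

-- process only WHERE ImportId = @ImportId, then clean up afterwards
DELETE FROM dbo.ImportStaging
WHERE ImportId = @ImportId;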
Mayo
+1  A: 

If you don't use tempdb, make sure the recovery model of the database you are working in is not set to "Full". This will cause a lot of overhead on those 50M row inserts.

Ideally, you should use a staging database, simple recovery model, on RAID 10 if possible, and size it ahead of time to provide enough space for all your operations. Turn auto-grow off.
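Something like this, with placeholder names, paths and sizes:

-- Placeholder database name, file paths and sizes; pre-sized with auto-grow disabled
CREATE DATABASE StagingDB
ON PRIMARY (NAME = StagingDB_data, FILENAME = 'D:\Data\StagingDB.mdf',
            SIZE = 50GB, FILEGROWTH = 0)
LOG ON     (NAME = StagingDB_log,  FILENAME = 'E:\Log\StagingDB.ldf',
            SIZE = 10GB, FILEGROWTH = 0);

ALTER DATABASE StagingDB SET RECOVERY SIMPLE;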

Use INSERT ... WITH (TABLOCK) so the insert can be minimally logged instead of logging every row:

INSERT INTO StagingTable WITH (TABLOCK) (.....)
SELECT .....

Likewise for BULK INSERT. If you drop and recreate, create your clustered index prior to insert. If you can't, insert into one table first, then insert from that into another table with the right clustering, and truncate the first table. Avoid small batch sizes on BULK INSERT if possible. Read the BULK INSERT documentation closely, as you can sabotage performance with the wrong options.
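For example (the file path, terminators and table name are hypothetical; check the options against the documentation for your version):

-- TABLOCK allows minimal logging under the simple recovery model set up above;
-- leaving BATCHSIZE out loads everything as a single batch rather than many small ones
BULK INSERT dbo.StagingTable
FROM 'D:\Feeds\import.txt'
WITH (
    TABLOCK,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
);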

Avoid INSERT ... EXEC. Every row is logged.

Avoid UPDATEs, unless you need to calculate running totals. Generally, it is cheaper to insert from one table into another, and then truncate the first table, than to update in place. Running total calculations are the exception, since they can be done with an UPDATE and variables to accumulate values between rows.
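That running-total pattern usually looks something like this (column names are made up, and the accumulation order depends on how the rows are scanned, so treat it as a sketch rather than a guaranteed technique):

-- Hypothetical columns: accumulate Amount into RunningTotal row by row
DECLARE @Total MONEY;
SET @Total = 0;

UPDATE dbo.StagingTable
SET @Total = RunningTotal = @Total + Amount;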

Avoid table variables for anything except control structures, since they prevent parallelization. Do not join your 50M row table to a table variable, use a temp table instead.
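For example (names are invented), put the smaller set in a temp table before joining it to the big table:

-- A temp table (not a table variable) so the join gets statistics and can go parallel
CREATE TABLE #ActiveClients (ClientId INT NOT NULL PRIMARY KEY);

INSERT INTO #ActiveClients (ClientId)
SELECT ClientId
FROM dbo.Clients
WHERE IsActive = 1;

SELECT s.*
FROM dbo.StagingTable AS s
JOIN #ActiveClients AS a ON a.ClientId = s.ClientId;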

Don't be afraid of cursors for iteration. Use cursor variables, and declare them with the STATIC keyword against low-cardinality columns at the front of the clustered index. Use this to slice big tables into more manageable chunks.
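A sketch of that approach (table and column names are made up), slicing the big table by a low-cardinality column at the front of the clustered index:

-- STATIC cursor variable over the distinct slice values
DECLARE @RegionId INT;
DECLARE @cur CURSOR;

SET @cur = CURSOR STATIC FOR
    SELECT DISTINCT RegionId FROM dbo.StagingTable;

OPEN @cur;
FETCH NEXT FROM @cur INTO @RegionId;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- work on one manageable chunk at a time (target table is hypothetical)
    INSERT INTO dbo.FinalTable (RegionId, Col1)
    SELECT RegionId, Col1
    FROM dbo.StagingTable
    WHERE RegionId = @RegionId;

    FETCH NEXT FROM @cur INTO @RegionId;
END

CLOSE @cur;
DEALLOCATE @cur;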

Don't try to do too much in any one statement.

Peter
Very nice and satisfactory answer. Thanks for all the help.
ManishKumar1980