views:

105

answers:

3

Ok heres the problem, I have 5 different tables created by god knows who each with different data, an dI need to merge them in a single table.

My first problem is that each day have a time stamp but I only need the last records add each day, so how can I search each day for the last records and save them on a new table, most of this tables are big around 400 million records.

The following query return me the last register, but I need to search for each day, last register (as someone mess with the job and it did it each 15 minutes for each day making a lot of duplicate data).

The range of the days is from 1 of january 2006 to current day (depending on the table) as some go up to 2008.

So far I have this.

SELECT * FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]
        WHERE dExD_Fecha = (SELECT MAX(dExD_Fecha)
        FROM dbo.Existencia_WH_PRISM_BACKUP)

This bring me a result like

NombreEntidad   dExD_Fecha  LDWHSE  LDLOCN  LDRESC  LDLOTN  LDGRDE  LDLOCQ  LDUMSR  LDRSCL  LDRSSC  LDQRUM  LDRCUM  LDPOTF  LDREVL  LDCLCD  LDOOIN  LDAVPL  LDOHIN  LDDVIN  LDDWIP  LDQYRM  LDQYRS  LDDLRC  LDTLRC  LDTRAN  LDFIFO  LDETD   LDETT   LDEXPD  LDVNNO  LDPOSQ  LDDTMT  LDACTP  LDZONE  LDAVPT  LDRTND  LDCDDT  LDLSQY  LDAGE1  LDAGE2  LDAGE3  LDAGE4  LDAGE5  LDAMIR  LDRCLS  LDPTAW  LDCSPK  LDQYOU  LDBLNK
HDZALM  2010-05-28 12:01:00.000 1T  A04 405120  K0146       864.000000  CJ  PT  40  864.000000  CJ  0.000000        STOC    N   2   Y   Y   N   0.000000    0.000000    1100527 75934   6659304 1100527 1100527 61504   1110527     5   1100527 N   A   Y           0.000000    9999999 9999999 9999999 9999999 9999999 N   Y   Y   N   0.000000    
HDZALM  2010-05-28 12:01:00.000 1T  A04 405120  K0147       1944.000000 CJ  PT  40  1944.000000 CJ  0.000000        STOC    N   2   Y   Y   N   0.000000    0.000000    1100527 120112  6665777 1100527 1100527 120051  1110527     5   1100527 N   A   Y           0.000000    9999999 9999999 9999999 9999999 9999999 N   Y   Y   N   0.000000    
HDZALM  2010-05-28 12:01:00.000 1T  A05 405120  K0146       2052.000000 CJ  PT  40  2052.000000 CJ  0.000000        STOC    N   2   Y   Y   N   0.000000    0.000000    1100527 54402   6658261 1100527 1100527 54146   1110527     5   1100527 N   A   Y           0.000000    9999999 9999999 9999999 9999999 9999999 N   Y   Y   N   0.000000    
HDZALM  2010-05-28 12:01:00.000 1T  A05 405120  K0147       2160.000000 CJ  PT  40  2160.000000 CJ  0.000000        STOC    N   2   Y   Y   N   0.000000    0.000000    1100527 153911  6671885 1100527 1100527 153714  1110527     5   1100527 N   A   Y           0.000000    9999999 9999999 9999999 9999999 9999999 N   Y   Y   N   0.000000    
HDZALM  2010-05-28 12:01:00.000 1T  A06 405120  K0146       4212.000000 CJ  PT  40  4212.000000 CJ  0.000000        STOC    N   2   Y   Y   N   0.000000    0.000000    1100527 43743   6657177 1100527 1100527 43625   1110527     5   1100526 N   A   Y           0.000000    9999999 9999999 9999999 9999999 9999999 N   Y   Y   N   0.000000    
+2  A: 

You did not mention what version of SQL Server (nor many details about how you are currently doing it), however if it is SQL Server 2005+ you do:

With NumberedData As
    (
    Select ...
        , Row_Number() Over ( Partition By DateDiff(d, 0, E.dExD_Fetcha) 
                              Order By E.dExD_Fetcha Desc ) As Num
    From PDFP.dbo.[Existencia_WH_PRISM_BACKUP] As E
    )
Select ...
From Data
Where Num = 1
Thomas
I am kinda lost with Thomas answer, at the moment I just want to delete the bad data on the database later I would merge them, for what I gather "WITH DATA" I am creating a unision without the needed of use joins.
Enrique
@Enrique - I'm using a common-table expression. I happened to have named my CTE "Data". I have revised my post to provided a better name for the CTE (in this case "NumberedData"). In short, I'm having the system number every row, such that it restarts numbering on each day and such that the latest time has the lowest number (i.e. 1 for the latest time, 2 for the next latest etc.).
Thomas
A: 

I like the solution Thomas posted and would go with that since a query without having to use a cursor is pretty much always preferable to one that uses one. (SQL is built for SET logic so it's always going to be more efficient doing what it does best.)

You also said that you needed to get potentially multiple rows per day so long as they were all stored at the same time. The CTE expression as written in the Thomas solution gives you the last single one only which may suffice. Otherwise you could easily modify his solution.

Below is a cursor solution since you had curiosity about them. I provide it with the advice that you should not use it but you should know that it is possible.

You would, of course, use a real table rather than a table variable as below if you wanted to keep the results for later analysis.

DECLARE @t1 TABLE ( DtTm datetime )

DECLARE @Dt datetime        
DECLARE c CURSOR FOR 
    SELECT CONVERT(datetime, CONVERT(char(11), dExD_Fecha, 113))
    FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]
    GROUP BY CONVERT(char(11), dExD_Fecha, 113)

OPEN c
FETCH NEXT FROM c INTO @Dt
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO @t1 
    SELECT dExD_Fecha -- add any other fields you care to capture here and above to the table definition
    FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]
    WHERE dExD_Fecha = (
        SELECT MAX(dExD_Fecha)
        FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]
        WHERE CONVERT(datetime,CONVERT(char(11), dExD_Fecha, 113)) = @Dt)
    FETCH NEXT FROM c INTO @Dt
END

CLOSE c
DEALLOCATE c

SELECT * FROM @t1
Tahbaza
A: 

Try this:

SELECT e.*
FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP] e
INNER JOIN (
  SELECT MAX(dExd_Fecha) AS dExd_Fecha_Max
  FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]
  GROUP BY DATEADD(DD, DATEDIFF(DD, 0, dExd_Fecha), 0) 
) m
  ON e.dExD_Fecha = m.dExd_Fecha_Max
ORDER BY e.dEXd_Fecha

The idea here is to create a subquery of the maximum dates grouped by day and then join against the main table.

If you want to identify the values that fall outside that range in order to delete them, you can do a left outer join against the set:

/** DELETE FROM x **/
SELECT * 
FROM (
  SELECT 
    dExd_Fecha,
    YEAR(dExd_Fecha) AS dExd_Ano, 
    MONTH(dExd_Fecha) AS dExd_Semana, 
    NTILE(10) OVER (ORDER BY dExd_Fecha) AS dExd_Groupo  
  FROM [Existencia_WH_PRISM_BACKUP]
) x 
LEFT OUTER JOIN ( 
  SELECT MAX(dExd_Fecha) AS dExd_Fecha_Max 
  FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP]  
  GROUP BY DATEADD(DD, DATEDIFF(DD, 0, dExd_Fecha), 0)   
) y 
ON x.dExd_Fecha = y.dExd_Fecha_Max
WHERE 
  x.dExd_Ano = 2010
  AND x.dExd_Semana = 1
  AND y.dExd_Fecha_Max IS NULL
ORDER BY x.dExd_Fecha

This SELECT statement uses YEAR and MONTH values in the subquery in order to do a batch-processing approach and select / delete records in smaller amounts.

The original answer is below:

SELECT e.* 
FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP] e 
INNER JOIN ( 
  SELECT 
    MAX(dExd_Fecha) AS dExd_Fecha_Max,

    /** get the year of the last date **/
    YEAR(MAX(DATEADD(YY, DATEDIFF(YY, 0, dExd_Fecha), 0))) AS dExd_Ano_Max,

    /** get the month of the last date **/
    MONTH(MAX(DATEADD(MM, DATEDIFF(MM, 0, dExd_Fecha), 0))) AS dExd_Mes_Max,

    /** get the week of the last date **/
    DATEPART(WEEK, MAX(dExd_Fecha)) AS dExd_Semana_Max,

    /** set a number to divide the total rows into ten groups **/
    NTILE(10) OVER (ORDER BY MAX(dExd_Fecha)) AS dExd_Groupo 

  FROM [PDP].[dbo].[Existencia_WH_PRISM_BACKUP] 
  GROUP BY DATEADD(DD, DATEDIFF(DD, 0, dExd_Fecha), 0)  
) m 
  ON e.dExD_Fecha = m.dExd_Fecha_Max 
WHERE 
  m.dExd_Ano_Max = 2010
  AND m.dExd_Mes_Max = 1
8kb
A quick question here, in case I don´t want to process everything on 1 go, how can I set intervals like 1 week or 1 month? or a number or registers.
Enrique
Yes, this can be done with new columns in the subquery. Edits above.
8kb
One last question, in case I want to delete those not equal to the search result? I get a error in the comparation
Enrique
Interesting. In that case, using the max values in the original subquery won't work. But it can be accomodate. Pls see new edit.
8kb