views:

293

answers:

5

I have some 20 excel files containing data. all the tables have same columns like id name age location etc..... each file has distinct data but i don't know if data in one file is again repeated in another file. so i want to join all the files and the result st should contain distinct values. please help me out with this problem as soon as possible. i want the result set to be stored in an access database.

A: 

In SQL, you can use JOIN or NATURAL JOIN to join tables. I would look into NATURAL JOIN since you said all tables have the same values.

After that you can use DISTINCT to get distinct values.

I'm not sure if this is what you're looking for though: your question asks about excel but you've tagged it with SQL.

Goose Bumper
+1  A: 

I would recomend either linking the sheets in acces, or importing the sheets as tabels.

Then from there try to determine using a DISTINCT select from the tables/sheets the keys required, and only selecting the records as required.

astander
thanks for your reply but what i am looking is i want all the data from two files without repetition. i mean id name loc 1 a hi 2 b hu 3 c byid name loc1 d xy2 b hu3 h bji want id name loc1 a hi2 b hu3 c by4 d xy5 h bj
abhi
A: 

If you can use all the tables in one query, you can use a union to get the distinct rows:

select id, name, age, location from Table1
union
select id, name, age, location from Table2
union
select id, name, age, location from Table3
union
...

You can insert the records directly from the result:

insert into ResultTable
select id, name, age, location from Table1
union
....

If you only can select from one table at a time, you can skip the insert of rows that are already in the table:

insert into ResultTable
select t.id, t.name, t.age, t.location from Table1 as t
left join ResultTable as r on r.id = t.id
where r.id is null

(Assuming that id is a unique field identifying the record.)

Guffa
Hi guys thanks for you support but this is what i am having problem with i have 20 excel files. each file containing distinct data so if we do any operations on the table there is no problem. but if i try to do union with other table then it show like this id name location etc etc....1 xyz bhu 1 abc hud2 bhk
abhi
thanks for your reply but what i am looking is i want all the data from two files without repetition. i mean id name loc1 a hi 2 b hu 3 c by id name loc 1 d xy 2 b hu 3 h bj i want id name loc 1 a hi 2 b hu 3 c by 4 d xy 5 h bj
abhi
If I understand you right (after reading your comments several times), you should just omit the id from the Excel files altogether so that the records are distinct on rest of the data, and have a counter field in the Access database that creates new id:s for the records as you add them.
Guffa
A: 

It seems the unique set of data you want is this:

SELECT T1.name, T1.loc
  FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
       ].[Sheet1$] AS T1
UNION 
SELECT T1.name, T1.loc
  FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
       ].[Sheet1$] AS T1

...but that you then want to arbitrarily apply a sequence of integers as id (rather than using the id values from the Excel tables).

Because Access Database Engine does not support common table expressions and Excel does not support VIEWs, you will have to repeat that UNION query as derived tables (hopefully the optimizer will recognize the repeat?) e.g. using a correlated subquery to get the row number:

SELECT (
        SELECT COUNT(*) + 1
          FROM (
                SELECT T1.name, T1.loc
                  FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
                       ].[Sheet1$] AS T1
                UNION 
                SELECT T1.name, T1.loc
                  FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
                       ].[Sheet1$] AS T1
               ) AS DT1
         WHERE DT1.name < DT2.name
       ) AS id, 
       DT2.name, DT2.loc
  FROM (
        SELECT T2.name, T2.loc
          FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
               ].[Sheet1$] AS T2
        UNION 
        SELECT T2.name, T2.loc
          FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
               ].[Sheet1$] AS T2
       ) AS DT2;

Note:

i want the result set to be stored in an access database

Then maybe you should migrate the Excel data into a staging table in your Access database and do the data scrubbing from there. At least you could put that derived table into a VIEW :)

onedaywhen
A: 

IF all files in same format you can use Advanced ETL processor to de-duplicate the data.

Watch those tutorial to get started

http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/online-tutorial.html

Any solution which uses Excel ODBC driver or Jet or OLE Db does not work, eventually you realise that it is not possible to load the data without modifying Excel file.

IMEX=1 will not help

ETL Man