views:

1131

answers:

3

Hi

I have a ClientDataSet with the following data

IDX   EVENT BRANCH_ID   BRANCH
1     E1  7 B7
2     E2  5 B5
3     E3  7 B7
4     E4  1 B1
5     E5  2 B2
6     E6  7 B7
7     E7  1 B1

I need to transform this data into

IDX   EVENT BRANCH_ID   BRANCH
1     E1  7 B7
2     E2  5 B5
4     E4  1 B1
5     E5  2 B2

The only fields of importance are the BRANCH_ID and BRANCH and BRANCH_ID must be unique

As there is a lot of data I do not what two have copies of it.

QUESTION: Can you suggest a way to transfrom the data using a Cloned version of the original data ?

+1  A: 

Cloning won't allow you to actually change data in a clone and not have same change reflected in the original, so if that's what you want you might rethink the cloning idea.

Cloning does give you a separate cursor into the clone and allows you to filter and index (i.e. order) it independently of the master clientdataset. From the data you've provided it looks like you want to filter some branch data and order by branch_id. You can accomplish that by setting up a new filter and index on the clone. Here's a good article that includes examples of how to do that:

http://edn.embarcadero.com/article/29416

Taking a second look at your question, seems like all you'd need to do would be to set up a unique index on branch_id on the cloned dataset. Linked article above has info on how to set up index; check docs on clientdataset.addindex function for more details and info on setting the index to show only unique values, if I recall it may just mean you set branch_id as the primary key.

Herbert Sitz
+1  A: 

I can't think of a slick way to do this, but you could index on BRANCH_ID, add an fkInternalCalc boolean field to your dataset, then initialize that field to True on the first row of each branch (using group state or manually) and then filter the clone on the value of the field. You'd have to update the field on data changes though.

I have a feeling that a better solution would be to have a master dataset with a row for each branch.

Jozz
A: 

You don't provide many details about your use case so I'll try to give you some hints:

  1. "A lot of data" suggests that you might have it from a SQL backend. Using a 'SELECT DISTINCT...' or 'SELECT ... GROUP BY BRANCH_ID' (or similar syntax depending on what SQL backend do you) will give the desired result with ease and speed. Please confirm and I'll give you more details.

  2. As the others said a simple 'clone' wouldn't work. The most simpler (and perhaps quicker) sollution, assuming that ussually the brances are few in number WRT to data is to have an index outside of your dataset. If really you want to filter your original data then add a status field (eg boolean) on your data and put a flag (eg. 'True') on the first occurence.

PseudoCode: (Let's asume that: your ClientDataSet is cds1 your cds1 have a status field cds1Status (boolean) - this is optional, needed only if you want to sort/filter/search the cds1 you have a lIndex which is a TStringList)

lIndex.Clear;
lIndex.Sorted:=True; 

with cds1 do
try
  DisableControls;
  First;
  while not Eof do //scan the dataset
  begin
    cVal:=cds1Branch_ID.AsString;
    Edit; //we anyway update the Status field
    if lIndex.Find(cVal, nDummy) then //nDummy - we don't use it. 
    begin //already in index
      cds1Status.AsBoolean:=False; //we say here "No, isn't the 1st occurence"
    end
    else
    begin //Not found! - Well, let's add it...
      lIndex.Append(cVal); //update the index 
      cds1Status.AsBoolean:=True; //mark the first occurence
    end;
    Post; //save the changes in the status field
    Next;
  end; //scan
finally
  EnableControls; //housekeeping
end;

//WARNING! - NOT tested. I wrote it from my head but I think that you got the idea...

...Depending on what you try to accomplish (which would be the best thing that you might share with us) and what selectivity do you have on BRANCH_ID perhaps the Status engine isn't needed at all. If you have a very low selectivity on that field (selectivity = no. of unique values / no. of records) perhaps it's much faster to have a new dataset and copy there only the unique values rather than putting each record of the original cds in Edit + Post states. (Changing dataset states are costly operations. Especially if your cds is linked to a remote data storage - ie. a server).

hth,

PS: My sollution is intended to be mostly simple. Also you can test with lIndex.Sorted:=False and use lIndex.IndexOf instead of Find. In some (rare) cases is better. Depends on your data. If you want to complicate the things and the speed is really a concern you can implement a full-blown BTree index to do your searces (libraries available). Also you can use the index engine of CDS and index the BRANCH_ID and do many 'Locate' on a clone but because your selectivity is clearly < 1 scaning the cds's entire index theorethically should be slower that a scan on a unique index, especially if your custom-made index is tailored to your data type, structure, distribuition etc.

just my2c