views:

387

answers:

4

Hi

I have a stringlist with 10 000 entires. I have a shuffle routine, but accessing any of the items is taking a lot of time. Going thtought all 10k items takes a huge amount of time.

I want to save it do disk and then do a shuffle on the file using another method.

Any suggestions?

Any source code welcome, Delphi preferable.

Thanks

+1  A: 

Rearranging a stringlist in memory is slow, so I'd shuffle an index list as an initial optimization.

I'm guessing you chose stringlist for the convenience of loading from and saving to disk. One quicker approach would be to shuffle an index. Make an array of 10,000 integers, shuffle those, then use a temporary string variable to hold the swap element and rearrange your stringlist from top to bottom using the shuffled index values.

Major rewrites will provide greater improvements, but this may help if your strings aren't too big.

Argalatyr
I do use a TStringList, but the problem is accessing anything in it takes a long time, so just scanning from top to bottom takes minutes.It's just a list of numbers, so I can probably load it into a array and shuffle it.I love the TStringList features though, is there perhaps a an indexed TStringList or something?
Hein du Plessis
+5  A: 

How is your shuffle-routine implemented? Especially the exchange-routine? If you have written your own, along these lines:

vTempSrting := vStringList[I]; 
vStringList.Delete(I); 
vStringList.Insert(J,vTempString);

it will be very slow. Use the exchange-method on the stringlist.

This code took 78 ms on my pretty average (3 year old) computer:

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils,Classes,uIntegerList,Windows,Math;

procedure Shuffle(aSL : TStringList);
var I,J : integer;
begin
  for I := 0 to aSL.Count-1 do
  begin
    J := randomrange(I,aSL.Count);
    aSL.Exchange(I,J);
  end;
end;

procedure CreateTestFile;
var
  vSL : TStringList;
  I : integer;
begin
  vSL := TStringList.Create;
  try
    for I := 1 to 100000 do vSL.Add('Sample text #'+inttostr(I));
    vSL.SaveToFile('c:\test.txt');
  finally
    vSL.Free;
  end;
end;

function TestShuffle : longword;
var
  vSL : TStringList;
  vTick0 : longword;
begin
  vSL := TStringList.Create;
  try
    vTick0 := gettickcount;
    vSL.LoadFromFile('c:\test.txt');
    Shuffle(vSL);
    vSL.SaveToFile('c:\test.txt');
    Result := gettickcount - vTick0;
  finally
    vSL.Free;
  end;
end;

begin
  CreateTestFile;
  writeln(TestShuffle,' ms');
  readln;
end.
Svein Bringsli
Wow thanks for the code svein!I do use exchange, but I've just relized my problem - the Strings is passed to my shuffeling procedure from a Memo - and obviously with each update, the Memo has to update 10 000 visual items!!I now use an intermediary AstrinList, sort that, and then assign it back to the memo: aStringLIst:= TStringList.Create; aStringList.Assign(mNumbers.Lines); Shuffle(aStringList); mNumbers.Lines.Assign(aStringList); aStringList.Free;It's instant!Many thanks
Hein du Plessis
Wow the comments really messed up my code, sorry hope you can make it out.
Hein du Plessis
No problem, I can read it :-)
Svein Bringsli
To avoid the visual updates use BeginUpdate and EndUpdate: mNumbers.Lines.BeginUpdate; Shuffle(mNumbers.Lines); mNumbers.Lines.EndUpdate;
jasonpenny
+1  A: 

An easy way is to generate a list of random numbers, sort it, and then do pairwise swaps of data later. Sorting can be done as an o(n*log(n)) algorithm, whereas swapping always is an o(n) algorithm, thus much faster.

Just in case you haven't thought of it, consider to leave the data as it is, and just save an extra shuffled index.

Lars D
+1  A: 

I asked a question before about creating a shuffled range - rather than generating a list of numbers and then shuffling them, I wanted a function that was able to iteratively return a list of shuffled numbers, without the O(n) memory cost:

http://stackoverflow.com/questions/464476/generating-shuffled-range-using-a-prng-rather-than-shuffling

If you create some kind of index for your file on disk, then you can create a shuffled version without paying the memory cost, which can be important for very large files. For an index, I suggest something simple, like a flat stream of the positions (as 32 or 64-bit integers) of every line start. That way, to extract the Nth line out of the text file, you can simply seek in the index stream to N*4 (or N*8 for 64-bit indexes) to discover the offset of the line start, and then seek to that position in the text file stream and read out a line.

Using this approach, you can shuffle extremely large files, without paying the in-memory cost. Of course, shuffling will mean extracting lines at random from the source file, which will not be as efficient as in-memory sorting unless the file is very small (fits in cache almost on first access) or very large (in which case memory thrashing will be worse than random seeks), or perhaps if you're not using a mechanical hard drive (e.g. SSD).

For your situation, 10K is really not a large number. Something in the region of 10 million lines, perhaps getting into several gigabytes of text (depending on line length of course), will be far more challenging, and that's where this approach (or something similar) would be necessary in 32-bit.

Barry Kelly
Thanks Barry, though I found my error to be the visual update of the control with the shuffle function. Interesting approach though!
Hein du Plessis