views:

143

answers:

3

Hello everybody

In a scientific application I'm writing (vb.net) I use a huge collection of object stored in a tree structure. Every object exposes a LastAccessed property (datetime) that stores the last time the node was accessed by the algorithm.

I need to find a fast way to find the N least accessed objects in the structure.

I'm using an algorithm like this (pseudo-code)

dim Olist as new list (of myobject)
....
array.sort(Olist,address of compareByDataReverse)
for index=0 to N-1
   dosomething(Olist.item(index))
end

I'm sure there is a better way to do that because the list is normally huge ( consisting of more than 10^6 objects).

Thanks
Pierluigi

+1  A: 

in best case your code will perform with O(nlogn). You should begin with dim Olist as new list (of myobject) then take a LinkedList and go over your collection and add values to your list keeping it sorted. This will be O(n*m)

Andrey
I'll try to use the LinkedList ...thank you very much :)
pierusch
That's called insertion sort and it's O(n^2) - which is worse than O(nlogn) - so it's likely to be slower on datasets that large.
Joe Gauterin
please read carefully. There are two values n - size of ENTIRE collection and m - size of list he want to have. and m is much smaller then n. then reread my answer and your comment.
Andrey
Andrey has explained better than me the situation N, size of the collection is HUGE while M, size of the elements is a short list...in my application I might expect N=10^4m
pierusch
Why not just use a sorted list/array and keep it sorted. Inserts will then take O(log(n)). Retrieval of the first n O(1).It all depends on the number of inserts compared to the number of selects. If there are a lot more selects, keep it sorted.
Carra
@pierusch - Are you sure you mean 10^4m?
Joe Gauterin
@Carra there no sorted arrays with insertion of O(log(n)). they are nlogn, because you first find the value (logn) then insert (n)
Andrey
@gauterin......excuse me..... N~10^3M just for an example in my last running test (N=100000 m=150) but it's not a constant...
pierusch
Fair point, you can do it in O(log(n)) for a sorted list.
Carra
As suggested by Andrey I used a linkedList while keeping it stored... using N=100000 and m=100:1) using my "old" approach 267ms2) using a linked list approach 23mswell.... thaks to everybody......
pierusch
+2  A: 

You mention that you use a tree structure yet you show us a list. If you use a list you can indeed sort it (O(l log(l)) and then take the last n elements in O(1). Or just iterate it and get your last n elements in (O(l)).

You however work with a tree view. You didn't mention how your tree is implemented. You can create a binary search tree with as key your LastAccessed date. Searching for your first N elements can then also be done very fast.

Alternatively you could keep an index. Just create a tree view with as key the LastAccessed date. Inserting or removing an item in your original list should also update your index. You can then quickly retrieve a link to your items based on their date. At the cost of slower inserts/removals.

Carra
you are righ :) what i wrote is a bit confusing becouse I'm not that goog in writing in english! the list a mentioned is just a list of reference to the leaves of the tree.
pierusch
+1  A: 

If you use Selection sort then after N iterations first N elements will be in sorted order.

NOTE:But I am not sure if this solves your problem completely.

TheMachineCharmer
not completely...I mean i'm sure the array.sort(...) function is really well implemented...I just think there is a better way to find the N least accessed elements inside the list WITHOUT sorting the entire list of objects... anyway thanks for your reply :)
pierusch
:) so you can stop Selection Sort after N steps!
TheMachineCharmer
+1 avoids sorting all the data (but may be slower if the number of items to find is very large).
Joe Gauterin
I don't know why but what I think is this should be fine as long as `N^2 < (TotalNumberOfItems)/2`. What do you say?
TheMachineCharmer