views:

871

answers:

5

I need to implement a large collection of Widget objects, each of which contain a unique file path string ("FilePath"). I need to be able to do the following:

  1. Retrieve a Widget object quickly given the file path
  2. Change the file path of a Widget without creating a new object (multiple other objects may contain references to a single Widget, and tracking them down would impact performance)
  3. Given a Widget reference, determine it's file path

I first thought of using a generic SortedList using the file path as a key, but duplicating the path for many thousands of objects could quickly eat up memory. I considered removing the path from the object and only storing it in the list of keys, but that would make requirement 3 above hard to accomplish.

What I'm leaning towards now is rolling my own class derived from List<> that adds the Widget objects in a sorted order, and retrieves them with a binary search. Requirement 2 can be accomplished simply by removing an object from the list, changing it's file path, and adding it back to the list.

But I'm relatively new to C# and I wanted to check with the great minds here and see if I'm missing another obvious solution.

Thanks!

+4  A: 

"Many thousands of objects"? Are you sure this structure belongs in memory at all? Sounds like a job for some type of persistent storage to me.

rp
+9  A: 

Can't you use 2 dictionarys?

Dictionary<string, Widget> WidgetsByPath;
Dictionary<Widget, string> PathsByWidget;

The handling will have a little more overhead (as you need to update both dictionaries when inserting, modifying or deleting elements) but you'll probably will just insert once lookup many times so it should be enought.

You can even construct a simple class around it:

public class Widgets
{
  public Widget Add(string Path, Widget wdg)
  {
    // Chek it doesn't already exits and all...
    WidgetsByPath.Add(Path, wdg);
    PathsByWidget.Add(wdg, Path);
  }

  public void Delete(string Path)
  {
    Widget w = WidgetsByPath[Path];
    PathsByWidget.Delete(w);
    WidgetsByPath.Delete(Path);
  }
}
Jorge Córdoba
+1 I was JUST going to post this.
SnOrfus
But if each Widget contained a Path member, then I wouldn't need the second dictionary. My main concern is the duplication of paths. If we make the assumption that each path would be on average 30 bytes, and we're dealing with (at the extreme) 30k objects, that's about a megabyte of duplication.
And if the user happens to be working with heavily nested files, that could quickly grow!
This is the classic RAM vs. CPU problem, and Bill you should ask your self, does a few extra MB of RAM usage really matter, compared to a lower retrieval time performance?
Arkain
Way too complicated. The Widget should just have a FilePath property. Other widgets should have references to the widget not it's path. This sort of thing works well when you want to be able look things up by multiple keys but I wouldn't do a reverse map to just get a value that can be a property
tvanfosson
@Bill I'm not a C# expert, but I think strings are actually references in C# (with value type semantics, due to its immutability and operator overloading). So, I doubt you'd have that much duplication. Even if you did, is an extra MB that much memory "waste" for a C# app?
Juan Pablo Califano
Don't know what kind of program you're writing, but think of this to get some perspective: If you load a 640 x 480 ARGB image, the raw image data in memory (after decompression) will be 1.2 Mb (640*480*4/1024).
Juan Pablo Califano
+2  A: 

If you do end up going with a custom data structure, I would suggest using containment rather than derivation. It's much better to define the necessary interface as part of a new class, and keep the storage details internal. If you were instead to derive from List, it would be much harder to enforce proper usage of the class, and if you changed your mind later, it would be harder to change things.

Charlie
Excellent point. Thank you!
+2  A: 

I think you only need a single Dictionary<Widget> and an appropriate Widget class that contains references to the other Widgets. It might help to make it a custom Dictionary so that you can simply add a Widget and have it derive the key from the Widget's FilePath property.

 public class WidgetDictionary : Dictionary<string,Widget>
 {
     ... provide suitable constructors ...

     public void Add( Widget widget )
     {
         if (widget != null && !this.ContainsKey( widget.FilePath ))
         {
             this.Add( widget.FilePath, widget );
         }
     }
 }

 public class Widget
 {
      public string FilePath { get; set; }

      private List<Widget> widgets = new List<Widget>();
      public IEnumerable<Widget> Widgets
      {
          get { return widgets; }
      }

      ...code to add/remove widgets from list...
 }

Then to do (1) you simply look the widget up in the widget repository by file path.

 var repository = new WidgetDictionary();
 string filePath = ...
 var widget = repository[filePath];

To do (2) you can remove and re-add the widget to the repository after changing it's file path. References to the widget held by other widgets will still be valid.

var widget = repository[filePath];
repository.Remove(filePath);
widget.FilePath = newFilePath;
repository.Add(widget);

 EDIT: this could probably be implemented as a method on the
 dictionary as well.

   public void UpdatePath( Widget widget, string newPath )
   {
       if (string.IsNullOrEmpty(newPath))
          throw new ArgumentNullException( "newPath" );

       var widget = this.ContainsKey(widget.FilePath)
                             ? this[widget.FilePath]
                             : null;

       if (widget != null)
       {           
           this.Remove(widget.FilePath);
       }
       widget.FilePath = newPath;
       this.Add( widget );
    }

To do (3) simply reference the property.

var filePath = widget.FilePath;

If you want to automatically have other widgets remove their references to a widget when it is deleted (disposed), you'll probably want to have the Widget class implement IDisposable and have the ability to add event handlers to a dispose event so that interested widgets can register a method that will remove the widget being disposed from their collection of related widgets. See this MSDN section on how to set up and use event handlers.

tvanfosson
+3  A: 

"Duplicating" the strings will not use twice the memory: Since strings are immutable objects in c#, you will just store another reference (i.e. pointer, 4 or 8 byts) per entry in the dictionary:

Dictionary<string, Widget> dict = new Dictionary<string, Widget>();
Widget myWidget = GetSomeWidget();
dict.Add(myWidget.Name, myWidget);

You will always be reusing the string object from the widget's property, so just go on with the dict and store the path as property inside the widget.

If you don't need to enumerate the widgets in sorted order, don't use the SortedList, it will be slower than the Dictionary (O(n log n) insertion/deletion/retrieval vs. O(n) average time)

Changing the widget's path will need you to remove it from the dictionary and add it with the changed path, but this is an average constant time operation, so it should be pretty fast.

And just to mention it: Even if you would have to spend one MB of additional memory for getting more performance or using a better-suited (and well-tested) data structure, I don't think that would be a great problem considering the amount of memory other applicatins are using (wasting?) these days ...

MartinStettner
This answer and others above answered parts of the question, but this is the answer that addressed the main concern and was the most succinct. Thanks!