ansaurus

Question

Answer 1

+3 A:

A better solution is to call ContainsKey to check whether the key exists before adding it to the hash table. Throwing an exception for this kind of error is a performance hit and doesn't improve the program flow.

Dror Helper 2008-09-25 16:14:07

Answer 2

+10 A:

if (myHashtable.ContainsKey(key))
    duplicates.Add(key);
else
    myHashtable.Add(key, value);

jop 2008-09-25 16:20:10

Answer 3

+3 A:

ContainsKey has a constant O(1) overhead for every item, while catching an Exception incurs a performance hit on JUST the duplicate items.

In most situations I'd say check for the key, but in this case it's better to catch the exception.
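A minimal sketch of the catch-the-exception variant, for comparison with the ContainsKey check. The class name DuplicateLoader, the method Load, and the row type are made up for illustration; the only library behaviour relied on is that Hashtable.Add throws ArgumentException when the key is already present.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not from the original question.
static class DuplicateLoader
{
    // Adds each pair to the table. Keys that are already present are
    // collected in the returned list instead of aborting the load.
    public static List<string> Load(Hashtable table,
                                    IEnumerable<KeyValuePair<string, string>> rows)
    {
        var duplicates = new List<string>();
        foreach (var row in rows)
        {
            try
            {
                table.Add(row.Key, row.Value);
            }
            catch (ArgumentException)
            {
                // Hashtable.Add throws ArgumentException for an existing key.
                // A null key raises ArgumentNullException, a subclass, so it
                // would land here too.
                duplicates.Add(row.Key);
            }
        }
        return duplicates;
    }
}
```

The first value seen for a key stays in the table, and the cost of the exception is paid only for rows whose key repeats.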

FlySwat 2008-09-25 16:26:37

I may be wrong, but I'm pretty sure checking for the presence of an item in a list is O(N), but for a hash it's O(1).

Matt 2008-09-25 16:30:04

You're right, I was thinking of a list for some reason.

FlySwat 2008-09-25 16:32:41

Answer 4

+1 A:

Here is a solution which avoids multiple hits in the secondary list with a small overhead to all insertions:

Dictionary<T, List<K>> dict = new Dictionary<T, List<K>>();

// Insert item: create the list the first time a key is seen
if (!dict.ContainsKey(key))
   dict[key] = new List<K>();
dict[key].Add(value);

You can wrap the dictionary in a type that hides this, or put it in a method or even an extension method on Dictionary.
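One way the extension-method idea might look. The names MultiValueDictionaryExtensions and AddValue are hypothetical, and this sketch uses TryGetValue rather than ContainsKey so the key is looked up once instead of up to three times.

```csharp
using System.Collections.Generic;

// Hypothetical extension class; the name is illustrative only.
static class MultiValueDictionaryExtensions
{
    // Appends value to the list stored under key, creating the list
    // the first time the key is seen.
    public static void AddValue<TKey, TValue>(
        this Dictionary<TKey, List<TValue>> dict, TKey key, TValue value)
    {
        List<TValue> values;
        if (!dict.TryGetValue(key, out values))
        {
            values = new List<TValue>();
            dict[key] = values;
        }
        values.Add(value);
    }
}
```

With that in scope the insert becomes a single call, dict.AddValue(key, value), and any key whose list holds more than one entry is a duplicate.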

Morten Christiansen 2008-09-25 16:42:29

And yes, I am aware that multiple hits in the secondary list are _very_ unlikely, but it doesn't hurt to be sure :)

Morten Christiansen 2008-09-25 16:43:49

Answer 5

A:

Thank you all. I ended up using the ContainsKey() method. It takes maybe 30 secs longer, which is fine for my purposes. I'm loading about 1.7 million lines and the program takes about 7 mins total to load up two files, compare them, and write out a few files. It only takes about 2 secs to do the compare and write out the files.

MaxGeek 2008-09-25 17:05:30

Try using StringBuilder.Append instead of the string + operator and see if it makes it any faster.

jop 2008-09-25 17:20:08

Answer 6

+1 A:

If you have more than 4 (for example) CSV values per line, it might be worth building the value variable with a StringBuilder as well, since repeated string concatenation is slow.
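A rough sketch of that suggestion, assuming the remaining CSV fields of a line are already available as a sequence of strings. CsvJoin and JoinValues are invented names, not part of the original program.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the name is illustrative only.
static class CsvJoin
{
    // Joins the values with commas using one StringBuilder instead of
    // allocating a new string for every + concatenation.
    public static string JoinValues(IEnumerable<string> values)
    {
        var sb = new StringBuilder();
        bool first = true;
        foreach (string v in values)
        {
            if (!first)
                sb.Append(',');
            sb.Append(v);
            first = false;
        }
        return sb.ToString();
    }
}
```

For example, CsvJoin.JoinValues(new[] { "a", "b", "c" }) returns "a,b,c". When the values are already in an array, String.Join(",", array) does the same job.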

woany 2008-09-25 17:42:13

Answer 7

+1 A:

Hmm, 1.7 Million lines? I hesitate to offer this for that kind of load.

Here's one way to do this using LINQ.

CSVReader csvReader = new CSVReader();
List<string> source = new List<string>();
using(StreamReader sr = new StreamReader(myFileName))
{
  while (!sr.EndOfStream)
  {
    source.Add(sr.ReadLine());
  }
}
List<string> ServMissing =
  source
  .Where(s => s.StartsWith(" "))
  .ToList();
//--------------------------------------------------
List<IGrouping<string, string>> groupedSource = 
(
  from s in source
  where !s.StartsWith(" ")
  let parsed = csvReader.CSVParser(s)
  where parsed.Any()
  let first = parsed.First()
  let rest = String.Join( "," , parsed.Skip(1).ToArray())
  select new {first, rest}
)
.GroupBy(x => x.first, x => x.rest)   //GroupBy(keySelector, elementSelector)
.ToList();
//--------------------------------------------------
List<string> myExtras = new List<string>();
foreach(IGrouping<string, string> g in groupedSource)
{
  myHashTable.Add(g.Key, g.First());
  if (g.Skip(1).Any())
  {
    myExtras.Add(g.Key);
  } 
}

David B 2008-09-25 17:51:27


C# Exception Handling continue on error
