tags:

views:

1031

answers:

2

I have a list<> of an "region" class with two variables, "startLocation" and "endLocation". I'd like to combine those two into a new sorted 2 dimensional array where its just Location and an integer representing whether its start or an end.

For example, if the list has three region objects with

[Region 1] : startLocation = 5, endLocation = 7

[Region 2] : startLocation = 3, endLocation = 5

[Region 3] : startLocation = 8, endLocation = 9

I'd like to get a sorted two dimensional array (or list or similar) looking like:

[3] [1]

[5] [1]

[5] [-1]

[7] [-1]

[8] [1]

[9] [-1]

(preferably i'd like the overlaps to add their second values together, so the two separate 5's in the array would be combined into [5 0]...but that's not too important)

I'm currently using a regular forloop going through each one by one and adding them to a list one at a time. This implementation is quite slow because I'm working with large datasets, and I'm guessing there's a more elegant / faster way to accomplish this through LINQ.

Any suggestions would be much appreciated.

A: 

This isn't a solution that's suitable for LINQ in anything other than an intellectual exercise. A foreach loop will be just as fast (actually likely faster) than any cobbled-together LINQ implementation.

As a side note, I'm assuming that you're using foreach rather than for. If not, then you could significantly speed up your process by switching to the foreach loop.

foreach(Region r in regionList)
{
    // add your entries using r
}

will be much faster than..

for(int i = 0; i < regionList.Count; i++)
{
    // add your entires using the indexer
}
Adam Robinson
Can someone point me to the source on this one foreach/for performance difference? I've seen it quoted as gospel several places with no source to a test scenario I can run. Every test I've run has these two ways neck and neck, one slightly faster with a list, one slightly faster with an array, but everyone else seems to be of the opinion that one is drastically faster than the other (though which it is seems to vary).
Jonathan
foreach is faster for dynamically sized collections like List<T> because the indexer on a list isn't pointing to a relative memory slot like that of an array. The indexer on an array is faster because it doesn't have the (slight) overhead of creating an Enumerator and doing bounds checking with every iteration. In his scenario he's using a List<Region>, so yes, a foreach would be faster.
Adam Robinson
+5  A: 

You'll need to define a helper method which splits a region into 2 parts and it's much easier to represent this using a new struct vs. a 2D array

struct Data {
  public int Value;
  public bool IsStart;
}

public static IEnumerable<Data> Split(this Region region) {
  yield return new Data() { Value = region.StartLocation, IsStart=true};
  yield return new Data() { Value = region.EndLocation, IsStart=false};
}

Then you can use the following LINQ query to break them up and sort them.

List<Region> list = GetTheList();
var query = list
  .SelectMany(x =>  x.Split())
  .OrderBy(x => x.Data);
JaredPar
this sounds like a good idea. let me give it a try and see if its faster than what i have now. thanks!
DarkAmgine
Why isn't this .SelectMany(x => x.Split())?
mquander
This approach yields (no pun intended!) no benefit over a foreach, and in fact introduces more overhead with the multiple enumerators being used here. It's definitely clever, but I don't see it as being any faster or more readable.
Adam Robinson
@Adam, agreed performance **may** be an issue. In terms of absolutes it's certainly slower than a single foreach block. But only a profiler would be able to tell if the difference was significant in this particular application. I tend to go for the correct, readable solution first and let the profiler tell me where to optimize. This borders on clever vs. readable but the OP asked for LINQ so I had a bit of fun ;)
JaredPar
@mquander, no reason it can't be. Will update
JaredPar
@Jared: It technically isn't even LINQ =P Just extension methods and lambdas! ;)
Adam Robinson
@Adam very true. I'm still hard wired to call anything that uses the method chaining off of Linq.Enumerable a LINQ expression. I wonder how that will evolve over time.
JaredPar