views:

161

answers:

5

I have a list of dates that a machine has worked on, but it doesn't include a date that machine was down. I need to create a list of days worked and not worked. I am not sure of the best way to do this. I have started by incrementing through all the days of a range and checking to see if the date is in the list by iterating through the entire list each time. I am looking for a more efficient means of finding the dates.

class machineday
{
 datetime WorkingDay;
}

class machinedaycollection : List<machineday>
{
public List<TimeCatEvent> GetAllByCat(string cat)
{
  _CategoryCode = cat;


  List<machineday> li = this.FindAll(delegate(machinedaydummy) { return true; });
  li.Sort(sortDate);
  return li;
}

int sortDate(machinedayevent1, machinedayevent2)
{
  int returnValue = -1;
  if (event2.date < event1.date)
  {
    returnValue = 0;
  }
  else if (event2.date == event1.date)
  {
    //descending
    returnValue = event1.date.CompareTo(event2.date);
  }
  return returnValue;
}
}
+3  A: 

Build a list of valid dates and subtract your machine day collection from it using LINQ's Enumerable.Except extension method. Something like this:

IEnumerable<DateTime> dates = get_candidate_dates();
var holidays = dates.Except(machinedays.Select(m => m.WorkingDay));

The get_candidate_dates() method could even be an iterator that generates all dates within a range on the fly, rather than a pre-stored list of all dates.

Enumerable's methods are reasonably smart and will usually do a decent job on the performance side of things, but if you want the fastest possible algorithm, it will depend on how you plan to consume the result.

Marcelo Cantos
Is LINQ an option for this question?
slugster
Maybe not (I just noticed the .Net 2 tag), but `Enumerable` is accessible from .Net 2.0, albeit with clunkier syntax. You just need to add a reference to the System.Core assembly.
Marcelo Cantos
+5  A: 

Sort the dates and iterate the resulting list in parallel to incrementing a counter. Whenever the counter does not match the current list element, you've found a date missing in the list.

List<DateTime> days = ...;
days.Sort();
DateTime dt = days[0].Date;
for (int i = 0; i < days.Length; dt = dt.AddDays(1))
{
    if (dt == days[i].Date)
    {
        Console.WriteLine("Worked: {0}", dt);
        i++;
    }
    else
    {
        Console.WriteLine("Not Worked: {0}", dt);
    }
}

(This assumes there are no duplicate days in the list.)

dtb
This is not a good solution because it it takes 15 lines what 1 or two could achive. Look at the set-based solution from Marcelo Cantos. Although the except method might not be available in you .net version you can write it yourself in a reusable fashion.
usr
@usr: Did you look at the tags? This is the *only* solution that actually works in .NET 2.0.
Aaronaught
@0XA3 I doubt if your fix would work
Veer
A: 

Assuming the list is sorted and the machine was "working" most of the time, you may be able to avoid iterating through all the dates by grouping the dates by month and skipping the dates in between. Something like this (you'll need to clean up):

int chunksize = 60; // adjust depending on data
machineday currentDay = myMachinedaycollection[0];

for (int i = 0; i < myMachinedaycollection.Count; i += chunksize)  
{  
    if (currentDay.WorkingDay.AddDays(chunksize) != myMachinedaycollection[i + chunksize].WorkingDay)  
    {
        // write code to iterate through current chunk and get all the non-working days  
    }
    currentDay = myMachinedaycollection[i + chunksize];  
}  
Zach Smith
+3  A: 

Sorry dudes, but I do not pretty much like your solutions. I think you should create a HashTable with your dates. You can do this by interating only once the working days.

Then, you interate the full range of of days and for every one you query in the hashtable if the date is there or not, by using

myHashTable.ContainsKey(day); // this is efficient

Simple, elegant and fast.

I think your solution uses an exponencial time, this one is lineal or logarithmical (which is actually a good thing).

Daniel Dolz
constant time and memory, but the memory constant is HUGE, which is a very bad thing.
Ben Voigt
@Ben: Not necessarily, depends on if performance is the more critical issue. Better performance often comes at the cost of some other resource, such as memory.
Kevin Brock
interesting, I never thought in that direction
fishhead
Huge? being conservative, using about 1mb of memory, you can store 45 years of dates.memory is not usually an issue anymore when working with simple data types. This used to be different 15 years ago, remember those days???? machines with 640k!!
Daniel Dolz
The question is how many searches you have to do in order to balance the time taken setting up the hash data structure in the first place. A sorted dense structure listing only the dates when the state changed between "in service" and "in maintenance" will be very fast with a binary search.
Ben Voigt
I think that a hash search outperforms binary search for big collections.
Daniel Dolz
A: 

I doubt you want a list of days working and not working.

Your question title suggests that you want to know whether the system was up on a particular date. It also seems reasonable to calculate % uptime. Neither of these requires building a list of all time instants in the interval.

Sort the service times. For the first question, do BinarySearch for the date you care about and check whether the preceding entry was the system being taken offline of maintenance or put back into service. For % uptime, take the (down for maintenance, service restored) pair-wise, use subtraction to find the duration of maintenance, add these up. Then use subtraction to find the length of the total interval.

If your question didn't actually mean you were keeping track of maintenance intervals (or equivalently usage intervals) then you can ignore this answer.

Ben Voigt
reason for downvote?
Ben Voigt