tags:

views:

2073

answers:

9

I have an input file that I want to sort based on timestamp which is a substring of each record. I want to store multiple attributes of the

The list is currently about 1000 records. But, I want it to be able to scale up a bit just in case.

When I did it with a Linked List by searching the entire list for insertion it took about 20 seconds. Now, just filling up a vector and outputting to file is taking 4 seconds (does that sound too long)?

I would like to use merge sort or quick sort (merge sort appears to be a little easier to me). The trouble that I'm running into is that I don't see many examples of implementing these sorts using objects rather than primitive data types.

I could use either a vector or Linked list. The feedback that I've gotten from this site has been most helpful so far. I'm hoping that someone can sprinkle on the magic pixie dust to make this easier on me :)

Any links or examples on the easiest way to do this with pretty decent performance would be most appreciated. I'm getting stuck on how to implement these sorts with objects because I'm newbie at C++ :)

Here's what my new code looks like (no sorting yet):

class CFileInfo  
{  
    public:  
    std::string m_PackLine;  
    std::string m_FileDateTime;  
    int m_NumDownloads;  
};  
void main()  
{  
    CFileInfo packInfo;  
    vector<CFileInfo> unsortedFiles;  
    vector<CFileInfo>::iterator Iter;  
    packInfo.m_PackLine = "Sample Line 1";  
    packInfo.m_FileDateTime = "06/22/2008 04:34";  
    packInfo.m_NumDownloads = 0;  
    unsortedFiles.push_back(packInfo);  
    packInfo.m_PackLine = "Sample Line 2";  
    packInfo.m_FileDateTime = "12/05/2007 14:54";  
    packInfo.m_NumDownloads = 1;  
    unsortedFiles.push_back(packInfo);  
    for (Iter = unsortedFiles.begin(); Iter != unsortedFiles.end(); ++Iter )   
    {  
        cout << " " << (*Iter).m_PackLine;  
    }  
}
+2  A: 

The stl has a sort algorithm in the header

 <algorithm>

Here's a link to the SGI manual.

Paul Nathan
+7  A: 

I'm not sure I understood your question correctly, is your problem defining the sort functor? The STL sort is generally implemented as an introspective sort which is very good for most of the cases.

struct sort_functor
{
    bool operator()(const CFileInfo & a, const CFileInfo & b) const
    {

        // may be a little bit more subtle depending on what your strings look like
        return a.m_FileDateTime < b.m_FileDateTime;
    }
}

std::sort(unsortedFiles.begin(), unsortedFile.end(), sort_functor());

or using boost::lambda

std::sort(unsortedFiles.begin(), 
    unsortedFile.end(),
    bind(&CFileInfo::m_FileDateTime, _1) < bind(&CFileInfo::m_FileDateTime, _2));

Was it the needed information?

Edouard A.
+2  A: 

Use std::sort in the algorithm header:

If you define the operator < for CFileInfo, it should work without a problem.

Alternatively, define a functor performing the comparison, and pass that as a separate argument to the sort function.

jalf
A: 

Thanks all so far.

I think Edouard rephrased my question better. Although the Boost is a little intimidating to me ;) I'll need to figure out how to integrate the sort functor he recommended into my code (how to call it and all it's implications).

I think the response from jalf is about the same thing.

Paul has a good recommendation too, but the other answers seem to get into syntax more which is what I'm really lacking. Maybe I'm in over my head with my c++ experience :(

Rich
You don't necessarily need the functor - if you implement operator< for your records, std::sort uses that by default.
Steve Jessop
A: 

I finally understood what Edouard is saying, I'll try this but keep the post open for a little, Thanks!!!!!!!!!!!

Rich
A: 

Sorting a linked-list will inherently be either O(N^2) or involve external random-access storage.

Vectors have random access storage. So do arrays. Sorting can be O(NlogN).

At 1000 elements you will begin to see a difference between O(N^2) and O(NlogN). At 1,000,000 elements you'll definitely notice the difference!

It is possible under very special situations to get O(N) sorting. (For example: Sorting a deck of playing cards. We can create a function(card) that maps each card to its sorted position.)

But in general, O(NlogN) is as good as it gets. So you might as well use STL's sort()!
Just add #include <algorithms>


All you'll need to add is an operator<(). Or a sort functor.

But one suggestion: For god's sake man, if you are going to sort on a date, either encode it as a long int representing seconds-since-epoch (mktime?), or at the very least use a "year/month/day-hour:minute:second.fraction" format. (And MAKE SURE everything is 2 (or 4) digits with leading zeros!) Comparing "6/22/2008-4:34" and "12/5/2007-14:54" will require parsing! Comparing "2008/06/22-04:34" with "2007/12/05-14:54" is much easier. (Though still much less efficient than comparing two integers!)


Rich wrote: the other answers seem to get into syntax more which is what I'm really lacking.

Ok. With basic a "int" type we have:

#define PRINT(DATA,N) for(int i=0; i<N; i++) { cout << (i>0?", ":"") << DATA[i]; } cout << endl;

int
main()  
{
    // Creating and Sorting a stack-based array.
  int d [10] = { 1, 4, 0, 2, 8, 6, 3, 5, 9, 7 };
  PRINT(d,10);
  sort( d, d+10 );
  PRINT(d,10);

  cout << endl;

    // Creating a vector.
  int eData [10] = { 1, 4, 0, 2, 8, 6, 3, 5, 9, 7 };
  vector<int> e;
  for(int i=0; i<10; i++ )
    e.push_back( eData[i] );

    // Sorting a vector.
  PRINT(e,10);
  sort(e.begin(), e.end());
  PRINT(e,10);
}

With your own type we have:

class Data
{  
public:  
  string m_PackLine;  
  string m_FileDateTime;  
  int    m_NumberDownloads;

    /* Lets simplify creating Data elements down below. */
  Data( const string & thePackLine  = "",
        const string & theDateTime  = "",
        int            theDownloads = 0 )
      : m_PackLine        ( thePackLine  ),
        m_FileDateTime    ( theDateTime  ),
        m_NumberDownloads ( theDownloads )
    { }

    /* Can't use constructor with arrays */
  void set( const string & thePackLine,
            const string & theDateTime,
            int            theDownloads = 0 )
    {
      m_PackLine        = thePackLine;
      m_FileDateTime    = theDateTime;
      m_NumberDownloads = theDownloads;
    }

    /* Lets simplify printing out down below. */ 
  ostream & operator<<( ostream & theOstream ) const
    {
      theOstream << "PackLine=\"" << m_PackLine
                 << "\"   fileDateTime=\"" << m_FileDateTime
                 << "\"   downloads=" << m_NumberDownloads;
      return theOstream;
    }


    /*
     * This is IT!  All you need to add to use sort()!
     *  Note:  Sort is just on m_FileDateTime.  Everything else is superfluous.
     *  Note:  Assumes "YEAR/MONTH/DAY HOUR:MINUTE" format.
     */
  bool operator< ( const Data & theOtherData ) const
    { return m_FileDateTime < theOtherData.m_FileDateTime; }

};

    /* Rest of simplifying printing out down below. */ 
ostream & operator<<( ostream & theOstream, const Data & theData )
  { return theData.operator<<( theOstream ); }


    /* Printing out data set. */
#define PRINT(DATA,N) for(int i=0; i<N; i++) { cout << "[" << i << "]  " << DATA[i] << endl; }  cout << endl;

int
main()
{  
    // Creating a stack-based array.
  Data d [10];
  d[0].set( "Line 1", "2008/01/01 04:34", 1 );
  d[1].set( "Line 4", "2008/01/04 04:34", 4 );
  d[2].set( "Line 0", "2008/01/00 04:34", 0 );
  d[3].set( "Line 2", "2008/01/02 04:34", 2 );
  d[4].set( "Line 8", "2008/01/08 04:34", 8 );
  d[5].set( "Line 6", "2008/01/06 04:34", 6 );
  d[6].set( "Line 3", "2008/01/03 04:34", 3 );
  d[7].set( "Line 5", "2008/01/05 04:34", 5 );
  d[8].set( "Line 9", "2008/01/09 04:34", 9 );
  d[9].set( "Line 7", "2008/01/07 04:34", 7 );

    // Sorting a stack-based array.
  PRINT(d,10);
  sort( d, d+10 );
  PRINT(d,10);

  cout << endl;

    // Creating a vector.
  vector<Data> e;
  e.push_back( Data( "Line 1", "2008/01/01 04:34", 1 ) );
  e.push_back( Data( "Line 4", "2008/01/04 04:34", 4 ) );
  e.push_back( Data( "Line 0", "2008/01/00 04:34", 0 ) );
  e.push_back( Data( "Line 2", "2008/01/02 04:34", 2 ) );
  e.push_back( Data( "Line 8", "2008/01/08 04:34", 8 ) );
  e.push_back( Data( "Line 6", "2008/01/06 04:34", 6 ) );
  e.push_back( Data( "Line 3", "2008/01/03 04:34", 3 ) );
  e.push_back( Data( "Line 5", "2008/01/05 04:34", 5 ) );
  e.push_back( Data( "Line 9", "2008/01/09 04:34", 9 ) );
  e.push_back( Data( "Line 7", "2008/01/07 04:34", 7 ) );

    // Sorting a vector.
  PRINT(e,10);
  sort(e.begin(), e.end());
  PRINT(e,10);
}
Mr.Ree
A: 

Thanks mrree! You anticipated other areas that I'm now having difficulty with.

My current date format is: "yyyy-yy-yy hh:mm" rather than what you specified "yyyy/mm/dd hh:mm" ("YEAR/MONTH/DAY HOUR:MINUTE").

I was stumbling on how to compare the dates like you indicate in the code block below. Maybe if I just parse out the dashes in the date and replace with slashes, that'll work.. I'll try this when I get home, and maybe it will all work! :) Hopefully, the below comparison works on date/time strings when in the format you recommend.

bool operator< ( const Data & theOtherData ) const
{ return m_FileDateTime < theOtherData.m_FileDateTime; }

I was attempting to do a comparison with the following function, but couldn't get it to work. I'll post my code for a good laugh ;) Probably not terribly efficient..

string stripLeadingZeroOfTwoChars(string twoChars)
{
if (twoChars.compare(0,1,"0") == 0) 
{
         return twoChars.substr(1,1);
}
return twoChars;
}

bool IsIncomingTimeStampNewer(std::string incomingTS, std::string destinationTS) 
{
    // example of incoming timestamp
    // 2008-09-02 03:00
    std::string strIncomingYear = incomingTS.substr(0,4);
    std::string strIncomingMonth = incomingTS.substr(5,2);
    std::string strIncomingDay = incomingTS.substr(8,2);
    std::string strIncomingHour = incomingTS.substr(11,2);
    std::string strIncomingMinute = incomingTS.substr(14,2);

    std::string strDestinationYear = destinationTS.substr(0,4);
    std::string strDestinationMonth = destinationTS.substr(5,2);
    std::string strDestinationDay = destinationTS.substr(8,2);
    std::string strDestinationHour = destinationTS.substr(11,2);
    std::string strDestinationMinute = destinationTS.substr(14,2);

    strIncomingMonth = stripLeadingZeroOfTwoChars(strIncomingMonth);
    strIncomingDay = stripLeadingZeroOfTwoChars(strIncomingDay);
    strIncomingHour = stripLeadingZeroOfTwoChars(strIncomingHour);
    strIncomingMinute = stripLeadingZeroOfTwoChars(strIncomingMinute);

    strDestinationMonth = stripLeadingZeroOfTwoChars(strDestinationMonth);
    strDestinationDay = stripLeadingZeroOfTwoChars(strDestinationDay);
    strDestinationHour = stripLeadingZeroOfTwoChars(strDestinationHour);
    strDestinationMinute = stripLeadingZeroOfTwoChars(strDestinationMinute);

    istringstream buffer(strIncomingYear);
    int intIncomingYear;
    buffer >> intIncomingYear;
    istringstream buffer2(strIncomingMonth);
    int intIncomingMonth;
    buffer2 >> intIncomingMonth;
    istringstream buffer3(strIncomingDay);
    int intIncomingDay;
    buffer3 >> intIncomingDay;
    istringstream buffer4(strIncomingHour);
    int intIncomingHour;
    buffer4 >> intIncomingHour;
    istringstream buffer5(strIncomingMinute);
    int intIncomingMinute;
    buffer5 >> intIncomingMinute;

    istringstream buffer6(strDestinationYear);
    int intDestinationYear;
    buffer6 >> intDestinationYear;
    istringstream buffer7(strDestinationMonth);
    int intDestinationMonth;
    buffer7 >> intDestinationMonth;
    istringstream buffer8(strDestinationDay);
    int intDestinationDay;
    buffer8 >> intDestinationDay;
    istringstream buffer9(strDestinationHour);
    int intDestinationHour;
    buffer9 >> intDestinationHour;
    istringstream buffer10(strDestinationMinute);
    int intDestinationMinute;
    buffer10 >> intDestinationMinute;

    // compare the dates to see if the incoming timestamp is newer
    if (intIncomingYear > intDestinationYear)
    {
        //WriteDebugLog("--------------Returning True for Year");
        return true;
    }
    if (intIncomingYear < intDestinationYear)
    {
        //WriteDebugLog("--------------Returning False for Year");
        return false;
    }
    if (intIncomingMonth > intDestinationMonth)
    {
        //WriteDebugLog("--------------Returning True for Month");
        return true;
    }
    if (intIncomingMonth < intDestinationMonth)
    {
        //WriteDebugLog("--------------Returning false for Month");
        return false;
    }
    if (intIncomingDay > intDestinationDay)
    {
        //WriteDebugLog("--------------Returning True for Day");
        return true;
    }
    if (intIncomingDay < intDestinationDay)
    {
        //WriteDebugLog("--------------Returning False for Day");
        return false;
    }

    if (intIncomingHour > intDestinationHour)
    {
         //WriteDebugLog("--------------Returning True for Hour");
         return true;
    }
    if (intIncomingHour < intDestinationHour)
    {
        //WriteDebugLog("--------------Returning False for Hour");
        return false;
    }

    if (intIncomingMinute > intDestinationMinute)
    {
        //WriteDebugLog("--------------Returning True for Minute");
        return true;
    }
    if (intIncomingMinute < intDestinationMinute)
    {
        //WriteDebugLog("--------------Returning false for Minute");
        return false;
    }

    // the timestamps are equal, so consider the incoming one newer for insertion purposes
    //WriteDebugLog("--------------Returning True at End");
    return true;
}

I was on my way to checking out Date datatypes, but didn't make it that far. Anyways, thanks all for your help! This c++ stuff is rough to learn!

Rich
A: 

Rich -- To answer you more recent question (and not your original question), it's probably best/simplest to just parse out the date with sscanf(). Ideally you want to store it numerically to begin with.

With a "YYYY/MM/DD-HH:MM" string, you can just compare the strings. All the strings are the same length, and you are going from largest time increment to smallest time increment as you read left-to-right.

But comparing strings is very inefficient!

Usually dates are stored as time_t (integer) values measured in seconds since the Epoch (00:00:00 UTC, January 1, 1970).

mktime() or timegm() (if you have timegm) will construct a time_t value from a "struct tm" you supply.

Sample code:

#define SHOW(X)  cout << # X " = " << (X)

int
main()
{
  const string s = "2008/12/03 12:48";
  struct tm    datetime;
  time_t       t;

  memset( & datetime, 0, sizeof(datetime) );

  if ( 5 != sscanf( s.c_str(), "%d/%d/%d %d:%d",
                    & datetime.tm_year,
                    & datetime.tm_mon,
                    & datetime.tm_mday,
                    & datetime.tm_hour,
                    & datetime.tm_min  ) )
  {
    cout << "FAILED to parse:  \"" << s << "\"" << endl;
    exit(-1);
  }

    /* tm_year - The number of years since 1900. */
  datetime.tm_year -= 1900;

    /* tm_mon - The number of months since January, in the range 0 to 11. */
  datetime.tm_mon --;

    /* tm_mday - The day of the month, in the range 1 to 31. */
    /* tm_hour - The number of hours past midnight, in the range 0 to 23. */
    /* tm_min - The number of minutes after the hour, in the range 0 to 59. */
  // No change.

  /* If using mktime, you may need these to force UTC time:
   *   setenv("TZ","",1);
   *   tzset();
   */

  t = mktime( & datetime );

  SHOW( t ) << endl;
  SHOW( asctime( & datetime ) );
  SHOW( ctime( & t ) );
}

Now given two time (date) values, e.g. *time_t t1, t2*, you can compare them with just t1<t2.

Mr.Ree
A: 

Thanks mree, wish I could credit Eduoard with answer as well :) You guys are great

Rich