views:

101

answers:

7

I need to sort a C++ std::vector<std::string> fileNames. The fileNames are labeled as such

YYDDDTTTT_Z_SITE

YY = Year (i.e 2009 = 09, 2010 = 10) DDD = Day of the year (i.e 1 January = 001, 31 December = 365) TTTT = Time of the day (i.e midnight = 0000, noon = 1200)

ZONE = Will be either E or W

SITE = Four letter site name (i.e HILL, SAMM)

I need the strings to be sorted by the following order: ZONE, SITE, YY, DDD, TTTT

A: 

You could use qsort with your own string compare function that takes into account your sorting rules, and the address of the first element in each vector where it asks for an array.
http://www.cplusplus.com/reference/clibrary/cstdlib/qsort/

But you shouldn't. Just use std::sort

Larry Wang
This would be fine for C.
Cogwheel - Matthew Orlando
Downvoted. Qsort on the contents of a container? You must be insane.
DeadMG
AFAIK, vectors are guaranteed to store their contents in a continuous array in memory. Just pass the address of the first element.You're right that there's probably no reason to do it this way over using `std::sort`, though.
Larry Wang
Upvoted, because qsort will work on std::vector just fine.
SigTerm
I'm sure it's not incorrect, but is there an actual reason qsort is unsafe or suboptimal? I'd like to know.
Larry Wang
@user379118: 1st, std::sort *may* work faster if a comparison function can be inlined. You can search for benchmarks or do them yourself. 2nd, some people are deathly afraid of using pointers in C++. 3rd, qsort may actually be less efficient than other sorting algorithms, depending on circumstances and data. Insertion sort will work faster on partially sorted data than even qsort. That depends on size of data, and amount of unsorted data in the array, though. As long as you correctly implement comparison routine for qsort, it will work.
SigTerm
Also, people tend to forget that it is unknown which sorting algorithm is being used by std::sort. So writing your own sorting routine or even using qsort can be fine.
SigTerm
@DeadMG: stackoverflow faq has a recommendation you apparently haven't noticed: *"Be nice.Treat others with the same respect you'd want them to treat you. We're all here to learn together."*.
SigTerm
@SigTerm: The primary problem with qsort is that it won't work on other containers, necessitating the use of std::sort anyway. Not just that, but the compiler may not even pick up the problem, depending on the types involved. And if I recommended qsort in C++, I would expect someone else to call me insane, just as much as if they were using longjmp, setjmp, sprintf, malloc, etc. Qsort is completely and totally deprecated in every way by std::sort. Using it in C++ is insane.
DeadMG
@DeadMG: It's not entirely insane. `std::sort` is very bloaty, and each separate instantiation adds several KB of code to your binary, which can result in poor instruction cache performance, whereas `qsort` is only ever instantiated once. I'd still recommend `std::sort` in most cases, but it is not without its faults.
Peter Alexander
@DeadMG: It is not insane, sprintf, qsort and a lot of other C functions have their uses, depending on circumstances. Stop being closed-minded. Always using same technique (like high-level stl facilities) without considering benefits of using other techniques (which you think are "inferior" for little reason) will limit number of things you can learn and do. Programming technique is not a religion. In programming you need to pick right tool(technique) for the job, instead of always blindly believing that your favorite technique is "the only right one", and other techniques are "insane".
SigTerm
@SigTerm: Whether or not `qsort` will work on the contents of a vector (which it will), it will not work on `std::string` or any other non-POD objects. It may happen to do the right thing in some particular implementation (since you end up with the same objects that you started with, just in different locations), but you're firmly in the realm of undefined behaviour.
Mike Seymour
@user379118: "is there an actual reason qsort is unsafe or suboptimal?" `qsort` will copy the objects using `memcpy`, or something equivalent. This is only supported for POD ("plain old data") types; roughly, basic types and simple structures, but not anything with a non-trivial constructor, destructor or assignment operator. `std::string` is not POD, so sorting them with `qsort` gives undefined behaviour. Also `std::sort` can inline the comparisons, while `qsort` has to call a function (via a pointer) each time, so `std::sort` can be quite a bit faster.
Mike Seymour
@Mike Seymour: Now that's what I call a nice comment. Perfectly good argument without "religious" parts. However, it will still work at least on some compilers. If sizeof(std::string) returns size of string including internal pointers and fields of std::string, then you'll get away with memcpy - no data loss, and no memory leaks (because no objects "disappear" from array). Same goes with all other non-pod data. If sizeof includes all internal object data, memcpy will work for them. It is "hackish" approach, but it will function.
SigTerm
@SigTerm: agreed; as I said in my other comment, `qsort` is likely to do the right thing by accident. But you're still relying on undefined behaviour.
Mike Seymour
@Mike Seymour: Thanks, that answer is very helpful. Apparently, although std::string is typically stored contiguously in memory, there are indeed no guarantees (although it will be a requirement in C++0x). http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530
Larry Wang
+7  A: 

Use std::sort with a comparison function.

(The link has a nice example)

Cogwheel - Matthew Orlando
+1  A: 

Just write a method that will compare two filenames based upon your criteria, to determine which one comes first then use any standard sorting method.

KLee1
A: 

Your sort predicate which you pass to vector::sort() may create reordered temporary strings of the string which it then compares.

codymanix
+1  A: 

Use std::sort and implement a Compare Class

look into http://www.cplusplus.com/reference/stl/list/sort/ for further details

Gustavo V
+1  A: 

The easy part: write the sort itself:

// Return true if the first arg is strictly less than the second
bool compareFilenames(const std::string& rhs, const std::string& lhs);
...
std::sort(fileNames.begin(), fileNames.end(), &compareFilenames);

The harder part: writing the comparison itself. In pseudocode, for full generality:

bool compareFilenames(const std::string& lhs, const std::string& rhs)
{
    parse the filenames
    if (lhs zone != rhs zone)
        return lhs zone < rhs zone
    if (lhs site != rhs site)
        return lhs site < rhs site
    ...
    return false
}

where lhs site, etc. are the individual bits of data you need to sort by, picked out of the filename.

Given the strict file naming structure you have, though, and your specific sorting needs, you can actually get away with just splitting the string by the first '_' character and doing a lexicographical compare of the second chunk, followed the first chunk if the second chunk is equal. That will make the code to parse the filename much easier, at the potential cost of flexibility if the file naming format ever changes.

Owen S.
A: 

Here's a boost lambda functions version. This is overkill, and pretty cryptic, but it's brief and flexible in terms of how one can juggle with different fields criteria. Obviously you need boost. Also, expect increased compilation time. So, here it is:

#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>
#include "boost/lambda/detail/operator_actions.hpp"
#include "boost/lambda/detail/operator_return_type_traits.hpp"
#include "boost/lambda/detail/control_structures_impl.hpp"
#include "boost/ref.hpp"
#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <algorithm>
#include <cassert>

using namespace std;
using namespace boost::lambda;

//helpers: a better way would be to group them
//under a flyweight, or something...
string extract_year(string str_)
{
    return str_.substr(0,2);
}

string extract_dayofyear(string str_)
{
    return str_.substr(2,3);
}

string extract_timeofday(string str_)
{
    return str_.substr(5,4);
}

string extract_zone(string str_)
{
    return str_.substr(10,1);
}

string extract_site(string str_)
{
    return str_.substr(12,4);
}

//Uhm, just for brevity... ('cause otherwise we should stay away from macros ;-)
#define IF_THEN_ELSE_RET(op1,op2,exp) if_then_else_return(var(op1)<var(op2),true,if_then_else_return(var(op1)>var(op2),false,exp))

void sort_fnames(vector<string>& fnames)
{
    string z1,z2,s1,s2,y1,y2,d1,d2,t1,t2;

    //sort by zone-then-site-then-year-then-day-then-time:
    //Note the format of the sort(fnames.begin(),fnames.end(), (,...,boolean_expression) );
    //remember, in a sequence of comma-dellimited statements enclosed between parens, like
    //val=(.,...,boolean_expression); only the last expression, boolean_expression, gets
    //assigned to variable "val";
    //So, in the sort() call below, the statements 
    //var(z1)=bind(extract_zone,_1),var(z2)=bind(extract_zone,_2), etc.
    //are only initializing variables that are to be used in the composition 
    //of if_then_else_return(,,) lambda expressions whose composition 
    //combines the zone-then-site-then-year-then-day-then-time criteria 
    //and amounts to a boolean that is used by sort to decide the ordering
    sort(fnames.begin(),fnames.end(),
        (var(z1)=bind(extract_zone,_1),var(z2)=bind(extract_zone,_2),
         var(s1)=bind(extract_site,_1),var(s2)=bind(extract_site,_2),
         var(y1)=bind(extract_year,_1),var(y2)=bind(extract_year,_2),
         var(d1)=bind(extract_dayofyear,_1),var(d2)=bind(extract_dayofyear,_2),
         var(t1)=bind(extract_timeofday,_1),var(t2)=bind(extract_timeofday,_2),
         IF_THEN_ELSE_RET(z1,z2,IF_THEN_ELSE_RET(s1,s2,IF_THEN_ELSE_RET(y1,y2,IF_THEN_ELSE_RET(d1,d2,IF_THEN_ELSE_RET(t1,t2,true)))))
         ));
}
blue scorpion