+14  A: 
template<size_t length> int less(const char* left, const char* right) {
    return memcmp(left, right, length) < 0;
}

std::sort(array, array + array_length, less<buffer_length>);
Leon Timmermans
std::sort expects iterators as parameters. It will work on things like vectors, but I doubt working on an array.
Mehrdad Afshari
pointers are valid iterators
Leon Timmermans
My bad. You're right! Thanks for the point!
Mehrdad Afshari
are left and right contigious? The values in the array are not null terminated, but rather a hard coded length.int length = memcmp(left, right, length) < 0;
baash05
baash05, well you should have told that to us. now that requires entirely different algorithms. memcmp and bind3rd. there is no such standard function object adaptor i think :)
Johannes Schaub - litb
TRUE.. It's part of the growing question (damb scope creap).. Suppose tis why we can edit our posts :)
baash05
The thing to verify here is that we can declare char array[10000][28+1], and then read in the data. Or may be declare array[1][28+1], allocate it with correct nuber of rows, and then read in the data. Still, it's pretty fragile. GNU sort is better.
Arkadiy
This could be easily adapted with strncmphttp://www.cplusplus.com/reference/clibrary/cstring/strncmp.html
tfinniga
Updated it with the new information baash05 provided.
Leon Timmermans
thanks Leon.. I'm still a bit out because I can't do this inplace. I can accomplish this for smaller files, but larger files can't be loaded into ram. (I have 64megs of ram, including OS, datafiles, ect)
baash05
Sort it in pieces, than do a 'sorted merge' on the pieces.
Leon Timmermans
Note that this will only work if the length is known at compile time though
Johannes Schaub - litb
+2  A: 

Probably the easiest way is to used the old stdlib.h function qsort. This should work:

qsort( array, num_elements, sizeof( char* ), strcmp )

Please note that this is standard C and only works reliable with English text.

If you have a list of String objects, then other things are possible in C++.

If you are on Linux and writing a gtk or Qt application then I would propose that you have a look at these libraries beforehand.

Ralf
As with Mehrdad's answer, this is the wrong answer. qsort will be passing the const void pointer cast of a 'const char **' (double pointer), and strcmp() does not like that -- with good reason.
Jonathan Leffler
+2  A: 

If the files are large and do not fit in RAM, you can use bin/bucket sort to split the data into smaller files and finally aggregate the pieces in a result file. Other responses show you how to sort each individual bucket file.

Ovidiu Pacurar
+4  A: 

You can use the algorithms in the STL on arrays native datatypes, not just on STL containers. The other suggestion to use std::sort won't work as posted however, because strcmp returns a value that evaluates to true for all comparisons when the strings aren't the same, not just if the left hand side is less than the right hand side -- which is what std::sort wants; a binary predicate returning true of the left hand side is less than the right hand side.

This works:

struct string_lt : public std::binary_function<bool, char, char>
{
    bool operator()(const char* lhs, const char* rhs)
    {
     int ret = strcmp(lhs, rhs);
     return ret < 0;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    char* strings [] = {"Hello", "World", "Alpha", "Beta", "Omega"};
    size_t numStrings = sizeof(strings)/sizeof(strings[0]);

    std::sort(&strings[0], &strings[numStrings], string_lt());

    return 0;
}
John Dibling
+5  A: 

Use the GNU sort program (externally) if you can't fit the data into RAM: it will sort arbitrary sized files and the larger the file, the smaller the additional cost of creating the process.

Aaron Digulla
+3  A: 

boost::bind can do it:

// ascending
std::sort(c, c + size,  boost::bind(std::strcmp, _1, _2) < 0); 

// descending
std::sort(c, c + size,  boost::bind(std::strcmp, _1, _2) > 0);

Edit: The strings are not null-terminated:

// ascending
std::sort(c, c + array_size,  boost::bind(std::memcmp, _1, _2, size) < 0); 

// descending
std::sort(c, c + array_size,  boost::bind(std::memcmp, _1, _2, size) > 0);
Johannes Schaub - litb
A: 

The canonical way to sort an array of character strings in C, and therefore an available but not necessarily recommended way to do so in C++, uses a level of indirection to strcmp():

static int qsort_strcmp(const void *v1, const void *v2)
{
    const char *s1 = *(char * const *)v1;
    const char *s2 = *(char * const *)v2;
    return(strcmp(s1, s2));
}

static void somefunc(void)   // Or omit the parameter altogether in C++
{
    char **array = ...assignment...
    size_t num_in_array = ...number of char pointers in array...
    ...
    qsort(array, num_in_array, sizeof(char *), qsort_strcmp);
    ...more code...
}
Jonathan Leffler
A: 

A few things come to mind:

  1. If your data is too big to fit into memory, you may want to just build up an index in-memory of file offsets, then memory-mapping the file to access the strings (depends on your OS).
  2. In-place is going to require a lot of memory copies. If you can, use a shell sort. Then once you know the final order, it's much easier to reorder the strings in-place in linear time.
  3. If the strings are all the same length, you really want a radix sort. If you're not familiar with a radix sort, here's the basic idea: Comparison-based sorting (which is what std::sort, qsort, and any other general-purpose sorting) always requires O(N log N) time. Radix sorting compares a single digit at a time (starting at str[0] and ending at str[K-1] for a K-lenth string), and overall can require only O(N) time to execute.

Consult the Internetfor a much better detailed description of radix sorting algorithms than I can provide. Aside from what I've said, I would avoid all of the other solutions that use standard libarary sorting facilities. They just aren't designed your particular problem, unfortunately.

Tom
A: 

You probably want to look into memory mapped files (see http://en.wikipedia.org/wiki/Memory-mapped_file), mmap() function (http://en.wikipedia.org/wiki/Mmap) on POSIX-complaint OSes. You'll essentially get a pointer to contiguous memory representing the file's contents.

The good side is that the OS will take care of loading parts of the file into memory and unloading them again, as needed.

One downside is that you'll need to resolve to some form of file locking to avoid corruption if more than one process is likely to access the file.

Another downside is that this doesn't guarantee good performance - to do that, you'll need a sorting algorithm that tries to avoid constantly loading and unloading pages (unless of course you have enough memory to load the entire file into memory).

Hope this has given you some ideas!