Okay, now this is more a rant about Linux than a question, but maybe someone knows how to do what I want. I know this can be achieved using the sort command, but I want a better solution because getting that to work is about as easy as writing a C program to do the same thing.

I have files; for argument's sake, let's say I have these (my real files are named the same way, I just have many more):

  • file-10.xml
  • file-20.xml
  • file-100.xml
  • file-k10.xml
  • file-k20.xml
  • file-k100.xml
  • file-M10.xml
  • file-M20.xml
  • file-M100.xml

This happens to be the order I want them sorted in. Incidentally, it is also the order Windows sorts them into by default. That's nice. Windows treats each run of consecutive digits as one effective character, and that character sorts before letters.

If I type ls at the Linux command line, I get the following garbage. Notice that the 20 is displaced. This is a bigger deal when I have hundreds of these files that I want to view in a report, in order.

  • file-100.xml
  • file-10.xml
  • file-20.xml
  • file-k100.xml
  • file-k10.xml
  • file-k20.xml
  • file-M100.xml
  • file-M10.xml
  • file-M20.xml

I can use ls -1 | sort -n -k 1.6 to get the ones without 'k' or 'M' correct...

  • file-k100.xml
  • file-k10.xml
  • file-k20.xml
  • file-M100.xml
  • file-M10.xml
  • file-M20.xml
  • file-10.xml
  • file-20.xml
  • file-100.xml

I can use ls -1 | sort -n -k 1.7 to get none of it correct:

  • file-100.xml
  • file-10.xml
  • file-20.xml
  • file-k10.xml
  • file-M10.xml
  • file-k20.xml
  • file-M20.xml
  • file-k100.xml
  • file-M100.xml
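
A likely explanation for both results, assuming sort's documented key syntax: -k 1.N starts the key at character N of the first field (here, the whole line), and -n converts whatever leading digits it finds there into a number, treating a key with no leading digits as 0. The sketch below uses a hypothetical helper, show_key, to print the text each key starts from.

```shell
# show_key: hypothetical helper, not part of sort.
# Prints the part of NAME that "sort -k 1.N" would start parsing from.
# usage: show_key N NAME
show_key() {
    printf '%s\n' "$2" | cut -c "$1"-
}

show_key 6 file-k10.xml    # k10.xml: no leading digits, so -n reads it as 0
show_key 7 file-k10.xml    # 10.xml:  -n reads 10
show_key 7 file-100.xml    # 00.xml:  the first digit is cut off, -n reads 0
```

With 1.6 every k and M name compares as 0 and sorts ahead of the plain numbers; with 1.7 the plain names lose their first digit instead. Equal keys then fall back to an ordinary whole-line comparison, which is why 100 still lands ahead of 10 inside each group.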

Okay, fine. Let's really get it right:

ls -1 | grep "file-[0-9]*\.xml" | sort -n -k1.6 && ls -1 file-k*.xml | sort -n -k1.7 && ls -1 file-M*.xml | sort -n -k1.7

  • file-10.xml
  • file-20.xml
  • file-100.xml
  • file-k10.xml
  • file-k20.xml
  • file-k100.xml
  • file-M10.xml
  • file-M20.xml
  • file-M100.xml

Whew! Boy, am I glad the "power of the Linux command line" saved me there. (This isn't practical for my situation, because instead of ls -1 I have a command that is another line or two long.)

Now, the Windows behavior is simple, elegant, and does what you want it to do 99% of the time. Why can't I have that in Linux? Why oh why does sort not have an "automagically sort numbers in a way that doesn't make me bang my head into the wall" switch?

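Here's roughly the comparison I mean, in C++:

#include <cctype>
#include <cstddef>
#include <string>

// True if a sorts before b. A run of digits compares as one number; case is ignored.
bool compare_two_strings_to_avoid_head_injury(const std::string& a, const std::string& b)
{
    auto digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (digit(a[i]) && digit(b[j]))
        {
            // gobble up both numbers, moving i and j past the numerical chars
            unsigned long long x = 0, y = 0;  // assumes the numbers fit in 64 bits
            while (i < a.size() && digit(a[i])) x = x * 10 + (a[i++] - '0');
            while (j < b.size() && digit(b[j])) y = y * 10 + (b[j++] - '0');
            if (x != y) return x < y;
        }
        else
        {
            // plain characters: digits sort before letters, 'k' before 'M'
            if (lower(a[i]) != lower(b[j])) return lower(a[i]) < lower(b[j]);
            ++i, ++j;
        }
    }
    return i == a.size() && j < b.size();  // a ran out first, so it sorts first
}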

Was that so hard? Can someone put this in sort and send me a copy? Please?

A: 

ls -1v will get you pretty close. It just sorts all capital letters before lower case.
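
The same ordering should be available for the output of any command, not just ls, through sort -V (version sort). This is a minimal sketch assuming GNU coreutils sort; natural_sort is a hypothetical wrapper name, and the capital-before-lowercase caveat above still applies.

```shell
# natural_sort: hypothetical helper; reads one name per line on stdin.
# Assumes GNU sort, where -V compares each run of digits by numeric value.
natural_sort() {
    LC_ALL=C sort -V
}

# Any producer can feed it, not only ls.
printf '%s\n' file-100.xml file-10.xml file-20.xml | natural_sort
```

Because it reads stdin, it can sit at the end of a longer pipeline in place of the plain sort.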

Karl Bielefeldt
+1  A: 

This would be my first thought:

ls -1 | sed 's/\-\([kM]\)\?\([0-9]\{2\}\)\./-\10\2./' | sort | sed 's/0\([0-9]\{2\}\)/\1/'

Basically I just use sed to pad the number with zeros and then use it again afterwards to strip off the leading zero.

I don't know if it might be quicker in Perl.
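
The padding idea can be stretched to any number of digits if the padded text is used only as a sort key and the original name is carried alongside it. What follows is a sketch under stated assumptions rather than a drop-in fix: it swaps sed for awk, pad_sort is a hypothetical name, numbers are assumed to have at most 9 digits, and names are assumed to contain no tab characters. Lowercasing the key is an extra step that puts k ahead of M.

```shell
# pad_sort: hypothetical helper; reads one name per line on stdin.
# Builds "key<TAB>name", sorts on the whole line, then drops the key.
pad_sort() {
    awk '{
        key = ""
        rest = tolower($0)                 # fold case so k sorts before M
        while (match(rest, /[0-9]+/)) {    # zero-pad every run of digits
            key = key substr(rest, 1, RSTART - 1) \
                      sprintf("%09d", substr(rest, RSTART, RLENGTH))
            rest = substr(rest, RSTART + RLENGTH)
        }
        print key rest "\t" $0
    }' | LC_ALL=C sort | cut -f2-
}

printf '%s\n' file-M10.xml file-100.xml file-k20.xml file-10.xml | pad_sort
```

Since the original name is never modified, there is no second pass to strip the zeros back off.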

David Zaslavsky
This is what I ended up doing, based on your suggestion, since I needed to handle up to 4 digits: `for f in \`ls -1 $1*.xml | sed -r 's/-([kM]?)([0-9]{4})\./-\10\2./; s/-([kM]?)([0-9]{3})\./-\100\2./; s/-([kM]?)([0-9]{2})\./-\1000\2./; s/-([kM]?)([0-9]{1})\./-\10000\2./' | sort | sed -r 's/0+([1-9])/\1/'\`; do` I find this thoroughly ridiculous for such a simple task. It's a large failing of `sort` IMO.
Scott