views:

2648

answers:

5

I'd like some kind of string comparison function that preserves natural sort order1. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator class only knows of two implementations.

I can roll my own (it's not a very hard problem), but I'd rather not re-invent the wheel if I don't have to.

In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".


1 By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5"

+3  A: 

String implements Comparable, and that is what natural ordering is in Java (comparing using the comparable interface). You can put the strings in a TreeSet or sort using the Collections or Arrays classes.

However, in your case you don't want "natural ordering" you really want a custom comparator, which you can then use in the Collections.sort method or the Arrays.sort method that takes a comparator.

In terms of the specific logic you are looking for implementing within the comparator, (numbers separated by dots) I'm not aware of any existing standard implementations of that, but as you said, it is not a hard problem.

EDIT: In your comment, your link gets you here, which does a decent job if you don't mind the fact that it is case sensitive. Here is that code modified to allow you to pass in the String.CASE_INSENSITIVE_ORDER:

    /*
     * The Alphanum Algorithm is an improved sorting algorithm for strings
     * containing numbers.  Instead of sorting numbers in ASCII order like
     * a standard sort, this algorithm sorts numbers in numeric order.
     *
     * The Alphanum Algorithm is discussed at http://www.DaveKoelle.com
     *
     *
     * This library is free software; you can redistribute it and/or
     * modify it under the terms of the GNU Lesser General Public
     * License as published by the Free Software Foundation; either
     * version 2.1 of the License, or any later version.
     *
     * This library is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
     * Lesser General Public License for more details.
     *
     * You should have received a copy of the GNU Lesser General Public
     * License along with this library; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
     *
     */

    import java.util.Comparator;

    /**
     * This is an updated version with enhancements made by Daniel Migowski,
     * Andre Bogus, and David Koelle
     *
     * To convert to use Templates (Java 1.5+):
     *   - Change "implements Comparator" to "implements Comparator<String>"
     *   - Change "compare(Object o1, Object o2)" to "compare(String s1, String s2)"
     *   - Remove the type checking and casting in compare().
     *
     * To use this class:
     *   Use the static "sort" method from the java.util.Collections class:
     *   Collections.sort(your list, new AlphanumComparator());
     */
    public class AlphanumComparator implements Comparator<String>
    {
        private Comparator<String> comparator = new NaturalComparator();

        public AlphanumComparator(Comparator<String> comparator) {
            this.comparator = comparator;
        }

        public AlphanumComparator() {

        }

        private final boolean isDigit(char ch)
        {
            return ch >= 48 && ch <= 57;
        }

        /** Length of string is passed in for improved efficiency (only need to calculate it once) **/
        private final String getChunk(String s, int slength, int marker)
        {
            StringBuilder chunk = new StringBuilder();
            char c = s.charAt(marker);
            chunk.append(c);
            marker++;
            if (isDigit(c))
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (!isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            } else
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            }
            return chunk.toString();
        }

        public int compare(String s1, String s2)
        {

            int thisMarker = 0;
            int thatMarker = 0;
            int s1Length = s1.length();
            int s2Length = s2.length();

            while (thisMarker < s1Length && thatMarker < s2Length)
            {
                String thisChunk = getChunk(s1, s1Length, thisMarker);
                thisMarker += thisChunk.length();

                String thatChunk = getChunk(s2, s2Length, thatMarker);
                thatMarker += thatChunk.length();

                // If both chunks contain numeric characters, sort them numerically
                int result = 0;
                if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0)))
                {
                    // Simple chunk comparison by length.
                    int thisChunkLength = thisChunk.length();
                    result = thisChunkLength - thatChunk.length();
                    // If equal, the first different number counts
                    if (result == 0)
                    {
                        for (int i = 0; i < thisChunkLength; i++)
                        {
                            result = thisChunk.charAt(i) - thatChunk.charAt(i);
                            if (result != 0)
                            {
                                return result;
                            }
                        }
                    }
                } else
                {
                    result = comparator.compare(thisChunk, thatChunk);
                }

                if (result != 0)
                    return result;
            }

            return s1Length - s2Length;
        }

        private static class NaturalComparator implements Comparator<String> {
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        }
    }
Yishai
"Natural sort" is the widely-used term for sorting "image9.jpg" as being less than "image10.jpg". it is "natural" because it is how a human being would sort them, as opposed to the unnatural "ascii-betical" sorting that computers do by default. http://www.codinghorror.com/blog/archives/001018.html
Kip
I've updated the question to be more clear in this regard
Kip
In the codinghorror link you post, it has a link which gets to this, which has a Java implementation: http://www.davekoelle.com/alphanum.html
Yishai
+1  A: 

I know Apache Commons has a natural sort order in the ComparatorUtils. That might be what you are looking for.

ComparatorUtils

Ben
Natural order in Java (which is what this commons-collections comparator is using) is not the same as what the OP is talking about.
sk
+8  A: 

In java the "natural" order meaning is "lexicographical" order, so there is no implementation in the core like the one you're looking for.

There are open source implementations.

Here's one:

NaturalOrderComparator.java

Make sure you read the:

Cougaar Open Source License

I hope this helps!

OscarRyz
thanks. i revised the question to clarify what i meant by "natural"
Kip
I love SO - just randomly poking around, and here's a quick solution to something I've been putting off for the past week. That's a very useful link - thanks! (And the license is compatible with our project)
Kevin Day
+1  A: 

I have tested three Java implementations mentioned here by others and found that their work slightly differently but none as I would expect.

Both AlphaNumericStringComparator and AlphanumComparator do not ignore whitespaces so that pic2 is placed before pic 1.

On the other hand NaturalOrderComparator ignores not only whitespaces but also all leading zeros so that sig[1] precedes sig[0].

Regarding performance AlphaNumericStringComparator is ~x10 slower then the other two.

Mikhail
A: 

which collection's are natural sorted? a. Sorted map b.hash map c.hash table d.tree map

savio
i don't think you understood the question...
Kip