views:

195

answers:

2

I am looking for an existign path truncation algorithm (similar to what the Win32 static control does with SS_PATHELLIPSIS) for a set of paths that should focus on the distinct elements.

For example, if my paths are like this:

 Unit with X/Test 3V/
 Unit with X/Test 4V/
 Unit with X/Test 5V/
 Unit without X/Test 3V/
 Unit without X/Test 6V/
 Unit without X/2nd Test 6V/

When not enough display space is available, they should be truncated to something like this:

 ...with X/...3V/
 ...with X/...4V/
 ...with X/...5V/
 ...without X/...3V/
 ...without X/...6V/
 ...without X/2nd ...6V/

(Assuming that an ellipsis generally is shorter than three letters).

This is just an example of a rather simple, ideal case (e.g. they'd all end up at different lengths now, and I wouldn't know how to create a good suggestion when a path "Thingie/Long Test/" is added to the pool).

There is no given structure of the path elements, they are assigned by the user, but often items will have similar segments. It should work for proportional fonts, so the algorithm should take a measure function (and not call it to heavily) or generate a suggestion list.

Data-wise, a typical use case would contain 2..4 path segments anf 20 elements per segment.

I am looking for previous attempts into that direction, and if that's solvable wiht sensible amount of code or dependencies.

+3  A: 

I'm assuming you're asking mainly about how to deal with the set of folder names extracted from the same level of hierarchy, since splitting by rows and path separators and aggregating by hierarchy depth is simple.

Your problem reminds me a lot of the longest common substring problem, with the differences that:

  1. You're interested in many substrings, not just one.
  2. You care about order.

These may appear substantial, but if you examine the dynamic-programming solution in the article you can see that it revolves around creating a table of "character collisions" and then looking for the longest diagonal in this table. I think that you could instead enumerate all diagonals in the table by the order in which they appear, and then for each path replace, by order, all appearances of these strings with ellipses.

Enforcing a minimal substring length of 2 will return a result similar to what you've outlined in your question.

It does seem like it requires some tinkering with the algorithm (for example, ensuring a certain substring is first in all strings), and then you need to invoke it over your entire set... I hope this at least gives you a possible direction.

Oak
A: 

Well, the "natural number" ordering part is actually easy, simply replace all numbers with formatted number where there is enough leading zeroes, eg. Test 9V -> Test 000009V and Test 12B -> Test 000012B. These are now sortable by standard methods.

For the actual ellipsisizing. Unless this is actually a huge system, I'd just add manual ellipsisizing "list" (of regexes, for flexibility and pain) that'd turn certain words into ellipses. This does requires continuous work, but coming up with the algorithm eats your time too; there are myriads of corner cases.

I'd probably try a "Floodfill" approach. Arrange first level of directories as you would a bitmap, every letter is a pixel. iterate over all characters that are in names of directories. with all of them, "paint" this same character, then "paint" the next character from first string such that it follows this previous character (and so on etc.) Then select the longest painted string that you find.

Example (if prefixed with *, it's painted)

Foo
BarFoo

*Foo
Bar*Foo

*F*oo
Bar*F*oo

...

note that:

*ofoo
b*oo

*o*foo
b*oo
.. painting of first 'o' stops since there are no continuing characters.

of*oo
b*oo
...

And then you get to to second "o" and it will find a substring of at least 2. So you will have to iterate over most possible character instances (one optimization is to stop in each string at position Length-n, where n is the longest already found common substring. But then there is yet another problem (here with "Beta Beta")

          | <- visibility cutout
Alfa Beta Gamma Delta 1
Alfa Beta Gamma Delta 2
Alfa Beta Beta 1
Alfa Beta Beta 2
Beta Beta 1
Beta Beta 2
Beta Beta 3
Beta Beta 4

What do you want to do? Cut Alfa Beta Gamma Delta or Alfa Beta or Beta Beta or Beta?

This is a bit rambling, but might be entertaining :).

Pasi Savolainen