views:

169

answers:

5

I working on a combinatorial optimization problem that I suspect is NP-hard, and a genetic algorithm has been working well with our dataset. We're a research group and plan to publish our results in our field (not in math or CS), and I'd like to explore the NP-hard question before sending the manuscript out for review.

There are two main questions:

1) I'd like to know whether this particular optimization problem has been studied. I've heavily searched the lit but haven't seen anything exactly the same.

2) If the problem hasn't been studied, I might take a crack at doing a reducibility proof, and would like some pointers to good NP-complete candidates for the reduction.

The problem can be described in two ways, as a subsequence variant, and as a bipartite graph problem.

In the subsequence flavor, I want to find a "relaxed" subsequence that allows permutations, and optimize to minimize the permutation count. For example: (. = any other char)

Query: abc, Target: ..b.a.b.c., Result: abc (normal subsequence)

Query: abc, Target: ..b.a.c.a., Result: bac (subsequence with one permutation)

The bipartite formulation is a matching problem or linear assignment problem, with the graph partitioned into query character nodes and target character nodes. The edges connect the query characters to the target characters, such that there is exactly one edge from each query char to a target char. The objective function is to minimize the number of edge crossings (also called "crossing number" in the lit). This is similar to bipartite graph layout algorithms that reorder node placement to minimize edge crossings, but my problem requires the that both node orders stay fixed.

Any thoughts from the experts on questions 1 or 2?

Thanks in advance!

A: 

Just some idea: Does it somehow equivalent to finding the minimal number of swap needed to sort an array (MIN-SBR)? If yes, this is NP-Hard.

(btw, are you working on something similar to this?)

J-16 SDiZ
A: 

Thanks J-16 SDiZ. MIN-SBR seems related but distinct. We don't have any notion of block reversals, just independent (unlikely to be adjacent) misorderings. The authors also avoid "Word problems," which is what we have: multiple identical chars in the target string.

Our problem is biological but unrelated to genome rearrangement mapping. Thanks very much for your pointer.

fred
The problem with "word problem" should be harder, right?
J-16 SDiZ
This is not an answer. Stackoverflow is not a forum. Put comments in comments, answers in answers, and if you want to update the question, edit the question.
Svante
Good point. So why doesn't the software prevent self-answers?
piccolbo
@piccolbo, because you *can* answer your own question(s). Also see the [FAQ](http://stackoverflow.com/faq): *It's also perfectly fine to ask and answer your own question, as long as you pretend you're on Jeopardy: phrase it in the form of a question.*
Bart Kiers
A: 

The problem with "word problem" should be harder, right? – J-16 SDiZ 14

Yes, having multiple occurrences of the char in the target seems to make my problem harder than MIN-SBR, so from that angle my problem would be at least as hard as NP-complete. On the other hand, I'm not yet clear on how their central notion of block reversals would affect my claim of NP-completeness.

I sure would be nice to know whether my optimization can be solved in polynomial time. Put another way, it sure would be embarrassing if a reviewer came back with five lines of pseudocode that finds the global maximum in O(n)..

fred
+1  A: 

To piccolbo:

If you are not publishing in math or CS, a NP-completeness result will be irrelevant and just irritate the biologist or MD doing the review. Been there.

You bet. The primary report will be on the wet results, but we might choose a journal that's more interdisciplinary. Also, wanting to know about the NP-ness is partly for my own edification. It would be pretty slobby to use a genetic algorithm if it's totally unjustified, and if there's a way to find the guaranteed global maximum in polynomial time. Right now the GA is finding good solutions but it's obviously difficult to know whether it's finding the best solution.

What is your meaning of permutation? One involving only two chars? Or only two adjacent ones? A permutation I think in its general meaning allows you to rearrange the whole string, but then the problem becomes trivial?

It's an arbitrary number of permutations in the target string, and minimizing the number of permutations (i.e., edge crossings in the bipartite formulation) is the objective function. Permutations can be anywhere and they're independently distributed, so adjacency would occur (infrequently) by chance. The ordering of the query and target strings is fixed, so I can't do any rearrangement.

If I prove it NP-hard, do I get co-authorship?

Let's see the proof :-)

fred
OK, if you can't define permutation without using the word permutation I give up.
piccolbo
A: 

Would, Query: abc Target: ..c.b.a.a Result: cba, be three permutations (as per your use of the term) then? If so, then maybe you mean transpositions rather than permutations. A transposition is the swapping of two adjacent characters.

Good question. We're interested in a mapping from Query -> Target that has as few crossings as possible. This is very much the motivation for mentioning the bipartite edge crossings in the original post. Alternatively, you can think of maximizing a rank statistic, like Spearman's Rho, over the mapping.

Also, out of curiousity, how many unique characters are there in the query/target? – Justin Peel 18

Typical query: 100, typical target: 1000. Combinatorially, it's a huge solution space.

fred