This question is slightly different from the kind of finding longest sequence or substring from two strings.
Given two string of the same size N, find the longest substrings from each string such that the substrings contains the same bag of chars.
The two substrings may not necessarily have the same sequence. But they must have the same bag of chars.
For example,
a = ABCDDEGF b = FPCDBDAX
The longest matching bag of chars are ABCDD (ABCDD from a, CDBDA from b)
How to solve this problem?
UPDATE
The goal is to find substrings from each input string, such that they have the same bag of chars. By saying "substring", they must be consecutive chars.
Update: Initially I thought of a dynamic programming approach. It works as below.
To compare two bags of chars of the same length K, it would take O(K) time to achieve so. Convert each string into a shorten form:
ABCDDEGF -> A1B1C1D2E1G1F1
FPCDBDAX -> A1B1C1D2F1P1X1
The shorten form is sorted alphabets followed by number of frequencies in the string. Construct, sort, and compare the shorten forms would take O(K) time in total. (Implementation can be achieved by using an array of chars though)
Two bags of chars are equal iif their shorten forms have the same chars and respective frequencies.
In addition, it takes O(logK) time to find the difference chars between the two string.
Now, for two input strings:
- If their shorten forms are identical, then this is the longest common bag of chars.
- Find chars in string1 such that they do not appear in string2. Tokenize string1 into several substrings based on those chars.
- Find chars in string2 such that they do not appear in string1. Tokenize string2 into several substrings based on those chars.
- Now, we have two list of strings. Compare each pair (which in turn is the same problem with a smaller size of input) and find the longest common bag of chars.
The worst case would be O(N3), and best case would be O(N). Any better idea?