In Python3
>>> from collections import Counter
>>> count_hash=Counter()
>>> T=(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 65, 4, 7, 13, 32)
>>> for i in range(2,len(T)+1):
... for j in range(len(T)+1-i):
... count_hash[T[j:j+i]]+=1
...
>>> for k,v in count_hash.items():
... if v >= 2:
... print(k,v)
...
(3, 5) 2
(4, 7, 13) 2
(7, 13) 2
(4, 7) 2
Do you need to filter the (7,13) and the (4,7) out? What happens if there was also (99, 7, 14) in the sequence?
a Counter
is just like a hash used to keep track of the number of times we see each substring
The two nested for loops produce all the substrings of T
, using count_hash
to accumulate the count of each substring.
The final for loop filters all those substrings that only occurred once
Here is a version with a filter
from collections import Counter
def substrings(t, minlen=2):
tlen = len(t)
return (t[j:j+i] for i in range(minlen, tlen+1) for j in range(tlen+1-i))
def get_freq(*t):
counter = Counter(substrings(t))
for k in sorted(counter, key=len):
v=counter[k]
if v < 2:
del counter[k]
continue
for t in substrings(k):
if t in counter:
if t==k:
continue
counter[k]+=counter[t]-v
del counter[t]
return counter
print(get_freq(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 65, 4, 7, 13, 32, 4, 7))
print(get_freq(1,2,3,4,1,2,3,4,1,2,7,8,7,8,3,4,3,4,1,2))
the output is
Counter({(4, 7, 13): 3, (3, 5): 2})
Counter({(1, 2, 3, 4, 1, 2): 8, (7, 8): 2}) # Is this the right answer?
Which is why I asked how the filtering should work for the sequence I gave in the comments