How do you detect repetitions in a list of strings?

+3 Q:

How do you detect repetitions in a list of strings?

I have a sequence of SQL calls in which I want to detect loops (and hence unnecessary duplicate SQL calls), but it got me thinking about this more general problem.

Given a list, say [a,b,c,b,c,a,b,c,b,c,a,b,b]

Is there some way I can turn that into a,[[b,c]*2,a]*2,b*2

or [a,[b,c]*2]*2,a,b*2?

That is, detect repetitions (possibly nested ones).

If you can sort it first, then it's easy to go through one more time to find duplicate runs. Of course, sorting something as free-form as SQL queries sounds a bit scary.

unwind 2008-12-08 15:18:54
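A minimal sketch of the sort-based suggestion above, assuming each SQL call is available as a plain string. The function name `duplicate_counts` is made up for illustration. Note that sorting discards the original order, so this reports which calls are duplicated and how often, not the loop structure asked about in the question.

```python
from itertools import groupby

def duplicate_counts(calls):
    """Return {call: count} for every call that occurs more than once."""
    counts = {}
    # Sorting puts equal strings next to each other, so groupby sees
    # each distinct call as a single run.
    for call, run in groupby(sorted(calls)):
        n = sum(1 for _ in run)
        if n > 1:
            counts[call] = n
    return counts

calls = list("abcbcabcbcabb")  # the example list from the question
print(duplicate_counts(calls))
```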

I'm no expert in that field, but you might want to check out some compression algorithms; it seems to me that this is exactly what they do.

Bombe 2008-12-08 15:19:09

+4 A:

Look into the Lempel-Ziv-Welch (LZW) compression algorithm. It is built on detecting repetitions in strings and exploiting them for compression. I believe you can use a trie for it.

Yuval F 2008-12-08 15:19:14
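To illustrate the LZW idea, here is a rough sketch of its dictionary-building step applied to a list of whole strings rather than to characters. It is an assumption-laden illustration, not a full compressor: the name `lzw_phrases` is hypothetical, and a set of tuples stands in for the trie mentioned above. Each time a known phrase can no longer be extended, the extended phrase is recorded, so phrases that keep growing are the ones that recur in the input.

```python
def lzw_phrases(tokens):
    """Run an LZW-style pass over tokens and return the multi-token
    phrases in the order they were added to the dictionary."""
    tokens = list(tokens)
    dictionary = {(t,) for t in tokens}  # every single token is known up front
    learned = []
    current = ()
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in dictionary:
            current = candidate          # keep extending a phrase seen before
        else:
            dictionary.add(candidate)    # first time this phrase appears
            learned.append(candidate)
            current = (tok,)
    return learned

for phrase in lzw_phrases(list("abcbcabcbcabb")):
    print(phrase)
```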

If the string is sufficiently large, an interesting approach is to run a compression tool (like gzip, bzip2, or 7zip) on it. These tools work by locating repetitions (at various levels) and replacing them with pointers to the first instance of the text (or to a dictionary entry). The compression ratio you achieve is a measure of the repetition. Dumping the compressed file's internal structure (you'll have to write code to do that) will give you the repeated content.

Diomidis Spinellis 2008-12-08 15:20:15
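The "compression as a measure of repetition" idea above can be tried quickly with Python's standard `zlib` module (the same DEFLATE method gzip uses). This is only a sketch: the helper name `compression_ratio` and the sample queries are invented for illustration, and the ratio says how repetitive the log is overall, not which calls repeat.

```python
import zlib

def compression_ratio(calls):
    """Compressed size divided by raw size; lower means more repetition."""
    raw = "\n".join(calls).encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)

# Hypothetical query logs: one highly repetitive, one more varied.
looped = ["SELECT * FROM users WHERE id = 1"] * 200
varied = [f"SELECT * FROM t{i} WHERE id = {i * 7919}" for i in range(200)]
print(compression_ratio(looped), compression_ratio(varied))
```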

I doubt this will work, as the compression programs will happily match substrings and will ignore the SQL command boundaries.

derobert 2008-12-08 15:45:01
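One way around the boundary problem raised above is to treat each SQL call as an indivisible symbol and look for adjacent repeated blocks of symbols. The following is a greedy sketch, not an optimal or canonical algorithm: `collapse` and `render` are hypothetical names, items are assumed to be plain strings, and the shortest repeating blocks are collapsed first, which decides which of several valid nestings you get.

```python
def collapse(seq):
    """Greedily collapse adjacent repeats.

    Returns a list whose entries are either original items or
    (block, count) pairs, where block is a tuple of such entries.
    """
    seq = list(seq)
    period = 1
    while period <= len(seq) // 2:
        out, i, changed = [], 0, False
        while i < len(seq):
            block = seq[i:i + period]
            count = 1
            # Count how many times this block repeats back to back.
            while seq[i + count * period:i + (count + 1) * period] == block:
                count += 1
            if count > 1:
                out.append((tuple(block), count))
                i += count * period
                changed = True
            else:
                out.append(seq[i])
                i += 1
        seq = out
        # After a collapse, shorter periods may match again, so restart at 1.
        period = 1 if changed else period + 1
    return seq

def render(item):
    """Format an entry in the notation used in the question."""
    if isinstance(item, tuple):
        block, count = item
        body = ",".join(render(x) for x in block)
        return f"[{body}]*{count}" if len(block) > 1 else f"{body}*{count}"
    return str(item)

calls = list("abcbcabcbcabb")
print(",".join(render(x) for x in collapse(calls)))  # [a,[b,c]*2]*2,a,b*2
```

On the list from the question this yields the second of the two forms the asker proposed. Because whole strings are compared, a match can never start or end in the middle of a SQL statement.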
