I have built a blog platform in VB.NET whose audience is very young and, for some reason, likes to express its commitment by repeating sequences of characters in comments.

Examples:

Hi!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3 LOLOLOLOLOLOLOLOLOLOLOLOLLOLOLOLOLOLOLOLOLOLOLOLOL

..and so on.

I don't want to filter this out completely; however, I would like to shorten it to a maximum of 5 repeating characters or sequences in a row. I have no problem writing a function to handle a single repeating character, but what is the most effective way to filter out a repeating sequence as well?

This is what I used earlier for single repeating characters:

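Private Shared Function RemoveSequence(ByVal str As String) As String
    ' Collapse any run of one repeated character to at most MaxRun characters.
    Const MaxRun As Integer = 10
    Dim sb As New System.Text.StringBuilder(str.Length)
    Dim prev As Char = Nothing   ' Char cannot be initialized from String.Empty under Option Strict
    Dim runLength As Integer = 0 ' length of the current run, including the current character

    For i As Integer = 0 To str.Length - 1
        Dim c As Char = str(i)
        If i > 0 AndAlso c = prev Then
            runLength += 1
        Else
            runLength = 1
        End If
        ' Keep the character only while the run is within the limit.
        If runLength <= MaxRun Then
            sb.Append(c)
        End If
        prev = c
    Next

    Return sb.ToString()
End Function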

Any help would be greatly appreciated.

A: 

You should be able to apply the 'longest repeated substring problem' recursively to solve this. On the first pass you will get two matching substrings and will need to check whether they are contiguous. Then repeat the step on one of the substrings. Stop the algorithm if the substrings are not contiguous, or if the substring length drops below a certain number of characters. Finally, you should be able to keep the last match and discard the rest. You will need to dig around for an implementation :(
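For short comments, a simpler alternative to a full longest-repeated-substring search may be enough. The sketch below is not that algorithm; it swaps in a regular expression with a backreference, which only matches copies of a unit that are contiguous. The names LimitRepeats, maxRepeats and maxUnitLength are hypothetical, and the cap of 10 characters per repeated unit is an assumption chosen to keep backtracking bounded.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RepeatLimiter
    ' Hypothetical helper: any unit of 1..maxUnitLength characters that occurs more
    ' than maxRepeats times in a row is cut down to exactly maxRepeats copies.
    Public Function LimitRepeats(ByVal input As String,
                                 Optional ByVal maxRepeats As Integer = 5,
                                 Optional ByVal maxUnitLength As Integer = 10) As String
        ' Group 1 lazily captures the shortest candidate unit; \1{n,} then requires
        ' at least n further copies directly after it (more than n copies in total).
        Dim pattern As String = "(.{1," & maxUnitLength & "}?)\1{" & maxRepeats & ",}"

        ' The replacement is the captured unit written maxRepeats times.
        Dim replacement As String = String.Concat(Enumerable.Repeat("$1", maxRepeats))

        Return Regex.Replace(input, pattern, replacement)
    End Function

    Sub Main()
        ' The input below contains ten exclamation marks.
        Console.WriteLine(LimitRepeats("Hi!!!!!!!!!!"))
    End Sub
End Module
```

With the defaults, the ten exclamation marks in the sample are reduced to five, and a run such as <3<3<3<3<3<3<3<3 would be handled the same way with <3 as the unit. Runs of five or fewer copies are left untouched. Very long inputs may still be worth length-limiting before they reach the regex.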

Also have a look at this previously asked question: http://stackoverflow.com/questions/398811/finding-long-repeated-substrings-in-a-massive-string

tathagata