views:

66

answers:

4

Hi All,

I'm getting out of memory exceptions from the following function when RowCollection is 50000+ and thus i need to make it more memory efficient. The function is simply needs to construct a comma separated string of the row indexes stored in RowCollection. Can anyone spot any obvious memory hungry operations in the following?

N.B RowCollection just contains a list of row indexes stored as integers.

 Private Function GetCommaSeparatedString(ByRef RowIndexes As ArrayList) As String
        Dim RowString As String = String.Empty

        'Build a string of the row indexes 
        'Add one onto each index value so our indexes begin at 1 
        For Each Row In RowIndexes 
            RowString += CInt(Row.ToString) + 1 & ","
        Next

        'Remove the last comma
        If RowString.Length > 0 Then
            RowString = RowString.Substring(0, RowString.Length - 1)
        End If

        Return RowString
    End Function

Thanks in advance.

+1  A: 

This is because in each iteration, behind the scenes you are creating 2 strings, and they are getting big nearer the end.

"1,2,3,4,5,....499,500" "1,2,3,4,5,....499,500,"

at the end of only 500 iterations, you are creating 2 strings nearly 2000 characters long, only to have them discarded in the next iteration (but the runtime may be keeping them around).

In the last iteration, your string (from 1 to 50000) would be 100,000 characters long, assuming your row indexes are even sequential. This would mean you have allocated ~ 10,000,000,000 characters or (I believe 2 bytes/char) 20 gigabytes of strings.

You can start by using StringBuilder instead of += on a string (RowString).
Ex

Dim RowString As StringBuilder = new StringBuilder( 100000 )

For Each Row In RowIndexes 
    RowString.Append( CInt(Row.ToString) + 1).Append( "," )
Next

'...'

Return RowString.ToString

You can try the next one as well, but you should profile the two and pick the best for you.

Private Function GetCommaSeperatedString(ByRef RowIndexes As ArrayList) As String
    Dim indexArray as String[] = RowIndexes
                                 .Select(Function(r)=> (CInt(r.ToString) + 1)ToString)
                                 .ToArray
    return String.Join( ',', indexArray)
End Function



* note: these are the first lines of VB I've ever written, so I may have made a basic mistake (especially in the linq/lambda stuff), but the point is there.

BioBuckyBall
+3  A: 

Update: while this is a straight change to use stringbuilder, look at the better approaches by Strilanc or Steven Sudit

Well, you may still run out of memory (memory is finite, after all), but you should be using a StringBuilder, not concatenating strings. Each time, you are creating a new string object rather than changing it (As strings are immutable)

Private Function GetCommaSeparatedString(ByRef RowIndexes As ArrayList) As String
    Dim RowString As New StringBuilder()

    'Build a string of the row indexes 
    'Add one onto each index value so our indexes begin at 1 
    For Each Row In RowIndexes 
        RowString.AppendFormat("{0},",  CInt(Row.ToString) + 1)
    Next

    'Remove the last comma
    If RowString.Length > 0 Then
        RowString.Append(RowString.Substring(0, RowString.Length - 1))
    End If

    Return RowString
End Function
Philip Rieck
@Steven Sudit Has the answer I would go with - stream it, or create and throw away each row as you need it. I can't immediately think of a valid design that should requires the entire contents as a string.
Philip Rieck
+2  A: 

StringBuilder is a good idea, but why not just avoid the problem by streaming the output out instead of trying to hold it all in memory at once?

Steven Sudit
+4  A: 

I'm not sure why you're getting out of memory errors, unless the string representation of your rows is extremely large, because you never have more than one or two non-garbage-collectible strings.

However, your method is horribly inefficient because it spends so much time copying the contents of half-built strings. A StringBuilder is more appropriate when building large strings, because it can be modified without re-creating the contents each time.

HOWEVER, in this case even a StringBuilder is a bad idea, because you are joining strings and there is already a method to do that: String.Join. Just use a LINQ query to do the add-one-to-index-stuff and you get a one-liner:

Private Function GetCommaSeparatedString(ByVal RowIndexes As ArrayList) As String
    Return String.Join(",", From index In RowIndexes Select CInt(index) + 1)
End Function

I would also recommend not passing by reference unless you actually need it. You aren't modifying RowIndexes, so pass it by value. I'm also not sure why you are ToString()-ing the index then immediately parsing it. Aren't they already integers? Just use CInt.

Strilanc
I agree with you completely, although I probably would stream it out if possible to redesign the code using this method.
Philip Rieck
@PhilipRieck What do you mean by 'stream it out'?
Strilanc
@Strilanc either create an IEnumerable<string> method to get lines, or read it through a textreader that appends commas, etc - that is, not try to process the entire file from memory at one time.
Philip Rieck
@Strilanc Thanks for all the comments so far...I can't stream the result, so using a string builder or String.Join seems the best method. What does the above line become with Option Strict On? I've yet to get my head around using Linq. Thanks
Simon B
Problem Solved. Thanks for the help.
Simon B
You should always have Option Strict On. Really understanding the line requires understanding linq, so look that up. The line gives String.Join an IEnumerable which enumerates the results of adding 1 to the items in RowIndexes. String.Join applies ToString to each enumerated element, using a StringBuilder to append them together with the separator.
Strilanc