views: 199 · answers: 4

Hey guys, here's the issue:

Essentially, I have a very large List containing, in turn, relatively large Dictionaries. In other words, a very big in-memory collection.

I then serialize this collection manually to XML and send it over HTTP. Needless to say, the XML is too large; sometimes so large that I get an OutOfMemoryException before even trying to send it.

In .NET, how would I go about calculating potential memory usage? In this case, I have to break the XML down into chunks, processing only a small part of the collection at a time.

How do I efficiently calculate the size of each "chunk" on the fly? I don't want to pick an arbitrary number like "process 100 items at a time"; I want to know, approximately, how big each chunk should be on a case-by-case basis.

cheers

UPDATE

Although @Jacob provided the best solution for this particular problem, the conceptual structure of the app is itself flawed.

Indeed, when working with a collection, the solution is to serialize a fraction of your message first, in order to estimate how big the full message will be.

You then send each acceptably sized unit, one by one.

But this is only a hack. The real solution is either to find a way to avoid passing large messages, or to use a completely different protocol altogether.
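The "serialize a fraction first" idea from the update can be sketched as follows. This is a minimal illustration, not code from the original post: the method name, the `sample`/`item` element names, and the assumption that dictionary keys are valid XML element names are all mine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class ChunkEstimator
{
    // Serialize a small sample of the collection to a MemoryStream, measure
    // the actual bytes produced, and derive how many items fit under a
    // target payload limit. Assumes dictionary keys are valid XML names.
    public static int EstimateChunkSize(List<Dictionary<string, string>> items,
                                        long maxBytesPerChunk, int sampleSize = 10)
    {
        int n = Math.Min(sampleSize, items.Count);
        long sampleBytes;
        using (var ms = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(ms))
            {
                writer.WriteStartElement("sample");
                for (int i = 0; i < n; i++)
                {
                    writer.WriteStartElement("item");
                    foreach (var kv in items[i])
                        writer.WriteElementString(kv.Key, kv.Value);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
            sampleBytes = ms.Length;
        }
        long avgPerItem = Math.Max(1, sampleBytes / Math.Max(1, n));
        return (int)Math.Max(1, maxBytesPerChunk / avgPerItem);
    }
}
```

The estimate is only as good as the sample is representative; if item sizes vary wildly, measure while you serialize instead of estimating up front.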

There's an interesting post on the subject here if you do want to stick with SOAP; however, I decided to find a way around sending so much data.

+6  A: 

Why don't you just stream the data, converting it to XML on the fly, so you avoid holding a huge XML document in memory?
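As a rough sketch of what streaming looks like (the URL, class name, and element names are placeholders, not from the question): write the XML straight into the HTTP request stream with `XmlWriter`, so only one item's worth of XML exists at any moment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml;

class StreamingSender
{
    // Write the collection as XML directly into a stream, one item at a
    // time; no complete XML string is ever built in memory. Assumes the
    // dictionary keys are valid XML element names.
    public static void WriteItems(Stream output, List<Dictionary<string, string>> items)
    {
        using (var writer = XmlWriter.Create(output))
        {
            writer.WriteStartElement("items");
            foreach (var item in items)
            {
                writer.WriteStartElement("item");
                foreach (var kv in item)
                    writer.WriteElementString(kv.Key, kv.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
    }

    // Point the same writer at an HTTP request stream. SendChunked avoids
    // computing Content-Length up front, and disabling write buffering
    // keeps the request body from being copied into memory.
    public static void Send(List<Dictionary<string, string>> items, string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "text/xml";
        request.SendChunked = true;
        request.AllowWriteStreamBuffering = false;
        using (var stream = request.GetRequestStream())
            WriteItems(stream, items);
        using (var response = request.GetResponse())
        {
            // inspect the response here
        }
    }
}
```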

AlbertEin
@albertein: hey Albert, thanks. I've never streamed; could you clarify and break it down more? Thanks.
andy
@andy: first I need to know how you are planning to send it over HTTP. Tell me, if you had a huge in-memory string with the XML, what would you do with it?
AlbertEin
@albert: good question. I'm sending it over SOAP. I guess I don't want to change the whole thing now, but there's definitely a flaw here in my app. I guess my question should really be about how to handle, in general, large amounts of data over HTTP?
andy
How are you sending it? Using WebRequest?
AlbertEin
@albert: hey man, just over standard SOAP.
andy
I would need to see some code.
AlbertEin
+1  A: 

How are you sending it? You should do so through WCF, which can do streaming. It would also give you a choice, through configuration, of whether to use XML, binary, or another encoding.
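For reference, a WCF service that streams takes or returns a `Stream` and uses a binding with `TransferMode.Streamed`. This is a minimal contract sketch; the interface and operation names are illustrative, not from the post.

```csharp
using System.IO;
using System.ServiceModel;

// A Stream parameter (or return value) lets WCF transfer the message body
// a buffer at a time instead of materializing the whole payload.
[ServiceContract]
public interface IBulkTransfer
{
    [OperationContract]
    void Upload(Stream data);
}

// Client-side binding configured for streaming (sketch):
// var binding = new BasicHttpBinding
// {
//     TransferMode = TransferMode.Streamed,
//     MaxReceivedMessageSize = long.MaxValue
// };
```

Note that streamed operations are restricted to a single `Stream` body parameter; headers can still carry small metadata.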

John Saunders
@john: thanks John. It's too late now to implement WCF. As I've said to @Albert, I didn't foresee this situation, but I see the problem now. Perhaps I'm asking the wrong question. Even if I used binary, it might still be too big. Is streaming the solution to all these problems?
andy
One big difference with WCF is it's a lot more efficient, and won't keep multiple copies in memory at once. In addition, streaming is much easier. I don't think I even _know_ how to do real streaming with ASMX web services.
John Saunders
@john: thanks John. So, if I were to use WCF and then stream, are you saying that, regardless of the size, it would transfer, because it's only transferring bits at a time? What about timeout issues; how long can it stream for? Hours? Not that I would send such a large amount of data, but for argument's sake...?
andy
@andy: I'm sorry, but I don't know enough about the underlying implementation. Even if it were as simple as sending a buffer at a time, as it's available, then it shouldn't run out of memory. And that's not the best implementation I can think of off the top of my head, so I assume Microsoft did better than that.
John Saunders
+2  A: 

I think you might have a conceptual problem more than anything else. "Calculate potential memory usage" is at odds with "efficiently calculate the size of each chunk". The only way to get at your memory usage with enough accuracy to predict an adequate chunk size is to actually perform the conversion.

It sounds like the best way to come at this efficiently is to tackle it progressively, which is essentially what those suggesting streaming are saying. If you can't leverage actual streaming, you'll probably want to structure your serialization so that you process one conceptual unit at a time (i.e. one item in your list with its attendant dictionary children).
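A progressive version of this idea, as a sketch (class and element names are mine, not from the answer): serialize one item at a time, flush so the byte count is real, and cut a new chunk whenever the measured output crosses the limit. This measures actual size instead of predicting it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class ProgressiveChunker
{
    // Serialize items one at a time; whenever the bytes actually written
    // cross maxBytesPerChunk, seal the current chunk and start a new one.
    // Assumes dictionary keys are valid XML element names.
    public static List<byte[]> Chunk(List<Dictionary<string, string>> items,
                                     long maxBytesPerChunk)
    {
        var chunks = new List<byte[]>();
        var ms = new MemoryStream();
        var writer = XmlWriter.Create(ms);
        writer.WriteStartElement("chunk");
        int itemsInChunk = 0;

        foreach (var item in items)
        {
            writer.WriteStartElement("item");
            foreach (var kv in item)
                writer.WriteElementString(kv.Key, kv.Value);
            writer.WriteEndElement();
            itemsInChunk++;
            writer.Flush(); // make ms.Length reflect what was written

            if (ms.Length >= maxBytesPerChunk)
            {
                writer.WriteEndElement();
                writer.Close();
                chunks.Add(ms.ToArray());
                ms = new MemoryStream();
                writer = XmlWriter.Create(ms);
                writer.WriteStartElement("chunk");
                itemsInChunk = 0;
            }
        }

        writer.WriteEndElement();
        writer.Close();
        if (itemsInChunk > 0 || chunks.Count == 0)
            chunks.Add(ms.ToArray());
        return chunks;
    }
}
```

Each `byte[]` can then be sent as one acceptably sized message, which matches the "one unit at a time" approach described above.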

Jacob Proffitt
@jacob: yes, I see your point. You're right.
andy
A: 

If it is a problem to send, isn't it also a problem to receive? It sounds like you're trying to solve only half of the problem. XML is a big no-no for large data.

Stephan Eggermont