I can't find an answer on their website. Does anyone know whether HtmlCleaner is thread safe?
I have multiple threads that need to use it, but I don't know if I can safely reuse a single instance of the HtmlCleaner object.
Has anyone used it this way? Any ideas?

+1  A: 

Looking at the source code, no - an HtmlCleaner object is not thread safe. Use one object per thread.

nos
+1  A: 

My experience is no. I have used HtmlCleaner in several applications that routinely parse batches of hundreds of thousands of URLs and/or parse intermittently on demand.

I have seen parsing anomalies and exceptions thrown under load when multiple threads share a single HtmlCleaner and DomSerializer.

I prefer to reuse objects whenever possible. Reusing them across a thread's lifetime takes a little more code, but if you care about speed and/or resource usage, as I think we all do, then object reuse just makes sense.

Reuse at the thread level without a pool may make sense for you if your worker threads are always alive, under load, and there are not too many of them.
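
A minimal sketch of thread-level reuse, using `ThreadLocal` so that each long-lived worker thread lazily creates its own instance and keeps it. The `Cleaner` class here is a hypothetical stand-in for a non-thread-safe parser such as HtmlCleaner (it is not the real library API), and the names `PerThreadCleaner` and `cleanAll` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadCleaner {
    // Hypothetical stand-in for a non-thread-safe parser such as HtmlCleaner.
    static final class Cleaner {
        // Mutable scratch state: this is what makes sharing one instance unsafe.
        private final StringBuilder scratch = new StringBuilder();

        String clean(String html) {
            scratch.setLength(0);
            scratch.append(html.replaceAll("<[^>]*>", "")); // naive tag stripping
            return scratch.toString().trim();
        }
    }

    // One Cleaner per thread, created the first time that thread asks for it.
    private static final ThreadLocal<Cleaner> CLEANER =
            ThreadLocal.withInitial(Cleaner::new);

    // Cleans every page on a fixed set of worker threads; results keep input order.
    static List<String> cleanAll(List<String> pages, int threads) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<String> task = () -> CLEANER.get().clean(page);
                futures.add(workers.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while cleaning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cleaning failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }
}
```

With this shape the number of parser instances never exceeds the number of worker threads, which is why it suits a small, fixed set of threads better than an application that keeps creating new ones.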

Reuse with a pool makes sense if you are constantly creating threads (I do not recommend this), your threads are not always under load, there are lots of threads, and/or the reusable objects are heavyweight in either instantiation time or running resource consumption.

Basically, a pool approach lets the application scale up the number of reusable objects, ensures that you only have as many objects as your system needs at any one point in time, and handles release of resources. If a min-size is set, you can also avoid any startup lag associated with object creation, up to a point.

Anyway, I tend to work at large scale, so this type of optimization may not be worth your time. My theory is: when in doubt, use a pool.
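
To make the pool idea concrete, here is a rough sketch of a fixed-size pool built on a `BlockingQueue`. It is deliberately simpler than what is described above: all objects are created up front (so min-size equals max-size), and it does not grow, shrink, or evict idle objects. `SimplePool`, `use`, and `available` are illustrative names, not part of HtmlCleaner or any library. For the growth and eviction behaviour a ready-made pool such as Apache Commons Pool is probably a better starting point.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative fixed-size pool; T would be the non-thread-safe parser type.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Pre-creates every instance so no caller pays the construction cost later.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an object, runs the work with it, and always returns it to the pool.
    <R> R use(Function<T, R> work) {
        T obj;
        try {
            obj = idle.take(); // blocks while every object is borrowed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
        try {
            return work.apply(obj);
        } finally {
            idle.add(obj); // returned even if the work throws
        }
    }

    // Number of objects currently idle in the pool.
    int available() {
        return idle.size();
    }
}
```

A caller never holds a reference outside the `use` callback, so an object cannot be handed to two threads at once, and the `finally` block puts it back even when parsing throws.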

BML