I'm using a simple PHP library to add documents to a Solr index via HTTP.
There are currently three servers involved:
- The PHP box running the indexing job
- A database box holding the data being indexed
- The Solr box
At 80 documents/sec (out of 1 million docs), I'm noticing an unusually high interrupt rate on the network interfaces of the PHP and Solr boxes (2000/sec). What's more, the graphs are nearly identical: when the interrupt rate on the PHP box spikes, it spikes on the Solr box too. The rate on the database box is much lower (300/sec). I imagine this is simply because I open and reuse a single connection to the database server, whereas every single Solr request currently opens a new HTTP connection via cURL, thanks to the way the Solr client library is written.
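For reference, the per-document flow inside the client library boils down to something like this (simplified and paraphrased, not the library's exact code; `solrUpdate()` is just an illustrative name):

```php
<?php
// Simplified version of what the client library does for each document:
// a brand-new cURL handle (and thus a new TCP connection) per request.
function solrUpdate($solrUrl, $xmlDoc)
{
    $ch = curl_init($solrUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xmlDoc);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);
    curl_close($ch);   // connection is torn down here, every time

    return $response;
}
```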
So, my questions are:
- Can cURL be made to open a keepalive session?
- What does it take to reuse a connection? Is it as simple as reusing the cURL handle resource (roughly as in the sketch after this list)?
- Do I need to set any special cURL options (e.g. force HTTP/1.1)?
- Are there any gotchas with cURL keepalive connections? This script runs for hours at a time; will I be able to use a single connection, or will I need to periodically reconnect?
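To make that concrete, here is roughly what I'm hoping is valid: create one handle up front, set it up once, and just swap the POST body for each document (`$solrUrl` and `$docs` are placeholders for my actual values). Please correct me if reusing the handle like this isn't enough to get keepalive:

```php
<?php
// Sketch of the reuse pattern I have in mind -- one handle for the whole run.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $solrUrl);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Is forcing HTTP/1.1 (where keepalive is the default) necessary here?
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);

foreach ($docs as $xmlDoc) {
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xmlDoc);
    $response = curl_exec($ch);   // hoping this reuses the same TCP connection
    if ($response === false) {
        // Would I need to re-create the handle here if the connection has dropped?
        error_log('Solr update failed: ' . curl_error($ch));
    }
}

curl_close($ch);
```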