Hi,

I have a specific scenario in which we want to use Coherence as a distributed cache, which I will describe here.

  1. I have 20+ standalone processes which will put data into the cache continuously. Their frequencies differ, though that's not a concern.
  2. And 2 processes which will read data from that cache.
  3. I don't need any underlying DB beyond what Coherence provides. Data will be written to the cache and read from the cache.
  4. I have a 4-node cluster at my disposal (cost constraint), the Coherence cluster will be on different boxes (infra constraint), and the populating part and the reading part will each run on different machines.
  5. The peak memory size of the cache will hover around 6 GB daily, the minimum being 2 GB. The cache will hold daily data only, and I will have separate archiving processes simultaneously archiving it. The point is that the cache will stay at this size for now. Let's say I keep the date out of the key equation.
  6. I would still like to explore whether I can store more on those 4 nodes. Right now it's plain Java serialization; I can explore other binary formats. Or, at this cache size, should I definitely do so?
  7. My read and write operations are fairly spread out over the day, i.e. reads and writes keep happening from those 2 reading clients and 20+ writing clients; it's not that one dominates. There is a startup batch step in each background process which pushes more to the cache than the continuous pushing afterwards, but the continuous pushing moves a fair amount of data too. (A minimal sketch of this write/read pattern follows the list.)
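
To make the access pattern concrete, here is roughly what each writer and reader does. This is only a sketch; the cache name "daily-data" and the key/value types are placeholders I have not decided on yet:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    public class DailyDataWriter {
        public static void main(String[] args) {
            // Joins the cluster (or connects via Extend, depending on the
            // client-side cache configuration) and obtains the named cache.
            NamedCache cache = CacheFactory.getCache("daily-data");

            // Each of the 20+ writer processes does this continuously.
            String key = "instrument-42";            // placeholder key
            byte[] payload = new byte[] { 1, 2, 3 }; // placeholder value
            cache.put(key, payload);

            // The 2 reader processes simply do lookups.
            Object value = cache.get(key);
            System.out.println("read back: " + value);

            CacheFactory.shutdown();
        }
    }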

Now my questions regarding the points above (and some things I am confused about):

  1. The biggest one: somebody told me that we can only have a limited number of connections depending on the number of nodes we have bought, so with 4 nodes we should ideally have at most 4 connections and should therefore build some kind of gatekeeper application, even if we use TCP Extend. From my reading so far I don't think that's true. Is it? I don't want to go that route if it really isn't a constraint.

In other words, is there a limit on the number of connections through the proxy service depending on the number of nodes in the cluster?

  2. Somewhat related to the above: at worst, I will pay some performance penalty when pushing to the cache only if I go the Extend way, right?

  3. Partitioned cache vs. near cache: both read latency and seeing the most up-to-date data are extremely critical. (This is the most important question I have.)

  4. I really want to see the benefit of going to POF instead of, say, Java serialization/Externalizable/protobuf. Can Coherence support protobuf out of the box? (Maybe for later on.)

A: 

There's no technical limitation to the number of connections a Coherence Extend proxy can support except normal network and hardware resource constraints. You will have to ask an Oracle sales person if there are licensing limitations.
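
To illustrate: the proxy is just a service declared in the cluster-side cache configuration, and no element in it ties the connection count to the number of nodes. A minimal sketch (the service name, address and port are placeholders):

    <proxy-scheme>
      <service-name>ExtendTcpProxyService</service-name>
      <acceptor-config>
        <tcp-acceptor>
          <local-address>
            <!-- placeholder address/port for the proxy listener -->
            <address>192.168.1.10</address>
            <port>9099</port>
          </local-address>
        </tcp-acceptor>
      </acceptor-config>
      <autostart>true</autostart>
    </proxy-scheme>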

There is some performance impact from using a proxy because you are adding an additional network hop (client to proxy to cluster). If you use POF serialization, the proxy does not have to serialize/deserialize values; it can pass each object through in its serialized form. In most applications the performance impact of a proxy is tiny because Coherence is highly optimized for network speed. You are not required to use a proxy unless your clients are .NET or C++, but isolating clients from the cluster has the advantage that a misbehaving client cannot impact the cache.
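
If you do go the Extend route, the client side simply points a remote-cache-scheme at that proxy. Roughly (cache name, addresses and ports are again placeholders):

    <caching-scheme-mapping>
      <cache-mapping>
        <cache-name>daily-data</cache-name>
        <scheme-name>extend-remote</scheme-name>
      </cache-mapping>
    </caching-scheme-mapping>

    <caching-schemes>
      <remote-cache-scheme>
        <scheme-name>extend-remote</scheme-name>
        <service-name>ExtendTcpCacheService</service-name>
        <initiator-config>
          <tcp-initiator>
            <remote-addresses>
              <socket-address>
                <address>192.168.1.10</address>
                <port>9099</port>
              </socket-address>
            </remote-addresses>
          </tcp-initiator>
        </initiator-config>
      </remote-cache-scheme>
    </caching-schemes>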

A near cache will improve retrieval performance dramatically if a client has a set of frequently retrieved items, since those will be found in-process.
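
Since staleness matters to you, note that a near cache has an invalidation strategy that controls how the in-process front map is kept in sync with the cluster. A sketch (the scheme names and size limit are placeholders; "present" registers listeners only for the keys actually held in the front map):

    <near-scheme>
      <scheme-name>near-daily-data</scheme-name>
      <front-scheme>
        <local-scheme>
          <!-- bound the in-process front map -->
          <high-units>10000</high-units>
        </local-scheme>
      </front-scheme>
      <back-scheme>
        <distributed-scheme>
          <scheme-ref>partitioned-daily-data</scheme-ref>
        </distributed-scheme>
      </back-scheme>
      <invalidation-strategy>present</invalidation-strategy>
    </near-scheme>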

POF offers performance improvements based on faster serialization/deserialization and more compact storage. It is always best to try with test data based on your real production data and measure the difference yourself. Coherence does not support protobuf out of the box.
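
For a feel of what POF involves: a value class implements PortableObject and reads/writes its fields by index. A minimal sketch (the class and field indexes are invented for illustration, and the type still has to be registered in a POF configuration file):

    import java.io.IOException;

    import com.tangosol.io.pof.PofReader;
    import com.tangosol.io.pof.PofWriter;
    import com.tangosol.io.pof.PortableObject;

    public class Quote implements PortableObject {
        private String symbol;
        private double price;

        // POF requires a public no-arg constructor for deserialization.
        public Quote() {
        }

        public Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price  = price;
        }

        @Override
        public void readExternal(PofReader in) throws IOException {
            // Fields are read back by the same indexes they were written with.
            symbol = in.readString(0);
            price  = in.readDouble(1);
        }

        @Override
        public void writeExternal(PofWriter out) throws IOException {
            out.writeString(0, symbol);
            out.writeDouble(1, price);
        }
    }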

David G